René Nyffenegger's collection of things on the web
René Nyffenegger on Oracle - Most wanted - Feedback
 

October 18, 2005: On storing hierarchical data

The canonical approach to store hierachical data is to create a self referencing table...
create table hierarchy_tbl (
  id        number primary key,
  parent_id references hierarchy_tbl,
  data      varchar2(30)
);
...and then to use Oracle's start with ... connnect by idiom to select from it.
Unfortunatly, this approach causes a problem if a subtree can be assigned to more than one node. For example, let's assume you sell three products: prod_1, prod_2 and prod_3. In order to produce these products, you buy things (I'll call these things items) from other vendors. These items are then assembled into compositions. These compositions can then be assembled into further compositions. Such a composition might not only consist of other compositions but also of other items. Finally, the compositions (possibly again along with items) are then assembled into a product.
As can be seen, the traditional approach is not ideal for this because data would have to be stored redundantly. As always, an example makes this clearer.
Here's the item table which stores the items bought from a different vendor. An item is idenfied by its id and has a txt attribute for human readability:
create table item (
  id   number primary key,
  txt  varchar2(25) not null
);
Each composition is stored in the composition table. This table, like item, has an id and a txt column.
create table composition (
  id  number primary key,
  txt varchar2(25) not null
);
Lastly, the part_to_composition table. This table allows to assign parts to a composition. A part is either an item (identified by id_item) or a composition (identified by id_non_item). In order to make sure that this rule is adhered to, a check constraint is put in place on the table.
id_composition determines which composition the item or composition is assigned to.
cnt specifies how many parts are assigned to the compositon.
create table part_to_composition (
  id_composition references composition not null,
  id_item        references item,
  id_non_item    references composition,
  cnt            number(5)  not null check (cnt > 0), 
  check (id_item is null and id_non_item is not null or id_item is not null and id_non_item is null)
);
With these tables, we're ready to insert some items:
insert into item values (1, 'item 1');
insert into item values (2, 'item 2');
insert into item values (3, 'item 3');
insert into item values (4, 'item 4');
insert into item values (5, 'item 5');
insert into item values (6, 'item 6');
Before we can assign parts to compositiones, the compositions must be recorded:
insert into composition values (1, 'comp 1');
insert into composition values (2, 'comp 2');
insert into composition values (3, 'comp 3');
insert into composition values (4, 'comp 4');
insert into composition values (5, 'comp 5');
insert into composition values (6, 'prod 1');
insert into composition values (7, 'prod 2');
insert into composition values (8, 'prod 3');
Now, parts (remember: a part is either an item or a composition) can be assigned to other compositions. Because a product is conceptually just another part, the final products are defined with the following insert statements:
insert into part_to_composition values (1,    4, null, 2);
insert into part_to_composition values (1,    6, null, 3);

insert into part_to_composition values (2,    1, null, 5);
insert into part_to_composition values (2,    4, null, 1);
insert into part_to_composition values (2,    5, null, 3);

insert into part_to_composition values (3,    2, null, 2);
insert into part_to_composition values (3, null,    1, 2);

insert into part_to_composition values (4, null,    2, 1);
insert into part_to_composition values (4,    3, null, 4);
insert into part_to_composition values (4,    6, null, 1);

insert into part_to_composition values (5, null,    3, 1);
insert into part_to_composition values (5,    2, null, 4);

insert into part_to_composition values (6, null,    5, 1);
insert into part_to_composition values (6, null,    4, 1);
insert into part_to_composition values (6,    1, null, 1);

insert into part_to_composition values (7, null,    5, 1);
insert into part_to_composition values (7, null,    3, 2);

insert into part_to_composition values (8, null,    5, 1);
insert into part_to_composition values (8,    3, null, 2);
With the data inserted, the following query can be used to show the items that belong to prod_2 (start with id_composition = 7):
select
  substr(lpad(' ', 2*level-1) || 
    nvl(composition.txt, item.txt), 1, 40) txt,
    ' [' || cnt || 'x' || '] ' cnt
from part_to_composition left join item        on part_to_composition.id_item     = item.id
                         left join composition on part_to_composition.id_non_item = composition.id 
start with id_composition = 7
connect by id_composition = prior id_non_item;
Here's the query's output. comp 3 appears twice: once as a component to comp 5 and once as a component to the product itself. Also, the subtrees of comp 3 are in both cases identical:
TXT                                      CNT
---------------------------------------- ---------
 comp 3                                   [2x]
   comp 1                                 [2x]
     item 6                               [3x]
     item 4                               [2x]
   item 2                                 [2x]
 comp 5                                   [1x]
   comp 3                                 [1x]
     comp 1                               [2x]
       item 6                             [3x]
       item 4                             [2x]
     item 2                               [2x]
   item 2                                 [4x]

Update October 27, 2005

More on Oracle

This is an on Oracle article. The most current articles of this series can be found here.