René Nyffenegger's collection of things on the web
René Nyffenegger on Oracle - Most wanted - Feedback -
 

Splitting text lines and populating views

Someone has asked an interesting question on comp.databases.oracle.server. He has the following data:
create table incoming_data (
  d varchar2(50)
);

insert into incoming_data values ('   13002000'                 );
insert into incoming_data values ('00000300   04001000'         );
insert into incoming_data values ('00000300 04001000   12001700');
insert into incoming_data values (''                            );
insert into incoming_data values ('        '                    );
insert into incoming_data values ('      07001500'              );
insert into incoming_data values ('    06001800'                );
insert into incoming_data values ('             21000600'       );
Each record consists of zero or more eight-digit blocks. The task is to split a line into more records of which each contains exactly one eight-digit block. Then, after the split, a view must be populated.
On Oracle 10g the split can be done with regular expressions.
The functionality that uses the regular expressions will be programmed in a function (named split_records). This function returns a table of char(8) which is a user defined type (whose name is table_char_8). It must first be created:
create type table_char_8 as table of char(8);
/
create or replace function split_records
  (in_line in incoming_data.d%type)
  return table_char_8
as
  ret table_char_8;
  s   varchar2(50);
  p   varchar2(50);
  a   char(4);
  b   char(4);
  i   number:=1;
begin
  ret :=table_char_8();

  p := '[[:digit:]]{8}';
  s := regexp_substr(in_line, p);

  while s is not null loop
    ret.extend;
    ret(ret.count) := s;

    i := regexp_instr(in_line,p,i);
    i := i+8;

    s := regexp_substr(in_line, p,i);
  end loop;
  
  return ret;
  
end split_records;
/
The variable p defines the search pattern. In this case, it says, it must be eight digits in a row ([[:digit:]]{8}). This pattern is then repeatedly applied on regexp_substr. regexp_substr returns the string that matches p. In this case, it returns the next eight digit block.
The variable i is used to remember the position of the last match. We do not want to match the same eight digit block twice, so we have to tell regexp_substr at what position it has to start with the match. regexp_instr returns the position of a match which will be assigned to i. i is then incremented by 8 (length of digit block). regexp_substr returns null when it does not match anything. If it returns null, the while loop will exit.
This function can now be applied on the table:
select column_value from incoming_data, table (split_records(d));
COLUMN_V
--------
13002000
00000300
04001000
00000300
04001000
12001700
07001500
06001800
21000600
column_value is, if you want, an alias for the unnamed column that is returned by split_records.
Now, it is easily possible to create the view.
create view v_incoming_data as
  select column_value piece
  from incoming_data, table (split_records(d));