René Nyffenegger's collection of things on the web | |
René Nyffenegger on Oracle - Most wanted - Feedback
|
August 20, 2006: On splitting a string into words with regular expressions | ||
With Oracle's regular expressions, it's relatively easy to split a string into it's words.
Easy, that is, compared with a solution that doesn't use regexp_substr.
It still requires some amount of setup as the following steps show...
First, I need a nested table type. This nested table type (t_vc2_50) is basically an array of varchar2(50) and will contain the words from the
splitted string.
create or replace type t_vc2_50 as table of varchar2(50); /
Then, I need a function that actually splits the string into its words:
create or replace function split_vc2(str in varchar2) return t_vc2_50 as ret t_vc2_50 := t_vc2_50(); i number := 1; part varchar2(50); begin loop part := regexp_substr(str, '([^[:blank:]]+)', 1, i); exit when part is null; ret.extend; ret(i) := part; i := i+1; end loop; return ret; end split_vc2; /
I want a table that stores the words when I execute the function:
create table splitted_words ( id number primary key, word varchar2(50) );
This function basically searches for continuous non white space characters.
[:blank:] finds white space characters, the preceeding
caret (^) negates it (which then searches for non white space characters), then, the + searches for one ore more (continuous) non white space characters.
The loop variable
i , used as fourth parameter to regexp_substr , tells to give the ith matched substring.
The loop is exited when a returned substring is null, that is, when the last possible substring has been
found.
Each found substring is stored at position
i in the collection variable ret . At the end of the function, ret
is returned.
Finally, I execute the function. The table() creates a virtual table (if I may use
this word) that, in this case here, consists of one column whose type is varchar2(50). The number of rows in this virtual table is given
by the number of elements in ret.
Since the column in the virtual table has no name, I have to use Oracle's predefined name
column_value , which refers to that
virtual column.
insert into splitted_words select rownum, column_value from table(split_vc2( 'Hello world, here''s something 4 u ' ));
Let's check what I have inserted into splitted_words:
select * from splitted_words;
Indeed, the words are splitted on white space characters:
ID WORD ---------- --------------- 1 Hello 2 world, 3 here's 4 something 5 4 6 u
It might be argued that the comma in world, should be stripped out as well, but that's left as an exercise for the reader :-).
Links
See also Poor man's text index where I created a function to split text into words
without using Oracle's regular expressions (that was at the time of 8i and 9i).
More on OracleThis is an on Oracle article. The most current articles of this series can be found here.
|