|René Nyffenegger's collection of things on the web|
René Nyffenegger on Oracle - Most wanted - Feedback
August 20, 2006: On splitting a string into words with regular expressions
With Oracle's regular expressions, it's relatively easy to split a string into it's words. Easy, that is, compared with a solution that doesn't use regexp_substr. It still requires some amount of setup as the following steps show...
First, I need a nested table type. This nested table type (t_vc2_50) is basically an array of varchar2(50) and will contain the words from the splitted string.
create or replace type t_vc2_50 as table of varchar2(50); /
Then, I need a function that actually splits the string into its words:
create or replace function split_vc2(str in varchar2) return t_vc2_50 as ret t_vc2_50 := t_vc2_50(); i number := 1; part varchar2(50); begin loop part := regexp_substr(str, '([^[:blank:]]+)', 1, i); exit when part is null; ret.extend; ret(i) := part; i := i+1; end loop; return ret; end split_vc2; /
I want a table that stores the words when I execute the function:
This function basically searches for continuous non white space characters.
The loop variable
The loop is exited when a returned substring is null, that is, when the last possible substring has been found.
Each found substring is stored at position
Finally, I execute the function. The table() creates a virtual table (if I may use this word) that, in this case here, consists of one column whose type is varchar2(50). The number of rows in this virtual table is given by the number of elements in ret.
Since the column in the virtual table has no name, I have to use Oracle's predefined name
Let's check what I have inserted into splitted_words:
select * from splitted_words;
Indeed, the words are splitted on white space characters:
ID WORD ---------- --------------- 1 Hello 2 world, 3 here's 4 something 5 4 6 u
It might be argued that the comma in world, should be stripped out as well, but that's left as an exercise for the reader :-).
See also Poor man's text index where I created a function to split text into words without using Oracle's regular expressions (that was at the time of 8i and 9i).
More on Oracle
This is an on Oracle article. The most current articles of this series can be found here.