| René Nyffenegger's collection of things on the web | |
|
René Nyffenegger on Oracle - Most wanted - Feedback
|
December 22, 2005: On making Oracle's regular expressions a bit easier to use | ||
|
I like programming Perl a lot. When I first used Perl some ten years ago, I immediatly fell in love with this language because it was
so easy to use regular expressions (I don't know much about today's trendy languages like Ruby or Python, but I guess, these
languages make it equally simple to use regular expressions).
Now, Oracle 10g comes with four
regular expressions supporting sql functions:
regexp_instr,
regexp_substr, regexp_replace
and regexp_like. Yet, using them is not nearly as simple as using Perl's regular expressions.
Specifically, I am missing an easy way to get the values matched by paranthesis. So, I had to write a package to do that for me.
Here's the package's specification:
create package re as
type matched is table of varchar2(100);
function match (
str in varchar2,
pat in varchar2,
match# in number,
flags in varchar2 := null) return matched;
end re;
/
In order to use the package, the function
match must be called. str is the string on which the
regular expression pat is applied. match# specifies how many parantheses pairs are in pat.
Finally, flags corresponds to the parameters i, c, n, m and x.
When the function is called, it returns a
table of varchar2 containing match# varchar2s. Each varchar2 found
in the table corresponds to a paranthesis pair given in pat. If it couldn't match the string, it returns
null.
Here's the package's body:
create package body re as
function match (
str in varchar2,
pat in varchar2,
match# in number,
flags in varchar2 := null) return matched is
ret matched;
pos_match_begin number;
pos_match_end number;
str_used varchar2(4000);
begin
if match# > 8 then
raise_application_error(-20000, 'at most 8 matches!');
end if;
pos_match_begin := regexp_instr (
str, --
pat,
1, -- start position
1, -- occurance
0, -- return option
flags
);
if pos_match_begin > 0 then
ret := matched();
ret.extend(match#);
else
return null;
end if;
pos_match_end := regexp_instr (
str, --
pat,
1, -- start position
1, -- occurance
1, -- return option
flags
);
str_used := substr(str, pos_match_begin, pos_match_end - pos_match_begin);
for i in 1 .. match# loop
ret(i) := regexp_replace(str_used, '(' || pat || ')', '\' || (i+1) , 1/*pos*/, 1, flags);
end loop;
return ret;
end match;
end re;
/
I want everyone on my database to profit from this package, so I grant execute on it to
public and give a public synonym
for the package:
grant execute on re to public; create public synonym re for re; First example
The first example tries to find three numbers. The regular expression contains three paranthesis (one for each number I want to find).
Each paranthesis contains a
\d+ which is a short hand form meaning one or more digits. \D+ is a short hand
form meaning any letter except digits. I don't want to find non-digits, so these are not found within paranthesis.
declare
matched re.matched;
begin
matched := re.match('foo 123 bar 456 baz 789 qax',
'(\d+)' ||
'\D+' ||
'(\d+)' ||
'\D+' ||
'(\d+)' ,
3);
for i in 1 .. matched.count loop
dbms_output.put_line(matched(i));
end loop;
end;
/
123 456 789 Second example
The second example tries to match dog or cat in a case insensitive manner. Therefore, I pass an
i (meaning case insensitive)
as forth parameter to match. The pipe symbol in the regular expression must be read as or. Hence, the function returns dog or cat, but
preserves its original case:
declare
matched re.matched;
regexp varchar2(100);
begin
regexp := '(dog|cat)';
matched := re.match('FroG DoG mOusE',
regexp,
1,
'i');
dbms_output.put_line(matched(1));
matched := re.match('A cAtAstrophal caT',
regexp,
1,
'i');
dbms_output.put_line(matched(1));
end;
/
DoG cAt Third example
The third example tries to determine if there is at least an upper case letter in
str. The [[:upper:]]+
matches one or more consecutive upper case letters. If there is none, match will return null.
declare
procedure find_upperace(str in varchar2) is
matched re.matched;
begin
matched := re.match(str, '([[:upper:]]+)', 1);
if matched is null then
dbms_output.put_line('no upper case letters in: ' || str);
else
dbms_output.put_line('Found upper case letters: ' || matched(1));
end if;
end find_upperace;
begin
find_upperace('one two three');
find_upperace('fiveSIXseven' );
end;
/
no upper case letters in: one two three Found upper case letters: SIX Limitations
Unfortunately, at most eight paranthesis pairs can be passed to
match. This is because Oracle's regular expressions
can have at most 9 so called back references, and the package needs one pair for internal use.
More on OracleThis is an on Oracle article. The most current articles of this series can be found here.
|