René Nyffenegger's collection of things on the web
René Nyffenegger on Oracle - Most wanted - Feedback -
 

December 22, 2005: On making Oracle's regular expressions a bit easier to use

I like programming Perl a lot. When I first used Perl some ten years ago, I immediatly fell in love with this language because it was so easy to use regular expressions (I don't know much about today's trendy languages like Ruby or Python, but I guess, these languages make it equally simple to use regular expressions).
Now, Oracle 10g comes with four regular expressions supporting sql functions: regexp_instr, regexp_substr, regexp_replace and regexp_like. Yet, using them is not nearly as simple as using Perl's regular expressions. Specifically, I am missing an easy way to get the values matched by paranthesis. So, I had to write a package to do that for me.
Here's the package's specification:
create package re as
 
  type matched is table of varchar2(100);

  function match (
    str in varchar2,
    pat in varchar2,
    match# in number,
    flags in varchar2 := null) return matched;

end re;
/
In order to use the package, the function match must be called. str is the string on which the regular expression pat is applied. match# specifies how many parantheses pairs are in pat. Finally, flags corresponds to the parameters i, c, n, m and x.
When the function is called, it returns a table of varchar2 containing match# varchar2s. Each varchar2 found in the table corresponds to a paranthesis pair given in pat. If it couldn't match the string, it returns null.
Here's the package's body:
create package body re as

  function match (
     str    in varchar2, 
     pat    in varchar2, 
     match# in number, 
     flags in varchar2 := null) return matched is 

    ret matched;

    pos_match_begin  number;
    pos_match_end    number;
    str_used         varchar2(4000);

  begin

  if match# > 8 then
    raise_application_error(-20000, 'at most 8 matches!');
  end if;
    
    pos_match_begin := regexp_instr (
      str,   -- 
      pat, 
      1,     -- start position
      1,     -- occurance
      0,     -- return option
      flags
    );

    if pos_match_begin > 0 then 
      ret := matched();
      ret.extend(match#);
    else 
      return null;
    end if;

    pos_match_end   := regexp_instr (
      str,   -- 
      pat, 
      1,     -- start position
      1,     -- occurance
      1,     -- return option
      flags
    );

    str_used := substr(str, pos_match_begin, pos_match_end - pos_match_begin);

    for i in 1 .. match# loop
      ret(i) := regexp_replace(str_used, '(' || pat || ')', '\' || (i+1) , 1/*pos*/, 1, flags);
    end loop;

    return ret;

  end match;

end re;
/
I want everyone on my database to profit from this package, so I grant execute on it to public and give a public synonym for the package:
grant execute on re to public;
create public synonym re for re;

First example

The first example tries to find three numbers. The regular expression contains three paranthesis (one for each number I want to find). Each paranthesis contains a \d+ which is a short hand form meaning one or more digits. \D+ is a short hand form meaning any letter except digits. I don't want to find non-digits, so these are not found within paranthesis.
declare
  matched re.matched;
begin

  matched := re.match('foo 123 bar 456 baz 789 qax',
                      '(\d+)'    || 
                      '\D+'      ||
                      '(\d+)'    || 
                      '\D+'      ||
                      '(\d+)'    ,
                      3);

  for i in 1 .. matched.count loop
    dbms_output.put_line(matched(i));
  end loop;

end;
/
123
456
789

Second example

The second example tries to match dog or cat in a case insensitive manner. Therefore, I pass an i (meaning case insensitive) as forth parameter to match. The pipe symbol in the regular expression must be read as or. Hence, the function returns dog or cat, but preserves its original case:
declare
  matched re.matched;
  regexp varchar2(100);
begin

  regexp := '(dog|cat)';

  matched := re.match('FroG DoG mOusE',
                       regexp,
                       1,
                       'i');

  dbms_output.put_line(matched(1));

  matched := re.match('A cAtAstrophal caT',
                       regexp,
                       1,
                       'i');

  dbms_output.put_line(matched(1));

end;
/
DoG
cAt

Third example

The third example tries to determine if there is at least an upper case letter in str. The [[:upper:]]+ matches one or more consecutive upper case letters. If there is none, match will return null.
declare

  procedure find_upperace(str in varchar2) is 
    matched re.matched;
  begin

    matched := re.match(str, '([[:upper:]]+)', 1);

    if matched is null then
      dbms_output.put_line('no upper case letters in: ' || str);
    else
      dbms_output.put_line('Found upper case letters: ' || matched(1));
    end if;

  end find_upperace; 

begin

  find_upperace('one two three');
  find_upperace('fiveSIXseven' );

end;
/
no upper case letters in: one two three
Found upper case letters: SIX

Limitations

Unfortunately, at most eight paranthesis pairs can be passed to match. This is because Oracle's regular expressions can have at most 9 so called back references, and the package needs one pair for internal use.

More on Oracle

This is an on Oracle article. The most current articles of this series can be found here.