(轉)正則表達式在ORACLE中的使用

Oracle使用正則表達式離不開這4個函數： 1。regexp_like 2。regexp_substr 3。regexp_instr 4。regexp_replace 看函數名稱大概就能猜到有什麼用了。 regexp_like 只能用於條件表達式，和 like 類似，但是使用的正則表達式進行匹配， ...

Oracle使用正則表達式離不開這4個函數：

1。regexp_like

2。regexp_substr

3。regexp_instr

4。regexp_replace

看函數名稱大概就能猜到有什麼用了。

regexp_like 只能用於條件表達式，和 like 類似，但是使用的正則表達式進行匹配，語法很簡單：

regexp_substr 函數，和 substr 類似，用於拾取合符正則表達式描述的字元子串，語法如下：

regexp_instr 函數，和 instr 類似，用於標定符合正則表達式的字元子串的開始位置，語法如下：

regexp_replace 函數，和 replace 類似，用於替換符合正則表達式的字元串，語法如下：

這裡解析一下幾個參數的含義：

1。source_char，輸入的字元串，可以是列名或者字元串常量、變數。

2。pattern，正則表達式。

3。match_parameter，匹配選項。

取值範圍： i：大小寫不敏感； c：大小寫敏感；n：點號 . 不匹配換行符號；m：多行模式；x：擴展模式，忽略正則表達式中的空白字元。

4。position，標識從第幾個字元開始正則表達式匹配。

5。occurrence，標識第幾個匹配組。

6。replace_string，替換的字元串。

說了一堆文縐縐的，現在開始實例演練了，在此之前先建好一個表。

create table tmp as

with data as (

select 'like' as id ,'a9999' as str from dual union all

select 'like' ,'a9c' from dual union all

select 'like' ,'A7007' from dual union all

select 'like' ,'123a34cc' from dual union all

select 'substr' ,'123,234,345' from dual union all

select 'substr' ,'12,34.56:78' from dual union all

select 'substr' ,'123456789' from dual union all

select 'instr' ,'192.168.0.1' from dual union all

select 'replace' ,'(020)12345678' from dual union all

select 'replace' ,'001517729C28' from dual

)

select * from data ;

select * from tmp ;

ID STR

------- -------------

like a9999

like a9c

like A7007

like 123a34cc

substr 123,234,345

substr 12,34.56:78

substr 123456789

instr 192.168.0.1

replace (020)12345678

replace 001517729C28

regexp_like 例子：

select str from tmp where id='like' and regexp_like(str,'A\d+','i'); -- 'i' 忽略大小寫

STR

-------------

a9999

a9c

A7007

123a34cc

select str from tmp where id='like' and regexp_like(str, 'a\d+');

STR

-------------

a9999

a9c

123a34cc

select str from tmp where id='like' and regexp_like(str,'^a\d+');

STR

-------------

a9999

a9c

select str from tmp where id='like' and regexp_like(str,'^a\d+$');

STR

-------------

a9999

regexp_substr 例子：

col str format a15;

select

str,

regexp_substr(str,'[^,]+') str,

regexp_substr(str,'[^,]+',1,1) str,

regexp_substr(str,'[^,]+',1,2) str, -- occurrence 第幾個匹配組

regexp_substr(str,'[^,]+',2,1) str -- position 從第幾個字元開始匹配

from tmp

where id='substr';

STR STR STR STR STR

--------------- --------------- --------------- --------------- ---------------

123,234,345 123 123 234 23

12,34.56:78 12 12 34.56:78 2

123456789 123456789 123456789 23456789

select

str,

regexp_substr(str,'\d') str,

regexp_substr(str,'\d+' ,1,1) str,

regexp_substr(str,'\d{2}',1,2) str,

regexp_substr(str,'\d{3}',2,1) str

from tmp

where id='substr';

STR STR STR STR STR

--------------- --------------- --------------- --------------- ---------------

123,234,345 1 123 23 234

12,34.56:78 1 12 34

123456789 1 123456789 34 234

select regexp_substr('123456789','\d',1,level) str --取出每位數字，有時這也是行轉列的方式

from dual

connect by level<=9

STR

---------------

regex_instr 例子：

col ind format 9999;

select

str,

regexp_instr(str,'\.' ) ind ,

regexp_instr(str,'\.',1,2) ind ,

regexp_instr(str,'\.',5,2) ind

from tmp where id='instr';

STR IND IND IND

--------------- ----- ----- -----

192.168.0.1 4 8 10

select

regexp_instr('192.168.0.1','\.',1,level) ind , -- 點號. 所在的位置

regexp_instr('192.168.0.1','\d',1,level) ind -- 每個數字的位置

from dual

connect by level <= 9

IND IND

----- -----

4 1

8 2

10 3

0 5

0 6

0 7

0 9

0 11

0 0

regex_replace 例子：

select

str,

regexp_replace(str,'020','GZ') str,

regexp_replace(str,'(\d{3})(\d{3})','<\2\1>') str -- 將第一、第二捕獲組交換位置，用尖括弧標識出來

from tmp

where id='replace';

STR STR STR

--------------- --------------- ---------------

(020)12345678 (GZ)12345678 (020)<456123>78

001517729C28 001517729C28 <517001>729C28

綜合應用的例子：

col row_line format a30;

with sudoku as (

select '020000080568179234090000010030040050040205090070080040050000060289634175010000020' as line

from dual

tmp as (

select regexp_substr(line,'\d{9}',1,level) row_line,

level col

from sudoku

connect by level<=9

)

select regexp_replace( row_line ,'(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)','\1 \2 \3 \4 \5 \6 \7 \8 \9') row_line

from tmp

ROW_LINE

------------------------------

0 2 0 0 0 0 0 8 0

5 6 8 1 7 9 2 3 4

0 9 0 0 0 0 0 1 0

0 3 0 0 4 0 0 5 0

0 4 0 2 0 5 0 9 0

0 7 0 0 8 0 0 4 0

0 5 0 0 0 0 0 6 0

2 8 9 6 3 4 1 7 5

0 1 0 0 0 0 0 2 0

oracle的正則表達式(regular expression)簡單介紹
目前，正則表達式已經在很多軟體中得到廣泛的應用，包括*nix（Linux, Unix等），HP等操作系統，PHP，C#，Java等開發環境。
Oracle 10g正則表達式提高了SQL靈活性。有效的解決了數據有效性，重覆詞的辨認, 無關的空白檢測，或者分解多個正則組成的字元串等問題。
Oracle 10g支持正則表達式的四個新函數分別是：REGEXP_LIKE、REGEXP_INSTR、REGEXP_SUBSTR、和REGEXP_REPLACE。
它們使用POSIX 正則表達式代替了老的百分號（%）和通配符（_）字元。
特殊字元：
'^' 匹配輸入字元串的開始位置，在方括弧表達式中使用，此時它表示不接受該字元集合。
'$' 匹配輸入字元串的結尾位置。如果設置了 RegExp 對象的 Multiline 屬性，則 $ 也匹配 '\n' 或'\r'。
'.' 匹配除換行符 \n之外的任何單字元。
'?' 匹配前面的子表達式零次或一次。
'*' 匹配前面的子表達式零次或多次。
'+' 匹配前面的子表達式一次或多次。
'( )' 標記一個子表達式的開始和結束位置。
'[]' 標記一個中括弧表達式。
'{m,n}' 一個精確地出現次數範圍，m=<出現次數<=n，'{m}'表示出現m次，'{m,}'表示至少出現m次。
'|' 指明兩項之間的一個選擇。例子'^([a-z]+|[0-9]+)$'表示所有小寫字母或數字組合成的字元串。
\num 匹配 num，其中 num 是一個正整數。對所獲取的匹配的引用。
正則表達式的一個很有用的特點是可以保存子表達式以後使用，被稱為Backreferencing. 允許複雜的替換能力
如調整一個模式到新的位置或者指示被代替的字元或者單詞的位置. 被匹配的子表達式存儲在臨時緩衝區

中，緩衝區從左到右編號, 通過\數字元號訪問。下麵的例子列出了把名字 aa bb cc 變成cc, bb, aa.
Select REGEXP_REPLACE('aa bb cc','(.*) (.*) (.*)', '\3, \2, \1') FROM dual；
REGEXP_REPLACE('ELLENHILDISMIT
cc, bb, aa
'\' 轉義符。
字元簇：
[[:alpha:]] 任何字母。
[[:digit:]] 任何數字。
[[:alnum:]] 任何字母和數字。
[[:space:]] 任何白字元。
[[:upper:]] 任何大寫字母。
[[:lower:]] 任何小寫字母。
[[unct:]] 任何標點符號。
[[:xdigit:]] 任何16進位的數字，相當於[0-9a-fA-F]。
各種操作符的運算優先順序
\ 轉義符
(), (?, (?=), [] 圓括弧和方括弧
*, +, ?, {n}, {n,}, {n,m} 限定符
^, $, \anymetacharacter 位置和順序
| “或”操作