Erlang's regular expression module: re

The re module provides regular expression functions that operate on strings (lists) or binaries.

The examples below introduce the re module and assume basic familiarity with regular expression syntax.

1. Running a regular expression match. The function prototypes below are excerpted from the Erlang documentation.

run(Subject,RE)->{match,Captured}|nomatch
Types:
Subject=iodata()|unicode:charlist()
RE=mp()|iodata()
Captured=[CaptureData]
CaptureData={integer(),integer()}
The same as run(Subject,RE,[]).
run(Subject,RE,Options)->
{match,Captured}|match|nomatch|{error,ErrType}
Types:
Subject=iodata()|unicode:charlist()
RE=mp()|iodata()|unicode:charlist()
Options=[Option]
Option=anchored
|global
|notbol
|noteol
|notempty
|notempty_atstart
|report_errors
|{offset,integer()>=0}
|{match_limit,integer()>=0}
|{match_limit_recursion,integer()>=0}
|{newline,NLSpec::nl_spec()}
|bsr_anycrlf
|bsr_unicode
|{capture,ValueSpec}
|{capture,ValueSpec,Type}
|CompileOpt
Type=index|list|binary
ValueSpec=all
|all_but_first
|all_names
|first
|none
|ValueList
ValueList=[ValueID]
ValueID=integer()|string()|atom()
CompileOpt=compile_option()
See compile/2 in the re documentation.
Captured=[CaptureData]|[[CaptureData]]
CaptureData={integer(),integer()}
|ListConversionData
|binary()
ListConversionData=string()
|{error,string(),binary()}
|{incomplete,binary()}
ErrType=match_limit
|match_limit_recursion
|{compile,CompileErr}
CompileErr=
{ErrString::string(),Position::integer()>=0}

RE can take one of three forms: a compiled regular expression (an opaque data type, a tuple tagged re_pattern, obtained from re:compile); a list (string); or a binary.
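As a minimal sketch of the three forms, the module below runs one pattern as a compiled value, as a list, and as a binary. The module name re_forms_demo and the function run_all/0 are made up for illustration; only re:compile/1 and re:run/2 come from the re module.

```erlang
-module(re_forms_demo).
-export([run_all/0]).

%% Runs the same pattern in each of its three accepted forms.
run_all() ->
    Subject = "<HTML><body>helloworld</body></html>",
    %% 1. Compiled pattern: the opaque value returned by re:compile/1.
    {ok, MP} = re:compile("<.*?>"),
    R1 = re:run(Subject, MP),
    %% 2. Pattern given as a list (string).
    R2 = re:run(Subject, "<.*?>"),
    %% 3. Pattern given as a binary.
    R3 = re:run(Subject, <<"<.*?>">>),
    [R1, R2, R3].
```

All three calls should give the same result. Compiling first is mainly worthwhile when the same pattern is reused many times.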

Example 1: running in the shell

1>{ok,MP}=re:compile("<.*?>",[caseless]).
{ok,{re_pattern,<<69,82,67,80,53,1,7,60,62,...>>}}
2>re:run("<HTML><body>helloworld</body></html>",MP,[]).
{match,[{0,6}]}

The result {match,[{0,6}]} means the match starts at offset 0 (the first position) and is 6 bytes long ("<HTML>").
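To make the offset and length pair concrete, here is a small sketch that turns it back into the matched text. It assumes an ASCII subject, so that byte offsets and character positions coincide. The module name re_index_demo and the function matched_text/0 are hypothetical.

```erlang
-module(re_index_demo).
-export([matched_text/0]).

%% Turns the {Offset, Length} pair returned by re:run/2 back into text.
matched_text() ->
    Subject = "<HTML><body>helloworld</body></html>",
    {match, [{Offset, Length}]} = re:run(Subject, "<.*?>"),
    %% Offsets are zero-based, while lists:sublist/3 counts from 1.
    FromList = lists:sublist(Subject, Offset + 1, Length),
    %% binary:part/2 accepts the zero-based {Offset, Length} pair directly.
    FromBinary = binary:part(list_to_binary(Subject), {Offset, Length}),
    {FromList, FromBinary}.
```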

Example 2: you can also skip compilation and pass a string (list) directly as the regular expression

7>re:run("<HTML><body>helloworld</body></html>","<.*?>",[{capture,first,list}]).
{match,["<HTML>"]}

In this example the result is the matched string rather than its position and length. This is selected with the {capture,...,list} option.
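The third element of the capture option corresponds to Type=index|list|binary in the prototype. The sketch below runs the same match with each type for comparison. The module name re_capture_demo and the function capture_types/0 are illustrative only.

```erlang
-module(re_capture_demo).
-export([capture_types/0]).

%% Runs the same match once per capture Type and returns all three results.
capture_types() ->
    Subject = "<HTML><body>helloworld</body></html>",
    RE = "<.*?>",
    %% index: {Offset, Length} pairs (also the default when no Type is given).
    {match, Index} = re:run(Subject, RE, [{capture, first, index}]),
    %% list: the matched text as a string.
    {match, List} = re:run(Subject, RE, [{capture, first, list}]),
    %% binary: the matched text as a binary.
    {match, Bin} = re:run(Subject, RE, [{capture, first, binary}]),
    {Index, List, Bin}.
```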

Example 3: return all matching strings

8>re:run("<HTML><body>helloworld</body></html>","<.*?>",[{capture,first,list},global]).
{match,[["<HTML>"],["<body>"],["</body>"],["</html>"]]}

Note that with global, the matched strings are wrapped in one extra level of list nesting ([]): one inner list per match.
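When only the matched strings are needed, the extra nesting can be removed with a list comprehension. This is a sketch under the assumption that exactly one value is captured per match (capture set to first). The module name re_global_demo and the function all_tags/0 are hypothetical.

```erlang
-module(re_global_demo).
-export([all_tags/0]).

%% Collects every match into a flat list of strings.
all_tags() ->
    Subject = "<HTML><body>helloworld</body></html>",
    case re:run(Subject, "<.*?>", [global, {capture, first, list}]) of
        %% Each element of Matches is a one-element list, e.g. ["<HTML>"].
        {match, Matches} -> [Tag || [Tag] <- Matches];
        nomatch -> []
    end.
```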

Example 4: greedy versus non-greedy matching

9>re:run("<HTML><body>helloworld</body></html>","<.*>",[{capture,first,list},global]).
{match,[["<HTML><body>helloworld</body></html>"]]}
10>re:run("<HTML><body>helloworld</body></html>","<.*>",[{capture,first,list},global,ungreedy]).
{match,[["<HTML>"],["<body>"],["</body>"],["</html>"]]}

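In greedy mode a quantifier matches as much text as it can while still allowing the overall match to succeed. Non-greedy mode does the opposite and matches as little as possible.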
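Non-greedy behaviour can be requested either per quantifier (the ? in "<.*?>") or for the whole pattern with the ungreedy option. The sketch below compares the first match in each case. The module name re_greedy_demo and the function compare/0 are made up for illustration.

```erlang
-module(re_greedy_demo).
-export([compare/0]).

%% Returns the first match for greedy, lazy-quantifier and ungreedy-option runs.
compare() ->
    Subject = "<HTML><body>helloworld</body></html>",
    Opts = [{capture, first, list}],
    %% Greedy: .* runs to the last '>' in the subject.
    {match, [Greedy]} = re:run(Subject, "<.*>", Opts),
    %% Lazy quantifier: .*? stops at the first '>'.
    {match, [Lazy]} = re:run(Subject, "<.*?>", Opts),
    %% The ungreedy option flips the default greediness of all quantifiers.
    {match, [Ungreedy]} = re:run(Subject, "<.*>", [ungreedy | Opts]),
    {Greedy, Lazy, Ungreedy}.
```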

Example 5: capture groups

11>Data="io:format(\"hello~p~n\",[world])".
"io:format(\"hello~p~n\",[world])"
12>
12>RE="(.*):(.*)\\s*\\((.*)\\s*\\)\\s*$".
"(.*):(.*)\\s*\\((.*)\\s*\\)\\s*$"
13>
13>{match,[M,F,A]}=re:run(Data,RE,[{capture,[1,2,3],list}]).
{match,["io","format","\"hello~p~n\",[world]"]}
14>
14>M.
"io"
15>F.
"format"
16>A.
"\"hello~p~n\",[world]"

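Regular expressions use parentheses to mark groups. Groups are numbered from left to right by their opening parenthesis: 1, 2, 3, and so on.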
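The following sketch shows how ValueSpec selects groups: all includes the whole match as group 0, all_but_first drops it, and a ValueList picks groups by number or by name. The named groups mod and func, the module name re_groups_demo and the function groups/0 are introduced here only for illustration.

```erlang
-module(re_groups_demo).
-export([groups/0]).

%% Selects capture groups by position and by name.
groups() ->
    Subject = "io:format",
    %% (?<name>...) gives a group a name in addition to its number.
    RE = "(?<mod>.*):(?<func>.*)",
    %% all: the whole match first, then groups 1 and 2.
    {match, All} = re:run(Subject, RE, [{capture, all, list}]),
    %% all_but_first: only the groups, without the whole match.
    {match, Groups} = re:run(Subject, RE, [{capture, all_but_first, list}]),
    %% A ValueList returns groups in the order requested.
    {match, Swapped} = re:run(Subject, RE, [{capture, [2, 1], list}]),
    %% Atoms in a ValueList refer to named groups.
    {match, Named} = re:run(Subject, RE, [{capture, [func, mod], list}]),
    {All, Groups, Swapped, Named}.
```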

2. String replacement

The prototype of replace is as follows:

replace(Subject,RE,Replacement)->iodata()|unicode:charlist()
Types:
Subject=iodata()|unicode:charlist()
RE=mp()|iodata()
Replacement=iodata()|unicode:charlist()
The same as replace(Subject,RE,Replacement,[]).
replace(Subject,RE,Replacement,Options)->
iodata()|unicode:charlist()
Types:
Subject=iodata()|unicode:charlist()
RE=mp()|iodata()|unicode:charlist()
Replacement=iodata()|unicode:charlist()
Options=[Option]
Option=anchored
|global
|notbol
|noteol
|notempty
|notempty_atstart
|{offset,integer()>=0}
|{newline,NLSpec}
|bsr_anycrlf
|{match_limit,integer()>=0}
|bsr_unicode
|{return,ReturnType}
|CompileOpt
ReturnType=iodata|list|binary
CompileOpt=compile_option()
NLSpec=cr|crlf|lf|anycrlf|any
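The ReturnType in the prototype above decides the shape of the result. Without a return option the result is iodata, which may be a nested structure rather than a flat string. A minimal sketch comparing the three cases, with a hypothetical module name re_replace_demo and function return_types/0:

```erlang
-module(re_replace_demo).
-export([return_types/0]).

%% Shows the effect of the {return, ReturnType} option of re:replace.
return_types() ->
    %% Default return type is iodata; flatten it before comparing or printing.
    Default = iolist_to_binary(re:replace("hello", "l", "k")),
    %% {return, list}: a flat string; only the first match is replaced.
    AsList = re:replace("hello", "l", "k", [{return, list}]),
    %% {return, binary} combined with global: every match is replaced.
    AsBinary = re:replace("hello", "l", "k", [global, {return, binary}]),
    {Default, AsList, AsBinary}.
```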

Example 6: replace the first matching string

17>re:replace("hello","l","k",[{return,list}]).
"heklo"

Example 7: replace all matching strings

18>re:replace("hello","l","k",[{return,list},global]).
"hekko"

3. String splitting

The prototype of split is as follows:

split(Subject,RE)->SplitList

Types:
Subject=iodata()|unicode:charlist()
RE=mp()|iodata()
SplitList=[iodata()|unicode:charlist()]

The same as split(Subject,RE,[]).

split(Subject,RE,Options)->SplitList

Types:
Subject=iodata()|unicode:charlist()
RE=mp()|iodata()|unicode:charlist()
Options=[Option]
Option=anchored
|notbol
|noteol
|notempty
|notempty_atstart
|{offset,integer()>=0}
|{newline,nl_spec()}
|{match_limit,integer()>=0}
|bsr_anycrlf
|bsr_unicode
|{return,ReturnType}
|{parts,NumParts}
|group
|trim
|CompileOpt
NumParts=integer()>=0|infinity
ReturnType=iodata|list|binary
CompileOpt=compile_option()
See compile/2 in the re documentation.
SplitList=[RetData]|[GroupedRetData]
GroupedRetData=[RetData]
RetData=iodata()|unicode:charlist()|binary()|list()

Example 8: split on a comma

19>re:split("hello,world",",",[{return,list}]).
["hello","world"]
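A few of the other split options from the prototype can be sketched as follows: the default return type (binaries), {parts,NumParts} to limit the number of pieces, and trim to drop trailing empty pieces. The module name re_split_demo and the function splits/0 are illustrative, and the sample subjects are made up.

```erlang
-module(re_split_demo).
-export([splits/0]).

%% Demonstrates the default return type and the parts and trim options.
splits() ->
    %% Without {return, list} each piece is a binary.
    Default = re:split("hello,world", ","),
    %% {parts, 2}: stop after two pieces; the rest stays unsplit.
    Limited = re:split("a,b,c", ",", [{parts, 2}, {return, list}]),
    %% A trailing separator yields a trailing empty string ([]).
    Untrimmed = re:split("a,b,", ",", [{return, list}]),
    %% trim removes empty pieces at the end of the result.
    Trimmed = re:split("a,b,", ",", [trim, {return, list}]),
    {Default, Limited, Untrimmed, Trimmed}.
```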
