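The re module provides functions for working with strings and binaries.
The examples below all use regular expressions.
1. Running a regular expression match. The function prototypes below are excerpted from the Erlang documentation.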
    run(Subject, RE) -> {match, Captured} | nomatch

    Types:
        Subject = iodata() | unicode:charlist()
        RE = mp() | iodata()
        Captured = [CaptureData]
        CaptureData = {integer(), integer()}

    The same as run(Subject, RE, []).

    run(Subject, RE, Options) ->
        {match, Captured} | match | nomatch | {error, ErrType}

    Types:
        Subject = iodata() | unicode:charlist()
        RE = mp() | iodata() | unicode:charlist()
        Options = [Option]
        Option = anchored
               | global
               | notbol
               | noteol
               | notempty
               | notempty_atstart
               | report_errors
               | {offset, integer() >= 0}
               | {match_limit, integer() >= 0}
               | {match_limit_recursion, integer() >= 0}
               | {newline, NLSpec :: nl_spec()}
               | bsr_anycrlf
               | bsr_unicode
               | {capture, ValueSpec}
               | {capture, ValueSpec, Type}
               | CompileOpt
        Type = index | list | binary
        ValueSpec = all | all_but_first | all_names | first | none | ValueList
        ValueList = [ValueID]
        ValueID = integer() | string() | atom()
        CompileOpt = compile_option()
            See compile/2.
        Captured = [CaptureData] | [[CaptureData]]
        CaptureData = {integer(), integer()}
                    | ListConversionData
                    | binary()
        ListConversionData = string()
                           | {error, string(), binary()}
                           | {incomplete, binary()}
        ErrType = match_limit
                | match_limit_recursion
                | {compile, CompileErr}
        CompileErr = {ErrString :: string(), Position :: integer() >= 0}
RE can take three forms: a compiled regular expression (an opaque data type, represented internally as a tuple tagged re_pattern and obtained from re:compile), a list (string), or a binary.
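As a quick check of the three forms, the sketch below passes the same pattern to re:run/2 as a compiled value, as a string, and as a binary. The variable names are illustrative only; the expressions can be pasted into the Erlang shell.

```erlang
% Subject string shared by the three calls.
Subject = "<HTML><body>helloworld</body></html>".

% 1. A compiled pattern returned by re:compile/1.
{ok, Compiled} = re:compile("<.*?>").
R1 = re:run(Subject, Compiled).

% 2. The pattern as a string (list).
R2 = re:run(Subject, "<.*?>").

% 3. The pattern as a binary.
R3 = re:run(Subject, <<"<.*?>">>).

% All three calls report the same match: offset 0, length 6.
{match, [{0, 6}]} = R1.
R1 = R2.
R2 = R3.
```

Compiling once is mainly worthwhile when the same pattern is applied to many subjects, because a string or binary pattern is compiled again on every call.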
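Example 1: running in the shell

    1> {ok,MP} = re:compile("<.*?>", [caseless]).
    {ok,{re_pattern,<<69,82,67,80,53,1,7,60,62,...>>}}
    2> re:run("<HTML><body>helloworld</body></html>", MP, []).
    {match,[{0,6}]}

The result {match,[{0,6}]} means that the match starts at offset 0 (the first position of the subject) and is 6 bytes long ("<HTML>").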
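To turn an {Offset, Length} pair back into text, the offset has to be adjusted for the function used. The sketch below assumes an ASCII subject, where byte offsets and character positions coincide; the variable names are illustrative.

```erlang
Text = "<HTML><body>helloworld</body></html>".
{match, [{Start, Len}]} = re:run(Text, "<.*?>").

% lists:sublist/3 counts from 1, while re offsets count from 0.
Matched = lists:sublist(Text, Start + 1, Len).

% binary:part/3 takes the 0-based offset directly.
MatchedBin = binary:part(list_to_binary(Text), Start, Len).
```

Here Matched is the string "<HTML>" and MatchedBin is the binary <<"<HTML>">>. In practice it is usually simpler to request the text directly with a capture option.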
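Example 2: the regular expression does not have to be compiled first; a string (list) can be passed directly

    7> re:run("<HTML><body>helloworld</body></html>", "<.*?>", [{capture,first,list}]).
    {match,["<HTML>"]}

In this example the result is the matched string rather than its position and length. This behavior is selected with the {capture,...,list} option.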
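Example 3: returning all matching strings

    8> re:run("<HTML><body>helloworld</body></html>", "<.*?>", [{capture,first,list},global]).
    {match,[["<HTML>"],["<body>"],["</body>"],["</html>"]]}

Note that the matched strings come back with one extra level of list nesting: with global, each match gets its own list of captured values.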
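The extra nesting becomes useful once the pattern contains groups, because each inner list then holds the captures of one match. A minimal sketch, using a made-up input string and pattern:

```erlang
% Hypothetical input: comma-separated key=value pairs.
Input = "a=1,b=2".

% all_but_first drops the whole match and keeps only the two groups.
{match, Pairs} = re:run(Input, "(\\w)=(\\d)",
                        [global, {capture, all_but_first, list}]).
% Pairs is [["a","1"],["b","2"]]: one inner list per match.

% Each inner list can be pattern-matched into a tuple.
Tuples = [{K, V} || [K, V] <- Pairs].
```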
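Example 4: comparing greedy and non-greedy mode

    9> re:run("<HTML><body>helloworld</body></html>", "<.*>", [{capture,first,list},global]).
    {match,[["<HTML><body>helloworld</body></html>"]]}
    10> re:run("<HTML><body>helloworld</body></html>", "<.*>", [{capture,first,list},global,ungreedy]).
    {match,[["<HTML>"],["<body>"],["</body>"],["</html>"]]}

In greedy mode a quantifier matches as much text as possible while still allowing the overall pattern to match. Non-greedy mode does the opposite and matches as little as possible.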
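Example 5: matching with groups

    11> Data = "io:format(\"hello~p~n\",[world])".
    "io:format(\"hello~p~n\",[world])"
    12> RE = "(.*):(.*)\s*\\((.*)\s*\\)\s*$".
    "(.*):(.*) *\\((.*) *\\) *$"
    13> {match,[M,F,A]} = re:run(Data, RE, [{capture,[1,2,3],list}]).
    {match,["io","format","\"hello~p~n\",[world]"]}
    14> M.
    "io"
    15> F.
    "format"
    16> A.
    "\"hello~p~n\",[world]"

Inside an Erlang string literal, \s is the escape sequence for a space character, which is why the shell echoes the pattern with literal spaces. To pass the regex whitespace class to re, write \\s instead.
Regular expressions use parentheses to denote groups. Groups are numbered from left to right, by opening parenthesis, as 1, 2, 3, and so on.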
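Groups can also be given names with the (?<name>...) syntax and requested by name in the capture list, which avoids counting parentheses. The sketch below is a variation on the same input; the group names mod, func and args and the stricter \\w+ pattern are choices made here for illustration.

```erlang
Call = "io:format(\"hello~p~n\",[world])".

% Named groups; backslashes are doubled because this is an Erlang string.
NamedRE = "(?<mod>\\w+):(?<func>\\w+)\\((?<args>.*)\\)$".

% Group names are passed as atoms in the ValueList.
{match, [Mod, Func, Args]} =
    re:run(Call, NamedRE, [{capture, [mod, func, args], list}]).

% Index 0 refers to the whole match rather than to a group.
{match, [Whole]} = re:run(Call, NamedRE, [{capture, [0], list}]).
```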
2. String replacement
The prototype of replace is as follows:

    replace(Subject, RE, Replacement) -> iodata() | unicode:charlist()

    Types:
        Subject = iodata() | unicode:charlist()
        RE = mp() | iodata()
        Replacement = iodata() | unicode:charlist()

    The same as replace(Subject, RE, Replacement, []).

    replace(Subject, RE, Replacement, Options) ->
        iodata() | unicode:charlist()

    Types:
        Subject = iodata() | unicode:charlist()
        RE = mp() | iodata() | unicode:charlist()
        Replacement = iodata() | unicode:charlist()
        Options = [Option]
        Option = anchored
               | global
               | notbol
               | noteol
               | notempty
               | notempty_atstart
               | {offset, integer() >= 0}
               | {newline, NLSpec}
               | bsr_anycrlf
               | {match_limit, integer() >= 0}
               | {match_limit_recursion, integer() >= 0}
               | bsr_unicode
               | {return, ReturnType}
               | CompileOpt
        ReturnType = iodata | list | binary
        CompileOpt = compile_option()
        NLSpec = cr | crlf | lf | anycrlf | any
Example 6: replacing the first match

    17> re:replace("hello", "l", "k", [{return,list}]).
    "heklo"

Example 7: replacing all matches

    18> re:replace("hello", "l", "k", [{return,list},global]).
    "hekko"
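The replacement string may also refer back to what was matched: & stands for the whole match and \N (written "\\N" in an Erlang string) for group N. The sketch below uses made-up inputs to show both, and also shows how to normalize the default iodata return value.

```erlang
% Swap the two sides of a key=value pair using group references.
Swapped = re:replace("key=value", "(\\w+)=(\\w+)", "\\2=\\1",
                     [{return, list}]).

% Wrap the first match in brackets; & inserts the whole match.
Wrapped = re:replace("abcd", "c", "[&]", [{return, list}]).

% Without {return, ...} the result is iodata; iolist_to_binary/1
% flattens it into a single binary.
Bin = iolist_to_binary(re:replace("hello", "l", "k", [global])).
```

Swapped is "value=key", Wrapped is "ab[c]d", and Bin is <<"hekko">>.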
3. String splitting
The prototype of split is as follows:

    split(Subject, RE) -> SplitList

    Types:
        Subject = iodata() | unicode:charlist()
        RE = mp() | iodata()
        SplitList = [iodata() | unicode:charlist()]

    The same as split(Subject, RE, []).

    split(Subject, RE, Options) -> SplitList

    Types:
        Subject = iodata() | unicode:charlist()
        RE = mp() | iodata() | unicode:charlist()
        Options = [Option]
        Option = anchored
               | notbol
               | noteol
               | notempty
               | notempty_atstart
               | {offset, integer() >= 0}
               | {newline, nl_spec()}
               | {match_limit, integer() >= 0}
               | {match_limit_recursion, integer() >= 0}
               | bsr_anycrlf
               | bsr_unicode
               | {return, ReturnType}
               | {parts, NumParts}
               | group
               | trim
               | CompileOpt
        NumParts = integer() >= 0 | infinity
        ReturnType = iodata | list | binary
        CompileOpt = compile_option()
            See compile/2.
        SplitList = [RetData] | [GroupedRetData]
        GroupedRetData = [RetData]
        RetData = iodata() | unicode:charlist() | binary() | list()
Example:

    19> re:split("hello,world", ",", [{return,list}]).
    ["hello","world"]
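The options in the prototype change how the pieces are returned. A short sketch with a made-up input that ends in two empty fields:

```erlang
% Without {return, list} the pieces come back as binaries.
Default = re:split("hello,world", ",").

% Trailing separators produce empty strings at the end of the list.
WithEmpty = re:split("a,b,c,,", ",", [{return, list}]).

% trim removes the empty strings at the end of the result.
Trimmed = re:split("a,b,c,,", ",", [{return, list}, trim]).

% {parts, N} stops splitting once N pieces have been produced.
TwoParts = re:split("a,b,c,,", ",", [{return, list}, {parts, 2}]).
```

Default is [<<"hello">>,<<"world">>], WithEmpty is ["a","b","c",[],[]], Trimmed is ["a","b","c"], and TwoParts is ["a","b,c,,"].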