function ParseWord(const Source,Table:String;var Index:Integer):String;
Sequential, left-to-right token parsing using a table of single-character
delimiters. Delimiters within quoted strings are ignored.
Quote delimiters are not allowed in Table.
Index is a pointer (initialize to '1' for first word) updated by the
function to point to next word. To retrieve the next word, simply
call the function again using the prior returned Index value.
Note: If Length(Resultant) = 0, no additional words are available.
Delimiters within quoted strings are ignored. (my emphasis)
This is what I have so far:
function ParseWord(const Source, Table: String; var Index: Integer): string;
var
  RE: TRegEx;
  match: TMatch;
  Table2, chars: string;
begin
  if index = length(Source) then
  begin
    result := '';
    exit;
  end;
  // escape the special characters and wrap in a Group
  Table2 := '[' + TRegEx.Escape(Table, false) + ']';
  RE := TRegEx.create(Table2);
  match := RE.Match(Source, Index);
  if match.success then
  begin
    result := copy(Source, Index, match.Index - Index);
    Index := match.Index + match.Length;
  end
  else
  begin
    result := copy(Source, length(Source) - Index + 1);
    Index := length(Source);
  end;
end;

while (Length(result) = 0) and (Index < length(Source)) do
begin
  Inc(Index);
  result := ParseWord(Source, Table, Index);
end;
Cheers and thanks.
Solution
Table2 := '''[^'']+''|"[^"]+"|[^' + TRegEx.Escape(Table,false) + ']+';
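Demo:
This demo is more of a proof of concept, because I could not find an online Delphi regex tester.
>The delimiters are the space (ASCII code 32) and pipe (ASCII code 124) characters.
>The test sentence is: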
toto titi "alloa toutou" 'dfg erre' 1245|coucou "nestor|delphi" "" ''
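Since no Delphi tester was available for the demo, here is a minimal console sketch of how the pattern could be wired into ParseWord. It is not the original library's implementation. It assumes that Table is non-empty, that Index is 1-based, and that the quotes should stay in the returned token; none of this is confirmed by the documentation quoted in the question. The program name and the NextIndex variable are made up for illustration.

```delphi
program ParseWordDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

// Sketch only: returns the next token at or after Index (1-based) and
// advances Index past it. Returns '' when no token is left.
// Assumption: Table contains at least one delimiter character.
function ParseWord(const Source, Table: string; var Index: Integer): string;
var
  RE: TRegEx;
  M: TMatch;
begin
  Result := '';
  if (Index < 1) or (Index > Length(Source)) then
    Exit;

  // Alternatives are tried in order: single-quoted string, double-quoted
  // string, then a run of characters that are not delimiters.
  RE := TRegEx.Create('''[^'']+''|"[^"]+"|[^' + TRegEx.Escape(Table, False) + ']+');
  M := RE.Match(Source, Index);
  if M.Success then
  begin
    Result := M.Value;            // assumption: quotes are kept in the token
    Index := M.Index + M.Length;  // first position after the token
  end
  else
    Index := Length(Source) + 1;  // nothing left to parse
end;

var
  Sentence, Token: string;
  NextIndex: Integer;
begin
  // Delimiters: space and pipe, as in the demo above.
  Sentence := 'toto titi "alloa toutou" ''dfg erre'' 1245|coucou "nestor|delphi"';
  NextIndex := 1;
  Token := ParseWord(Sentence, ' |', NextIndex);
  while Token <> '' do
  begin
    Writeln(Token);
    Token := ParseWord(Sentence, ' |', NextIndex);
  end;
end.
```

Because the pattern matches tokens rather than delimiters, the function never returns an empty string between two adjacent delimiters, so the caller needs no retry loop. One limitation of this sketch: a quote is only recognized when a token starts with it, so a quote in the middle of a word does not protect the delimiters that follow it.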
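Discussion:
I assume a quoted string is a string enclosed by two single quotes (') or two double quotes ("). Correct me if I am wrong.
The regex will match:
>a single-quoted string
>a double-quoted string
>a string that contains none of the passed delimiters
Known bug:
Since I do not know how ParseWord handles quotes that are escaped inside a string, the regex does not support this.
For example:
>How should 'foo''bar' be interpreted? => Two tokens, 'foo' and 'bar', or one token, 'foo''bar'.
>What about this case: "foo""bar"? => Two tokens, "foo" and "bar", or one token, "foo""bar".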
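If the one-token reading turns out to be the right one, the pattern could be adapted along the following lines. This is only a sketch under that assumption (a doubled quote inside a quoted string is an escaped quote, as in Pascal string literals); the question does not say how the original ParseWord behaves here. BuildTokenPattern is a hypothetical helper name, not part of any library.

```delphi
program DoubledQuoteDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

// Hypothetical helper: builds a token pattern in which a doubled quote
// inside a quoted string counts as an escaped quote, not as a terminator.
// Assumption: Table contains at least one delimiter character.
function BuildTokenPattern(const Table: string): string;
begin
  Result :=
    '''(?:[^'']|'''')*''' +   // '...' where '' inside is an escaped quote
    '|"(?:[^"]|"")*"' +       // "..." where "" inside is an escaped quote
    '|[^' + TRegEx.Escape(Table, False) + ']+';  // run of non-delimiters
end;

var
  M: TMatch;
begin
  // Input text: 'foo''bar' "foo""bar" baz
  for M in TRegEx.Matches('''foo''''bar'' "foo""bar" baz',
                          BuildTokenPattern(' |')) do
    Writeln(M.Value);
end.
```

Note that this variant uses `*` instead of `+` inside the quotes, so an empty quoted string is also matched as a quoted token. Under the two-token reading, the original pattern can be kept unchanged.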