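I have a language that defines a string as being delimited by either single or double quotes, where a delimiter inside the string is escaped by doubling it. For example, all of the following are legal strings: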
'This isn''t easy to parse.'
'Then John said,"Hello Tim!"'
"This isn't easy to parse."
"Then John said,""Hello Tim!"""
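I have a collection of such strings (as defined above), separated by something that does not contain a quote. What I am trying to do, using regular expressions, is parse out each string in the list. For example, here is an input: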
"Some String #1" OR 'Some String #2' AND "Some 'String' #3" XOR
'Some "String" #4' HOWDY "Some ""String"" #5" FOO 'Some ''String'' #6'
The regular expression to determine whether an input is of this form is trivial:
^(?:"(?:[^"]|"")*"|'(?:[^']|'')*')(?:\s+[^"'\s]+\s+(?:"(?:[^"]|"")*"|'(?:[^']|'')*'))*$
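To make the form check concrete, here is a minimal Java sketch of it. The class name `QuotedListValidator` and the method `isValid` are made up for illustration and are not part of the question. It builds the same expression from two pieces and relies on `Matcher.matches()`, which only succeeds when the whole input matches, so no anchors are needed.

```java
import java.util.regex.Pattern;

// Hypothetical helper, named here only for illustration.
final class QuotedListValidator {
    // One quoted string: "..." with "" as an escape, or '...' with '' as an escape.
    private static final String QUOTED = "(?:\"(?:[^\"]|\"\")*\"|'(?:[^']|'')*')";
    // A separator: a run of non-quote, non-space characters surrounded by whitespace.
    private static final String SEPARATOR = "\\s+[^\"'\\s]+\\s+";
    // A quoted string followed by any number of (separator, quoted string) pairs.
    private static final Pattern LIST =
            Pattern.compile(QUOTED + "(?:" + SEPARATOR + QUOTED + ")*");

    static boolean isValid(String input) {
        // matches() must consume the entire input, so ^ and $ are implied.
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("\"Some String #1\" OR 'Some ''String'' #6'"));
        System.out.println(isValid("\"unterminated"));
    }
}
```

This only answers yes or no for the whole input; it does not hand back the individual strings, which is what the rest of the question is about.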
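After running the expression above to test whether the input is of that form, I need another regular expression to get each delimited string from the input. I plan to do it like this: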
Pattern pattern = Pattern.compile("What REGEX goes here?");
Matcher matcher = pattern.matcher(inputString);
int startIndex = 0;
while (matcher.find(startIndex)) {
    String quote = matcher.group(1);
    String quotedString = matcher.group(2);
    ...
    startIndex = matcher.end();
}
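I would like a regular expression that captures the quote character in group #1 and the text inside the quotes in group #2 (I am using Java regex). So, for the input above, I am looking for a regular expression that produces the following output on each loop iteration: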
Loop 1: matcher.group(1) = "   matcher.group(2) = Some String #1
Loop 2: matcher.group(1) = '   matcher.group(2) = Some String #2
Loop 3: matcher.group(1) = "   matcher.group(2) = Some 'String' #3
Loop 4: matcher.group(1) = '   matcher.group(2) = Some "String" #4
Loop 5: matcher.group(1) = "   matcher.group(2) = Some ""String"" #5
Loop 6: matcher.group(1) = '   matcher.group(2) = Some ''String'' #6
The patterns I have tried so far (each shown unescaped, followed by the version escaped for Java code):
(["'])((?:[^\1]|\1\1)*)\1
"([\"'])((?:[^\\1]|\\1\\1)*)\\1"

(?<quot>")(?<val>(?:[^"]|"")*)"|(?<quot>')(?<val>(?:[^']|'')*)'
"(?<quot>\")(?<val>(?:[^\"]|\"\")*)\"|(?<quot>')(?<val>(?:[^']|'')*)'"
Both of these fail when I try to compile the pattern.
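For anyone who wants to reproduce the two failures, the sketch below compiles both attempts and prints the reason reported by `PatternSyntaxException`. The class name `CompileFailureDemo` and the helper `compileError` are invented for this illustration. As far as I can tell, the first attempt is rejected because Java does not accept a back-reference such as `\1` inside a character class, and the second because Java does not allow the same group name to be defined twice in one pattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question.
final class CompileFailureDemo {
    // Returns the compiler's description of the problem, or null if the pattern compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Attempt 1: \1 is used inside the character class [^\1].
        System.out.println(compileError("([\"'])((?:[^\\1]|\\1\\1)*)\\1"));
        // Attempt 2: the group names quot and val are each defined twice.
        System.out.println(compileError(
                "(?<quot>\")(?<val>(?:[^\"]|\"\")*)\"|(?<quot>')(?<val>(?:[^']|'')*)'"));
    }
}
```

The exact wording of the messages may differ between Java versions, so the sketch prints them rather than assuming a particular text.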
Is such a regular expression possible?
Solution
Make a utility class that does the matching for you:
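import java.util.regex.Matcher;
import java.util.regex.Pattern;

class test {
    // Double-quoted string; "" inside it stands for a literal double quote.
    private static Pattern pd = Pattern.compile("(\")((?:[^\"]|\"\")*)\"");
    // Single-quoted string; '' inside it stands for a literal single quote.
    private static Pattern ps = Pattern.compile("(')((?:[^']|'')*)'");

    // In both patterns, group 1 is the quote and group 2 the text between the quotes.
    public static Matcher match(String s) {
        Matcher md = pd.matcher(s);
        if (md.matches())
            return md;
        else
            // Not yet matched: the caller still has to call matches() on this one.
            return ps.matcher(s);
    }
}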
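If a single pattern with the quote in group 1 and the text in group 2 is still wanted, one possible alternative (not the approach of the answer above) is to replace the character class with a negative lookahead, since `(?!\1).` is allowed where `[^\1]` is not. The sketch below assumes that the separators never contain a quote, as stated in the question. The names `QuotedStringExtractor` and `extract` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown as a sketch rather than a definitive implementation.
final class QuotedStringExtractor {
    // Group 1: the opening quote. Group 2: the text between the quotes.
    // (?!\1). consumes one character that is not the opening quote;
    // \1\1 consumes a doubled (escaped) quote. DOTALL lets . match newlines too.
    private static final Pattern QUOTED =
            Pattern.compile("([\"'])((?:(?!\\1).|\\1\\1)*)\\1", Pattern.DOTALL);

    // Returns one {quote, text} pair per quoted string found in the input.
    static List<String[]> extract(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        // find() resumes after the previous match, so no start index is tracked by hand.
        while (matcher.find()) {
            result.add(new String[] { matcher.group(1), matcher.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "\"Some String #1\" OR 'Some String #2' AND \"Some 'String' #3\" XOR\n"
                + "'Some \"String\" #4' HOWDY \"Some \"\"String\"\" #5\" FOO 'Some ''String'' #6'";
        int loop = 1;
        for (String[] pair : extract(input)) {
            System.out.println("Loop " + loop++ + ": group(1) = " + pair[0]
                    + "   group(2) = " + pair[1]);
        }
    }
}
```

Group 2 still contains the doubled quotes exactly as they appear in the input, which matches the output requested in the question. Two quoted strings written back to back with nothing between them would be read as one string, so this only works when a separator is always present.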