Parsing CSV with nested quotes using a regular expression

I know this question has been asked many times, but the answers differ and I am confused.

My line is:

1,3.2,BCD,"qwer 47"" ""dfg""",1

Quoting is optional and embedded quotes are doubled, following the MS Excel convention. (The data qwer 47" "dfg" is written as "qwer 47"" ""dfg""".)

I need a regular expression.

OK, as you have seen from the comments, a regex is not the right tool for this. But if you insist, here goes:

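This regex works in Java (and in other flavors that support possessive quantifiers and verbose regexes, such as PCRE; .NET has no possessive quantifiers, so there each X*+ would have to be rewritten as an atomic group (?>X*)):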

^            # Start of string
(?:          # Match the following:
 (?:         #  Either match
  [^",\n]*+  #   0 or more characters except comma, quote or newline
 |           #  or
  "          #   an opening quote
  (?:        #   followed by either
   [^"]*+    #    0 or more non-quote characters
  |          #   or
   ""        #    an escaped quote ("")
  )*         #   any number of times
  "          #   followed by a closing quote
 )           #  End of alternation
 ,           #  Match a comma (separating the CSV columns)
)*           # Do this zero or more times.
(?:          # Then match
 (?:         #  using the same rules as above
  [^",\n]*+  #  an unquoted CSV field
 |           #  or a quoted CSV field
  "(?:[^"]*+|"")*"
 )           #  End of alternation
)            # End of non-capturing group
$            # End of string

Java code:

boolean foundMatch = subjectString.matches(
    "(?x)^         # Start of string\n" +
    "(?:           # Match the following:\n" +
    " (?:          #  Either match\n" +
    "  [^\",\\n]*+ #   0 or more characters except comma, quote or newline\n" +
    " |            #  or\n" +
    "  \"          #   an opening quote\n" +
    "  (?:         #   followed by either\n" +
    "   [^\"]*+    #    0 or more non-quote characters\n" +
    "  |           #   or\n" +
    "   \"\"       #    an escaped quote (\"\")\n" +
    "  )*          #   any number of times\n" +
    "  \"          #   followed by a closing quote\n" +
    " )            #  End of alternation\n" +
    " ,            #  Match a comma (separating the CSV columns)\n" +
    ")*            # Do this zero or more times.\n" +
    "(?:           # Then match\n" +
    " (?:          #  using the same rules as above\n" +
    "  [^\",\\n]*+ #  an unquoted CSV field\n" +
    " |            #  or a quoted CSV field\n" +
    "  \"(?:[^\"]*+|\"\")*\"\n" +
    " )            #  End of alternation\n" +
    ")             # End of non-capturing group\n" +
    "$             # End of string");

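Note that you cannot assume that each line in a CSV file is a complete row. A CSV row may contain newlines, as long as the column containing the newline is enclosed in quotes. This regex knows that, but it will fail if you feed it only part of a row. This is another reason why you really need a CSV parser to validate a CSV file: that is what parsers are for. If you control the input and know that there will never be a newline inside a CSV field, you might get away with the regex, but only then.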
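To try this out, here is a minimal sketch that compiles a compact form of the same rules and checks the sample line from the question, a row with a quoted newline, and a partial row. The class name CsvRowRegexDemo and the helper isValidRow are made up for illustration. The ^ and $ anchors are left out because Matcher.matches() already requires the whole input to match.

```java
import java.util.regex.Pattern;

class CsvRowRegexDemo {
    // One CSV field: either unquoted text (no comma, quote, or newline)
    // or a quoted string in which a literal quote is written as "".
    private static final String FIELD =
        "(?:[^\",\\n]*+|\"(?:[^\"]*+|\"\")*\")";

    // Zero or more "field," groups followed by one final field.
    static final Pattern ROW =
        Pattern.compile("(?:" + FIELD + ",)*" + FIELD);

    // Hypothetical helper: true if the whole string is one well-formed CSV row.
    static boolean isValidRow(String row) {
        return ROW.matcher(row).matches();
    }

    public static void main(String[] args) {
        // The sample line from the question.
        String sample = "1,3.2,BCD,\"qwer 47\"\" \"\"dfg\"\"\",1";
        System.out.println(isValidRow(sample));                           // prints true

        // A complete row whose quoted field spans two physical lines.
        System.out.println(isValidRow("a,\"line one\nline two\",b"));     // prints true

        // Only the first physical line of that row: the quote is never closed.
        System.out.println(isValidRow("a,\"line one"));                   // prints false
    }
}
```

The last call is the failure mode described above: reading the file line by line hands the regex half a row, and it correctly refuses to match.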

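For comparison, here is a rough sketch of what a parser does instead: it walks the record character by character and tracks whether it is inside quotes. This is not a full CSV implementation. The names MiniCsvParser and parseRecord are hypothetical, the code assumes it is given exactly one complete record, and it is lenient about stray characters after a closing quote.

```java
import java.util.ArrayList;
import java.util.List;

class MiniCsvParser {
    // Hypothetical helper: splits ONE complete CSV record into its fields.
    // Throws if the record ends inside a quoted field (i.e. a partial row).
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);       // commas and newlines are data here
                } else if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                    field.append('"');     // "" is an escaped quote
                    i++;                   // skip the second quote of the pair
                } else {
                    inQuotes = false;      // closing quote
                }
            } else if (c == '"' && field.length() == 0) {
                inQuotes = true;           // opening quote at the start of a field
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);        // start the next field
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("record ends inside a quoted field");
        }
        fields.add(field.toString());      // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String sample = "1,3.2,BCD,\"qwer 47\"\" \"\"dfg\"\"\",1";
        for (String f : parseRecord(sample)) {
            System.out.println("[" + f + "]");
        }
    }
}
```

Because the loop knows when it is inside quotes, a caller reading a file could keep appending physical lines until parseRecord stops throwing, which a line-at-a-time regex cannot do. For real data, an established CSV library is still the safer choice.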