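Regular Expressions
1 Intended Use
The regular expression support in the .NET Framework is designed for searching and pattern matching on string values. The methods of the string class, such as IndexOf(), can also locate text, but the resulting code is more verbose and less convenient and flexible than the equivalent regular expression.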
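To make the comparison concrete, here is a minimal sketch that finds every occurrence of a word twice, once with an IndexOf() loop and once with Regex.Matches(). The class name, method names and sample text are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class IndexOfVersusRegex
{
    // Collects every start index of `word` using only string.IndexOf.
    public static List<int> FindWithIndexOf(string text, string word)
    {
        var positions = new List<int>();
        int start = 0;
        while (start < text.Length)
        {
            int found = text.IndexOf(word, start, StringComparison.OrdinalIgnoreCase);
            if (found < 0) break;
            positions.Add(found);
            start = found + word.Length; // resume after the current hit
        }
        return positions;
    }

    // Collects the same positions with a single Regex.Matches call.
    // Regex.Escape makes sure the word is treated as literal text.
    public static List<int> FindWithRegex(string text, string word)
    {
        var positions = new List<int>();
        foreach (Match m in Regex.Matches(text, Regex.Escape(word), RegexOptions.IgnoreCase))
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    static void Main()
    {
        const string sample = "This is it: this works."; // hypothetical sample text
        Console.WriteLine(string.Join(",", FindWithIndexOf(sample, "this"))); // 0,12
        Console.WriteLine(string.Join(",", FindWithRegex(sample, "this")));   // 0,12
    }
}
```

Both methods return the same positions for this input; the IndexOf() version needs explicit bookkeeping for the search position, which is the extra code the paragraph above refers to.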
2 Introductory Regular Expressions
2.1 Plain-Text Search
Consider the following string:
const string myText = @"This comprehensive compendium provides a broad and thorough investigation of all aspects of programming with ASP.NET. Entirely revised and updated for the fourth release of .NET,this book will give you the information you need to master ASP.NET and build a dynamic,successful,enterprise Web application.";
How should the regular expression be written to find every position of the substring "this"? The pattern is simply the text itself:
const string pattern = @"this";
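Then use the Regex class provided by .NET to find every occurrence of "this":
MatchCollection myMatches = Regex.Matches(myText, pattern, RegexOptions.IgnoreCase |
    RegexOptions.ExplicitCapture); // Matches() returns every match in the input
foreach (Match nextMatch in myMatches)
{
    Console.WriteLine(string.Format("{0} ", nextMatch.Index));
}
The search finds two matches, at indexes 0 and 181. Index 0 is the T of the leading "This", which matches because of RegexOptions.IgnoreCase; index 181 is the t of the second "this".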
A regular expression such as pattern = "this" consists only of literal text; this kind of search is called a plain-text search.
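The following sketch shows a complete plain-text search as a runnable program, including the effect of RegexOptions.IgnoreCase. The sample text and the helper name Describe are assumptions made for this illustration; only Regex.Matches, Match.Index and Match.Value come from the .NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PlainTextSearch
{
    // Returns "index:value" for each match, separated by spaces.
    public static string Describe(string text, string pattern, RegexOptions options)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, options))
        {
            parts.Add(m.Index + ":" + m.Value);
        }
        return string.Join(" ", parts);
    }

    static void Main()
    {
        const string sample = "This book: this one."; // hypothetical sample text
        // Case-sensitive: only the lowercase occurrence matches.
        Console.WriteLine(Describe(sample, "this", RegexOptions.None));       // 11:this
        // Case-insensitive: the leading "This" matches as well.
        Console.WriteLine(Describe(sample, "this", RegexOptions.IgnoreCase)); // 0:This 11:this
    }
}
```

Match.Value returns the text exactly as it appears in the input, so the case-insensitive search reports "This" for the first hit even though the pattern is lowercase.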
2.2 Searching with Metacharacters
Metacharacters are special characters that provide commands. Escape sequences such as \b work in much the same way as C# escape sequences: they are characters preceded by a backslash (\) and have special meanings.
Example 1: to find every word that begins with the letter t:
const string pattern = @"\bt";
MatchCollection myMatches = Regex.Matches(myText, pattern, RegexOptions.IgnoreCase |
    RegexOptions.ExplicitCapture);
The search finds words at indexes 0, 51, 153, 181, 205 and 230 in myText.
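To see what the \b in front of the t contributes, the sketch below runs the same input through the bare pattern t and through \bt. The sample text and the helper name Indexes are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Returns the start index of every match as a comma-separated string.
    public static string Indexes(string text, string pattern)
    {
        var found = new List<int>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            found.Add(m.Index);
        }
        return string.Join(",", found);
    }

    static void Main()
    {
        const string sample = "the cat sat there"; // hypothetical sample text
        Console.WriteLine(Indexes(sample, @"t"));   // every t: 0,6,10,12
        Console.WriteLine(Indexes(sample, @"\bt")); // only a t that starts a word: 0,12
    }
}
```

The t at the end of "cat" and "sat" is dropped by \bt because the character before it is a letter, so there is no word boundary at that position.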
Example 2: to find words that end in ion, use:
const string pattern = @"ion\b";
MatchCollection myMatches = Regex.Matches(myText, pattern, RegexOptions.IgnoreCase |
    RegexOptions.ExplicitCapture);
The search finds matches at indexes 70, 217 and 304 in myText; each is the ion at the end of a word. The most commonly used metacharacters are:
Symbol | Description | Example | Matches |
---|---|---|---|
^ | Beginning of input text | ^B | B, but only if first character in text |
$ | End of input text | X$ | X, but only if last character in text |
. | Any single character except the newline character (\n) | i.ation | isation, ization |
* | Preceding character may be repeated zero or more times | ra*t | rt, rat, raat, raaat, and so on |
+ | Preceding character may be repeated one or more times | ra+t | rat, raat, raaat, and so on, but not rt |
? | Preceding character may be repeated zero or one time | ra?t | rt and rat only |
\s | Any whitespace character | \sa | [space]a, \ta, \na (\t and \n have the same meanings as in C#) |
\S | Any character that isn't whitespace | \SF | aF, rF, cF, but not \tF |
\b | Word boundary | ion\b | Any word ending in ion |
\B | Any position that isn't a word boundary | \BX\B | Any X in the middle of a word |
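The quantifier rows of the table can be checked with a short program. This is a sketch under the assumption that each table example should match the whole input, which is why the helper (a hypothetical name, WholeMatch) wraps the fragment in ^ and $.

```csharp
using System;
using System.Text.RegularExpressions;

static class MetacharacterDemo
{
    // True when the entire input, not just a substring, matches the fragment.
    public static bool WholeMatch(string input, string fragment)
    {
        return Regex.IsMatch(input, "^" + fragment + "$");
    }

    static void Main()
    {
        string[] fragments = { "ra*t", "ra+t", "ra?t" };
        string[] inputs = { "rt", "rat", "raat" };

        // Print one line per (pattern, input) pair.
        foreach (string fragment in fragments)
        {
            foreach (string input in inputs)
            {
                Console.WriteLine("{0,-5} {1,-5} {2}", fragment, input, WholeMatch(input, fragment));
            }
        }
    }
}
```

With these inputs, ra*t accepts all three strings, ra+t rejects rt because at least one a is required, and ra?t rejects raat because at most one a is allowed.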
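These metacharacters can be combined freely. For example, to find every word that begins with a, ends with ion, and contains no whitespace in between: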
const string pattern = @"\ba\S*ion\b";
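MatchCollection myMatches = Regex.Matches(myText, pattern, RegexOptions.IgnoreCase |
    RegexOptions.ExplicitCapture);
The search finds a single match, the word application, reported at index 334. This figure disagrees with Example 2, where the ion that ends application is at index 304, which would put the start of the word at index 296.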
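As a small, self-contained illustration of the combined pattern \ba\S*ion\b, the sketch below runs it against a short sample sentence and reports both the index and the matched word. The sample text and the helper name FindWords are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CombinedPatternDemo
{
    // Returns "index:word" for every word that starts with a and ends with ion.
    public static string FindWords(string text)
    {
        const string pattern = @"\ba\S*ion\b";
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            parts.Add(m.Index + ":" + m.Value);
        }
        return string.Join(" ", parts);
    }

    static void Main()
    {
        const string sample = "An application for action and a nation"; // hypothetical sample text
        Console.WriteLine(FindWords(sample)); // 3:application 19:action
    }
}
```

"An", "and" and the standalone "a" start with a but do not end in ion, and "nation" ends in ion but its a is not at a word boundary, so only application and action are reported.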