First of all, here is a very useful regex tool: the OSC online regular expression tool.
1. What is a Regular Expression?
A regular expression is a pattern which specifies a set of strings of characters; it is said to match certain strings.
--Ken Thompson
2. What can Regular Expressions do?
1. Simple Pattern Matching
Regex (short for regular expression; this abbreviation is used in the rest of the article) is all about matching and finding patterns in text, from simple patterns to very complex ones. For example:
- matching string literals
- matching digits: [0-9]
- matching non-digits: [^\d], which is the same as [^0-9]
- matching word and non-word characters: \w matches any word character and, for ASCII text, is the same as [a-zA-Z0-9_] (the underscore counts as a word character). Use \W or [^a-zA-Z0-9_] to match a non-word character.
- matching whitespace: use \s to match spaces, tabs (\t), line feeds (\n) and carriage returns (\r); most engines also include form feed and vertical tab. \S matches a non-whitespace character, which is roughly [^ \t\n\r], or exactly [^\s].
- matching any character: "." matches any single character (by default, any character except a line break), so the number of dots is the length of the text to match. You can also use a quantifier such as .{8} (any number can go inside the braces). Word boundaries can be added as well: \bR.{3}x\b matches Regex if that word appears in your text.
- marking up the text: there will be more on this later in the article.
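The patterns in the list above can be tried out with a short script. This is only a minimal sketch using Python's standard re module; the sample string and the variable names are made up for illustration, and re.ASCII is passed so that \w and \W behave like the ASCII character classes described above.

```python
import re

# Hypothetical sample text; "\t" in a normal string literal is a real tab.
text = "Regex tip 42: use_tabs\there"

# Digits and non-digits
digits = re.findall(r"[0-9]", text)
non_digits = re.findall(r"[^\d]", text)

# Word characters: with re.ASCII, \w is [a-zA-Z0-9_], so "use_tabs" stays whole
words = re.findall(r"\w+", text, flags=re.ASCII)
non_words = re.findall(r"\W", text, flags=re.ASCII)

# Whitespace: three spaces and one tab in the sample text
whitespace = re.findall(r"\s", text)

# The dot with a quantifier: the first eight characters, whatever they are
first_eight = re.match(r".{8}", text).group()

# Word boundaries around R, any three characters, then x
bounded = re.search(r"\bR.{3}x\b", text).group()

print(digits, words, whitespace, first_eight, bounded, sep="\n")
```

Note that "_" does not show up in non_words, which is why [a-zA-Z0-9_] rather than [a-zA-Z0-9] is the accurate equivalent of \w here.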
2. Boundaries
In this part, I am going to talk about zero-width assertions. A zero-width assertion does not match a character, but rather a location in a string. Some of these zero-width assertions, such as ^ and $, are also called anchors. Here are the boundaries I am talking about:
- the beginning and end of a line
- word boundaries
- the beginning and end of a subject
- boundaries that quote string literals
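To make the idea of "matching a location, not a character" concrete, here is a small sketch in Python's re module. The sample text and variable names are assumptions for illustration. The last item in the list presumably refers to literal-quoting syntax that some engines offer; Python's re has no such syntax, so re.escape is used here as the nearest equivalent, which is my substitution and not something the article specifies.

```python
import re

# Hypothetical two-line sample text
text = "cat concat\ncat"

# Beginning of a line: with re.MULTILINE, ^ matches at the start of every line
line_starts = re.findall(r"^cat", text, flags=re.MULTILINE)

# Word boundaries: the "cat" inside "concat" is not a whole word
whole_words = re.findall(r"\bcat\b", text)

# Beginning and end of the whole subject: \A and \Z ignore line breaks
subject_start = re.findall(r"\Acat", text)
subject_end = re.findall(r"cat\Z", text)

# Zero width: an anchor on its own matches a position and consumes nothing
anchor_span = re.search(r"^", text).span()

# Quoting a literal: re.escape neutralises metacharacters such as "+"
literal = re.search(re.escape("1+1=2"), "so 1+1=2 holds")

print(line_starts, whole_words, subject_start, subject_end, anchor_span, sep="\n")
```

The empty span returned for the bare ^ is the point of the section: the assertion succeeds at position 0 without matching any character.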