Regular Expressions

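When reading or writing programs, or even when using text-editing software such as Notepad or Microsoft Office Word, we often need to find or match strings with regular expressions. This article briefly introduces what regular expressions are and how to use them.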

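A regular expression is a powerful language for matching strings that follow a certain pattern. Strings we use every day, such as URLs (e.g. www.baidu.com) and email addresses (e.g. someone@google.com), all follow a particular format, or pattern. Regular expressions are the language for describing such patterns.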

Many programming languages support regular expressions. The examples below use Python's re module, which provides that support, to show how they work.

1. Basic patterns

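Each of the following regular expression symbols matches a single character in the string (^ and $ match a position rather than a character):

a, X, 9 ----- ordinary characters match only themselves.
. (period) ----- matches any single character except a newline
\w ------ matches one "word" character: a letter, a digit, or an underscore
\s ------ matches a single whitespace character (space, tab, newline, and so on)
\d ------ matches any digit [0-9]
^ ------- the start of the string
$ ------ the end of the string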

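import re

# Matching ordinary characters
# In Python, prefixing the pattern with 'r' makes it a raw string, so a backslash ('\') in the pattern is not treated as a string escape.
match=re.search(r'world','hello world')  # found, match.group()=='world'
match=re.search(r'word','hello world')   # not found, match is None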
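To make the effect of the 'r' prefix concrete, here is a minimal sketch comparing a regular string with a raw string. The variable names and the sample text 'room 42' are made up for illustration.

```python
import re

# A regular string processes backslash escapes; a raw string keeps them as typed.
plain = '\\d+'   # the backslash must be doubled in a regular string
raw = r'\d+'     # raw string: the regex engine sees exactly these characters

same_pattern = (plain == raw)    # both hold the three characters \d+
newline_len = len('\n')          # 1: a single newline character
raw_newline_len = len(r'\n')     # 2: a backslash followed by the letter n

match = re.search(raw, 'room 42')
found = match.group() if match else None
print(same_pattern, newline_len, raw_newline_len, found)
```

Writing patterns as raw strings avoids having to double every backslash by hand.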

# . matches any single character
match=re.search(r'..d','hello world')    # found, match.group()=='rld'

# \d matches one digit, \w matches one word character (letter, digit, or underscore)
match=re.search(r'\d\d\d\d','happy new year 2015') # found, match.group()=='2015'
match = re.search(r'\w\w\w','@@abcd!!') # found, match.group() == "abc"
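The symbols in the list that the examples above do not exercise can be checked the same way. The following is a small sketch with made-up sample strings, showing that \w accepts an underscore, that \s accepts a tab, and how ^ and $ pin a match to the ends of the string.

```python
import re

# \w also matches the underscore, not only letters and digits.
m_word = re.search(r'\w+', '--snake_case99--')

# \s matches any whitespace character; here it matches a tab.
m_space = re.search(r'a\sb', 'a\tb')

# ^ and $ anchor the pattern to the start and the end of the string.
m_start = re.search(r'^hello', 'hello world')
m_end = re.search(r'world$', 'hello world')
m_miss = re.search(r'^world', 'hello world')   # 'world' is not at the start

print(m_word.group())     # snake_case99
print(m_start.group())    # hello
print(m_end.group())      # world
print(m_miss)             # None
```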

2. Repetition

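+ ---- one or more occurrences of the character to its left
* ---- zero or more occurrences of the character to its left
? ---- zero or one occurrence of the character to its left
{n1,[n2]} ---- exactly n1 occurrences of the character to its left, or between n1 and n2 occurrences when n2 is given; for example, \d{3} matches three digits, e.g. 123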

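# + matches one or more of the character to its left, as many as possible (greedy)
match=re.search(r'abc+','abccccc') # found, match.group()=='abccccc'
match=re.search(r'c+','abccdccccc') # found, match.group()=='cc' (the leftmost match wins, even though a longer run of c comes later)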

# * matches zero or more of the character to its left; \s* matches zero or more whitespace characters
match = re.search(r'\d\s*\d\s*\d','xx1 2   3xx') # found, match.group() == "1 2   3"
match = re.search(r'\d\s*\d\s*\d','xx12  3xx') # found, match.group() == "12  3"
match = re.search(r'\d\s*\d\s*\d','xx123xx') # found, match.group() == "123"

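# ^ marks the start of the string, so the following search fails.
match = re.search(r'^b\w+','foobar') # not found, match is None, because 'foobar' does not start with 'b'.
# Remove the '^' from the pattern and the match succeeds.
match = re.search(r'b\w+','foobar') # found, match.group() == "bar"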
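The ? and {n1,n2} forms from the list at the top of this section are not covered by the examples above. Here is a brief sketch of both, using sample strings chosen only for illustration.

```python
import re

# ? makes the character to its left optional, so both spellings match.
m_us = re.search(r'colou?r', 'the color red')
m_uk = re.search(r'colou?r', 'the colour red')

# {3} repeats exactly three times; {2,4} repeats two to four times
# and, being greedy, takes as many as it can.
m_exact = re.search(r'\d{3}', 'tel 12345')
m_range = re.search(r'\d{2,4}', 'tel 12345')

print(m_us.group())      # color
print(m_uk.group())      # colour
print(m_exact.group())   # 123
print(m_range.group())   # 1234
```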

3. Building character sets with square brackets

Here is an example: extracting an email address such as alice-b@google.com from a string.

str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'\w+@\w+',str)
if match:
    print(match.group())  ## 'b@google'

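Clearly the extraction did not succeed. \w does not match '-', so the match can only begin at 'b'; likewise \w does not match '.', so the match ends right after 'google'.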

To solve this problem, we can use square brackets in the regular expression:

match = re.search(r'[\w.-]+@[\w.-]+',str)
if match:
    print(match.group())  ## 'alice-b@google.com'

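The characters inside square brackets form a set, and the bracket expression matches any single character from that set; following it with + repeats that choice one or more times. Inside square brackets, '.' matches only the literal '.' character itself.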
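The behaviour of a set can be checked in isolation. This is a small sketch with made-up inputs; it also relies on the fact that a '-' placed last inside the brackets is taken literally, which is how the email pattern above uses it.

```python
import re

# Outside brackets '.' matches any character; inside brackets it is a literal dot.
m_any = re.search(r'.', 'a.b')
m_dot = re.search(r'[.]', 'a.b')

# A set matches one character drawn from it; + repeats that choice.
m_set = re.search(r'[abc]+', 'xxcabbagexx')

# A '-' at the end of the set is a literal hyphen.
m_hyphen = re.search(r'[\w-]+', 'alice-b@google.com')

print(m_any.group())      # a
print(m_dot.group())      # .
print(m_set.group())      # cabba
print(m_hyphen.group())   # alice-b
```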

4. Group extraction

Groups in a regular expression let you extract parts of the matched string. Usage is simple: wrap the part of the pattern you want to extract in parentheses ().

str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'([\w.-]+)@([\w.-]+)',str)
if match:
    print(match.group())   ## 'alice-b@google.com' (the whole match)
    print(match.group(1))  ## 'alice-b' (the username, group 1)
    print(match.group(2))  ## 'google.com' (the host, group 2)
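As one possible way to package the grouping example, the sketch below wraps it in a helper function. The name split_email is hypothetical and not part of the re module; the sketch uses match.groups(), which returns all captured groups as a tuple.

```python
import re

def split_email(text):
    """Return (username, host) for the first email-like string in text, or None."""
    match = re.search(r'([\w.-]+)@([\w.-]+)', text)
    if match is None:
        return None
    return match.groups()   # tuple holding group 1 and group 2

found = split_email('purple alice-b@google.com monkey dishwasher')
missing = split_email('no address here')
print(found)     # ('alice-b', 'google.com')
print(missing)   # None
```

Returning None when nothing matches mirrors the behaviour of re.search itself, so callers can use the same "if match:" style check.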
