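The specification for email addresses comes from RFC 5322. There is a website, emailregex.com, dedicated to listing the regular expressions used to validate email addresses in various programming languages. Many of them are more complex than anything I had heard of, let alone seen. I have to say, how badly must the programmer who built this site have been hurt by email validation!
In practice, a production environment rarely needs a regular expression this complex to be 99.99% correct. From the standpoint of execution efficiency and test coverage, a simple version is usually enough: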
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
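As a rough illustration, the simple pattern can be tried out in Python. This is only a sketch: the names `SIMPLE_EMAIL` and `looks_like_email` are made up for the example, the dot before the top-level domain is assumed to be an escaped literal dot, and the `/i` flag is expressed as `re.IGNORECASE`.

```python
import re

# The simple pattern from the text; Python has no /.../i literal,
# so case-insensitivity is passed as a flag instead.
SIMPLE_EMAIL = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)


def looks_like_email(address: str) -> bool:
    """Return True when the address matches the simple pattern (hypothetical helper)."""
    return SIMPLE_EMAIL.match(address) is not None


for sample in [
    "alice@example.com",
    "bob.smith+tag@mail.example.org",
    "no-at-sign.example.com",
    "curator@example.museum",  # rejected: the {2,4} limit excludes longer TLDs
]:
    print(sample, looks_like_email(sample))
```

The last sample shows the trade-off of the short version: the `{2,4}` quantifier keeps the pattern simple but turns away addresses under longer top-level domains.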
Now let's look at the stricter, more complex regular expressions:
General-purpose regular expression for validating email addresses (RFC 5322 compliant)
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
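To get a feel for what the general-purpose expression accepts, here is a small Python sketch. It assumes the slashes in the expression stand for backslash escapes, splits the pattern into a local part and a domain part purely for readability, and matches case-insensitively. The names `LOCAL`, `DOMAIN`, `RFC5322` and `is_rfc5322` are invented for this example.

```python
import re

# Local part: dot-separated atoms, or a quoted string.
LOCAL = r'''(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")'''

# Domain part: dotted host name, or a bracketed address literal.
DOMAIN = r"(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"

RFC5322 = re.compile(LOCAL + "@" + DOMAIN, re.IGNORECASE)


def is_rfc5322(address: str) -> bool:
    """Return True when the whole string matches the pattern (hypothetical helper)."""
    return RFC5322.fullmatch(address) is not None


for sample in [
    "john.doe@example.com",
    '"john.doe"@example.com',   # quoted local part
    "user@[192.168.0.1]",       # bracketed IPv4 literal
    "user@localhost",           # no dot in the domain
    "a..b@example.com",         # consecutive dots in the local part
]:
    print(sample, is_rfc5322(sample))
```

The first three samples are accepted and the last two rejected, which shows where the extra length goes: quoted local parts and address literals that the short patterns simply ignore.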
Because languages differ in their regular expression support, syntax, and coverage, the expression differs from language to language:
Python
This is a simple version:
r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
JavaScript
This one is a bit more complex:
/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i
Swift
[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}
PHP
The PHP version is more complex still and covers more cases:
/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD
Perl / Ruby
Perl and Ruby are not impressed by the PHP version and can be stricter still:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s
Perl 5.10 and later
The version above, well, may I call it hieroglyphics? I certainly have no desire to decipher it. Newer versions of Perl do offer a more readable version (are you serious?):
/(?(DEFINE)
(?<address>(?&mailBox)|(?&group))
(?<mailBox>(?&name_addr)|(?&addr_spec))
(?<name_addr>(?&display_name)?(?&angle_addr))
(?<angle_addr>(?&CFWS)?<(?&addr_spec)>(?&CFWS)?)
(?<group>(?&display_name):(?:(?&mailBox_list)|(?&CFWS))?; (?&CFWS)?)
(?<display_name>(?&phrase))
(?<mailBox_list>(?&mailBox)(?:,(?&mailBox))*)
(?<addr_spec>(?&local_part)\@(?&domain))
(?<local_part>(?&dot_atom)|(?&quoted_string))
(?<domain>(?&dot_atom)|(?&domain_literal))
(?<domain_literal>(?&CFWS)?\[(?:(?&FWS)?(?&dcontent))*(?&FWS)? \](?&CFWS)?)
(?<dcontent>(?&dtext)|(?&quoted_pair))
(?<dtext>(?&NO_WS_CTL)|[\x21-\x5a\x5e-\x7e])
(?<atext>(?&ALPHA)|(?&DIGIT)|[!#\$%&'*+-\/=?^_`{|}~])
(?<atom>(?&CFWS)?(?&atext)+(?&CFWS)?)
(?<dot_atom>(?&CFWS)?(?&dot_atom_text)(?&CFWS)?)
(?<dot_atom_text>(?&atext)+(?:\.(?&atext)+)*)
(?<text>[\x01-\x09\x0b\x0c\x0e-\x7f])
(?<quoted_pair>\\(?&text))
(?<qtext>(?&NO_WS_CTL)|[\x21\x23-\x5b\x5d-\x7e])
(?<qcontent>(?&qtext)|(?&quoted_pair))
(?<quoted_string>(?&CFWS)?(?&DQUOTE)(?:(?&FWS)?(?&qcontent))* (?&FWS)?(?&DQUOTE)(?&CFWS)?)
(?<word>(?&atom)|(?&quoted_string))
(?<phrase>(?&word)+)
# Folding white space
(?<FWS>(?:(?&WSP)*(?&CRLF))?(?&WSP)+)
(?<ctext>(?&NO_WS_CTL)|[\x21-\x27\x2a-\x5b\x5d-\x7e])
(?<ccontent>(?&ctext)|(?&quoted_pair)|(?&comment))
(?<comment>\((?:(?&FWS)?(?&ccontent))*(?&FWS)?\))
(?<CFWS>(?:(?&FWS)?(?&comment))* (?:(?:(?&FWS)?(?&comment))|(?&FWS)))
# No whitespace control
(?<NO_WS_CTL>[\x01-\x08\x0b\x0c\x0e-\x1f\x7f])
(?<ALPHA>[A-Za-z])
(?<DIGIT>[0-9])
(?<CRLF>\x0d\x0a)
(?<DQUOTE>")
(?<WSP>[\x20\x09])
)
(?&address)/x
Ruby (simple version)
Ruby points out that it actually has a simple version too:
/\A([\w+\-].?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
.NET
Who doesn't have a version like that, says .NET:
^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
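The grep command
When you use grep to find email addresses in a file, I doubt you would write a regular expression that runs to several lines. A rough approximation is enough:
$ grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" filename.txt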
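The same "pull every address out of some text" idea can be sketched in Python with `re.findall`, which plays the role of grep's `-o` option here. The function name `extract_emails` and the sample text are made up for illustration, and the pattern assumes the slashes in the command stand for `\b` and `\.`.

```python
import re

# Same character classes as the grep pattern, with word boundaries on both ends.
EMAIL_IN_TEXT = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b")


def extract_emails(text: str) -> list[str]:
    """Return every substring that looks like an email address (hypothetical helper)."""
    return EMAIL_IN_TEXT.findall(text)


sample = (
    "Contact alice@example.com or bob@mail.example.org. "
    "Not an address: carol@localhost"
)
print(extract_emails(sample))  # ['alice@example.com', 'bob@mail.example.org']
```

To scan a file instead, the text could come from something like `open("filename.txt").read()`. Note that the trailing period after `example.org` is not captured, because the pattern must end in two to six letters.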
SQL Server
SQL Server can do this kind of matching too, here with PATINDEX and LIKE wildcard patterns rather than full regular expressions. This snippet looks like it came from a production environment, because it also thoughtfully catches people who mistyped their email address:
select email
from table_name where
patindex('%[&'',":;!+=\/()<>]%', email) > 0 -- Invalid characters
or patindex('[@.-_]%', email) > 0 -- Valid but cannot be starting character
or patindex('%[@.-_]', email) > 0 -- Valid but cannot be ending character
or email not like '%@%.%' -- Must contain at least one @ and one .
or email like '%..%' -- Cannot have two periods in a row
or email like '%@%@%' -- Cannot have two @ anywhere
or email like '%.@%' or email like '%@.%' -- Cannot have @ and . next to each other
or email like '%.cm' or email like '%.co' -- Cameroon or Colombia? Typos.
or email like '%.or' or email like '%.ne' -- Missing last letter
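The rule list in that query reads naturally as a series of independent checks. The following Python sketch mirrors the intent stated in the SQL comments, not the exact wildcard semantics of PATINDEX. The names `find_problems`, `INVALID_CHARS`, `EDGE_CHARS` and `TYPO_SUFFIXES`, along with the wording of the messages, are invented for this example.

```python
INVALID_CHARS = set("&',\":;!+=\\/()<>")        # characters the query treats as invalid
EDGE_CHARS = tuple("@.-_")                      # valid, but not as first or last character
TYPO_SUFFIXES = (".cm", ".co", ".or", ".ne")    # likely truncated .com/.org/.net


def find_problems(email: str) -> list[str]:
    """Return a list of rule violations; an empty list means nothing was flagged."""
    problems = []
    if any(ch in INVALID_CHARS for ch in email):
        problems.append("invalid characters")
    if email.startswith(EDGE_CHARS):
        problems.append("bad starting character")
    if email.endswith(EDGE_CHARS):
        problems.append("bad ending character")
    at = email.find("@")
    if at == -1 or "." not in email[at + 1:]:
        problems.append("needs an @ followed by a .")
    if ".." in email:
        problems.append("two periods in a row")
    if email.count("@") > 1:
        problems.append("more than one @")
    if ".@" in email or "@." in email:
        problems.append("@ and . next to each other")
    if email.lower().endswith(TYPO_SUFFIXES):
        problems.append("possible mistyped or truncated TLD")
    return problems


print(find_problems("alice@example.com"))   # []
print(find_problems("bob@example.co"))      # ['possible mistyped or truncated TLD']
print(find_problems("a..b@@example.com"))   # ['two periods in a row', 'more than one @']
```

Returning the list of violated rules, rather than a single yes or no, keeps the "be kind to people who mistype" spirit of the query: the caller can tell the user what looks wrong.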
Oracle PL/SQL
Isn't this one a bit lazy? Especially after all those "complex" regular expressions:
SELECT email FROM table_name WHERE REGEXP_LIKE(email, '[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}');
Well, it looks like the last one, for MySQL, is just as lazy:
SELECT * FROM `users` WHERE `email` NOT REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$';
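A query of this shape, selecting the rows whose address does not match, can be tried without a database server. The sketch below uses Python's built-in `sqlite3` module and registers a user-defined `REGEXP` function, since SQLite only provides the operator syntax. The table contents and the `regexp` helper are made up, and case-insensitive matching is an assumption of this example.

```python
import re
import sqlite3

PATTERN = r"^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$"


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    if value is None:
        return 0
    return int(re.search(pattern, value, re.IGNORECASE) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("alice@example.com",), ("broken-address",), ("bob@example",)],
)

# Rows whose email does NOT match the pattern, as in the query above.
rows = conn.execute(
    "SELECT email FROM users WHERE email NOT REGEXP ? ORDER BY email",
    (PATTERN,),
).fetchall()
invalid = [row[0] for row in rows]
print(invalid)  # ['bob@example', 'broken-address']
```

Passing the pattern as a bound parameter instead of pasting it into the SQL string avoids quoting problems with the backslash and the `$`.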
Original article: http://www.linuxprobe.com/regular-expression-emailregex.html
Original link: https://www.f2er.com/regex/358508.html