Printing only the matched pattern: an application of regular expressions


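First, the file and the requirement:

Below is a firewall log. It contains the fields service=http proto=6. How can just the service=http field be printed? The service value may contain several spaces, and there is no telling how many, but the field that follows it is always proto. How can awk pattern matching be used to print the service field?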

[dsadm@dataStage test]$ more sedonly.txt 
2011-09-30 00:00:20     Local0.Notice   10.2.0.254      ns50: NetScreen device_id=0019022004000299  [Root]system-notification-00257(traffic): start_time="2011-09-30 00:01:05" duration=15 polic
y_id=103 service=http proto=6 src zone=Trust dst zone=Untrust action=Permit sent=2683 rcvd=766 src=10.100.1.43 dst=119.188.11.3 src_port=4048 dst_port=80 src-xlated ip=218.206.244.202 port=467
9 dst-xlated ip=119.188.11.3 port=80 session_id=61727 reason=Close - AGE OUT<000>
 2011-09-30 00:00:20     Local0.Notice   10.2.0.254      ns50: NetScreen device_id=0019022004000299  [Root]system-notification-00257(traffic): start_time="2011-09-30 00:01:05" duration=15 poli
cy_id=103 service=NETBIOS (NS) proto=17 src zone=Trust dst zone=Untrust action=Permit sent=2674 rcvd=766 src=10.100.1.43 dst=119.188.11.3 src_port=4045 dst_port=137 src-xlated ip=218.206.244.2
02 port=15311 dst-xlated ip=119.188.11.3 port=137 session_id=62271 reason=Close - AGE OUT<000>
 2011-09-30 00:00:20     Local0.Notice   10.2.0.254      ns50: NetScreen device_id=0019022004000299  [Root]system-notification-00257(traffic): start_time="2011-09-30 00:01:05" duration=15 poli
cy_id=103 service=VDO Live (tcp) proto=6 src zone=Trust dst zone=Untrust action=Permit sent=2645 rcvd=766 src=10.100.1.43 dst=119.188.11.3 src_port=4044 dst_port=7001 src-xlated ip=218.206.244
.202 port=14295 dst-xlated ip=119.188.11.3 port=7001 session_id=59240 reason=Close - AGE OUT<000>
[dsadm@dataStage test]$ 

--Solutions

[dsadm@dataStage test]$ grep -Po 'service=.*(?= proto=)' sedonly.txt 
service=http
service=NETBIOS (NS)
service=VDO Live (tcp)
[dsadm@dataStage test]$ sed -s 's/^.*\(service=.*\) proto=.*$/\1/' sedonly.txt 
service=http
service=NETBIOS (NS)
service=VDO Live (tcp)
[dsadm@dataStage test]$ awk -F 'proto|service' '{print "service"$2}' sedonly.txt 
service=http 
service=NETBIOS (NS) 
service=VDO Live (tcp) 
[dsadm@dataStage test]$ 
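The awk output above carries a trailing space, because the field separator "proto" leaves the preceding blank inside $2. A minimal sketch of one way to avoid that, assuming the separators are widened to "service=" and " proto=" (the three sample records are shortened, hypothetical stand-ins for the log lines, and the temporary file name comes from mktemp):

```shell
# Shortened sample records (hypothetical abbreviations of the log lines)
sample=$(mktemp)
cat > "$sample" <<'EOF'
policy_id=103 service=http proto=6 src zone=Trust
policy_id=103 service=NETBIOS (NS) proto=17 src zone=Trust
policy_id=103 service=VDO Live (tcp) proto=6 src zone=Trust
EOF

# Split on "service=" or " proto=" so that $2 is the bare service value,
# with neither the key nor the blank before "proto" attached.
trimmed=$(awk -F 'service=| proto=' '{print "service=" $2}' "$sample")
printf '%s\n' "$trimmed"

rm -f "$sample"
```

Because the blank is part of the separator, it is consumed by the split and no extra trimming step is needed.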



Source: http://bbs.chinaunix.net/thread-4132203-1-1.html


Below is a requirement of my own.

The file looks like the following; only the beginning is shown.

[dsadm@dataStage findjob]$ more alljob.xml 
<?xml version="1.0" encoding="utf-8"?><FindQuerySessionAsyncStateSerialiser xmlns:ibm="http://www.ibm.com/" clientInstallPath_="D:\IBM_IIS\Clients\Classic" generatedDate_="2014年5月29日" gener
atedTime_="11:06:48" serverName_="DATASTAGE" serverVersion_="8.7"><criteria_><caseInsensitive_>1</caseInsensitive_><createdAfter_ /><createdBefore_ /><createdByUser_ /><DependsOnObjects /><des
cription_ /><findWithinLastResultSet_>0</findWithinLastResultSet_><lastModifiedAfter_ /><lastModifiedBefore_ /><lastModifiedByUser_ /><name_>*</name_><nameDescriptionMatchMode_>NameOrDescripti
on</nameDescriptionMatchMode_><repositoryName_>lscrm</repositoryName_><folder_>\</folder_><Types><string>Parallel Jobs</string></Types><WhereUsedObjects /></criteria_><Results><ReposObjectSeri
aliser><className_>CJobDefn</className_><displayName_>CT_ENT_DIST_MAXLNBAL</displayName_><folderPath_>\Jobs\CRM_03_ENT\CRM_0303_ENT_CT\CRM_030303_ENT_CT_DIST</folderPath_><isTopLevel_>1</isTop
Level_><id_>CT_ENT_DIST_MAXLNBAL</id_><platformType_ /><reposID_>c2e76d84.43058877.2174cfdoj.l4f87r0.76hjj8.unm720lidv156as11jdb5</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><s
ubType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>CopyOfIFS
I_CURTRAN</displayName_><folderPath_>\作业\0001_ODS\00011_ODS_账户信息\00012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>CopyOfIFSI_CURTRAN</id_><platformType_ /><reposID_>c
2e76d84.43058877.2174ce5cg.e9a93n8.dq7mt3.rilur196dttfpvk1ipaj6</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</type
DefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>CopyOfIFSI_DEPTRAN</displayName_><folderPath_>\作业\0001_ODS\00011_ODS_账户
信息\00012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>CopyOfIFSI_DEPTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174cesld.fcdckp0.c4dm26.ogq04coo9cs4681ed5me0</r
eposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSeriali
ser><className_>CJobDefn</className_><displayName_>IFSI_CARDTRAN</displayName_><folderPath_>\作业\0001_ODS\00011_ODS_账户信息\00012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><i
d_>IFSI_CARDTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b296p.aipqg68.3gs1oe.6id3oi6ifaunehhjd59tl</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subTy
pe_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>IFSI_CURTRAN</displayName
_><folderPath_>\作业\0001_ODS\00011_ODS_账户信息\00012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_CURTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b2970.2
r9jn6g.cqvmdf.r4521aevg2eh084hd8pgv</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></Rep
osObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>IFSI_DEPTRAN</displayName_><folderPath_>\作业\0001_ODS\00011_ODS_账户信息\00012_ODS_账户交易信息</folde
rPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_DEPTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b2975.fj9jtmg.e8e747.3j81nbfj2eob0vlonomg5</reposID_><reposManagerID_>DATASTAGE:lscrm
</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><di
splayName_>IFSI_INTBANKTRAN</displayName_><folderPath_>\作业\0001_ODS\00011_ODS_账户信息\00012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_INTBANKTRAN</id_><platformTyp
e_ /><reposID_>c2e76d84.43058877.2174b2979.5tf23gg.4f527i.niesna07c63s112uhkt15</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Pa
rallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>IFSI_INTBANKTRAN_PAYFEE</displayName_><folderPath_>\作业\00
01_ODS\00011_ODS_账户信息\00012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_INTBANKTRAN_PAYFEE</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b2979.sng5q9g.67533

I want to extract the content of elements such as <id_>IFSI_INTBANKTRAN_PAYFEE</id_>; there are about two hundred of them in the file.

My attempts:

[dsadm@dataStage findjob]$ sed -s 's/^.*\(<id_>.*<\/id_>\).*$/\1/g' alljob.xml
<id_>REPORT51_score_CONVERGIFT</id_>
[dsadm@dataStage findjob]$

Only one value came back.

--

[dsadm@dataStage findjob]$ awk -F '<id_>|<\/id_>' '{print $2}' alljob.xml
awk: warning: escape sequence `\/' treated as plain `/'
CT_ENT_DIST_MAXLNBAL
[dsadm@dataStage findjob]$ awk -F '<id_>|</id_>' '{print $2}' alljob.xml
CT_ENT_DIST_MAXLNBAL
[dsadm@dataStage findjob]$

----

[dsadm@dataStage findjob]$ sed -s 's/^.*<id_>\(.*\)<\/id_>.*$/\1/g' alljob.xml
REPORT51_score_CONVERGIFT
[dsadm@dataStage findjob]$

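Still only one value.

Why? Evidently alljob.xml is one single physical line; the line breaks shown by more are only terminal wrapping. sed and awk both work line by line. The s command replaces the whole line with a single captured group, and the greedy leading .* pushes the match to the last <id_> on the line, which is why sed returns REPORT51_score_CONVERGIFT. awk prints $2 once per record, which is the text after the first <id_>, hence CT_ENT_DIST_MAXLNBAL.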
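The three behaviours seen so far can be reproduced on a tiny input. This is only a sketch: the one-line document and the names JOB_A and JOB_B are made up for illustration, and grep -o is used as a line-independent alternative that was not part of the attempts above.

```shell
# A hypothetical one-line document with two <id_> elements
line='<r><id_>JOB_A</id_><x/><id_>JOB_B</id_></r>'

# sed: the greedy leading ".*" runs up to the LAST <id_>; one result per line
last=$(printf '%s\n' "$line" | sed 's/^.*<id_>\(.*\)<\/id_>.*$/\1/')

# awk: $2 is only the text between the FIRST pair of separators
first=$(printf '%s\n' "$line" | awk -F '<id_>|</id_>' '{print $2}')

# grep -o prints every match on its own line; [^<]* cannot run across a tag
all=$(printf '%s\n' "$line" | grep -o '<id_>[^<]*</id_>')

printf 'sed:  %s\nawk:  %s\ngrep:\n%s\n' "$last" "$first" "$all"
```

With grep -o the number of results no longer depends on how many lines the file has, so the single-line file would not need to be reformatted first.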


--Now I reformat the file as standard XML, one element per line

[dsadm@dataStage findjob]$ more all.xml 
<?xml version="1.0" encoding="utf-8"?>
<FindQuerySessionAsyncStateSerialiser xmlns:ibm="http://www.ibm.com/" clientInstallPath_="D:\IBM_IIS\Clients\Classic" generatedDate_="2014年5月29日" generatedTime_="11:06:48" serverName_="DATA
STAGE" serverVersion_="8.7">
<criteria_>
<caseInsensitive_>1</caseInsensitive_>
<createdAfter_ />
<createdBefore_ />
<createdByUser_ />
<DependsOnObjects />
<description_ />
<findWithinLastResultSet_>0</findWithinLastResultSet_>
<lastModifiedAfter_ />
<lastModifiedBefore_ />
<lastModifiedByUser_ />
<name_>*</name_>
<nameDescriptionMatchMode_>NameOrDescription</nameDescriptionMatchMode_>
<repositoryName_>lscrm</repositoryName_>
<folder_>\</folder_>
<Types>
<string>Parallel Jobs</string>
</Types>
<WhereUsedObjects />
</criteria_>
<Results>
<ReposObjectSerialiser>
<className_>CJobDefn</className_>
<displayName_>CT_ENT_DIST_MAXLNBAL</displayName_>
<folderPath_>\Jobs\CRM_03_ENT\CRM_0303_ENT_CT\CRM_030303_ENT_CT_DIST</folderPath_>
<isTopLevel_>1</isTopLevel_>
<id_>CT_ENT_DIST_MAXLNBAL</id_>
<platformType_ />
<reposID_>c2e76d84.43058877.2174cfdoj.l4f87r0.76hjj8.unm720lidv156as11jdb5</reposID_>
<reposManagerID_>DATASTAGE:lscrm</reposManagerID_>
<subType_>3</subType_>
<typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_>
</ReposObjectSerialiser>
<ReposObjectSerialiser>

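Using the command

awk -F '<id_>|</id_>' '{print $2}' all.xml

the results are separated by many blank lines, one for every line that has no <id_> element. Delete the blank lines: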

awk -F '<id_>|</id_>' '{print $2}' all.xml |sed '/^$/d'

OK
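The second sed pass can be avoided by letting awk skip the lines that contain no separator. A minimal sketch, assuming three sample lines in the style of all.xml; the NF > 1 condition is my addition, not part of the original command:

```shell
# Three sample lines in the style of all.xml
xml=$(printf '%s\n' '<isTopLevel_>1</isTopLevel_>' \
                    '<id_>CT_ENT_DIST_MAXLNBAL</id_>' \
                    '<platformType_ />')

# A line without <id_> or </id_> is not split, so NF is 1 and it is skipped;
# a line with both separators has three fields and $2 is the value.
ids=$(printf '%s\n' "$xml" | awk -F '<id_>|</id_>' 'NF > 1 {print $2}')
printf '%s\n' "$ids"
```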

--

[dsadm@dataStage findjob]$ sed -n 's/<id_>\(.*\)<\/id_>/\1/p' all.xml |wc -l
290
[dsadm@dataStage findjob]$

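Note:

Without -n and p, every line is still printed as usual, and the matching lines appear in their substituted form.

With -n alone, there is no output.

Only with both -n and p is exactly what I want printed!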
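The three cases in the note can be checked side by side. A small sketch on three made-up input lines (the variable names are only for illustration):

```shell
input=$(printf '%s\n' '<Results>' '<id_>JOB_A</id_>' '<subType_>3</subType_>')

# Neither -n nor p: all lines are printed, the matching one in substituted form
both=$(printf '%s\n' "$input" | sed 's/<id_>\(.*\)<\/id_>/\1/')

# -n only: automatic printing is switched off and nothing requests output
none=$(printf '%s\n' "$input" | sed -n 's/<id_>\(.*\)<\/id_>/\1/')

# -n plus the p flag: only lines where the substitution succeeded are printed
only=$(printf '%s\n' "$input" | sed -n 's/<id_>\(.*\)<\/id_>/\1/p')

printf 'no -n, no p:\n%s\n-n only: [%s]\n-n and p: %s\n' "$both" "$none" "$only"
```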


--

grep -Po '<id_>.*<\/id_>' all.xml


The output is as follows

<id_>score_PLAN_ZB</id_>
<id_>SPECIAL_SHOP</id_>
<id_>REPORT01_score_MSOURCE</id_>
<id_>REPORT02_score_QSOURCE</id_>
<id_>REPORT03_score_YSOURCE</id_>
<id_>REPORT11_score_MCARDORG</id_>
<id_>REPORT12_score_QCARDORG</id_>
<id_>REPORT13_score_YCARDORG</id_>
<id_>REPORT21_score_MCUSTORG</id_>
<id_>REPORT22_score_QCUSTORG</id_>
<id_>REPORT23_score_YCUSTORG</id_>
<id_>REPORT41_score_PART</id_>
<id_>REPORT51_score_CONVERGIFT</id_>

Revised as follows

[dsadm@dataStage findjob]$ grep -Po '<id_>.*<\/id_>' all.xml |sed 's/<id_>//'|sed 's/</id_>//'
sed: -e expression #1, char 10: unknown option to `s'
[dsadm@dataStage findjob]$ grep -Po '<id_>.*<\/id_>' all.xml |sed 's/<id_>//'|sed 's/<\/id_>//'
CT_ENT_DIST_MAXLNBAL
CopyOfIFSI_CURTRAN
CopyOfIFSI_DEPTRAN
IFSI_CARDTRAN
IFSI_CURTRAN
IFSI_DEPTRAN
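The sed error in the first attempt comes from the unescaped "/" in "</id_>", which sed reads as the end of the s command. A sketch of one way around it, assuming "|" is chosen as the delimiter; the optional "/" that lets a single expression strip both tags is my own shortcut, and the two sample lines stand in for the grep output:

```shell
# Two lines as produced by the grep -Po step
tags=$(printf '%s\n' '<id_>CT_ENT_DIST_MAXLNBAL</id_>' '<id_>IFSI_CURTRAN</id_>')

# With "|" as the delimiter the "/" in "</id_>" needs no escaping.
# \{0,1\} makes the "/" optional, so the pattern matches both <id_> and </id_>;
# the g flag removes every occurrence on the line.
names=$(printf '%s\n' "$tags" | sed 's|</\{0,1\}id_>||g')
printf '%s\n' "$names"
```

Any character that does not occur in the pattern can serve as the delimiter; "|" is safe here because the expression is a basic regular expression, where "|" has no special meaning.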






