I have the following XML input:
<Type>
  <Source>
    <TimeStamp>2016-02-19T12:27:06.387Z</TimeStamp>
    <IPAddress IPVersion="IPv4">x.xx.xxx.xxx</IPAddress>
    <Port>64435</Port>
    <DNS_Name>x.xx.xxx.xxx.range9-27.abc.com</DNS_Name>
  </Source>
</Type>
Pig script:
REGISTER piggybank-0.15.0.jar;
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
A = LOAD 'test.xml' USING org.apache.pig.piggybank.storage.XMLLoader('Type') AS (x:chararray);
B = FOREACH A GENERATE
    XPath(x, 'Source/TimeStamp'),
    XPath(x, 'Source/IPAddress'),
    XPath(x, 'Source/IPAddress/@IPVersion'),
    XPath(x, 'Source/Port'),
    XPath(x, 'Source/DNS_Name');
When I DUMP B, I get the following output, in which the value of IPVersion is missing:
(2016-02-19T12:27:06.387Z,x.xx.xxx.xxx,64435,x.xx.xxx.xxx.range9-27.abc.com)
Can anyone help me solve this?
Solution
There are two bugs in piggybank's XPath class:
> The ignoreNamespace logic breaks lookups of XML attributes
https://issues.apache.org/jira/browse/PIG-4751
> The ignoreNamespace parameter defaults to true and cannot be overridden
https://issues.apache.org/jira/browse/PIG-4752
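To confirm that the attribute expression itself is valid and that the problem lies in the UDF, the same record can be checked with the JDK's built-in javax.xml.xpath API. The sketch below is only an illustration: the class name AttributeXPathDemo and the helper evaluate are hypothetical, the paths start at /Type because the JDK evaluates from the document root, and the "rewritten" expression is an assumption about what a namespace-ignoring rewrite could look like, not piggybank's confirmed code.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class AttributeXPathDemo {
    // The record from the question, as a single string.
    static final String RECORD =
        "<Type><Source>"
        + "<TimeStamp>2016-02-19T12:27:06.387Z</TimeStamp>"
        + "<IPAddress IPVersion=\"IPv4\">x.xx.xxx.xxx</IPAddress>"
        + "<Port>64435</Port>"
        + "<DNS_Name>x.xx.xxx.xxx.range9-27.abc.com</DNS_Name>"
        + "</Source></Type>";

    // Hypothetical helper: evaluates an XPath expression against an XML string
    // and returns the string value of the first match ("" if nothing matches).
    static String evaluate(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Plain XPath: the attribute step selects the IPVersion attribute.
        System.out.println(evaluate(RECORD, "/Type/Source/IPAddress/@IPVersion"));

        // Assumed shape of a namespace-ignoring rewrite that treats every step,
        // including "@IPVersion", as an element name. No element has that
        // local name, so the result is an empty string.
        String rewritten = "/*[local-name()='Type']/*[local-name()='Source']"
            + "/*[local-name()='IPAddress']/*[local-name()='@IPVersion']";
        System.out.println("[" + evaluate(RECORD, rewritten) + "]");
    }
}
```

If the plain expression returns the attribute value here while the Pig job returns an empty field, that is consistent with the two JIRA issues above rather than with a mistake in the expression.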