(Link to the earlier question: Merging two different XML log files (trace and messages) using date and timestamp?)
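I need to merge two XML log files (up to 700 MB). One log file contains a trace with position updates. The other log file contains the received messages. Several messages may be received with no position update in between, and vice versa.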
Both logs carry timestamps that include milliseconds (123 in these examples):
- The trace log uses <date> (e.g. 14.7.2012 11:08:07.123)
- The message log uses a Unix timestamp in <timeStamp> (e.g. 1342264087123)
The message log contains other <timeStamp> elements as well, but only the one at the path messageList/Message/originator/originatorPosition/timeStamp is relevant.
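To illustrate which element is meant, here is a minimal Python sketch (not part of the original question) that reads only the originatorPosition timestamp and ignores the other <timeStamp> elements. The function name originator_timestamps and the shortened sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the message log (illustrative only).
SAMPLE = """
<messageList>
  <Message>
    <messageId>1234</messageId>
    <originator>
      <originatorPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264087061</timeStamp>
      </originatorPosition>
      <senderPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264087234</timeStamp>
      </senderPosition>
    </originator>
  </Message>
</messageList>
"""

def originator_timestamps(xml_text):
    """Return the originatorPosition timestamps (Unix ms), one per Message."""
    root = ET.fromstring(xml_text)
    stamps = []
    for message in root.findall("Message"):
        # The path is relative to each <Message>; senderPosition is never visited.
        text = message.findtext("originator/originatorPosition/timeStamp")
        if text is not None:
            stamps.append(int(text))
    return stamps

print(originator_timestamps(SAMPLE))
```

The senderPosition value 1342264087234 does not appear in the result because the path selects only the originatorPosition branch.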
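The structures below are slightly simplified: additional content such as "acceleration" has been omitted. That additional content simply needs to be copied along with the rest of the message/item.
The position trace is structured as follows: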
<itemList>
  <item>
    <date>14.7.2012 12:13:05.123</date>
    <FilteredPosition>
      <Latitude>51.12235</Latitude>
      <Longitude>9.347214</Longitude>
    </FilteredPosition>
  </item>
  <item>
    <date>14.7.2012 12:13:07.456</date>
    <FilteredPosition>
      <Latitude>51.12235</Latitude>
      <Longitude>9.347214</Longitude>
    </FilteredPosition>
  </item>
</itemList>
The message log is structured as follows:
<messageList>
  <Message>
    <messageId>1234</messageId>
    <originator>
      <originatorPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264087061</timeStamp>
      </originatorPosition>
      <senderPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264087234</timeStamp>
      </senderPosition>
      <medium></medium>
    </originator>
    <MessagePayload>
      <generationTime>
        <timeStamp>1342264087</timeStamp>
        <milliSec>42</milliSec>
      </generationTime>
    </MessagePayload>
  </Message>
  <Message>
    <messageId>1234</messageId>
    <originator>
      <originatorPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264088064</timeStamp>
      </originatorPosition>
      <senderPosition>
        <nodeId>2345</nodeId>
        <timeStamp>1342264088254</timeStamp>
      </senderPosition>
      <medium></medium>
    </originator>
    <MessagePayload>
      <generationTime>
        <timeStamp>1342264088</timeStamp>
        <milliSec>42</milliSec>
      </generationTime>
    </MessagePayload>
  </Message>
</messageList>
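During the merge, the timestamps have to be read (which also means converting/comparing "date" and "timestamp", including the milliseconds, with the date in the format "14.7.2012 11:08:07.123"), and all positions and messages have to be added in the correct order.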
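As a rough illustration of that conversion, the following Python sketch maps between the two timestamp formats so that they can be compared as plain integers. It assumes the <date> values are in UTC, which the logs themselves do not state, and the function names date_to_millis and millis_to_date are made up for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def date_to_millis(text):
    """Parse 'D.M.YYYY HH:MM:SS.mmm' (assumed to be UTC) into Unix milliseconds."""
    parsed = datetime.strptime(text, "%d.%m.%Y %H:%M:%S.%f")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)

def millis_to_date(millis):
    """Format Unix milliseconds in the trace log's style (day and month unpadded)."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return (f"{moment.day}.{moment.month}.{moment.year} "
            f"{moment:%H:%M:%S}.{moment.microsecond // 1000:03d}")

print(date_to_millis("14.7.2012 11:08:07.123"))  # 1342264087123
print(millis_to_date(1342264087123))             # 14.7.2012 11:08:07.123
```

Once both logs are expressed in Unix milliseconds, ordering positions and messages reduces to an integer comparison.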
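The position data can be added as is. The messages, however, should be placed inside an <item> tag, a <date> tag should be added (based on the message's Unix time with milliseconds), and the <Message> tag should be replaced by an <m:Message type="received"> tag. These items go inside the root <itemList>, just like the position trace.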
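The wrapping step on its own could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the namespace URI bound to the m: prefix is a placeholder, the timestamp is interpreted as UTC, and the helper name wrap_message is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

M_NS = "http://www.example.com/"  # placeholder URI for the m: prefix (assumption)
ET.register_namespace("m", M_NS)

def wrap_message(message):
    """Build <item><date/><m:Message type="received">...</m:Message></item>."""
    millis = int(message.findtext("originator/originatorPosition/timeStamp"))
    moment = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)
    item = ET.Element("item")
    date = ET.SubElement(item, "date")
    date.text = (f"{moment.day}.{moment.month}.{moment.year} "
                 f"{moment:%H:%M:%S}.{moment.microsecond // 1000:03d}")
    wrapped = ET.SubElement(item, f"{{{M_NS}}}Message", {"type": "received"})
    wrapped.extend(list(message))  # the original children are carried over unchanged
    return item

# Shortened message, for illustration only.
message = ET.fromstring(
    "<Message><messageId>1234</messageId><originator><originatorPosition>"
    "<nodeId>2345</nodeId><timeStamp>1342264087061</timeStamp>"
    "</originatorPosition></originator></Message>"
)
item = wrap_message(message)
print(ET.tostring(item, encoding="unicode"))
```

Everything below <Message> is carried over untouched, so additional content that was omitted from the simplified structure would come along as well.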
The result could look like this:
<itemList>
  <item>
    <date>14.7.2012 12:13:05.123</date>
    <FilteredPosition>
      <Latitude>51.12235</Latitude>
      <Longitude>9.347214</Longitude>
    </FilteredPosition>
  </item>
  <item>
    <date>14.7.2012 12:13:07.061</date>
    <m:Message type="received">
      <messageId>1234</messageId>
      <originator>
        <originatorPosition>
          <nodeId>2345</nodeId>
          <timeStamp>1342264087061</timeStamp>
        </originatorPosition>
        <senderPosition>
          <nodeId>2345</nodeId>
          <timeStamp>1342264087234</timeStamp>
        </senderPosition>
        <medium></medium>
      </originator>
      <MessagePayload>
        <generationTime>
          <timeStamp>1342264087</timeStamp>
          <milliSec>63</milliSec>
        </generationTime>
      </MessagePayload>
    </m:Message>
  </item>
  <item>
    <date>14.7.2012 12:13:07.456</date>
    <FilteredPosition>
      <Latitude>51.12235</Latitude>
      <Longitude>9.347214</Longitude>
    </FilteredPosition>
  </item>
  <item>
    <date>14.7.2012 12:13:08.064</date>
    <m:Message type="received">
      <messageId>1234</messageId>
      <originator>
        <originatorPosition>
          <nodeId>2345</nodeId>
          <timeStamp>1342264088064</timeStamp>
        </originatorPosition>
        <senderPosition>
          <nodeId>2345</nodeId>
          <timeStamp>1342264088254</timeStamp>
        </senderPosition>
        <medium></medium>
      </originator>
      <MessagePayload>
        <generationTime>
          <timeStamp>1342264088</timeStamp>
          <milliSec>70</milliSec>
        </generationTime>
      </MessagePayload>
    </m:Message>
  </item>
</itemList>
The position log file also contains some <item> elements that have no timestamp (and no "FilteredPosition"). These items can be ignored and do not need to be copied.
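For files of this size, one possible alternative to sorting everything in memory is a lazy two-way merge. The Python sketch below assumes that each log is already in chronological order (the question does not state this) and that every entry has been reduced to a (milliseconds, payload) pair. The name merge_by_time and the sample values are purely illustrative.

```python
import heapq

def merge_by_time(positions, messages):
    """Lazily merge two time-sorted iterables of (millis, payload) pairs.

    Position pairs whose timestamp is None (items without a <date>) are dropped.
    """
    dated_positions = (pair for pair in positions if pair[0] is not None)
    return heapq.merge(dated_positions, messages, key=lambda pair: pair[0])

# Made-up timestamps, chosen only to show the interleaving.
positions = [(1000, "position A"), (None, "item without date"), (3000, "position B")]
messages = [(2000, "message 1"), (2500, "message 2"), (4000, "message 3")]

merged = [payload for _, payload in merge_by_time(positions, messages)]
print(merged)
```

Because heapq.merge consumes its inputs lazily, only one pending entry per log has to be held at a time, provided the inputs are themselves produced by a streaming parser.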
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:m="http://www.example.com/"
                exclude-result-prefixes="xs"
                version="2.0">

  <xsl:output indent="yes" method="xml"/>

  <!-- The two source-documents. -->
  <xsl:variable name="doc1" select="doc('log1.xml')"/>
  <xsl:variable name="doc2" select="doc('log2.xml')"/>

  <!-- Timezone adjustment -->
  <xsl:variable name="timezoneAdjustment" select="1"/>

  <!-- Root template to start the transformation. -->
  <xsl:template match="/">
    <!-- Transform and collect all the elements -->
    <xsl:variable name="data" as="node()*">
      <xsl:apply-templates select="$doc1/itemList/item"/>
      <xsl:apply-templates select="$doc2/messageList/Message"/>
    </xsl:variable>
    <!-- Sort by the timestamp, and discard the wrapper. -->
    <itemList>
      <xsl:for-each select="$data">
        <xsl:sort select="@timestamp" data-type="number"/>
        <xsl:copy-of select="item"/>
      </xsl:for-each>
    </itemList>
  </xsl:template>

  <!-- Template to transform <item> elements in the first format.
       It just parses the date, and adds a wrapper with the timestamp. -->
  <xsl:template match="item[date]">
    <xsl:variable name="dateTimeString" select="date" as="xs:string"/>
    <xsl:variable name="datePart" select="substring-before($dateTimeString, ' ')"/>
    <xsl:variable name="day" select="xs:integer(substring-before($datePart, '.'))"/>
    <xsl:variable name="month" select="xs:integer(substring-before(substring-after($datePart, '.'), '.'))"/>
    <xsl:variable name="year" select="xs:integer(substring-after(substring-after($datePart, '.'), '.'))"/>
    <xsl:variable name="timePart" select="substring-after($dateTimeString, ' ')"/>
    <!-- Rebuild the date as an ISO 8601 string so that xs:dateTime can parse it. -->
    <xsl:variable name="reformatted"
                  select="concat(format-number($year, '0000'), '-',
                                 format-number($month, '00'), '-',
                                 format-number($day, '00'), 'T', $timePart)"/>
    <!-- Milliseconds since the Unix epoch, shifted by the timezone adjustment. -->
    <xsl:variable name="timestamp"
                  select="(xs:dateTime($reformatted)
                           - xs:dateTime('1970-01-01T00:00:00')
                           - $timezoneAdjustment * xs:dayTimeDuration('PT1H'))
                          div xs:dayTimeDuration('PT0.001S')"/>
    <wrapper timestamp="{$timestamp}">
      <xsl:copy-of select="self::*"/>
    </wrapper>
  </xsl:template>

  <!-- Template to transform <Message> elements in the second log format.
       It generates an item with the date, and wraps it with the timestamp. -->
  <xsl:template match="Message[originator/originatorPosition/timeStamp]">
    <xsl:variable name="timestamp" select="originator/originatorPosition/timeStamp" as="xs:integer"/>
    <xsl:variable name="date"
                  select="xs:dateTime('1970-01-01T00:00:00')
                          + $timezoneAdjustment * xs:dayTimeDuration('PT1H')
                          + $timestamp * xs:dayTimeDuration('PT0.001S')"/>
    <wrapper timestamp="{$timestamp}">
      <item>
        <date>
          <xsl:value-of select="format-dateTime($date, '[D01].[M01].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
        </date>
        <m:Message type="received">
          <xsl:copy-of select="*"/>
        </m:Message>
      </item>
    </wrapper>
  </xsl:template>

</xsl:stylesheet>
Edit: I have added a variable for the timezone adjustment of the messages.
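The arithmetic behind the timezoneAdjustment variable can be checked outside XSLT. The Python sketch below mirrors the two expressions from the stylesheet, with the adjustment given in whole hours. The function names are made up, and a fixed offset like this does not account for daylight saving time changes.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive on purpose: the offset is applied by hand
HOUR = timedelta(hours=1)
MILLISECOND = timedelta(milliseconds=1)

def local_date_to_millis(text, adjustment_hours):
    """Mirror of the <item> template: (date - epoch - adjustment) div 1 ms."""
    local = datetime.strptime(text, "%d.%m.%Y %H:%M:%S.%f")
    return (local - EPOCH - adjustment_hours * HOUR) // MILLISECOND

def millis_to_local_date(millis, adjustment_hours):
    """Mirror of the <Message> template: epoch + adjustment + millis * 1 ms."""
    return EPOCH + adjustment_hours * HOUR + millis * MILLISECOND

# With an adjustment of 1 hour, 12:08:07.123 local time is 11:08:07.123 UTC.
print(local_date_to_millis("14.7.2012 12:08:07.123", 1))  # 1342264087123
print(millis_to_local_date(1342264087061, 1))             # 2012-07-14 12:08:07.061000
```

The two functions are inverses of each other for a given adjustment, which is what keeps positions and messages on the same time axis when they are sorted.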