Examples of processing XML and HTML with the xmllint command


Example

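curl "http://www.111cn.net/ip/?q=8.8.8.8" 2>/dev/null | xmllint --html --xpath "//ul[@id='csstb']" - 2>/dev/null | sed -e 's/<[^>]*>//g'
The example above looks up the location of an IP address on 123cha, extracts the result block (ul#csstb) from the returned page, and keeps only its text content. Running the command produces output like the following:


[Your query]: 8.8.8.8
Primary data from this site:
United States
Secondary data from this site: Google Public DNS (provided by: hypo)
United States, Google's free Google Public DNS (provided by: zwstar)
Reference data 1: United States
Reference data 2: United States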
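The last stage of the pipeline, the sed expression, is just a regular-expression substitution that deletes every tag. A minimal Python sketch of the same idea follows. The sample markup is made up for illustration and is not real output from the site. One difference to keep in mind: sed works one line at a time, so a tag split across two lines survives it, whereas this pattern also matches across newlines.

```python
import re

def strip_tags(markup: str) -> str:
    """Delete every <...> run, like sed -e 's/<[^>]*>//g'."""
    return re.sub(r"<[^>]*>", "", markup)

# Hypothetical fragment shaped like the ul#csstb block (not real site output).
sample = '<ul id="csstb"><li>[query]: 8.8.8.8</li><li>United States</li></ul>'
print(strip_tags(sample))
```

Because the tags are removed without inserting separators, the text of adjacent elements runs together, which is also why some lines of the sed output above look fused.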
The sections below walk through the other main options, each with an example.

1. --format

This option reformats XML so that it is easier to read.
Suppose person.xml contains the following:


<person><name>ball</name><age>30</age><sex>male</sex></person>
Running the command below prints the same document as indented, more readable XML:


$ xmllint --format person.xml
<?xml version="1.0"?>
<person>
  <name>ball</name>
  <age>30</age>
  <sex>male</sex>
</person>
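
When xmllint is not at hand, the same kind of re-indentation can be approximated from a script. Below is a minimal Python sketch using the standard library's xml.dom.minidom. It is an illustration only, not what xmllint does internally, and the XML declaration it emits is formatted slightly differently.

```python
from xml.dom import minidom

compact = "<person><name>ball</name><age>30</age><sex>male</sex></person>"

# toprettyxml() re-indents the tree; indent="  " mirrors xmllint's two spaces.
pretty = minidom.parseString(compact).toprettyxml(indent="  ")
print(pretty)
```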

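2. --noblanks

This is the opposite of --format: sometimes, to reduce the amount of data transferred, we want to remove the ignorable whitespace from the XML. The --noblanks option does this.
Suppose person.xml contains the following:


<?xml version="1.0"?>
<person>
  <name>ball</name>
  <age>30</age>
  <sex>male</sex>
</person>
Running xmllint with this option produces:


$ xmllint --noblanks person.xml
<?xml version="1.0"?>
<person><name>ball</name><age>30</age><sex>male</sex></person>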
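
Removing blanks amounts to dropping text nodes that contain only whitespace. Below is a minimal Python sketch of that idea with xml.etree.ElementTree. It is a rough approximation of --noblanks under that assumption, not a reimplementation of it.

```python
import xml.etree.ElementTree as ET

pretty = """<person>
  <name>ball</name>
  <age>30</age>
  <sex>male</sex>
</person>"""

root = ET.fromstring(pretty)
for elem in root.iter():
    # Drop text and tail strings that consist only of whitespace.
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    if elem.tail is not None and not elem.tail.strip():
        elem.tail = None

compact = ET.tostring(root, encoding="unicode")
print(compact)
```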

3. --schema

This option validates an XML file against a schema (XML Schema is an XML-based alternative to DTD).
Suppose the XML file (person.xml) and the schema file (person.xsd) contain the following:

person.xml


<?xml version="1.0"?>
<person>
  <name>ball</name>
  <age>30</age>
  <sex>male</sex>
</person>
person.xsd


<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
  <xs:element name="age" type="xs:integer"/>
  <xs:element name="sex">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="male"/>
        <xs:enumeration value="female"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="person">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="age"/>
        <xs:element ref="sex"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
Running the following command gives:


$ xmllint --schema person.xsd person.xml
<?xml version="1.0"?>
<person>
  <name>ball</name>
  <age>30</age>
  <sex>male</sex>
</person>
person.xml validates
Note: by default xmllint echoes the validated document after validation. Add the --noout option to suppress this, so that only the final verdict is printed.


$ xmllint --noout --schema person.xsd person.xml
person.xml validates
Now change person.xml so that both the age and sex values violate the XSD (here age is set to "not age" and sex to "test"), then validate again:


$ xmllint --noout --schema person.xsd person.xml
person.xml:4: element age: Schemas validity error : Element 'age': 'not age' is not a valid value of the atomic type 'xs:integer'.
person.xml:5: element sex: Schemas validity error : Element 'sex': [facet 'enumeration'] The value 'test' is not an element of the set {'male','female'}.
person.xml:5: element sex: Schemas validity error : Element 'sex': 'test' is not a valid value of the local atomic type.
person.xml fails to validate
As you can see, xmllint reports each error.
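
To make the two violated constraints concrete, here is a hand-written Python check of the same rules (age must be an integer, sex must be male or female). This is only an illustration of what the schema asserts. It is not XSD validation, which Python's standard library does not provide, and the function name check_person is made up for this sketch.

```python
import xml.etree.ElementTree as ET

def check_person(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    age = root.findtext("age", default="").strip()
    try:
        int(age)  # close to xs:integer for ordinary values
    except ValueError:
        problems.append(f"age: {age!r} is not an integer")
    sex = root.findtext("sex", default="")
    if sex not in {"male", "female"}:  # the two xs:enumeration values
        problems.append(f"sex: {sex!r} is not one of male, female")
    return problems

good = "<person><name>ball</name><age>30</age><sex>male</sex></person>"
bad = "<person><name>ball</name><age>not age</age><sex>test</sex></person>"
print(check_person(good))
print(check_person(bad))
```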

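4. About the output of --schema

Before discussing the output, consider this scenario: you want to run xmllint from PHP and capture its result. Your code (valid.php) would typically look like this:


<?php
$command = "xmllint --noout --schema person.xsd person.xml";
exec($command, $output, $retval);
// the exit status is non-zero when validation fails
if ($retval != 0) {
    var_dump($output);
} else {
    echo "yeah!";
}
Keep the errors in person.xml from the previous section and run this code. You will find that $output does not contain the errors; it is array(0) {}. Amazing!
Why is that?

Because when xmllint --schema finds validation errors, it writes the messages to standard error (stderr), not to standard output (stdout).
The $output argument of exec() only receives what was written to stdout.
So, to capture the error messages, redirect stderr to stdout by changing the command accordingly:


$command = "xmllint --noout --schema person.xsd person.xml 2>&1";
Run valid.php again and the error messages are captured.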
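
The same pitfall exists in any language that captures only stdout. The Python sketch below reproduces it without needing xmllint: the child process is a stand-in that writes one message to stderr and exits with a non-zero status (the message text and the status 1 are chosen for illustration). Passing stderr=subprocess.STDOUT plays the role of the shell's 2>&1.

```python
import subprocess
import sys

# Stand-in for a failing xmllint run: writes to stderr, exits non-zero.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('person.xml fails to validate\\n'); sys.exit(1)"]

# Capturing stdout alone (what PHP's exec() collects) yields nothing.
stdout_only = subprocess.run(child, stdout=subprocess.PIPE,
                             stderr=subprocess.DEVNULL, text=True)

# Merging stderr into stdout is the equivalent of appending 2>&1.
merged = subprocess.run(child, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True)

print(repr(stdout_only.stdout), stdout_only.returncode)
print(repr(merged.stdout), merged.returncode)
```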

First, create an XML document named po.xml with the following content:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>

Then write a schema for po.xml, named po.xsd, with the following content:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Validate po.xml with xmllint:

$ xmllint --schema po.xsd po.xml

If no error messages are printed, the document passed validation.
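
The SKU type is meant to accept part numbers made of three digits, a hyphen, and two capital letters, which as a regular expression is \d{3}-[A-Z]{2}. The backslash matters: d{3} on its own means three literal d characters. The Python sketch below shows the difference. The part numbers 872-AA and 926-AA come from po.xml; the other test strings are made up.

```python
import re

# XSD patterns are implicitly anchored, so fullmatch() is the closest analogue.
SKU = re.compile(r"\d{3}-[A-Z]{2}")
BROKEN = re.compile(r"d{3}-[A-Z]{2}")  # backslash lost: matches "ddd-AA" only

for part_num in ["872-AA", "926-AA", "1234-AA", "ddd-AA"]:
    print(part_num, bool(SKU.fullmatch(part_num)), bool(BROKEN.fullmatch(part_num)))
```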

The xmllint Shell

Peter Lavin

2013-06-15

XML files are human-readable text files, so it is easy to search them from the command line using grep or from within a text editor. But if you want to do something a little more sophisticated (count the number of elements, for example), you'll need to take a different approach. You could write a transformation style sheet to extract such information, but this would be overkill. It is much easier to use xmllint from the command line to find out this kind of information.

This command is available on Mac OS X and Linux. It is installed by default on Mac OS X; on Linux, if it isn't already installed, you can quickly add it by installing the libxml2 package.

1. xmllint Options

One of the primary uses for the xmllint command is to check that an XML file is well formed and that it conforms to a specific DTD or schema. Validation against a DTD is done with the --valid option, and validation against an XML Schema with the --schema option. If your XML file contains other XIncluded files, you can also use xmllint in the following way to resolve the included files and write the result to a file:

shell> xmllint --xinclude manual.xml --output tmp.xml

The output file tmp.xml will include the contents of any xi:include elements. Also, the --format option is very useful for quickly formatting files from the command line. However, the most interesting option is the --shell option.

For a complete list of all the options available, view the xmllint man page.

2. The xmllint Shell

Use xmllint with the --shell option in the following way:

shell> xmllint --shell xmlfile_name

You can use other options with the --shell option. For example, if you wish to resolve included files, use the --xinclude option as well.

You can display the list of the commands available from the shell by typing help. You should see output similar to the following:

  base         display XML base of the node
  setbase URI  change the XML base of the node
  bye          leave shell
  cat [node]   display node or current node
  cd [path]    change directory to path or to root
  dir [path]   dumps informations about the node 
               (namespace,attributes,content)
  du [path]    show the structure of the subtree under 
               path or the current node
  exit         leave shell
  help         display this help
  free         display memory usage
  load [name]  load a new document with name
  ls [path]    list contents of path or the current directory
  set xml_fragment replace the current node content with the 
               fragment parsed in context
  xpath expr   evaluate the XPath expression in that context 
               and print the result
  setns nsreg  register a namespace to a prefix in the 
               XPath evaluation context
               format for nsreg is: prefix=[nsuri] 
               (i.e. prefix= unsets a prefix)
  setrootns    register all namespace found on the 
               root element the default namespace 
               if any uses 'defaultns' prefix
  pwd          display current working directory
  quit         leave shell
  save [name]  save this document to name or the original name
  write [name] write the current node to the filename
  validate     check the document for errors
  relaxng rng  validate the document against the Relax-NG schemas
  grep string  search for a string in the subtree
 

There are a number of relatively trivial but necessary commands such as help and exit. All the commands are useful, but this article deals primarily with the following commands:

cat node - output all nodes below the current node

cd path - change to another node; you can only use this command with unique nodes

dir - dump information about the current node

xpath expression - evaluate and print the XPath expression

setns - register a namespace

write filename - write the current node to file

If you want to write your complete shell session to a file, start the Unix script command first and then run the shell. This can be particularly useful on Mac OS X, where the write command does not work.

3. Using Shell Commands

When you first open the xmllint shell, the prompt, / >, indicates that you are at the root node. You will likely want to navigate to specific nodes and view the file contents below that node. You can do this with the cd and cat commands.

/ > cd options/option[@name='address_metrics_lifetime']
option >
  
 

On success the prompt changes to the name of the current node. To view the current node, use the cat command, which displays its output on the screen. To create a text file of the output of cat, use write file_name.xml.

You can only use cd to navigate to unique nodes. Attempt to navigate to a non-unique node and you will see output such as the following:

/ > cd //option
//option is a 353 Node Set
 

If there is no unique identifier for the node that you wish to navigate to, you can use a subscript in the following way:

/ > cd //option[1]
option >
 

To output information about the current node, use the dir command:

option > dir
ELEMENT option
  ATTRIBUTE name
    TEXT
      content=address_metrics_cleanse_interval
  ATTRIBUTE type
    TEXT
      content=sending
option >
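
For comparison, the same navigation can be done from a script. Below is a minimal Python sketch with xml.etree.ElementTree that selects one option by its name attribute, prints its attributes, and counts all option elements (the equivalent of the XPath expression count(//option)). The three-element document is a made-up miniature of options.xml; only the attribute names and the two option names shown above come from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of options.xml; the real file has 353 <option>s.
doc = """<options>
  <option name="address_metrics_lifetime" type="sending"/>
  <option name="address_metrics_cleanse_interval" type="sending"/>
  <option name="some_other_option" type="receiving"/>
</options>"""

root = ET.fromstring(doc)  # root is the <options> element

# Roughly: cd options/option[@name='address_metrics_lifetime'], then dir
node = root.find("option[@name='address_metrics_lifetime']")
print(node.tag, node.attrib)

# Roughly: xpath count(//option)
option_count = len(root.findall(".//option"))
print(option_count)
```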

4. Working with Multiple Files

You can open the xmllint shell specifying multiple files, but the behaviour is not intuitive. In the following example, the shell is opened with two different files that have the same structure. The options.xml file has a root element <options> with 353 <option>s, while the smpp_options.xml file has a root element <options> containing only 57 <option>s.

shell> xmllint --shell options.xml smpp_options.xml
/ > base
options.xml
/ > xpath count(//option)
Object is a number : 353
/ > bye
/ > base
smpp_options.xml
/ > xpath count(//option)
Object is a number : 57
/ > setbase options.xml
 

If you invoke help from the shell, the bye command is tersely described as leave shell. As this sequence of commands shows, bye also exits the first file passed to the --shell option.

Once you have exited the first shell, you cannot return to it by using setbase, even though the command seems to have performed its function, as the output of base erroneously indicates. For this reason it is perhaps less confusing to open the shell specifying only one file and then use the load command to switch to a different file:

/ > load smpp_options.xml
/ > xpath count(//option)
Object is a number : 57
 

The second count indicates that the load command executed successfully.

5. Using Namespaces

To this point none of the examples use namespaces. To query an XML file that uses namespaces, you must first register a prefix with the setns command. Use it in the following way:

shell> xmllint --xinclude --shell manual.xml
/ > setns x=http://docbook.org/ns/docbook
/ > dir
DOCUMENT
version=1.0
URL=manual.xml
standalone=true
namespace xml href=http://www.w3.org/XML/1998/namespace
/ > cd x:book/x:chapter[@xml:id='apis']
chapter > dir
ELEMENT chapter
  ATTRIBUTE id
    TEXT
      content=apis
 

The dir command shown above confirms that you have navigated to the specified node. From that node you can execute xpath commands using absolute or relative paths.

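chapter > xpath count(x:section)
Object is a number : 15
chapter > xpath count(x:section/x:refentry)
Object is a number : 135
chapter > xpath count(//x:section/x:refentry)
Object is a number : 140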
 

There are 15 sections in the apis chapter and these 15 sections have 135 refentries. Note the difference in output between the paths //x:section/x:refentry and x:section/x:refentry. The difference in output shows that only the latter is relative to the current node.
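
The relative-versus-absolute distinction can be sketched in Python as well. In xml.etree.ElementTree the prefix map passed to findall plays the role of setns. The document below is a made-up miniature of manual.xml, so its counts (3 and 4) are not those of the real file. ElementTree has no absolute // from an arbitrary node, so the whole-document query starts at the root element instead.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://docbook.org/ns/docbook"}          # same role as setns
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

# Hypothetical miniature of manual.xml.
doc = """<book xmlns="http://docbook.org/ns/docbook">
  <chapter xml:id="intro"><section><refentry/></section></chapter>
  <chapter xml:id="apis">
    <section><refentry/><refentry/></section>
    <section><refentry/></section>
  </chapter>
</book>"""

book = ET.fromstring(doc)
apis = next(ch for ch in book.findall("x:chapter", NS) if ch.get(XML_ID) == "apis")

relative = apis.findall("x:section/x:refentry", NS)      # under apis only
whole_doc = book.findall(".//x:section/x:refentry", NS)  # anywhere in the book
print(len(relative), len(whole_doc))
```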

When your XML file uses IDs, an easier way to navigate is to use the id function:

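chapter > xpath id('apis')
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT chapter
    ATTRIBUTE id
      TEXT
        content=apis
chapter > cd id('apis')
chapter > xpath count(x:section/x:refentry)
Object is a number : 135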
 
$ dt=$(xmllint --shell file <<< "cat //IntrBkSttlmDt/text()" | grep -v "^/ >")
$ echo $dt
1967-08-13

Here file contains:

<root>
  <FIToFICstmrDrctDbt>
    <GrpHdr>
      <MsgId>A</MsgId>
      <CreDtTm>2001-12-17T09:30:47</CreDtTm>
      <NbOfTxs>0</NbOfTxs>
      <TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
      <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
      <SttlmInf>
        <SttlmMtd>CLRG</SttlmMtd>
        <ClrSys><Prtry>xx</Prtry></ClrSys>
      </SttlmInf>
      <InstgAgt><FinInstnId><BIC>AAAAAAAAAAA</BIC></FinInstnId></InstgAgt>
    </GrpHdr>
  </FIToFICstmrDrctDbt>
</root>
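
The same value can be pulled out without the shell and the grep filter. Below is a minimal Python sketch using xml.etree.ElementTree. The document is a trimmed copy of the sample that keeps only the path to the date element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document; only the path to the date is kept.
doc = ("<root><FIToFICstmrDrctDbt><GrpHdr><MsgId>A</MsgId>"
       "<IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>"
       "</GrpHdr></FIToFICstmrDrctDbt></root>")

root = ET.fromstring(doc)
# ".//IntrBkSttlmDt" finds the element at any depth, like //IntrBkSttlmDt.
dt = root.findtext(".//IntrBkSttlmDt")
print(dt)
```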
