A few years ago I found some Perl online that neatly formats valid XML (tabs and newlines) when the XML is all on a single line. The code is below.
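It uses XML::Twig to do this. It creates the XML::Twig object without keep_encoding (my $twig = XML::Twig->new();), but if I give it a UTF-8 encoded XML file containing a non-ASCII character, it produces a file that, according to the isutf8 command on Ubuntu, is not valid UTF-8. Opening the file in xxd, I can see that the character has gone from two bytes to one.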
If I use my $twig = XML::Twig->new(keep_encoding => 1); instead, the same input produces valid UTF-8 and the two bytes are preserved.
According to the perldoc entry for keep_encoding:
This is a (slightly?) evil option: if the XML document is not UTF-8
encoded and you want to keep it that way, then setting keep_encoding
will use the Expat original_string method for character, thus keeping
the original encoding, as well as the original entities in the
strings.
Why is a non-UTF-8 document produced without that option, and why does setting it cause the UTF-8-ness to be preserved?
By the way, the non-ASCII character is a no-break space (c2 a0).
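    use strict;
    use warnings;
    use XML::Twig;

    # Slurp the whole document from STDIN or the files named on the command line.
    my $sXML = join "", (<>);

    # Pretty-print styles accepted by XML::Twig; index 3 selects 'indented'.
    my $params = [qw(none nsgmls nice indented record record_c)];
    my $sPrettyFormat = $params->[3] || 'none';

    my $twig = XML::Twig->new();
    $twig->set_indent(" " x 4);
    $twig->parse($sXML);
    $twig->set_pretty_print($sPrettyFormat);

    # Serialise the formatted document and write it to STDOUT.
    $sXML = $twig->sprint;
    print $sXML;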
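
The two-bytes-to-one change seen in xxd can be reproduced without XML::Twig at all, which may help narrow things down. The sketch below uses only the core Encode module and in-memory filehandles. It assumes, without confirmation from the XML::Twig documentation quoted above, that parsing without keep_encoding yields decoded Perl character strings and that the script's STDOUT has no encoding layer. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# c2 a0 is the UTF-8 encoding of U+00A0 (no-break space).
my $input_bytes = "\xC2\xA0";

# Decode the bytes into a one-character Perl string. Assumption: this is
# the kind of string the parser hands back when keep_encoding is not set.
my $chars = decode('UTF-8', $input_bytes);

# Print to a handle with no encoding layer: a character below U+0100
# is written as a single Latin-1 byte.
my $raw = '';
open my $raw_fh, '>', \$raw or die "cannot open in-memory handle: $!";
print {$raw_fh} $chars;
close $raw_fh;

# Print the same string to a handle that has a UTF-8 encoding layer.
my $encoded = '';
open my $enc_fh, '>:encoding(UTF-8)', \$encoded
    or die "cannot open in-memory handle: $!";
print {$enc_fh} $chars;
close $enc_fh;

printf "no layer:    %s\n", unpack('H*', $raw);      # a0
printf "UTF-8 layer: %s\n", unpack('H*', $encoded);  # c2a0
```

If that assumption holds for the script above, calling binmode(STDOUT, ':encoding(UTF-8)') before the final print would be the usual way to get c2 a0 back in the output. I have not tested that against XML::Twig itself.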