I have an OFX file downloaded from Citibank. The file has a DTD defined at http://www.ofx.net/DownloadPage/Files/ofx102spec.zip (file OFXBANK.DTD), and the OFX file appears to be valid SGML.
I am trying to use DomDocument in PHP 5.4.13, but I get several warnings and the file is not parsed. My code is:
    $file = "source/ACCT_013.OFX";
    $dtd = "source/ofx102spec/OFXBANK.DTD";
    $doc = new DomDocument();
    $doc->loadHTMLFile($file);
    $doc->schemaValidate($dtd);
    $dom->validateOnParse = true;
The OFX file starts with:
    OFXHEADER:100
    DATA:OFXSGML
    VERSION:102
    SECURITY:NONE
    ENCODING:USASCII
    CHARSET:1252
    COMPRESSION:NONE
    OLDFILEUID:NONE
    NEWFILEUID:NONE

    <OFX>
    <SIGNONMSGSRSV1>
    <SONRS>
    <STATUS>
    <CODE>0
    <SEVERITY>INFO
    </STATUS>
    <DTSERVER>20130331073401
    <LANGUAGE>SPA
    </SONRS>
    </SIGNONMSGSRSV1>
    <BANKMSGSRSV1>
    <STMTTRNRS>
    <TRNUID>0
    <STATUS>
    <CODE>0
    <SEVERITY>INFO
    </STATUS>
    <STMTRS>
    <CURDEF>COP
    <BANKACCTFROM>
    ...
I am open to installing and using any program on the server (CentOS) that can be called from PHP.
PS: This class, http://www.phpclasses.org/package/5778-PHP-Parse-and-extract-financial-records-from-OFX-files.html, does not work for me.
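First of all, even though XML is a subset of SGML, a valid SGML file is not necessarily a well-formed XML file. XML is stricter and does not use all the features that SGML offers; OFX SGML, for example, leaves out the end tags of elements that hold a value.
Because DOMDocument is XML-based (and not SGML-based), it is not really compatible with this format.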
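To make the incompatibility concrete, here is a minimal sketch that feeds an OFX-style fragment with unclosed tags to DOMDocument's strict XML parser. The fragment is an illustrative stand-in modelled on the sample in the question, not the real file, and the exact error messages depend on the libxml version.

```php
<?php
// Illustrative OFX-style SGML fragment: <CODE> and <SEVERITY> are never closed.
$sgml = "<STATUS>\n<CODE>0\n<SEVERITY>INFO\n</STATUS>";

// Collect libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadXML($sgml);   // strict XML parsing, no recover mode
$errors = libxml_get_errors();

libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump($loaded);                // false: the fragment is not well-formed XML
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
```

The parser stops at `</STATUS>` because `<SEVERITY>` is still open at that point, which is the kind of warning the question describes.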
Besides that problem, see section 2.2, Open Financial Exchange Headers, in Ofexfin1.doc. It explains the file layout:
The contents of an Open Financial Exchange file consist of a simple set of headers followed by contents defined by that header
It goes on:
A blank line follows the last header. Then (for type OFXSGML), the SGML-readable data begins with the <OFX> tag.
So locate the first empty line and strip everything up to it. Then load the SGML part into DOMDocument by first converting the SGML to XML:
    $source = fopen('file.ofx', 'r');
    if (!$source) {
        throw new Exception('Unable to open OFX file.');
    }

    // skip (and collect) the headers of the OFX file; they end at the first blank line
    $headers = array();
    // map the OFX CHARSET header value to an iconv encoding name
    $charsets = array(
        1252 => 'WINDOWS-1252',
    );
    while (!feof($source)) {
        $line = trim(fgets($source));
        if ($line === '') {
            break;
        }
        list($header, $value) = explode(':', $line, 2);
        $headers[$header] = $value;
    }

    $buffer = '';

    // dead-cheap SGML to XML conversion
    // see as well http://www.hanselman.com/blog/PostprocessingAutoClosedSGMLTagsWithTheSGMLReader.aspx
    while (!feof($source)) {
        $line = trim(fgets($source));
        if ($line === '') {
            continue;
        }
        $line = iconv($charsets[$headers['CHARSET']], 'UTF-8', $line);
        // a line that does not end in ">" has the form "<TAG>value": append the missing "</TAG>"
        if (substr($line, -1, 1) !== '>') {
            list($tag) = explode('>', $line, 2);
            $line .= '</' . substr($tag, 1) . '>';
        }
        $buffer .= $line . "\n";
    }
    fclose($source);

    // use DOMDocument with non-standard recover mode
    $doc = new DOMDocument();
    $doc->recover = true;
    $doc->preserveWhiteSpace = false;
    $doc->formatOutput = true;
    $save = libxml_use_internal_errors(true);
    $doc->loadXML($buffer);
    libxml_use_internal_errors($save);

    echo $doc->saveXML();
This code example then outputs the following (re-formatted) XML, which also shows that DOMDocument loaded the data properly:
    <?xml version="1.0"?>
    <OFX>
      <SIGNONMSGSRSV1>
        <SONRS>
          <STATUS>
            <CODE>0</CODE>
            <SEVERITY>INFO</SEVERITY>
          </STATUS>
          <DTSERVER>20130331073401</DTSERVER>
          <LANGUAGE>SPA</LANGUAGE>
        </SONRS>
      </SIGNONMSGSRSV1>
      <BANKMSGSRSV1>
        <STMTTRNRS>
          <TRNUID>0</TRNUID>
          <STATUS>
            <CODE>0</CODE>
            <SEVERITY>INFO</SEVERITY>
          </STATUS>
          <STMTRS><CURDEF>COP</CURDEF><BANKACCTFROM> ...</BANKACCTFROM>
          </STMTRS>
        </STMTTRNRS>
      </BANKMSGSRSV1>
    </OFX>
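Once the data is in a DOMDocument, individual values can be read with DOMXPath. This is only a sketch: the XML string is a shortened stand-in built from the element names in the sample output, and the two XPath expressions are just examples of what one might query.

```php
<?php
// Shortened stand-in for the converted document, using element names from the sample.
$xml = '<OFX><SIGNONMSGSRSV1><SONRS>'
     . '<STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>'
     . '<DTSERVER>20130331073401</DTSERVER><LANGUAGE>SPA</LANGUAGE>'
     . '</SONRS></SIGNONMSGSRSV1></OFX>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// string(...) returns the text content of the first matching node.
$severity   = $xpath->evaluate('string(/OFX/SIGNONMSGSRSV1/SONRS/STATUS/SEVERITY)');
$serverDate = $xpath->evaluate('string(//DTSERVER)');

echo $severity, ' ', $serverDate, "\n"; // prints "INFO 20130331073401"
```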
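I do not know whether this can then be validated against the DTD. Maybe it works. Additionally, the conversion requires that each line holds a single element and that a tag's value sits on the same line as the tag; if the SGML is not written that way, this fragile conversion will break.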
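If the one-element-per-line assumption does not hold, a regular expression that works on the whole SGML body is one possible alternative. The sketch below is not part of the OFX specification or of any library: `closeLeafTags` is a hypothetical helper, and it assumes that every element carrying a text value is a leaf element and that values contain no literal `<`.

```php
<?php
/**
 * Hypothetical helper: append the missing end tag to every element that
 * carries a text value, regardless of how elements are split across lines.
 */
function closeLeafTags($sgml)
{
    // <TAG>, then text up to the next "<" (possessive, so it cannot backtrack),
    // but only when that text is not already followed by </TAG>.
    $pattern = '~<([^/<>\s]+)>([^<]++)(?!</\1>)~';

    return preg_replace_callback($pattern, function ($m) {
        $text = trim($m[2]);
        if ($text === '') {
            // Only whitespace follows: an aggregate such as <STATUS>, leave it alone.
            return $m[0];
        }
        return '<' . $m[1] . '>' . $text . '</' . $m[1] . '>';
    }, $sgml);
}

// Several elements on one line, which the line-based loop cannot handle.
$xml = closeLeafTags('<STATUS><CODE>0<SEVERITY>INFO</STATUS>');
echo $xml, "\n"; // <STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
```

Elements that already have an end tag are left untouched, so the result can be passed to `DOMDocument::loadXML()` in the same way as `$buffer` above. The headers still have to be stripped first.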