We have a scenario where we need to split large XML files (over 10 GB) into small chunks. Each chunk should contain 100 or 200 elements. Sample XML:
<Employees>
  <Employee id="1">
    <age>29</age>
    <name>Pankaj</name>
    <gender>Male</gender>
    <role>Java Developer</role>
  </Employee>
  <Employee id="3">
    <age>35</age>
    <name>Lisa</name>
    <gender>Female</gender>
    <role>CEO</role>
  </Employee>
  <Employee id="3">
    <age>40</age>
    <name>Tom</name>
    <gender>Male</gender>
    <role>Manager</role>
  </Employee>
  <Employee id="3">
    <age>25</age>
    <name>Meghna</name>
    <gender>Female</gender>
    <role>Manager</role>
  </Employee>
  <Employee id="3">
    <age>29</age>
    <name>Pankaj</name>
    <gender>Male</gender>
    <role>Java Developer</role>
  </Employee>
  <Employee id="3">
    <age>35</age>
    <name>Lisa</name>
    <gender>Female</gender>
    <role>CEO</role>
  </Employee>
  <Employee id="3">
    <age>40</age>
    <name>Tom</name>
    <gender>Male</gender>
    <role>Manager</role>
  </Employee>
</Employees>
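I have StAX parser code that splits the file into small chunks, but each output file contains only one complete Employee element. I need 100, 200, or more <Employee> elements in a single file. Here is my Java code: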
public static void main(String[] s) throws Exception {
    String prefix = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + "\n";
    String suffix = "\n</Employees>\n";
    int count = 0;
    try {
        int i = 0;
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("D:\\Desktop\\Test\\latestxml\\test.xml"));
        xsr.nextTag(); // Advance to the root <Employees> element
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        // Each iteration writes exactly one <Employee> element to its own file
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            File file = new File("C:\\Users\\test\\Desktop\\xml\\" + "out" + i + ".xml");
            FileOutputStream fos = new FileOutputStream(file, true);
            t.transform(new StAXSource(xsr), new StreamResult(fos));
            i++;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Solution
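Don't increment i on every iteration. Keep a separate counter and increment i only when that counter reaches 100 (or 200), so that all the elements in between are appended to the same file.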
For example:
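String outputPath = "/test/path/foo.txt";
while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
    // Append mode: every element in the current batch goes to the same file
    FileOutputStream file = new FileOutputStream(outputPath, true);
    ...
    ...
    count++;
    if (count == 100) {
        // Batch is full: switch to the next output file and reset the counter
        i++;
        outputPath = "/test/path/foo" + i + ".txt";
        count = 0;
    }
}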
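One thing to watch with the append approach is that several <Employee> fragments appended to one file leave it without a single root element. A minimal sketch of one way around that is below. It is not the code from the question: it swaps the Transformer and StAXSource for XMLEventReader and XMLEventWriter, copies events one at a time, and repeats the root start tag in every chunk. The class name XmlChunkSplitter, the method split, and the out<N>.xml file naming are assumptions made for illustration.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: names and file layout are illustrative assumptions.
public class XmlChunkSplitter {

    /** Writes the root's child elements into files of at most chunkSize each; returns the file count. */
    public static int split(Path input, Path outDir, int chunkSize) throws Exception {
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        StartElement root = null;      // root start tag, repeated in every chunk
        OutputStream out = null;       // current chunk file, null between chunks
        XMLEventWriter writer = null;  // writer for the current chunk
        int depth = 0, inChunk = 0, fileIndex = 0;

        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++;
                    if (depth == 1) {
                        root = event.asStartElement();
                    } else if (depth == 2 && writer == null) {
                        // First child of a new chunk: open the file and write the wrapper
                        out = Files.newOutputStream(outDir.resolve("out" + fileIndex + ".xml"));
                        writer = outFactory.createXMLEventWriter(out, "UTF-8");
                        writer.add(events.createStartDocument("UTF-8", "1.0"));
                        writer.add(root);
                    }
                }
                // Copy everything that belongs to a child element (depth 2 or deeper)
                if (depth >= 2 && writer != null) {
                    writer.add(event);
                }
                if (event.isEndElement()) {
                    depth--;
                    // A child element just closed; finish the chunk once it is full
                    if (depth == 1 && ++inChunk == chunkSize) {
                        finish(writer, out, events, root);
                        writer = null;
                        out = null;
                        inChunk = 0;
                        fileIndex++;
                    }
                }
            }
            reader.close();
            if (writer != null) {      // last chunk holds fewer than chunkSize elements
                finish(writer, out, events, root);
                writer = null;
                out = null;
                fileIndex++;
            }
        } finally {
            if (out != null) {
                out.close();           // only reached with an open file after a failure
            }
        }
        return fileIndex;
    }

    // Closes the repeated root element and the document, then releases the file.
    private static void finish(XMLEventWriter writer, OutputStream out,
                               XMLEventFactory events, StartElement root) throws Exception {
        QName name = root.getName();
        writer.add(events.createEndElement(name.getPrefix(), name.getNamespaceURI(), name.getLocalPart()));
        writer.add(events.createEndDocument());
        writer.flush();
        writer.close();                // does not close the underlying stream
        out.close();
    }
}
```

A call such as XmlChunkSplitter.split(Paths.get("test.xml"), Paths.get("out"), 100) would write out0.xml, out1.xml, and so on into an existing directory, with each file wrapped in its own <Employees> element. Text, comments, and whitespace that sit directly under the root between child elements are not copied in this sketch.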