Garbled Chinese Text When Reading and Writing XML

I recently spent a long time fighting an encoding problem in a small project. Today I finally tracked down the root cause and fixed it.

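The back end produced garbled Chinese text when reading data from an XML file: the stream conversion in the DOM-based read was at fault. The original code was: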

public static JSONArray readXMLFile(String filepath) throws ParserConfigurationException,SAXException,IOException {
		
		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		DocumentBuilder db = dbf.newDocumentBuilder();
		// Faulty line: FileReader decodes the file with the platform default charset
		Document doc = db.parse(new InputSource(new FileReader(filepath)));
		NodeList list = doc.getElementsByTagName("order");
		
		JSONArray jsonArray = null;
		List<DataInfo> dataList = new ArrayList<DataInfo>();
		for (int i = 0; i < list.getLength(); i++) {
			DataInfo data = new DataInfo();
			data.setRoom(doc.getElementsByTagName("room").item(i).getFirstChild().getNodeValue());
			data.setDate(doc.getElementsByTagName("date").item(i).getFirstChild().getNodeValue());
			data.setTime(doc.getElementsByTagName("time").item(i).getFirstChild().getNodeValue());
			data.setName(doc.getElementsByTagName("name").item(i).getFirstChild().getNodeValue());
			data.setPerson(doc.getElementsByTagName("person").item(i).getFirstChild().getNodeValue());
			dataList.add(data);
		}
		
//		br.close(); // only applies once a BufferedReader named br is used for parsing
		
		jsonArray = JSONArray.fromObject(dataList);
		return jsonArray;
	}

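After repeated testing, the error turned out to be in the db.parse(...) line that wraps a FileReader in an InputSource.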

The code had been using an InputSource object from the org.xml.sax API to read the XML as a character stream and pass it to the parser:

Document doc = db.parse(new InputSource(new FileReader(filepath)));

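This is the crux of the problem. The XML file is stored on disk as UTF-8 encoded bytes, but FileReader turns those bytes into characters using the platform default charset (before Java 18 this is often not UTF-8, for example GBK on a Chinese-locale Windows machine). When the two differ, the Chinese text is garbled before the parser ever sees it. The InputSource API documentation explains why the parser does not correct this: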

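The SAX parser will use the InputSource object to determine how to read XML input. If there is a character stream available, the parser will read that stream directly, disregarding any text encoding declaration found in that stream. If there is no character stream, but there is a byte stream, the parser will use that byte stream, using the encoding specified in the InputSource or else (if no encoding is specified) autodetecting the character encoding using an algorithm such as the one in the XML specification. If neither a character stream nor a byte stream is available, the parser will attempt to open a URI connection to the resource identified by the system identifier.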

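So when a parser reads XML through an InputSource:

1. If a character stream is set, the parser reads it directly and ignores any encoding declaration in the document. Even with encoding="UTF-8" in the XML declaration, the Chinese text will still be garbled if the Reader decoded the bytes with the wrong charset.

2. If only a byte stream is set, the parser uses the encoding set on the InputSource (via setEncoding) or, if none is set, autodetects it from the byte order mark and the XML declaration. Another option is to wrap the byte stream in an InputStreamReader with an explicit charset, so that the parser receives a correctly decoded character stream.

3. If neither a byte stream nor a character stream is set, the parser opens the resource identified by the system identifier.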
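The first point can be reproduced without touching the file system. The following is a minimal sketch, not the project's code: the class name CharStreamDemo, the helper readRoom, and the sample document are made up for illustration, and ISO-8859-1 stands in for whatever non-UTF-8 default charset the machine happens to have.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CharStreamDemo {
    // "\u4f1a\u8bae\u5ba4" is the Chinese word for "meeting room", written with
    // Unicode escapes so the example does not depend on the source file encoding.
    static final String ROOM = "\u4f1a\u8bae\u5ba4";

    // Hypothetical sample document; it declares UTF-8 just like the article's file.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><order><room>" + ROOM + "</room></order>";

    // Encode the document as UTF-8 bytes, decode them with the given charset,
    // and parse from the resulting character stream.
    static String readRoom(Charset decodeWith) throws Exception {
        byte[] utf8Bytes = XML.getBytes(StandardCharsets.UTF_8);
        Reader reader = new InputStreamReader(new ByteArrayInputStream(utf8Bytes), decodeWith);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
        return doc.getElementsByTagName("room").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Reader charset matches the bytes: the text survives.
        System.out.println("UTF-8 reader intact:      " + ROOM.equals(readRoom(StandardCharsets.UTF_8)));
        // Reader charset does not match: the declaration is ignored and the text is garbled.
        System.out.println("ISO-8859-1 reader intact: " + ROOM.equals(readRoom(StandardCharsets.ISO_8859_1)));
    }
}
```

The parser never gets a chance to apply the declared encoding in the second call, because the bytes have already been turned into characters by the Reader.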

So the faulty line is changed to:

		BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filepath),"UTF-8"));
		Document doc = db.parse(new InputSource(br));

With this change, the Chinese data is read without garbling.
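An alternative worth considering, going by the InputSource documentation quoted earlier, is to give the parser a byte stream and no Reader at all, so that it detects the encoding from the byte order mark and the XML declaration. The sketch below is only an illustration of that idea: the class ByteStreamDemo, the helper readName, and the sample documents are hypothetical, and in-memory byte arrays stand in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ByteStreamDemo {
    // "\u5f20\u4e09" is a placeholder Chinese name, written with Unicode escapes.
    static final String NAME = "\u5f20\u4e09";

    // Build a small hypothetical document that declares the given encoding.
    static String xmlDeclaring(String encoding) {
        return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>"
                + "<order><name>" + NAME + "</name></order>";
    }

    // Parse from raw bytes: no Reader is involved, so the parser itself
    // determines the encoding.
    static String readName(byte[] xmlBytes) throws Exception {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            InputSource source = new InputSource(in);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getElementsByTagName("name").item(0).getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] utf8 = xmlDeclaring("UTF-8").getBytes(StandardCharsets.UTF_8);
        // Java's UTF_16 charset writes a big-endian byte order mark when encoding.
        byte[] utf16 = xmlDeclaring("UTF-16").getBytes(StandardCharsets.UTF_16);
        System.out.println("UTF-8 bytes intact:  " + NAME.equals(readName(utf8)));
        System.out.println("UTF-16 bytes intact: " + NAME.equals(readName(utf16)));
    }
}
```

For a file on disk, the equivalent would be passing a FileInputStream or a File to db.parse instead of a Reader. This relies on the declaration in the file being accurate, whereas the InputStreamReader fix forces UTF-8 regardless of what the file declares.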

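The same applies when writing Chinese data to the XML file. The write method first parses the existing file, so it needs the same conversion from a byte stream to a character stream with an explicit encoding: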

public static void writeXMLFile(JSONObject jsonObject,String filepath) throws ParserConfigurationException,SAXException,FileNotFoundException,IOException {
		
		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		DocumentBuilder db = dbf.newDocumentBuilder();
//		Document document = db.parse(new InputSource(new FileReader(filepath)));
		// Decode the existing file explicitly as UTF-8 and close the reader before the file is rewritten
		Document document;
		try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filepath),"UTF-8"))) {
			document = db.parse(new InputSource(br));
		}
		
		// Append a new <order> element to the root
		Element root = document.getDocumentElement();
		Element subroot = document.createElement("order");
		root.appendChild(subroot);
		
		// Add one child element per JSON key, with the value as its text
		@SuppressWarnings("rawtypes")
		Iterator it = jsonObject.keys();
		while (it.hasNext()) {
			String key = (String) it.next();
			String value = (String) jsonObject.get(key);
			Element element = document.createElement(key);
			Text text = document.createTextNode(value);
			element.appendChild(text);
			subroot.appendChild(element);
//			System.out.println(key+","+value);
		}
		
		// Serialize the DOM back to the same file
		TransformerFactory tff = TransformerFactory.newInstance();
		try {
			Transformer tf = tff.newTransformer();
			tf.setOutputProperty(OutputKeys.INDENT,"yes"); // pretty-print the XML with indentation
			DOMSource ds = new DOMSource(document);
			StreamResult sr = new StreamResult(new File(filepath));
			tf.transform(ds,sr);
		} catch (TransformerException e) {
			// Also covers TransformerConfigurationException, which is a subclass
			e.printStackTrace();
		}
	}
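
On the output side, the encoding can also be stated explicitly instead of relying on the transformer's default. The following is a rough sketch under assumptions: the class XmlWriteDemo and its helpers are invented for illustration, and a ByteArrayOutputStream stands in for the target file. It sets OutputKeys.ENCODING and writes to a byte stream, so the serializer both encodes the characters and emits a matching XML declaration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlWriteDemo {
    // "\u4f1a\u8bae\u5ba4" is the Chinese word for "meeting room", written with Unicode escapes.
    static final String ROOM = "\u4f1a\u8bae\u5ba4";

    // Build a tiny hypothetical document: <orders><room>...</room></orders>
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("orders");
        Element room = doc.createElement("room");
        room.appendChild(doc.createTextNode(ROOM));
        root.appendChild(room);
        doc.appendChild(root);
        return doc;
    }

    // Serialize the DOM to bytes with an explicitly requested UTF-8 encoding.
    static byte[] serialize(Document doc) throws Exception {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A byte-stream result lets the serializer perform the character encoding itself.
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(sampleDocument());
        // Decode with the same charset that was requested, then print the XML.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

If a Writer were passed to StreamResult instead, the Writer's charset would decide the bytes on disk, so it would have to match the encoding named in the declaration.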
