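I'm new to Android. My application has to parse data and then display it on screen. In one particular tag, though, I can't parse the data, because some special characters also come through in that tag. My code is shown below.
My parser function: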
protected ArrayListRSS/gamestar.RSS").openConnection().getInputStream());
Element root = document.getDocumentElement();
NodeList docItems = root.getElementsByTagName("item");
Node nodeItem;
for(int i = 0;i
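The loop in the parser function above is cut off. Below is a minimal sketch of how such a DOM loop might be completed. It assumes each item carries plain title and description child elements. It also applies the answer's advice about encoding: the caller chooses a charset, and the stream is wrapped in an InputStreamReader, so the parser never has to guess. RssDomSketch, Item, parseItems and childText are hypothetical names, not part of the original code. A DOM tree may store an element's text as several adjacent text nodes. For that reason the helper calls getTextContent(), which concatenates all of them, instead of reading only the first child's value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: parse RSS <item> elements with an explicitly chosen charset.
final class RssDomSketch {

    // Hypothetical holder for one item's text.
    static final class Item {
        final String title;
        final String description;

        Item(String title, String description) {
            this.title = title;
            this.description = description;
        }
    }

    static List<Item> parseItems(InputStream in, Charset charset) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A character stream makes the parser use our charset and ignore the
            // feed's own encoding declaration.
            Document document = builder.parse(new InputSource(new InputStreamReader(in, charset)));
            Element root = document.getDocumentElement();
            NodeList docItems = root.getElementsByTagName("item");

            List<Item> items = new ArrayList<>();
            for (int i = 0; i < docItems.getLength(); i++) {
                Element item = (Element) docItems.item(i);
                items.add(new Item(childText(item, "title"), childText(item, "description")));
            }
            return items;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    // All text under the first descendant with the given tag, or "" if there is none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        // getTextContent() joins every descendant text node into one string.
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<rss version=\"2.0\"><channel><item>"
                + "<title>Illegal settlements</title>"
                + "<description>A &amp; B &gt; C</description>"
                + "</item></channel></rss>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (Item item : parseItems(in, StandardCharsets.UTF_8)) {
            System.out.println(item.title + " | " + item.description);
        }
    }
}
```

Decoding the same UTF-8 bytes as windows-1252 turns a curly quote into garbage such as "â€œ". This is one plausible source of the stray characters described in the question.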
Input:
RSS xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
Output:
Illegal settlements ----> title tag text
India was joined by all members of the 15-nation UN Security Council except the US to condemn Israel announcement of new construction activity in Palestinian territories and demand immediate dismantling of the illegal settlements. -----> description tag text
UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank,saying the move by Israel gravely threatens efforts to establish a viable Palestinian state. ----> description tag text.
Best answer
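Your text nodes contain escaped HTML entities (&gt;, i.e. the "greater than" sign) and garbage characters. First, adjust the encoding to match your input source. Then you can use Apache Commons Lang's
StringEscapeUtils.unescapeHtml4(String)
to unescape the HTML.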
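Commons Lang's unescapeHtml4 handles the full set of HTML 4 entities. The plain-Java stand-in below shows what that step does without the dependency. It is hypothetical and deliberately small: EntityUnescapeSketch and unescape are made-up names, it recognises only a handful of named entities plus numeric references, and it needs Java 9+. In the app itself, the library call is the better choice.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for StringEscapeUtils.unescapeHtml4 (Java 9+).
final class EntityUnescapeSketch {

    // Only a few named entities; the real library knows the full HTML 4 set.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "nbsp", "\u00A0");

    // Matches named (&gt;), decimal (&#8220;) and hexadecimal (&#x201D;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10);
            } else {
                replacement = NAMED.get(body);
            }
            // Unknown or out-of-range references are copied through unchanged.
            m.appendReplacement(out, Matcher.quoteReplacement(
                    replacement != null ? replacement : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Character for a numeric reference, or null if it is not a valid code point.
    private static String fromCodePoint(String digits, int radix) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("India &gt; 15-nation &amp;amp; &#8220;UN&#8221;"));
    }
}
```

Each call removes one level of escaping, so a double-escaped "&amp;amp;" comes out as "&amp;".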
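This method (hopefully) returns XML that you can query (for example with XPath) to extract the text nodes you need. Alternatively, you can pass the whole string to Jsoup or to the Android Html class: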
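// Jsoup: "html" is the unescaped string; returns its plain text
Jsoup.parse(html).text();
// Android: same idea with the framework's Html class
// (on API 24+, Html.fromHtml(html, Html.FROM_HTML_MODE_LEGACY) replaces this deprecated form)
android.text.Html.fromHtml(html).toString();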
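To give a rough picture of what the Jsoup and Html calls do, here is a crude regex-based stand-in in plain Java. It takes the string produced by the unescape step and returns plain text. TagStripSketch and toPlainText are hypothetical names, and this is not how Jsoup works internally. It ignores comments, script and style contents, and a '>' inside attribute values, and it does not decode entities. A real HTML parser is the safer choice.

```java
import java.util.regex.Pattern;

// Hypothetical, crude approximation of "HTML to plain text"; prefer a real parser.
final class TagStripSketch {

    // Treats any "<...>" run as a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String toPlainText(String html) {
        // Replacing tags with a space keeps words on either side of block tags apart,
        // at the cost of extra spaces around inline tags.
        String noTags = TAG.matcher(html).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText(
                "<p>UN Secretary General <b>Ban Ki-moon</b></p>\n<p>expressed concern</p>"));
    }
}
```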
Test program (requires Jsoup and Commons Lang):
package stackoverflow;
import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
public class EmbeddedHTML {
public static void main(String[] args) {
String src = "