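I am trying to retrieve data from
http://api.freebase.com/api/trans/raw/m/0h47
As you can see in the text, it contains characters like this: /ældʒɪəriə/.
When I fetch the page source, I get text with sequences like &#250; and so on instead.
So far I have tried the following code: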
    urlConnection.setRequestProperty("Accept-Charset", "UTF-8");
    urlConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded;charset=utf-8");
What am I doing wrong?
My entire code:
    URL url = null;
    URLConnection urlConn = null;
    DataInputStream input = null;
    try {
        url = new URL("http://api.freebase.com/api/trans/raw/m/0h47");
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
    try {
        urlConn = url.openConnection();
    } catch (IOException e) {
        e.printStackTrace();
    }
    urlConn.setRequestProperty("Accept-Charset", "UTF-8");
    urlConn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
    urlConn.setDoInput(true);
    urlConn.setUseCaches(false);
    StringBuffer strBseznam = new StringBuffer();
    if (strBseznam.length() > 0)
        strBseznam.deleteCharAt(strBseznam.length() - 1);
    try {
        input = new DataInputStream(urlConn.getInputStream());
    } catch (IOException e) {
        e.printStackTrace();
    }
    String str = "";
    StringBuffer strB = new StringBuffer();
    strB.setLength(0);
    try {
        while (null != ((str = input.readLine()))) {
            strB.append(str);
        }
        input.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
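Solution
The HTML page is in UTF-8, so it could contain Arabic characters and the like directly. But the characters above Unicode 127 are still encoded as numeric entities such as &#250;. Request headers such as Accept-Charset or Accept-Encoding will not help, and loading the page as UTF-8 is entirely right.
You have to decode the entities yourself. Something like: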
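To make the entity format concrete, here is a minimal sketch of the reverse direction: turning every code point above 127 into a decimal numeric entity. This is only a guess at what the server side might be doing, not something the page confirms; the class name EntityEncodeSketch and the method encodeNonAscii are made up for illustration.

```java
final class EntityEncodeSketch {

    // Hypothetical helper: replaces every code point above 127 with a
    // decimal numeric entity of the form &#NNN; and keeps ASCII as-is.
    static String encodeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 127) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.append((char) cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00FA (u with acute accent) has the decimal code point 250.
        System.out.println(encodeNonAscii("\u00fa")); // prints &#250;
    }
}
```

The number inside the entity is simply the Unicode code point in decimal, which is why no charset header can change what arrives: the entity itself is plain ASCII.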
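    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String decodeNumericEntities(String s) {
        StringBuffer sb = new StringBuffer();
        // Match decimal numeric entities such as &#250;
        Matcher m = Pattern.compile("\\&#(\\d+);").matcher(s);
        while (m.find()) {
            int uc = Integer.parseInt(m.group(1));
            // Drop the matched entity, then append the character it stands for.
            m.appendReplacement(sb, "");
            sb.appendCodePoint(uc);
        }
        m.appendTail(sb);
        return sb.toString();
    }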
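For completeness, a self-contained variant of the same idea that can be run on its own. It is a sketch, not a full HTML entity decoder: the class name NumericEntityDemo, the hexadecimal form (&#xFA;) and the guard against invalid code points are my additions, and named entities such as &amp;amp; are deliberately left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntityDemo {

    // Decimal (&#250;) or hexadecimal (&#xFA;) numeric character references.
    // The digit limits keep Integer.parseInt from overflowing.
    private static final Pattern ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|(\\d{1,7}));");

    static String decodeNumericEntities(String s) {
        StringBuffer sb = new StringBuffer();
        Matcher m = ENTITY.matcher(s);
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            if (Character.isValidCodePoint(codePoint)) {
                // Drop the entity text and append the decoded character.
                m.appendReplacement(sb, "");
                sb.appendCodePoint(codePoint);
            } else {
                // Not a real code point: keep the entity text unchanged.
                m.appendReplacement(sb, Matcher.quoteReplacement(m.group()));
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The pronunciation string from the question, written with entities.
        String raw = "/&#230;ld&#658;&#618;&#601;ri&#601;/";
        System.out.println(decodeNumericEntities(raw));
    }
}
```

Whether the console shows the decoded characters correctly depends on the terminal encoding; the returned String itself holds the proper Unicode characters.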
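By the way, those entities could stem from processed HTML forms, that is, from the editing side of the web application.
As for the code in the question:
I have replaced the DataInputStream with a (Buffered)Reader, which is meant for text. InputStreams read binary data, bytes; Readers read text, Strings. An InputStreamReader takes an InputStream and an encoding as parameters and gives you a Reader.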
    try {
        BufferedReader input = new BufferedReader(
                new InputStreamReader(urlConn.getInputStream(), "UTF-8"));
        StringBuilder strB = new StringBuilder();
        String str;
        while (null != (str = input.readLine())) {
            strB.append(str).append("\r\n");
        }
        input.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
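To see the difference between the two ways of reading without touching the network, here is a small sketch that feeds the same UTF-8 bytes through both. A ByteArrayInputStream stands in for urlConn.getInputStream(); the class name Utf8ReadDemo and its method names are made up for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8ReadDemo {

    // Reads text through a Reader that decodes the bytes as UTF-8.
    static String readAsUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Reads with the deprecated DataInputStream.readLine(), which turns
    // every single byte into one char and so breaks multi-byte characters.
    @SuppressWarnings("deprecation")
    static String readBytewise(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (DataInputStream data = new DataInputStream(in)) {
            String line;
            while ((line = data.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 11 characters, 5 of which take two bytes each in UTF-8.
        byte[] body = "/\u00e6ld\u0292\u026a\u0259ri\u0259/".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAsUtf8(new ByteArrayInputStream(body)).length());   // prints 12
        System.out.println(readBytewise(new ByteArrayInputStream(body)).length()); // prints 17
    }
}
```

The lengths include the appended newline. The byte-wise variant ends up with one char per byte (16 plus the newline), which is the kind of mangling a Reader with an explicit charset avoids.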