I am trying to fetch this URL with JSoup:
http://betatruebaonline.com/img/parte/330/CIGUEÑAL.JPG
Even with encoding applied, I get an exception.
I don't understand why the encoding comes out wrong. It returns
07001
instead of the correct
07002
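import java.net.URLEncoder;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

private static void GetUrl() {
    try {
        String url = "http://betatruebaonline.com/img/parte/330/";
        // Percent-encode only the file name, using UTF-8.
        String encoded = URLEncoder.encode("CIGUEÑAL.JPG", "UTF-8");
        // ignoreContentType(true) lets JSoup fetch non-HTML content such as an image.
        Response img = Jsoup
                .connect(url + encoded)
                .ignoreContentType(true)
                .execute();
        // Print the full URL that was actually requested.
        System.out.println(url + encoded);
        System.out.println("PASSED");
    } catch (Exception e) {
        System.out.println("Error getting url");
        System.out.println(e.getMessage());
    }
}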
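Solution
The encoding is not wrong. The issue here is precomposed versus composite (decomposed) Unicode: the character "Ñ" can be represented in two ways that look identical but are actually different.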
precomposed unicode: Ñ -> %C3%91
composite unicode: N and ~ -> N%CC%83
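To make the difference concrete, here is a minimal sketch that builds both forms from explicit Unicode escapes, so the result does not depend on how the source file was saved, and then compares them. The class name `NTildeForms` and the helper `utf8Hex` are hypothetical names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class NTildeForms {
    // Precomposed form: a single code point, U+00D1.
    static final String PRECOMPOSED = "\u00D1";
    // Decomposed form: 'N' (U+004E) followed by COMBINING TILDE (U+0303).
    static final String DECOMPOSED = "N\u0303";

    // Formats the UTF-8 bytes of s as percent-prefixed hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(PRECOMPOSED.equals(DECOMPOSED)); // false
        System.out.println(PRECOMPOSED.length());           // 1
        System.out.println(DECOMPOSED.length());            // 2
        System.out.println(utf8Hex(PRECOMPOSED));           // %C3%91
        System.out.println(utf8Hex(DECOMPOSED));            // %4E%CC%83
    }
}
```

Both strings render as the same glyph, but they are different `String` values with different UTF-8 bytes, which is why the percent-encoded URLs differ.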
I stress that both are correct; which one you get depends on the Unicode form you want:
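import java.net.URLEncoder;
import java.text.Normalizer;

// Assumes the "Ñ" literal is stored in precomposed form in the source file.
// NFD splits it into 'N' plus a combining tilde.
String normalize = Normalizer.normalize("Ñ", Normalizer.Form.NFD);
System.out.println(URLEncoder.encode("Ñ", "UTF-8")); // %C3%91
System.out.println(URLEncoder.encode(normalize, "UTF-8")); // N%CC%83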
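If the target server expects the precomposed form, one option is to normalize the file name to NFC before encoding it. The following is a minimal sketch under that assumption, not a definitive fix; the class `NfcUrlEncoder` and its methods `encode` and `encodeNfc` are hypothetical names introduced for illustration. The input is written with Unicode escapes so that it is decomposed regardless of editor settings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

public class NfcUrlEncoder {
    // Percent-encodes s as UTF-8 without changing its normalization form.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is supported on every JVM, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Normalizes s to NFC (precomposed form) and then percent-encodes it.
    static String encodeNfc(String s) {
        return encode(Normalizer.normalize(s, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // Decomposed input: 'N' followed by COMBINING TILDE (U+0303).
        String decomposed = "CIGUEN\u0303AL.JPG";
        System.out.println(encode(decomposed));    // CIGUEN%CC%83AL.JPG
        System.out.println(encodeNfc(decomposed)); // CIGUE%C3%91AL.JPG
    }
}
```

Note that `URLEncoder` implements form encoding (a space becomes `+`), so it works for this file name but is not a general-purpose path encoder.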