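I have been studying Python systematically since the 25th. In between I made a trip to the Genius Bar to send my phone in for repair and reinstalled my operating system; apart from that, and eating and sleeping, I spent all of that time on the book Python网络数据采集 (Web Scraping with Python), plus testing and experimenting with a few small demos.
I have gotten through most of it, so I am wrapping up this stage of study for now with a demo that converts XML to JSON.
To borrow a saying: whether it is a crawler or some other JSON-related application, most of the time you do not start from scratch. There are all kinds of excellent libraries, so you really are standing on the shoulders of giants, and often a single call is enough to get the job done. The knowledge behind it is HTML tags, networking fundamentals, databases and so on. Most important of all is to look for patterns, analyze carefully, and be clear about what you want to accomplish.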
from xml.parsers.expat import ParserCreate
import json


class Xml2Json:
    # Tags listed here are always converted to a list of single-key dicts.
    LIST_TAGS = ['COMMANDS']

    def __init__(self, data=None):
        self._parser = ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data
        self.result = None
        if data:
            self.feed(data)
            self.close()

    def feed(self, data):
        self._stack = []
        self._data = ''
        self._parser.Parse(data, 0)

    def close(self):
        self._parser.Parse("", 1)
        del self._parser

    def start(self, tag, attrs):
        # This converter only supports elements without attributes
        # and without text mixed in next to child elements.
        assert attrs == {}
        assert self._data.strip() == ''
        self._stack.append([tag])
        self._data = ''

    def end(self, tag):
        last_tag = self._stack.pop()
        assert last_tag[0] == tag
        if len(last_tag) == 1:  # leaf: the value is the element's text
            data = self._data
        else:
            if tag not in Xml2Json.LIST_TAGS:
                # build a dict; repeated keys get pushed into lists
                data = {}
                for k, v in last_tag[1:]:
                    if k not in data:
                        data[k] = v
                    else:
                        el = data[k]
                        if type(el) is not list:
                            data[k] = [el, v]
                        else:
                            el.append(v)
            else:  # force into a list
                data = [{k: v} for k, v in last_tag[1:]]
        if self._stack:
            self._stack[-1].append((tag, data))
        else:
            self.result = {tag: data}
        self._data = ''

    def data(self, data):
        # expat may deliver one text node in several chunks, so accumulate
        self._data += data


if __name__ == '__main__':
    with open("city.xml", 'r', encoding='UTF-8') as f:
        xml = f.read()
    result = Xml2Json(xml).result
    with open("city.json", 'w', encoding='UTF-8') as outputfile:
        # str(result) writes the Python repr of the dict (single quotes);
        # json.dump(result, outputfile, ensure_ascii=False) would write strict JSON.
        outputfile.write(str(result))
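To see why the class keeps a stack, it can help to watch the order in which expat fires its callbacks. The following is only a minimal sketch using the standard-library xml.parsers.expat module; the trace_events helper and the sample string are made up for illustration and are not part of the original demo.

```python
from xml.parsers.expat import ParserCreate


def trace_events(xml_text):
    """Return the expat callbacks fired for xml_text, in order."""
    events = []
    parser = ParserCreate()
    # Merge adjacent character data into a single callback.
    parser.buffer_text = True
    parser.StartElementHandler = lambda tag, attrs: events.append(("start", tag))
    parser.EndElementHandler = lambda tag: events.append(("end", tag))

    def on_text(text):
        # Skip whitespace-only text between elements.
        if text.strip():
            events.append(("text", text))

    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events


if __name__ == "__main__":
    for event in trace_events("<cities><city>A</city><city>B</city></cities>"):
        print(event)
```

Every "start" has a matching "end" that arrives only after all of its children have been reported, which is why pushing on start and popping on end is enough to rebuild the tree.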
city.xml is as follows:
<?xml version="1.0" encoding="utf-8"?>
<country>
    <name>中国</name>
    <province>
        <name>黑龙江</name>
        <cities>
            <city>哈尔滨</city>
            <city>大庆</city>
        </cities>
    </province>
    <province>
        <name>广东</name>
        <cities>
            <city>广州</city>
            <city>深圳</city>
            <city>珠海</city>
        </cities>
    </province>
    <province>
        <name>台湾</name>
        <cities>
            <city>台北</city>
            <city>高雄</city>
        </cities>
    </province>
    <province>
        <name>新疆</name>
        <cities>
            <city>乌鲁木齐</city>
        </cities>
    </province>
</country>

Output after running python3 xmlToJson:
{'country': {'name': '中国', 'province': [{'name': '黑龙江', 'cities': {'city': ['哈尔滨', '大庆']}}, {'name': '广东', 'cities': {'city': ['广州', '深圳', '珠海']}}, {'name': '台湾', 'cities': {'city': ['台北', '高雄']}}, {'name': '新疆', 'cities': {'city': '乌鲁木齐'}}]}}
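Two details in the output above are worth noting. It is the Python repr of the dict (single quotes) rather than strict JSON, and the province with a single city (新疆) yields a plain string where the others yield a list. Below is a minimal sketch of one way to handle both, assuming the result has the shape shown above. The helpers as_list and normalize_cities and the sample dict are illustrative names, not part of the original demo.

```python
import json


def as_list(value):
    """Wrap a single value in a list so single and repeated children look alike."""
    return value if isinstance(value, list) else [value]


def normalize_cities(result):
    """Make every province's 'city' entry a list (assumes the shape shown above)."""
    for province in as_list(result["country"]["province"]):
        cities = province["cities"]
        cities["city"] = as_list(cities["city"])
    return result


if __name__ == "__main__":
    sample = {"country": {"name": "中国", "province": [
        {"name": "台湾", "cities": {"city": ["台北", "高雄"]}},
        {"name": "新疆", "cities": {"city": "乌鲁木齐"}},
    ]}}
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes.
    print(json.dumps(normalize_cities(sample), ensure_ascii=False, indent=2))
```

The class's own LIST_TAGS mechanism is another route: a tag listed there is always turned into a list of single-key dicts, even when it has only one child.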
References:
http://www.jianshu.com/p/f21fb92a2b66
http://www.cnblogs.com/gooseeker/p/5603530.html