我有一个CSV file,它是从数据库下载的(因为它是CSV格式),现在我必须解析为JSON Schema..不要担心此链接,只是github gist
我面临的问题是其多行标题检查CSV File Here
>在CSV的第一行上具有第一行标题,然后在下一行具有
这些标头的所有值.
>在CSV文件的第三行上,它具有标题的第二行,然后是下一行
具有这些标头的所有值.
>在CSV文件的第5行,它具有标题的第3行,然后是下一行
具有这些标头的所有值.
您也可以在这里注意到模式,
>第一行标题没有任何标签
>第二行标题只有一个标签
>页眉的第三行有两个选项卡
这适用于所有记录.
现在的第一个问题是标题的多行.
第二个问题是如何像我一样将其解析为嵌套的json.
我尝试过Create nested JSON from CSV的解决方案之一,并注意到我的csv第一个问题.
我的样子是这样我只在尝试解析架构的初始字段的地方
import csv
import json
def csvParse(csvfile):
# Open the CSV
f = open(csvfile,'r')
# Change each fieldname to the appropriate field name.
reader = csv.DictReader(f,fieldnames=("Order Ref","Order
Status","Affiliate","Source","Agent","Customer Name","Email
Address","Telephone","Mobile","Address 1","Address 2","City","County/State","Postal Code","Country","Voucher Code"," Voucher Amount","Order Date","Item ID","Type","Supplier Code","Supplier Name","Booking Ref","Supplier Price","Currency","Selling Price","Depart","Arrive","Origin","Destination","Carrier","Flight No","Class","Pax Type","Title","Firstname","Surname","DOB","Gender","FOID Type"))
customer = []
data = []
# data frame names in a list
for row in reader:
frame = {"orderRef": row["Order Ref"],"orderStatus": row["Order Status"],"affiliate": row["Affiliate"],"source": row["Source"],"customers": []}
data.append(frame)
最佳答案
这不是csv解析器友好的,因为该文件包含多个csv,并且至少一个包含2个具有相同名称的列,这会阻止使用DictReader.
我将首先构建一个包装程序,该包装程序可以解析每个csv片段,并在到达另一个片段时引发一次停止迭代.我将使用re模块来查找不同的标头.
以下是包装程序的代码:
class multi_csv:
levels = [re.compile('Order Ref,Order Status,Affiliate,Source,Agent,'
'.*,Country,Voucher Code,Voucher Amount,Order Date'),re.compile('\s*,Item ID,Type,Supplier Code,Supplier Name,'
'.*,Arrive,Origin,Destination,Carrier,Flight No,Class,'),\s*,Pax Type,Title,Firstname,Surname,DOB,Gender,'
'FOID Type,*')
]
def __init__(self,fd):
self.fd = fd
self.level = 0
self.end = False
def __iter__(self):
return self
def __next__(self):
try:
line = next(self.fd)
except StopIteration:
self.end = True
raise
for lev,rx in enumerate(self.levels):
if rx.match(line):
self.level = lev
raise StopIteration('New level')
return line
然后可以根据您的Json模式将其用于构建Python对象:
mc = multi_csv(open(csvfile,'r')
orders = []
while not mc.end:
rd = csv.reader(mc)
for row in rd:
if mc.level == 0:
items = []
orders.append({
'orderRef': int(row[0]),'orderStatus': row[1],'affiliate': row[2],'source': row[3],'agent': row[4],'customer': {
'name': row[5],'email': row[6],'telephone': row[7],'mobile': row[8],'address': {
'address1': row[9],'address2': row[10],'city': row[11],'county': row[12],'postCode': row[13],'country': row[14],},'voucherCode': row[15],'voucherAmount': int(row[16]),'orderDate': row[17],'items': items,})
elif mc.level == 1:
if len(row[1].strip()) != 0:
legs = []
passengers = []
items.append({
'itemId': int(row[1]),'type': row[2],'supplierCode': row[3],'supplierName': row[4],'bookingReference': row[5],'supplierPrice': row[6],'supplierPriceCurrency': row[7],'sellingPrice': row[8],'sellingPriceCurrency': row[9],'legs': legs,'passengers': passengers,})
legs.append({
'departureDate': row[10],'arrivalDate': row[11],'origin': row[12],'destination': row[13],'carrier': row[14],'flightNumber': row[15],'class': row[16],})
else: # mc.level == 2
passengers.append({
'passengerType': row[2],'title': row[3],'firstName': row[4],'surName': row[5],'dob': row[6],'gender': row[7],'foidType': row[8],})
使用您的文件,可以得到预期的结果:
pprint.pprint(orders)
[{'affiliate': ' ','agent': 'akjsd@ad.com','customer': {'address': {'address1': ' ','address2': ' ','city': ' ','country': ' ','county': ' ','postCode': ' '},'email': 'asd@asd.com','mobile': ' ','name': 'Mr Kas Iasd','telephone': '3342926655'},'items': [{'bookingReference': 'ABC123','itemId': 125,'legs': [{'arrivalDate': 'ONEWAY','carrier': 'PK','class': 'Economy','departureDate': '12/01/2018 13:15','destination': 'LHE','flightNumber': '354','origin': 'KHI'}],'passengers': [{'dob': '19/09/1995','firstName': 'Aasdsa','foidType': 'None','gender': 'Male','passengerType': 'Adult','surName': 'Mas','title': 'Mr'},{'dob': '07/12/1995','firstName': 'Asdad','surName': 'Dasd','title': 'Mr'}],'sellingPrice': '5002','sellingPriceCurrency': 'PKR','supplierCode': 'SB','supplierName': 'Sabre','supplierPrice': '5002','supplierPriceCurrency': 'PKR','type': 'Flight'}],'orderDate': '11/01/2018 18:51','orderRef': 1234,'orderStatus': 'PayOfflineConfirmedManual','source': ' ','voucherAmount': 0,'voucherCode': ' '},{'affiliate': ' ','agent': 'asdss@asda.com','email': 'ads@ads.com','mobile': '3332784342','name': 'Mr Asdsd Asdsd','telephone': '3332324252'},'items': [{'bookingReference': 'Failed','itemId': 123,'legs': [{'arrivalDate': '18/01/2018 14:25','carrier': 'PA','departureDate': '18/01/2018 11:40','destination': 'DXB','flightNumber': '210','origin': 'KHI'},{'arrivalDate': '25/01/2018 10:40','departureDate': '25/01/2018 6:25','flightNumber': '211','origin': 'DXB'}],'passengers': [{'dob': '11/08/1991','firstName': 'Asd','surName': 'Azam',{'dob': '01/07/1974','firstName': 'Aziz','surName': 'Asdsd',{'dob': '28/05/1995','firstName': 'mureed','surName': 'ahmed',{'dob': '14/04/2012','firstName': 'abdullah','gender': 'Female','passengerType': 'Child','surName': 'Cdsd',{'dob': '17/12/1999','firstName': 'Asdsd','surName': 'Ahmed','sellingPrice': '154340','supplierCode': 'PITCH','supplierName': 'Kicker','supplierPrice': '154340','orderDate': '11/01/2018 17:06','orderRef': 1235,'orderStatus': 'PayOfflinePendingManualProcessing','voucherAmount': 100,'voucherCode': 'ABC123'},'agent': 'asda@asdad.com','email': 'asd@asdsd.com','mobile': '3067869234','name': 'Mr Asds Sdsd','telephone': '3067869234'},'items': [{'bookingReference': ' ','itemId': 124,'departureDate': '23/01/2018 2:00','destination': 'SHJ','flightNumber': '812','passengers': [{'dob': '01/12/1994','firstName': 'Asds','foidType': 'Passport','surName': 'raza','sellingPrice': '20134','supplierPrice': '20134','orderDate': '11/01/2018 16:23','orderRef': 1236,'voucherCode': ' '}]