我已经设置了一个sqlite3数据库.我填写了一些数据(约400万条记录~1.2Gb的数据).
然后我做一些查询(选择/删除/更新).
问题是有时在插入后脚本会停止而不会出错.有时它会一直运行到最后.
这些是我运行的查询类型:
然后我做一些查询(选择/删除/更新).
问题是有时在插入后脚本会停止而不会出错.有时它会一直运行到最后.
这些是我运行的查询类型:
from __future__ import print_function import sqlite3 import csv import os import glob import sys import time db = 'test.db' conn = sqlite3.connect(db) conn.text_factory = str # allows utf-8 data to be stored c = conn.cursor() i = 0 ### traverse the directory and process each .csv file ##print("debug") csvfile =('/home/Desktop/Untitled Folder/Crimes_-_2001_to_present.csv') with open(csvfile,"rb") as f: reader = csv.reader(f) t = time.time() header = True for row in reader: if header: # gather column names from the first row of the csv header = False sql = "DROP TABLE IF EXISTS test_table" c.execute(sql) #print("debug 1") sql = "CREATE TABLE test_table (ID INTEGER,FBI_Code INTEGER,Updated_On TEXT,District TEXT,Beat INTEGER,Primary_Type TEXT,Location BLOB,Latitude REAL,Arrest INTEGER,Domestic INTEGER,Longitude REAL,Community_Area INTEGER,Case_Number INTEGER,Block TEXT,Location_Description TEXT,Ward INTEGER,IUCR INTEGER,Year INTEGER,Date TEXT,Y_Coordinate INTEGER,Description TEXT,X_Coordinate INTEGER);" c.execute(sql) #print("debug 2") insertsql = "INSERT INTO test_table VALUES (?,?,?)" rowlen = len(row) #print("debug 3") else: # skip lines that don't have the right number of columns #print("debug 4") #if len(row) == rowlen: #print("debug 5") try: c.execute(insertsql,row) except: print("problem in row %d" % i) print(row) continue # print("debug 6") i +=1 if i == 1000: conn.commit() #### if i == 4000000: #### break ## #print (row) conn.commit() print('\nTime for Insertions TOTAL~> \n') print(float( time.time() -t )) print('\nTime for Insertions per Query~> \n') print(float( time.time() -t )/i) del rows rows = list() print('\nTime for Selections ~> Domestic\n') t = time.time() c.execute("SELECT * FROM test_table WHERE Domestic == 'false'") rows = c.fetchall() print(float( time.time() -t )) print(len(rows)) del rows rows = list() print('\nTime for Selections ~> Arrests\n') t = time.time() c.execute("SELECT * FROM test_table WHERE Arrest == 'false'") rows = c.fetchall() print(float( time.time() -t )) print(len(rows)) del rows rows = list() print('\nTime for Selections ID~> \n') t = time.time() c.execute("SELECT * FROM test_table WHERE ID < 9938614") rows = c.fetchall() print(float( time.time() -t )) print(len(rows)) del rows rows = list() print('\nTime for Selections ~> Primary_Type\n') t = time.time() c.execute("SELECT * FROM test_table WHERE Primary_Type == 'BATTERY'") rows = c.fetchall() print(float( time.time() -t )) print(len(rows)) del rows rows = list() print('\nTime for Selections Year~> \n') t = time.time() c.execute("SELECT * FROM test_table WHERE Year <= 2014") rows = c.fetchall() print(float( time.time() -t )) print(len(rows)) del rows rows = [] print('\nTime for Updates ~> YEAR\n') t = time.time() c.execute("UPDATE test_table SET Year = '2016' WHERE Year == '2014'") print(float( time.time() -t )) print('\nTime for Selections Year~> \n') t = time.time() c.execute("SELECT * FROM test_table WHERE Year <= 2014") rows = c.fetchall() print(float( time.time() -t )) print(len(rows)) print('\nTime for DELETIONS ~> Domestic\n') t = time.time() c.execute("DELETE FROM test_table WHERE Domestic == 'false'") rows = c.fetchall() print(float( time.time() -t )) print(len(rows)) del rows c.close() conn.close()
每次我重新分配行列表,因为在一些查询后我的内存不足.但我不认为这是问题(以防万一我使用del行并重新分配它,这样就慢了).在其中一些查询后,脚本停止没有错误,我无法弄清楚为什么,因为有时它运行正常.
编辑
我已经包含了上面的代码.问题是在我执行查询后插入部分后脚本终止而没有任何错误.
例如它直到这里:
…
Time for Selections ~> Arrests123.231
3928182
然后它终止.在第一种方法中,我没有删除列表,当我尝试重新声明列表时,Cython产生了核心转储错误.现在我删除然后声明列表Cython运行正常.我的问题是为什么python没有捕获任何异常?
在重新分配列表之后,垃圾收集器会清除垃圾数据(正如我从linux监视器中看到的那样)但它崩溃没有错误.而最令人讨厌的是,有时它会一直运行到最后.