我们知道,Postgresql的OFFSET要求它扫描所有的行,直到它到达你请求的位置,这使得它对于通过巨大的结果集的分页是无用的,随着OFFSET的增加越来越慢。
SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500
你可以说:
SELECT * FROM (SELECT *,ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum FROM table) AS foo WHERE rownum > 500 AND rownum <= 510
我构造了一个测试,它比较OFFSET,光标和ROW_NUMBER()。我对ROW_NUMBER()的印象是,无论您在结果集中的位置如何,它的速度都是一致的,是正确的。但是,这个速度比OFFSET或CURSOR的速度要慢得多,这也是我的印象,速度几乎是相同的,速度越慢,结果就越差。
结果:
offset(100,100): 0.016359 scroll(100,100): 0.018393 rownum(100,100): 15.535614 offset(100,480000): 1.761800 scroll(100,480000): 1.781913 rownum(100,480000): 15.158601 offset(100,999900): 3.670898 scroll(100,999900): 3.664517 rownum(100,999900): 14.581068
测试脚本使用sqlalchemy设置表和1000000行测试数据。然后,它使用psycopg2游标执行每个SELECT语句并使用三种不同的方法获取结果。
from sqlalchemy import * Metadata = MetaData() engine = create_engine('postgresql://scott:tiger@localhost/test',echo=True) t1 = Table('t1',Metadata,Column('id',Integer,primary_key=True),Column('d1',String(50)),Column('d2',Column('d3',Column('d4',Column('d5',String(50)) ) if not engine.has_table('t1'): conn = engine.connect() t1.create(conn) # 1000000 rows for i in range(100): conn.execute(t1.insert(),[ dict( ('d%d' % col,"data data data %d %d" % (col,(i * 10000) + j)) for col in range(1,6) ) for j in xrange(1,10001) ]) import time def timeit(fn,count,*args): now = time.time() for i in xrange(count): fn(*args) total = time.time() - now print "%s(%s): %f" % (fn.__name__,",".join(repr(x) for x in args),total) # this is a raw psycopg2 connection. conn = engine.raw_connection() def offset(limit,offset): cursor = conn.cursor() cursor.execute("select * from t1 order by id limit %d offset %d" % (limit,offset)) cursor.fetchall() cursor.close() def rownum(limit,offset): cursor = conn.cursor() cursor.execute("select * from (select *," "row_number() over (order by id asc) as rownum from t1) as foo " "where rownum>=%d and rownum<%d" % (offset,limit + offset)) cursor.fetchall() cursor.close() def scroll(limit,offset): cursor = conn.cursor('foo') cursor.execute("select * from t1 order by id") cursor.scroll(offset) cursor.fetchmany(limit) cursor.close() print timeit(offset,10,100,100) timeit(scroll,100) timeit(rownum,100) print timeit(offset,480000) timeit(scroll,480000) timeit(rownum,480000) print timeit(offset,999900) timeit(scroll,999900) timeit(rownum,999900)