我感到困惑的是以下两个查询产生相同输出的运行时间.这些查询正在sqlite 3.7.9上运行,大约有450万行的表,每个生成约50行的结果.
以下是查询:
% echo "SELECT DISTINCT acolumn FROM atable ORDER BY acolumn;" | time sqlite3 mydb sqlite3 mydb 8.87s user 15.06s system 99% cpu 23.980 total % echo "SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable) ORDER BY acolumn;" | time sqlite3 options sqlite3 mydb 1.15s user 0.10s system 98% cpu 1.267 total
两个查询的表现不应该更接近吗?我明白,查询计划程序可能会以不同的顺序执行“排序”和“不同”操作,但如果是,是否需要?还是应该能够弄清楚如何做到最快?
编辑:根据要求,每个查询的“EXPLAIN QUERY PLAN”命令的输出.
对于第一个(单片)查询:
0|0|0|SCAN TABLE atable (~1000000 rows) 0|0|0|USE TEMP B-TREE FOR DISTINCT
1|0|0|SCAN TABLE atable (~1000000 rows) 1|0|0|USE TEMP B-TREE FOR DISTINCT 0|0|0|SCAN SUBQUERY 1 (~1000000 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY
您的第一个查询首先将记录插入到排序的临时表中,然后通过遍历它们并返回仅与前一个不相同的那些来执行DISTINCT.
(这可以在下面显示的EXPLAIN输出中看到; DISTINCT实际上被转换为一个GROUP BY,其行为相同.)
(这可以在下面显示的EXPLAIN输出中看到; DISTINCT实际上被转换为一个GROUP BY,其行为相同.)
您的第二个查询在理论上与第一个查询相同,但是sqlite的查询优化器相当简单,无法证明此转换是安全的(如subquery flattening documentation所述).
因此,通过首先执行DISTINCT,仅将任何非重复项插入到临时表中,然后使用第二个临时表执行ORDER BY来实现.
第二步完全是多余的,因为第一个临时表已经被排序了,但是对于你的数据来说,这恰好是更快的,因为你有这么多重复的东西从来没有存储在任何一个临时表中.
理论上,您的第一个查询可能会更快,因为sqlite已经认识到DISTINCT和ORDER BY子句可以使用相同的排序临时表来实现.
然而,实际上,sqlite它不够聪明,不能记住DISTINCT意味着重复项不需要存储在临时表中.
(这个特殊的优化可能会添加到sqlite,如果你在mailing list很好地问.)
$sqlite3 mydb sqlite> .explain sqlite> explain SELECT DISTINCT acolumn FROM atable ORDER BY acolumn; addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Trace 0 0 0 00 1 SorterOpen 1 2 0 keyinfo(1,BINARY) 00 2 Integer 0 3 0 00 clear abort flag 3 Integer 0 2 0 00 indicate accumulator empty 4 Null 0 6 6 00 5 Gosub 5 37 0 00 6 Goto 0 40 0 00 7 OpenRead 0 2 0 1 00 atable 8 Rewind 0 14 0 00 9 Column 0 0 8 00 atable.acolumn 10 Sequence 1 9 0 00 11 MakeRecord 8 2 10 00 12 SorterInsert 1 10 0 00 13 Next 0 9 0 01 14 Close 0 0 0 00 15 OpenPseudo 2 10 2 00 16 SorterSort 1 39 0 00 GROUP BY sort 17 SorterData 1 10 0 00 18 Column 2 0 7 20 19 Compare 6 7 1 keyinfo(1,BINARY) 00 20 Jump 21 25 21 00 21 Move 7 6 0 00 22 Gosub 4 32 0 00 output one row 23 IfPos 3 39 0 00 check abort flag 24 Gosub 5 37 0 00 reset accumulator 25 Column 2 0 1 00 26 Integer 1 2 0 00 indicate data in accumulator 27 SorterNext 1 17 0 00 28 Gosub 4 32 0 00 output final row 29 Goto 0 39 0 00 30 Integer 1 3 0 00 set abort flag 31 Return 4 0 0 00 32 IfPos 2 34 0 00 Groupby result generator entry point 33 Return 4 0 0 00 34 Copy 1 11 0 00 35 ResultRow 11 1 0 00 36 Return 4 0 0 00 end groupby result generator 37 Null 0 1 0 00 38 Return 5 0 0 00 39 Halt 0 0 0 00 40 Transaction 0 0 0 00 41 VerifyCookie 0 2 0 00 42 TableLock 0 2 0 atable 00 43 Goto 0 7 0 00
sqlite> explain SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable) ORDER BY acolumn; addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Trace 0 0 0 00 1 Goto 0 39 0 00 2 Goto 0 17 0 00 3 OpenPseudo 0 3 1 01 coroutine for sqlite_subquery_DA7480_ 4 Integer 0 2 0 01 5 OpenEphemeral 2 0 0 keyinfo(1,BINARY) 08 6 OpenRead 1 2 0 1 00 atable 7 Rewind 1 14 0 00 8 Column 1 0 3 00 atable.acolumn 9 Found 2 13 3 1 00 10 MakeRecord 3 1 4 00 11 IdxInsert 2 4 0 00 12 Yield 1 0 0 00 13 Next 1 8 0 01 14 Close 1 0 0 00 15 Integer 1 2 0 00 16 Yield 1 0 0 00 end sqlite_subquery_DA7480_ 17 SorterOpen 3 3 0 keyinfo(1,BINARY) 00 18 Integer 2 1 0 00 19 Yield 1 0 0 00 next row of co-routine sqlite_subquery_DA7480_ 20 If 2 29 0 00 21 Column 0 0 5 00 sqlite_subquery_DA7480_.acolumn 22 MakeRecord 5 1 6 00 23 Column 0 0 7 00 sqlite_subquery_DA7480_.acolumn 24 Sequence 3 8 0 00 25 Move 6 9 0 00 26 MakeRecord 7 3 10 00 27 SorterInsert 3 10 0 00 28 Goto 0 19 0 00 29 OpenPseudo 4 6 1 00 30 OpenPseudo 5 11 3 00 31 SorterSort 3 37 0 00 32 SorterData 3 11 0 00 33 Column 5 2 6 20 34 Column 4 0 5 20 35 ResultRow 5 1 0 00 36 SorterNext 3 32 0 00 37 Close 4 0 0 00 38 Halt 0 0 0 00 39 Transaction 0 0 0 00 40 VerifyCookie 0 2 0 00 41 TableLock 0 2 0 atable 00 42 Goto 0 2 0 00