我感到困惑的是以下两个查询产生相同输出的运行时间.这些查询正在sqlite 3.7.9上运行,大约有450万行的表,每个生成约50行的结果.
以下是查询:
- % echo "SELECT DISTINCT acolumn FROM atable ORDER BY acolumn;" | time sqlite3 mydb
- sqlite3 mydb 8.87s user 15.06s system 99% cpu 23.980 total
- % echo "SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable) ORDER BY acolumn;" | time sqlite3 options
- sqlite3 mydb 1.15s user 0.10s system 98% cpu 1.267 total
两个查询的表现不应该更接近吗?我明白,查询计划程序可能会以不同的顺序执行“排序”和“不同”操作,但如果是,是否需要?还是应该能够弄清楚如何做到最快?
编辑:根据要求,每个查询的“EXPLAIN QUERY PLAN”命令的输出.
对于第一个(单片)查询:
- 0|0|0|SCAN TABLE atable (~1000000 rows)
- 0|0|0|USE TEMP B-TREE FOR DISTINCT
- 1|0|0|SCAN TABLE atable (~1000000 rows)
- 1|0|0|USE TEMP B-TREE FOR DISTINCT
- 0|0|0|SCAN SUBQUERY 1 (~1000000 rows)
- 0|0|0|USE TEMP B-TREE FOR ORDER BY
您的第一个查询首先将记录插入到排序的临时表中,然后通过遍历它们并返回仅与前一个不相同的那些来执行DISTINCT.
(这可以在下面显示的EXPLAIN输出中看到; DISTINCT实际上被转换为一个GROUP BY,其行为相同.)
(这可以在下面显示的EXPLAIN输出中看到; DISTINCT实际上被转换为一个GROUP BY,其行为相同.)
您的第二个查询在理论上与第一个查询相同,但是sqlite的查询优化器相当简单,无法证明此转换是安全的(如subquery flattening documentation所述).
因此,通过首先执行DISTINCT,仅将任何非重复项插入到临时表中,然后使用第二个临时表执行ORDER BY来实现.
第二步完全是多余的,因为第一个临时表已经被排序了,但是对于你的数据来说,这恰好是更快的,因为你有这么多重复的东西从来没有存储在任何一个临时表中.
理论上,您的第一个查询可能会更快,因为sqlite已经认识到DISTINCT和ORDER BY子句可以使用相同的排序临时表来实现.
然而,实际上,sqlite它不够聪明,不能记住DISTINCT意味着重复项不需要存储在临时表中.
(这个特殊的优化可能会添加到sqlite,如果你在mailing list很好地问.)
- $sqlite3 mydb
- sqlite> .explain
- sqlite> explain SELECT DISTINCT acolumn FROM atable ORDER BY acolumn;
- addr opcode p1 p2 p3 p4 p5 comment
- ---- ------------- ---- ---- ---- ------------- -- -------------
- 0 Trace 0 0 0 00
- 1 SorterOpen 1 2 0 keyinfo(1,BINARY) 00
- 2 Integer 0 3 0 00 clear abort flag
- 3 Integer 0 2 0 00 indicate accumulator empty
- 4 Null 0 6 6 00
- 5 Gosub 5 37 0 00
- 6 Goto 0 40 0 00
- 7 OpenRead 0 2 0 1 00 atable
- 8 Rewind 0 14 0 00
- 9 Column 0 0 8 00 atable.acolumn
- 10 Sequence 1 9 0 00
- 11 MakeRecord 8 2 10 00
- 12 SorterInsert 1 10 0 00
- 13 Next 0 9 0 01
- 14 Close 0 0 0 00
- 15 OpenPseudo 2 10 2 00
- 16 SorterSort 1 39 0 00 GROUP BY sort
- 17 SorterData 1 10 0 00
- 18 Column 2 0 7 20
- 19 Compare 6 7 1 keyinfo(1,BINARY) 00
- 20 Jump 21 25 21 00
- 21 Move 7 6 0 00
- 22 Gosub 4 32 0 00 output one row
- 23 IfPos 3 39 0 00 check abort flag
- 24 Gosub 5 37 0 00 reset accumulator
- 25 Column 2 0 1 00
- 26 Integer 1 2 0 00 indicate data in accumulator
- 27 SorterNext 1 17 0 00
- 28 Gosub 4 32 0 00 output final row
- 29 Goto 0 39 0 00
- 30 Integer 1 3 0 00 set abort flag
- 31 Return 4 0 0 00
- 32 IfPos 2 34 0 00 Groupby result generator entry point
- 33 Return 4 0 0 00
- 34 Copy 1 11 0 00
- 35 ResultRow 11 1 0 00
- 36 Return 4 0 0 00 end groupby result generator
- 37 Null 0 1 0 00
- 38 Return 5 0 0 00
- 39 Halt 0 0 0 00
- 40 Transaction 0 0 0 00
- 41 VerifyCookie 0 2 0 00
- 42 TableLock 0 2 0 atable 00
- 43 Goto 0 7 0 00
- sqlite> explain SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable) ORDER BY acolumn;
- addr opcode p1 p2 p3 p4 p5 comment
- ---- ------------- ---- ---- ---- ------------- -- -------------
- 0 Trace 0 0 0 00
- 1 Goto 0 39 0 00
- 2 Goto 0 17 0 00
- 3 OpenPseudo 0 3 1 01 coroutine for sqlite_subquery_DA7480_
- 4 Integer 0 2 0 01
- 5 OpenEphemeral 2 0 0 keyinfo(1,BINARY) 08
- 6 OpenRead 1 2 0 1 00 atable
- 7 Rewind 1 14 0 00
- 8 Column 1 0 3 00 atable.acolumn
- 9 Found 2 13 3 1 00
- 10 MakeRecord 3 1 4 00
- 11 IdxInsert 2 4 0 00
- 12 Yield 1 0 0 00
- 13 Next 1 8 0 01
- 14 Close 1 0 0 00
- 15 Integer 1 2 0 00
- 16 Yield 1 0 0 00 end sqlite_subquery_DA7480_
- 17 SorterOpen 3 3 0 keyinfo(1,BINARY) 00
- 18 Integer 2 1 0 00
- 19 Yield 1 0 0 00 next row of co-routine sqlite_subquery_DA7480_
- 20 If 2 29 0 00
- 21 Column 0 0 5 00 sqlite_subquery_DA7480_.acolumn
- 22 MakeRecord 5 1 6 00
- 23 Column 0 0 7 00 sqlite_subquery_DA7480_.acolumn
- 24 Sequence 3 8 0 00
- 25 Move 6 9 0 00
- 26 MakeRecord 7 3 10 00
- 27 SorterInsert 3 10 0 00
- 28 Goto 0 19 0 00
- 29 OpenPseudo 4 6 1 00
- 30 OpenPseudo 5 11 3 00
- 31 SorterSort 3 37 0 00
- 32 SorterData 3 11 0 00
- 33 Column 5 2 6 20
- 34 Column 4 0 5 20
- 35 ResultRow 5 1 0 00
- 36 SorterNext 3 32 0 00
- 37 Close 4 0 0 00
- 38 Halt 0 0 0 00
- 39 Transaction 0 0 0 00
- 40 VerifyCookie 0 2 0 00
- 41 TableLock 0 2 0 atable 00
- 42 Goto 0 2 0 00