我在这里和https://stackoverflow.com看到的许多问题都使用自然键而不是基于IDENTITY()值的代理键.
我在计算机系统中的背景告诉我对整数执行任何比较操作将比比较字符串更快.
This评论让我质疑我的信念,所以我想我会创建一个系统来调查我的论文,即整数比字符串更快,可以用作sql Server中的键.
由于小数据集中可能存在很小的差别,我立即想到了一个两个表设置,其中主表有1,000,000行,而辅助表对于主表中的每一行有10行,总共10,000行辅助表.我测试的前提是创建两组这样的表,一组使用自然键,一组使用整数键,并在一个简单的查询上运行时序测试,如:
SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Key = Table2.Key;
以下是我作为测试床创建的代码:
USE Master; IF (SELECT COUNT(database_id) FROM sys.databases d WHERE d.name = 'NaturalKeyTest') = 1 BEGIN ALTER DATABASE NaturalKeyTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE NaturalKeyTest; END GO CREATE DATABASE NaturalKeyTest ON (NAME = 'NaturalKeyTest',FILENAME = 'C:\sqlServer\Data\NaturalKeyTest.mdf',SIZE=8GB,FILEGROWTH=1GB) LOG ON (NAME='NaturalKeyTestLog',FILENAME = 'C:\sqlServer\Logs\NaturalKeyTest.mdf',SIZE=256MB,FILEGROWTH=128MB); GO ALTER DATABASE NaturalKeyTest SET RECOVERY SIMPLE; GO USE NaturalKeyTest; GO CREATE VIEW GetRand AS SELECT RAND() AS RandomNumber; GO CREATE FUNCTION RandomString ( @StringLength INT ) RETURNS NVARCHAR(max) AS BEGIN DECLARE @cnt INT = 0 DECLARE @str NVARCHAR(MAX) = ''; DECLARE @RandomNum FLOAT = 0; WHILE @cnt < @StringLength BEGIN SELECT @RandomNum = RandomNumber FROM GetRand; SET @str = @str + CAST(CHAR((@RandomNum * 64.) + 32) AS NVARCHAR(MAX)); SET @cnt = @cnt + 1; END RETURN @str; END; GO CREATE TABLE NaturalTable1 ( NaturalTable1Key NVARCHAR(255) NOT NULL CONSTRAINT PK_NaturalTable1 PRIMARY KEY CLUSTERED,Table1TestData NVARCHAR(255) NOT NULL ); CREATE TABLE NaturalTable2 ( NaturalTable2Key NVARCHAR(255) NOT NULL CONSTRAINT PK_NaturalTable2 PRIMARY KEY CLUSTERED,NaturalTable1Key NVARCHAR(255) NOT NULL CONSTRAINT FK_NaturalTable2_NaturalTable1Key FOREIGN KEY REFERENCES dbo.NaturalTable1 (NaturalTable1Key) ON DELETE CASCADE ON UPDATE CASCADE,Table2TestData NVARCHAR(255) NOT NULL ); GO /* insert 1,000 rows into NaturalTable1 */ INSERT INTO NaturalTable1 (NaturalTable1Key,Table1TestData) VALUES (dbo.RandomString(25),dbo.RandomString(100)); GO 1000000 /* insert 10,000 rows into NaturalTable2 */ INSERT INTO NaturalTable2 (NaturalTable2Key,NaturalTable1Key,Table2TestData) SELECT dbo.RandomString(25),T1.NaturalTable1Key,dbo.RandomString(100) FROM NaturalTable1 T1 GO 10 CREATE TABLE IDTable1 ( IDTable1Key INT NOT NULL CONSTRAINT PK_IDTable1 PRIMARY KEY CLUSTERED IDENTITY(1,1),Table1TestData NVARCHAR(255) NOT NULL CONSTRAINT DF_IDTable1_TestData DEFAULT dbo.RandomString(100) ); CREATE TABLE IDTable2 ( IDTable2Key INT NOT NULL CONSTRAINT PK_IDTable2 PRIMARY KEY CLUSTERED IDENTITY(1,IDTable1Key INT NOT NULL CONSTRAINT FK_IDTable2_IDTable1Key FOREIGN KEY REFERENCES dbo.IDTable1 (IDTable1Key) ON DELETE CASCADE ON UPDATE CASCADE,Table2TestData NVARCHAR(255) NOT NULL CONSTRAINT DF_IDTable2_TestData DEFAULT dbo.RandomString(100) ); GO INSERT INTO IDTable1 DEFAULT VALUES; GO 1000000 INSERT INTO IDTable2 (IDTable1Key) SELECT T1.IDTable1Key FROM IDTable1 T1 GO 10
上面的代码创建了一个数据库和4个表,并用数据填充表,准备测试.我运行的测试代码是:
USE NaturalKeyTest; GO DECLARE @loops INT = 0; DECLARE @MaxLoops INT = 10; DECLARE @Results TABLE ( FinishedAt DATETIME DEFAULT (GETDATE()),KeyType NVARCHAR(255),ElapsedTime FLOAT ); WHILE @loops < @MaxLoops BEGIN DBCC FREEPROCCACHE; DBCC FREESESSIONCACHE; DBCC FREESYSTEMCACHE ('ALL'); DBCC DROPCLEANBUFFERS; WAITFOR DELAY '00:00:05'; DECLARE @start DATETIME = GETDATE(); DECLARE @end DATETIME; DECLARE @count INT; SELECT @count = COUNT(*) FROM dbo.NaturalTable1 T1 INNER JOIN dbo.NaturalTable2 T2 ON T1.NaturalTable1Key = T2.NaturalTable1Key; SET @end = GETDATE(); INSERT INTO @Results (KeyType,ElapsedTime) SELECT 'Natural PK' AS KeyType,CAST((@end - @start) AS FLOAT) AS ElapsedTime; DBCC FREEPROCCACHE; DBCC FREESESSIONCACHE; DBCC FREESYSTEMCACHE ('ALL'); DBCC DROPCLEANBUFFERS; WAITFOR DELAY '00:00:05'; SET @start = GETDATE(); SELECT @count = COUNT(*) FROM dbo.IDTable1 T1 INNER JOIN dbo.IDTable2 T2 ON T1.IDTable1Key = T2.IDTable1Key; SET @end = GETDATE(); INSERT INTO @Results (KeyType,ElapsedTime) SELECT 'IDENTITY() PK' AS KeyType,CAST((@end - @start) AS FLOAT) AS ElapsedTime; SET @loops = @loops + 1; END SELECT KeyType,FORMAT(CAST(AVG(ElapsedTime) AS DATETIME),'HH:mm:ss.fff') AS AvgTime FROM @Results GROUP BY KeyType;
结果如下:
我在这里做错了什么,或INT键是否比25个字符的自然键快3倍?
请注意,我已经写了一个后续问题here.
解决方法
默认情况下,sql Server在主键上群集表.聚簇索引键用于标识行,因此它作为包含列添加到每个其他索引.密钥越宽,每个二级索引越大.
更糟糕的是,如果二级索引未明确定义为UNIQUE,则聚簇索引键自动成为每个索引键的一部分.这通常适用于大多数索引,因为通常仅在要求强制实施唯一性时才将索引声明为唯一.
因此,如果问题是,自然与代理聚集指数,代理几乎总是会赢.
另一方面,您将该代理列添加到表中,使表本身更大.这将导致聚集索引扫描变得更加昂贵.因此,如果您只有非常少的二级索引,并且您的工作负载需要经常查看所有(或大多数)行,那么实际上可能更好的是使用自然键来节省那些额外的字节.
最后,自然键通常可以更容易理解数据模型.在使用更多存储空间的同时,自然主键会导致自然外键,从而增加本地信息密度.
因此,正如数据库世界中经常出现的那样,真正的答案是“它取决于”.并且 – 始终使用真实数据在您自己的环境中进行测试.