使用sql查找现有整数子集中不存在的范围内的下一个可用整数

前端之家收集整理的这篇文章主要介绍了使用sql查找现有整数子集中不存在的范围内的下一个可用整数前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。
问题陈述:

given a range x -> y of unsigned integers
where x and y are both in the range 0 -> 2n
and n is 0 -> 32 (or 64 in alternate cases)
find the minimum available value
not equal to x or y
that is not in an existing set
where existing sets are arbitrary subsets of x -> y

我正在使用数据库中的IPv4和IPv6子网建模.每个子网由起始地址和结束地址定义(我通过业务规则确保范围的完整性).因为IPv6太大,无法存储在bigint数据类型中,我们将IP地址存储为二进制(4)或二进制(16).

相关数据存储在子网,dhcp_range和ip_address表中:

>子网:
子网范围由一个起始和终止的IP地址定义,并存储在子网表中.子网范围总是大小为2n(根据CIDR /网络掩码的定义).
> IP:
子网具有0 .. * IP地址存储在ip_address表中. IP地址必须位于开始和结束地址之间,但不等于其关联子网定义的范围.
> DHCP范围:
子网具有0 .. * DHCP范围存储在dhcp_range表中.类似于子网,每个DHCP范围定义一个起始和终止地址. DHCP范围由关联的子网范围限制. DHCP范围不重叠.

我想确定的是子网的下一个可用IP:

>尚未分配(不在IP地址表中)
>不在DHCP范围内
>而不等于子网范围的开始或结束地址.

我正在寻找一个找到最小可用地址或所有可用地址的解决方案.

我最初的想法是生成由子网范围限制的可能地址(数字)的范围,然后根据使用的集合删除地址:

declare @subnet_sk int = 42

;with
address_range as (
    select cast(ipv4_begin as bigint) as available_address,cast(ipv4_end as bigint) as end_address,subnet_sk
      from subnet s
     where subnet_sk = @subnet_sk

    union all

    select available_address + 1,end_address,subnet_sk
      from address_range
     where available_address + 1 <= end_address
),assigned_addresses as (
    select ip.[address],subnet_sk
      from ip_address ip
     where ip.subnet_sk = @subnet_sk
       and ip.address_family = 'InterNetwork'),dhcp_ranges as (
    select dhcp.begin_address,dhcp.end_address,subnet_sk
      from dhcp_range dhcp
     where dhcp.subnet_sk = @subnet_sk
       and dhcp.address_family = 'InterNetwork')
select distinct ar.available_address
  from address_range ar
       join dhcp_ranges dhcp
         on ar.available_address
            not between dhcp.begin_address
                    and dhcp.end_address
       left join assigned_addresses aa
         on ar.available_address = aa.[address]
       join subnet s
         on ar.available_address != s.ipv4_begin
        and ar.available_address != s.ipv4_end
 where aa.[address] is null
   and s.subnet_sk = @subnet_sk
order by available_address
option (MAXRECURSION 32767)

上述查询使用递归CTE,不适用于所有数据排列.递归CTE是麻烦的,因为它被限制在32,767的最大大小(比潜在的范围大小小得多),并且具有非常慢的非常真实的可能性.我可能会用递归CTE来解决我的问题,但在以下情况下查询失败:

>当没有分配IP地址或DHCP范围时:它什么都不返回
应返回由子网范围定义的所有IP地址
>分配多个DHCP范围时:返回DHCP范围内的IP

为了帮助解决问题,我创建了一个具有三个子网的SQL Fiddle;每个具有不同的特征:切碎,空或大部分连续.上述查询和小提琴中的设置都适用于大多数连续的子网,但对其他子网失败.还有一个GitHub Gist of the schema and example data.

我已经努力用递归和堆栈的CTE产生数字序列,但如上所述,我们担心它们的性能不佳,在递归CTE的情况下人为限制. Aaron Bertrand在他的系列Generate a set or sequence without loops中详细列出了CTE的一些替代方案.令人遗憾的是,数据集对于数字表来说太大,因为为IPv4地址空间创建一个只需要32 GB的磁盘空间(sql Server存储bigint values in 8 bytes).我宁愿在飞行中生成序列,但是没有提出一个很好的方法.

或者,我试图通过查看我知道要使用的地址来种子我的查询

declare @subnet_sk int = 1

select unassigned_range.*
  from (select cast(l.address as bigint) + 1 as start,min(cast(fr.address as bigint)) - 1 as stop
          from ip_address as l
               left join ip_address as r on l.address = r.address - 1
               left join ip_address as fr on l.address < fr.address
         where r.address is null and fr.address is not null
           and l.subnet_sk = @subnet_sk
        group by l.address,r.address) as unassigned_range
       join dhcp_range dhcp
         on unassigned_range.start
            not between cast(dhcp.begin_address as bigint)
                and cast(dhcp.end_address as bigint)
        and unassigned_range.stop
            not between cast(dhcp.begin_address as bigint)
                and cast(dhcp.end_address as bigint)
 where dhcp.subnet_sk = @subnet_sk

遗憾的是,当ip_address或dhcp_range表中没有任何内容时,上述查询不起作用.更糟糕的是,由于它不知道子网范围的范围,dhcp_range向子网范围的上限将人为地限制返回的值,因为查询不能从边缘的空白空间返回行.表现也不突出.

使用sql或Tsql如何确定在其他范围限制的任意整数范围内的下一个最小可用整数值?

解决方法

在这种情况下,不需要递归,因为我们有LEAD功能.

我会在“差距”和“岛屿”方面考虑这个问题.

我将首先关注IPv4,因为它更容易与它们进行算术运算,但IPv6的想法是一样的,最后我将显示一个通用的解决方案.

首先,我们有一个完整的可能的IP:从0x00000000到0xFFFFFFFF.

在此范围内,dhcp_range:dhcp_range.begin_address,dhcp_range.end_address中的范围(包括)定义的“岛”.您可以将分配的IP地址列表视为另一组岛屿,每个岛屿有一个元素:ip_address.address,ip_address.address.最后,子网本身是两个岛:0x00000000,subnet.ipv4_begin和subnet.ipv4_end,0xFFFFFFFF.

我们知道这些岛屿不重叠,这使我们的生活更加轻松.群岛可以完全相邻.例如,当您连续分配IP地址很少时,它们之间的差距为零.
在所有这些岛中,我们需要找到第一间隙,其具有至少一个元件,即非零间隙,即下一个岛在先前的岛结束之后的某个距离处开始.

所以,我们将把所有的岛屿放在一起使用UNION(CTE_Islands),然后按照end_address(或begin_address,使用具有索引的字段)遍历所有的岛屿,并使用LEAD来查看起始地址的下一个岛屿.最后,我们将有一张表,每行都有当前岛的end_address和下一个岛(CTE_Diff)的begin_address.如果它们之间的差异不止一个,则意味着“差距”足够宽,我们将返回当前岛的end_address加1.

给定子网的第一个可用IP地址

DECLARE @ParamSubnet_sk int = 1;

WITH
CTE_Islands
AS
(
    SELECT CAST(begin_address AS bigint) AS begin_address,CAST(end_address AS bigint) AS end_address
    FROM dhcp_range
    WHERE subnet_sk = @ParamSubnet_sk

    UNION ALL

    SELECT CAST(address AS bigint) AS begin_address,CAST(address AS bigint) AS end_address
    FROM ip_address
    WHERE subnet_sk = @ParamSubnet_sk

    UNION ALL

    SELECT CAST(0x00000000 AS bigint) AS begin_address,CAST(ipv4_begin AS bigint) AS end_address
    FROM subnet
    WHERE subnet_sk = @ParamSubnet_sk

    UNION ALL

    SELECT CAST(ipv4_end AS bigint) AS begin_address,CAST(0xFFFFFFFF AS bigint) AS end_address
    FROM subnet
    WHERE subnet_sk = @ParamSubnet_sk
),CTE_Diff
AS
(
    SELECT
        begin_address,end_address
        --,LEAD(begin_address) OVER(ORDER BY end_address) AS BeginNextIsland,LEAD(begin_address) OVER(ORDER BY end_address) - end_address AS Diff
    FROM CTE_Islands
)
SELECT TOP(1)
    CAST(end_address + 1 AS varbinary(4)) AS NextAvailableIPAddress
FROM CTE_Diff
WHERE Diff > 1
ORDER BY end_address;

如果至少有一个IP地址可用,结果集将包含一行,如果没有IP地址可用,则不会包含行.

For parameter 1 result is `0xAC101129`.
For parameter 2 result is `0xC0A81B1F`.
For parameter 3 result is `0xC0A8160C`.

这是到SQLFiddle的一个链接.它没有使用参数,所以我硬编码1在那里.将其更改为UNION到其他子网ID(2或3)以尝试其他子网.此外,它没有正确显示varbinary的结果,所以我把它作为bigint.使用Windows计算器将其转换为十六进制来验证结果.

如果您没有将结果限制为TOP(1)的第一个差距,您将获得所有可用IP范围(间隙)的列表.

列出给定子网的所有可用IP地址范围

DECLARE @ParamSubnet_sk int = 1;

WITH
CTE_Islands
AS
(
    SELECT CAST(begin_address AS bigint) AS begin_address,LEAD(begin_address) OVER(ORDER BY end_address) - end_address AS Diff
    FROM CTE_Islands
)
SELECT
    CAST(end_address + 1 AS varbinary(4)) AS begin_range_AvailableIPAddress,CAST(BeginNextIsland - 1 AS varbinary(4)) AS end_range_AvailableIPAddress
FROM CTE_Diff
WHERE Diff > 1
ORDER BY end_address;

结果. SQL Fiddle结果是简单的bigint,不是十六进制,并带有硬编码的参数ID.

Result set for ID = 1
begin_range_AvailableIPAddress    end_range_AvailableIPAddress
0xAC101129                        0xAC10112E

Result set for ID = 2
begin_range_AvailableIPAddress    end_range_AvailableIPAddress
0xC0A81B1F                        0xC0A81B1F
0xC0A81B22                        0xC0A81B28
0xC0A81BFA                        0xC0A81BFE

Result set for ID = 3
begin_range_AvailableIPAddress    end_range_AvailableIPAddress
0xC0A8160C                        0xC0A8160C
0xC0A816FE                        0xC0A816FE

每个子网的第一个可用IP地址

很容易扩展查询并返回所有子网的第一个可用IP地址,而不是指定一个特定的子网.使用CROSS APPLY获取每个子网的岛列表,然后将PARTITION BY subnet_sk添加到LEAD函数中.

WITH
CTE_Islands
AS
(
    SELECT
        subnet_sk,begin_address,end_address
    FROM
        subnet AS Main
        CROSS APPLY
        (
            SELECT CAST(begin_address AS bigint) AS begin_address,CAST(end_address AS bigint) AS end_address
            FROM dhcp_range
            WHERE dhcp_range.subnet_sk = Main.subnet_sk

            UNION ALL

            SELECT CAST(address AS bigint) AS begin_address,CAST(address AS bigint) AS end_address
            FROM ip_address
            WHERE ip_address.subnet_sk = Main.subnet_sk

            UNION ALL

            SELECT CAST(0x00000000 AS bigint) AS begin_address,CAST(ipv4_begin AS bigint) AS end_address
            FROM subnet
            WHERE subnet.subnet_sk = Main.subnet_sk

            UNION ALL

            SELECT CAST(ipv4_end AS bigint) AS begin_address,CAST(0xFFFFFFFF AS bigint) AS end_address
            FROM subnet
            WHERE subnet.subnet_sk = Main.subnet_sk
        ) AS CA
),CTE_Diff
AS
(
    SELECT
        subnet_sk,LEAD(begin_address) OVER(PARTITION BY subnet_sk ORDER BY end_address) - end_address AS Diff
    FROM CTE_Islands
)
SELECT
    subnet_sk,CAST(MIN(end_address) + 1 as varbinary(4)) AS NextAvailableIPAddress
FROM CTE_Diff
WHERE Diff > 1
GROUP BY subnet_sk

结果集

subnet_sk    NextAvailableIPAddress
1            0xAC101129
2            0xC0A81B1F
3            0xC0A8160C

这是SQLFiddle.我不得不在sql Fiddle中删除转换为varbinary,因为它显示结果不正确.

用于IPv4和IPv6的通用解决方

所有子网的所有可用IP地址范围

SQL Fiddle with sample IPv4 and IPv6 data,functions and final query

您的IPv6示例数据不太正确 – 子网结束0xFC00000000000000FFFFFFFFFFFFFFFF小于您的dhcp范围,所以我改为0xFC0001066800000000000000FFFFFFFF.此外,您在同一个子网中同时拥有IPv4和IPv6,这是非常麻烦的.为了这个例子,我已经改变了你的架构 – 而不是在子网中有明确的ipv4_begin / end和ipv6_begin / end,所以我将ip_begin / end作为varbinary(16)(与你的其他表相同).我也删除了address_family,否则它对于sql小提琴来说太大了.

算术函数

为了使其适用于IPv6,我们需要弄清楚如何从/二进制中加1 /(16).我会为它做CLR功能.如果您不允许启用CLR,则可以通过标准T-sql进行.我做了两个函数,返回一个表,而不是标量,因为这样,它们可以被优化器内联.我想制作一个通用的解决方案,因此该函数将接受varbinary(16)并且适用于IPv4和IPv6.

这里是T-sql函数将varbinary(16)增加一个.如果参数不是16字节长,我假设它是IPv4,并且只需将其转换为bigint来添加1,然后返回到二进制.否则,我将二进制(16)分成8个字节长的两个部分,并将它们转换成bigint. bigint被签署,但是我们需要无符号增量,所以我们需要检查几个例子.

其他部分是最常见的 – 我们只是逐渐增加一部分,并将结果追加到原来的高部分.

如果低部分是0xFFFFFFFFFFFFFFFF,那么我们将低部分设置为0x0000000000000000并携带该标志,即将高部分递增1.

如果低部分是0x7FFFFFFFFFFFFFFF,那么我们将低部分设置为0x8000000000000000,因为尝试增加这个bigint值会导致溢出.

如果整数为0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,我们将结果设置为0x00000000000000000000000000000000.

一个递减的函数是相似的.

CREATE FUNCTION [dbo].[BinaryInc](@src varbinary(16))
RETURNS TABLE AS
RETURN
    SELECT
    CASE WHEN DATALENGTH(@src) = 16
    THEN
        -- Increment IPv6 by splitting it into two bigints 8 bytes each and then concatenating them
        CASE
        WHEN @src = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
        THEN 0x00000000000000000000000000000000

        WHEN SUBSTRING(@src,9,8) = 0x7FFFFFFFFFFFFFFF
        THEN SUBSTRING(@src,1,8) + 0x8000000000000000

        WHEN SUBSTRING(@src,8) = 0xFFFFFFFFFFFFFFFF
        THEN CAST(CAST(SUBSTRING(@src,8) AS bigint) + 1 AS binary(8)) + 0x0000000000000000

        ELSE SUBSTRING(@src,8) + CAST(CAST(SUBSTRING(@src,8) AS bigint) + 1 AS binary(8))
        END
    ELSE
        -- Increment IPv4 by converting it into 8 byte bigint and then back into 4 bytes binary
        CAST(CAST(CAST(@src AS bigint) + 1 AS binary(4)) AS varbinary(16))
    END AS Result
    ;
GO

CREATE FUNCTION [dbo].[BinaryDec](@src varbinary(16))
RETURNS TABLE AS
RETURN
    SELECT
    CASE WHEN DATALENGTH(@src) = 16
    THEN
        -- Decrement IPv6 by splitting it into two bigints 8 bytes each and then concatenating them
        CASE
        WHEN @src = 0x00000000000000000000000000000000
        THEN 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

        WHEN SUBSTRING(@src,8) = 0x8000000000000000
        THEN SUBSTRING(@src,8) + 0x7FFFFFFFFFFFFFFF

        WHEN SUBSTRING(@src,8) = 0x0000000000000000
        THEN CAST(CAST(SUBSTRING(@src,8) AS bigint) - 1 AS binary(8)) + 0xFFFFFFFFFFFFFFFF

        ELSE SUBSTRING(@src,8) AS bigint) - 1 AS binary(8))
        END
    ELSE
        -- Decrement IPv4 by converting it into 8 byte bigint and then back into 4 bytes binary
        CAST(CAST(CAST(@src AS bigint) - 1 AS binary(4)) AS varbinary(16))
    END AS Result
    ;
GO

所有子网的所有可用IP地址范围

WITH
CTE_Islands
AS
(
    SELECT subnet_sk,end_address
    FROM dhcp_range

    UNION ALL

    SELECT subnet_sk,address AS begin_address,address AS end_address
    FROM ip_address

    UNION ALL

    SELECT subnet_sk,SUBSTRING(0x00000000000000000000000000000000,DATALENGTH(ip_begin)) AS begin_address,ip_begin AS end_address
    FROM subnet

    UNION ALL

    SELECT subnet_sk,ip_end AS begin_address,SUBSTRING(0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,DATALENGTH(ip_end)) AS end_address
    FROM subnet
),CTE_Gaps
AS
(
    SELECT
        subnet_sk,end_address AS EndThisIsland,LEAD(begin_address) OVER(PARTITION BY subnet_sk ORDER BY end_address) AS BeginNextIsland
    FROM CTE_Islands
),CTE_GapsIncDec
AS
(
    SELECT
        subnet_sk,EndThisIsland,EndThisIslandInc,BeginNextIslandDec,BeginNextIsland
    FROM CTE_Gaps
        CROSS APPLY
        (
            SELECT bi.Result AS EndThisIslandInc
            FROM dbo.BinaryInc(EndThisIsland) AS bi
        ) AS CA_Inc
        CROSS APPLY
        (
            SELECT bd.Result AS BeginNextIslandDec
            FROM dbo.BinaryDec(BeginNextIsland) AS bd
        ) AS CA_Dec
)
SELECT
    subnet_sk,EndThisIslandInc AS begin_range_AvailableIPAddress,BeginNextIslandDec AS end_range_AvailableIPAddress
FROM CTE_GapsIncDec
WHERE CTE_GapsIncDec.EndThisIslandInc <> BeginNextIsland
ORDER BY subnet_sk,EndThisIsland;

结果集

subnet_sk    begin_range_AvailableIPAddress        end_range_AvailableIPAddress
1            0xAC101129                            0xAC10112E
2            0xC0A81B1F                            0xC0A81B1F
2            0xC0A81B22                            0xC0A81B28
2            0xC0A81BFA                            0xC0A81BFE
3            0xC0A8160C                            0xC0A8160C
3            0xC0A816FE                            0xC0A816FE
4            0xFC000000000000000000000000000001    0xFC0000000000000000000000000000FF
4            0xFC000000000000000000000000000101    0xFC0000000000000000000000000001FF
4            0xFC000000000000000000000000000201    0xFC0000000000000000000000000002FF
4            0xFC000000000000000000000000000301    0xFC0000000000000000000000000003FF
4            0xFC000000000000000000000000000401    0xFC0000000000000000000000000004FF
4            0xFC000000000000000000000000000501    0xFC0000000000000000000000000005FF
4            0xFC000000000000000000000000000601    0xFC0000000000000000000000000006FF
4            0xFC000000000000000000000000000701    0xFC0000000000000000000000000007FF
4            0xFC000000000000000000000000000801    0xFC0000000000000000000000000008FF
4            0xFC000000000000000000000000000901    0xFC00000000000000BFFFFFFFFFFFFFFD
4            0xFC00000000000000BFFFFFFFFFFFFFFF    0xFC00000000000000CFFFFFFFFFFFFFFD
4            0xFC00000000000000CFFFFFFFFFFFFFFF    0xFC00000000000000FBFFFFFFFFFFFFFD
4            0xFC00000000000000FBFFFFFFFFFFFFFF    0xFC00000000000000FCFFFFFFFFFFFFFD
4            0xFC00000000000000FCFFFFFFFFFFFFFF    0xFC00000000000000FFBFFFFFFFFFFFFD
4            0xFC00000000000000FFBFFFFFFFFFFFFF    0xFC00000000000000FFCFFFFFFFFFFFFD
4            0xFC00000000000000FFCFFFFFFFFFFFFF    0xFC00000000000000FFFBFFFFFFFFFFFD
4            0xFC00000000000000FFFBFFFFFFFFFFFF    0xFC00000000000000FFFCFFFFFFFFFFFD
4            0xFC00000000000000FFFCFFFFFFFFFFFF    0xFC00000000000000FFFFBFFFFFFFFFFD
4            0xFC00000000000000FFFFBFFFFFFFFFFF    0xFC00000000000000FFFFCFFFFFFFFFFD
4            0xFC00000000000000FFFFCFFFFFFFFFFF    0xFC00000000000000FFFFFBFFFFFFFFFD
4            0xFC00000000000000FFFFFBFFFFFFFFFF    0xFC00000000000000FFFFFCFFFFFFFFFD
4            0xFC00000000000000FFFFFCFFFFFFFFFF    0xFC00000000000000FFFFFFBFFFFFFFFD
4            0xFC00000000000000FFFFFFBFFFFFFFFF    0xFC00000000000000FFFFFFCFFFFFFFFD
4            0xFC00000000000000FFFFFFCFFFFFFFFF    0xFC00000000000000FFFFFFFBFFFFFFFD
4            0xFC00000000000000FFFFFFFBFFFFFFFF    0xFC00000000000000FFFFFFFCFFFFFFFD
4            0xFC00000000000000FFFFFFFCFFFFFFFF    0xFC00000000000000FFFFFFFFBFFFFFFD
4            0xFC00000000000000FFFFFFFFBFFFFFFF    0xFC00000000000000FFFFFFFFCFFFFFFD
4            0xFC00000000000000FFFFFFFFCFFFFFFF    0xFC00000000000000FFFFFFFFFBFFFFFD
4            0xFC00000000000000FFFFFFFFFBFFFFFF    0xFC00000000000000FFFFFFFFFCFFFFFD
4            0xFC00000000000000FFFFFFFFFCFFFFFF    0xFC00000000000000FFFFFFFFFFBFFFFD
4            0xFC00000000000000FFFFFFFFFFBFFFFF    0xFC00000000000000FFFFFFFFFFCFFFFD
4            0xFC00000000000000FFFFFFFFFFCFFFFF    0xFC00000000000000FFFFFFFFFFFBFFFD
4            0xFC00000000000000FFFFFFFFFFFBFFFF    0xFC00000000000000FFFFFFFFFFFCFFFD
4            0xFC00000000000000FFFFFFFFFFFCFFFF    0xFC00000000000000FFFFFFFFFFFFBFFD
4            0xFC00000000000000FFFFFFFFFFFFBFFF    0xFC00000000000000FFFFFFFFFFFFCFFD
4            0xFC00000000000000FFFFFFFFFFFFCFFF    0xFC00000000000000FFFFFFFFFFFFFBFD
4            0xFC00000000000000FFFFFFFFFFFFFBFF    0xFC00000000000000FFFFFFFFFFFFFCFD
4            0xFC00000000000000FFFFFFFFFFFFFCFF    0xFC00000000000000FFFFFFFFFFFFFFBD
4            0xFC00000000000000FFFFFFFFFFFFFFBF    0xFC00000000000000FFFFFFFFFFFFFFCD
4            0xFC00000000000000FFFFFFFFFFFFFFCF    0xFC0001065FFFFFFFFFFFFFFFFFFFFFFF
4            0xFC000106600000000000000100000000    0xFC00010666FFFFFFFFFFFFFFFFFFFFFF
4            0xFC000106670000000000000100000000    0xFC000106677FFFFFFFFFFFFFFFFFFFFF
4            0xFC000106678000000000000100000000    0xFC000106678FFFFFFFFFFFFFFFFFFFFF
4            0xFC000106679000000000000100000000    0xFC0001066800000000000000FFFFFFFE

执行计划

我很好奇,看到这里有不同的解决方案在这里工作,所以我看了他们的执行计划.请记住,这些计划适用于没有任何索引的小样本数据集.

我的IPv4和IPv6的通用解决方案:

类似的解决方案dnoeth:

没有使用LEAD功能的cha的解决方案:

猜你在找的MsSQL相关文章