我真正的问题与记录哪些大量的反病毒产品同意给定的样本是给定的反病毒家族的成员有关.该数据库有数百万个样本,每个样本都有数十种反病毒产品投票.我想问一个问题,如“对于包含名称’XYZ’的恶意软件,哪个样本得票最多,哪些供应商投票支持?”得到如下结果:
"BadBadVirus" V1 V2 V3 V4 V5 V6 V7 Sample 1 - 4 votes 1 0 1 0 0 1 1 Sample 2 - 5 votes 1 0 1 0 1 1 1 Sample 3 - 5 votes 1 0 1 0 1 1 1 total 14 3 3 2 3 3
可能会用来告诉我供应商2和供应商4或者不知道如何
检测这种恶意软件,或者将它们命名为不同的恶意软件.
我会尝试略微概括我的问题,同时希望不会破坏你帮助我的能力.假设我有五个选民(Alex,Bob,Carol,Dave,Ed)被要求查看五张照片(P1,P2,P3,P4,P5)并决定照片的“主要主题”是什么.对于我们的例子,我们只假设它们仅限于“猫”,“狗”或“马”.不是每个选民都对每件事都投票.
数据以这种形式存在于数据库中:
Photo,Voter,Decision (1,'Alex','Cat') (1,'Bob','Dog') (1,'Carol','Dave','Ed','Cat') (2,'Dog') (2,'Dog') (3,'Horse') (3,'Horse') (4,'Cat') (4,'Horse') (5,'Dog') (5,'Cat') (5,'Cat')
目标是,鉴于我们正在寻找的照片主题,我们想知道有多少选民认为这是该照片的主要观点,但也列出了哪些选民认为.
Query for: "Cat" Total Alex Bob Carol Dave Ed 1 - 4 1 0 1 1 1 2 - 3 1 0 1 1 0 3 - 0 0 0 0 0 0 4 - 1 0 0 1 0 0 5 - 4 0 1 1 1 1 ------------------------------------ total 12 2 1 4 3 2 Query for: "Dog" Total Alex Bob Carol Dave Ed 1 - 1 0 1 0 0 0 2 - 2 0 1 0 0 1 3 - 1 0 0 1 0 0 4 - 0 0 0 0 0 0 5 - 1 1 0 0 0 0 ------------------------------------ total 5 1 2 1 0 1
这是我可以用我存储的格式的数据做的事情吗?
我很难得到一个查询来做到这一点 – 虽然它很简单,可以将数据转储出来,然后编写一个程序来做到这一点,我真的希望能够在数据库中做到这一点,如果可以的话.
谢谢你的任何建议.
解决方法
create table vote (Photo integer,Voter text,Decision text); insert into vote values (1,'Cat'),(1,'Dog'),(2,(3,'Horse'),(4,(5,'Cat') ;
对猫的查询:
select photo,alex + bob + carol + dave + ed as Total,alex,bob,carol,dave,ed from crosstab($$ select photo,voter,case decision when 'Cat' then 1 else 0 end from vote order by photo $$,' select distinct voter from vote order by voter ' ) as ( photo integer,Alex integer,Bob integer,Carol integer,Dave integer,Ed integer ); photo | total | alex | bob | carol | dave | ed -------+-------+------+-----+-------+------+---- 1 | 4 | 1 | 0 | 1 | 1 | 1 2 | 3 | 1 | 0 | 1 | 1 | 0 3 | 0 | 0 | 0 | 0 | 0 | 0 4 | 1 | 0 | 0 | 1 | 0 | 0 5 | 4 | 0 | 1 | 1 | 1 | 1
如果选民人数众多或未知,那么可以动态完成:
do $do$ declare voter_list text; r record; begin drop table if exists pivot; voter_list := ( select string_agg(distinct voter,' ' order by voter) from vote ); execute(format(' create table pivot ( decision text,photo integer,Total integer,%1$s )',(replace(voter_list,' ',' integer,') || ' integer') )); for r in select distinct decision from vote loop execute (format($f$ insert into pivot select %3$L as decision,photo,%1$s as Total,%2$s from crosstab($ct$ select photo,case decision when %3$L then 1 else 0 end from vote order by photo $ct$,$ct$ select distinct voter from vote order by voter $ct$ ) as ( photo integer,%4$s );$f$,replace(voter_list,' + '),','),r.decision,') || ' integer' )); end loop; end; $do$;
上面的代码创建了表pivot以及所有决策:
select * from pivot where decision = 'Cat';