sql – Vertica和连接

前端之家收集整理的这篇文章主要介绍了sql – Vertica和连接前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。
我正在调整一个Web分析工具,使用Vertica作为数据库.我有优化连接的真正问题.我尝试为我的一些查询创建预先连接的预测,尽管它确实使查询变得更快,但它减慢了数据加载到事实表中以进行爬网.

一个简单的INSERT INTO … SELECT * FROM,我们用来从分段表将数据加载到事实表中,从花费约5秒到花费20分钟.

因此,我删除了所有预先连接的预测,并尝试使用数据库设计器来设计查询特定的预测,但这还不够.即使有这些预测,一个简单的连接也要花费大约14秒钟,预先连接的投影需要1秒钟的时间.

我的问题是:对于预先连接的预测来说,如果这样很慢地插入数据插入是正常的,如果没有,这可能是什么罪魁祸首?如果是正常的,那么这对我们来说是一个阻挡,还有其他的技术可以用来加速连接?

我们在5节点集群上运行Vertica,每个节点具有2 x四核cpu和32 GB内存.我的示例查询中的表分别有188,843,085和25,712,878行.

EXPLAIN输出如下所示:

EXPLAIN SELECT referer_via_.url as referralPageUrl,COUNT(DISTINCT sessio
n.id) as visits FROM owa_session as session JOIN owa_referer AS referer_vi
a_ ON session.referer_id = referer_via_.id WHERE session.yyyymmdd BETWEEN 
'20121123' AND '20121123' AND session.site_id = '49' GROUP BY referer_via_
.url  ORDER BY visits DESC LIMIT 250;

Access Path:
+-SELECT  LIMIT 250 [Cost: 1M,Rows: 250 (STALE STATISTICS)] (PATH ID: 0)
|  Output Only: 250 tuples
|  Execute on: Query Initiator
| +---> SORT [Cost: 1M,Rows: 1 (STALE STATISTICS)] (PATH ID: 1)
| |      Order: count(DISTINCT "session".id) DESC
| |      Output Only: 250 tuples
| |      Execute on: All Nodes
| | +---> GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 1M,Rows: 1 (STALE 
STATISTICS)] (PATH ID: 2)
| | |      Aggregates: count(DISTINCT "session".id)
| | |      Group By: referer_via_.url
| | |      Execute on: All Nodes
| | | +---> GROUPBY HASH (SORT OUTPUT) (RESEGMENT GROUPS) [Cost: 1M,Rows
: 1 (STALE STATISTICS)] (PATH ID: 3)
| | | |      Group By: referer_via_.url,"session".id
| | | |      Execute on: All Nodes
| | | | +---> JOIN HASH [Cost: 1M,Rows: 1 (STALE STATISTICS)] (PATH ID: 
4) Outer (RESEGMENT)
| | | | |      Join Cond: ("session".referer_id = referer_via_.id)
| | | | |      Execute on: All Nodes
| | | | | +-- Outer -> STORAGE ACCESS for session [Cost: 463,Rows: 1 (ST
ALE STATISTICS)] (PUSHED GROUPING) (PATH ID: 5)
| | | | | |      Projection: public.owa_session_projection
| | | | | |      Materialize: "session".id,"session".referer_id
| | | | | |      Filter: ("session".site_id = '49')
| | | | | |      Filter: (("session".yyyymmdd >= 20121123) AND ("session"
.yyyymmdd <= 20121123))
| | | | | |      Execute on: All Nodes
| | | | | +-- Inner -> STORAGE ACCESS for referer_via_ [Cost: 293K,Rows:
26M] (PATH ID: 6)
| | | | | |      Projection: public.owa_referer_DBD_1_seg_Potency_2012112
2_Potency_20121122
| | | | | |      Materialize: referer_via_.id,referer_via_.url
| | | | | |      Execute on: All Nodes

解决方法

加速加入:
– 将会话表设计为“yyyymmdd”列.这将启用分区修剪
– 将列“yyyymmdd”的条件添加到_referer_via_并对其进行分区,如果可能(很可能)
– 将site_id尽可能接近于会话的二次(超)投影中的排序顺序开始

接下来,相应地将referer_id和id两个表分割.

并且在集群节点中有更多的帮助.

原文链接:https://www.f2er.com/mssql/82613.html

猜你在找的MsSQL相关文章