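Setup
Here is a simple representation of my schema:
A few things to note:
> Users belong to teams through memberships.
> Teams can have many collections, apps, and webhooks.
> Collections can also have many webhooks.
> Webhooks can belong to either a team or a collection, but only one.
> Events can belong to any object, but only one.
This seems like a fairly basic setup for most SaaS-type companies (e.g. Slack or Stripe). Everything is "owned" by a team, but users belong to teams and interact with the interface.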
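Since the notes above describe the schema only in prose, here is a rough DDL sketch of what it might look like. Table and column names follow the queries further down; the column types, foreign keys, and CHECK constraints are assumptions for illustration, and num_nonnulls is PostgreSQL-specific (9.6 and later).

```sql
-- Hypothetical schema sketch: types and constraints are assumptions.
CREATE TABLE users (id integer PRIMARY KEY);
CREATE TABLE teams (id integer PRIMARY KEY);

-- Users belong to teams through memberships.
CREATE TABLE memberships (
    id      integer PRIMARY KEY,
    user_id integer NOT NULL REFERENCES users (id),
    team_id integer NOT NULL REFERENCES teams (id)
);

-- Apps and collections are owned by a team.
CREATE TABLE apps (
    id      integer PRIMARY KEY,
    team_id integer NOT NULL REFERENCES teams (id)
);

CREATE TABLE collections (
    id      integer PRIMARY KEY,
    team_id integer NOT NULL REFERENCES teams (id)
);

-- A webhook belongs to a team or a collection, but only one.
CREATE TABLE webhooks (
    id            integer PRIMARY KEY,
    team_id       integer REFERENCES teams (id),
    collection_id integer REFERENCES collections (id),
    CHECK (num_nonnulls(team_id, collection_id) = 1)
);

-- An event belongs to exactly one object of any kind.
CREATE TABLE events (
    id            integer PRIMARY KEY,
    user_id       integer REFERENCES users (id),
    team_id       integer REFERENCES teams (id),
    membership_id integer REFERENCES memberships (id),
    app_id        integer REFERENCES apps (id),
    collection_id integer REFERENCES collections (id),
    webhook_id    integer REFERENCES webhooks (id),
    CHECK (num_nonnulls(user_id, team_id, membership_id,
                        app_id, collection_id, webhook_id) = 1)
);
```

The two CHECK constraints encode the "only one" rules from the notes; on another database the same rule would need a different expression.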
Problem
Find all of the events that are related (directly or indirectly) to a given user by id.
I can easily write queries that look events up directly, or indirectly via one specific path. For example…
Find all of the events that are directly related to a user by id.
SELECT * FROM events WHERE user_id = ${id}
Or…
Find all of the events that are indirectly related to a user via their teams.
SELECT events.*
FROM events
JOIN memberships ON memberships.team_id = events.team_id
WHERE memberships.user_id = ${id}
Or even…
Find all of the events that are indirectly related to a user via any collections of their teams.
SELECT events.*
FROM events
JOIN collections ON collections.id = events.collection_id
JOIN memberships ON memberships.team_id = collections.team_id
WHERE memberships.user_id = ${id}
Webhooks get more complicated, because they can be related in two different ways…
Find all of the events that are indirectly related to a user via any webhooks of their teams or collections.
SELECT *
FROM events
WHERE webhook_id IN (
    SELECT webhooks.id
    FROM webhooks
    JOIN memberships ON memberships.team_id = webhooks.team_id
    WHERE memberships.user_id = ${id}
)
OR webhook_id IN (
    SELECT webhooks.id
    FROM webhooks
    JOIN collections ON collections.id = webhooks.collection_id
    JOIN memberships ON memberships.team_id = collections.team_id
    WHERE memberships.user_id = ${id}
)
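But as you can see, with all of these paths there are many different ways a user can be related to an event! So when I try to write a query that gets all of the related events, it ends up looking like…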
SELECT *
FROM events
WHERE user_id = ${id}
OR app_id IN (
    SELECT apps.id
    FROM apps
    JOIN memberships ON memberships.team_id = apps.team_id
    WHERE memberships.user_id = ${id}
)
OR collection_id IN (
    SELECT collections.id
    FROM collections
    JOIN memberships ON memberships.team_id = collections.team_id
    WHERE memberships.user_id = ${id}
)
OR membership_id IN (
    SELECT id
    FROM memberships
    WHERE user_id = ${id}
)
OR team_id IN (
    SELECT team_id
    FROM memberships
    WHERE user_id = ${id}
)
OR webhook_id IN (
    SELECT webhooks.id
    FROM webhooks
    JOIN memberships ON memberships.team_id = webhooks.team_id
    WHERE memberships.user_id = ${id}
)
OR webhook_id IN (
    SELECT webhooks.id
    FROM webhooks
    JOIN collections ON collections.id = webhooks.collection_id
    JOIN memberships ON memberships.team_id = collections.team_id
    WHERE memberships.user_id = ${id}
)
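Questions
> Is the final "all-inclusive" query very inefficient?
> Is there a more efficient way to write it?
> Is there a simpler, more readable way to write it?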
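Solution
The best way I can think of to handle this kind of problem (with both maintainability and efficiency in mind) is to use CTEs (common table expressions), which let you create a temporary result set and reuse it throughout the query. A CTE is introduced with the WITH keyword and essentially aliases a result as a table, so that you can JOIN against it multiple times: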
WITH user_memberships AS (
    SELECT *
    FROM memberships
    WHERE user_id = ${id}
),
-- Select only the left table's columns in each CTE below; SELECT * over a
-- join would expose two id columns and make the later "SELECT id" ambiguous.
user_apps AS (
    SELECT apps.*
    FROM apps
    INNER JOIN user_memberships ON user_memberships.team_id = apps.team_id
),
user_collections AS (
    SELECT collections.*
    FROM collections
    INNER JOIN user_memberships ON user_memberships.team_id = collections.team_id
),
-- Webhooks reachable either directly via a team or via one of its collections.
user_webhooks AS (
    SELECT webhooks.*
    FROM webhooks
    LEFT OUTER JOIN user_collections ON user_collections.id = webhooks.collection_id
    INNER JOIN user_memberships
        ON user_memberships.team_id = webhooks.team_id
        OR user_memberships.team_id = user_collections.team_id
)
SELECT events.*
FROM events
WHERE app_id IN (SELECT id FROM user_apps)
OR collection_id IN (SELECT id FROM user_collections)
OR membership_id IN (SELECT id FROM user_memberships)
OR team_id IN (SELECT team_id FROM user_memberships)
OR user_id = ${id}
OR webhook_id IN (SELECT id FROM user_webhooks)
;
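The benefits of doing it this way are:
> Each CTE can take advantage of indexes on its own JOIN predicates and return that subset's results faster, rather than making the query planner work through one long series of complex predicates.
> Each CTE can be maintained on its own, which makes troubleshooting problems with a particular subset easier.
> You are not violating the DRY principle.
> If a CTE has value outside of this query, you can move it into a stored procedure and reference that instead.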
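To make the first point concrete, here is a sketch of the indexes those join predicates and the final IN filters could use. The index names are made up, and whether the planner actually picks them depends on your database, version, and data distribution, so treat this as a starting point to check with your database's EXPLAIN output rather than a guaranteed win.

```sql
-- Hypothetical index names; columns are the ones the CTEs filter and join on.
CREATE INDEX memberships_user_id_idx    ON memberships (user_id);
CREATE INDEX apps_team_id_idx           ON apps (team_id);
CREATE INDEX collections_team_id_idx    ON collections (team_id);
CREATE INDEX webhooks_team_id_idx       ON webhooks (team_id);
CREATE INDEX webhooks_collection_id_idx ON webhooks (collection_id);

-- Columns on events that the final WHERE clause tests.
CREATE INDEX events_user_id_idx       ON events (user_id);
CREATE INDEX events_team_id_idx       ON events (team_id);
CREATE INDEX events_membership_id_idx ON events (membership_id);
CREATE INDEX events_app_id_idx        ON events (app_id);
CREATE INDEX events_collection_id_idx ON events (collection_id);
CREATE INDEX events_webhook_id_idx    ON events (webhook_id);
```

Some of these may already exist if the columns are declared as keys with automatic indexes, in which case there is no need to create them again.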