Codd's Connection Trap and Oracle's JOIN TO ONE
Codd’s Connection Trap and Oracle’s JOIN TO ONE
Codd 的连接陷阱与 Oracle 的 JOIN TO ONE
In a previous post, I explored Codd’s connection trap in PostgreSQL and MongoDB — the classic pitfall where joining two independent many-to-many relationships through a shared attribute produces spurious combinations that look like facts but aren’t. The example followed Codd’s 1970 suppliers–parts–projects model: we know which suppliers supply which parts, and which projects use which parts, but joining through parts to derive supplier–project relationships is a relational composition — it tells us what could be true, not what is true.
在之前的一篇文章中,我探讨了 PostgreSQL 和 MongoDB 中的 Codd 连接陷阱——这是一个经典的陷阱,即通过共享属性连接两个独立的多对多关系时,会产生看起来像事实但实际上并非如此的虚假组合。该示例遵循了 Codd 1970 年提出的“供应商-零件-项目”模型:我们知道哪些供应商供应哪些零件,以及哪些项目使用哪些零件,但通过零件进行连接以推导出“供应商-项目”关系属于关系组合——它告诉我们的是“可能为真”的情况,而不是“事实为真”的情况。
Oracle Database 26ai introduces JOIN TO ONE, a SQL extension that structurally prevents this class of errors. In this post, I’ll reproduce Codd’s connection trap in Oracle, show how JOIN TO ONE catches it, and demonstrate the correct solutions.
Oracle Database 26ai 引入了 JOIN TO ONE,这是一种从结构上防止此类错误的 SQL 扩展。在本文中,我将在 Oracle 中重现 Codd 的连接陷阱,展示 JOIN TO ONE 如何捕获它,并演示正确的解决方案。
Why This Matters: A Gap in SQL joins
为什么这很重要:SQL 连接中的缺失环节
When developers build joins at the application level — fetching a parent row for a given foreign key in PL/SQL or application code — they naturally get safety checks: TOO_MANY_ROWS tells them a lookup that should have been unique returned multiple rows, and NO_DATA_FOUND tells them the expected parent doesn’t exist. These exceptions act as guardrails, catching data or logic errors immediately.
当开发人员在应用层构建连接时(例如在 PL/SQL 或应用程序代码中根据外键获取父行),他们自然会获得安全检查:TOO_MANY_ROWS 会告诉他们本应唯一的查找返回了多行,而 NO_DATA_FOUND 则会告诉他们预期的父行不存在。这些异常充当了护栏,能够立即捕获数据或逻辑错误。
But when the same logic moves into a SQL JOIN, those guardrails disappear. A join that silently matches multiple rows simply multiplies the result set — no error, no warning, just quietly wrong numbers. A join that finds no match either drops the row (inner join) or pads it with NULLs (outer join), but never raises an alarm about violated assumptions.
但当同样的逻辑转移到 SQL JOIN 中时,这些护栏就消失了。一个静默匹配多行的连接只会导致结果集倍增——没有错误,没有警告,只有悄无声息的错误数据。一个找不到匹配项的连接要么丢弃该行(内连接),要么用 NULL 填充(外连接),但永远不会对违反假设的行为发出警报。
JOIN TO ONE bridges this gap. It brings the equivalent of TOO_MANY_ROWS protection into SQL joins: if a join that you declared as “to one” ever reaches a second row, Oracle raises a runtime error instead of silently corrupting your results. The default outer-join behavior handles the “zero matches” case gracefully (like a NO_DATA_FOUND that returns NULL columns instead of erroring), and you can override it to INNER JOIN TO ONE when the absence of a match should eliminate the row.
JOIN TO ONE 弥补了这一差距。它将等同于 TOO_MANY_ROWS 的保护机制引入了 SQL 连接:如果你声明为“一对一”的连接匹配到了第二行,Oracle 会抛出运行时错误,而不是静默地破坏你的结果。默认的外连接行为可以优雅地处理“零匹配”的情况(类似于返回 NULL 列而不是报错的 NO_DATA_FOUND),当缺少匹配项应排除该行时,你可以将其覆盖为 INNER JOIN TO ONE。
A note on naming: JOIN TO ONE is semantically JOIN TO ZERO OR ONE AND ONLY ONE (for the default outer case) or JOIN TO ONE AND ONLY ONE (for the inner case). SQL has never been shy about verbosity, so a more precise name might have been warranted.
关于命名的说明:JOIN TO ONE 在语义上是“连接到零个或一个且仅一个”(针对默认的外连接情况)或“连接到一个且仅一个”(针对内连接情况)。SQL 从来不避讳冗长,因此一个更精确的名称或许会更好。
Schema & Sample Data
架构与示例数据
Following Codd’s example, and the previous blog post, we have suppliers, parts, projects, and two independent many-to-many relationships — now with quantities to make the consequences of the trap concrete:
遵循 Codd 的示例以及之前的博客文章,我们拥有供应商、零件、项目以及两个独立的多对多关系——现在加上了数量,以便使陷阱的后果具体化:
CREATE TABLE suppliers ( supplier_id VARCHAR2(10) PRIMARY KEY );
CREATE TABLE parts ( part_id VARCHAR2(10) PRIMARY KEY );
CREATE TABLE projects ( project_id VARCHAR2(10) PRIMARY KEY );
-- Supplier supplies parts
CREATE TABLE supplier_part (
supplier_id VARCHAR2(10) REFERENCES suppliers,
part_id VARCHAR2(10) REFERENCES parts,
qty_available INT NOT NULL,
PRIMARY KEY (supplier_id, part_id)
);
-- Project uses parts
CREATE TABLE project_part (
project_id VARCHAR2(10) REFERENCES projects,
part_id VARCHAR2(10) REFERENCES parts,
qty_needed INT NOT NULL,
PRIMARY KEY (project_id, part_id)
);
(Sample data insertion omitted for brevity)
(为简洁起见,省略了示例数据插入部分)
The Connection Trap in Action
连接陷阱实战
A developer wants to know which suppliers are connected to which projects and executes the following query:
开发人员想要了解哪些供应商与哪些项目相关联,于是执行了以下查询:
SELECT sp.supplier_id, pp.project_id, sp.part_id, sp.qty_available, pp.qty_needed
FROM supplier_part sp
JOIN project_part pp ON sp.part_id = pp.part_id
ORDER BY sp.supplier_id, sp.part_id;
6 rows selected. 6 rows from only 4 supplier-part rows and 4 project-part rows. The query asserts, for example, “S2 supplies P2 to Alpha” — but our data only says S2 can supply P2 and Alpha needs P2. The join inferred relationships through the shared attribute part_id that were never recorded as facts. As Codd warned in his 1970 paper, this is exactly the connection trap: deriving relationships that were never asserted.
查询选择了 6 行。仅从 4 行供应商-零件数据和 4 行项目-零件数据中就产生了 6 行结果。例如,该查询断言“S2 向 Alpha 供应 P2”——但我们的数据仅表明 S2 可以供应 P2,且 Alpha 需要 P2。该连接通过共享属性 part_id 推断出了从未被记录为事实的关系。正如 Codd 在其 1970 年的论文中所警告的那样,这正是连接陷阱:推导出了从未被断言过的关系。
The Damage with Aggregates
聚合带来的损害
Now the developer summarizes: 现在开发人员进行汇总:
SELECT sp.supplier_id, SUM(sp.qty_available) AS total_available, SUM(pp.qty_needed) AS total_needed
FROM supplier_part sp
JOIN project_part pp ON sp.part_id = pp.part_id
GROUP BY sp.supplier_id
ORDER BY sp.supplier_id;
(Comparison with actual totals shows the inflated, incorrect numbers caused by the join.)
(与实际总计的对比显示了由连接导致的虚高且错误的数字。)