
Fix CTE handling in foreign key join parsing #2

Open: wants to merge 6 commits into base branch jj/fk-joins-6

joelonsql (Owner)

Summary

  • ensure drill_down_to_base_rel can look up CTEs defined inside subqueries
  • thread the Query pointer through recursive calls when descending through joins
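
Why the Query pointer matters: in PostgreSQL a CTE reference is a range table entry of kind `RTE_CTE` that carries `ctename` and `ctelevelsup`, the number of query levels to climb before searching a WITH list. If a recursive walk only knows the outer query, a CTE declared inside a subquery cannot be resolved. The sketch below models this with hypothetical, simplified `QueryLevel`/`Cte` structs and a hypothetical `lookup_cte` helper. It illustrates the idea and is not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for a CTE definition and a query level.
 * Real PostgreSQL nodes (Query, CommonTableExpr) look different. */
typedef struct Cte {
    const char *name;
} Cte;

typedef struct QueryLevel {
    const Cte *ctes;                 /* this level's WITH list */
    size_t ncte;
    const struct QueryLevel *parent; /* enclosing query, NULL at top level */
} QueryLevel;

/* Resolve a CTE reference the way an RTE_CTE does: climb `levelsup` levels
 * from the query containing the reference, then search that WITH list.
 * Returns NULL if the name is not defined at that level. */
static const Cte *lookup_cte(const QueryLevel *q, const char *name, int levelsup)
{
    assert(levelsup >= 0);
    while (levelsup-- > 0 && q != NULL)
        q = q->parent;
    if (q == NULL)
        return NULL;
    for (size_t i = 0; i < q->ncte; i++)
        if (strcmp(q->ctes[i].name, name) == 0)
            return &q->ctes[i];
    return NULL;
}
```

Given the subquery's level, `lookup_cte(&sub, "c", 0)` finds a CTE `c` declared in that subquery. Given only the outer level, the same lookup returns NULL, which illustrates why the recursion needs the Query of the level it is currently descending through.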

Testing

  • git status --short

This commit introduces a new SQL syntax, Foreign Key Joins, allowing
joins to be specified based on existing foreign key constraints. This
provides a more declarative and self-documenting way to express joins
that follow defined relationships, enhancing query clarity and resilience
to schema changes like column or table renames.

The new syntax is:
  from_item join_type from_item KEY ( column_name [, ...] ) { <- | -> } from_reference ( column_name [, ...] )

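The direction of the arrow (`->` or `<-`) indicates which side is
referencing and which is referenced, mirroring the foreign key definition:
with `->`, the KEY columns of the joined from_item reference the columns of
from_reference; with `<-`, the columns of from_reference reference the KEY
columns.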
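
The arrow rule can be sketched as a small C function that decides which alias holds the foreign key. This is consistent with the two statements under Example Usage. `FkArrow`, `FkSides`, and `resolve_fk_sides` are hypothetical names for illustration, not the patch's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two arrows in the KEY clause. */
typedef enum {
    FK_ARROW_RIGHT, /* "->" */
    FK_ARROW_LEFT   /* "<-" */
} FkArrow;

typedef struct {
    const char *referencing; /* alias whose columns hold the foreign key */
    const char *referenced;  /* alias whose columns are the referenced key */
} FkSides;

/* key_side: the from_item that carries KEY (...);
 * ref_side: the from_reference named after the arrow. */
static FkSides resolve_fk_sides(const char *key_side, FkArrow arrow,
                                const char *ref_side)
{
    FkSides s;

    assert(key_side != NULL && ref_side != NULL);
    if (arrow == FK_ARROW_RIGHT) {
        /* KEY columns reference the from_reference columns */
        s.referencing = key_side;
        s.referenced = ref_side;
    } else {
        /* from_reference columns reference the KEY columns */
        s.referencing = ref_side;
        s.referenced = key_side;
    }
    return s;
}
```

For `customers c KEY (id) <- o (customer_id)` this yields referencing `o` and referenced `c`. For `payments p KEY (order_id) -> o (id)` it yields referencing `p` and referenced `o`, matching the stated foreign keys.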

Key aspects of the implementation and behavior include:

1.  **Constraint Validation:** The parser and analyzer verify that a
    matching foreign key constraint exists between the underlying base
    tables corresponding to the joined relations. The columns specified
    in the KEY clause must match (order-insensitively) the columns defined
    in an existing foreign key constraint in the specified direction.

2.  **Derived Table Validation:** The feature extends validation beyond base
    tables to derived relations (views, subqueries, CTEs). To ensure
    semantic correctness equivalent to the underlying foreign key, two
    critical properties are checked and must hold through all layers of
    derived relations:
    *   Uniqueness Preservation: The referenced columns must remain unique
      in the derived relation. Operations like joins that might duplicate
      referenced rows invalidate the FK join.
    *   Set Containment (Row Preservation): The referenced derived relation
      must contain all rows necessary to satisfy the foreign key relationship.
      Filtering operations (WHERE, LIMIT, HAVING, RLS policies) on the
      referenced side that could remove potentially referenced rows are
      disallowed.

3.  **Implementation Details:**
    *   Introduces `parse_fkjoin.c` for transformation and validation logic.
    *   Adds `ForeignKeyClause` (parse node) and `ForeignKeyJoinNode`
      (analysis node).
    *   Adds `RTEId`, `uniqueness_preservation`, and
      `functional_dependencies` fields to `RangeTblEntry` to track necessary
      properties through query analysis, particularly for derived tables.
      `RTEId` provides a globally unique identifier for base relation RTEs.
    *   Updates `gram.y` and `scan.l` for the new syntax and operators.
    *   Modifies dependency tracking (`dependency.c`) to add dependencies on
      the underlying `pg_constraint` OID for views using FK joins.
    *   Implements view revalidation (`view.c`) to ensure that replacing a
      view doesn't break dependent views that use FK joins based on the old
      definition. Violations during revalidation now raise a specific error.
    *   Updates ruleutils (`ruleutils.c`) to deparse the KEY syntax correctly
      for view definitions, EXPLAIN, etc.
    *   Extends ECPG and PL/pgSQL parsers to recognize the new syntax.

4.  **Documentation and Testing:**
    *   Adds documentation for the new syntax to `select.sgml`.
    *   Includes extensive regression tests (`foreign_key_join.sql`) covering
      various scenarios, including syntax, basic usage, composite keys,
      derived tables, error conditions, view revalidation, and partitioned
      tables.
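
The following minimal C sketch shows how the order-insensitive column check from the Constraint Validation step might work. It assumes the KEY clause has already been resolved into referencing and referenced attribute numbers, which are compared with a constraint's `conkey`/`confkey` arrays (the `pg_constraint` columns holding a foreign key's referencing and referenced attribute numbers). The function name `fk_columns_match` is hypothetical, and this is not the code in `parse_fkjoin.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the KEY clause's column pairs (fk_cols[i], pk_cols[i]) are
 * exactly the pairs (conkey[j], confkey[j]) of one foreign key constraint,
 * in any order. Attribute numbers are assumed distinct within each list
 * (duplicates in the KEY clause are assumed to be rejected earlier), so
 * matching every pair with equal lengths implies a one-to-one match. */
static bool fk_columns_match(const int *fk_cols, const int *pk_cols, size_t n,
                             const int *conkey, const int *confkey,
                             size_t nkeys)
{
    assert(n == 0 || (fk_cols != NULL && pk_cols != NULL));
    if (n != nkeys)
        return false;
    for (size_t i = 0; i < n; i++) {
        bool found = false;
        for (size_t j = 0; j < nkeys && !found; j++)
            found = fk_cols[i] == conkey[j] && pk_cols[i] == confkey[j];
        if (!found)
            return false; /* this pair is not part of the constraint */
    }
    return true;
}
```

Listing a composite key's pairs in a different order still matches. Pairing a referencing column with the wrong referenced column does not, even though each column on its own belongs to the constraint.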

Example Usage:
  -- Assuming orders.customer_id REFERENCES customers.id
  SELECT o.*, c.name
  FROM orders o JOIN customers c KEY (id) <- o (customer_id);

  -- Assuming payments.order_id REFERENCES orders.id
  SELECT o.*, p.amount
  FROM orders o JOIN payments p KEY (order_id) -> o (id);

This feature aims to make SQL queries involving referential integrity
constraints more robust and easier to understand. If the validation rules
are not met during query analysis, an error is raised, preventing
potentially incorrect results that could arise if the derived tables no
longer accurately reflect the underlying foreign key relationship.
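
The two Derived Table Validation properties can be pictured as flags. Both start out true at the referenced base table, and both must survive every derived layer. The sketch below is a deliberately coarse model in which `LayerKind`, `RefProps`, and the functions are hypothetical. The real analysis, which tracks `uniqueness_preservation` and `functional_dependencies` on `RangeTblEntry`, is more precise; for example, not every join duplicates referenced rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, coarse classification of a derived-relation layer on the
 * referenced side of an FK join. */
typedef enum {
    LAYER_PASSTHROUGH,      /* e.g. a view or subquery that only projects */
    LAYER_DUPLICATING_JOIN, /* a join that may repeat referenced rows */
    LAYER_FILTER            /* WHERE, LIMIT, HAVING, or an RLS policy */
} LayerKind;

typedef struct {
    bool unique;   /* uniqueness preservation */
    bool all_rows; /* set containment: no referenced row can be removed */
} RefProps;

/* Apply one layer's effect; properties can only be lost, never regained. */
static RefProps apply_layer(RefProps p, LayerKind kind)
{
    switch (kind) {
    case LAYER_DUPLICATING_JOIN:
        p.unique = false;
        break;
    case LAYER_FILTER:
        p.all_rows = false;
        break;
    case LAYER_PASSTHROUGH:
        break;
    }
    return p;
}

/* Both properties hold at the base table's referenced key and must survive
 * every layer, innermost first, for the FK join to be accepted. */
static bool fk_join_valid_through(const LayerKind *layers, size_t n)
{
    RefProps p = { true, true };

    assert(n == 0 || layers != NULL);
    for (size_t i = 0; i < n; i++)
        p = apply_layer(p, layers[i]);
    return p.unique && p.all_rows;
}
```

Because a property, once lost, is never restored, a single offending layer anywhere in the stack of views, subqueries, or CTEs is enough to reject the join.
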

joelonsql force-pushed the jj/fk-joins-6 branch 2 times, most recently from fcfd0ce to 002872c on June 17, 2025 13:07.