Skip to content

Commit 9afd513

Browse files
committed
Fix planner failures with overlapping mergejoin clauses in an outer join.
Given overlapping or partially redundant join clauses, for example t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x the planner's EquivalenceClass machinery will ordinarily refactor the clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't see multiple references to the same EquivalenceClass in a list of join equality clauses. However, if the join is outer, it's incorrect to derive a restriction clause on the outer side from the join conditions, so the clause refactoring does not happen and we end up with overlapping join conditions. The code that attempted to deal with such cases had several subtle bugs, which could result in "left and right pathkeys do not match in mergejoin" or "outer pathkeys do not match mergeclauses" planner errors, if the selected join plan type was a mergejoin. (It does not appear that any actually incorrect plan could have been emitted.) The core of the problem really was failure to recognize that the outer and inner relations' pathkeys have different relationships to the mergeclause list. A join's mergeclause list is constructed by reference to the outer pathkeys, so it will always be ordered the same as the outer pathkeys, but this cannot be presumed true for the inner pathkeys. If the inner sides of the mergeclauses contain multiple references to the same EquivalenceClass ({t2.x} in the above example) then a simplistic rendering of the required inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery recognizes that the second sort column is redundant and throws it away. The mergejoin planning code failed to account for that behavior properly. One error was to try to generate cut-down versions of the mergeclause list from cut-down versions of the inner pathkeys in the same way as the initial construction of the mergeclause list from the outer pathkeys was done; this could lead to choosing a mergeclause list that fails to match the outer pathkeys. The other problem was that the pathkey cross-checking code in create_mergejoin_plan treated the inner and outer pathkey lists identically, whereas actually the expectations for them must be different. That led to false "pathkeys do not match" failures in some cases, and in principle could have led to failure to detect bogus plans in other cases, though there is no indication that such bogus plans could be generated. Reported by Alexander Kuzmenkov, who also reviewed this patch. This has been broken for years (back to around 8.3 according to my testing), so back-patch to all supported branches. Discussion: https://postgr.es/m/[email protected]
1 parent f724022 commit 9afd513

File tree

6 files changed

+322
-124
lines changed

6 files changed

+322
-124
lines changed

src/backend/optimizer/path/joinpath.c

+14-16
Original file line numberDiff line numberDiff line change
@@ -1009,10 +1009,10 @@ sort_inner_and_outer(PlannerInfo *root,
10091009
outerkeys = all_pathkeys; /* no work at first one... */
10101010

10111011
/* Sort the mergeclauses into the corresponding ordering */
1012-
cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
1013-
outerkeys,
1014-
true,
1015-
extra->mergeclause_list);
1012+
cur_mergeclauses =
1013+
find_mergeclauses_for_outer_pathkeys(root,
1014+
outerkeys,
1015+
extra->mergeclause_list);
10161016

10171017
/* Should have used them all... */
10181018
Assert(list_length(cur_mergeclauses) == list_length(extra->mergeclause_list));
@@ -1102,10 +1102,10 @@ generate_mergejoin_paths(PlannerInfo *root,
11021102
jointype = JOIN_INNER;
11031103

11041104
/* Look for useful mergeclauses (if any) */
1105-
mergeclauses = find_mergeclauses_for_pathkeys(root,
1106-
outerpath->pathkeys,
1107-
true,
1108-
extra->mergeclause_list);
1105+
mergeclauses =
1106+
find_mergeclauses_for_outer_pathkeys(root,
1107+
outerpath->pathkeys,
1108+
extra->mergeclause_list);
11091109

11101110
/*
11111111
* Done with this outer path if no chance for a mergejoin.
@@ -1228,10 +1228,9 @@ generate_mergejoin_paths(PlannerInfo *root,
12281228
if (sortkeycnt < num_sortkeys)
12291229
{
12301230
newclauses =
1231-
find_mergeclauses_for_pathkeys(root,
1232-
trialsortkeys,
1233-
false,
1234-
mergeclauses);
1231+
trim_mergeclauses_for_inner_pathkeys(root,
1232+
mergeclauses,
1233+
trialsortkeys);
12351234
Assert(newclauses != NIL);
12361235
}
12371236
else
@@ -1272,10 +1271,9 @@ generate_mergejoin_paths(PlannerInfo *root,
12721271
if (sortkeycnt < num_sortkeys)
12731272
{
12741273
newclauses =
1275-
find_mergeclauses_for_pathkeys(root,
1276-
trialsortkeys,
1277-
false,
1278-
mergeclauses);
1274+
trim_mergeclauses_for_inner_pathkeys(root,
1275+
mergeclauses,
1276+
trialsortkeys);
12791277
Assert(newclauses != NIL);
12801278
}
12811279
else

src/backend/optimizer/path/pathkeys.c

+121-29
Original file line numberDiff line numberDiff line change
@@ -981,29 +981,27 @@ update_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo)
981981
}
982982

983983
/*
984-
* find_mergeclauses_for_pathkeys
985-
* This routine attempts to find a set of mergeclauses that can be
986-
* used with a specified ordering for one of the input relations.
984+
* find_mergeclauses_for_outer_pathkeys
985+
* This routine attempts to find a list of mergeclauses that can be
986+
* used with a specified ordering for the join's outer relation.
987987
* If successful, it returns a list of mergeclauses.
988988
*
989-
* 'pathkeys' is a pathkeys list showing the ordering of an input path.
990-
* 'outer_keys' is true if these keys are for the outer input path,
991-
* false if for inner.
989+
* 'pathkeys' is a pathkeys list showing the ordering of an outer-rel path.
992990
* 'restrictinfos' is a list of mergejoinable restriction clauses for the
993-
* join relation being formed.
991+
* join relation being formed, in no particular order.
994992
*
995993
* The restrictinfos must be marked (via outer_is_left) to show which side
996994
* of each clause is associated with the current outer path. (See
997995
* select_mergejoin_clauses())
998996
*
999997
* The result is NIL if no merge can be done, else a maximal list of
1000998
* usable mergeclauses (represented as a list of their restrictinfo nodes).
999+
* The list is ordered to match the pathkeys, as required for execution.
10011000
*/
10021001
List *
1003-
find_mergeclauses_for_pathkeys(PlannerInfo *root,
1004-
List *pathkeys,
1005-
bool outer_keys,
1006-
List *restrictinfos)
1002+
find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
1003+
List *pathkeys,
1004+
List *restrictinfos)
10071005
{
10081006
List *mergeclauses = NIL;
10091007
ListCell *i;
@@ -1044,32 +1042,29 @@ find_mergeclauses_for_pathkeys(PlannerInfo *root,
10441042
*
10451043
* It's possible that multiple matching clauses might have different
10461044
* ECs on the other side, in which case the order we put them into our
1047-
* result makes a difference in the pathkeys required for the other
1048-
* input path. However this routine hasn't got any info about which
1045+
* result makes a difference in the pathkeys required for the inner
1046+
* input rel. However this routine hasn't got any info about which
10491047
* order would be best, so we don't worry about that.
10501048
*
10511049
* It's also possible that the selected mergejoin clauses produce
1052-
* a noncanonical ordering of pathkeys for the other side, ie, we
1050+
* a noncanonical ordering of pathkeys for the inner side, ie, we
10531051
* might select clauses that reference b.v1, b.v2, b.v1 in that
10541052
* order. This is not harmful in itself, though it suggests that
1055-
* the clauses are partially redundant. Since it happens only with
1056-
* redundant query conditions, we don't bother to eliminate it.
1057-
* make_inner_pathkeys_for_merge() has to delete duplicates when
1058-
* it constructs the canonical pathkeys list, and we also have to
1059-
* deal with the case in create_mergejoin_plan().
1053+
* the clauses are partially redundant. Since the alternative is
1054+
* to omit mergejoin clauses and thereby possibly fail to generate a
1055+
* plan altogether, we live with it. make_inner_pathkeys_for_merge()
1056+
* has to delete duplicates when it constructs the inner pathkeys
1057+
* list, and we also have to deal with such cases specially in
1058+
* create_mergejoin_plan().
10601059
*----------
10611060
*/
10621061
foreach(j, restrictinfos)
10631062
{
10641063
RestrictInfo *rinfo = (RestrictInfo *) lfirst(j);
10651064
EquivalenceClass *clause_ec;
10661065

1067-
if (outer_keys)
1068-
clause_ec = rinfo->outer_is_left ?
1069-
rinfo->left_ec : rinfo->right_ec;
1070-
else
1071-
clause_ec = rinfo->outer_is_left ?
1072-
rinfo->right_ec : rinfo->left_ec;
1066+
clause_ec = rinfo->outer_is_left ?
1067+
rinfo->left_ec : rinfo->right_ec;
10731068
if (clause_ec == pathkey_ec)
10741069
matched_restrictinfos = lappend(matched_restrictinfos, rinfo);
10751070
}
@@ -1273,8 +1268,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
12731268
* must be applied to an inner path to make it usable with the
12741269
* given mergeclauses.
12751270
*
1276-
* 'mergeclauses' is a list of RestrictInfos for mergejoin clauses
1277-
* that will be used in a merge join.
1271+
* 'mergeclauses' is a list of RestrictInfos for the mergejoin clauses
1272+
* that will be used in a merge join, in order.
12781273
* 'outer_pathkeys' are the already-known canonical pathkeys for the outer
12791274
* side of the join.
12801275
*
@@ -1351,8 +1346,13 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
13511346
opathkey->pk_nulls_first);
13521347

13531348
/*
1354-
* Don't generate redundant pathkeys (can happen if multiple
1355-
* mergeclauses refer to same EC).
1349+
* Don't generate redundant pathkeys (which can happen if multiple
1350+
* mergeclauses refer to the same EC). Because we do this, the output
1351+
* pathkey list isn't necessarily ordered like the mergeclauses, which
1352+
* complicates life for create_mergejoin_plan(). But if we didn't,
1353+
* we'd have a noncanonical sort key list, which would be bad; for one
1354+
* reason, it certainly wouldn't match any available sort order for
1355+
* the input relation.
13561356
*/
13571357
if (!pathkey_is_redundant(pathkey, pathkeys))
13581358
pathkeys = lappend(pathkeys, pathkey);
@@ -1361,6 +1361,98 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
13611361
return pathkeys;
13621362
}
13631363

1364+
/*
1365+
* trim_mergeclauses_for_inner_pathkeys
1366+
* This routine trims a list of mergeclauses to include just those that
1367+
* work with a specified ordering for the join's inner relation.
1368+
*
1369+
* 'mergeclauses' is a list of RestrictInfos for mergejoin clauses for the
1370+
* join relation being formed, in an order known to work for the
1371+
* currently-considered sort ordering of the join's outer rel.
1372+
* 'pathkeys' is a pathkeys list showing the ordering of an inner-rel path;
1373+
* it should be equal to, or a truncation of, the result of
1374+
* make_inner_pathkeys_for_merge for these mergeclauses.
1375+
*
1376+
* What we return will be a prefix of the given mergeclauses list.
1377+
*
1378+
* We need this logic because make_inner_pathkeys_for_merge's result isn't
1379+
* necessarily in the same order as the mergeclauses. That means that if we
1380+
* consider an inner-rel pathkey list that is a truncation of that result,
1381+
* we might need to drop mergeclauses even though they match a surviving inner
1382+
* pathkey. This happens when they are to the right of a mergeclause that
1383+
* matches a removed inner pathkey.
1384+
*
1385+
* The mergeclauses must be marked (via outer_is_left) to show which side
1386+
* of each clause is associated with the current outer path. (See
1387+
* select_mergejoin_clauses())
1388+
*/
1389+
List *
1390+
trim_mergeclauses_for_inner_pathkeys(PlannerInfo *root,
1391+
List *mergeclauses,
1392+
List *pathkeys)
1393+
{
1394+
List *new_mergeclauses = NIL;
1395+
PathKey *pathkey;
1396+
EquivalenceClass *pathkey_ec;
1397+
bool matched_pathkey;
1398+
ListCell *lip;
1399+
ListCell *i;
1400+
1401+
/* No pathkeys => no mergeclauses (though we don't expect this case) */
1402+
if (pathkeys == NIL)
1403+
return NIL;
1404+
/* Initialize to consider first pathkey */
1405+
lip = list_head(pathkeys);
1406+
pathkey = (PathKey *) lfirst(lip);
1407+
pathkey_ec = pathkey->pk_eclass;
1408+
lip = lnext(lip);
1409+
matched_pathkey = false;
1410+
1411+
/* Scan mergeclauses to see how many we can use */
1412+
foreach(i, mergeclauses)
1413+
{
1414+
RestrictInfo *rinfo = (RestrictInfo *) lfirst(i);
1415+
EquivalenceClass *clause_ec;
1416+
1417+
/* Assume we needn't do update_mergeclause_eclasses again here */
1418+
1419+
/* Check clause's inner-rel EC against current pathkey */
1420+
clause_ec = rinfo->outer_is_left ?
1421+
rinfo->right_ec : rinfo->left_ec;
1422+
1423+
/* If we don't have a match, attempt to advance to next pathkey */
1424+
if (clause_ec != pathkey_ec)
1425+
{
1426+
/* If we had no clauses matching this inner pathkey, must stop */
1427+
if (!matched_pathkey)
1428+
break;
1429+
1430+
/* Advance to next inner pathkey, if any */
1431+
if (lip == NULL)
1432+
break;
1433+
pathkey = (PathKey *) lfirst(lip);
1434+
pathkey_ec = pathkey->pk_eclass;
1435+
lip = lnext(lip);
1436+
matched_pathkey = false;
1437+
}
1438+
1439+
/* If mergeclause matches current inner pathkey, we can use it */
1440+
if (clause_ec == pathkey_ec)
1441+
{
1442+
new_mergeclauses = lappend(new_mergeclauses, rinfo);
1443+
matched_pathkey = true;
1444+
}
1445+
else
1446+
{
1447+
/* Else, no hope of adding any more mergeclauses */
1448+
break;
1449+
}
1450+
}
1451+
1452+
return new_mergeclauses;
1453+
}
1454+
1455+
13641456
/****************************************************************************
13651457
* PATHKEY USEFULNESS CHECKS
13661458
*

0 commit comments

Comments
 (0)