Commit b661f21

author
Thomas Munro
committed
Merge branch 'hj-shared-single-batch' into hj-shared-buf-file
2 parents a64e80f + baa137b commit b661f21

1 file changed: +43 -1 lines changed

src/backend/executor/nodeHashjoin.c

Lines changed: 43 additions & 1 deletion
@@ -6,10 +6,52 @@
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- *
  * IDENTIFICATION
  * src/backend/executor/nodeHashjoin.c
  *
+ * NOTES:
+ *
+ * PARALLELISM
+ *
+ * Hash joins can participate in parallel queries in two ways: in
+ * non-parallel-aware mode, where each backend builds an identical hash table
+ * and then probes it with a partial outer relation, or parallel-aware mode
+ * where there is a shared hash table that all participants help to build. A
+ * parallel-aware hash join can divide the work of building the hash table up
+ * over all workers instead of having each worker build its own copy of the
+ * whole hash table, but has extra communication overheads.
+ *
+ * In both cases, hash joins use a private state machine to track progress
+ * through the hash join algorithm.
+ *
+ * In a parallel-aware hash join, there is also a shared 'phase' which
+ * co-operating backends use to synchronize their local state machine and
+ * program counter with the multi-process join. The phase is managed by a
+ * 'barrier' IPC primitive.
+ *
+ * When a participant begins working on a parallel hash join, it must first
+ * figure out how much progress has already been made, because participants
+ * don't wait for each other to begin. For this reason there are switch
+ * statements at key points in the code where we have to synchronize our local
+ * state machine with the phase, and then jump to the correct part of the
+ * algorithm so that we can get started.
+ *
+ * While running the algorithm, there are key points in the code where we must
+ * wait for all participants to reach the same point before we can continue,
+ * in the form of BarrierWait calls. We cannot begin building the hash
+ * table until it has been created, and we cannot begin probing it until it is
+ * entirely built.
+ *
+ * The phases are as follows:
+ *
+ *   PHJ_PHASE_BEGINNING   -- initial phase, before any participant acts
+ *   PHJ_PHASE_CREATING    -- one participant creates the shmem hash table
+ *   PHJ_PHASE_BUILDING    -- all participants build the hash table
+ *   PHJ_PHASE_RESIZING    -- one participant decides whether to expand buckets
+ *   PHJ_PHASE_REINSERTING -- all participants reinsert tuples if necessary
+ *   PHJ_PHASE_PROBING     -- all participants probe the hash table
+ *   PHJ_PHASE_UNMATCHED   -- all participants scan for unmatched tuples
+ *
  *-------------------------------------------------------------------------
  */
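
The "switch statements at key points" idea from the new header comment can be illustrated with a small toy model. This is only a sketch and not code from this commit: the phase names are taken from the comment, but their numeric values, the `PhjTrace` type, and the `trace_step` and `phj_catch_up` helpers are hypothetical, and the real barrier waits are replaced by comments. It assumes a participant that attaches late simply takes part in every phase from the one it observes onward.

```c
#include <assert.h>

/* Phase names follow the header comment; the numeric values are assumed. */
typedef enum PhjPhase
{
    PHJ_PHASE_BEGINNING = 0,
    PHJ_PHASE_CREATING,
    PHJ_PHASE_BUILDING,
    PHJ_PHASE_RESIZING,
    PHJ_PHASE_REINSERTING,
    PHJ_PHASE_PROBING,
    PHJ_PHASE_UNMATCHED
} PhjPhase;

#define PHJ_TRACE_MAX 8

/* Hypothetical record of the phases a participant takes part in. */
typedef struct PhjTrace
{
    PhjPhase steps[PHJ_TRACE_MAX];
    int      nsteps;
} PhjTrace;

static void
trace_step(PhjTrace *trace, PhjPhase phase)
{
    assert(trace->nsteps < PHJ_TRACE_MAX);
    trace->steps[trace->nsteps++] = phase;
}

/*
 * Hypothetical catch-up routine: given the shared phase observed when a
 * participant attaches, jump to the matching case and fall through the
 * remaining phases, recording each one in the trace.
 */
static void
phj_catch_up(PhjPhase observed, PhjTrace *trace)
{
    trace->nsteps = 0;

    switch (observed)
    {
        case PHJ_PHASE_BEGINNING:
        case PHJ_PHASE_CREATING:
            /* Real code would wait here until the table has been created. */
            trace_step(trace, PHJ_PHASE_CREATING);
            /* fall through */
        case PHJ_PHASE_BUILDING:
            trace_step(trace, PHJ_PHASE_BUILDING);
            /* fall through */
        case PHJ_PHASE_RESIZING:
            trace_step(trace, PHJ_PHASE_RESIZING);
            /* fall through */
        case PHJ_PHASE_REINSERTING:
            /* Real code would wait here until the table is entirely built. */
            trace_step(trace, PHJ_PHASE_REINSERTING);
            /* fall through */
        case PHJ_PHASE_PROBING:
            trace_step(trace, PHJ_PHASE_PROBING);
            /* fall through */
        case PHJ_PHASE_UNMATCHED:
            trace_step(trace, PHJ_PHASE_UNMATCHED);
            break;
    }
}
```

In this model, calling `phj_catch_up(PHJ_PHASE_PROBING, &trace)` leaves two entries in `trace`: the probing phase and the scan for unmatched tuples. A participant that attaches at `PHJ_PHASE_BEGINNING` records all six working phases.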
