  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- *
  * IDENTIFICATION
  * src/backend/executor/nodeHashjoin.c
  *
+ * NOTES:
+ *
+ * PARALLELISM
+ *
+ * Hash joins can participate in parallel queries in two ways: in
+ * non-parallel-aware mode, where each backend builds an identical hash table
+ * and then probes it with a partial outer relation, or in parallel-aware mode,
+ * where all participants help to build one shared hash table. A
+ * parallel-aware hash join divides the work of building the hash table among
+ * all workers instead of having each worker build its own copy of the whole
+ * table, but it incurs extra communication overhead.
+ *
+ * In both cases, hash joins use a private state machine to track progress
+ * through the hash join algorithm.
+ *
+ * In a parallel-aware hash join, there is also a shared 'phase' which
+ * co-operating backends use to synchronize their local state machine and
+ * program counter with the multi-process join. The phase is managed by a
+ * 'barrier' IPC primitive.
+ *
+ * When a participant begins working on a parallel hash join, it must first
+ * figure out how much progress has already been made, because participants
+ * don't wait for each other to begin. For this reason there are switch
+ * statements at key points in the code where we synchronize our local state
+ * machine with the phase and then jump to the correct part of the algorithm.
+ *
+ * While running the algorithm, there are key points in the code where all
+ * participants must reach the same point before any of them can continue;
+ * these take the form of BarrierWait calls. For example, we cannot begin
+ * building the hash table until it has been created, and we cannot begin
+ * probing it until it is entirely built.
+ *
+ * The phases are as follows:
+ *
+ *   PHJ_PHASE_BEGINNING   -- initial phase, before any participant acts
+ *   PHJ_PHASE_CREATING    -- one participant creates the shmem hash table
+ *   PHJ_PHASE_BUILDING    -- all participants build the hash table
+ *   PHJ_PHASE_RESIZING    -- one participant decides whether to expand buckets
+ *   PHJ_PHASE_REINSERTING -- all participants reinsert tuples if necessary
+ *   PHJ_PHASE_PROBING     -- all participants probe the hash table
+ *   PHJ_PHASE_UNMATCHED   -- all participants scan for unmatched tuples
+ *
  *-------------------------------------------------------------------------
  */
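The catch-up step described in the notes (a participant that arrives late reads the shared phase and jumps to the matching part of its private state machine) can be sketched as a plain `switch`. This is a minimal single-process illustration under stated assumptions, not the patch's code: the phase names come from the comment, but the enum type and its numeric values, the `local_state` values, and the `catch_up()` function are all hypothetical, and no real barrier or shared memory is involved.

```c
#include <assert.h>

/* Phase names from the NOTES comment; the enum type and values are assumptions. */
typedef enum
{
    PHJ_PHASE_BEGINNING,
    PHJ_PHASE_CREATING,
    PHJ_PHASE_BUILDING,
    PHJ_PHASE_RESIZING,
    PHJ_PHASE_REINSERTING,
    PHJ_PHASE_PROBING,
    PHJ_PHASE_UNMATCHED
} phj_phase;

/* Hypothetical states of one participant's private state machine. */
typedef enum
{
    LOCAL_CREATE,              /* arrived first: one participant will create the table */
    LOCAL_WAIT_THEN_BUILD,     /* wait for creation to finish, then insert tuples */
    LOCAL_BUILD,               /* help insert inner tuples */
    LOCAL_WAIT_THEN_REINSERT,  /* wait for the resize decision, then reinsert */
    LOCAL_REINSERT,            /* help reinsert tuples into expanded buckets */
    LOCAL_PROBE,               /* probe with outer tuples */
    LOCAL_SCAN_UNMATCHED       /* help scan for unmatched inner tuples */
} local_state;

/* Map the shared phase observed on arrival to where this participant starts. */
static local_state
catch_up(phj_phase phase)
{
    switch (phase)
    {
        case PHJ_PHASE_BEGINNING:
            return LOCAL_CREATE;
        case PHJ_PHASE_CREATING:
            /* Cannot build until the table exists, so wait first. */
            return LOCAL_WAIT_THEN_BUILD;
        case PHJ_PHASE_BUILDING:
            return LOCAL_BUILD;
        case PHJ_PHASE_RESIZING:
            /* Only one participant decides; everyone else waits for it. */
            return LOCAL_WAIT_THEN_REINSERT;
        case PHJ_PHASE_REINSERTING:
            return LOCAL_REINSERT;
        case PHJ_PHASE_PROBING:
            return LOCAL_PROBE;
        case PHJ_PHASE_UNMATCHED:
            return LOCAL_SCAN_UNMATCHED;
    }
    assert(!"unknown phase");
    return LOCAL_PROBE;         /* not reached */
}
```

The one-participant phases (CREATING, RESIZING) map to "wait then act" states because a latecomer cannot do that step itself; it can only wait for the participant doing it and then join the following all-participant phase.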
|
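The BarrierWait rule from the notes ("we cannot begin probing it until it is entirely built") can be illustrated with POSIX threads standing in for backends. This is a sketch under stated assumptions: `sketch_barrier`, `sketch_barrier_wait()`, `worker_main()` and `run_build_then_probe()` are hypothetical, the barrier is built from a mutex and a condition variable with a phase counter that advances each time all participants arrive, and it is not the 'barrier' IPC primitive the notes refer to, which coordinates separate backend processes. Each worker inserts its own share of a toy table (BUILDING), waits at the barrier, then counts every entry (PROBING).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical phase-counting barrier; not PostgreSQL's barrier primitive. */
typedef struct
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             participants; /* threads that must arrive each phase */
    int             arrived;      /* threads arrived in the current phase */
    int             phase;        /* advanced once everyone has arrived */
} sketch_barrier;

/* Block until all participants arrive; returns true in exactly one of them. */
static bool
sketch_barrier_wait(sketch_barrier *b)
{
    bool elected = false;
    pthread_mutex_lock(&b->mutex);
    int start_phase = b->phase;
    if (++b->arrived == b->participants)
    {
        /* Last to arrive: advance the phase and wake everyone else. */
        b->arrived = 0;
        b->phase++;
        elected = true;
        pthread_cond_broadcast(&b->cond);
    }
    else
    {
        /* Re-check in a loop to guard against spurious wakeups. */
        while (b->phase == start_phase)
            pthread_cond_wait(&b->cond, &b->mutex);
    }
    pthread_mutex_unlock(&b->mutex);
    return elected;
}

typedef struct
{
    sketch_barrier *barrier;
    int            *table;      /* toy "hash table": slot i is 1 once built */
    int             table_size;
    int             worker;     /* this worker's index, 0..nworkers-1 */
    int             nworkers;
    bool            elected;    /* chosen for one-participant work? */
    int             seen;       /* entries visible while probing */
} worker_arg;

static void *
worker_main(void *p)
{
    worker_arg *a = p;
    /* BUILDING: each worker inserts only its own share of the entries. */
    for (int i = a->worker; i < a->table_size; i += a->nworkers)
        a->table[i] = 1;
    /* Nobody may probe until every worker has finished building. */
    a->elected = sketch_barrier_wait(a->barrier);
    /* PROBING: after the barrier, every worker must see the whole table. */
    a->seen = 0;
    for (int i = 0; i < a->table_size; i++)
        a->seen += a->table[i];
    return NULL;
}

/* Returns the fewest entries any worker saw; *nelected counts elected workers. */
static int
run_build_then_probe(int nworkers, int table_size, int *nelected)
{
    sketch_barrier barrier;
    pthread_t  *threads = malloc(sizeof(pthread_t) * nworkers);
    worker_arg *args = malloc(sizeof(worker_arg) * nworkers);
    int        *table = calloc(table_size, sizeof(int));
    int         min_seen = table_size;

    assert(threads != NULL && args != NULL && table != NULL);
    pthread_mutex_init(&barrier.mutex, NULL);
    pthread_cond_init(&barrier.cond, NULL);
    barrier.participants = nworkers;
    barrier.arrived = barrier.phase = 0;
    *nelected = 0;
    for (int w = 0; w < nworkers; w++)
    {
        args[w] = (worker_arg) {&barrier, table, table_size, w, nworkers, false, 0};
        pthread_create(&threads[w], NULL, worker_main, &args[w]);
    }
    for (int w = 0; w < nworkers; w++)
    {
        pthread_join(threads[w], NULL);
        min_seen = args[w].seen < min_seen ? args[w].seen : min_seen;
        *nelected += args[w].elected;
    }
    pthread_cond_destroy(&barrier.cond); pthread_mutex_destroy(&barrier.mutex);
    free(table); free(args); free(threads);
    return min_seen;
}
```

Returning `true` in exactly one waiter mirrors the phases where a single participant acts (CREATING, RESIZING): in this sketch, the elected worker could perform that step while the others wait at the next barrier. On older glibc versions, compile with `-pthread`.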