Doc: Add the new section "Logical Replication Failover".
This helps users ensure that failover-marked slots are synced to the
standby, so that subscribers can continue replicating even when the
publisher node goes down.

Author: Hou Zhijie, Shveta Malik, Amit Kapila
Reviewed-by: Peter Smith, Bertrand Drouvot
Discussion: https://postgr.es/m/OS0PR01MB57164D6F53FB4F6AD29AD9C594FB2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Amit Kapila committed Jun 7, 2024
1 parent 4b87917 commit b560a98
Showing 2 changed files with 103 additions and 0 deletions.
9 changes: 9 additions & 0 deletions doc/src/sgml/high-availability.sgml
@@ -1487,6 +1487,15 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
Written administration procedures are advised.
</para>

<para>
If you have opted for logical replication slot synchronization (see
<xref linkend="logicaldecoding-replication-slots-synchronization"/>),
then before switching to the standby server, it is recommended to check
if the logical slots synchronized on the standby server are ready
for failover. This can be done by following the steps described in
<xref linkend="logical-replication-failover"/>.
</para>

<para>
To trigger failover of a log-shipping standby server, run
<command>pg_ctl promote</command> or call <function>pg_promote()</function>.
94 changes: 94 additions & 0 deletions doc/src/sgml/logical-replication.sgml
@@ -687,6 +687,100 @@ ALTER SUBSCRIPTION

</sect1>

<sect1 id="logical-replication-failover">
<title>Logical Replication Failover</title>

<para>
To allow subscriber nodes to continue replicating data from the publisher
node even when the publisher node goes down, there must be a physical standby
corresponding to the publisher node. The logical slots on the primary server
corresponding to the subscriptions can be synchronized to the standby server by
specifying <literal>failover = true</literal> when creating subscriptions. See
<xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
Enabling the
<link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
parameter ensures a seamless transition of those subscriptions after the
standby is promoted. They can continue subscribing to publications on the
new primary server without losing data. Note that in the case of
asynchronous replication, there remains a risk of data loss for transactions
that were committed on the former primary server but had not yet been
replicated to the new primary server.
</para>
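
<para>
As a minimal sketch of the setup described above, a failover-enabled
subscription could be created on the subscriber as shown below. The
subscription name, publication name, and connection string are hypothetical
placeholders and must be adapted to the actual environment. The second
statement reads <structfield>subfailover</structfield> from
<structname>pg_subscription</structname> to confirm that the option is set.
<programlisting>
-- Run on the subscriber. All names and the connection string are
-- illustrative assumptions, not values taken from this section.
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=primary.example.com dbname=test_pub user=repuser'
    PUBLICATION pub1
    WITH (failover = true);

-- Confirm that the subscription is marked for failover.
SELECT subname, subfailover
FROM pg_subscription
WHERE subname = 'sub1';
</programlisting></para>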

<para>
Because slot synchronization happens asynchronously, it is
necessary to confirm that replication slots have been synced to the standby
server before the failover happens. To ensure a successful failover, the
standby server must be ahead of the subscriber. This can be achieved by
configuring
<link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>.
</para>
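
<para>
One possible way to apply this setting is sketched below. It assumes that
the standby streams from the primary through a physical replication slot,
here given the hypothetical name <literal>sb1_slot</literal>, which the
standby would also have to reference in its
<varname>primary_slot_name</varname> setting. With
<varname>standby_slot_names</varname> set, logical WAL sender processes send
decoded changes to subscribers only after the listed slots have confirmed
receipt of the corresponding WAL.
<programlisting>
-- Run on the primary. The slot name 'sb1_slot' is an illustrative assumption.
SELECT pg_create_physical_replication_slot('sb1_slot');

-- Hold back logical replication until this physical slot has confirmed
-- receipt of the WAL, so the standby stays ahead of the subscribers.
ALTER SYSTEM SET standby_slot_names = 'sb1_slot';
SELECT pg_reload_conf();
</programlisting></para>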

<para>
To confirm that the standby server is indeed ready for failover, follow these
steps to verify that all necessary logical replication slots have been
synchronized to the standby server:
</para>

<procedure>
<step performance="required">
<para>
On the subscriber node, use the following SQL to identify which slots
should be synced to the standby that we plan to promote. This query will
return the relevant replication slots, including the main slots and table
synchronization slots associated with the failover-enabled subscriptions.
Note that a table synchronization slot needs to be synced to the standby
server only if the table copy is finished (see <xref linkend="catalog-pg-subscription-rel"/>).
In other scenarios there is no need to ensure that the table synchronization
slots are synced, as they will either be dropped or re-created on the new
primary server.
<programlisting>
test_sub=# SELECT
               array_agg(slot_name) AS slots
           FROM
           ((
               SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name
               FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
               WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
           ) UNION (
               SELECT s.oid AS subid, s.subslotname AS slot_name
               FROM pg_subscription s
               WHERE s.subfailover
           ))
           WHERE slot_name IS NOT NULL;
slots
-------
{sub1,sub2,sub3}
(1 row)
</programlisting></para>
</step>
<step performance="required">
<para>
Check that the logical replication slots identified above exist on
the standby server and are ready for failover.
<programlisting>
test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready
               FROM pg_replication_slots
               WHERE slot_name IN ('sub1','sub2','sub3');
  slot_name  | failover_ready
-------------+----------------
  sub1       | t
  sub2       | t
  sub3       | t
(3 rows)
</programlisting></para>
</step>
</procedure>

<para>
If all the slots are present on the standby server and the result
(<literal>failover_ready</literal>) of the above SQL query is true for each
of them, then existing subscriptions can continue subscribing to
publications on the new primary server without losing data.
</para>
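
<para>
The two conditions in the preceding paragraph can also be collapsed into a
single row, which may be convenient for scripted checks. The following is
only a sketch: it reuses the three example slot names from the procedure,
and the expected count of <literal>3</literal> is an assumption that must
match the number of slots returned on the subscriber.
<programlisting>
-- Run on the standby. Replace the slot list and the expected count (3)
-- with the values obtained from the subscriber.
SELECT count(*) = 3 AS all_present,
       coalesce(bool_and(synced AND NOT temporary AND NOT conflicting), false)
           AS all_failover_ready
FROM pg_replication_slots
WHERE slot_name IN ('sub1','sub2','sub3');
</programlisting>
Both columns should be true before the standby is promoted. If
<literal>all_present</literal> is false, at least one expected slot has not
been synchronized to the standby yet.</para>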

</sect1>

<sect1 id="logical-replication-row-filter">
<title>Row Filters</title>
