An extension implementing sharding for PostgreSQL, based on logical replication and postgres_fdw. The goal is to scale read queries by overcoming the main limitations of traditional setups based on streaming replication and hot standbys: lack of sharding and large storage requirements.
See Architecture for more information on the inner workings.
Setting up and maintaining a highly available cluster of sharded storage servers is inherently tricky, especially during changes to cluster topology. Adding a new replica often requires rebalancing (i.e. reorganizing data placement among replicas).
pgwrh minimizes the need to copy data by using the Weighted Rendezvous Hashing algorithm to distribute shards among replicas. Adding replicas never requires moving data between existing ones.
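To illustrate the technique (this is only a sketch of weighted rendezvous hashing in plain SQL, not pgwrh's internal code; the replica names, weights and shard names below are made up): every (shard, replica) pair gets a score derived from a hash and the replica's weight, and each shard is kept on the top-scoring replicas. Adding a replica only affects shards for which the new replica scores highest, so no data moves between existing replicas.

-- Weighted rendezvous hashing sketch: keep each shard on the 2 best-scoring replicas.
-- score = -weight / ln(h), where h in (0, 1) is derived from a hash of (shard, replica).
WITH replicas(host_id, weight) AS (
    VALUES ('replica01', 100), ('replica02', 100), ('replica03', 70)
), shards(shard) AS (
    SELECT format('my_data_2024_%s', i) FROM generate_series(0, 15) i
), scored AS (
    SELECT shard, host_id,
           -weight / ln(((hashtext(shard || '/' || host_id) & 2147483647)::bigint + 1) / 2147483649.0) AS score
    FROM shards CROSS JOIN replicas
)
SELECT shard, array_agg(host_id ORDER BY score DESC) AS hosts
FROM (
    SELECT shard, host_id, score,
           row_number() OVER (PARTITION BY shard ORDER BY score DESC) AS rn
    FROM scored
) ranked
WHERE rn <= 2
GROUP BY shard
ORDER BY shard;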
pgwrh maintains the requested level of redundancy of shard data.
The administrator can specify:
- the percentage of replicas to host each shard
- the minimum number of copies of any shard (regardless of the percentage setting above)
So it is possible to implement policies like: "Shards X, Y, Z should be distributed among 20% of replicas in the cluster, but in no fewer than 2 copies".
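For example, the percentage corresponds to the replication_factor column of pgwrh.sharded_table used in the configuration example further below; a sketch of a 20% policy for the test.my_data table from this README (how the minimum number of copies is set is not shown in this snippet):

-- Sketch only: distribute shards of test.my_data to 20% of the replicas
-- in replication group 'c01' (table and column names as in the example below).
INSERT INTO pgwrh.sharded_table
    (replication_group_id, sharded_table_schema, sharded_table_name, replication_factor)
VALUES
    ('c01', 'test', 'my_data', 20);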
Replicas can be assigned to availability zones and pgwrh ensures shard copies are distributed evenly across all of them.
Changing cluster topology very often requires a lengthy process of data copying and indexing. Exposing replicas that do not yet have the necessary indexes risks downtime, as long-running queries can exhaust connection pools.
pgwrh makes sure the cluster can operate without disruptions and that not-yet-ready replicas are isolated from query traffic.
pgwrh does not dictate how data is split into shards. Any sharding policy can be implemented with PostgreSQL partitioning: pgwrh distributes the leaves of the partition hierarchy among replicas. It is also possible to specify different levels of redundancy for different subtrees of the partitioning hierarchy.
Thanks to this, more replicas can maintain hot data while the storage requirements for cold data are minimized (see the configuration example later in this document).
This makes it easy to use pgwrh in cloud environments that restrict installation of custom extensions.
Caveat: at the moment pgwrh requires pg_background to operate, as it needs a way to execute SQL commands outside the current transaction (CREATE/ALTER SUBSCRIPTION cannot be executed inside a transaction block).
Unlike other PostgreSQL sharding solutions that implement a custom query parser and interpreter to direct queries to the right replicas, pgwrh reuses built-in PostgreSQL features: partitioning and postgres_fdw.
The PostgreSQL query planner and executor - while still somewhat limited - can distribute computation among multiple machines by (see the example after this list):
- pushing down filtering and aggregates (see https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-ENABLE-PARTITIONWISE-AGGREGATE)
- skipping execution of unnecessary query plan nodes (see https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-ENABLE-PARTITION-PRUNING)
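For example (a sketch assuming the test.my_data table created later in this document), the relevant settings and a query whose plan benefits from partition pruning and partitionwise aggregation:

-- enable_partition_pruning is on by default; enable_partitionwise_aggregate is off by default.
SET enable_partition_pruning = on;
SET enable_partitionwise_aggregate = on;

-- Partition pruning keeps the 2023 and 2025 partitions out of the plan;
-- partitionwise aggregation lets the GROUP BY be computed per shard
-- (and pushed down to remote shards via postgres_fdw on a replica).
EXPLAIN (COSTS OFF)
SELECT col2, count(*)
FROM test.my_data
WHERE col3 >= DATE '2024-01-01' AND col3 < DATE '2025-01-01'
GROUP BY col2;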
Requirements:

| Name | Version |
|---|---|
| PostgreSQL | 16+ |
| pg_background | 1.2+ |
Clone the Git repository.
git clone https://github.com/mkleczek/pgwrh.git
Install the extension.
cd pgwrh
make install
Create the extension in the PostgreSQL database.
psql -c "CREATE EXTENSION pgwrh CASCADE"
The example below creates a two-level partition hierarchy for test.my_data:
- First level by dates in col3 (split by year)
- Second level by hash on col2
CREATE SCHEMA IF NOT EXISTS test;
CREATE TABLE test.my_data (col1 text, col2 text, col3 date) PARTITION BY RANGE (col3);
CREATE TABLE test.my_data_2023 PARTITION OF test.my_data FOR VALUES FROM (make_date(2023, 1, 1)) TO (make_date(2024, 1, 1)) PARTITION BY HASH (col2);
CREATE TABLE test.my_data_2024 PARTITION OF test.my_data FOR VALUES FROM (make_date(2024, 1, 1)) TO (make_date(2025, 1, 1)) PARTITION BY HASH (col2);
CREATE TABLE test.my_data_2025 PARTITION OF test.my_data FOR VALUES FROM (make_date(2025, 1, 1)) TO (make_date(2026, 1, 1)) PARTITION BY HASH (col2);
CREATE SCHEMA IF NOT EXISTS test_shards;
DO $$
DECLARE
r record;
BEGIN
FOR r IN
SELECT
format('CREATE TABLE test_shards.my_data_%1$s_%2$s PARTITION OF test.my_data_%1$s (PRIMARY KEY (col1)) FOR VALUES WITH (MODULUS 16, REMAINDER %2$s)', year, rem) stmt
FROM generate_series(2023, 2025) year, generate_series(0, 15) rem
LOOP
EXECUTE r.stmt;
END LOOP;
END$$;
That gives 48 (16 * 3) shards in total.
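A quick sanity check of the created leaf tables (just a convenience query, not part of pgwrh):

SELECT count(*) FROM pg_tables WHERE schemaname = 'test_shards';
-- returns 48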
Note that there are no specific requirements for the partitioning hierarchy and any partitioned table can be sharded - the above is only for illustration purposes.
Create a replica cluster. Example:
SELECT pgwrh.create_replica_cluster('c01');
(Optional) Create a role for your cluster replicas and grant rights to SELECT from shards.
CREATE ROLE c01_replica;
GRANT SELECT ON ALL TABLES IN SCHEMA test_shards TO c01_replica;
Create an account for each replica.
CREATE USER c01r01 PASSWORD 'c01r01Password' REPLICATION IN ROLE c01_replica;
On the replica, make sure the pgwrh extension is installed. Then call the configure_controller function, providing the username and password of the replica account created on the master:
SELECT pgwrh.configure_controller(
host => 'master.myorg',
port => '5432',
username => 'c01r01', -- same as above
password => 'c01r01Password' -- same as above
);
The example below configures distribution of every partition of test.my_data to half (50%) of the replicas, except partitions of test.my_data_2024, which will be copied to all (100%) replicas.
WITH st(schema_name, table_name, replication_factor) AS (
VALUES
('test', 'my_data', 50),
('test', 'my_data_2024', 100)
)
INSERT INTO pgwrh.sharded_table (replication_group_id, sharded_table_schema, sharded_table_name, replication_factor)
SELECT
'c01', schema_name, table_name, replication_factor
FROM
st;
Add a replica to the configuration:
SELECT pgwrh.add_replica('c01', 'c01r01', 'replica01.cluster01.myorg', 5432);
Start the rollout of the new configuration:
SELECT pgwrh.start_rollout('c01');
The new configuration is now visible to connected replicas, which will start data replication.
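Since pgwrh is based on logical replication, standard PostgreSQL monitoring views can be used to watch progress in the meantime (a generic sketch, not a pgwrh-specific API):

-- On the master: walsender processes streaming data to the replicas
SELECT application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;

-- On a replica: subscription apply progress
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;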
Once all replicas have confirmed the configuration changes, execute:
SELECT pgwrh.commit_rollout('c01');
(this will fail if some replicas are not reconfigured yet)
To add more replicas, create an account for each of them:
CREATE USER c01r02 PASSWORD 'c01r02Password' REPLICATION IN ROLE c01_replica;
CREATE USER c01r03 PASSWORD 'c01r03Password' REPLICATION IN ROLE c01_replica;
CREATE USER c01r04 PASSWORD 'c01r04Password' REPLICATION IN ROLE c01_replica;
Add the new replicas to the configuration:
select pgwrh.add_replica(
_replication_group_id := 'c01',
_host_id := 'c01r02',
_host_name := 'replica02.cluster01.myorg',
_port := 5432);
select pgwrh.add_replica(
_replication_group_id := 'c01',
_host_id := 'c01r03',
_host_name := 'replica03.cluster01.myorg',
_port := 5432,
_weight := 70);
select pgwrh.add_replica(
_replication_group_id := 'c01',
_host_id := 'c01r04',
_host_name := 'replica04.cluster01.myorg',
_port := 5432);
It is possible to adjust the number of shards assigned to a replica by setting its weight:
SELECT pgwrh.set_replica_weight('c01', 'c01r04', 200);
To deploy the new configuration:
SELECT pgwrh.start_rollout('c01');
And then:
SELECT pgwrh.commit_rollout('c01');