Migrate a PostgreSQL Database Using Logical Replication with Close to Zero Downtime
After only about a month I needed to migrate the database again. I needed to add pgvector support, and that meant moving to another database based on a Docker image that included the pgvector extension.
I previously used the common approach of manually copying data using
pg_restore. The downside is that this often results in significant downtime. The last time I migrated was during Christmas, when traffic and usage were exceptionally low, so 20-30 minutes of downtime was manageable. I don't have that window of opportunity now in January, when usage has only increased.
Average usage varies between 20-40 active users per half hour, and there are multiple external webhooks running at any time. Not high usage numbers by most comparisons, but having read the blog post Zero downtime Postgres upgrades, I thought there might be a better alternative to the manual approach, namely PostgreSQL's logical replication.
In a nutshell, the migration process sets up a publication-subscription circuit between the old and the new database. The old database copies all existing data to the new database and synchronizes any changes that happen afterwards. When this is done, the production environment can be switched over to the new database.
It goes without saying that you should always back up your data before starting.
To be able to use logical replication we first need to enable it on the old database:
ALTER SYSTEM SET wal_level = logical;
It is subsequently necessary to restart the database, as Laurenz Albe points out. There is no way around it.
This is where the only downtime related to the database migration occurs. Once the old database has been restarted after a hopefully brief hiatus, there shouldn't be any more downtime. If
wal_level was already set to
logical, the downtime could be reduced to zero.
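Before restarting anything, it is worth checking whether the setting already has the required value. A quick sanity check on the old database:

```sql
-- On the old database: show the current WAL level.
-- The default is 'replica'; logical replication requires 'logical'.
SHOW wal_level;
```

If this already returns logical, no restart is needed.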
Set the old database to publish all its tables to connected subscribers:
CREATE PUBLICATION db_migration_publication FOR ALL TABLES;
The old database now copies existing data and continuously publishes changes for all its tables. This is the default configuration, but CREATE PUBLICATION can be customized depending on the use case. If the database is large, it may be advisable to add the tables to be synchronized one by one.
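For the table-by-table approach, a publication can be restricted to specific tables and extended later. A sketch, with hypothetical table names:

```sql
-- Publish only selected tables instead of all of them.
-- "User" and "Order" are placeholder table names.
CREATE PUBLICATION db_migration_publication FOR TABLE "User", "Order";

-- Additional tables can be added to the publication later.
ALTER PUBLICATION db_migration_publication ADD TABLE "Invoice";
```

Note that after adding a table to the publication, the subscriber needs ALTER SUBSCRIPTION ... REFRESH PUBLICATION to pick it up.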
Before setting up a subscription on the new database we must ensure that the database schemas are identical. Simply run
pg_dump --schema-only on the old database and import it into the new database.
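The schema copy can be done in a single pipeline; both connection strings below are placeholders:

```shell
# Dump only the schema (no data) from the old database
# and pipe it straight into the new database.
pg_dump --schema-only "postgres://user:password@old-host:5432/mydb" \
  | psql "postgres://user:password@new-host:5432/mydb"
```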
Once the database schema is added to the new database, create a subscription:
CREATE SUBSCRIPTION db_migration_subscription
CONNECTION 'host=your_publisher_host port=your_publisher_port dbname=your_publisher_db user=your_replication_user password=your_replication_password'
PUBLICATION db_migration_publication;
PostgreSQL now copies all existing data from the old database to the new database. The process can be verified in the logs:
2024-01-22 09:59:37.649 UTC  LOG: logical replication table synchronization worker for subscription "db_migration_subscription", table "User" has started
2024-01-22 09:59:38.365 UTC  LOG: logical replication table synchronization worker for subscription "db_migration_subscription", table "User" has finished
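Besides the logs, the state of the initial copy can be monitored from the new database via the system catalogs:

```sql
-- On the new database: per-table synchronization state.
-- srsubstate: 'i' = initializing, 'd' = copying data,
--             's' = synchronized, 'r' = ready (streaming changes)
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;
```

Once every table reports 'r', the initial copy is done and only ongoing changes are being streamed.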
Any data updates on the old database will now be replicated to the subscribed new database. When the logs indicate that all data has been transferred, you can verify it just to be sure. Retrieve and compare all database tables:
SELECT table_name FROM information_schema.tables WHERE table_schema = 'public';
The number of rows can be checked by comparing the count for each table:
SELECT count(*) FROM table_name;.
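Running the count table by table can be tedious. An approximate overview for all tables can be pulled from the statistics views instead; note that these numbers are estimates, not exact counts:

```sql
-- Approximate row counts for all user tables.
-- n_live_tup is an estimate maintained by the statistics collector,
-- so fall back to SELECT count(*) for an exact comparison.
SELECT relname AS table_name, n_live_tup
FROM pg_stat_user_tables
ORDER BY relname;
```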
After checking that the data was migrated correctly, I simply updated the environment variable
DATABASE_URL to point to the new database and redeployed. Subsequently, the subscription can be stopped and removed:
ALTER SUBSCRIPTION db_migration_subscription DISABLE;
DROP SUBSCRIPTION db_migration_subscription;
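For completeness, once the migration is confirmed, the publication on the old database can be dropped as well:

```sql
-- On the old database: remove the publication created earlier.
DROP PUBLICATION db_migration_publication;
```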
With the subscription removed, the database has been migrated with minimal downtime and, frankly, minimal effort. It was surprisingly straightforward to set up logical replication, migrate the data, and switch between databases.