Migration
CHT couch2pg is deprecated. For data synchronization, refer to CHT Sync.
Assumptions & Prerequisites
This guide assumes:
- Your CHT instance is set up.
- Your Postgres server is set up. We’ll be using `10.195.130.93` in this documentation, but be sure to use your production Postgres address. The server needs to have the `pg_dump` command available.
- Your CHT couch2pg (aka couch2pg) instance is set up. In this guide, we’ll assume it’s on the same server as the Postgres instance.
- You want to move both the couch2pg instance and the Postgres data to a new server
Further, be sure you meet the following prerequisites:
- Have provisioned a new Postgres server. The server needs to have the `pg_restore` command available (a quick check is shown after this list)
- Have access to the existing Postgres server to be able to dump the data
- Have access to the couch2pg instance, including the CHT Core credentials it’s using
- Have 3x the disk space as your data on both the old and new servers - see below
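Before moving on, a quick way to confirm the client tools are actually present - run these on the relevant servers; the version printed will vary with your Postgres installation:

```shell
# On the existing (old) server: confirm pg_dump is available
pg_dump --version

# On the new server: confirm pg_restore is available
pg_restore --version
```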
Note: If you don’t mind waiting and don’t want to deal with the trouble of copying large data files around as documented on this page, it will be easier to set up a clean install of couch2pg and let it sync all of your data from your CHT instance from scratch.
Time to copy
Copying and loading large amounts of Postgres data can take a long time. If you make a mistake, going back to the first step can take more time. Don’t plan a production migration without doing at least one dry run! Do more dry runs until you’re confident that everything works as expected.
Consider keeping your old couch2pg system running alongside your new one. This will allow you to fail over to the old one easily and it will be up to date with data from your CHT instance. It is safe to run multiple couch2pg instances against the same CHT instance.
When you’re confident the new system is up and running, stable, and performant, go ahead and decommission the old system.
Disk space prerequisites
You need 3 times the database size of free disk space on both the old and new Postgres servers. The 3x number comes from 3 sources of data:
- Existing Postgres database
- Dump of this same data via `pg_dump`
- Compressed copy of that dump made with `gzip`
While you may not need all of the 3x disk space, having the extra space is important to ensure you don’t accidentally fill up the disk on a production instance.
For example, on a Postgres server with a 400GB disk and millions of documents in Postgres:

- 175GB - Existing Postgres database
- 175GB - Dump of this same data via `pg_dump`
- ~60GB - Compressed copy made with `gzip`

The server is healthy with just over 40% of the disk used day to day. However, if you make a copy of the data (175GB + 175GB = 350GB), you now have the disk over 85% full with only 50GB free. You very likely will not have room to compress the data (175GB + 175GB + 60GB = 410GB).
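To see how these numbers look on your own server before you start, a minimal sketch - it assumes your database is named `cht`, your user is `couch2pg`, and a typical Debian/Ubuntu data directory; adjust all three to your deployment:

```shell
# Free disk space on the volume holding the Postgres data directory
df -h /var/lib/postgresql    # path is an assumption - point it at your data directory

# Size of the cht database, to estimate the extra space pg_dump and gzip will need
psql -U couch2pg -d cht -c "SELECT pg_size_pretty(pg_database_size('cht'));"
```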
The best workaround is to increase the size of your Postgres server, assuming you’re on a cloud provider that offers this. Another workaround is to run `pg_dump` from another computer with more disk space, but note that this will send the uncompressed data over the network, which may take a long time. The same is true when doing the restore over the network - it will be much slower than if you did it locally.
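If you do end up dumping from another machine, a minimal sketch of what that looks like - it assumes the old server at `10.195.130.93` accepts remote Postgres connections from the machine you run this on:

```shell
# Run the dump remotely; note the uncompressed data travels over the network
pg_dump -h 10.195.130.93 -U couch2pg -d cht -F tar -f couch2pg.tar
gzip couch2pg.tar
```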
To show disk use of all databases, run this command, being sure to replace `couch2pg` with your user:

```shell
psql -U couch2pg -c '\l+'
```
Instructions
Current Postgres data and couch2pg config
1. SSH to your Postgres server
2. Connect to the `cht` database and run this query. We’ll use this on the next server to validate the data was imported. On large datasets this may take a long while to run:
   ```sql
   select count(*) from couchdb;
   ```
3. Create a tarball of your database and compress it. This assumes you’re using the default `cht` name for your database with a username of `couch2pg`. Replace with your database and username if they’re different:
   ```shell
   pg_dump -U couch2pg -d cht -F tar -f couch2pg.tar
   gzip couch2pg.tar
   ```
   Note - If you get errors like `query failed: ERROR: permission denied` - run this command as the `postgres` user.
4. Check the size of the gzip - we’ll use this in the next section:
   ```shell
   ls -al couch2pg.tar.gz
   ```
5. Copy the resulting `couch2pg.tar.gz` file to your computer - `scp` is good for this!
6. Check the values for all the environment variables for couch2pg. For example, here’s what our current couch2pg has for its config. In our case the `POSTGRESQL_URL` host is `localhost` because we’re on the same server as the Postgres server. Be sure to use `POSTGRESQL_URL` and `COUCHDB_URL` values that match your deployment (if couch2pg runs in Docker, see the sketch after this list):
   ```
   POSTGRESQL_URL=postgres://couch2pg:passwordHere1@localhost:5432/cht
   COUCHDB_URL=https://medic:passwordHere2@192-168-68-23.local-ip.medicmobile.org:10443/medic
   COUCH2PG_SLEEP_MINS=360
   COUCH2PG_DOC_LIMIT=100
   COUCH2PG_CHANGES_LIMIT=5000
   COUCH2PG_RETRY_COUNT=5
   COUCH2PG_USERS_META_DOC_LIMIT=50
   ```
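If your couch2pg instance happens to run as a Docker container (an assumption - yours may run under systemd, pm2, or straight from the shell), one way to capture its current environment variables is:

```shell
# Print the environment variables of a running container.
# "couch2pg" is an assumed container name - use `docker ps` to find yours.
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' couch2pg
```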
New Postgres server and couch2pg instance
1. Copy the `couch2pg.tar.gz` file to the new Postgres server
2. SSH to the new Postgres server
3. Make sure the gzip’s bytes on disk exactly match the one from step 4 in the prior section:
   ```shell
   ls -al couch2pg.tar.gz
   ```
4. Uncompress the file:
   ```shell
   gunzip couch2pg.tar.gz
   ```
5. Ensure you have a `cht` database already created on your new Postgres instance:
   ```sql
   CREATE DATABASE cht;
   ```
6. Stop `couch2pg` if it is running
7. Load the data from the dump file, again being sure to replace `couch2pg` if your user is different:
   ```shell
   pg_restore -U couch2pg -d cht couch2pg.tar
   ```
8. Connect to the `cht` database and run this query. This should return the same number as step 2 above; on large datasets this may take a long while to run (a combined check across both servers is sketched after this list):
   ```sql
   select count(*) from couchdb;
   ```
9. Start couch2pg, being sure to use the exact same environment variables as step 6 above, but possibly with a different `POSTGRESQL_URL` value if it has changed for the new Postgres server
10. Check the logs of couch2pg and ensure there are no errors.
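To compare the two row counts without eyeballing them, a minimal sketch - the old server address, user, and database name are the same assumptions used in the examples above:

```shell
# Fetch the couchdb row count from the old and new servers and compare them
OLD=$(psql -h 10.195.130.93 -U couch2pg -d cht -t -A -c 'select count(*) from couchdb;')
NEW=$(psql -h localhost -U couch2pg -d cht -t -A -c 'select count(*) from couchdb;')
echo "old: $OLD  new: $NEW"
[ "$OLD" = "$NEW" ] && echo "Counts match" || echo "Counts differ - investigate before cutting over"
```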
Dashboards
Check the logs of couch2pg. Be sure there are no errors and that materialized views are updating as expected. Complex materialized views can take hours to update. Don’t hesitate to let multiple periods of `COUCH2PG_SLEEP_MINS` (how frequently couch2pg runs) pass to ensure no errors occur.
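One way to see which materialized views exist on the new server and whether they have been populated yet - the views themselves depend entirely on your dashboard setup:

```shell
# List materialized views and their populated status in the cht database
psql -U couch2pg -d cht -c 'select schemaname, matviewname, ispopulated from pg_matviews;'
```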
Double-check that data is synchronizing by running a SQL query one of your dashboards uses. Make sure the data is both accurate and up to date. When you are confident the data is valid and working correctly, change any downstream sources, like Superset or Klipfolio, to use the new Postgres server IP and credentials.
If at a later date you find there are substantial errors, you can always change your downstream sources back to using the old Postgres server and attempt the migration again, fixing errors where needed.
Cleanup
When you’re 100% sure the migration was successful - don’t rush this part! - be sure to clean up files, servers, and services that may still be running:
- Delete any `*.tar` and `*.tar.gz` files on the old Postgres server, on your computer, and on the new Postgres server
- Stop couch2pg running on the old server
- If the data is truly not needed, drop the `cht` database on the old Postgres server (see the sketch after this list)
- Delete any backups of the old server and service
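A minimal sketch of that cleanup on the old server - the file paths are assumptions, and `DROP DATABASE` is irreversible, so only run this once you are certain the migration succeeded:

```shell
# Remove the dump files left over from the migration (paths are assumptions)
rm -i ~/couch2pg.tar ~/couch2pg.tar.gz

# Permanently drop the old database - there is no undo for this
psql -U postgres -c 'DROP DATABASE cht;'
```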