Guide to migrate existent data from CHT 3.x to CHT 4.x
Horizontal scaling is the ability to add more servers to an application to improve its performance. This often yields better performance than vertical scaling, which adds more resources such as RAM or CPU to a single server.
CHT Core 4.0.0 introduces a new architecture for hosting which gives it the ability to easily scale horizontally. This enables large deployments to support more concurrent users and better utilize the underlying server hardware.
Before getting into how the CHT scales horizontally, it is important to understand what vertical scaling is and why it matters. Vertical scaling is the ability of the CHT to support more users by adding more RAM and CPU to the bare-metal or virtual machine host. This ensures key services like API, Sentinel and, most importantly, CouchDB can operate without performance degradation.
When thousands of users are simultaneously trying to synchronize with the CHT, the load can overwhelm CouchDB. As discovered through extensive research and large production deployments, administrators will start to see errors in their logs and end users will complain of slow sync times. Before moving to more CouchDB nodes, administrators should consider adding more RAM and CPU to the single server where the CHT is hosted. This applies to both CHT 3.x and CHT 4.x. Allocating more resources is usually straightforward, particularly in a virtualized environment like EC2, Proxmox or ESXi, so this is much easier than moving from a single-node to a multi-node CouchDB instance.
Here we see a normal deployment following the bare minimum hosting requirements for the CHT. We’ll call this a “short” deployment because it is not yet vertically scaled:
flowchart TD
    subgraph couch1["CouchDB - 1 x ''short'' Node"]
        couchInner1["4 CPU/8 GB RAM"]
    end
    API["API"] --> HAProxy --> couch1
After looking at the logs, and seeing error messages about API timeouts to CouchDB, the CHT admin can make this “taller” by adding both more RAM and CPU, so it looks like this:
flowchart TD
    subgraph couch2["CouchDB - 1 x ''tall'' Node"]
        couchInner2["16 CPU/64 GB RAM"]
    end
    API["API"] --> HAProxy --> couch2
Since both CHT 3.x and 4.x support this, vertical scaling is an easy, good first step in addressing performance issues in the CHT.
For those self-hosting who want to get the most out of a vertically scaled deployment, consider splitting CouchDB shards to increase the shard count. CouchDB uses 1 core to manage each shard. By default, a CHT Core 4.x deployment will have 8 shards. If you have unused CPUs available, re-sharding divides up CouchDB's shard management so that it can take advantage of more cores.
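The shard-to-core reasoning above can be sketched as a small calculation. This is only an illustration, not part of the CHT or CouchDB tooling: the function names are hypothetical, and it assumes that each round of splitting divides every shard in two (doubling the shard count) and that one core services one shard, as described above.

```python
def split_rounds_for_cores(current_shards: int, cpu_cores: int) -> int:
    """Return how many rounds of splitting every shard fit within cpu_cores.

    Assumption: each round splits every shard in two, doubling the shard
    count, and one core manages one shard.
    """
    if current_shards < 1 or cpu_cores < 1:
        raise ValueError("shard and core counts must be positive")
    rounds = 0
    shards = current_shards
    # Keep doubling while the doubled shard count still has a core per shard.
    while shards * 2 <= cpu_cores:
        shards *= 2
        rounds += 1
    return rounds


def shards_after(current_shards: int, rounds: int) -> int:
    """Shard count after the given number of doubling rounds."""
    return current_shards * (2 ** rounds)


if __name__ == "__main__":
    # The "tall" node above has 16 CPUs; a default CHT 4.x install has 8 shards.
    rounds = split_rounds_for_cores(current_shards=8, cpu_cores=16)
    print(rounds, shards_after(8, rounds))
```

Under these assumptions, the 16 CPU "tall" node with the default 8 shards has room for one round of splitting, giving 16 shards. A host with 8 or fewer CPUs would gain nothing from re-sharding by this measure.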
When vertical scaling alone no longer resolves these performance problems, it is time to consider horizontally scaling your CHT instance. The benefit is that CouchDB has been shown to use resources much better when there are multiple instances of it, each taking a share of the work. Here we see 18 CPUs spread across 3 nodes (vs 16 CPUs on one instance above) with a load balancer (HAProxy) distributing requests:
flowchart TD
    subgraph couch4["CouchDB - 3 x Nodes"]
        couchInner4["6 CPU/6 GB RAM"]
        couchInner5["6 CPU/6 GB RAM"]
        couchInner6["6 CPU/6 GB RAM"]
    end
    API["API"] --> HAProxy --> couch4
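To make the comparison between the three diagrams concrete, the following sketch totals the CPU and RAM of each layout. The `Node` class and `totals` helper are hypothetical names used only for illustration; the figures are taken directly from the diagrams above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Node:
    """One CouchDB host, sized as in the diagrams above."""
    cpus: int
    ram_gb: int


def totals(nodes: Iterable[Node]) -> Tuple[int, int]:
    """Return (total CPUs, total RAM in GB) across all nodes."""
    nodes = list(nodes)
    return sum(n.cpus for n in nodes), sum(n.ram_gb for n in nodes)


# Layouts from the three diagrams: "short", "tall", and 3 x nodes.
short = [Node(cpus=4, ram_gb=8)]
tall = [Node(cpus=16, ram_gb=64)]
three_nodes = [Node(cpus=6, ram_gb=6)] * 3

if __name__ == "__main__":
    for name, layout in [("short", short), ("tall", tall), ("3 nodes", three_nodes)]:
        cpus, ram = totals(layout)
        print(f"{name}: {len(layout)} node(s), {cpus} CPUs, {ram} GB RAM")
```

The three-node layout totals 18 CPUs, matching the figure quoted above, against 16 CPUs on the single "tall" node.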
To read up on how to migrate your data from a single-node to a multi-node deployment, please see the data migration guide.
Note that, unlike vertical scaling, horizontally scaling a large, existing dataset takes time: preparing the transfer can take hours to days, and the move may involve a brief service outage. Take this into consideration when planning a move of a CHT instance with a lot of data.