CHT Sync and CHT Pipeline

Data synchronization tools to enable analytics

Overview

CHT Sync is an integrated solution that synchronizes data from CouchDB to PostgreSQL for analytics. It combines several technologies to achieve this synchronization and provides an efficient workflow for data processing and visualization. Synchronization occurs in real time, so the data displayed on dashboards stays up to date.

Read more about setting up CHT Sync.

CHT Sync uses Logstash and PostgREST to replicate data from CouchDB to PostgreSQL in real time. It listens for changes in the CHT database and updates the analytics database accordingly. It has no user interface and is not designed to be accessed by users. It is designed to run on the same server as the CHT, but it can run on a separate server if necessary.

CHT Sync writes all new data to a single PostgreSQL table with a jsonb column, which is not very useful for analytics on its own. CHT Pipeline is a set of SQL queries that transform the data in the jsonb column into a more useful format. It uses DBT to define models that are materialized as PostgreSQL tables or views, which makes the data easier to query from the analytics platform of choice.

Logstash

Logstash streams data from CouchDB and forwards it to PostgREST, ensuring real-time updates in PostgreSQL.
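To make the streaming step more concrete, the sketch below shows roughly what happens to one batch of changes. It is written in Python for illustration only and is not the Logstash configuration that CHT Sync ships. It assumes a CouchDB `_changes` response fetched with `include_docs=true`; the function name `changes_to_rows`, the `source` column, and the sample documents are hypothetical.

```python
import json


def changes_to_rows(changes_response, source="medic"):
    """Turn a CouchDB `_changes` response (include_docs=true) into row
    payloads, and return the sequence to resume from on the next poll."""
    rows = []
    for change in changes_response["results"]:
        # Deleted documents appear in the feed with "deleted": true.
        # This sketch simply skips them; a real pipeline may need to
        # record deletions instead.
        if change.get("deleted"):
            continue
        doc = change["doc"]
        rows.append({
            "_id": doc["_id"],
            "source": source,        # hypothetical column naming the origin database
            "doc": json.dumps(doc),  # the whole document, destined for a jsonb column
        })
    return rows, changes_response["last_seq"]


# Hypothetical sample shaped like a `_changes` response.
sample = {
    "results": [
        {"seq": "1-abc", "id": "report-1", "changes": [{"rev": "1-x"}],
         "doc": {"_id": "report-1", "_rev": "1-x", "type": "data_record"}},
        {"seq": "2-def", "id": "old-doc", "changes": [{"rev": "2-y"}],
         "deleted": True,
         "doc": {"_id": "old-doc", "_rev": "2-y", "_deleted": True}},
    ],
    "last_seq": "2-def",
}

rows, resume_from = changes_to_rows(sample)
```

Storing `resume_from` and passing it back as the `since` parameter of the next `_changes` request is what lets a consumer pick up where it left off.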

PostgREST

PostgREST acts as a RESTful API layer, providing endpoints to store data in and retrieve data from the PostgreSQL database.
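As a rough illustration of what storing and retrieving through PostgREST looks like, the Python sketch below builds one insert request and one filtered read. The base URL, the table name `couchdb`, and the `_id` column are assumptions made for this example, not values taken from CHT Sync. The requests are only constructed, not sent, so the sketch runs without a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:3000"  # assumption: where PostgREST is listening
TABLE = "couchdb"                   # hypothetical table exposed by PostgREST


def build_insert(rows):
    """POST /<table> with a JSON array body inserts the given rows."""
    return Request(
        f"{BASE_URL}/{TABLE}",
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Prefer": "return=minimal"},
        method="POST",
    )


def build_select(doc_id):
    """GET /<table>?_id=eq.<value> filters rows with PostgREST's `eq` operator."""
    query = urlencode({"_id": f"eq.{doc_id}"})
    return Request(f"{BASE_URL}/{TABLE}?{query}", method="GET")


insert_req = build_insert([{"_id": "report-1", "doc": {"type": "data_record"}}])
select_req = build_select("report-1")
```

Passing either object to `urllib.request.urlopen` would send it to a running PostgREST instance.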

PostgreSQL

A free and open source SQL database used for analytics queries. See more at the PostgreSQL site.

DBT

Once the data is synchronized and stored in PostgreSQL, it is transformed using predefined DBT models from cht-pipeline. DBT ingests the raw JSON data from the PostgreSQL database (jsonb column) and normalizes it into a relational schema that is easier to query. A daemon runs CHT Pipeline and updates the database whenever the data in the jsonb column changes.
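The normalization idea can be sketched in a few lines. The cht-pipeline models themselves are SQL managed by DBT; the Python below only mirrors the concept of projecting raw JSON documents onto flat, typed rows. The function name `normalize_report`, the output column names, and the sample documents are illustrative assumptions, not the actual cht-pipeline schema.

```python
import json


def normalize_report(raw_doc):
    """Project one raw JSON document onto a flat row with named columns."""
    doc = json.loads(raw_doc)
    fields = doc.get("fields", {})
    return {
        "uuid": doc["_id"],
        "form": doc.get("form"),
        "reported": doc.get("reported_date"),
        "patient_id": fields.get("patient_id"),
    }


# Stand-in for the raw table: one JSON document per row (hypothetical data).
raw_table = [
    '{"_id": "r1", "type": "data_record", "form": "pregnancy", '
    '"reported_date": 1700000000000, "fields": {"patient_id": "12345"}}',
    '{"_id": "p1", "type": "person", "name": "Jane"}',
]

# Keep only report documents, much as a model would filter on document type.
reports = [
    normalize_report(raw)
    for raw in raw_table
    if json.loads(raw).get("type") == "data_record"
]
```

Each such projection corresponds loosely to one model: a query that selects the documents of one kind and exposes the fields of interest as ordinary columns.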

Data Visualization

We recommend Apache Superset as the data visualization tool. Superset is a free, open-source platform for data exploration and data visualization.

CHT Core Framework & CouchDB

For more information on these technologies, see CHT Core overview.

