

Responsible team member: Pierre Dane
Current team member:
Status: In Progress
Targeted release date: 2020-02-14
Scoping complete date: 2020-03-31
Jira issue: RVL-790
LOE: XXL
Priority: 4


Status

#  Step                                                     Status
1  Responsible prepares rough business analysis/reqs doc    Done
2  Get CHAI review
3  Get Akros Review
4  Get Ona feedback                                          Pending
5  Responsible - 1 iteration for feedback
6  Ona sign off
7  Ona tech spec scoping
8  Ona LOE
9  Ona scheduling

Definitions

Requirements

Mock-Ups

Views

Justification

Dependencies

Notes

Questions

Test Case

#  Step  Pass / Fail  Comment


Requirements

Proposed Solutions

This solution should apply to any destination data warehouse. At present these consist of the OpenSRP Postgres data warehouse and, for Thailand, Biophics, a SQL Server warehouse.

  • Raw JSON data from the OpenSRP server should be made available to the data warehouse database
  • These entities can be polled via Nifi using the serverVersion parameter to get updates, and posted to the warehouse in a similar fashion to how the data is currently sent, but without any processing on the Nifi side - Nifi would act as a simple pipe streaming entities.
  • Nifi could later be replaced with another pipeline (e.g. Kafka)
  • ENTITIES:
    • Clients (Families and Family Members)
    • Locations
    • Structures
    • Plans
    • Tasks
    • Events
    • Organisations
    • Providers
  • PAYLOAD:
    • The payload could be an exact representation of the entities in JSON as returned by the existing API queries e.g. /rest/plan, /rest/event, /rest/task
      • TODO: APIs may need to be updated to provide ALL entities, including those that have been archived or voided (getAll has been implemented in some cases already)
      • TODO: Decision logic may need to be replicated in the data warehouse e.g. which event is displayed for a task if 2 events for that task have been submitted
    • The payload could be packaged in a format that is easier for the relational database to consume - e.g. CSV
    • The payload could include all related entities necessary (e.g. the sync could be based on LOCATIONS and for any location that has had an action, all related entities could be packaged)
    • The payload SHOULD be optimized for size if possible, e.g. compressed and possibly packaged as Protocol Buffers, Thrift or Avro messages (nice deck here)
    • The payload could be batches of records or single records
  • DESERIALIZATION 
    • Deserialization will depend on the structure of the payload
    • Deserialization of JSON is described in the Assumed Solution below
    • Deserialization is expensive
  • PIPELINE
    • The pipeline could be a pub-sub append-only log type e.g. Kafka (Intro to Kafka for Mac here - REALLY easy to get up and running quickly)
    • The pipeline could be a polling service e.g. Nifi 
    • The pipeline could be standard Postgres replication in cases where the warehouse is Postgres (replicate the OpenSRP database, then ETL that data into the warehouse using cross-database queries); a sketch of this approach follows this list
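
For the Postgres-replication variant of the pipeline, the cross-database ETL step could be expressed with postgres_fdw. The sketch below is illustrative only; the server, schema, table and column names (opensrp_replica, staging.event, warehouse.raw_event, json_data, server_version) are assumptions, not existing objects.

-- Illustrative sketch only: pull new events from a replicated OpenSRP database
-- into the warehouse via postgres_fdw. All object names below are hypothetical.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SCHEMA IF NOT EXISTS staging;

CREATE SERVER opensrp_replica
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'replica.example.org', dbname 'opensrp', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER opensrp_replica
    OPTIONS (user 'warehouse_etl', password 'changeme');

-- Expose the replicated event table locally, then copy across only new rows
IMPORT FOREIGN SCHEMA core LIMIT TO (event)
    FROM SERVER opensrp_replica INTO staging;

INSERT INTO warehouse.raw_event (event_id, event_json, server_version)
SELECT id, json_data, server_version
FROM staging.event
WHERE server_version > (SELECT COALESCE(MAX(server_version), 0) FROM warehouse.raw_event);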

Assumed Solution:

  • JSON entities will arrive in the warehouse via Nifi
  • JSON will be parsed using the SQL Server or Postgres JSON query language and inserted into raw relational tables (e.g. event, event_details, event_obs for the event entities, clients and families for the client resources, tasks etc)
  • The raw tables will be queried and relevant logic applied to populate HLD tables (tables specific to the entity type and sub-type, e.g. blood screening vs family registration events) - these will be the 'materialized views'
  • These views should be partitioned by date (suggest monthly partitions) where there are large amounts of data; both Postgres and SQL Server support this. Partitioning enables faster querying, as well as vastly improved repopulation of the 'materialized views' when data changes, as partitions can be repopulated individually (deleting partitions is also almost instant); a Postgres partitioning sketch follows this list
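
As a rough illustration of the partitioning suggestion above, a Postgres declarative range-partitioned table with monthly partitions might look like the following. The table and column names (hld_blood_screening, event_date, etc.) are placeholders, not part of any existing schema.

-- Illustrative sketch only: a monthly range-partitioned 'materialized view' table
-- in Postgres. All table and column names are hypothetical.
CREATE TABLE hld_blood_screening (
    event_id      uuid NOT NULL,
    event_date    date NOT NULL,
    location_uuid uuid,
    task_status   varchar(500)
) PARTITION BY RANGE (event_date);

CREATE TABLE hld_blood_screening_2020_01
    PARTITION OF hld_blood_screening
    FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');

CREATE TABLE hld_blood_screening_2020_02
    PARTITION OF hld_blood_screening
    FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');

-- When upstream data for a month changes, only that month's partition
-- needs to be truncated and reloaded; dropping a partition is near-instant.
TRUNCATE hld_blood_screening_2020_01;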

Example SQL Server code for deserialization:


-- Flatten the 'details' object of each raw event row into relational columns
-- (raw_data and event_json are the landing table and JSON column used here)
select *
from raw_data
cross apply openjson(json_query(event_json, '$.details')) with (
    taskStatus   varchar(500)     '$.taskStatus',
    locationUUID uniqueidentifier '$.locationUUID'
) as obs
where isjson(event_json) = 1
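
The query above uses SQL Server's OPENJSON. A roughly equivalent Postgres version, assuming the same hypothetical raw_data table with event_json stored as jsonb, could use jsonb_to_record:

-- Illustrative Postgres equivalent of the OPENJSON query above.
-- Assumes raw_data.event_json is a jsonb column; names are hypothetical.
SELECT r.*, obs."taskStatus", obs."locationUUID"
FROM raw_data AS r
CROSS JOIN LATERAL jsonb_to_record(r.event_json -> 'details') AS obs(
    "taskStatus"   varchar(500),
    "locationUUID" uuid
)
WHERE jsonb_typeof(r.event_json -> 'details') = 'object';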