Patient Deduplication Strategy

Work in Progress

This document is a work in progress; this notice will be removed when the document is complete.


This document describes the current gaps in identifying duplicate patients across the OpenSRP platform and will ultimately propose a list of technical solutions to explore for closing those gaps. Whichever solution is chosen, the core data changes will need to be made in the OpenSRP server's database.

Problem Statements

  • OpenSRP doesn't have a straightforward process for identifying duplicate patients in the system.
  • Identifying and merging duplicate records requires a database engineer. The community expects the process 
  • When a duplicate patient is identified, there isn't an easy way to merge the two client records, or their associated events, into a single record.
    • The act of merging has implications for the following components:
      • Android Client
        • What data model changes need to be made on the client?
        • How should this merge be displayed to the user?
      • OpenMRS
        • If we merge a record in OpenSRP, should we merge it in OpenMRS? Should this be automated?
      • RapidPro
        • If we merge a record in OpenSRP, should we delete the contact in RapidPro?
        • Should we notify the user of this change through a RapidPro flow?
      • Reporting
        • Reporting systems generally run incrementally, not from the beginning of time. If we merge a record at a given point in time, how will this be reflected in reports?
  • OpenSRP doesn't have a standard mechanism to "archive" client records in the database.
    • Archiving records will have downstream effects on OpenMRS and reporting.
  • The OpenMRS merge patient feature doesn't make changes in OpenSRP.
  • The OpenMRS merge patient feature is an incredibly heavy process that can shut down the server when run at scale. At other times, it may take hours to identify duplicates.
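
To make the merge and archive gaps above more concrete, here is a minimal sketch of what a merge could look like on in-memory records: events are reassigned to the surviving client, the duplicate is flagged as archived rather than deleted, and an audit entry is returned. All names (`Client`, `Event`, `merge_clients`, `client_id`, `archived`, `merged_into`) are hypothetical and chosen for illustration only; they do not reflect the actual OpenSRP data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Client:
    # Hypothetical client record; not the OpenSRP schema.
    client_id: str
    name: str
    archived: bool = False
    merged_into: Optional[str] = None


@dataclass
class Event:
    # Hypothetical event record, linked to a client by id.
    event_id: str
    client_id: str


def merge_clients(survivor: Client, duplicate: Client,
                  events: List[Event], merged_by: str) -> Dict[str, object]:
    """Merge `duplicate` into `survivor` and return an audit entry."""
    if survivor.client_id == duplicate.client_id:
        raise ValueError("cannot merge a client into itself")
    if duplicate.archived:
        raise ValueError("duplicate is already archived")

    # Reassign every event owned by the duplicate to the survivor.
    moved = []
    for event in events:
        if event.client_id == duplicate.client_id:
            event.client_id = survivor.client_id
            moved.append(event.event_id)

    # Archive instead of deleting, and keep a pointer to the survivor.
    duplicate.archived = True
    duplicate.merged_into = survivor.client_id

    return {
        "action": "merge",
        "survivor": survivor.client_id,
        "duplicate": duplicate.client_id,
        "moved_events": moved,
        "merged_by": merged_by,
        "merged_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    survivor = Client("c1", "Amina Juma")
    duplicate = Client("c2", "Amina Jumah")
    events = [Event("e1", "c1"), Event("e2", "c2")]
    print(merge_clients(survivor, duplicate, events, merged_by="supervisor1"))
```

Archiving the duplicate with a pointer to the surviving record, rather than deleting it, is one possible design choice here. It would leave downstream systems such as OpenMRS, RapidPro and incremental reports something to resolve against, but how those systems should react remains one of the open questions listed above.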

Core Features

  • End users should have access to a user interface that allows health workers to search for, identify and merge duplicates.
    • These changes should be made system-wide
    • This search should include the ability to perform a fuzzy search for potential duplicates based on name
    • (Stretch) Users can choose from a number of predefined duplicate search strategies
  • All merge events from the central system(s) should be reflected in the Android client
  • If an Android client user identifies a duplicate on their device, they should be able to archive or merge the duplicate record so it is no longer displayed on their Android client, or on other devices at the same facility
  • The merging process should include a robust audit trail that is centrally viewable
  • OpenSRP server should provide specific role-based access controls to limit which users can view and merge patient records from a central location
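
As an illustration of the name-based fuzzy search listed above, the following minimal sketch compares client names pairwise using Python's standard-library `difflib.SequenceMatcher`. The function names, the `clients` mapping and the `threshold` value are assumptions made for this example; a production strategy would likely also compare fields such as date of birth, gender and location, and would need something more scalable than comparing every pair.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_potential_duplicates(
    clients: Dict[str, str], threshold: float = 0.85
) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, score) for each pair scoring at or above threshold.

    `clients` maps a hypothetical client id to a display name.
    Every pair is compared, which is O(n^2): fine for a sketch, not for scale.
    """
    matches = []
    for (id_a, name_a), (id_b, name_b) in combinations(clients.items(), 2):
        score = name_similarity(name_a, name_b)
        if score >= threshold:
            matches.append((id_a, id_b, round(score, 2)))
    # Strongest candidates first, so a reviewer sees the likeliest duplicates.
    return sorted(matches, key=lambda m: m[2], reverse=True)


if __name__ == "__main__":
    sample = {"c1": "Amina Juma", "c2": "Amina  Jumah", "c3": "Peter Otieno"}
    for id_a, id_b, score in find_potential_duplicates(sample):
        print(id_a, id_b, score)
```

Exposing `threshold` as a parameter hints at how the stretch goal of predefined search strategies might be approached: each strategy could be a named combination of comparison function and threshold. The output is only a list of candidates; the decision to merge would still rest with an authorized user.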

Alternative Solutions To Be Researched

  • We review and test OpenMRS merge features at scale and identify processes in OpenSRP server that kick off when a merge is processed
  • We can integrate with a third-party tool (Jembi's HEARTH, OpenEMPI, MEDIC CR) to identify duplicates more quickly, and develop a process for integrating those changes into OpenSRP server