This page documents how to hash data that is already in Postgres.
Enable the uuid-ossp extension in Postgres to allow random UUID generation. Run the command below at the psql prompt:
...
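The command elided above is presumably the standard extension creation statement; a minimal sketch:

```sql
-- Install the uuid-ossp extension, which provides uuid_generate_v4()
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```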
Run the script below to create a function that generates a random number within a range:
...
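The elided block is a SQL script titled "random_between function". A plausible reconstruction, assuming the usual plpgsql pattern for this helper (the authoritative version is in the repository):

```sql
-- Sketch of the random_between helper: returns a random integer in [low, high].
-- The exact body may differ from the script shipped with the repository.
CREATE OR REPLACE FUNCTION random_between(low INT, high INT)
RETURNS INT AS $$
BEGIN
    -- random() yields a double in [0, 1); scale it into the requested range
    RETURN floor(random() * (high - low + 1) + low);
END;
$$ LANGUAGE plpgsql;
```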
Run the script create_prerequisite_objects.sql to create the dependencies needed by the hashing scripts.
Hashing data from CouchDB
...
- Check out the repository https://github.com/OpenSRP/path-zambia-etl
- The hashing scripts are in the `hashing_data` directory.
- Run the script hash_couchdb_doc_id_and_rev.sql. This hashes the CouchDB document id to a random UUID and sets the revision to a constant so that it is not possible to identify the data. It runs on the CouchDB staging table and should be run before any data migration scripts (see the first sketch after this list).
- Run the data migration scripts to transform the data into the correct domain tables in the Postgres model. This is step 7 on the page Data migration from CouchDb.
- We used a random generator to produce 100,000 random person names, which are assigned randomly when hashing the data. The random names are in the file person_names.csv; you can use your own file.
- Import the random names used to hash the clients. Run the script person_names.sql to import the CSV file with the random names into a database table; it creates a table `person_names` and populates it with random person names (see the import sketch after this list). Make sure to update the path of the CSV file; it must be the absolute path on the local computer.
- Run the script hash_clients.sql. This script hashes personal identifiers and attributes, using random UUIDs and the random names imported by the script above (see the client-hashing sketch after this list).
- Run the script update_client_references.sql. This script updates the client reference, usually `baseEntityId`, to the random UUID generated in step 7 above for the dependent documents, e.g. events, alerts, multimedia, etc. (see the reference-update sketch after this list).
- Clean up the created tables. The scripts above create the tables `ids_mapping` and `person_names` in the public schema of the Postgres database being used. Drop the two tables and also the CouchDB staging table `couchdb`, either by running the script clean_up.sql or with the commands below:

```sql
drop table couchdb;
drop table ids_mapping;
drop table person_names;
```
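As a rough illustration of the document id and revision hashing step, a sketch assuming the staging table is named `couchdb` with text columns `id` and `rev`, and assuming this is the step that populates the `ids_mapping` table mentioned above (the authoritative logic is in hash_couchdb_doc_id_and_rev.sql):

```sql
-- Hypothetical sketch, not the shipped script: record a mapping from the
-- original ids to fresh random UUIDs, then overwrite id and rev so the
-- original documents can no longer be identified. The table and column
-- names (couchdb, id, rev, ids_mapping) are assumptions.
CREATE TABLE ids_mapping AS
SELECT id AS old_id, uuid_generate_v4()::text AS new_id
FROM couchdb;

UPDATE couchdb c
SET id  = m.new_id,
    rev = '1-hashed'      -- constant revision value
FROM ids_mapping m
WHERE c.id = m.old_id;
```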
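The import step presumably reduces to a COPY from the CSV. A minimal sketch, assuming a two-column name file with a header row; the path shown is a placeholder for the absolute path on your machine:

```sql
-- Sketch of what person_names.sql likely does; the real column layout
-- may differ. COPY needs the absolute path of the CSV file.
CREATE TABLE person_names (
    first_name text,
    last_name  text
);

COPY person_names (first_name, last_name)
FROM '/absolute/path/to/person_names.csv'
WITH (FORMAT csv, HEADER true);
```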
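For the client-hashing step, one way to assign a random name to every client row; the `clients` table and its columns are assumptions, and the real hash_clients.sql covers more identifiers and attributes than names:

```sql
-- Hypothetical sketch: shuffle both clients and names, then join so each
-- client receives a name, cycling through the name list if clients
-- outnumber names. Table and column names are assumptions.
WITH shuffled_clients AS (
    SELECT id, row_number() OVER (ORDER BY random()) AS rn
    FROM clients
),
shuffled_names AS (
    SELECT first_name, last_name,
           row_number() OVER (ORDER BY random()) AS rn
    FROM person_names
)
UPDATE clients c
SET first_name = n.first_name,
    last_name  = n.last_name
FROM shuffled_clients sc
JOIN shuffled_names n
  ON n.rn = (sc.rn - 1) % (SELECT count(*) FROM person_names) + 1
WHERE c.id = sc.id;
```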
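And for the reference update, assuming an `ids_mapping` table like the one sketched above and a dependent `events` table carrying the client reference; the same pattern would repeat for alerts, multimedia, and the other dependent documents (the real logic lives in update_client_references.sql):

```sql
-- Hypothetical sketch: re-point a dependent table at the new random UUIDs.
-- The events table and "baseEntityId" column are assumptions about the schema.
UPDATE events e
SET "baseEntityId" = m.new_id
FROM ids_mapping m
WHERE e."baseEntityId" = m.old_id;
```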
Hashing data already in the Postgres model
...