Implement SHA1 hashing for PII data when migrating from SQL Server to PostgreSQL
Created by Rajkumar Raghuwanshi (AWS) and Jagadish Kantubugata (AWS)
Summary
This pattern describes how to implement Secure Hash Algorithm 1 (SHA1) hashing for email addresses when migrating from SQL Server to either Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible. An email address is an example of personally identifiable information (PII). PII is information that, when viewed directly or paired with other related data, can be used to reasonably infer the identity of an individual.
This pattern covers the challenges of maintaining consistent hash values across different database collations and character encodings, and provides a solution using PostgreSQL functions and triggers. Although this pattern focuses on SHA1 hashing, it can be adapted for other hashing algorithms supported by PostgreSQL's pgcrypto
module. Always consider the security implications of your hashing strategy and consult with security experts if handling sensitive data.
Prerequisites and limitations
Prerequisites
An active AWS account
Source SQL Server database
Target PostgreSQL database (Amazon RDS for PostgreSQL or Aurora PostgreSQL-Compatible)
PL/pgSQL coding expertise
Limitations
This pattern requires database-level collation changes based on use cases.
The performance impact on large datasets has not been evaluated.
Some AWS services aren’t available in all AWS Regions. For Region availability, see AWS Services by Region
. For specific endpoints, see Service endpoints and quotas, and choose the link for the service.
Product versions
Microsoft SQL Server 2012 or later
Architecture
Source technology stack
SQL Server
.NET Framework
Target technology stack
PostgreSQL
pgcrypto
extension
Automation and scale
Consider implementing the hashing function as a stored procedure for easier maintenance.
For large datasets, evaluate performance and consider batch processing or indexing strategies.
Tools
AWS services
Amazon Aurora PostgreSQL-Compatible is a fully managed, ACID-compliant relational database engine that helps you set up, operate, and scale PostgreSQL deployments.
AWS Database Migration Service (AWS DMS) helps you migrate data stores into the AWS Cloud or between combinations of cloud and on-premises setups.
Amazon Relational Database Service Amazon RDS for PostgreSQL helps you set up, operate, and scale a PostgreSQL relational database in the AWS Cloud.
AWS Schema Conversion Tool (AWS SCT) supports heterogeneous database migrations by automatically converting the source database schema and a majority of the custom code to a format that’s compatible with the target database.
Other tools
pgAdmin
is an open source management tool for PostgreSQL. It provides a graphical interface that helps you create, maintain, and use database objects. SQL Server Management Studio (SSMS)
is an integrated environment for managing any SQL infrastructure.
Best practices
Use appropriate collation settings for handling special characters on the target database side.
Test thoroughly with a variety of email addresses, including addresses with non-ASCII characters.
Maintain consistency in uppercase and lowercase handling between the application and database layers.
Benchmark performance of queries using the hashed values.
Epics
Task | Description | Skills required |
---|---|---|
Review SQL Server code. | To review SQL Server code that generates SHA1 hashes, do the following:
| Data engineer, DBA, App developer |
Document the hashing algorithm and data transformations. | To document the exact hashing algorithm and data transformations, do the following:
| App developer, Data engineer, DBA |
Task | Description | Skills required |
---|---|---|
Create | To create the
| DBA, Data engineer |
Implement a PostgreSQL function. | Implement the following PostgreSQL function to replicate the SQL Server hashing logic. At a high level, this function uses the following steps:
| Data engineer, DBA, App developer |
Test the function. | To test the function, use sample data from SQL Server to verify matching hash values. Run the following command:
| App developer, DBA, Data engineer |
Task | Description | Skills required |
---|---|---|
Create triggers on relevant tables. | To create triggers on relevant tables to automatically generate hash values on insert or update, run the following command:
| App developer, Data engineer, DBA |
Task | Description | Skills required |
---|---|---|
Develop a migration script or use AWS DMS. | Develop a migration script or use AWS DMS to populate hash values for existing data (including hash values stored as
| Data engineer, App developer, DBA |
Use the new PostgreSQL hashing function. | To use the new PostgreSQL hashing function to ensure consistency, do the following:
| App developer, DBA, DevOps engineer |
Task | Description | Skills required |
---|---|---|
Identify application queries. | To identify application queries that use hashed values, do the following:
| App developer, DBA, Data engineer |
Modify queries. | If necessary, modify queries to use the new PostgreSQL hashing function. Do the following:
| App developer, DBA, Data engineer |
Task | Description | Skills required |
---|---|---|
Perform testing. | To perform thorough testing with a subset of production data, do the following:
| App developer, Data engineer, DBA |
Validate that hash values match. | To validate that hash values match between SQL Server and PostgreSQL, do the following:
| App developer, Data engineer, DBA |
Verify application functionality. | To verify application functionality by using the migrated data and the new hashing implementation, do the following:
| App developer, DBA, Data engineer |
Troubleshooting
Issue | Solution |
---|---|
Hash values don’t match. | Verify character encodings and collations between source and target. For more information, see Manage collation changes in PostgreSQL on Amazon |
Related resources
AWS Blogs
Other resources
PostgreSQL pgcrypto module
(PostgreSQL documentation) PostgreSQL trigger functions
(PostgreSQL documentation) SQL Server HASHBYTES function
(Microsoft documentation)
Additional information
This pattern focuses on SHA1 hashing, but can be adapted for other hashing algorithms supported by PostgreSQL's pgcrypto module. Always consider the security implications of your hashing strategy and consult with security experts if handling sensitive data.