Implement SHA1 hashing for PII data when migrating from SQL Server to PostgreSQL

Created by Rajkumar Raghuwanshi (AWS) and Jagadish Kantubugata (AWS)

Summary

This pattern describes how to implement Secure Hash Algorithm 1 (SHA1) hashing for email addresses when migrating from SQL Server to either Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible. An email address is an example of personally identifiable information (PII). PII is information that, when viewed directly or paired with other related data, can be used to reasonably infer the identity of an individual.

This pattern covers the challenges of maintaining consistent hash values across different database collations and character encodings, and provides a solution using PostgreSQL functions and triggers. Although this pattern focuses on SHA1 hashing, it can be adapted for other hashing algorithms supported by PostgreSQL's pgcrypto module. Always consider the security implications of your hashing strategy and consult with security experts if handling sensitive data.

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • Source SQL Server database

  • Target PostgreSQL database (Amazon RDS for PostgreSQL or Aurora PostgreSQL-Compatible)

  • PL/pgSQL coding expertise

Limitations

  • Depending on your use case, this pattern requires database-level collation changes.

  • The performance impact on large datasets has not been evaluated.

  • Some AWS services aren’t available in all AWS Regions. For Region availability, see AWS Services by Region. For specific endpoints, see Service endpoints and quotas, and choose the link for the service.

Product versions

  • Microsoft SQL Server 2012 or later

Architecture

Source technology stack

  • SQL Server

  • .NET Framework

Target technology stack

  • PostgreSQL

  • pgcrypto extension

Automation and scale

  • Consider implementing the hashing function as a stored procedure for easier maintenance.

  • For large datasets, evaluate performance and consider batch processing or indexing strategies.
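
If lookups filter on the hashed value, an index on that column is a reasonable starting point. The following is a minimal sketch; the users table and email_hash column are assumptions carried over from the trigger example later in this pattern.

-- Hypothetical index to speed up lookups by hashed email.
CREATE INDEX IF NOT EXISTS idx_users_email_hash ON users (email_hash);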

Tools

AWS services

  • Amazon Aurora PostgreSQL-Compatible Edition is a fully managed, PostgreSQL-compatible relational database engine that helps you set up, operate, and scale PostgreSQL deployments in the AWS Cloud.

  • Amazon RDS for PostgreSQL helps you set up, operate, and scale a PostgreSQL relational database in the AWS Cloud.

  • AWS Database Migration Service (AWS DMS) helps you migrate data stores into the AWS Cloud or between combinations of cloud and on-premises setups.

Other tools

  • pgAdmin is an open source management tool for PostgreSQL. It provides a graphical interface that helps you create, maintain, and use database objects.

  • SQL Server Management Studio (SSMS) is an integrated environment for managing any SQL infrastructure.

Best practices

  • Use appropriate collation settings for handling special characters on the target database side.

  • Test thoroughly with a variety of email addresses, including addresses with non-ASCII characters (see the example after this list).

  • Maintain consistency in uppercase and lowercase handling between the application and database layers.

  • Benchmark performance of queries using the hashed values.
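
To support the non-ASCII testing practice, the following sketch spot-checks one address in both casing modes by using the hex_to_bigint function that is defined later in this pattern. The address is only an example; compare the results with the values produced by the source SQL Server system.

-- Spot-check a non-ASCII address in both casing modes.
SELECT utility.hex_to_bigint('rené.müller@example.com', 'upper') AS upper_hash,
       utility.hex_to_bigint('rené.müller@example.com', 'lower') AS lower_hash;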

Epics

Analyze source hashing implementation

Task | Description | Skills required

Review SQL Server code.

To review SQL Server code that generates SHA1 hashes, do the following:

  • Analyze the existing SQL Server implementation of SHA1 hashing.

  • Identify the exact methods used for hash generation.

  • Document the input parameters and output format.

  • Review any data type conversions or transformations.

  • Examine collation settings and their impact.

Data engineer, DBA, App developer
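
For orientation only, the source logic you are reviewing might resemble the following hypothetical T-SQL sketch, which derives a BIGINT key from part of a SHA1 digest. The variable names, byte offsets, and data types are assumptions; your actual implementation will differ and is what you should document.

-- Hypothetical example of the kind of source logic to look for.
DECLARE @email VARCHAR(256) = 'alejandro_rosalez@example.com';
DECLARE @digest VARBINARY(20) = HASHBYTES('SHA1', UPPER(@email));

-- Note: hashing an NVARCHAR instead of a VARCHAR changes the bytes that are
-- hashed (UTF-16 vs. single-byte encoding) and therefore the resulting value.
SELECT CONVERT(BIGINT, SUBSTRING(@digest, 11, 8)) AS HashValue;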

Document the hashing algorithm and data transformations.

To document the exact hashing algorithm and data transformations, do the following:

  • Create detailed technical documentation of the hashing process.

  • Document the step-by-step transformation logic.

  • Specify input and output formats and data types.

  • Include edge cases and special character handling.

App developer, Data engineer, DBA

Create PostgreSQL hashing function

Task | Description | Skills required

Create pgcrypto extension.

To create the pgcrypto extension, use pgAdmin or psql to run the following command:

CREATE EXTENSION pgcrypto;
DBA, Data engineer

Implement a PostgreSQL function.

Implement the following PostgreSQL function to replicate the SQL Server hashing logic. The function is created in a utility schema, which must already exist in the target database. At a high level, this function performs the following steps:

  1. Optionally converts the input to uppercase.

  2. Creates an SHA1 hash of the input.

  3. Takes the last 10 bytes (80 bits) of this hash.

  4. Converts these bytes to a 64-bit integer.

CREATE OR REPLACE FUNCTION utility.hex_to_bigint (
    par_val   character varying,
    par_upper character varying DEFAULT 'lower'::character varying)
RETURNS bigint
LANGUAGE plpgsql
AS $BODY$
DECLARE
    retnumber    bigint;
    digest_bytes bytea;
BEGIN
    -- Optionally uppercase the input before hashing.
    IF lower(par_upper) = 'upper' THEN
        digest_bytes := digest(upper(par_val), 'sha1');
    ELSE
        digest_bytes := digest(par_val, 'sha1');
    END IF;

    -- Take the last 10 bytes of the SHA1 digest, hex-encode them, and
    -- convert the result to a 64-bit integer.
    retnumber := ('x' || encode(substring(digest_bytes, length(digest_bytes) - 10 + 1), 'hex'))::bit(64)::bigint;

    RETURN retnumber;
END;
$BODY$;
Data engineer, DBA, App developer

Test the function.

To test the function, use sample data from SQL Server to verify matching hash values. Run the following command:

SELECT 'alejandro_rosalez@example.com' AS Email,
       utility.hex_to_bigint('alejandro_rosalez@example.com', 'upper') AS HashValue;

-- OUTPUT
-- email                            hashvalue
-- "alejandro_rosalez@example.com"  451397011176045063
App developer, DBA, Data engineer

Implement triggers for automatic hashing

Task | Description | Skills required

Create triggers on relevant tables.

To create triggers on relevant tables to automatically generate hash values on insert or update, run the following commands (a quick verification example follows this task):

CREATE OR REPLACE FUNCTION update_email_hash()
RETURNS TRIGGER AS $$
BEGIN
    -- Recompute the hash from the email column on every insert or update.
    NEW.email_hash = utility.hex_to_bigint(NEW.email, 'upper');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER email_hash_trigger
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_email_hash();
App developer, Data engineer, DBA
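
As a quick verification that the trigger works, you can insert a row and read it back. This is a minimal sketch that assumes the users table from the trigger example, with email and email_hash columns.

-- Insert a row without a hash and confirm that the trigger populated email_hash.
INSERT INTO users (email) VALUES ('alejandro_rosalez@example.com');

SELECT email, email_hash
FROM   users
WHERE  email = 'alejandro_rosalez@example.com';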

Migrate existing data

Task | Description | Skills required

Develop a migration script or use AWS DMS.

Develop a migration script or use AWS DMS to populate hash values for existing data (including hash values stored as BIGINT in the source system). A sample backfill sketch follows this task. Complete the following tasks:

  • Create migration scripts for data transfer with hash values.

  • Configure AWS DMS tasks with appropriate transformation rules.

  • Set up source and target endpoints in AWS DMS.

  • Implement error handling and logging mechanisms.

  • Design a batch processing strategy for large datasets.

  • Create validation queries for data verification.

Data engineer, App developer, DBA
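
The following is a minimal backfill sketch for the SQL-script approach. It assumes the users table and email_hash column from the trigger example; repeat the statement until it updates zero rows to process a large table in batches.

-- Backfill hash values for existing rows in batches of 10,000.
UPDATE users
SET    email_hash = utility.hex_to_bigint(email, 'upper')
WHERE  ctid IN (SELECT ctid
                FROM   users
                WHERE  email_hash IS NULL
                LIMIT  10000);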

Use the new PostgreSQL hashing function.

To use the new PostgreSQL hashing function to ensure consistency, do the following (a sample validation query follows this task):

  • Implement validation procedures to verify hash consistency.

  • Create comparison scripts between the source and target systems.

  • Set up automated testing for hash value verification.

  • Document any discrepancies and resolution steps.

App developer, DBA, DevOps engineer
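
A validation query can be as simple as the following sketch, which reports rows whose stored hash differs from a freshly computed one; the users table and column names are assumptions carried over from the earlier examples.

-- Report rows whose stored hash does not match a recomputed hash.
SELECT email,
       email_hash,
       utility.hex_to_bigint(email, 'upper') AS recomputed_hash
FROM   users
WHERE  email_hash IS DISTINCT FROM utility.hex_to_bigint(email, 'upper');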

Update application queries

Task | Description | Skills required

Identify application queries.

To identify application queries that use hashed values, do the following:

  • Analyze the application codebase for queries using hash values.

  • Review stored procedures and functions that reference hash operations.

  • Document query performance metrics and execution plans.

  • Identify dependencies on hash-based lookups.

  • Map out affected application components.

App developer, DBA, Data engineer

Modify queries.

If necessary, modify queries to use the new PostgreSQL hashing function (an example rewrite follows this task). Do the following:

  • Refactor existing queries to use the PostgreSQL hashing function.

  • Update stored procedures and functions.

  • Implement and test new query patterns.

  • Optimize modified queries for performance.

App developer, DBA, Data engineer
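
For example, a lookup that previously received a precomputed hash from the application might be rewritten to compute the hash inside PostgreSQL, as in the following sketch; the users table and user_id column are assumptions.

-- Compute the hash in the database instead of in the application layer.
SELECT user_id, email
FROM   users
WHERE  email_hash = utility.hex_to_bigint('alejandro_rosalez@example.com', 'upper');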

Test and validate

Task | Description | Skills required

Perform testing.

To perform thorough testing with a subset of production data, do the following:

  • Create a test plan for data subset validation.

  • Extract a representative sample of production data.

  • Set up a test environment with proper configurations.

  • Execute data loading and transformation tests.

  • Perform volume and stress testing.

App developer, Data engineer, DBA

Validate that hash values match.

To validate that hash values match between SQL Server and PostgreSQL, do the following:

  • Develop comparison scripts for hash values.

  • Create validation reports for hash matching.

  • Implement automated verification procedures.

  • Document any discrepancies found.

  • Analyze and resolve hash mismatches.

App developer, Data engineer, DBA

Verify application functionality.

To verify application functionality by using the migrated data and the new hashing implementation, do the following:

  • Execute end-to-end application testing.

  • Validate all application features using hashed data.

  • Test application performance with the new implementation.

  • Verify API integrations and dependencies.

App developer, DBA, Data engineer

Troubleshooting

Issue | Solution

Hash values don’t match.

Verify the character encodings and collations between the source and target databases. For more information, see Manage collation changes in PostgreSQL on Amazon Aurora and Amazon RDS (AWS blog post).
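
For example, you can inspect the PostgreSQL side with queries such as the following and compare the results with the collation and code page of the source SQL Server database.

-- Check the server encoding and the database-level collation settings.
SHOW server_encoding;

SELECT datname, datcollate, datctype
FROM   pg_database
WHERE  datname = current_database();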

Related resources

AWS Blogs

  • Manage collation changes in PostgreSQL on Amazon Aurora and Amazon RDS


Additional information

This pattern focuses on SHA1 hashing, but can be adapted for other hashing algorithms supported by PostgreSQL's pgcrypto module. Always consider the security implications of your hashing strategy and consult with security experts if handling sensitive data.
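
For example, pgcrypto's digest function accepts other algorithm names, so the approach can be adapted with minimal changes; note that a different digest length changes how many bytes are available to fold into a BIGINT. The following line is only an illustration.

-- SHA-256 produces a 32-byte digest instead of SHA1's 20 bytes.
SELECT digest('alejandro_rosalez@example.com', 'sha256');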
