# Load BLOB files into TEXT by using file encoding in Aurora PostgreSQL-Compatible *Bhanu Ganesh Gudivada and Jeevan Shetty, Amazon Web Services* ## Summary Often during migration, there are cases where you have to process unstructured and structured data that is loaded from files on a local file system. The data might also be in a character set that differs from the database character set. These files hold the following types of data: + **Metadata** – This data describes the file structure. + **Semi-structured data** – These are textual strings in a specific format, such as JSON or XML. You might be able to make assertions about such data, such as "will always start with '<' " or "does not contain any newline characters." + **Full text** – This data usually contains all types of characters, including newline and quote characters. It might also consist of multibyte characters in UTF-8. + **Binary data** – This data might contain bytes or combinations of bytes including, nulls and end-of-file markers. Loading a mixture of these types of data can be a challenge. The pattern can be used with on-premises Oracle databases , Oracle databases that are on Amazon Elastic Compute Cloud (Amazon EC2) instances on the Amazon Web Services (AWS) Cloud, and Amazon Relational Database Service (Amazon RDS) for Oracle databases. As an example, this pattern is using Amazon Aurora PostgreSQL-Compatible Edition. In Oracle Database, with the help of a `BFILE` (binary file) pointer, the `DBMS_LOB` package, and Oracle system functions, you can load from file and convert to CLOB with character encoding. Because PostgreSQL does not support the BLOB data type when migrating to an Amazon Aurora PostgreSQL-Compatible Edition database, these functions must be converted to PostgreSQL-compatible scripts. This pattern provides two approaches for loading a file into a single database column in an Amazon Aurora PostgreSQL-Compatible database: + Approach 1 – You import data from your Amazon Simple Storage Service (Amazon S3) bucket by using the `table_import_from_s3` function of the `aws_s3` extension with the encode option. + Approach 2 – You encode to hexadecimal outside of the database, and then you decode to view `TEXT` inside the database. We recommend using Approach 1 because Aurora PostgreSQL-Compatible has direct integration with the `aws_s3` extension. This pattern uses the example of loading a flat file that contains an email template, which has multibyte characters and distinct formatting, into an Amazon Aurora PostgreSQL-Compatible database. ## Prerequisites and limitations **Prerequisites ** + An active AWS account + An Amazon RDS instance or an Aurora PostgreSQL-Compatible instance + A basic understanding of SQL and Relational Database Management System (RDBMS) + An Amazon Simple Storage Service (Amazon S3) bucket. + Knowledge of system functions in Oracle and PostgreSQL + RPM Package HexDump-XXD-0.1.1 (included with Amazon Linux 2) **Note** Amazon Linux 2 is nearing end of support. For more information, see the [Amazon Linux 2 FAQs](http://aws.amazon.com/amazon-linux-2/faqs/). **Limitations ** + For the `TEXT` data type, the longest possible character string that can be stored is about 1 GB. **Product versions** + Aurora supports the PostgreSQL versions listed in [Amazon Aurora PostgreSQL updates](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQLReleaseNotes/AuroraPostgreSQL.Updates.html). ## Architecture **Target technology stack ** + Aurora PostgreSQL-Compatible **Target architecture** *Approach 1 – Using aws\_s3.table\_import\_from\_s3 * From an on-premises server, a file containing an email template with multibyte characters and custom formatting is transferred to Amazon S3. The custom database function provided by this pattern uses the `aws_s3.table_import_from_s3` function with `file_encoding` to load files into the database and return query results as the `TEXT` data type. ![Four-step process from the on-premises server to the TEXT output from the Aurora database.](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/cbf63cac-dcea-4e18-ab4f-c4f6296f60e7/images/9c46b385-e8a0-4e50-b856-d522c44d79e3.png) 1. Files are transferred to the staging S3 bucket. 1. Files are uploaded to the Amazon Aurora PostgreSQL-Compatible database. 1. Using the pgAdmin client, the custom function `load_file_into_clob` is deployed to the Aurora database. 1. The custom function internally uses `table_import_from_s3` with file\_encoding. The output from the function is obtained by using `array_to_string` and `array_agg` as `TEXT` output. *Approach 2 – Encoding to hexadecimal outside of the database and decoding to view TEXT inside the database* A file from an on-premises server or a local file system is converted into a hex dump. Then the file is imported into PostgreSQL as a `TEXT` field. ![Three-step process using Hex dump.](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/cbf63cac-dcea-4e18-ab4f-c4f6296f60e7/images/563038ca-f890-4874-85df-d0f82d99800a.png) 1. Convert the file to a hex dump in the command line by using the `xxd -p` option. 1. Upload the hex dump files into Aurora PostgreSQL-Compatible by using the `\copy` option, and then decode the hex dump files to binary. 1. Encode the binary data to return as `TEXT`. ## Tools **AWS services** + [Amazon Aurora PostgreSQL-Compatible Edition](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraPostgreSQL.html) is a fully managed, ACID-compliant relational database engine that helps you set up, operate, and scale PostgreSQL deployments. + [AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) is an open-source tool that helps you interact with AWS services through commands in your command-line shell. **Other tools** + [pgAdmin4](https://www.pgadmin.org/) is an open source administration and development platform for PostgreSQL. pgAdmin4 can be used on Linux, Unix, mac OS, and Windows to manage PostgreSQL. ## Epics ### Approach 1: Import data from Amazon S3 to Aurora PostgreSQL-Compatible | Task | Description | Skills required | | --- | --- | --- | | Launch an EC2 instance. | For instructions on launching an instance, see [Launch your instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/LaunchingAndUsingInstances.html). | DBA | | Install the PostgreSQL client pgAdmin tool. | Download and install [pgAdmin](https://www.pgadmin.org/download/). | DBA | | Create an IAM policy. | Create an AWS Identity and Access Management (IAM) policy named `aurora-s3-access-pol` that grants access to the S3 bucket where the files will be stored. Use the following code, replacing `` with the name of your S3 bucket.

{
    "Version": "2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::/*",
                "arn:aws:s3:::"
            ]
        }
    ]
}

| DBA | | Create an IAM role for object import from Amazon S3 to Aurora PostgreSQL-Compatible. | Use the following code to create an IAM role named `aurora-s3-import-role` with the [AssumeRole](https://docs.amazonaws.cn/en_us/STS/latest/APIReference/API_AssumeRole.html) trust relationship. `AssumeRole` allows Aurora to access other AWS services on your behalf.

{
  "Version": "2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow","Principal": {
        "Service": "rds.amazonaws.com"
      },"Action": "sts:AssumeRole"
    }
  ]
}

| DBA | | Associate the IAM role to the cluster. | To associate the IAM role with the Aurora PostgreSQL-Compatible database cluster, run the following AWS CLI command. Change `` to the ID of the AWS account that hosts the Aurora PostgreSQL-Compatible database. This enables the Aurora PostgreSQL-Compatible database to access the S3 bucket.

aws rds add-role-to-db-cluster --db-cluster-identifier aurora-postgres-cl
--feature-name s3Import --role-arn arn:aws:iam:::role/aurora-s3-import-role

| DBA | | Upload the example to Amazon S3. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/load-blob-files-into-text-by-using-file-encoding-in-aurora-postgresql-compatible.html) | DBA, App owner | | Deploy the custom function. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/load-blob-files-into-text-by-using-file-encoding-in-aurora-postgresql-compatible.html) | App owner, DBA | | Run the custom function for importing the data into the database. | Run the following SQL command, replacing the items in angle brackets with the appropriate values.

select load_file_into_clob('aws-s3-import-test'::text,'us-west-1'::text,'employee.salary.event.notification.email.vm'::text);

Replace the items in angle brackets with the appropriate values, as shown in the following example, before running the command.

Select load_file_into_clob('aws-s3-import-test'::text,'us-west-1'::text,'employee.salary.event.notification.email.vm'::text);

The command loads the file from Amazon S3 and returns the output as `TEXT`. | App owner, DBA | ### Approach 2: Convert the template file into a hex dump in a local Linux system | Task | Description | Skills required | | --- | --- | --- | | Convert the template file into a hex dump. | The Hexdump utility displays the contents of binary files in hexadecimal, decimal, octal, or ASCII. The `hexdump` command is part of the `util-linux` package and comes pre-installed in Linux distributions. The Hexdump RPM package is part of Amazon Linux 2 as well. (: Amazon Linux 2 is nearing end of support. For more information, see the [Amazon Linux 2 FAQs](http://aws.amazon.com/amazon-linux-2/faqs/).)To convert the file contents into a hex dump, run the following shell command.

xxd -p  | tr -d '\n' >

Replace the path and file with the appropriate values, as shown in the following example.

xxd -p employee.salary.event.notification.email.vm | tr -d '\n' > employee.salary.event.notification.email.vm.hex

| DBA | | Load the hexdump file into the database schema. | Use the following commands to load the hexdump file into the Aurora PostgreSQL-Compatible database.[See the AWS documentation website for more details](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/load-blob-files-into-text-by-using-file-encoding-in-aurora-postgresql-compatible.html) | DBA | ## Related resources **References** + [Using a PostgreSQL database as a target for AWS Database Migration Service](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.PostgreSQL.html) + [Oracle Database 19c to Amazon Aurora with PostgreSQL Compatibility (12.4) Migration Playbook](https://d1.awsstatic.com/whitepapers/Migration/oracle-database-amazon-aurora-postgresql-migration-playbook-12.4.pdf) + [Creating IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) + [Associating an IAM role with an Amazon Aurora MySQL DB cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.Authorizing.IAM.AddRoleToDBCluster.html) + [pgAdmin](https://www.pgadmin.org/) **Tutorials** + [Getting Started with Amazon RDS](https://aws.amazon.com/rds/getting-started/) + [Migrate from Oracle to Amazon Aurora](https://aws.amazon.com/getting-started/projects/migrate-oracle-to-amazon-aurora/) ## Additional information **load\_file\_into\_clob custom function** ``` CREATE OR REPLACE FUNCTION load_file_into_clob( s3_bucket_name text, s3_bucket_region text, file_name text, file_delimiter character DEFAULT '&'::bpchar, file_encoding text DEFAULT 'UTF8'::text) RETURNS text LANGUAGE 'plpgsql' COST 100 VOLATILE PARALLEL UNSAFE AS $BODY$ DECLARE blob_data BYTEA; clob_data TEXT; l_table_name CHARACTER VARYING(50) := 'file_upload_hex'; l_column_name CHARACTER VARYING(50) := 'template'; l_return_text TEXT; l_option_text CHARACTER VARYING(150); l_sql_stmt CHARACTER VARYING(500); BEGIN EXECUTE format ('CREATE TEMPORARY TABLE %I (%I text, id_serial serial)', l_table_name, l_column_name); l_sql_stmt := 'select ''(format text, delimiter ''''' || file_delimiter || ''''', encoding ''''' || file_encoding || ''''')'' '; EXECUTE FORMAT(l_sql_stmt) INTO l_option_text; EXECUTE FORMAT('SELECT aws_s3.table_import_from_s3($1,$2,$6, aws_commons.create_s3_uri($3,$4,$5))') INTO l_return_text USING l_table_name, l_column_name, s3_bucket_name, file_name,s3_bucket_region,l_option_text; EXECUTE format('select array_to_string(array_agg(%I order by id_serial),E''\n'') from %I', l_column_name, l_table_name) INTO clob_data; drop table file_upload_hex; RETURN clob_data; END; $BODY$; ``` **Email template** ``` ###################################################################################### ## ## ## johndoe Template Type: email ## ## File: johndoe.salary.event.notification.email.vm ## ## Author: Aimée Étienne Date 1/10/2021 ## ## Purpose: Email template used by EmplmanagerEJB to inform a johndoe they ## ## have been given access to a salary event ## ## Template Attributes: ## ## invitedUser - PersonDetails object for the invited user ## ## salaryEvent - OfferDetails object for the event the user was given access ## ## buyercollege - CompDetails object for the college owning the salary event ## ## salaryCoordinator - PersonDetails of the salary coordinator for the event ## ## idp - Identity Provider of the email recipient ## ## httpWebRoot - HTTP address of the server ## ## ## ###################################################################################### $!invitedUser.firstname $!invitedUser.lastname, Ce courriel confirme que vous avez ete invite par $!salaryCoordinator.firstname $!salaryCoordinator.lastname de $buyercollege.collegeName a participer a l'evenement "$salaryEvent.offeringtitle" sur johndoeMaster Sourcing Intelligence. Votre nom d'utilisateur est $!invitedUser.username Veuillez suivre le lien ci-dessous pour acceder a l'evenement. ${httpWebRoot}/myDashboard.do?idp=$!{idp} Si vous avez oublie votre mot de passe, utilisez le lien "Mot de passe oublie" situe sur l'ecran de connexion et entrez votre nom d'utilisateur ci-dessus. Si vous avez des questions ou des preoccupations, nous vous invitons a communiquer avec le coordonnateur de l'evenement $!salaryCoordinator.firstname $!salaryCoordinator.lastname au ${salaryCoordinator.workphone}. ******* johndoeMaster Sourcing Intelligence est une plateforme de soumission en ligne pour les equipements, les materiaux et les services. Si vous avez des difficultes ou des questions, envoyez un courriel a support@johndoeMaster.com pour obtenir de l'aide. ```