Load BLOB files into TEXT by using file encoding in Aurora PostgreSQL-Compatible
Created by Bhanu Ganesh Gudivada (AWS) and Jeevan Shetty (AWS)
Summary
During migration, you often have to process structured and unstructured data that is loaded from files on a local file system. The data might also be in a character set that differs from the database character set.
These files hold the following types of data:
Metadata – This data describes the file structure.
Semi-structured data – These are textual strings in a specific format, such as JSON or XML. You might be able to make assertions about such data, such as "will always start with '<'" or "does not contain any newline characters."
Full text – This data usually contains all types of characters, including newline and quote characters. It might also consist of multibyte characters in UTF-8.
Binary data – This data might contain bytes or combinations of bytes, including nulls and end-of-file markers.
Loading a mixture of these types of data can be a challenge.
The pattern can be used with on-premises Oracle databases, Oracle databases that are on Amazon Elastic Compute Cloud (Amazon EC2) instances in the Amazon Web Services (AWS) Cloud, and Amazon Relational Database Service (Amazon RDS) for Oracle databases. As an example, this pattern uses Amazon Aurora PostgreSQL-Compatible Edition.
In Oracle Database, with the help of a BFILE (binary file) pointer, the DBMS_LOB package, and Oracle system functions, you can load data from a file and convert it to a CLOB with character encoding. Because PostgreSQL does not support the BLOB data type, when you migrate to an Amazon Aurora PostgreSQL-Compatible Edition database, these functions must be converted to PostgreSQL-compatible scripts.
This pattern provides two approaches for loading a file into a single database column in an Amazon Aurora PostgreSQL-Compatible database:
Approach 1 – You import data from your Amazon Simple Storage Service (Amazon S3) bucket by using the table_import_from_s3 function of the aws_s3 extension with the encode option.
Approach 2 – You encode to hexadecimal outside of the database, and then you decode to view TEXT inside the database.
We recommend using Approach 1 because Aurora PostgreSQL-Compatible has direct integration with the aws_s3 extension.
This pattern uses the example of loading a flat file that contains an email template, which has multibyte characters and distinct formatting, into an Amazon Aurora PostgreSQL-Compatible database.
Prerequisites and limitations
Prerequisites
An active AWS account
An Amazon RDS instance or an Aurora PostgreSQL-Compatible instance
A basic understanding of SQL and relational database management systems (RDBMS)
An Amazon Simple Storage Service (Amazon S3) bucket
Knowledge of system functions in Oracle and PostgreSQL
RPM Package HexDump-XXD-0.1.1 (included with Amazon Linux 2)
Note: Amazon Linux 2 is nearing end of support. For more information, see the Amazon Linux 2 FAQs.
Limitations
For the TEXT data type, the longest possible character string that can be stored is about 1 GB.
Product versions
Aurora supports the PostgreSQL versions listed in Amazon Aurora PostgreSQL updates.
Architecture
Target technology stack
Aurora PostgreSQL-Compatible
Target architecture
Approach 1 – Using aws_s3.table_import_from_s3
From an on-premises server, a file containing an email template with multibyte characters and custom formatting is transferred to Amazon S3. The custom database function provided by this pattern uses the aws_s3.table_import_from_s3 function with file_encoding to load files into the database and return query results as the TEXT data type.

1. Files are transferred to the staging S3 bucket.
2. Files are uploaded to the Amazon Aurora PostgreSQL-Compatible database.
3. Using the pgAdmin client, the custom function load_file_into_clob is deployed to the Aurora database.
4. The custom function internally uses table_import_from_s3 with file_encoding. The output from the function is obtained by using array_to_string and array_agg as TEXT output, as sketched after this list.
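At its core, the custom function issues a single call along these lines (a minimal static sketch; the bucket name, file name, and Region are hypothetical placeholders, and the staging table must already exist):

SELECT aws_s3.table_import_from_s3(
    'file_upload_hex',                                    -- target table
    'template',                                           -- target column
    '(format text, delimiter ''&'', encoding ''UTF8'')',  -- COPY options
    aws_commons.create_s3_uri('amzn-s3-demo-bucket',
                              'johndoe.salary.event.notification.email.vm',
                              'us-east-1'));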
Approach 2 – Encoding to hexadecimal outside of the database and decoding to view TEXT inside the database
A file from an on-premises server or a local file system is converted into a hex dump. Then the file is imported into PostgreSQL as a TEXT field.

1. Convert the file to a hex dump on the command line by using the xxd -p option.
2. Upload the hex dump files into Aurora PostgreSQL-Compatible by using the \copy option, and then decode the hex dump files to binary.
3. Encode the binary data to return as TEXT. (Concrete commands for these steps appear in the Epics section.)
Tools
AWS services
Amazon Aurora PostgreSQL-Compatible Edition is a fully managed, ACID-compliant relational database engine that helps you set up, operate, and scale PostgreSQL deployments.
AWS Command Line Interface (AWS CLI) is an open-source tool that helps you interact with AWS services through commands in your command-line shell.
Other tools
pgAdmin 4 is an open-source administration and development platform for PostgreSQL. pgAdmin 4 can be used on Linux, Unix, macOS, and Windows to manage PostgreSQL.
Epics
Task | Description | Skills required |
---|---|---|
Launch an EC2 instance. | For instructions on launching an instance, see Launch your instance. | DBA |
Install the PostgreSQL client pgAdmin tool. | Download and install pgAdmin. | DBA |
Create an IAM policy. | Create an AWS Identity and Access Management (IAM) policy that grants the Aurora cluster read access to the staging S3 bucket (see the sketch after this table). | DBA |
Create an IAM role for object import from Amazon S3 to Aurora PostgreSQL-Compatible. | Create an IAM role that the Aurora PostgreSQL-Compatible cluster can assume, and attach the policy from the previous step (see the sketch after this table). | DBA |
Associate the IAM role to the cluster. | To associate the IAM role with the Aurora PostgreSQL-Compatible database cluster, run the aws rds add-role-to-db-cluster AWS CLI command, changing the cluster identifier and role ARN to match your environment (see the sketch after this table). | DBA |
Upload the example to Amazon S3. | Upload the example email template file to the staging S3 bucket (see the example after this table). | DBA, App owner |
Deploy the custom function. | Using pgAdmin, deploy the load_file_into_clob custom function from the Additional information section to the Aurora database. | App owner, DBA |
Run the custom function for importing the data into the database. | Run the load_file_into_clob function, replacing the items in angle brackets with the appropriate values (see the example after this table). The command loads the file from Amazon S3 and returns the output as TEXT. | App owner, DBA |
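The following is a minimal AWS CLI sketch of the three IAM tasks above. The policy name, role name, bucket name, account ID, and cluster identifier shown here are hypothetical placeholders; substitute your own values.

# Create a policy that grants read access to the staging S3 bucket.
# (Names and account ID below are placeholders.)
aws iam create-policy \
    --policy-name aurora-s3-import-policy \
    --policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::amzn-s3-demo-bucket",
                         "arn:aws:s3:::amzn-s3-demo-bucket/*"]
        }]
    }'

# Create a role that Amazon RDS can assume, and attach the policy to it.
aws iam create-role \
    --role-name aurora-s3-import-role \
    --assume-role-policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "rds.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }]
    }'

aws iam attach-role-policy \
    --role-name aurora-s3-import-role \
    --policy-arn arn:aws:iam::111122223333:policy/aurora-s3-import-policy

# Associate the role with the Aurora cluster for the s3Import feature.
aws rds add-role-to-db-cluster \
    --db-cluster-identifier my-aurora-cluster \
    --feature-name s3Import \
    --role-arn arn:aws:iam::111122223333:role/aurora-s3-import-role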
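To upload the example template, an aws s3 cp command along these lines is sufficient. The bucket name is a placeholder; the file name comes from the email template in the Additional information section.

aws s3 cp johndoe.salary.event.notification.email.vm \
    s3://amzn-s3-demo-bucket/johndoe.salary.event.notification.email.vm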
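To run the custom function, use a call such as the following, replacing the angle-bracket items with your values. This assumes the aws_s3 extension has already been created in the database (CREATE EXTENSION aws_s3 CASCADE;).

SELECT load_file_into_clob('<s3-bucket-name>', '<aws-region>', '<file-name>');

-- For example, with placeholder values:
SELECT load_file_into_clob(
    'amzn-s3-demo-bucket',
    'us-east-1',
    'johndoe.salary.event.notification.email.vm');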
Task | Description | Skills required |
---|---|---|
Convert the template file into a hex dump. | Note: The hexdump utility displays the contents of binary files in hexadecimal, decimal, octal, or ASCII. To convert the file contents into a hex dump, run the xxd shell command with the -p option, replacing the path and file name with the appropriate values (see the example after this table). | DBA |
Load the hexdump file into the database schema. | Load the hex dump file into the Aurora PostgreSQL-Compatible database, decode it to binary, and re-encode the result as TEXT (see the sketch after this table). | DBA |
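For example, assuming the template file is in /tmp (the path and file names are placeholders):

# -p produces a plain hex dump: a continuous stream of hex digits
# with no offsets or ASCII column.
xxd -p /tmp/johndoe.salary.event.notification.email.vm > /tmp/template.hex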
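The following is a minimal sketch of the load-and-decode step, assuming a staging table with the same shape as the one the load_file_into_clob function creates; the table, column, and path names are placeholders.

-- Staging table: one row per line of the hex dump.
CREATE TABLE file_upload_hex (template text, id_serial serial);

-- psql meta-command: load the hex dump file.
\copy file_upload_hex(template) from '/tmp/template.hex'

-- Reassemble the lines in order, decode the hex to binary,
-- and re-encode the binary as UTF8 TEXT.
SELECT convert_from(
           decode(string_agg(template, '' ORDER BY id_serial), 'hex'),
           'UTF8')
FROM file_upload_hex;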
Additional information
load_file_into_clob custom function
CREATE OR REPLACE FUNCTION load_file_into_clob(
    s3_bucket_name text,
    s3_bucket_region text,
    file_name text,
    file_delimiter character DEFAULT '&'::bpchar,
    file_encoding text DEFAULT 'UTF8'::text)
    RETURNS text
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE PARALLEL UNSAFE
AS $BODY$
DECLARE
    clob_data     TEXT;
    l_table_name  CHARACTER VARYING(50) := 'file_upload_hex';
    l_column_name CHARACTER VARYING(50) := 'template';
    l_return_text TEXT;
    l_option_text CHARACTER VARYING(150);
    l_sql_stmt    CHARACTER VARYING(500);
BEGIN
    -- Staging table that receives one row per line of the source file.
    EXECUTE format('CREATE TEMPORARY TABLE %I (%I text, id_serial serial)', l_table_name, l_column_name);
    -- Build the COPY options string, for example: (format text, delimiter '&', encoding 'UTF8')
    l_sql_stmt := 'select ''(format text, delimiter ''''' || file_delimiter || ''''', encoding ''''' || file_encoding || ''''')'' ';
    EXECUTE l_sql_stmt
    INTO l_option_text;
    -- Import the file from Amazon S3 into the staging table.
    EXECUTE 'SELECT aws_s3.table_import_from_s3($1, $2, $6, aws_commons.create_s3_uri($3, $4, $5))'
    INTO l_return_text
    USING l_table_name, l_column_name, s3_bucket_name, file_name, s3_bucket_region, l_option_text;
    -- Reassemble the lines in their original order as a single TEXT value.
    EXECUTE format('select array_to_string(array_agg(%I order by id_serial), E''\n'') from %I', l_column_name, l_table_name)
    INTO clob_data;
    EXECUTE format('DROP TABLE %I', l_table_name);
    RETURN clob_data;
END;
$BODY$;
Email template
######################################################################################
## ##
## johndoe Template Type: email ##
## File: johndoe.salary.event.notification.email.vm ##
## Author: Aimée Étienne Date 1/10/2021 ##
## Purpose: Email template used by EmplmanagerEJB to inform a johndoe they ##
## have been given access to a salary event ##
## Template Attributes: ##
## invitedUser - PersonDetails object for the invited user ##
## salaryEvent - OfferDetails object for the event the user was given access ##
## buyercollege - CompDetails object for the college owning the salary event ##
## salaryCoordinator - PersonDetails of the salary coordinator for the event ##
## idp - Identity Provider of the email recipient ##
## httpWebRoot - HTTP address of the server ##
## ##
######################################################################################
$!invitedUser.firstname $!invitedUser.lastname,
Ce courriel confirme que vous avez ete invite par $!salaryCoordinator.firstname $!salaryCoordinator.lastname de $buyercollege.collegeName a participer a l'evenement "$salaryEvent.offeringtitle" sur johndoeMaster Sourcing Intelligence.
Votre nom d'utilisateur est $!invitedUser.username
Veuillez suivre le lien ci-dessous pour acceder a l'evenement.
${httpWebRoot}/myDashboard.do?idp=$!{idp}
Si vous avez oublie votre mot de passe, utilisez le lien "Mot de passe oublie" situe sur l'ecran de connexion et entrez votre nom d'utilisateur ci-dessus.
Si vous avez des questions ou des preoccupations, nous vous invitons a communiquer avec le coordonnateur de l'evenement $!salaryCoordinator.firstname $!salaryCoordinator.lastname au ${salaryCoordinator.workphone}.
*******
johndoeMaster Sourcing Intelligence est une plateforme de soumission en ligne pour les equipements, les materiaux et les services.
Si vous avez des difficultes ou des questions, envoyez un courriel a support@johndoeMaster.com pour obtenir de l'aide.