Convert mainframe files from EBCDIC format to character-delimited ASCII format in Amazon S3 using AWS Lambda

Created by Luis Gustavo Dantas (AWS)

Code repository: Mainframe Data Utilities

Environment: PoC or pilot

Source: IBM EBCDIC files

Target: Delimited ASCII files

R Type: Replatform

Workload: IBM

Technologies: Mainframe

AWS services: AWS CloudShell; AWS Lambda; Amazon S3; Amazon CloudWatch

Summary

This pattern shows you how to launch an AWS Lambda function that automatically converts mainframe EBCDIC (Extended Binary Coded Decimal Interchange Code) files to character-delimited ASCII (American Standard Code for Information Interchange) files. The Lambda function runs after the EBCDIC files are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. After the file conversion, you can read the ASCII files on x86-based workloads or load the files into modern databases.

The file conversion approach demonstrated in this pattern can help you overcome the challenges of working with EBCDIC files in modern environments. Files encoded in EBCDIC often contain data represented in binary or packed decimal formats, and their fields are fixed-length. These characteristics create obstacles because modern x86-based workloads or distributed environments generally work with ASCII-encoded data and can't process EBCDIC files.
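To see why a conversion step is necessary, consider a minimal Python sketch (separate from this pattern's code) that decodes a hypothetical record: text fields decode with the standard cp037 EBCDIC codec, but a packed decimal (COMP-3) field must be unpacked nibble by nibble. The record layout and bytes here are invented for illustration.

import codecs

# Hypothetical 7-byte record: a 4-byte EBCDIC name followed by a
# 3-byte packed decimal (COMP-3) amount.
record = b"\xC1\xD5\xD5\xC1" + b"\x12\x34\x5C"

# Text fields decode directly with an EBCDIC codec such as cp037.
name = codecs.decode(record[0:4], "cp037")           # -> "ANNA"

# COMP-3 packs two decimal digits per byte; the final nibble is the sign
# (0xC or 0xF = positive, 0xD = negative).
packed = record[4:7]
digits = "".join(f"{b >> 4}{b & 0x0F}" for b in packed[:-1])
digits += str(packed[-1] >> 4)
sign = "-" if (packed[-1] & 0x0F) == 0x0D else ""
amount = int(sign + digits)                          # -> 12345

print(name, amount)

No standard ASCII-oriented tooling performs the COMP-3 unpacking shown above, which is why the fields must be described by a copybook and converted explicitly.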

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • An S3 bucket

  • An AWS Identity and Access Management (IAM) user with administrative permissions

  • AWS CloudShell

  • Python 3.8.0 or later

  • A flat file encoded in EBCDIC and its corresponding data structure in a common business-oriented language (COBOL) copybook

Note: This pattern uses a sample EBCDIC file (CLIENT.EBCDIC.txt) and its corresponding COBOL copybook (COBKS05.cpy). Both files are available in the GitHub mainframe-data-utilities repository.

Limitations

  • COBOL copybooks often hold multiple layout definitions. The mainframe-data-utilities project can parse this kind of copybook, but it can't infer which layout applies to a given record, because copybooks don't carry that logic (it lives in the COBOL programs instead). Consequently, you must manually configure the layout-selection rules after you parse the copybook.

  • This pattern is subject to Lambda quotas.

Architecture

Source technology stack

  • IBM z/OS, IBM i, and other EBCDIC systems

  • Sequential files with data encoded in EBCDIC (such as IBM Db2 unloads)

  • COBOL copybook

Target technology stack

  • Amazon S3

  • Amazon S3 event notification

  • IAM

  • Lambda function

  • Python 3.8 or later

  • Mainframe Data Utilities

  • JSON metadata

  • Character-delimited ASCII files

Target architecture

The following diagram shows an architecture for converting mainframe EBCDIC files to ASCII files.

Architecture for converting mainframe EBCDIC files to ASCII files

The diagram shows the following workflow:

  1. The user runs the copybook parser script to convert the COBOL copybook into a JSON file.

  2. The user uploads the JSON metadata to an S3 bucket. This makes the metadata readable by the data conversion Lambda function.

  3. The user or an automated process uploads the EBCDIC file to the S3 bucket.

  4. The S3 notification event triggers the data conversion Lambda function.

  5. AWS verifies the S3 bucket read-write permissions for the Lambda function.

  6. Lambda reads the file from the S3 bucket and locally converts the file from EBCDIC to ASCII.

  7. Lambda logs the process status in Amazon CloudWatch.

  8. Lambda writes the ASCII file back to Amazon S3.

Note: The copybook parser script runs only once: it converts the copybook metadata to JSON and uploads that data to an S3 bucket. After this initial conversion, every EBCDIC file that shares the same layout reuses the JSON metadata that's already in the S3 bucket.
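The conversion logic itself lives in extract_ebcdic_to_ascii.lambda_handler in the mainframe-data-utilities repository. The following simplified Python sketch is not that implementation; it only illustrates the workflow above (S3 event in, layout metadata read, ASCII object out), and convert_records is a hypothetical placeholder for the repository's record-conversion logic.

import json
import os

import boto3

s3 = boto3.client("s3")

def convert_records(data: bytes, layout: dict) -> list[str]:
    # Placeholder: the real logic slices each fixed-length EBCDIC record
    # according to the copybook layout and decodes every field.
    return [data.decode("cp037", errors="replace")]

def lambda_handler(event, context):
    # 1. Identify the uploaded EBCDIC object from the S3 event notification.
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    key = s3_info["object"]["key"]

    # 2. Read the JSON metadata; the layout environment variable
    #    ("$bucket/layout/") says where it resides.
    layout_bucket, _, layout_prefix = os.environ["layout"].partition("/")
    obj = s3.get_object(Bucket=layout_bucket, Key=layout_prefix + "CLIENT.json")
    layout = json.loads(obj["Body"].read())

    # 3. Convert the file from EBCDIC to delimited ASCII locally.
    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    lines = convert_records(data, layout)

    # 4. Write the ASCII file back to Amazon S3. The real function takes the
    #    output key from the JSON metadata (output-s3key).
    s3.put_object(Bucket=bucket, Key="CLIENT.ASCII.txt", Body="\n".join(lines))

    # Lambda's print output lands in CloudWatch Logs automatically.
    print(f"Converted s3://{bucket}/{key}")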

Tools

AWS tools

  • Amazon CloudWatch helps you monitor the metrics of your AWS resources and the applications that you run on AWS in real time.

  • Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

  • AWS CloudShell is a browser-based shell that you can use to manage AWS services by using the AWS Command Line Interface (AWS CLI) and a range of preinstalled development tools.

  • AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.

  • AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. Lambda runs your code only when needed and scales automatically, so you pay only for the compute time that you use.

Other tools

  • GitHub is a code-hosting service that provides collaboration tools and version control.

  • Python is a high-level programming language.

Code

The code for this pattern is available in the GitHub mainframe-data-utilities repository.

Best practices

Consider the following best practices:

  • Set the required permissions at the Amazon Resource Name (ARN) level.

  • Always grant least-privilege permissions for IAM policies. For more information, see Security best practices in IAM in the IAM documentation.

Epics

Task | Description | Skills required

Create the environment variables.

Copy the following environment variables to a text editor, and then replace the <placeholder> values in the following example with your resource values:

bucket=<your_bucket_name>
account=<your_account_number>
region=<your_region_code>

Note: Later commands use these variables to reference your S3 bucket, AWS account, and AWS Region.

To define environment variables, open the CloudShell console, and then copy and paste your updated environment variables onto the command line.

Note: You must repeat this step every time the CloudShell session restarts.

General AWS

Create a working folder.

To simplify the resource clean-up process later on, create a working folder in CloudShell by running the following command:

mkdir workdir; cd workdir

Note: You must change the directory to the working directory (workdir) every time you lose a connection to your CloudShell session.

General AWS
Task | Description | Skills required

Create a trust policy for the Lambda function.

The EBCDIC converter runs in a Lambda function, and that function must have an IAM role. Before you create the IAM role, you must define a trust policy document that allows the Lambda service to assume the role.

From the CloudShell working folder, create a policy document by running the following command:

E2ATrustPol=$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
)
printf "$E2ATrustPol" > E2ATrustPol.json
General AWS

Create the IAM role for Lambda conversion.

To create an IAM role, run the following AWS CLI command from the CloudShell working folder:

aws iam create-role --role-name E2AConvLambdaRole --assume-role-policy-document file://E2ATrustPol.json
General AWS

Create the IAM policy document for the Lambda function.

The Lambda function must have read-write access to the S3 bucket and write permissions for Amazon CloudWatch Logs.

To create an IAM policy, run the following command from the CloudShell working folder:

E2APolicy=$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Logs",
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogStream",
        "logs:CreateLogGroup"
      ],
      "Resource": [
        "arn:aws:logs:*:*:log-group:*",
        "arn:aws:logs:*:*:log-group:*:log-stream:*"
      ]
    },
    {
      "Sid": "S3",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::%s/*",
        "arn:aws:s3:::%s"
      ]
    }
  ]
}
EOF
)
printf "$E2APolicy" "$bucket" "$bucket" > E2AConvLambdaPolicy.json
General AWS

Attach the IAM policy document to the IAM role.

To attach the IAM policy to the IAM role, run the following command from your CloudShell working folder:

aws iam put-role-policy --role-name E2AConvLambdaRole --policy-name E2AConvLambdaPolicy --policy-document file://E2AConvLambdaPolicy.json
General AWS
Task | Description | Skills required

Download the EBCDIC conversion source code.

From the CloudShell working folder, run the following command to download the mainframe-data-utilities source code from GitHub:

git clone https://github.com/aws-samples/mainframe-data-utilities.git mdu
General AWS

Create the ZIP package.

From the CloudShell working folder, run the following command to create the ZIP package that contains the source code for the EBCDIC conversion Lambda function:

cd mdu; zip ../mdu.zip *.py; cd ..
General AWS

Create the Lambda function.

From the CloudShell working folder, run the following command to create the Lambda function for EBCDIC conversion:

aws lambda create-function \
    --function-name E2A \
    --runtime python3.9 \
    --zip-file fileb://mdu.zip \
    --handler extract_ebcdic_to_ascii.lambda_handler \
    --role arn:aws:iam::$account:role/E2AConvLambdaRole \
    --timeout 10 \
    --environment "Variables={layout=$bucket/layout/}"

Note: The environment variable layout tells the Lambda function where the JSON metadata resides.

General AWS

Create the resource-based policy for the Lambda function.

From the CloudShell working folder, run the following command to allow your Amazon S3 event notification to trigger the Lambda function for EBCDIC conversion:

aws lambda add-permission \
    --function-name E2A \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn arn:aws:s3:::$bucket \
    --source-account $account \
    --statement-id 1
General AWS
Task | Description | Skills required

Create the configuration document for the Amazon S3 event notification.

The Amazon S3 event notification initiates the EBCDIC conversion Lambda function when files are placed in the input folder.

From the CloudShell working folder, run the following command to create the JSON document for the Amazon S3 event notification:

{ "LambdaFunctionConfigurations": [ { "Id": "E2A", "LambdaFunctionArn": "arn:aws:lambda:%s:%s:function:E2A", "Events": [ "s3:ObjectCreated:Put" ], "Filter": { "Key": { "FilterRules": [ { "Name": "prefix", "Value": "input/" } ] } } } ] } EOF ) printf "$S3E2AEvent" "$region" "$account" > S3E2AEvent.json
General AWS

Create the Amazon S3 event notification.

From the CloudShell working folder, run the following command to create the Amazon S3 event notification:

aws s3api put-bucket-notification-configuration --bucket $bucket --notification-configuration file://S3E2AEvent.json
General AWS
Task | Description | Skills required

Parse the COBOL copybook.

From the CloudShell working folder, run the following command to parse a sample COBOL copybook into a JSON file (which defines how to read and slice the data file properly):

python3 mdu/parse_copybook_to_json.py \
    -copybook mdu/LegacyReference/COBKS05.cpy \
    -output CLIENT.json \
    -output-s3key CLIENT.ASCII.txt \
    -output-s3bkt $bucket \
    -output-type s3 \
    -print 25
General AWS
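Before you edit the generated file in the next task, you can sanity-check the parser output. This short Python snippet only assumes that CLIENT.json is valid JSON and contains the empty transf-rule list that the next task fills in:

import json

with open("CLIENT.json") as f:
    layout = json.load(f)

# List the top-level keys of the generated metadata.
print(sorted(layout))

# Before the next task, the transformation rules are still empty.
print(layout["transf-rule"])    # expected: []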

Add the transformation rule.

The sample data file and its corresponding COBOL copybook describe a multi-layout file, so the conversion must slice the data based on certain rules. In this case, the bytes at positions 3 and 4 of each row define the layout.

From the CloudShell working folder, edit the CLIENT.json file and change the contents from "transf-rule": [], to the following:

"transf-rule": [ { "offset": 4, "size": 2, "hex": "0002", "transf": "transf1" }, { "offset": 4, "size": 2, "hex": "0000", "transf": "transf2" } ],
General AWS, IBM Mainframe, Cobol
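The following Python sketch illustrates the selection semantics these rules express, assuming the two discriminator bytes sit at positions 3 and 4 (1-based) of each record as described above. It is an illustration only; the actual rule engine is part of the mainframe-data-utilities project.

RULES = [
    {"offset": 4, "size": 2, "hex": "0002", "transf": "transf1"},
    {"offset": 4, "size": 2, "hex": "0000", "transf": "transf2"},
]

def pick_layout(record: bytes) -> str:
    for rule in RULES:
        # Assumption: offset marks where the discriminator field ends,
        # so the field occupies bytes [offset - size, offset).
        field = record[rule["offset"] - rule["size"]: rule["offset"]]
        if field.hex().upper() == rule["hex"].upper():
            return rule["transf"]
    raise ValueError("no transformation rule matches this record")

# Bytes 3 and 4 are 0x0002, so this record uses layout "transf1".
print(pick_layout(b"\x00\x00\x00\x02\xC1\xC2"))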

Upload the JSON metadata to the S3 bucket.

From the CloudShell working folder, run the following AWS CLI command to upload the JSON metadata to your S3 bucket:

aws s3 cp CLIENT.json s3://$bucket/layout/CLIENT.json
General AWS
Task | Description | Skills required

Send the EBCDIC file to the S3 bucket.

From the CloudShell working folder, run the following command to send the EBCDIC file to the S3 bucket:

aws s3 cp mdu/sample-data/CLIENT.EBCDIC.txt s3://$bucket/input/

Note: We recommend that you use different folders for input (EBCDIC) and output (ASCII) files to avoid invoking the conversion Lambda function again when the ASCII file is written to the S3 bucket.

General AWS

Check the output.

From the CloudShell working folder, run the following command to check if the ASCII file is generated in your S3 bucket:

aws s3 ls s3://$bucket/

Note: The data conversion can take several seconds to complete. We recommend that you check for the ASCII file a few times.

After the ASCII file is available, run the following command to download the file from the S3 bucket to the current folder:

aws s3 cp s3://$bucket/CLIENT.ASCII.txt .

Check the ASCII file content:

head CLIENT.ASCII.txt
General AWS
Task | Description | Skills required

(Optional) Prepare the variables and folder.

If you lose connection with CloudShell, reconnect and then run the following command to change the directory to the working folder:

cd workdir

Ensure that the environment variables are defined:

bucket=<your_bucket_name>
account=<your_account_number>
region=<your_region_code>
General AWS

Remove the notification configuration for the bucket.

From the CloudShell working folder, run the following command to remove the Amazon S3 event notification configuration:

aws s3api put-bucket-notification-configuration \
    --bucket=$bucket \
    --notification-configuration="{}"
General AWS

Delete the Lambda function.

From the CloudShell working folder, run the following command to delete the Lambda function for the EBCDIC converter:

aws lambda delete-function --function-name E2A
General AWS

Delete the IAM role and policy.

From the CloudShell working folder, run the following command to remove the EBCDIC converter role and policy:

aws iam delete-role-policy --role-name E2AConvLambdaRole --policy-name E2AConvLambdaPolicy
aws iam delete-role --role-name E2AConvLambdaRole
General AWS

Delete the files generated in the S3 bucket.

From the CloudShell working folder, run the following command to delete the files generated in the S3 bucket:

aws s3 rm s3://$bucket/layout --recursive
aws s3 rm s3://$bucket/input --recursive
aws s3 rm s3://$bucket/CLIENT.ASCII.txt
General AWS

Delete the working folder.

From the CloudShell working folder, run the following command to remove workdir and its contents:

cd ..; rm -Rf workdir
General AWS

Related resources