Step 4: Generate an encryption schema for a tabular file
To encrypt data, an encryption schema describing how the data will be used is required. This section describes how the C3R encryption client assists in generating an encryption schema for a CSV file with a header row or a Parquet file.
You only need to do this once per file. After the schema exists, it can be re-used to encrypt the same file (or any file with identical column names). If the column names or desired encryption schema changes, you must update the schema file. For more information, see (Optional) Create a schema (advanced users).
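Because a schema can be reused for any file with identical column names, it might help to compare header rows before reusing one. The following is a minimal sketch, not part of the C3R encryption client. The helper name same_header and the file name ads_new.csv are hypothetical, and the sketch assumes CSV files whose first line is the header row. It checks for an exact match, including column order, which is stricter than the condition described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the C3R encryption client).
# Exits with status 0 when the two CSV files have identical header rows.
same_header() {
    # Read the first line of each file and strip Windows carriage returns.
    first=$(head -n 1 "$1" | tr -d '\r')
    second=$(head -n 1 "$2" | tr -d '\r')
    [ "$first" = "$second" ]
}

# Example (ads_new.csv is a hypothetical file name):
#   same_header ads.csv ads_new.csv && echo "ads.json can be reused"
```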
Important
It is paramount that all collaborating parties use the same shared secret key.
Collaborating parties should also coordinate column names to match if they will be
JOINed or otherwise compared for equality in queries. Otherwise, the SQL
queries might produce unexpected or incorrect results. However, this is not necessary if the
collaboration creator enabled the allowJoinsOnColumnsWithDifferentNames
encryption setting during collaboration creation. For more information about
encryption-relevant settings, see Cryptographic computing parameters.
When run in schema mode, the C3R encryption client goes through the input file column by column. It outputs information about the schema file and asks you whether and how each source column should be included or encrypted in the target output. If the file contains many columns that you don't want in the encrypted output, interactive schema generation might become tedious because you must skip each unwanted column. To avoid this, you can manually write a schema, or create a simplified version of the input file that contains only the wanted columns and then run the interactive schema generator on that reduced file.
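One way to create a reduced input file is with the standard cut utility. The following is a rough sketch under stated assumptions, not a C3R feature. The helper name keep_columns and the file name ads_reduced.csv are hypothetical, and the sketch only works for simple CSV files in which no field contains a quoted comma or an embedded newline.

```shell
#!/bin/sh
# Hypothetical helper (not part of the C3R encryption client).
# Keeps only the selected columns of a simple CSV file.
# $1 = input CSV, $2 = 1-based column numbers in cut syntax (for example 1,3),
# $3 = reduced CSV to write
keep_columns() {
    # cut splits every line, including the header row, on commas.
    # It does not understand quoted fields, so check the result before using it.
    cut -d',' -f"$2" "$1" > "$3"
}

# Example (ads_reduced.csv is a hypothetical file name):
#   keep_columns ads.csv 1,2 ads_reduced.csv
```

The interactive schema generator can then be run on the reduced file instead of the original.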
For each source column in the input file, you are prompted for:
- How many target columns should be generated
- How each target column should be encrypted (if at all)
- The name of each target column
- How data should be padded before encryption if the column is being encrypted as a sealed column
Note
When you encrypt a column as a sealed column, you must determine which data needs padding. The C3R encryption client suggests a default padding during schema generation that pads all entries in a column to the same length.
When determining the length for fixed, note that the padding is in bytes, not bits.
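Because padding lengths are expressed in bytes, it can be useful to know the longest value in a column in bytes rather than in characters. The following is an illustrative sketch only. It does not reproduce how the C3R encryption client computes padding, the helper name max_bytes is hypothetical, and it assumes a simple CSV file with a header row and no quoted commas.

```shell
#!/bin/sh
# Hypothetical helper (not part of the C3R encryption client).
# Prints the byte length of the longest value in one column of a simple CSV.
# $1 = CSV file, $2 = 1-based column number
max_bytes() {
    # LC_ALL=C makes awk's length() count bytes instead of characters,
    # so a multi-byte UTF-8 character counts as more than one.
    LC_ALL=C awk -F',' -v col="$2" '
        NR > 1 {                      # skip the header row
            n = length($col)
            if (n > max) max = n
        }
        END { print max + 0 }         # prints 0 if there are no data rows
    ' "$1"
}

# Example: max_bytes sales.csv 3
```

The result might serve as a starting point when choosing a length for fixed padding.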
The following is a decision table for creating the schema.
Decision | Number of target columns from source column <'name-of-column'>? | Target column type: [c]leartext, [f]ingerprint, or [s]ealed? | Target column headername <default 'name-of-column'> | Add suffix <suffix> to header to indicate how it was encrypted, [y]es or [n]o <default 'yes'> | <'name-of-column_sealed'> padding type: [n]one, [f]ixed, or [m]ax <default 'max'> |
---|---|---|---|---|---|
Leave the column unencrypted. | 1 | c | Not applicable | Not applicable | Not applicable |
Encrypt the column as a fingerprint column. | 1 | f | Choose default or enter a new header name. | Enter y to choose default (_fingerprint) or enter n. | Not applicable |
Encrypt the column as a sealed column. | 1 | s | Choose default or enter a new header name. | Enter y to choose default (_sealed) or enter n. | Choose padding type. For more information, see (Optional) Create a schema (advanced users). |
Encrypt the column as both fingerprint and sealed. | 2 | Enter first target column: f. Enter second target column: s. | Choose the target headers for each target column. | Enter y to choose default or enter n. | Choose padding type (for sealed columns only). For more information, see (Optional) Create a schema (advanced users). |
The following are two examples of how to create encryption schemas. The exact content of your interaction depends on the input file and the responses that you provide.
Examples
Example: Generate an encryption schema for a fingerprint column and a cleartext column
In this example, for ads.csv, there are only two columns: username and ad_variant. For these columns, we want the following:
- For the username column to be encrypted as a fingerprint column
- For the ad_variant column to be a cleartext column
To generate an encryption schema for a fingerprint column and a cleartext column
- (Optional) To ensure the c3r-cli.jar file and file to be encrypted are present:
  - Navigate to the desired directory and run ls (if using a Mac or Unix/Linux) or dir (if using Windows).
  - View the list of tabular data files (for example, .csv) and choose a file to encrypt.
    In this example, ads.csv is the file that we want to encrypt.
- From the CLI, run the following command to create a schema interactively.
  java -jar c3r-cli.jar schema ads.csv --interactive --output=ads.json
  Note
  - You can run java -jar PATH/TO/c3r-cli.jar. Or, if you have added PATH/TO/c3r-cli.jar to your CLASSPATH environment variable, you can also run the class name. The C3R encryption client will look in the CLASSPATH to find it (for example, java com.amazon.psion.cli.Main).
  - The --interactive flag selects the interactive mode for developing the schema. This walks the user through a wizard for creating the schema. Users with advanced skills can create their own schema JSON without using the wizard. For more information, see (Optional) Create a schema (advanced users).
  - The --output flag sets an output name. If you don't include the --output flag, the C3R encryption client tries to pick a default output name (such as <input>.out.csv or, for the schema, <input>.json).
- For Number of target columns from source column ‘username’?, enter 1 and then press Enter.
- For Target column type: [c]leartext, [f]ingerprint, or [s]ealed?, enter f and then press Enter.
- For Target column headername <default 'username'>, press Enter.
  The default name ‘username’ is used.
- For Add suffix '_fingerprint' to header to indicate how it was encrypted, [y]es or [n]o <default 'yes'>, enter y and then press Enter.
  Note
  The interactive mode suggests suffixes to add to the encrypted column headers (_fingerprint for fingerprint columns and _sealed for sealed columns). The suffixes might be helpful when you're performing tasks such as uploading data to AWS services or creating AWS Clean Rooms collaborations. These suffixes can help indicate what can be done with the encrypted data in each column. For example, queries will not work as expected if you encrypt a column as a sealed column (_sealed) and then try to JOIN on it, or if you try the reverse.
- For Number of target columns from source column ‘ad_variant’?, enter 1 and then press Enter.
- For Target column type: [c]leartext, [f]ingerprint, or [s]ealed?, enter c and then press Enter.
- For Target column headername <default 'ad_variant'>, press Enter.
  The default name ‘ad_variant’ is used.
  The schema is written to a new file called ads.json.
  Note
  You can view the schema by opening it in any text editor, such as Notepad on Windows or TextEdit on macOS.
You are now ready to encrypt data.
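The optional file check and the schema command from the procedure above can be combined in a small wrapper. The following is a minimal sketch for macOS or Unix/Linux shells, assuming that java is on the PATH. The function name generate_schema is hypothetical and is not part of the C3R encryption client. The java command itself is the one shown in the procedure, with the file names passed as parameters.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the C3R encryption client).
# $1 = path to c3r-cli.jar, $2 = input file, $3 = schema file to write
generate_schema() {
    # Confirm that the JAR file and the input file are present first.
    for f in "$1" "$2"; do
        if [ ! -f "$f" ]; then
            echo "File not found: $f" >&2
            return 1
        fi
    done
    # Same command as in the procedure, with the file names as parameters.
    java -jar "$1" schema "$2" --interactive --output="$3"
}

# Example: generate_schema c3r-cli.jar ads.csv ads.json
```

The wrapper stops before starting the interactive prompts if either file is missing, which avoids a partially completed session.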
Example: Generate an encryption schema with sealed, fingerprint, and cleartext columns
In this example, for sales.csv, there are three columns: username, purchased, and product. For these columns, we want the following:
- For the product column to be a sealed column
- For the username column to be encrypted as a fingerprint column
- For the purchased column to be a cleartext column
To generate an encryption schema with sealed, fingerprint, and cleartext columns
- (Optional) To ensure the c3r-cli.jar file and file to be encrypted are present:
  - Navigate to the desired directory and run ls (if using a Mac or Unix/Linux) or dir (if using Windows).
  - View the list of tabular data files (.csv) and choose a file to encrypt.
    In this example, sales.csv is the file that we want to encrypt.
- From the CLI, run the following command to create a schema interactively.
  java -jar c3r-cli.jar schema sales.csv --interactive --output=sales.json
  Note
  - The --interactive flag selects the interactive mode for developing the schema. This walks the user through a guided workflow for creating the schema.
  - If you are an advanced user, you can create your own schema JSON without using the guided workflow. For more information, see (Optional) Create a schema (advanced users).
  - For .csv files with no column headers, see the --noHeaders flag for the schema command available in the CLI.
  - The --output flag sets an output name. If you don't include the --output flag, the C3R encryption client tries to pick a default output name (such as <input>.out or, for the schema, <input>.json).
- For Number of target columns from source column ‘username’?, enter 1 and then press Enter.
- For Target column type: [c]leartext, [f]ingerprint, or [s]ealed?, enter f and then press Enter.
- For Target column headername <default 'username'>, press Enter.
  The default name ‘username’ is used.
- For Add suffix '_fingerprint' to header to indicate how it was encrypted, [y]es or [n]o <default 'yes'>, enter y and then press Enter.
- For Number of target columns from source column ‘purchased’?, enter 1 and then press Enter.
- For Target column type: [c]leartext, [f]ingerprint, or [s]ealed?, enter c and then press Enter.
- For Target column headername <default 'purchased'>, press Enter.
  The default name ‘purchased’ is used.
- For Number of target columns from source column ‘product’?, enter 1 and then press Enter.
- For Target column type: [c]leartext, [f]ingerprint, or [s]ealed?, enter s and then press Enter.
- For Target column headername <default 'product'>, press Enter.
  The default name ‘product’ is used.
- For ‘product_sealed’ padding type: [n]one, [f]ixed, or [m]ax <default 'max'>?, press Enter to choose the default.
- For Byte-length beyond max length to pad cleartext to in ‘product_sealed’ <default ‘0’>?, press Enter to choose the default.
  The schema is written to a new file called sales.json.
You are now ready to encrypt data.