AWS Encryption SDK message format reference
The information on this page is a reference for building your own encryption library that is compatible with the AWS Encryption SDK. If you are not building your own compatible encryption library, you likely do not need this information. To use the AWS Encryption SDK in one of the supported programming languages, see Programming languages. For the specification that defines the elements of a proper AWS Encryption SDK implementation, see the AWS Encryption SDK Specification.
The encryption operations in the AWS Encryption SDK return a single data structure or encrypted message that contains the encrypted data (ciphertext) and all encrypted data keys. To understand this data structure, or to build libraries that read and write it, you need to understand the message format.
The message format consists of at least two parts: a header and a body. In some cases, the message format consists of a third part, a footer. The message format defines an ordered sequence of bytes in network byte order, also called big-endian format. The message format begins with the header, followed by the body, followed by the footer (when there is one).
The algorithm suites supported by the AWS Encryption SDK use one of two message format versions. Algorithm suites without key commitment use message format version 1. Algorithm suites with key commitment use message format version 2.
Header structure
The message header contains the encrypted data key and information about how the message body is formed. The following table describes the fields that form the header in message format versions 1 and 2. The bytes are appended in the order shown.
The Not present value indicates that the field doesn't exist in that version of the message format.
Field | Message format version 1 length (bytes) | Message format version 2 length (bytes) |
---|---|---|
Version | 1 | 1 |
Type | 1 | Not present |
Algorithm ID | 2 | 2 |
Message ID | 16 | 32 |
AAD Length | 2. When the encryption context is empty, the value of the 2-byte AAD Length field is 0. | 2. When the encryption context is empty, the value of the 2-byte AAD Length field is 0. |
AAD | Variable. The length of this field appears in the previous 2 bytes (AAD Length field). When the encryption context is empty, there is no AAD field in the header. | Variable. The length of this field appears in the previous 2 bytes (AAD Length field). When the encryption context is empty, there is no AAD field in the header. |
Encrypted Data Key Count | 2 | 2 |
Encrypted Data Key(s) | Variable. Determined by the number of encrypted data keys and the length of each. | Variable. Determined by the number of encrypted data keys and the length of each. |
Content Type | 1 | 1 |
Reserved | 4 | Not present |
IV Length | 1 | Not present |
Frame Length | 4 | 4 |
Algorithm Suite Data | Not present | Variable. Determined by the algorithm that generated the message. |
Header Authentication | Variable. Determined by the algorithm that generated the message. | Variable. Determined by the algorithm that generated the message. |
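As a rough illustration of how the leading fields differ between the two versions, the following Python sketch reads the header up to and including the AAD Length field. The names `HeaderPrefix` and `parse_header_prefix` are hypothetical and are not part of any AWS Encryption SDK implementation; the sketch assumes the whole message is already in memory as a `bytes` object.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeaderPrefix:
    version: int
    message_type: Optional[int]  # None in message format version 2
    algorithm_id: int
    message_id: bytes
    aad_length: int
    offset: int  # index of the first byte after the AAD Length field


def parse_header_prefix(buf: bytes) -> HeaderPrefix:
    """Read Version, Type (v1 only), Algorithm ID, Message ID, and AAD Length."""
    version = buf[0]
    offset = 1
    if version == 1:
        message_type: Optional[int] = buf[offset]
        offset += 1
        message_id_len = 16  # 128 bits in version 1
    elif version == 2:
        message_type = None  # the Type field is not present in version 2
        message_id_len = 32  # 256 bits in version 2
    else:
        raise ValueError(f"unsupported message format version: {version}")

    # All multi-byte integers are big-endian (network byte order).
    (algorithm_id,) = struct.unpack_from(">H", buf, offset)
    offset += 2

    message_id = buf[offset:offset + message_id_len]
    if len(message_id) != message_id_len:
        raise ValueError("truncated Message ID")
    offset += message_id_len

    (aad_length,) = struct.unpack_from(">H", buf, offset)
    offset += 2
    return HeaderPrefix(version, message_type, algorithm_id,
                        message_id, aad_length, offset)
```

A caller would continue reading at `offset`, consuming `aad_length` bytes of AAD before the Encrypted Data Key Count field.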
- Version
-
The version of this message format. The version is either 1 or 2, encoded as the byte 01 or 02 in hexadecimal notation.
- Type
-
The type of this message format. The type indicates the kind of structure. The only supported type is described as customer authenticated encrypted data. Its type value is 128, encoded as the byte 80 in hexadecimal notation.
This field is not present in message format version 2.
- Algorithm ID
-
An identifier for the algorithm used. It is a 2-byte value interpreted as a 16-bit unsigned integer. For more information about the algorithms, see AWS Encryption SDK algorithms reference.
- Message ID
-
A randomly generated value that identifies the message. The Message ID:
-
Uniquely identifies the encrypted message.
-
Weakly binds the message header to the message body.
-
Provides a mechanism to securely reuse a data key with multiple encrypted messages.
-
Protects against accidental reuse of a data key or the wearing out of keys in the AWS Encryption SDK.
This value is 128 bits in message format version 1 and 256 bits in version 2.
- AAD Length
-
The length of the additional authenticated data (AAD). It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the AAD.
When the encryption context is empty, the value of the AAD Length field is 0.
- AAD
-
The additional authenticated data. The AAD is an encoding of the encryption context, an array of key-value pairs where each key and value is a string of UTF-8 encoded characters. The encryption context is converted to a sequence of bytes and used for the AAD value. When the encryption context is empty, there is no AAD field in the header.
When the algorithms with signing are used, the encryption context must contain the key-value pair {'aws-crypto-public-key', Qtxt}. Qtxt represents the elliptic curve point Q compressed according to SEC 1 version 2.0 and then base64-encoded. The encryption context can contain additional values, but the maximum length of the constructed AAD is 2^16 - 1 bytes.
The following table describes the fields that form the AAD. Key-value pairs are sorted, by key, in ascending order according to UTF-8 character code. The bytes are appended in the order shown.
AAD Structure
Field | Length (bytes) |
---|---|
Key-Value Pair Count | 2 |
Key Length | 2 |
Key | Variable. Equal to the value specified in the previous 2 bytes (Key Length). |
Value Length | 2 |
Value | Variable. Equal to the value specified in the previous 2 bytes (Value Length). |
- Key-Value Pair Count
-
The number of key-value pairs in the AAD. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of key-value pairs in the AAD. The maximum number of key-value pairs in the AAD is 2^16 - 1.
When there is no encryption context or the encryption context is empty, this field is not present in the AAD structure.
- Key Length
-
The length of the key for the key-value pair. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the key.
- Key
-
The key for the key-value pair. It is a sequence of UTF-8 encoded bytes.
- Value Length
-
The length of the value for the key-value pair. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the value.
- Value
-
The value for the key-value pair. It is a sequence of UTF-8 encoded bytes.
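The AAD layout described above can be sketched in Python as follows. This is a minimal illustration, not the SDK's own code: `serialize_aad` is a hypothetical name, and the encryption context is assumed to be a mapping of Python strings. Sorting the UTF-8 encoded keys as byte strings is used here as one way to get ascending order by UTF-8 character code.

```python
import struct
from typing import Mapping

MAX_AAD_LENGTH = 2**16 - 1  # the AAD Length field is a 16-bit unsigned integer


def serialize_aad(encryption_context: Mapping[str, str]) -> bytes:
    """Encode an encryption context as the AAD byte sequence."""
    if not encryption_context:
        # Empty context: AAD Length is 0 and the AAD field is absent,
        # so there is no Key-Value Pair Count either.
        return b""

    pairs = [(k.encode("utf-8"), v.encode("utf-8"))
             for k, v in encryption_context.items()]
    pairs.sort(key=lambda pair: pair[0])  # ascending by encoded key

    out = bytearray(struct.pack(">H", len(pairs)))  # Key-Value Pair Count
    for key, value in pairs:
        out += struct.pack(">H", len(key)) + key      # Key Length, Key
        out += struct.pack(">H", len(value)) + value  # Value Length, Value

    if len(out) > MAX_AAD_LENGTH:
        raise ValueError("serialized encryption context exceeds 2^16 - 1 bytes")
    return bytes(out)
```

The length of the returned byte string is the value that would go in the header's AAD Length field.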
- Encrypted Data Key Count
-
The number of encrypted data keys. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of encrypted data keys. The maximum number of encrypted data keys in each message is 65,535 (2^16 - 1).
- Encrypted Data Key(s)
-
A sequence of encrypted data keys. The length of the sequence is determined by the number of encrypted data keys and the length of each. The sequence contains at least one encrypted data key.
The following table describes the fields that form each encrypted data key. The bytes are appended in the order shown.
Encrypted Data Key Structure
Field | Length (bytes) |
---|---|
Key Provider ID Length | 2 |
Key Provider ID | Variable. Equal to the value specified in the previous 2 bytes (Key Provider ID Length). |
Key Provider Information Length | 2 |
Key Provider Information | Variable. Equal to the value specified in the previous 2 bytes (Key Provider Information Length). |
Encrypted Data Key Length | 2 |
Encrypted Data Key | Variable. Equal to the value specified in the previous 2 bytes (Encrypted Data Key Length). |
- Key Provider ID Length
-
The length of the key provider identifier. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the key provider ID.
- Key Provider ID
-
The key provider identifier. It indicates the provider of the encrypted data key and is intended to be extensible.
- Key Provider Information Length
-
The length of the key provider information. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the key provider information.
- Key Provider Information
-
The key provider information. It is determined by the key provider.
When AWS KMS is the master key provider or you are using an AWS KMS keyring, this value contains the Amazon Resource Name (ARN) of the AWS KMS key.
- Encrypted Data Key Length
-
The length of the encrypted data key. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the encrypted data key.
- Encrypted Data Key
-
The encrypted data key. It is the data encryption key encrypted by the key provider.
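Because every field in an encrypted data key is length-prefixed, reading the sequence reduces to one small helper applied three times per key. The following Python sketch shows this under stated assumptions: the function names are hypothetical, the input is a `bytes` object, and `offset` points at the Encrypted Data Key Count field.

```python
import struct
from typing import List, Tuple

EncryptedDataKey = Tuple[bytes, bytes, bytes]  # (provider ID, provider info, ciphertext)


def _read_length_prefixed(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read a 2-byte big-endian length, then that many bytes."""
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated length-prefixed field")
    return buf[start:end], end


def parse_encrypted_data_keys(buf: bytes, offset: int = 0) -> Tuple[List[EncryptedDataKey], int]:
    """Return the encrypted data keys and the offset of the byte that follows them."""
    (count,) = struct.unpack_from(">H", buf, offset)
    if count == 0:
        raise ValueError("the sequence must contain at least one encrypted data key")
    offset += 2

    keys: List[EncryptedDataKey] = []
    for _ in range(count):
        provider_id, offset = _read_length_prefixed(buf, offset)
        provider_info, offset = _read_length_prefixed(buf, offset)
        ciphertext, offset = _read_length_prefixed(buf, offset)
        keys.append((provider_id, provider_info, ciphertext))
    return keys, offset
```

The returned offset is where the Content Type field begins.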
- Content Type
-
The type of encrypted data, either nonframed or framed.
Note
Whenever possible, use framed data. The AWS Encryption SDK supports nonframed data only for legacy use. Some language implementations of the AWS Encryption SDK can still generate nonframed ciphertext. All supported language implementations can decrypt framed and nonframed ciphertext.
Framed data is divided into equal-length parts; each part is encrypted separately. Framed content is type 2, encoded as the byte 02 in hexadecimal notation.
Nonframed data is not divided; it is a single encrypted blob. Nonframed content is type 1, encoded as the byte 01 in hexadecimal notation.
- Reserved
-
A reserved sequence of 4 bytes. This value must be 0. It is encoded as the bytes 00 00 00 00 in hexadecimal notation (that is, a 4-byte sequence of a 32-bit integer value equal to 0).
This field is not present in message format version 2.
- IV Length
-
The length of the initialization vector (IV). It is a 1-byte value interpreted as an 8-bit unsigned integer that specifies the number of bytes that contain the IV. This value is determined by the IV bytes value of the algorithm that generated the message.
This field is not present in message format version 2, which only supports algorithm suites that use deterministic IV values in the message header.
- Frame Length
-
The length of each frame of framed data. It is a 4-byte value interpreted as a 32-bit unsigned integer that specifies the number of bytes in each frame. When the data is nonframed, that is, when the value of the Content Type field is 1, this value must be 0.
Note
Whenever possible, use framed data. The AWS Encryption SDK supports nonframed data only for legacy use. Some language implementations of the AWS Encryption SDK can still generate nonframed ciphertext. All supported language implementations can decrypt framed and nonframed ciphertext.
- Algorithm Suite Data
-
Supplementary data needed by the algorithm that generated the message. The length and contents are determined by the algorithm. Its length might be 0.
This field is not present in message format version 1.
- Header Authentication
-
The header authentication is determined by the algorithm that generated the message. The header authentication is calculated over the entire header. It consists of an IV and an authentication tag. The bytes are appended in the order shown.
Header Authentication Structure
Field | Length in version 1.0 (bytes) | Length in version 2.0 (bytes) |
---|---|---|
IV | Variable. Determined by the IV bytes value of the algorithm that generated the message. | N/A |
Authentication Tag | Variable. Determined by the authentication tag bytes value of the algorithm that generated the message. | Variable. Determined by the authentication tag bytes value of the algorithm that generated the message. |
- IV
-
The initialization vector (IV) used to calculate the header authentication tag.
This field is not present in the header of message format version 2. Message format version 2 only supports algorithm suites that use deterministic IV values in the message header.
- Authentication Tag
-
The authentication value for the header. It is used to authenticate the entire contents of the header.
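The fields that follow the encrypted data keys differ the most between versions 1 and 2. The Python sketch below reads them from Content Type through Header Authentication. It is an illustration only: `parse_header_tail` is a hypothetical name, and `tag_len` and `suite_data_len` are assumed to be supplied by the caller after looking up the algorithm suite identified by Algorithm ID (that lookup is not shown). For version 1, the sketch assumes the header IV has the length given in the IV Length field.

```python
import struct
from typing import Dict, Tuple


def parse_header_tail(buf: bytes, offset: int, version: int,
                      tag_len: int, suite_data_len: int = 0) -> Tuple[Dict, int]:
    """Read Content Type through Header Authentication for version 1 or 2."""
    content_type = buf[offset]
    offset += 1
    if content_type not in (1, 2):  # 1 = nonframed, 2 = framed
        raise ValueError(f"unknown content type: {content_type}")

    iv_len = None
    if version == 1:
        if buf[offset:offset + 4] != b"\x00\x00\x00\x00":
            raise ValueError("Reserved field must be 0")
        offset += 4
        iv_len = buf[offset]  # IV Length, version 1 only
        offset += 1

    (frame_length,) = struct.unpack_from(">I", buf, offset)
    offset += 4
    if content_type == 1 and frame_length != 0:
        raise ValueError("Frame Length must be 0 for nonframed data")

    suite_data = b""
    header_iv = b""
    if version == 2:
        # Algorithm Suite Data exists only in version 2; it may be empty.
        suite_data = buf[offset:offset + suite_data_len]
        offset += suite_data_len
    else:
        # The header authentication IV exists only in version 1.
        header_iv = buf[offset:offset + iv_len]
        offset += iv_len

    tag = buf[offset:offset + tag_len]
    if len(tag) != tag_len:
        raise ValueError("truncated header authentication tag")
    offset += tag_len

    fields = {
        "content_type": content_type,
        "iv_len": iv_len,
        "frame_length": frame_length,
        "suite_data": suite_data,
        "header_iv": header_iv,
        "header_tag": tag,
    }
    return fields, offset
```

The returned offset marks the first byte of the message body. This sketch only splits the bytes; verifying the authentication tag is a separate step that depends on the algorithm suite.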
Body structure
The message body contains the encrypted data, called the ciphertext. The structure of the body depends on the content type (nonframed or framed). The following sections describe the format of the message body for each content type. The message body structure is the same in message format versions 1 and 2.
Non-framed data
Non-framed data is encrypted in a single blob with a unique IV and body AAD.
The following table describes the fields that form nonframed data. The bytes are appended in the order shown.
Field | Length, in bytes |
---|---|
IV | Variable. Equal to the value specified in the IV Length byte of the header. |
Encrypted Content Length | 8 |
Encrypted Content | Variable. Equal to the value specified in the previous 8 bytes (Encrypted Content Length). |
Authentication Tag | Variable. Determined by the algorithm implementation used. |
- IV
-
The initialization vector (IV) to use with the encryption algorithm.
- Encrypted Content Length
-
The length of the encrypted content, or ciphertext. It is an 8-byte value interpreted as a 64-bit unsigned integer that specifies the number of bytes that contain the encrypted content.
Technically, the maximum allowed value is 2^63 - 1, just under 8 exbibytes (8 EiB). However, in practice the maximum value is 2^36 - 32, just under 64 gibibytes (64 GiB), due to restrictions imposed by the implemented algorithms.
Note
The Java implementation of this SDK further restricts this value to 2^31 - 1, just under 2 gibibytes (2 GiB), due to restrictions in the language.
- Encrypted Content
-
The encrypted content (ciphertext) as returned by the encryption algorithm.
- Authentication Tag
-
The authentication value for the body. It is used to authenticate the message body.
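A nonframed body can be split into its four fields with a few slices. The following Python sketch assumes the caller already knows `iv_len` (from the header's IV Length field or the algorithm suite) and `tag_len` (from the algorithm suite); both parameter names and the function name `parse_nonframed_body` are hypothetical.

```python
import struct
from typing import Tuple

MAX_NONFRAMED_CONTENT = 2**36 - 32  # practical limit stated for this format


def parse_nonframed_body(buf: bytes, offset: int,
                         iv_len: int, tag_len: int) -> Tuple[bytes, bytes, bytes, int]:
    """Return (iv, encrypted_content, authentication_tag, next_offset)."""
    iv = buf[offset:offset + iv_len]
    if len(iv) != iv_len:
        raise ValueError("truncated IV")
    offset += iv_len

    # Encrypted Content Length is an 8-byte big-endian unsigned integer.
    (content_len,) = struct.unpack_from(">Q", buf, offset)
    offset += 8
    if content_len > MAX_NONFRAMED_CONTENT:
        raise ValueError("encrypted content length exceeds 2^36 - 32 bytes")

    content_end = offset + content_len
    tag_end = content_end + tag_len
    if tag_end > len(buf):
        raise ValueError("truncated body")

    return iv, buf[offset:content_end], buf[content_end:tag_end], tag_end
```

The sketch only locates the fields. Decrypting the content and verifying the tag are left to the algorithm suite.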
Framed data
In framed data, the plaintext data is divided into equal-length parts called frames. The AWS Encryption SDK encrypts each frame separately with a unique IV and body AAD.
The frame length, which is the length of the encrypted content in the frame, can be different for each message. The maximum number of bytes in a frame is 2^32 - 1. The maximum number of frames in a message is 2^32 - 1.
There are two types of frames: regular and final. Every message must consist of or include a final frame.
All regular frames in a message have the same frame length. The final frame can have a different frame length.
The composition of frames in framed data varies with the length of the encrypted content.
-
Equal to the frame length — When the encrypted content length is the same as the frame length of the regular frames, the message can consist of a regular frame that contains the data, followed by a final frame of zero (0) length. Or, the message can consist only of a final frame that contains the data. In this case, the final frame has the same frame length as the regular frames.
-
Multiple of the frame length — When the encrypted content length is an exact multiple of the frame length of the regular frames, the message can end in a regular frame that contains the data, followed by a final frame of zero (0) length. Or, the message can end in a final frame that contains the data. In this case, the final frame has the same frame length as the regular frames.
-
Not a multiple of the frame length — When the encrypted content length is not an exact multiple of the frame length of the regular frames, the final frame contains the remaining data. The frame length of the final frame is less than the frame length of the regular frames.
-
Less than the frame length — When the encrypted content length is less than the frame length of the regular frames, the message consists of a final frame that contains all of the data. The frame length of the final frame is less than the frame length of the regular frames.
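The four cases above reduce to one integer division. The Python sketch below computes how many regular frames a message has and how long its final frame is. Where the format allows two compositions (content length equal to, or an exact multiple of, the frame length), this sketch arbitrarily picks the one that ends with a zero-length final frame; a compatible writer could equally put the last full chunk in the final frame. The name `plan_frames` is hypothetical.

```python
from typing import Tuple

MAX_FRAME_LENGTH = 2**32 - 1  # maximum number of bytes in a frame
MAX_FRAMES = 2**32 - 1        # maximum number of frames in a message


def plan_frames(content_length: int, frame_length: int) -> Tuple[int, int]:
    """Return (regular_frame_count, final_frame_length) for a framed message."""
    if not 0 < frame_length <= MAX_FRAME_LENGTH:
        raise ValueError("frame length must be between 1 and 2^32 - 1")
    if content_length < 0:
        raise ValueError("content length must not be negative")

    # Full chunks become regular frames; whatever is left (possibly
    # nothing) goes in the mandatory final frame.
    regular_frames, final_length = divmod(content_length, frame_length)
    if regular_frames + 1 > MAX_FRAMES:
        raise ValueError("message would need more than 2^32 - 1 frames")
    return regular_frames, final_length
```

For example, with a frame length of 4096, a content length of 5000 gives one regular frame and a final frame of 904 bytes, while a content length of 100 gives no regular frames and a final frame of 100 bytes.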
The following tables describe the fields that form the frames, first for a regular frame and then for a final frame. The bytes are appended in the order shown.
Field | Length, in bytes |
---|---|
Sequence Number | 4 |
IV | Variable. Equal to the value specified in the IV Length byte of the header. |
Encrypted Content | Variable. Equal to the value specified in the Frame Length of the header. |
Authentication Tag | Variable. Determined by the algorithm used, as specified in the Algorithm ID of the header. |
- Sequence Number
-
The frame sequence number. It is an incremental counter number for the frame. It is a 4-byte value interpreted as a 32-bit unsigned integer.
Framed data must start at sequence number 1. Subsequent frames must be in order and must contain an increment of 1 of the previous frame. Otherwise, the decryption process stops and reports an error.
- IV
-
The initialization vector (IV) for the frame. The SDK uses a deterministic method to construct a different IV for each frame in the message. Its length is specified by the algorithm suite used.
- Encrypted Content
-
The encrypted content (ciphertext) for the frame, as returned by the encryption algorithm.
- Authentication Tag
-
The authentication value for the frame. It is used to authenticate the entire frame.
Field | Length, in bytes |
---|---|
Sequence Number End | 4 |
Sequence Number | 4 |
IV | Variable. Equal to the value specified in the IV Length byte of the header. |
Encrypted Content Length | 4 |
Encrypted Content | Variable. Equal to the value specified in the previous 4 bytes (Encrypted Content Length). |
Authentication Tag | Variable. Determined by the algorithm used, as specified in the Algorithm ID of the header. |
- Sequence Number End
-
An indicator for the final frame. The value is encoded as the 4 bytes FF FF FF FF in hexadecimal notation.
- Sequence Number
-
The frame sequence number. It is an incremental counter number for the frame. It is a 4-byte value interpreted as a 32-bit unsigned integer.
Framed data must start at sequence number 1. Subsequent frames must be in order and must contain an increment of 1 of the previous frame. Otherwise, the decryption process stops and reports an error.
- IV
-
The initialization vector (IV) for the frame. The SDK uses a deterministic method to construct a different IV for each frame in the message. The length of the IV is specified by the algorithm suite.
- Encrypted Content Length
-
The length of the encrypted content. It is a 4-byte value interpreted as a 32-bit unsigned integer that specifies the number of bytes that contain the encrypted content for the frame.
- Encrypted Content
-
The encrypted content (ciphertext) for the frame, as returned by the encryption algorithm.
- Authentication Tag
-
The authentication value for the frame. It is used to authenticate the entire frame.
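A reader can tell a regular frame from a final frame by looking at the first 4 bytes: FF FF FF FF marks the final frame, and any other value is the sequence number of a regular frame. The Python sketch below walks a framed body on that basis and enforces the sequence number rule. It is a minimal illustration with hypothetical names; `frame_length` comes from the header, while `iv_len` and `tag_len` are assumed to be known from the algorithm suite. No decryption or tag verification is performed.

```python
import struct
from typing import List, Tuple

FINAL_FRAME_MARKER = 0xFFFFFFFF  # Sequence Number End

Frame = Tuple[int, bytes, bytes, bytes, bool]  # (seq, iv, content, tag, is_final)


def parse_frames(buf: bytes, offset: int, frame_length: int,
                 iv_len: int, tag_len: int) -> Tuple[List[Frame], int]:
    """Split a framed body into frames; stop after the final frame."""
    frames: List[Frame] = []
    expected_seq = 1  # framed data must start at sequence number 1
    while True:
        (first,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        is_final = first == FINAL_FRAME_MARKER
        if is_final:
            (seq,) = struct.unpack_from(">I", buf, offset)
            offset += 4
        else:
            seq = first
        if seq != expected_seq:
            raise ValueError(f"expected sequence number {expected_seq}, got {seq}")

        iv = buf[offset:offset + iv_len]
        offset += iv_len

        if is_final:
            # Only the final frame carries its own content length.
            (content_len,) = struct.unpack_from(">I", buf, offset)
            offset += 4
            if content_len > frame_length:
                raise ValueError("final frame is longer than the frame length")
        else:
            content_len = frame_length

        end = offset + content_len + tag_len
        if end > len(buf):
            raise ValueError("truncated frame")
        content = buf[offset:offset + content_len]
        tag = buf[offset + content_len:end]
        offset = end

        frames.append((seq, iv, content, tag, is_final))
        if is_final:
            return frames, offset
        expected_seq += 1
```

When the message uses an algorithm suite with signing, the returned offset is where the footer begins.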
Footer structure
When the algorithms with signing are used, the message format contains a footer. The message footer contains a digital signature calculated over the message header and body. The following table describes the fields that form the footer. The bytes are appended in the order shown. The message footer structure is the same in message format versions 1 and 2.
Field | Length, in bytes |
---|---|
Signature Length | 2 |
Signature | Variable. Equal to the value specified in the previous 2 bytes (Signature Length). |
- Signature Length
-
The length of the signature. It is a 2-byte value interpreted as a 16-bit unsigned integer that specifies the number of bytes that contain the signature.
- Signature
-
The signature.
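The footer is a single length-prefixed field, so reading it takes only a few lines. The Python sketch below uses the hypothetical name `parse_footer` and assumes `offset` points at the Signature Length field, immediately after the final frame or the nonframed body.

```python
import struct
from typing import Tuple


def parse_footer(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Return (signature, next_offset) from a message footer."""
    # Signature Length is a 2-byte big-endian unsigned integer.
    (signature_len,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + signature_len
    if end > len(buf):
        raise ValueError("truncated signature")
    return buf[start:end], end
```

Verifying the signature over the header and body is a separate step that depends on the algorithm suite and the public key carried in the encryption context.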