ParquetSerDe
A serializer to use for converting data to the Parquet format before storing it in
Amazon S3. For more information, see Apache Parquet. A configuration sketch that uses these parameters follows the list below.
Contents
- BlockSizeBytes
The Hadoop Distributed File System (HDFS) block size. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. The default is 256 MiB and the minimum is 64 MiB. Firehose uses this value for padding calculations.
Type: Integer
Valid Range: Minimum value of 67108864.
Required: No
- Compression
The compression code to use over data blocks. The possible values are UNCOMPRESSED, SNAPPY, and GZIP, with the default being SNAPPY. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed.
Type: String
Valid Values: UNCOMPRESSED | GZIP | SNAPPY
Required: No
- EnableDictionaryCompression
Indicates whether to enable dictionary compression.
Type: Boolean
Required: No
- MaxPaddingBytes
The maximum amount of padding to apply. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. The default is 0.
Type: Integer
Valid Range: Minimum value of 0.
Required: No
- PageSizeBytes
The Parquet page size. Column chunks are divided into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). The minimum value is 64 KiB and the default is 1 MiB.
Type: Integer
Valid Range: Minimum value of 65536.
Required: No
- WriterVersion
Indicates the version of row format to output. The possible values are V1 and V2. The default is V1.
Type: String
Valid Values: V1 | V2
Required: No
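The following is a minimal sketch of how these settings might be supplied when creating a delivery stream with the AWS SDK for Python (Boto3). The stream name, role and bucket ARNs, AWS Glue database and table names, and Region are placeholders, and the destination configuration is reduced to the fields needed for format conversion.

```python
import boto3

firehose = boto3.client("firehose")

# Placeholder ARNs and AWS Glue schema references; replace with your own resources.
firehose.create_delivery_stream(
    DeliveryStreamName="example-parquet-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-role",
        "BucketARN": "arn:aws:s3:::example-bucket",
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {
                "Serializer": {
                    "ParquetSerDe": {
                        # All fields are optional; defaults apply when omitted.
                        "BlockSizeBytes": 268435456,   # 256 MiB (the default)
                        "Compression": "SNAPPY",
                        "EnableDictionaryCompression": True,
                        "MaxPaddingBytes": 0,
                        "PageSizeBytes": 1048576,      # 1 MiB (the default)
                        "WriterVersion": "V1",
                    }
                }
            },
            "SchemaConfiguration": {
                "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-role",
                "DatabaseName": "example_database",
                "TableName": "example_table",
                "Region": "us-east-1",
            },
        },
    },
)
```

Because every field has a default, an empty ParquetSerDe object is also valid; specify only the settings you want to change.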
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: