@Generated(value="com.amazonaws:aws-java-sdk-code-generator") public class S3ParquetSource extends Object implements Serializable, Cloneable, StructuredPojo
Specifies an Apache Parquet data store in Amazon S3.
Constructor and Description |
---|
S3ParquetSource() |
Modifier and Type | Method and Description |
---|---|
S3ParquetSource | clone() |
boolean | equals(Object obj) |
S3DirectSourceAdditionalOptions | getAdditionalOptions() - Specifies additional connection options. |
String | getCompressionType() - Specifies how the data is compressed. |
List<String> | getExclusions() - A string containing a JSON list of Unix-style glob patterns to exclude. |
String | getGroupFiles() - Grouping files is turned on by default when the input contains more than 50,000 files. |
String | getGroupSize() - The target group size in bytes. |
Integer | getMaxBand() - This option controls the duration in milliseconds after which the S3 listing is likely to be consistent. |
Integer | getMaxFilesInBand() - This option specifies the maximum number of files to save from the last maxBand seconds. |
String | getName() - The name of the data store. |
List<GlueSchema> | getOutputSchemas() - Specifies the data schema for the S3 Parquet source. |
List<String> | getPaths() - A list of the Amazon S3 paths to read from. |
Boolean | getRecurse() - If set to true, recursively reads files in all subdirectories under the specified paths. |
int | hashCode() |
Boolean | isRecurse() - If set to true, recursively reads files in all subdirectories under the specified paths. |
void | marshall(ProtocolMarshaller protocolMarshaller) - Marshalls this structured data using the given ProtocolMarshaller. |
void | setAdditionalOptions(S3DirectSourceAdditionalOptions additionalOptions) - Specifies additional connection options. |
void | setCompressionType(String compressionType) - Specifies how the data is compressed. |
void | setExclusions(Collection<String> exclusions) - A string containing a JSON list of Unix-style glob patterns to exclude. |
void | setGroupFiles(String groupFiles) - Grouping files is turned on by default when the input contains more than 50,000 files. |
void | setGroupSize(String groupSize) - The target group size in bytes. |
void | setMaxBand(Integer maxBand) - This option controls the duration in milliseconds after which the S3 listing is likely to be consistent. |
void | setMaxFilesInBand(Integer maxFilesInBand) - This option specifies the maximum number of files to save from the last maxBand seconds. |
void | setName(String name) - The name of the data store. |
void | setOutputSchemas(Collection<GlueSchema> outputSchemas) - Specifies the data schema for the S3 Parquet source. |
void | setPaths(Collection<String> paths) - A list of the Amazon S3 paths to read from. |
void | setRecurse(Boolean recurse) - If set to true, recursively reads files in all subdirectories under the specified paths. |
String | toString() - Returns a string representation of this object. |
S3ParquetSource | withAdditionalOptions(S3DirectSourceAdditionalOptions additionalOptions) - Specifies additional connection options. |
S3ParquetSource | withCompressionType(ParquetCompressionType compressionType) - Specifies how the data is compressed. |
S3ParquetSource | withCompressionType(String compressionType) - Specifies how the data is compressed. |
S3ParquetSource | withExclusions(Collection<String> exclusions) - A string containing a JSON list of Unix-style glob patterns to exclude. |
S3ParquetSource | withExclusions(String... exclusions) - A string containing a JSON list of Unix-style glob patterns to exclude. |
S3ParquetSource | withGroupFiles(String groupFiles) - Grouping files is turned on by default when the input contains more than 50,000 files. |
S3ParquetSource | withGroupSize(String groupSize) - The target group size in bytes. |
S3ParquetSource | withMaxBand(Integer maxBand) - This option controls the duration in milliseconds after which the S3 listing is likely to be consistent. |
S3ParquetSource | withMaxFilesInBand(Integer maxFilesInBand) - This option specifies the maximum number of files to save from the last maxBand seconds. |
S3ParquetSource | withName(String name) - The name of the data store. |
S3ParquetSource | withOutputSchemas(Collection<GlueSchema> outputSchemas) - Specifies the data schema for the S3 Parquet source. |
S3ParquetSource | withOutputSchemas(GlueSchema... outputSchemas) - Specifies the data schema for the S3 Parquet source. |
S3ParquetSource | withPaths(Collection<String> paths) - A list of the Amazon S3 paths to read from. |
S3ParquetSource | withPaths(String... paths) - A list of the Amazon S3 paths to read from. |
S3ParquetSource | withRecurse(Boolean recurse) - If set to true, recursively reads files in all subdirectories under the specified paths. |
public void setName(String name)
The name of the data store.
name - The name of the data store.
public String getName()
The name of the data store.
public S3ParquetSource withName(String name)
The name of the data store.
name - The name of the data store.
public List<String> getPaths()
A list of the Amazon S3 paths to read from.
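The paths accessors on this page differ in how they treat an existing list: the varargs withPaths is documented as appending, while setPaths and the Collection overload of withPaths replace the values. The sketch below illustrates that contract with a small stand-in class. PathsHolderSketch and the bucket paths are hypothetical and exist only for illustration; this is not the SDK's S3ParquetSource implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for illustration only; NOT the SDK's S3ParquetSource.
final class PathsHolderSketch {
    private List<String> paths;

    // Mirrors the documented setPaths(Collection): replaces any existing values.
    void setPaths(Collection<String> newPaths) {
        this.paths = (newPaths == null) ? null : new ArrayList<>(newPaths);
    }

    // Mirrors the documented withPaths(String...): appends to the existing list (if any).
    PathsHolderSketch withPaths(String... morePaths) {
        if (this.paths == null) {
            this.paths = new ArrayList<>(morePaths.length);
        }
        this.paths.addAll(Arrays.asList(morePaths));
        return this;
    }

    // Mirrors the documented withPaths(Collection): replaces, then returns this for chaining.
    PathsHolderSketch withPaths(Collection<String> newPaths) {
        setPaths(newPaths);
        return this;
    }

    List<String> getPaths() {
        return paths;
    }

    public static void main(String[] args) {
        PathsHolderSketch source = new PathsHolderSketch()
                .withPaths("s3://example-bucket/a/")   // placeholder path
                .withPaths("s3://example-bucket/b/");  // appended to the first entry
        System.out.println(source.getPaths());

        // The Collection overload discards both earlier entries.
        source.withPaths(List.of("s3://example-bucket/c/"));
        System.out.println(source.getPaths());
    }
}
```

When building a source in several steps, the practical consequence is that repeated varargs calls accumulate paths, so pass a Collection when the intent is to reset the list.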
public void setPaths(Collection<String> paths)
A list of the Amazon S3 paths to read from.
paths - A list of the Amazon S3 paths to read from.
public S3ParquetSource withPaths(String... paths)
A list of the Amazon S3 paths to read from.
NOTE: This method appends the values to the existing list (if any). Use setPaths(java.util.Collection) or withPaths(java.util.Collection) if you want to override the existing values.
paths - A list of the Amazon S3 paths to read from.
public S3ParquetSource withPaths(Collection<String> paths)
A list of the Amazon S3 paths to read from.
paths - A list of the Amazon S3 paths to read from.
public void setCompressionType(String compressionType)
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
compressionType - Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
See also: ParquetCompressionType
public String getCompressionType()
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
See also: ParquetCompressionType
public S3ParquetSource withCompressionType(String compressionType)
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
compressionType - Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
See also: ParquetCompressionType
public S3ParquetSource withCompressionType(ParquetCompressionType compressionType)
Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
compressionType - Specifies how the data is compressed. This is generally not necessary if the data has a standard file extension. Possible values are "gzip" and "bzip".
See also: ParquetCompressionType
public List<String> getExclusions()
A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
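To make the glob example concrete, here is a minimal sketch that tests object keys against the "**.pdf" pattern using the JDK's own glob matcher. It assumes java.nio's glob rules are close enough to the Unix-style globs described here; AWS Glue's matcher may differ in edge cases. ExclusionGlobSketch, isExcluded, and the sample keys are hypothetical names used only for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustration only: uses the JDK glob matcher, not AWS Glue's exclusion logic.
final class ExclusionGlobSketch {

    // Returns true when the key matches the glob under java.nio's glob rules.
    static boolean isExcluded(String glob, String key) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(key));
    }

    public static void main(String[] args) {
        String pattern = "**.pdf"; // the pattern quoted in the description
        String[] sampleKeys = {
            "reports/2023/summary.pdf",  // placeholder keys
            "data/part-00000.parquet"
        };
        for (String key : sampleKeys) {
            System.out.println(key + " excluded? " + isExcluded(pattern, key));
        }
    }
}
```

In java.nio's syntax, `**` crosses directory boundaries, which is why a nested PDF key matches while the Parquet key does not.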
public void setExclusions(Collection<String> exclusions)
A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
exclusions - A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
public S3ParquetSource withExclusions(String... exclusions)
A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
NOTE: This method appends the values to the existing list (if any). Use setExclusions(java.util.Collection) or withExclusions(java.util.Collection) if you want to override the existing values.
exclusions - A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
public S3ParquetSource withExclusions(Collection<String> exclusions)
A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
exclusions - A string containing a JSON list of Unix-style glob patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files.
public void setGroupSize(String groupSize)
The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.
groupSize - The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.
public String getGroupSize()
The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.
public S3ParquetSource withGroupSize(String groupSize)
The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.
groupSize - The target group size in bytes. The default is computed based on the input data size and the size of your cluster. When there are fewer than 50,000 input files, "groupFiles" must be set to "inPartition" for this to take effect.
public void setGroupFiles(String groupFiles)
Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
groupFiles - Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
public String getGroupFiles()
Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
public S3ParquetSource withGroupFiles(String groupFiles)
Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
groupFiles - Grouping files is turned on by default when the input contains more than 50,000 files. To turn on grouping with fewer than 50,000 files, set this parameter to "inPartition". To disable grouping when there are more than 50,000 files, set this parameter to "none".
public void setRecurse(Boolean recurse)
If set to true, recursively reads files in all subdirectories under the specified paths.
recurse - If set to true, recursively reads files in all subdirectories under the specified paths.
public Boolean getRecurse()
If set to true, recursively reads files in all subdirectories under the specified paths.
public S3ParquetSource withRecurse(Boolean recurse)
If set to true, recursively reads files in all subdirectories under the specified paths.
recurse - If set to true, recursively reads files in all subdirectories under the specified paths.
public Boolean isRecurse()
If set to true, recursively reads files in all subdirectories under the specified paths.
public void setMaxBand(Integer maxBand)
This option controls the duration in milliseconds after which the S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
maxBand - This option controls the duration in milliseconds after which the S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
public Integer getMaxBand()
This option controls the duration in milliseconds after which the S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
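As a rough illustration of the maxBand window, the sketch below converts the default of 900000 milliseconds to minutes and checks whether a modification timestamp falls inside the band. MaxBandSketch and isInBand are hypothetical names, and the comparison is an assumption about what "within the last maxBand milliseconds" means; it is not AWS Glue's bookmark implementation.

```java
import java.time.Duration;
import java.time.Instant;

// Illustration only: a guess at the band check, not AWS Glue's bookmark logic.
final class MaxBandSketch {
    static final int DEFAULT_MAX_BAND_MILLIS = 900_000; // default quoted in the description

    // True when lastModified is no older than maxBandMillis relative to now.
    // Timestamps later than now also count as in-band here (an arbitrary choice).
    static boolean isInBand(Instant lastModified, Instant now, int maxBandMillis) {
        Duration age = Duration.between(lastModified, now);
        return age.compareTo(Duration.ofMillis(maxBandMillis)) <= 0;
    }

    public static void main(String[] args) {
        long minutes = Duration.ofMillis(DEFAULT_MAX_BAND_MILLIS).toMinutes();
        System.out.println("Default band: " + minutes + " minutes");

        Instant now = Instant.now();
        Instant oneMinuteAgo = now.minusSeconds(60);
        Instant oneHourAgo = now.minusSeconds(3600);
        System.out.println("1 minute old in band? "
                + isInBand(oneMinuteAgo, now, DEFAULT_MAX_BAND_MILLIS));
        System.out.println("1 hour old in band? "
                + isInBand(oneHourAgo, now, DEFAULT_MAX_BAND_MILLIS));
    }
}
```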
public S3ParquetSource withMaxBand(Integer maxBand)
This option controls the duration in milliseconds after which the S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
maxBand - This option controls the duration in milliseconds after which the S3 listing is likely to be consistent. Files with modification timestamps falling within the last maxBand milliseconds are tracked specially when using JobBookmarks to account for Amazon S3 eventual consistency. Most users don't need to set this option. The default is 900000 milliseconds, or 15 minutes.
public void setMaxFilesInBand(Integer maxFilesInBand)
This option specifies the maximum number of files to save from the last maxBand seconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
maxFilesInBand - This option specifies the maximum number of files to save from the last maxBand seconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
public Integer getMaxFilesInBand()
This option specifies the maximum number of files to save from the last maxBand seconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
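The cap described here can be pictured as splitting the in-band files into those kept for the current run and those deferred to the next one. The sketch below assumes files are taken in listing order, which the description does not specify. MaxFilesInBandSketch and its methods are hypothetical names for illustration, not part of the SDK or of AWS Glue.

```java
import java.util.List;

// Illustration only: listing-order selection is an assumption, not documented behaviour.
final class MaxFilesInBandSketch {

    // At most maxFilesInBand entries are kept for the current run.
    static List<String> keptThisRun(List<String> inBandFiles, int maxFilesInBand) {
        int cut = Math.min(maxFilesInBand, inBandFiles.size());
        return inBandFiles.subList(0, cut);
    }

    // Anything beyond the cap is skipped now and left for the next job run.
    static List<String> deferredToNextRun(List<String> inBandFiles, int maxFilesInBand) {
        int cut = Math.min(maxFilesInBand, inBandFiles.size());
        return inBandFiles.subList(cut, inBandFiles.size());
    }

    public static void main(String[] args) {
        List<String> inBand = List.of("a.parquet", "b.parquet", "c.parquet"); // placeholder keys
        System.out.println("kept: " + keptThisRun(inBand, 2));
        System.out.println("deferred: " + deferredToNextRun(inBand, 2));
    }
}
```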
public S3ParquetSource withMaxFilesInBand(Integer maxFilesInBand)
This option specifies the maximum number of files to save from the last maxBand seconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
maxFilesInBand - This option specifies the maximum number of files to save from the last maxBand seconds. If this number is exceeded, extra files are skipped and only processed in the next job run.
public void setAdditionalOptions(S3DirectSourceAdditionalOptions additionalOptions)
Specifies additional connection options.
additionalOptions - Specifies additional connection options.
public S3DirectSourceAdditionalOptions getAdditionalOptions()
Specifies additional connection options.
public S3ParquetSource withAdditionalOptions(S3DirectSourceAdditionalOptions additionalOptions)
Specifies additional connection options.
additionalOptions - Specifies additional connection options.
public List<GlueSchema> getOutputSchemas()
Specifies the data schema for the S3 Parquet source.
public void setOutputSchemas(Collection<GlueSchema> outputSchemas)
Specifies the data schema for the S3 Parquet source.
outputSchemas - Specifies the data schema for the S3 Parquet source.
public S3ParquetSource withOutputSchemas(GlueSchema... outputSchemas)
Specifies the data schema for the S3 Parquet source.
NOTE: This method appends the values to the existing list (if any). Use setOutputSchemas(java.util.Collection) or withOutputSchemas(java.util.Collection) if you want to override the existing values.
outputSchemas - Specifies the data schema for the S3 Parquet source.
public S3ParquetSource withOutputSchemas(Collection<GlueSchema> outputSchemas)
Specifies the data schema for the S3 Parquet source.
outputSchemas - Specifies the data schema for the S3 Parquet source.
public String toString()
Returns a string representation of this object.
Overrides: toString in class Object
See also: Object.toString()
public S3ParquetSource clone()
public void marshall(ProtocolMarshaller protocolMarshaller)
Marshalls this structured data using the given ProtocolMarshaller.
Specified by: marshall in interface StructuredPojo
protocolMarshaller - Implementation of ProtocolMarshaller used to marshall this object's data.