SemanticChunkingConfiguration
Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
Contents
- breakpointPercentileThreshold
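The dissimilarity threshold for splitting chunks, expressed as a percentile. A chunk boundary is drawn where the dissimilarity between adjacent sentences exceeds this percentile.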
Type: Integer
Valid Range: Minimum value of 50. Maximum value of 99.
Required: Yes
- bufferSize
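The buffer size: the number of surrounding sentences added to each sentence when embeddings are created. For example, a buffer size of 1 combines the previous, current, and next sentences.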
Type: Integer
Valid Range: Minimum value of 0. Maximum value of 1.
Required: Yes
- maxTokens
The maximum number of tokens that a chunk can contain.
Type: Integer
Valid Range: Minimum value of 1.
Required: Yes
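The three fields and their valid ranges can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name `build_semantic_chunking_config` is hypothetical and not part of any AWS SDK, and the example values are arbitrary picks within the documented ranges. The helper builds a plain dictionary with the field names listed above and rejects values outside the documented ranges.

```python
def build_semantic_chunking_config(breakpoint_percentile_threshold, buffer_size, max_tokens):
    """Return a dict shaped like SemanticChunkingConfiguration.

    Hypothetical helper for illustration; it only mirrors the field names
    and valid ranges listed in this reference page.
    """
    config = {
        "breakpointPercentileThreshold": breakpoint_percentile_threshold,
        "bufferSize": buffer_size,
        "maxTokens": max_tokens,
    }

    # All three fields are required integers (bool is excluded explicitly,
    # because bool is a subclass of int in Python).
    for name, value in config.items():
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")

    # Documented range: minimum 50, maximum 99.
    if not 50 <= breakpoint_percentile_threshold <= 99:
        raise ValueError("breakpointPercentileThreshold must be between 50 and 99")

    # Documented range: minimum 0, maximum 1.
    if not 0 <= buffer_size <= 1:
        raise ValueError("bufferSize must be 0 or 1")

    # Documented range: minimum 1, no documented maximum.
    if max_tokens < 1:
        raise ValueError("maxTokens must be at least 1")

    return config


# Example values chosen only to fall inside the documented ranges.
example = build_semantic_chunking_config(
    breakpoint_percentile_threshold=95,
    buffer_size=1,
    max_tokens=300,
)
```

Validating on the client side like this is optional; the service enforces the same ranges, but an early check can surface a bad value before a request is sent.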
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: