Adding documents directly to an index with batch upload
You can add documents directly to an index using the BatchPutDocument API. You can't add documents directly using the console; in the console, you add documents by connecting your index to a data source. Documents can be added from an S3 bucket or supplied as binary data. For a list of document types supported by Amazon Kendra, see Types of documents.
Adding documents to an index using BatchPutDocument is an asynchronous operation. After you call the BatchPutDocument API, you use the BatchGetDocumentStatus API to monitor the progress of indexing your documents. When you call the BatchGetDocumentStatus API with a list of document IDs, it returns the status of each document. When the status of a document is INDEXED or FAILED, processing of that document is complete. When the status is FAILED, the BatchGetDocumentStatus API returns the reason that the document couldn't be indexed.
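The polling loop described above could be sketched as follows in Python. This is a minimal sketch, not an official sample: it assumes a boto3 Kendra client is passed in as `kendra`, and the helper name `wait_for_documents`, the polling interval, and the retry cap are hypothetical choices made for illustration. Only INDEXED and FAILED are treated as terminal, matching the description above.

```python
import time

# Statuses that the text above describes as "processing complete".
TERMINAL_STATUSES = {"INDEXED", "FAILED"}


def wait_for_documents(kendra, index_id, document_ids, poll_seconds=5.0, max_polls=60):
    """Poll BatchGetDocumentStatus until each document is INDEXED or FAILED.

    `kendra` is assumed to be a boto3 Kendra client (or any object with the
    same batch_get_document_status method). Returns (results, pending), where
    results maps document ID -> (status, failure_reason_or_None) and pending
    is the set of IDs that had not finished when polling stopped.
    """
    pending = set(document_ids)
    results = {}
    for _ in range(max_polls):
        response = kendra.batch_get_document_status(
            IndexId=index_id,
            DocumentInfoList=[{"DocumentId": doc_id} for doc_id in sorted(pending)],
        )
        for entry in response.get("DocumentStatusList", []):
            status = entry.get("DocumentStatus")
            if status in TERMINAL_STATUSES:
                # FailureReason is only expected when the status is FAILED.
                results[entry["DocumentId"]] = (status, entry.get("FailureReason"))
                pending.discard(entry["DocumentId"])
        if not pending:
            break
        time.sleep(poll_seconds)
    return results, pending
```

With boto3 installed and credentials configured, a call might look like `wait_for_documents(boto3.client("kendra"), index_id, ["doc1", "doc2"])`, where the index ID and document IDs are placeholders.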
If you want to alter your content and document metadata fields or attributes during the document ingestion process, see Amazon Kendra Custom Document Enrichment. If you want to use a custom data source, each document you submit using the BatchPutDocument API requires a data source ID and execution ID as attributes or fields. For more information, see Required attributes for custom data sources.
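As an illustration of attaching a data source ID and execution ID to a document, the following Python sketch adds them as string attributes to a document dictionary before it is passed to BatchPutDocument. The helper name is hypothetical, and the attribute keys `_data_source_id` and `_data_source_sync_job_execution_id` are an assumption about the reserved field names; confirm them in Required attributes for custom data sources before relying on this.

```python
def with_custom_source_attributes(document, data_source_id, execution_id):
    """Return a copy of `document` with custom data source attributes added.

    The two attribute keys below are assumed reserved field names; verify
    them against the custom data source documentation.
    """
    attributes = list(document.get("Attributes", []))
    attributes.append(
        {"Key": "_data_source_id", "Value": {"StringValue": data_source_id}}
    )
    attributes.append(
        {
            "Key": "_data_source_sync_job_execution_id",
            "Value": {"StringValue": execution_id},
        }
    )
    # Build a new dict so the caller's document is left unchanged.
    return {**document, "Attributes": attributes}
```

The returned dictionary can then be placed in the `Documents` list of a BatchPutDocument request.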
Note
Each document ID must be unique per index. You cannot create a data source to index your documents with their unique IDs and then use the BatchPutDocument API to index the same documents, or vice versa. You can, however, delete a data source and then use the BatchPutDocument API to index the same documents, or vice versa.
Using the BatchPutDocument and BatchDeleteDocument APIs in combination with an Amazon Kendra data source connector for the same set of documents could cause inconsistencies with your data. Instead, we recommend using the Amazon Kendra custom data source connector.
The following topics show how to add documents directly to an index.
Adding documents with the BatchPutDocument API
The following example adds a blob of text to an index by calling BatchPutDocument.
You can use the BatchPutDocument API to add documents directly to your index. For a list of document types supported by Amazon Kendra, see Types of documents.
For an example of creating an index using the AWS CLI and SDKs, see Creating an index. To set up the CLI and SDKs, see Setting up Amazon Kendra.
Note
Files added to the index must be in a UTF-8 encoded byte stream.
In the following examples, UTF-8 encoded text is added to the index.
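A minimal Python sketch of adding UTF-8 encoded text is shown here. It assumes a boto3 Kendra client is passed in as `kendra`; the helper names, document ID, and title are hypothetical and chosen only for illustration.

```python
def build_text_document(doc_id, title, text):
    """Build a BatchPutDocument entry that carries plain text as a blob."""
    return {
        "Id": doc_id,
        "Title": title,
        # The blob must be a UTF-8 encoded byte stream.
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
    }


def put_text_documents(kendra, index_id, documents):
    """Submit documents and return the list of documents that were rejected.

    `kendra` is assumed to be a boto3 Kendra client. An empty list means the
    request was accepted; indexing itself still completes asynchronously.
    """
    response = kendra.batch_put_document(IndexId=index_id, Documents=documents)
    return response.get("FailedDocuments", [])
```

With boto3 installed, a call might look like `put_text_documents(boto3.client("kendra"), index_id, [build_text_document("doc1", "Sample title", "Sample text")])`. An empty return value only indicates that the documents were accepted for processing, so the status still needs to be checked with BatchGetDocumentStatus.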
Adding documents from an S3 bucket
You can add documents directly to your index from an Amazon S3 bucket using the BatchPutDocument API. You can add up to 10 documents in the same call. When you use an S3 bucket, you must provide an IAM role with permission to access the bucket that contains your documents. You specify the role in the RoleArn parameter.
Using the BatchPutDocument API to add documents from an Amazon S3 bucket is a one-time operation. To keep an index synchronized with the contents of a bucket, create an Amazon S3 data source. For more information, see Amazon S3 data source.
For an example of creating an index using the AWS CLI and SDKs, see Creating an index. To set up the CLI and SDKs, see Setting up Amazon Kendra. For information on creating an S3 bucket, see the Amazon Simple Storage Service documentation.
In the following example, two Microsoft Word documents are added to the index using the BatchPutDocument API.
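A minimal Python sketch of such a request is shown here. It assumes a boto3 Kendra client is passed in as `kendra`; the helper names, bucket name, object keys, and role ARN are hypothetical placeholders.

```python
MAX_DOCUMENTS_PER_CALL = 10  # limit stated above for a single call


def build_s3_document(doc_id, bucket, key, content_type="MS_WORD"):
    """Build a BatchPutDocument entry that points at an object in S3."""
    return {
        "Id": doc_id,
        "S3Path": {"Bucket": bucket, "Key": key},
        "ContentType": content_type,
    }


def put_s3_documents(kendra, index_id, role_arn, documents):
    """Submit S3-backed documents and return any that were rejected.

    `kendra` is assumed to be a boto3 Kendra client. `role_arn` must name an
    IAM role with permission to read the bucket that holds the documents.
    """
    if len(documents) > MAX_DOCUMENTS_PER_CALL:
        raise ValueError("at most 10 documents can be added in one call")
    response = kendra.batch_put_document(
        IndexId=index_id,
        RoleArn=role_arn,
        Documents=documents,
    )
    return response.get("FailedDocuments", [])
```

For two Word documents, the `documents` list could be built as `[build_s3_document("doc1", "amzn-s3-demo-bucket", "document1.docx"), build_s3_document("doc2", "amzn-s3-demo-bucket", "document2.docx")]`, with the bucket and keys replaced by your own.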