CreateDataSource
Connects a knowledge base to a data source. You specify the configuration for the specific data source service in the dataSourceConfiguration field.
Important
You can't change the chunkingConfiguration after you create the data source connector.
Request Syntax
PUT /knowledgebases/knowledgeBaseId/datasources/ HTTP/1.1
Content-type: application/json
{
"clientToken": "string",
"dataDeletionPolicy": "string",
"dataSourceConfiguration": {
"confluenceConfiguration": {
"crawlerConfiguration": {
"filterConfiguration": {
"patternObjectFilter": {
"filters": [
{
"exclusionFilters": [ "string" ],
"inclusionFilters": [ "string" ],
"objectType": "string"
}
]
},
"type": "string"
}
},
"sourceConfiguration": {
"authType": "string",
"credentialsSecretArn": "string",
"hostType": "string",
"hostUrl": "string"
}
},
"s3Configuration": {
"bucketArn": "string",
"bucketOwnerAccountId": "string",
"inclusionPrefixes": [ "string" ]
},
"salesforceConfiguration": {
"crawlerConfiguration": {
"filterConfiguration": {
"patternObjectFilter": {
"filters": [
{
"exclusionFilters": [ "string" ],
"inclusionFilters": [ "string" ],
"objectType": "string"
}
]
},
"type": "string"
}
},
"sourceConfiguration": {
"authType": "string",
"credentialsSecretArn": "string",
"hostUrl": "string"
}
},
"sharePointConfiguration": {
"crawlerConfiguration": {
"filterConfiguration": {
"patternObjectFilter": {
"filters": [
{
"exclusionFilters": [ "string" ],
"inclusionFilters": [ "string" ],
"objectType": "string"
}
]
},
"type": "string"
}
},
"sourceConfiguration": {
"authType": "string",
"credentialsSecretArn": "string",
"domain": "string",
"hostType": "string",
"siteUrls": [ "string" ],
"tenantId": "string"
}
},
"type": "string",
"webConfiguration": {
"crawlerConfiguration": {
"crawlerLimits": {
"maxPages": number,
"rateLimit": number
},
"exclusionFilters": [ "string" ],
"inclusionFilters": [ "string" ],
"scope": "string",
"userAgent": "string"
},
"sourceConfiguration": {
"urlConfiguration": {
"seedUrls": [
{
"url": "string"
}
]
}
}
}
},
"description": "string",
"name": "string",
"serverSideEncryptionConfiguration": {
"kmsKeyArn": "string"
},
"vectorIngestionConfiguration": {
"chunkingConfiguration": {
"chunkingStrategy": "string",
"fixedSizeChunkingConfiguration": {
"maxTokens": number,
"overlapPercentage": number
},
"hierarchicalChunkingConfiguration": {
"levelConfigurations": [
{
"maxTokens": number
}
],
"overlapTokens": number
},
"semanticChunkingConfiguration": {
"breakpointPercentileThreshold": number,
"bufferSize": number,
"maxTokens": number
}
},
"customTransformationConfiguration": {
"intermediateStorage": {
"s3Location": {
"uri": "string"
}
},
"transformations": [
{
"stepToApply": "string",
"transformationFunction": {
"transformationLambdaConfiguration": {
"lambdaArn": "string"
}
}
}
]
},
"parsingConfiguration": {
"bedrockDataAutomationConfiguration": {
"parsingModality": "string"
},
"bedrockFoundationModelConfiguration": {
"modelArn": "string",
"parsingModality": "string",
"parsingPrompt": {
"parsingPromptText": "string"
}
},
"parsingStrategy": "string"
}
}
}
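The full syntax above lists every possible field, but a request typically fills in only one source-specific configuration. The following Python sketch builds a minimal body for an S3 source using only the two required top-level fields. The helper name build_s3_data_source_body, the data source name, and the bucket ARN are hypothetical, and the "S3" value for type is an assumption, since the valid values are not listed on this page.

```python
import json


def build_s3_data_source_body(name, bucket_arn, inclusion_prefixes=None):
    """Return a minimal CreateDataSource request body for an S3 source."""
    s3_config = {"bucketArn": bucket_arn}
    if inclusion_prefixes:
        # Optional: restrict ingestion to objects under these prefixes.
        s3_config["inclusionPrefixes"] = list(inclusion_prefixes)
    return {
        "name": name,
        "dataSourceConfiguration": {
            # Assumption: "S3" selects the s3Configuration branch. Check the
            # DataSourceConfiguration reference for the valid type values.
            "type": "S3",
            "s3Configuration": s3_config,
        },
    }


# Hypothetical name and bucket ARN, for illustration only.
body = build_s3_data_source_body(
    "docs-bucket", "arn:aws:s3:::amzn-s3-demo-bucket", ["manuals/"]
)
print(json.dumps(body, indent=2))
```

Optional fields such as description or vectorIngestionConfiguration can be added to the returned dictionary before it is serialized.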
URI Request Parameters
The request uses the following URI parameters.
- knowledgeBaseId
-
The unique identifier of the knowledge base to which to add the data source.
Pattern:
^[0-9a-zA-Z]{10}$
Required: Yes
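As a rough illustration, the knowledgeBaseId pattern can be checked on the client before the request path is assembled. The function name data_sources_path and the sample identifier are hypothetical. The path layout follows the Request Syntax section.

```python
import re

# Pattern copied from the knowledgeBaseId URI parameter.
KB_ID_PATTERN = re.compile(r"^[0-9a-zA-Z]{10}$")


def data_sources_path(knowledge_base_id):
    """Build the request path, rejecting IDs that fail the documented pattern."""
    if not KB_ID_PATTERN.fullmatch(knowledge_base_id):
        raise ValueError(
            "knowledgeBaseId must be exactly 10 alphanumeric characters"
        )
    return f"/knowledgebases/{knowledge_base_id}/datasources/"


# Hypothetical knowledge base ID, for illustration only.
print(data_sources_path("ABCDE12345"))
```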
Request Body
The request accepts the following data in JSON format.
- clientToken
-
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
Type: String
Length Constraints: Minimum length of 33. Maximum length of 256.
Pattern:
^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,256}$
Required: No
- dataDeletionPolicy
-
The data deletion policy for the data source.
You can set the data deletion policy to:
-
DELETE: Deletes all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the data. This flag is ignored if an AWS account is deleted.
-
RETAIN: Retains all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted if you delete a knowledge base or data source resource.
Type: String
Valid Values:
RETAIN | DELETE
Required: No
- dataSourceConfiguration
-
The connection configuration for the data source.
Type: DataSourceConfiguration object
Required: Yes
- description
-
A description of the data source.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 200.
Required: No
- name
-
The name of the data source.
Type: String
Pattern:
^([0-9a-zA-Z][_-]?){1,100}$
Required: Yes
- serverSideEncryptionConfiguration
-
Contains details about the server-side encryption for the data source.
Type: ServerSideEncryptionConfiguration object
Required: No
- vectorIngestionConfiguration
-
Contains details about how to ingest the documents in the data source.
Type: VectorIngestionConfiguration object
Required: No
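The constraints listed above can be checked locally before a request is sent. The sketch below is one possible pre-flight check, not part of the API. The names validate_body and new_client_token are hypothetical, and the service remains the authority on validation. It checks only the top-level fields documented in this section and does not inspect nested objects.

```python
import re
import uuid

# Patterns and limits copied from the Request Body section.
CLIENT_TOKEN_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,256}$")
NAME_PATTERN = re.compile(r"^([0-9a-zA-Z][_-]?){1,100}$")
DELETION_POLICIES = {"RETAIN", "DELETE"}
REQUIRED_FIELDS = ("dataSourceConfiguration", "name")


def new_client_token():
    """Return a UUID string: 36 characters of hex digits and hyphens."""
    return str(uuid.uuid4())


def validate_body(body):
    """Return a list of problems found in the top-level request fields."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in body:
            problems.append(f"{field} is required")

    name = body.get("name")
    if name is not None and not NAME_PATTERN.fullmatch(name):
        problems.append("name does not match the documented pattern")

    token = body.get("clientToken")
    if token is not None:
        if not 33 <= len(token) <= 256:
            problems.append("clientToken length must be between 33 and 256")
        if not CLIENT_TOKEN_PATTERN.fullmatch(token):
            problems.append("clientToken does not match the documented pattern")

    description = body.get("description")
    if description is not None and not 1 <= len(description) <= 200:
        problems.append("description length must be between 1 and 200")

    policy = body.get("dataDeletionPolicy")
    if policy is not None and policy not in DELETION_POLICIES:
        problems.append("dataDeletionPolicy must be RETAIN or DELETE")

    return problems
```

Reusing the same clientToken when a request is retried lets the service recognize the retry as a duplicate, so the token is best generated once per logical request rather than once per attempt.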
Response Syntax
HTTP/1.1 200
Content-type: application/json
{
"dataSource": {
"createdAt": "string",
"dataDeletionPolicy": "string",
"dataSourceConfiguration": {
"confluenceConfiguration": {
"crawlerConfiguration": {
"filterConfiguration": {
"patternObjectFilter": {
"filters": [
{
"exclusionFilters": [ "string" ],
"inclusionFilters": [ "string" ],
"objectType": "string"
}
]
},
"type": "string"
}
},
"sourceConfiguration": {
"authType": "string",
"credentialsSecretArn": "string",
"hostType": "string",
"hostUrl": "string"
}
},
"s3Configuration": {
"bucketArn": "string",
"bucketOwnerAccountId": "string",
"inclusionPrefixes": [ "string" ]
},
"salesforceConfiguration": {
"crawlerConfiguration": {
"filterConfiguration": {
"patternObjectFilter": {
"filters": [
{
"exclusionFilters": [ "string" ],
"inclusionFilters": [ "string" ],
"objectType": "string"
}
]
},
"type": "string"
}
},
"sourceConfiguration": {
"authType": "string",
"credentialsSecretArn": "string",
"hostUrl": "string"
}
},
"sharePointConfiguration": {
"crawlerConfiguration": {
"filterConfiguration": {
"patternObjectFilter": {
"filters": [
{
"exclusionFilters": [ "string" ],
"inclusionFilters": [ "string" ],
"objectType": "string"
}
]
},
"type": "string"
}
},
"sourceConfiguration": {
"authType": "string",
"credentialsSecretArn": "string",
"domain": "string",
"hostType": "string",
"siteUrls": [ "string" ],
"tenantId": "string"
}
},
"type": "string",
"webConfiguration": {
"crawlerConfiguration": {
"crawlerLimits": {
"maxPages": number,
"rateLimit": number
},
"exclusionFilters": [ "string" ],
"inclusionFilters": [ "string" ],
"scope": "string",
"userAgent": "string"
},
"sourceConfiguration": {
"urlConfiguration": {
"seedUrls": [
{
"url": "string"
}
]
}
}
}
},
"dataSourceId": "string",
"description": "string",
"failureReasons": [ "string" ],
"knowledgeBaseId": "string",
"name": "string",
"serverSideEncryptionConfiguration": {
"kmsKeyArn": "string"
},
"status": "string",
"updatedAt": "string",
"vectorIngestionConfiguration": {
"chunkingConfiguration": {
"chunkingStrategy": "string",
"fixedSizeChunkingConfiguration": {
"maxTokens": number,
"overlapPercentage": number
},
"hierarchicalChunkingConfiguration": {
"levelConfigurations": [
{
"maxTokens": number
}
],
"overlapTokens": number
},
"semanticChunkingConfiguration": {
"breakpointPercentileThreshold": number,
"bufferSize": number,
"maxTokens": number
}
},
"customTransformationConfiguration": {
"intermediateStorage": {
"s3Location": {
"uri": "string"
}
},
"transformations": [
{
"stepToApply": "string",
"transformationFunction": {
"transformationLambdaConfiguration": {
"lambdaArn": "string"
}
}
}
]
},
"parsingConfiguration": {
"bedrockDataAutomationConfiguration": {
"parsingModality": "string"
},
"bedrockFoundationModelConfiguration": {
"modelArn": "string",
"parsingModality": "string",
"parsingPrompt": {
"parsingPromptText": "string"
}
},
"parsingStrategy": "string"
}
}
}
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- dataSource
-
Contains details about the data source.
Type: DataSource object
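A caller usually needs only a few fields from the returned dataSource object. The following sketch pulls them out of an already parsed response. The function name summarize_data_source and every value in the sample response are hypothetical placeholders, not real service output.

```python
def summarize_data_source(response):
    """Extract the identifying and status fields from a parsed response."""
    data_source = response["dataSource"]
    return {
        "dataSourceId": data_source.get("dataSourceId"),
        "knowledgeBaseId": data_source.get("knowledgeBaseId"),
        "name": data_source.get("name"),
        "status": data_source.get("status"),
        # failureReasons may be absent when nothing went wrong.
        "failureReasons": data_source.get("failureReasons", []),
    }


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "dataSource": {
        "dataSourceId": "DS12345678",
        "knowledgeBaseId": "ABCDE12345",
        "name": "docs-bucket",
        "status": "AVAILABLE",
    }
}
summary = summarize_data_source(sample_response)
print(summary)
```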
Errors
For information about the errors that are common to all actions, see Common Errors.
- AccessDeniedException
-
The request is denied because of missing access permissions.
HTTP Status Code: 403
- ConflictException
-
There was a conflict performing an operation.
HTTP Status Code: 409
- InternalServerException
-
An internal server error occurred. Retry your request.
HTTP Status Code: 500
- ResourceNotFoundException
-
The specified resource Amazon Resource Name (ARN) was not found. Check the ARN and try your request again.
HTTP Status Code: 404
- ServiceQuotaExceededException
-
The number of requests exceeds the service quota. Resubmit your request later.
HTTP Status Code: 402
- ThrottlingException
-
The number of requests exceeds the limit. Resubmit your request later.
HTTP Status Code: 429
- ValidationException
-
Input validation failed. Check your request parameters and retry the request.
HTTP Status Code: 400
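The error list above suggests a simple client-side split between errors worth retrying unchanged and errors that need a fix first. The sketch below encodes one such policy. Treating only the errors whose descriptions say to retry or resubmit as retryable is an assumption of this example, and the names should_retry and backoff_delays are hypothetical.

```python
# Status codes as listed in the Errors section.
ERROR_STATUS = {
    "AccessDeniedException": 403,
    "ConflictException": 409,
    "InternalServerException": 500,
    "ResourceNotFoundException": 404,
    "ServiceQuotaExceededException": 402,
    "ThrottlingException": 429,
    "ValidationException": 400,
}

# Assumption: retry the unchanged request only for errors whose description
# says to retry or to resubmit later. Other errors need a change first.
RETRYABLE = {
    "InternalServerException",
    "ServiceQuotaExceededException",
    "ThrottlingException",
}


def should_retry(error_name):
    """Return True if the unchanged request may reasonably be sent again."""
    return error_name in RETRYABLE


def backoff_delays(max_attempts, base_seconds=1.0):
    """Return exponentially growing wait times, one per retry attempt."""
    return [base_seconds * 2 ** attempt for attempt in range(max_attempts)]


print(should_retry("ThrottlingException"), backoff_delays(4, 0.5))
```

Production clients often add random jitter to each delay so that many callers do not retry at the same moment. It is left out here to keep the example deterministic.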
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: