Configuring the `robots.txt` file for Amazon Kendra Web Crawler

Amazon Kendra is an intelligent search service that AWS customers use to index and search documents of their choice. To index documents on the web, customers can use Amazon Kendra Web Crawler, specifying which URLs to index along with other operational parameters. Amazon Kendra customers are required to obtain authorization before indexing any website.

Amazon Kendra Web Crawler respects standard robots.txt directives like Allow and Disallow. You can modify the robots.txt file of your website to control how Amazon Kendra Web Crawler crawls your website.

Configuring how Amazon Kendra Web Crawler accesses your website

You can use Allow and Disallow directives to control which web pages Amazon Kendra Web Crawler crawls and, as a result, which web pages are indexed.

To allow Amazon Kendra Web Crawler to crawl all web pages except disallowed web pages, use the following directives:

```
User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Disallow: /credential-pages/ # disallow access to specific pages
```
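One way to check rules like these before publishing them is Python's standard-library `urllib.robotparser`. The sketch below is only an approximation: it assumes the crawler matches on the `amazon-kendra` token shown above, uses `example.com` as a placeholder host, and relies on `robotparser`'s own reading of robots.txt, which may differ in edge cases from how Amazon Kendra Web Crawler evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Rules from the example above; text after "#" is treated as a comment.
ROBOTS_TXT = """\
User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Disallow: /credential-pages/ # disallow access to specific pages
"""

# Assumption: "amazon-kendra" is the user-agent token the crawler matches on.
AGENT = "amazon-kendra"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory text, no network access

# example.com is a placeholder host used only for illustration.
for url in ("https://example.com/index.html",
            "https://example.com/credential-pages/login.html"):
    print(url, parser.can_fetch(AGENT, url))
```

Pages outside `/credential-pages/` stay crawlable because no rule matches them, and a path that matches no rule is allowed.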

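To allow Amazon Kendra Web Crawler to crawl only specific web pages, use the following directives. The Disallow line is needed because, under standard robots.txt rules, any path that no rule matches is allowed:

```
User-agent: amazon-kendra    # Amazon Kendra Web Crawler
Allow: /pages/ # allow access to specific pages
Disallow: / # disallow access to all other pages
```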
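Under standard robots.txt semantics, an Allow rule only has an effect when a broader Disallow rule would otherwise apply. The following sketch uses Python's `urllib.robotparser` to compare a group that contains only `Allow: /pages/` with one that also ends in `Disallow: /`. It assumes the `amazon-kendra` token and the sample paths are representative; Amazon Kendra Web Crawler's own matching may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

AGENT = "amazon-kendra"  # Assumption: user-agent token the crawler matches on


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Only an Allow rule: nothing is disallowed, so every path is allowed.
allow_only = make_parser("""\
User-agent: amazon-kendra
Allow: /pages/
""")

# Allow rule first, followed by a catch-all Disallow.
allow_then_disallow = make_parser("""\
User-agent: amazon-kendra
Allow: /pages/
Disallow: /
""")

# Sample paths are hypothetical.
for path in ("/pages/guide.html", "/blog/post.html"):
    print(path,
          allow_only.can_fetch(AGENT, path),
          allow_then_disallow.can_fetch(AGENT, path))
```

Listing Allow before Disallow gives the same result whether a parser applies the first matching rule (as `urllib.robotparser` does) or the longest matching rule (as RFC 9309 specifies).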

To allow Amazon Kendra Web Crawler to crawl all website content and disallow crawling for any other robots, use the following directives:

```
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
```
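The `User-agent: *` group applies only to crawlers that do not match a more specific group. As a rough check under the same assumptions as before (Python's `urllib.robotparser` as a stand-in for the crawler's parser, `amazon-kendra` as the matched token), the sketch below compares Amazon Kendra Web Crawler with a hypothetical crawler named `SomeOtherBot`.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "SomeOtherBot" is a hypothetical crawler name used only for contrast;
# it matches no named group, so the "*" group applies to it.
for agent in ("amazon-kendra", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "/docs/index.html"))
```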

Stopping Amazon Kendra Web Crawler from crawling your website

You can use the Disallow directive to stop Amazon Kendra Web Crawler from crawling, and therefore indexing, your website. You can block the whole site or only selected web pages.

To stop Amazon Kendra Web Crawler from crawling your website, use the following directives:

```
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Disallow: / # disallow access to any pages
```
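Crawlers request robots.txt from the root of the host (for example, `/robots.txt`), so it can help to test both the text you plan to publish and the file that is actually deployed. The sketch below assumes Python's `urllib.robotparser` approximates the crawler's matching and that `amazon-kendra` is the matched token. `blocked_everywhere` and `check_live` are hypothetical helper names, and `check_live` makes a network request, so it is defined but not called here.

```python
from urllib.robotparser import RobotFileParser

AGENT = "amazon-kendra"  # Assumption: user-agent token the crawler matches on


def blocked_everywhere(robots_txt: str, paths: list[str]) -> bool:
    """Return True if AGENT may fetch none of the given paths (offline check)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(AGENT, p) for p in paths)


def check_live(site: str, path: str) -> bool:
    """Fetch a deployed robots.txt (network access required) and test one path."""
    base = site.rstrip("/")
    parser = RobotFileParser(base + "/robots.txt")
    parser.read()  # downloads and parses the file
    return parser.can_fetch(AGENT, base + path)


ROBOTS_TXT = """\
User-agent: amazon-kendra # Amazon Kendra Web Crawler
Disallow: / # disallow access to any pages
"""

print(blocked_everywhere(ROBOTS_TXT, ["/", "/index.html", "/docs/a/b.html"]))
```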

If you have any questions or concerns regarding Amazon Kendra Web Crawler, you can reach out to the AWS support team.