Connecting Web Crawler to Amazon Q Business
An Amazon Q Business Web Crawler connector crawls and indexes either public-facing
websites or internal company websites that use HTTPS. With the Amazon Q Web Crawler,
you can create a generative AI web experience for your end users based on the website data
you crawl. You can set up the connector using either the AWS Management Console or the
CreateDataSource API.
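As a rough sketch of the API path, the following Python code assembles a CreateDataSource request for the boto3 `qbusiness` client and sends it. The application ID, index ID, role ARN, and seed URL are placeholders, and the helper names are hypothetical. The keys inside `configuration` (`type`, `syncMode`, `connectionConfiguration`, `seedUrlConnections`, `authentication`) are an assumption about a small part of the Web Crawler JSON template, not its complete schema; the full template may require additional fields.

```python
"""Sketch: building a CreateDataSource request for a Web Crawler data source."""

from typing import Any


def build_web_crawler_request(
    application_id: str,
    index_id: str,
    role_arn: str,
    seed_urls: list[str],
    display_name: str = "my-web-crawler",
) -> dict[str, Any]:
    """Return keyword arguments for qbusiness.create_data_source.

    ASSUMPTION: the configuration keys below are a partial, assumed subset
    of the Web Crawler JSON template; consult the template for required fields.
    """
    configuration = {
        "type": "WEBCRAWLERV2",  # assumed connector type identifier
        "syncMode": "FULL_CRAWL",  # assumed sync mode value
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                # One entry per seed URL; seed URLs must use HTTPS.
                "seedUrlConnections": [{"seedUrl": url} for url in seed_urls],
                "authentication": "NoAuthentication",  # assumed value for public sites
            }
        },
    }
    return {
        "applicationId": application_id,
        "indexId": index_id,
        "displayName": display_name,
        "roleArn": role_arn,
        "configuration": configuration,
    }


def create_web_crawler_data_source(request: dict[str, Any]) -> dict[str, Any]:
    """Send the request with boto3 (requires AWS credentials and permissions)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("qbusiness")
    return client.create_data_source(**request)
```

Keeping the request builder separate from the network call lets you inspect or unit-test the payload before anything is sent to AWS.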
Note
Amazon Q Web Crawler supports only HTTPS-enabled sites. It doesn't support HTTP sites or websites that use self-signed certificates.
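Because only HTTPS sites are supported, it can help to screen candidate seed URLs before adding them. The following Python sketch uses two hypothetical helpers: `split_seed_urls` separates URLs by scheme, and `has_trusted_certificate` performs a TLS handshake with the system's default trust store, so a self-signed certificate fails verification. Neither helper is part of Amazon Q Business; they only approximate the connector's requirements.

```python
import socket
import ssl
from urllib.parse import urlparse


def split_seed_urls(urls: list[str]) -> tuple[list[str], list[str]]:
    """Split candidate seed URLs into (accepted, rejected) by scheme.

    Only the scheme and presence of a host are checked here.
    """
    accepted, rejected = [], []
    for url in urls:
        parsed = urlparse(url.strip())
        # Require the https scheme and a non-empty host.
        if parsed.scheme == "https" and parsed.netloc:
            accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected


def has_trusted_certificate(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return False if the server's certificate fails default verification.

    Self-signed certificates fail here because they don't chain to a trusted CA.
    Other network errors (DNS, timeouts) are raised to the caller.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

`has_trusted_certificate` opens a real network connection, so run it only against hosts you're allowed to contact.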
Important
When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy.
If you receive an error when crawling a website, the website might be blocked from crawling. To crawl internal websites, you can set up a web proxy; the web proxy must be public-facing. You can also use authentication to access and crawl websites.
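A common reason a site is blocked from crawling is a robots.txt rule. Python's standard `urllib.robotparser` module can show whether a given user agent may fetch a URL under a site's robots.txt. In this sketch, the user-agent token `amazon-QBusiness` is an assumption, and `is_crawl_allowed` is a hypothetical helper; check the robots.txt topic for the exact name the connector sends.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the user-agent token sent by the Amazon Q Business Web Crawler.
CRAWLER_USER_AGENT = "amazon-QBusiness"


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = CRAWLER_USER_AGENT) -> bool:
    """Return True if the robots.txt content permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(is_crawl_allowed(rules, "https://www.example.com/private/report.html"))  # False
    print(is_crawl_allowed(rules, "https://www.example.com/docs/"))  # True
```

Passing the robots.txt text in directly, instead of fetching it, keeps the check offline and makes it easy to test a rule change before publishing it.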
Note
The Amazon Q Web Crawler connector doesn't support Amazon S3 buckets encrypted with AWS KMS keys. It supports only server-side encryption with Amazon S3 managed keys (SSE-S3).
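To confirm that a bucket you plan to use with the connector uses SSE-S3 rather than AWS KMS, you can inspect the bucket's default encryption returned by the Amazon S3 GetBucketEncryption operation. This sketch uses boto3's `get_bucket_encryption`; the helpers `uses_sse_s3_only` and `check_bucket` are hypothetical, and they look only at the bucket's default encryption rules, not at how existing objects were encrypted.

```python
from typing import Any


def uses_sse_s3_only(encryption_response: dict[str, Any]) -> bool:
    """Return True if every default-encryption rule uses SSE-S3 (AES256)."""
    rules = encryption_response.get("ServerSideEncryptionConfiguration", {}).get("Rules", [])
    if not rules:
        # No rules found: treat the result as unknown rather than compliant.
        return False
    for rule in rules:
        algorithm = rule.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm")
        # "aws:kms" and "aws:kms:dsse" indicate AWS KMS keys, which the connector doesn't support.
        if algorithm != "AES256":
            return False
    return True


def check_bucket(bucket_name: str) -> bool:
    """Fetch the bucket's default encryption with boto3 and evaluate it."""
    import boto3  # imported lazily so uses_sse_s3_only works without boto3

    s3 = boto3.client("s3")
    response = s3.get_bucket_encryption(Bucket=bucket_name)
    return uses_sse_s3_only(response)
```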
Topics
- Web Crawler connector overview
- Prerequisites for connecting Amazon Q Business to Web Crawler
- Retrieving XPaths (XML Path Language) for Web Crawler
- Connecting Amazon Q Business to Web Crawler using the console
- Connecting Amazon Q Business to Web Crawler using APIs
- Connecting Amazon Q Business to Web Crawler using AWS CloudFormation
- Web Crawler data source connector field mappings
- IAM role for Amazon Q Business Web Crawler connector
- Configuring a robots.txt file for Amazon Q Business Web Crawler
Learn more
- For an overview of the Amazon Q web experience creation process using IAM Identity Center, see Configuring an application using IAM Identity Center.
- For an overview of the Amazon Q web experience creation process using AWS Identity and Access Management, see Configuring an application using IAM.
- For an overview of connector features, see Data source connector concepts.
- For information about connector configuration best practices, see Connector configuration best practices.