@Generated(value="com.amazonaws:aws-java-sdk-code-generator") public class SeedUrlConfiguration extends Object implements Serializable, Cloneable, StructuredPojo
Provides the configuration information for the seed or starting point URLs to crawl.
When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index.
Constructor and Description |
---|
SeedUrlConfiguration() |
Modifier and Type | Method and Description |
---|---|
SeedUrlConfiguration | clone() |
boolean | equals(Object obj) |
List<String> | getSeedUrls(): The list of seed or starting point URLs of the websites you want to crawl. |
String | getWebCrawlerMode(): The web crawler mode: HOST_ONLY, SUBDOMAINS, or EVERYTHING. |
int | hashCode() |
void | marshall(ProtocolMarshaller protocolMarshaller): Marshalls this structured data using the given ProtocolMarshaller. |
void | setSeedUrls(Collection<String> seedUrls): The list of seed or starting point URLs of the websites you want to crawl. |
void | setWebCrawlerMode(String webCrawlerMode): The web crawler mode: HOST_ONLY, SUBDOMAINS, or EVERYTHING. |
String | toString(): Returns a string representation of this object. |
SeedUrlConfiguration | withSeedUrls(Collection<String> seedUrls): The list of seed or starting point URLs of the websites you want to crawl. |
SeedUrlConfiguration | withSeedUrls(String... seedUrls): The list of seed or starting point URLs of the websites you want to crawl. |
SeedUrlConfiguration | withWebCrawlerMode(String webCrawlerMode): The web crawler mode: HOST_ONLY, SUBDOMAINS, or EVERYTHING. |
SeedUrlConfiguration | withWebCrawlerMode(WebCrawlerMode webCrawlerMode): The web crawler mode: HOST_ONLY, SUBDOMAINS, or EVERYTHING. |
public List<String> getSeedUrls()
The list of seed or starting point URLs of the websites you want to crawl.
The list can include a maximum of 100 seed URLs.
Returns:
The list of seed or starting point URLs of the websites you want to crawl. The list can include a maximum of 100 seed URLs.
public void setSeedUrls(Collection<String> seedUrls)
The list of seed or starting point URLs of the websites you want to crawl.
The list can include a maximum of 100 seed URLs.
Parameters:
seedUrls - The list of seed or starting point URLs of the websites you want to crawl. The list can include a maximum of 100 seed URLs.
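The 100-URL limit can be checked on the client before the list is handed to setSeedUrls. The following is a minimal sketch under assumptions: SeedUrlLimit and checkedSeedUrls are hypothetical names, not part of the SDK, and this page does not say whether the limit is enforced by the SDK or by the service.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of the AWS SDK.
class SeedUrlLimit {
    // Documented maximum number of seed URLs in one configuration.
    static final int MAX_SEED_URLS = 100;

    // Returns a copy of the URLs, or throws if the list exceeds the documented limit.
    static List<String> checkedSeedUrls(Collection<String> seedUrls) {
        if (seedUrls.size() > MAX_SEED_URLS) {
            throw new IllegalArgumentException(
                "Expected at most " + MAX_SEED_URLS + " seed URLs, got " + seedUrls.size());
        }
        return new ArrayList<>(seedUrls);
    }

    public static void main(String[] args) {
        // A list at the limit is accepted and copied.
        List<String> atLimit = checkedSeedUrls(Collections.nCopies(100, "https://example.com/"));
        System.out.println(atLimit.size());

        // A list one over the limit is rejected before any request is built.
        try {
            checkedSeedUrls(Collections.nCopies(101, "https://example.com/"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The result of checkedSeedUrls could then be passed to setSeedUrls or withSeedUrls(Collection).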
public SeedUrlConfiguration withSeedUrls(String... seedUrls)
The list of seed or starting point URLs of the websites you want to crawl.
The list can include a maximum of 100 seed URLs.
NOTE: This method appends the values to the existing list (if any). Use setSeedUrls(java.util.Collection) or withSeedUrls(java.util.Collection) if you want to override the existing values.
Parameters:
seedUrls - The list of seed or starting point URLs of the websites you want to crawl. The list can include a maximum of 100 seed URLs.
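The difference between appending and overriding is easy to miss when calls are chained. The sketch below models only the list semantics described in the note. SeedUrlsSketch is a hypothetical stand-in written for illustration, not the SDK class, and its internals are an assumption about how such behavior could be implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in, NOT the SDK class: models only the documented list semantics.
class SeedUrlsSketch {
    private List<String> seedUrls;

    // Replaces the existing list, as setSeedUrls(Collection) is documented to do.
    void setSeedUrls(Collection<String> urls) {
        this.seedUrls = (urls == null) ? null : new ArrayList<>(urls);
    }

    // Appends to the existing list (if any), as the varargs overload is documented to do.
    SeedUrlsSketch withSeedUrls(String... urls) {
        if (this.seedUrls == null) {
            this.seedUrls = new ArrayList<>(urls.length);
        }
        this.seedUrls.addAll(Arrays.asList(urls));
        return this;
    }

    // Overrides the existing values, as the Collection overload is documented to do.
    SeedUrlsSketch withSeedUrls(Collection<String> urls) {
        setSeedUrls(urls);
        return this;
    }

    List<String> getSeedUrls() {
        return seedUrls;
    }

    public static void main(String[] args) {
        SeedUrlsSketch cfg = new SeedUrlsSketch()
            .withSeedUrls("https://a.example.com/")
            .withSeedUrls("https://b.example.com/");
        // Both URLs are present: the second varargs call appended to the first.
        System.out.println(cfg.getSeedUrls());

        cfg.withSeedUrls(Arrays.asList("https://c.example.com/"));
        // Only the last URL remains: the Collection overload replaced the list.
        System.out.println(cfg.getSeedUrls());
    }
}
```

When a configuration object is reused across data sources, the Collection overload or setSeedUrls avoids carrying over URLs from an earlier call.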
public SeedUrlConfiguration withSeedUrls(Collection<String> seedUrls)
The list of seed or starting point URLs of the websites you want to crawl.
The list can include a maximum of 100 seed URLs.
Parameters:
seedUrls - The list of seed or starting point URLs of the websites you want to crawl. The list can include a maximum of 100 seed URLs.
public void setWebCrawlerMode(String webCrawlerMode)
You can choose one of the following modes:
- HOST_ONLY: crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
- SUBDOMAINS: crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
- EVERYTHING: crawl the website host names with subdomains and other domains that the web pages link to.
The default mode is HOST_ONLY.
Parameters:
webCrawlerMode - The web crawler mode: HOST_ONLY, SUBDOMAINS, or EVERYTHING, as described above. The default mode is HOST_ONLY.
See Also:
WebCrawlerMode
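The three modes can be read as a host-matching rule. The sketch below is one interpretation of the examples given for each mode, not the crawler's actual implementation. CrawlScopeSketch, its local Mode enum, and inScope are hypothetical names, and the exact matching rules Amazon Kendra applies are not specified on this page.

```java
import java.util.Locale;

// Hypothetical illustration of the documented modes, not Amazon Kendra's crawler logic.
class CrawlScopeSketch {
    // Local enum mirroring the three documented mode names; not the SDK's WebCrawlerMode.
    enum Mode { HOST_ONLY, SUBDOMAINS, EVERYTHING }

    // Decides whether a host found in a link falls within the crawl scope of the seed host.
    static boolean inScope(Mode mode, String seedHost, String candidateHost) {
        String seed = seedHost.toLowerCase(Locale.ROOT);
        String candidate = candidateHost.toLowerCase(Locale.ROOT);
        switch (mode) {
            case HOST_ONLY:
                // Only the seed's own host name.
                return candidate.equals(seed);
            case SUBDOMAINS:
                // The seed host plus anything beneath it, such as a.abc.example.com.
                return candidate.equals(seed) || candidate.endsWith("." + seed);
            case EVERYTHING:
                // Any host that a crawled page links to.
                return true;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        String seed = "abc.example.com";
        System.out.println(inScope(Mode.HOST_ONLY, seed, "a.abc.example.com"));  // false
        System.out.println(inScope(Mode.SUBDOMAINS, seed, "a.abc.example.com")); // true
        System.out.println(inScope(Mode.SUBDOMAINS, seed, "other.example.org")); // false
        System.out.println(inScope(Mode.EVERYTHING, seed, "other.example.org")); // true
    }
}
```

The suffix check includes the leading dot so that a host such as "xabc.example.com" is not mistaken for a subdomain of "abc.example.com".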
public String getWebCrawlerMode()
You can choose one of the following modes:
- HOST_ONLY: crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
- SUBDOMAINS: crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
- EVERYTHING: crawl the website host names with subdomains and other domains that the web pages link to.
The default mode is HOST_ONLY.
Returns:
The web crawler mode: HOST_ONLY, SUBDOMAINS, or EVERYTHING, as described above. The default mode is HOST_ONLY.
See Also:
WebCrawlerMode
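Because the getter returns a String, calling code may want to resolve an unset value to the documented default before acting on it. The sketch below assumes that an unset mode is reported as null, which this page does not state. EffectiveModeSketch and effectiveMode are hypothetical names, not part of the SDK.

```java
// Hypothetical helper class, not part of the AWS SDK.
class EffectiveModeSketch {
    // The default mode named in the documentation.
    static final String DEFAULT_MODE = "HOST_ONLY";

    // Maps an unset (null) or blank mode string to the documented default.
    // Assumption: an unset mode is reported as null by the getter.
    static String effectiveMode(String webCrawlerMode) {
        if (webCrawlerMode == null || webCrawlerMode.trim().isEmpty()) {
            return DEFAULT_MODE;
        }
        return webCrawlerMode;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMode(null));         // HOST_ONLY
        System.out.println(effectiveMode("SUBDOMAINS")); // SUBDOMAINS
    }
}
```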
public SeedUrlConfiguration withWebCrawlerMode(String webCrawlerMode)
You can choose one of the following modes:
- HOST_ONLY: crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
- SUBDOMAINS: crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
- EVERYTHING: crawl the website host names with subdomains and other domains that the web pages link to.
The default mode is HOST_ONLY.
Parameters:
webCrawlerMode - The web crawler mode: HOST_ONLY, SUBDOMAINS, or EVERYTHING, as described above. The default mode is HOST_ONLY.
See Also:
WebCrawlerMode
public SeedUrlConfiguration withWebCrawlerMode(WebCrawlerMode webCrawlerMode)
You can choose one of the following modes:
- HOST_ONLY: crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
- SUBDOMAINS: crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
- EVERYTHING: crawl the website host names with subdomains and other domains that the web pages link to.
The default mode is HOST_ONLY.
Parameters:
webCrawlerMode - The web crawler mode: HOST_ONLY, SUBDOMAINS, or EVERYTHING, as described above. The default mode is HOST_ONLY.
See Also:
WebCrawlerMode
public String toString()
Returns a string representation of this object.
Overrides:
toString in class Object
See Also:
Object.toString()
public SeedUrlConfiguration clone()
public void marshall(ProtocolMarshaller protocolMarshaller)
Marshalls this structured data using the given ProtocolMarshaller.
Specified by:
marshall in interface StructuredPojo
Parameters:
protocolMarshaller - Implementation of ProtocolMarshaller used to marshall this object's data.