Preventing a crawler from changing an existing schema
If you don't want a crawler to overwrite updates you made to existing fields in an Amazon S3 table definition, choose the option on the console to Add new columns only, or set the configuration option MergeNewColumns.
This applies to tables and partitions, unless Partitions.AddOrUpdateBehavior is overridden to InheritFromTable.
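As a minimal sketch, the MergeNewColumns option could be expressed as a crawler configuration string like the one built below. The helper name merge_new_columns_config and its partitions_inherit flag are hypothetical, and the placement of MergeNewColumns under a Tables.AddOrUpdateBehavior key is an assumption that mirrors the Partitions.AddOrUpdateBehavior key named above; check it against the crawler configuration reference before relying on it.

```python
import json


def merge_new_columns_config(partitions_inherit: bool = False) -> str:
    """Build a crawler Configuration string that only adds new columns.

    Hypothetical helper. Assumes MergeNewColumns is set under
    CrawlerOutput.Tables.AddOrUpdateBehavior.
    """
    crawler_output = {"Tables": {"AddOrUpdateBehavior": "MergeNewColumns"}}
    if partitions_inherit:
        # Optional override: partitions take their schema from the table.
        crawler_output["Partitions"] = {"AddOrUpdateBehavior": "InheritFromTable"}
    # The crawler API expects the configuration as a JSON string, not a dict.
    return json.dumps({"Version": 1.0, "CrawlerOutput": crawler_output})


config_string = merge_new_columns_config()
```

The resulting string would be passed as the Configuration field when creating or updating the crawler.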
If you don't want a table schema to change at all when a crawler runs, set the schema change policy to LOG. You can also set a configuration option that sets partition schemas to inherit from the table.
If you are configuring the crawler on the console, you can choose the following actions:
Ignore the change and don't update the table in the Data Catalog
Update all new and existing partitions with metadata from the table
When you configure the crawler using the API, set the following parameters:
Set the UpdateBehavior field in the SchemaChangePolicy structure to LOG.
Set the Configuration field with a string representation of the following JSON object in the crawler API; for example:
{
  "Version": 1.0,
  "CrawlerOutput": {
    "Partitions": { "AddOrUpdateBehavior": "InheritFromTable" }
  }
}
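The two API settings can be combined in a single crawler update. The following is a sketch only, assuming the AWS SDK for Python (boto3) and its Glue update_crawler call; the helper names build_log_only_update and apply_log_only_update and the crawler name in the usage note are hypothetical.

```python
import json


def build_log_only_update(crawler_name: str) -> dict:
    """Build update_crawler arguments that leave existing table schemas unchanged.

    Hypothetical helper: sets UpdateBehavior to LOG and makes partitions
    inherit their schema from the table.
    """
    configuration = {
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }
    return {
        "Name": crawler_name,
        "SchemaChangePolicy": {"UpdateBehavior": "LOG"},
        # Configuration must be a string representation of the JSON object.
        "Configuration": json.dumps(configuration),
    }


def apply_log_only_update(crawler_name: str) -> None:
    """Send the update to AWS Glue (requires boto3 and valid credentials)."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    glue.update_crawler(**build_log_only_update(crawler_name))
```

For example, apply_log_only_update("my-crawler") would apply both settings to a crawler named my-crawler, a placeholder name. Keeping the argument builder separate from the API call makes it possible to inspect the request before sending it.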