ONE_HOT_ENCODING - AWS Glue DataBrew

ONE_HOT_ENCODING

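Creates n numerical columns, where n is the number of unique values in a selected categorical column. Each new column contains 1 in the rows where the source column holds the corresponding value, and 0 in all other rows.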

For example, consider a column named shirt_size. Shirts are available in small, medium, large, or extra large. The column data might look like the following.

shirt_size
-----------
L
XL
M
S
M
M
S
XL
M
L
XL
M

In this scenario, there are four distinct values for shirt_size. Therefore, ONE_HOT_ENCODING generates four new columns. Each new column is named shirt_size_x, where x represents a distinct shirt_size value.

The results of shirt_size and the four generated columns look like this.

shirt_size   shirt_size_S shirt_size_M shirt_size_L shirt_size_XL
------------ ------------ ------------ ------------ -------------
L            0            0            1            0
XL           0            0            0            1
M            0            1            0            0
S            1            0            0            0
M            0            1            0            0
M            0            1            0            0
S            1            0            0            0
XL           0            0            0            1
M            0            1            0            0
L            0            0            1            0
XL           0            0            0            1
M            0            1            0            0

The column that you specify for ONE_HOT_ENCODING can have a maximum of ten (10) distinct values.
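The encoding described above can be sketched in plain Python. This is only an illustration of the technique, not DataBrew's implementation: the function name `one_hot_encode` and the constant `MAX_DISTINCT` are hypothetical, and ordering the generated columns by first appearance is an assumption (the table above lists them as S, M, L, XL).

```python
from typing import Dict, List

# Limit on distinct values, taken from the text above.
MAX_DISTINCT = 10


def one_hot_encode(column_name: str, values: List[str]) -> Dict[str, List[int]]:
    """Return one 0/1 column per distinct value, named <column_name>_<value>."""
    # Distinct values in order of first appearance (an assumption; the
    # ordering DataBrew uses for the generated columns is not stated here).
    distinct = list(dict.fromkeys(values))
    if len(distinct) > MAX_DISTINCT:
        raise ValueError(
            f"{column_name} has {len(distinct)} distinct values; "
            f"the maximum is {MAX_DISTINCT}"
        )
    # Each generated column holds 1 where the row matches its value, else 0.
    return {
        f"{column_name}_{d}": [1 if v == d else 0 for v in values]
        for d in distinct
    }


shirt_sizes = ["L", "XL", "M", "S", "M", "M", "S", "XL", "M", "L", "XL", "M"]
encoded = one_hot_encode("shirt_size", shirt_sizes)
for name, column in encoded.items():
    print(name, column)
```

Every row has exactly one 1 across the generated columns, which matches the table above.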

Parameters
  • sourceColumn – The name of an existing column. The column can have a maximum of 10 distinct values.

Example

{
    "RecipeAction": {
        "Operation": "ONE_HOT_ENCODING",
        "Parameters": {
            "sourceColumn": "shirt_size"
        }
    }
}
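
If the recipe action is generated from a script, the same structure can be built as a dictionary and serialized with Python's standard `json` module. This is a minimal sketch: the helper name `one_hot_recipe_action` is hypothetical, and the output mirrors only the example shown above.

```python
import json


def one_hot_recipe_action(source_column: str) -> dict:
    """Build the ONE_HOT_ENCODING action shown in the example above."""
    return {
        "RecipeAction": {
            "Operation": "ONE_HOT_ENCODING",
            "Parameters": {"sourceColumn": source_column},
        }
    }


action = one_hot_recipe_action("shirt_size")
# Serialize to JSON text with the same nesting as the example.
action_json = json.dumps(action, indent=4)
print(action_json)
```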