
Creating and using AWS Glue DataBrew projects


In AWS Glue DataBrew, a project is the centerpiece of your data analysis and transformation efforts.

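When you create a project, you bring together two fundamental components: a dataset, which supplies the data that you work with, and a recipe, which records the transformations that you apply to that data.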

The DataBrew console presents your project in a highly interactive, intuitive user interface. It encourages you to experiment with hundreds of data transformations, so you can learn how they work and what effect they have on your data.

The data that you see in project view is a sample of your dataset. Because datasets can be very large, with thousands or even millions of rows, using a sample helps ensure that the DataBrew console remains responsive while you transform the sample data in various ways. By default, the sample consists of the first 500 rows of data from the dataset. You can choose different settings for the sample size, and which rows are chosen.
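The sample settings described above can be captured in a small helper. This is a minimal sketch, assuming the sample is described by a `Type` (`FIRST_N` or `RANDOM`) and a `Size`, which is how the DataBrew API's `Sample` structure appears to be shaped. The helper name `build_sample_settings` is hypothetical; the 500-row default and the 5,000-row limit come from this page.

```python
# Hypothetical helper: describes how a DataBrew project should sample a dataset.
# Assumption: "FIRST_N" and "RANDOM" correspond to the "First n rows" and
# "Random rows" options in the console.
DEFAULT_SAMPLE_SIZE = 500   # default stated on this page
MAX_SAMPLE_SIZE = 5000      # maximum custom sample size stated on this page
SAMPLE_TYPES = ("FIRST_N", "RANDOM")


def build_sample_settings(sample_type="FIRST_N", size=DEFAULT_SAMPLE_SIZE):
    """Return a dict with the sample type and size, after basic validation."""
    if sample_type not in SAMPLE_TYPES:
        raise ValueError(f"sample_type must be one of {SAMPLE_TYPES}")
    if not 1 <= size <= MAX_SAMPLE_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_SAMPLE_SIZE}")
    return {"Type": sample_type, "Size": size}
```

Calling `build_sample_settings()` with no arguments yields the default behavior (the first 500 rows), while `build_sample_settings("RANDOM", 2500)` asks for 2,500 randomly selected rows.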

As you transform the sample data, DataBrew helps you build and refine the project recipe, a step-by-step series of the transformations that you have applied so far. Your work-in-progress recipe is saved automatically, so you can leave the project view at any time, return later, and pick up where you left off.

When your recipe is ready for use, you can publish it. Publishing a recipe makes it available to the DataBrew job subsystem, where you can apply the recipe to your entire dataset, or create an extensive data profile that lets you understand the structure, content, and statistical characteristics of your data.
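Publishing can also be done programmatically. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and its DataBrew client operation `publish_recipe`, which takes the recipe `Name` and an optional `Description`. The function names and the recipe name in the usage note are hypothetical, and the client is passed in so that the request can be inspected without calling AWS.

```python
# Hypothetical helpers for publishing a work-in-progress DataBrew recipe.
def build_publish_request(recipe_name, description=None):
    """Build the keyword arguments for the publish_recipe operation."""
    if not recipe_name:
        raise ValueError("recipe_name is required")
    request = {"Name": recipe_name}
    if description:
        # Description is optional; include it only when provided.
        request["Description"] = description
    return request


def publish_recipe(client, recipe_name, description=None):
    """Publish the recipe using a DataBrew client supplied by the caller.

    Assumption: `client` behaves like boto3.client("databrew").
    """
    return client.publish_recipe(**build_publish_request(recipe_name, description))
```

With boto3 installed and credentials configured, a call might look like `publish_recipe(boto3.client("databrew"), "my-example-recipe", "First published version")`.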

Creating a project

Use the following procedure to create a project.

To create a project
  1. Sign in to the AWS Management Console and open the DataBrew console.

  2. On the navigation pane, choose PROJECTS. Then choose Create project.

  3. Enter a name for your project. Then choose a recipe to attach to your project:

    • Choose Create new recipe if you are starting from the beginning. Doing this creates a new, empty recipe and attaches it to your project.

    • Choose Edit existing recipe if you have a previously published recipe that you want to use for this project. If the recipe is currently attached to another project, or has any jobs defined for it, then you can't use it in your new project. Choose Browse recipes to see what recipes are available.

    • Choose Import steps from recipe if you have an existing recipe that's been published previously and want to import its steps, and then do the following:

      1. Choose Browse recipes to see what recipes are available.

      2. Choose the published version of the recipe that you want to use. A recipe can have multiple versions, depending on how often you published it while working in project view.

      3. Choose View recipe steps to examine the data transformations in the recipe.

  4. After you have a recipe, choose the dataset that you want to work with on the Select a dataset pane:

    • My datasets – Choose a dataset that you created previously. For more information, see Creating a project.

    • Sample files – Create a new dataset based on sample data maintained by AWS. This sample data is a great way to explore what DataBrew can do, without having to provide your own data. Make sure to enter a name for your dataset.

    • New dataset – Create a new dataset. For more information, see Creating a project.

  5. For Access permissions, choose an AWS Identity and Access Management (IAM) role that allows DataBrew to read from your Amazon S3 input location. For an S3 location owned by your AWS account, you can choose the AwsGlueDataBrewDataAccessRole service-managed role. Doing this allows DataBrew to access S3 resources that you own.

  6. On the Sampling pane, you can find options for DataBrew to build a sample of data from your dataset.

    For Type, choose how DataBrew should get rows from your dataset:

    • Use First n rows to create a sample based on the first rows in the dataset.

    • Use Random rows to create a sample based on a random selection of rows in the dataset.

    • Choose the number of rows to appear in the sample: 500, 1,000, 2,500, or a custom sample size, up to a maximum of 5,000 rows. A smaller sample size allows DataBrew to perform transformations faster, saving you time as you develop your recipe. A larger sample size more accurately reflects the makeup of the underlying source data, but project session initialization and interactive transformations are slower.

  7. (Optional) Choose Tags to attach tags to your project.

    Tags are simple labels consisting of a user-defined key and an optional value that can make it easier to manage, search for, and filter DataBrew projects by purpose, owner, environment, or other criteria.

  8. When the settings are as you want them, choose Create project.

DataBrew creates a new dataset if needed, creates a new recipe if needed, builds the data sample, and creates an interactive project session. This process can take a couple of minutes to complete. When the project is ready for use, you can begin working with the data sample.
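The same project settings can be supplied through the AWS SDK for Python (boto3). The following is a minimal sketch, assuming the DataBrew client's `create_project` operation with the parameters `Name`, `DatasetName`, `RecipeName`, `RoleArn`, `Sample`, and `Tags`. It also assumes that the dataset and recipe already exist, because the console's "creates if needed" behavior may not apply to the API. The function names and all values in the usage note, including the account ID in the role ARN, are hypothetical placeholders.

```python
# Hypothetical helpers that mirror the console procedure for creating a project.
def build_create_project_request(name, dataset_name, recipe_name, role_arn,
                                 sample_type="FIRST_N", sample_size=500,
                                 tags=None):
    """Build the keyword arguments for the create_project operation."""
    if sample_type not in ("FIRST_N", "RANDOM"):
        raise ValueError("sample_type must be FIRST_N or RANDOM")
    if not 1 <= sample_size <= 5000:
        # This page states a maximum custom sample size of 5,000 rows.
        raise ValueError("sample_size must be between 1 and 5000")
    request = {
        "Name": name,                  # step 3: project name
        "RecipeName": recipe_name,     # step 3: recipe attached to the project
        "DatasetName": dataset_name,   # step 4: dataset to work with
        "RoleArn": role_arn,           # step 5: IAM role DataBrew assumes
        "Sample": {"Type": sample_type, "Size": sample_size},  # step 6
    }
    if tags:
        request["Tags"] = dict(tags)   # step 7: optional key-value tags
    return request


def create_project(client, **settings):
    """Create the project using a DataBrew client supplied by the caller.

    Assumption: `client` behaves like boto3.client("databrew").
    """
    return client.create_project(**build_create_project_request(**settings))
```

A call might look like `create_project(boto3.client("databrew"), name="example-project", dataset_name="example-dataset", recipe_name="example-recipe", role_arn="arn:aws:iam::111122223333:role/ExampleDataBrewRole", sample_type="RANDOM", sample_size=1000, tags={"environment": "dev"})`. Keeping the request builder separate from the call makes it easy to review the settings before anything is sent to AWS.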
