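Using the Bedrock Data Automation API

The Amazon Bedrock Data Automation (BDA) feature provides a streamlined API workflow for processing your data. For all modalities, this workflow consists of three main steps: creating a project, invoking the analysis, and retrieving the results. To retrieve custom output for your processed data, provide the blueprint ARN when you invoke the analysis operation.

Create a Data Automation Project

To begin processing files with BDA, you first need to create a Data Automation Project. You can do this in two ways: with the CreateDataAutomationProject operation or in the Amazon Bedrock console.

Using the API

When you use the API to create a project, you call CreateDataAutomationProject. When you create the project, you must define the configuration settings for the type of file you want to process (the modality you intend to use). The following example shows how to configure the standard output for images: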


{
    "standardOutputConfiguration": {
        "image": {
            "state": "ENABLED",
            "extraction": {
                "category": {
                    "state": "ENABLED",
                    "types": [
                        "CONTENT_MODERATION",
                        "TEXT_DETECTION"
                    ]
                },
                "boundingBox": {
                    "state": "ENABLED"
                }
            },
            "generativeField": {
                "state": "ENABLED",
                "types": [
                    "IMAGE_SUMMARY",
                    "IAB"
                ]
            }
        }
    }
}

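The API validates the input configuration and creates a new project with a unique ARN. The project settings are stored for future use. If a project is created with no parameters, the default settings apply. For example, when processing images, image summarization and text detection are enabled by default.

There is a limit on the number of projects that can be created per AWS account. Certain combinations of settings might not be allowed, or might require additional permissions.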
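
As a minimal sketch, the image configuration shown above could be assembled in Python and passed to CreateDataAutomationProject. The helper names `image_standard_output` and `create_image_project` are hypothetical. The sketch assumes that `client` is a boto3 client for the `bedrock-data-automation` service, that its `create_data_automation_project` call returns a `projectArn` field, and that `DISABLED` is the opposite of the `ENABLED` state shown above.

```python
from typing import Any, Dict, List


def image_standard_output(
    categories: List[str],
    generative_fields: List[str],
    bounding_boxes: bool = True,
) -> Dict[str, Any]:
    """Build the image part of standardOutputConfiguration (hypothetical helper)."""

    def state(enabled: bool) -> str:
        # Assumption: "DISABLED" is the counterpart of the "ENABLED" value in the text.
        return "ENABLED" if enabled else "DISABLED"

    return {
        "image": {
            "state": "ENABLED",
            "extraction": {
                "category": {
                    "state": state(bool(categories)),
                    "types": list(categories),
                },
                "boundingBox": {"state": state(bounding_boxes)},
            },
            "generativeField": {
                "state": state(bool(generative_fields)),
                "types": list(generative_fields),
            },
        }
    }


def create_image_project(client: Any, project_name: str, config: Dict[str, Any]) -> str:
    """Create a project and return its ARN.

    Assumption: `client` behaves like boto3.client("bedrock-data-automation").
    """
    response = client.create_data_automation_project(
        projectName=project_name,
        standardOutputConfiguration=config,
    )
    return response["projectArn"]
```

Building the configuration in a separate function keeps it easy to inspect or log before any request is sent.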

Async

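Invoke Data Automation Async

Once a project is set up, you can start processing images with the InvokeDataAutomationAsync operation. If you use custom output, you can submit one or more blueprint ARNs per request.

This API call starts asynchronous processing of your files in a specified S3 bucket. The API accepts the project ARN and the file to be processed, then starts an asynchronous processing job. It returns an invocationArn that you use to track the job. Errors are raised if the project doesn't exist, if the caller doesn't have the necessary permissions, or if the input file is in an unsupported format.

The following is the structure of the JSON request: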


{
   "blueprints": [ 
      { 
         "blueprintArn": "string",
         "stage": "string",
         "version": "string"
      }
   ],
   "clientToken": "string",
   "dataAutomationConfiguration": { 
      "dataAutomationProjectArn": "string",
      "stage": "string"
   },
   "dataAutomationProfileArn": "string",
   "encryptionConfiguration": { 
      "kmsEncryptionContext": { 
         "string" : "string" 
      },
      "kmsKeyId": "string"
   },
   "inputConfiguration": { 
      "assetProcessingConfiguration": { 
         "video": { 
            "segmentConfiguration": { ... }
         }
      },
      "s3Uri": "string"
   },
   "notificationConfiguration": { 
      "eventBridgeConfiguration": { 
         "eventBridgeEnabled": boolean
      }
   },
   "outputConfiguration": { 
      "s3Uri": "string"
   },
   "tags": [ 
      { 
         "key": "string",
         "value": "string"
      }
   ]
}

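When you run InvokeDataAutomationAsync on a video file, you can select a clip of 5 minutes or longer that is treated as the full video for data extraction. You set this clip with timestamps for its starting and ending milliseconds. This information is added in the assetProcessingConfiguration element.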
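
The following sketch assembles the required parts of the request structure above as a Python dictionary. The function name `build_async_request` is hypothetical, and the default stage value `LIVE` is an assumption. Optional elements such as encryption, notifications, tags, and the video segment configuration are left out on purpose.

```python
from typing import Any, Dict, List, Optional


def build_async_request(
    project_arn: str,
    profile_arn: str,
    input_s3_uri: str,
    output_s3_uri: str,
    blueprint_arns: Optional[List[str]] = None,
    stage: str = "LIVE",  # assumption: the project stage to run against
) -> Dict[str, Any]:
    """Build a request body for InvokeDataAutomationAsync (hypothetical helper)."""
    for uri in (input_s3_uri, output_s3_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")

    request: Dict[str, Any] = {
        "dataAutomationConfiguration": {
            "dataAutomationProjectArn": project_arn,
            "stage": stage,
        },
        "dataAutomationProfileArn": profile_arn,
        "inputConfiguration": {"s3Uri": input_s3_uri},
        "outputConfiguration": {"s3Uri": output_s3_uri},
    }
    # Blueprints are only needed when custom output is requested.
    if blueprint_arns:
        request["blueprints"] = [{"blueprintArn": arn} for arn in blueprint_arns]
    return request
```

With boto3, a dictionary like this would typically be passed as keyword arguments, for example `client.invoke_data_automation_async(**request)`, assuming `client` is a `bedrock-data-automation-runtime` client.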

Sync

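Invoke Data Automation

Alternatively, you can use the InvokeDataAutomation operation. The InvokeDataAutomation operation only supports processing images.

This API call starts synchronous processing of content provided through an S3 reference or in the payload. The API accepts the project ARN and the file to be processed, and returns structured insights in the response. Errors are raised if the project doesn't exist, if the caller doesn't have the necessary permissions, or if the input file is in an unsupported format. An error is also raised if the analyzed image is semantically classified as a document, because InvokeDataAutomation only supports images. To prevent this error, you can use modality routing in your project to force all image file types to be routed as images (see Disabling modalities and routing file types).

The following is the structure of the JSON request for images and documents. The sync API request supports both image bytes and an S3 bucket. To use image bytes, replace "s3Uri": "string" with "bytes": "base64-encoded string" in the "inputConfiguration" section. outputConfiguration is optional and defaults to inline output. If an S3 URI is provided as outputConfiguration, the encrypted output is placed in the specified S3 bucket.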


{
   "blueprints": [ 
      { 
         "blueprintArn": "string",  // use for image
         "stage": "string",
         "version": "string"
      }
   ],
   "dataAutomationConfiguration": { 
      "dataAutomationProjectArn": "string",
      "stage": "string"
   },
   "dataAutomationProfileArn": "string",
   "inputConfiguration": { 
      "s3Uri": "string"
   },
   "outputConfiguration": { 
      "s3Uri": "string"
   }
}
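
The choice between an S3 reference and inline bytes can be sketched as a small helper that returns the "inputConfiguration" section. The name `sync_input_configuration` is hypothetical. The sketch follows the JSON shape described in the text, where the bytes are a base64-encoded string. An SDK may instead expect raw bytes and encode them itself, so treat the encoding step as an assumption.

```python
import base64
from typing import Dict, Optional


def sync_input_configuration(
    *,
    s3_uri: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
) -> Dict[str, str]:
    """Return the inputConfiguration section for a synchronous request.

    Exactly one of `s3_uri` or `image_bytes` must be given (hypothetical helper).
    """
    if (s3_uri is None) == (image_bytes is None):
        raise ValueError("provide exactly one of s3_uri or image_bytes")
    if s3_uri is not None:
        return {"s3Uri": s3_uri}
    # Inline content: base64-encode the raw image bytes for the JSON payload.
    return {"bytes": base64.b64encode(image_bytes).decode("ascii")}
```

Rejecting calls that supply both or neither source avoids sending an ambiguous request.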

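The output includes unique structures depending on the file, the operations, and the custom output configuration specified in the call to InvokeDataAutomation. Note that this response includes both the standard and the custom output responses.

The following is the structure of a JSON response with both standard and custom output configured: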


{
  "semanticModality": "IMAGE",
  "outputSegments": [
    {
      "customOutputStatus": "MATCH",
      "standardOutput": {
        "image": {
          "summary": "This image shows a white Nike running shoe with a black Nike swoosh logo on the side. The shoe has a modern design with a thick, cushioned sole and a sleek upper part. The word \"ROUKEA\" is visible on the sole of the shoe, repeated twice. The shoe appears to be designed for comfort and performance, suitable for running or athletic activities. The background is plain and dark, highlighting the shoe.",
          "iab_categories": [
            {
              "category": "Style and Fashion",
              "confidence": 0.9890000000000001,
              "taxonomy_level": 1,
              "parent_name": "",
              "id": "0ebe86c8-e9af-43f6-a7bb-182a61d2e1fd",
              "type": "IAB"
            },
            {
              "category": "Men's Fashion",
              "confidence": 0.9890000000000001,
              "taxonomy_level": 2,
              "parent_name": "Style and Fashion",
              "id": "13bd456a-3e1b-4681-b0dd-f42a8d5e5ad5",
              "type": "IAB"
            },
            {
              "category": "Style and Fashion",
              "confidence": 0.853,
              "taxonomy_level": 1,
              "parent_name": "",
              "id": "177b29a1-0e40-45c1-8540-5f49a3d7ded3",
              "type": "IAB"
            },
            {
              "category": "Women's Fashion",
              "confidence": 0.853,
              "taxonomy_level": 2,
              "parent_name": "Style and Fashion",
              "id": "f0197ede-3ba6-498b-8f7b-43fecc5735ef",
              "type": "IAB"
            }
          ],
          "content_moderation": [],
          "logos": [
            {
              "id": "2e109eb6-39f5-4782-826f-911b62d277fb",
              "type": "LOGOS",
              "confidence": 0.9170872209665809,
              "name": "nike",
              "locations": [
                {
                  "bounding_box": {
                    "left": 0.3977411523719743,
                    "top": 0.4922481227565456,
                    "width": 0.2574246356942061,
                    "height": 0.15461772197001689
                  }
                }
              ]
            }
          ],
          "text_words": [
            {
              "id": "f70301df-5725-405e-b50c-612e352467bf",
              "type": "TEXT_WORD",
              "confidence": 0.10091366487951722,
              "text": "ROUKEA",
              "locations": [
                {
                  "bounding_box": {
                    "left": 0.6486002310163024,
                    "top": 0.6783271480251003,
                    "width": 0.13219473954570082,
                    "height": 0.05802226710963898
                  },
                  "polygon": [
                    {
                      "x": 0.6486002310163024,
                      "y": 0.7025876947351404
                    },
                    {
                      "x": 0.7760931467045249,
                      "y": 0.6783271480251003
                    },
                    {
                      "x": 0.7807949705620032,
                      "y": 0.7120888684246991
                    },
                    {
                      "x": 0.6533020989743271,
                      "y": 0.7363494151347393
                    }
                  ]
                }
              ],
              "line_id": "9147fec0-d869-4d58-933e-93eb7164c404"
            }
          ],
          "text_lines": [
            {
              "id": "9147fec0-d869-4d58-933e-93eb7164c404",
              "type": "TEXT_LINE",
              "confidence": 0.10091366487951722,
              "text": "ROUKEA",
              "locations": [
                {
                  "bounding_box": {
                    "left": 0.6486002310163024,
                    "top": 0.6783271480251003,
                    "width": 0.13219473954570082,
                    "height": 0.05802226710963898
                  },
                  "polygon": [
                    {
                      "x": 0.6486002310163024,
                      "y": 0.7025876947351404
                    },
                    {
                      "x": 0.7760931467045249,
                      "y": 0.6783271480251003
                    },
                    {
                      "x": 0.7807949705620032,
                      "y": 0.7120888684246991
                    },
                    {
                      "x": 0.6533020989743271,
                      "y": 0.7363494151347393
                    }
                  ]
                }
              ]
            }
          ]
        },
        "statistics": {
          "iab_category_count": 4,
          "content_moderation_count": 0,
          "logo_count": 1,
          "line_count": 1,
          "word_count": 1
        },
        "metadata": {
          "semantic_modality": "IMAGE",
          "image_width_pixels": 173,
          "image_height_pixels": 148,
          "image_encoding": "jpeg",
          "s3_bucket": "test-bucket",
          "s3_key": "uploads/test-image.jpeg"
        }
      },
      "customOutput": {
        "matched_blueprint": {
          "arn": "arn:aws:bedrock:us-east-1:123456789012:blueprint/test",
          "version": "1",
          "name": "test-blueprint",
          "confidence": 1.0
        },
        "inference_result": {
          "product_details": {
            "product_category": "footwear"
          },
          "image_sentiment": "Positive",
          "image_background": "Solid color",
          "image_style": "Product image",
          "image_humor": false
        }
      }
    }
  ]
}
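
As a sketch of how a caller might consume a response like the one above, the following function pulls out the summary, detected logos, reasonably confident text lines, and the custom output fields for each segment. The function name `summarize_segments` and the 0.5 confidence threshold are illustrative assumptions. The code assumes the response has already been parsed into a Python dictionary with the shape shown above.

```python
from typing import Any, Dict, List


def summarize_segments(
    response: Dict[str, Any],
    min_text_confidence: float = 0.5,  # illustrative threshold, not a service default
) -> List[Dict[str, Any]]:
    """Collect the main standard and custom output fields per segment."""
    rows = []
    for segment in response.get("outputSegments", []):
        image = (segment.get("standardOutput") or {}).get("image", {})
        custom = segment.get("customOutput") or {}
        rows.append(
            {
                "summary": image.get("summary"),
                "logos": [logo["name"] for logo in image.get("logos", [])],
                # Drop low-confidence text, such as the 0.10 line in the sample.
                "text": [
                    line["text"]
                    for line in image.get("text_lines", [])
                    if line.get("confidence", 0.0) >= min_text_confidence
                ],
                "blueprint": custom.get("matched_blueprint", {}).get("name"),
                "fields": custom.get("inference_result", {}),
            }
        )
    return rows
```

Applied to the sample response, the text line "ROUKEA" would be filtered out because its confidence is about 0.10, while the "nike" logo and the custom fields would be kept.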

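Get Data Automation Status

To check the status of your processing job and retrieve the results, use GetDataAutomationStatus.

The GetDataAutomationStatus API lets you monitor the progress of your job and access the results once processing is complete. The API accepts the invocation ARN returned by InvokeDataAutomationAsync. It checks the current status of the job and returns the relevant information. After the job completes, it provides the location of the results in S3.

If the job is still in progress, it returns the current state (for example, "InProgress"). If the job is complete, it returns "Success" along with the S3 location of the results. If there was an error, it returns "ServiceError" or "ClientError" with the error details.

The following is the format of the request JSON: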


{
   "invocationArn": "string" // Arn
}
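
Because the job runs asynchronously, callers usually poll this operation until a final state is reached. The following is a minimal polling sketch. It assumes `client` behaves like a boto3 `bedrock-data-automation-runtime` client whose `get_data_automation_status` call returns a `status` field. The function name `wait_for_job`, the polling interval, and the timeout are illustrative choices.

```python
import time
from typing import Any, Callable, Dict

# Final states named in the text; any other status is treated as still running.
TERMINAL_STATES = {"Success", "ServiceError", "ClientError"}


def wait_for_job(
    client: Any,
    invocation_arn: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll GetDataAutomationStatus until the job reaches a final state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_data_automation_status(invocationArn=invocation_arn)
        if response["status"] in TERMINAL_STATES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {invocation_arn} did not finish in time")
        sleep(poll_seconds)
```

The caller still has to inspect the returned status, since "ServiceError" and "ClientError" are final states too. The `sleep` parameter exists so the loop can be tested without waiting.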

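Async Output Response

The results of the file processing are stored in the S3 location specified in the request's outputConfiguration. The output includes unique structures depending on the file modality and the operation types specified in the call to InvokeDataAutomationAsync.

For information on the standard outputs for a given modality, see Standard output in Bedrock Data Automation.

As an example, for images the output can include the following information:

Image Summarization: A descriptive summary or caption of the image.
IAB Classification: Categorization based on the IAB taxonomy.
Image Text Detection: Extracted text with bounding box information.
Content Moderation: Detection of inappropriate, unwanted, or offensive content in an image.

The following is an example snippet of the output for image processing: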


{
    "metadata": {
        "id": "image_123",
        "semantic_modality": "IMAGE",
        "s3_bucket": "my-s3-bucket",
        "s3_prefix": "images/",
        "image_width_pixels": 1920,
        "image_height_pixels": 1080
    },
    "image": {
        "summary": "A lively party scene with colorful decorations and supplies",
        "iab_categories": [
            {
                "category": "Party Supplies",
                "confidence": 0.9,
                "parent_name": "Events & Attractions"
            }
        ],
        "content_moderation": [
            {
                "category": "Drugs & Tobacco Paraphernalia & Use",
                "confidence": 0.7
            }
        ],
        "text_words": [
            {
                "id": "word_1",
                "text": "lively",
                "confidence": 0.9,
                "line_id": "line_1",
                "locations": [
                    {
                        "bounding_box": {
                            "left": 100,
                            "top": 200,
                            "width": 50,
                            "height": 20
                        },
                        "polygon": [
                            {
                                "x": 100,
                                "y": 200
                            },
                            {
                                "x": 150,
                                "y": 200
                            },
                            {
                                "x": 150,
                                "y": 220
                            },
                            {
                                "x": 100,
                                "y": 220
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

This structured output allows for easy integration with downstream applications and further analysis.
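
As one example of such downstream use, the sketch below reads an output document shaped like the snippet above and returns the moderation categories that meet a confidence threshold, along with the detected words. The function names and the 0.5 threshold are illustrative assumptions, not part of the service.

```python
from typing import Any, Dict, List


def moderation_flags(output: Dict[str, Any], threshold: float = 0.5) -> List[str]:
    """Return content moderation categories at or above the threshold."""
    items = output.get("image", {}).get("content_moderation", [])
    return [
        item["category"]
        for item in items
        if item.get("confidence", 0.0) >= threshold
    ]


def detected_words(output: Dict[str, Any]) -> List[str]:
    """Return the detected words in the order they appear in the output."""
    return [word["text"] for word in output.get("image", {}).get("text_words", [])]
```

For the snippet above, `moderation_flags` would return the single "Drugs & Tobacco Paraphernalia & Use" category at the default threshold, because its confidence is 0.7.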

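Blueprint Optimization APIs

InvokeBlueprintOptimizationAsync

You can improve blueprint accuracy by providing example content assets together with the correct expected results. Blueprint instruction optimization uses your examples to refine the natural language instructions in your blueprint fields, which improves the accuracy of your inference results.

For a blueprint, you can call the InvokeBlueprintOptimizationAsync API, which starts an asynchronous optimization job that improves your blueprint field instructions based on ground truth data.

Request Body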


{
    "blueprint": {
        "blueprintArn": "arn:aws:bedrock:us-east-1:123456789012:blueprint/my-document-processor",
        "stage": "DEVELOPMENT"
    },
    "samples": [
        {
            "assetS3Object": {
                "s3Uri": "s3://my-optimization-bucket/samples/document1.pdf"
            },
            "groundTruthS3Object": {
                "s3Uri": "s3://my-optimization-bucket/ground-truth/document1-expected.json"
            }
        }
    ],
    "outputConfiguration": {
        "s3Object": {
            "s3Uri": "s3://my-optimization-bucket/results/optimization-output"
        }
    },
    "dataAutomationProfileArn": "arn:aws:bedrock:us-east-1:123456789012:data-automation-profile/my-profile"
}

Response


{
    "invocationArn": "arn:aws:bedrock:us-east-1:123456789012:blueprint-optimization-invocation/opt-12345abcdef"
}

Important

Save the invocationArn so that you can monitor the status of the optimization job.
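
The request body above pairs each sample asset with its ground truth file. A minimal sketch that builds this body from a list of (asset URI, ground truth URI) pairs follows. The function name `build_optimization_request` is hypothetical, and the sketch only constructs the dictionary. It does not call the service.

```python
from typing import Any, Dict, List, Tuple


def build_optimization_request(
    blueprint_arn: str,
    profile_arn: str,
    samples: List[Tuple[str, str]],
    output_s3_uri: str,
    stage: str = "DEVELOPMENT",
) -> Dict[str, Any]:
    """Build an InvokeBlueprintOptimizationAsync request body.

    `samples` holds (asset S3 URI, ground truth S3 URI) pairs.
    """
    if not samples:
        raise ValueError("at least one sample with ground truth is required")
    return {
        "blueprint": {"blueprintArn": blueprint_arn, "stage": stage},
        "samples": [
            {
                "assetS3Object": {"s3Uri": asset_uri},
                "groundTruthS3Object": {"s3Uri": truth_uri},
            }
            for asset_uri, truth_uri in samples
        ],
        "outputConfiguration": {"s3Object": {"s3Uri": output_s3_uri}},
        "dataAutomationProfileArn": profile_arn,
    }
```

Keeping each asset and its expected result in one tuple makes it harder to mismatch them when the sample list grows.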

GetBlueprintOptimizationStatus

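Retrieves the current status and results of a blueprint optimization job that was started by calling the InvokeBlueprintOptimizationAsync API. GetBlueprintOptimizationStatus accepts the invocation ARN returned by InvokeBlueprintOptimizationAsync.

Response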


{
    "status": "Success",
    "outputConfiguration": {
        "s3Object": {
            "s3Uri": "s3://my-optimization-bucket/results/optimization-output"
        }
    }
}

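Status values:

Created - The job has been created
InProgress - Optimization is in progress
Success - Optimization completed successfully
ServiceError - An internal service error occurred
ClientError - The request parameters are invalid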

CopyBlueprintStage

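Copies a blueprint from a source stage to a target stage (for example, from DEVELOPMENT to LIVE). This synchronizes all configuration between the stages, including the OptimizationSamples field.

Request Body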


{
    "blueprintArn": "arn:aws:bedrock:us-east-1:123456789012:blueprint/my-document-processor",
    "sourceStage": "DEVELOPMENT",
    "targetStage": "LIVE"
}

Stage values:

DEVELOPMENT - Development/testing stage
LIVE - Production stage

Response

{}

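Warning

This operation overwrites the target stage configuration and cannot be easily undone. Make sure to test thoroughly before copying to the LIVE stage.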
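
Because the copy overwrites the target stage, a caller might wrap request construction in a guard that requires explicit confirmation before targeting LIVE. The sketch below is one such guard. The function name `build_copy_stage_request` and the `confirm_live` flag are hypothetical and not part of the API.

```python
from typing import Dict

VALID_STAGES = {"DEVELOPMENT", "LIVE"}


def build_copy_stage_request(
    blueprint_arn: str,
    source_stage: str,
    target_stage: str,
    confirm_live: bool = False,
) -> Dict[str, str]:
    """Build a CopyBlueprintStage request body with basic safety checks."""
    for stage in (source_stage, target_stage):
        if stage not in VALID_STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
    if source_stage == target_stage:
        raise ValueError("source and target stage must differ")
    # Overwriting LIVE cannot be easily undone, so require an explicit opt-in.
    if target_stage == "LIVE" and not confirm_live:
        raise PermissionError("set confirm_live=True to overwrite the LIVE stage")
    return {
        "blueprintArn": blueprint_arn,
        "sourceStage": source_stage,
        "targetStage": target_stage,
    }
```

The guard runs entirely on the client side. It adds no protection on the service itself, so permissions and review processes still matter.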
