Use CTAS and INSERT INTO to work around the 100 partition limit - Amazon Athena

Athena has a limit of 100 partitions per CREATE TABLE AS SELECT (CTAS) query. Similarly, a single INSERT INTO statement can add at most 100 partitions to a destination table.

If you exceed this limit, you may receive the error message HIVE_TOO_MANY_OPEN_PARTITIONS: Exceeded limit of 100 open writers for partitions/buckets. To work around this limit, you can use a CTAS statement followed by a series of INSERT INTO statements, each of which creates or inserts up to 100 partitions.
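Planning the statements amounts to splitting the sorted partition values into consecutive batches of at most 100. The following Python sketch is illustrative only: `plan_batches` is a hypothetical helper, not part of Athena or any AWS SDK, and it assumes you already have the list of partition values (for example, distinct dates).

```python
from typing import List, Sequence


def plan_batches(partition_values: Sequence[str], limit: int = 100) -> List[List[str]]:
    """Split sorted partition values into consecutive batches of at most `limit`.

    The first batch would be written by the CTAS query and each later batch
    by one INSERT INTO statement.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    values = sorted(partition_values)
    return [list(values[i:i + limit]) for i in range(0, len(values), limit)]


if __name__ == "__main__":
    # Hypothetical input: 2,525 placeholder partition values.
    demo = [f"p{i:04d}" for i in range(2525)]
    batches = plan_batches(demo)
    print(len(batches))      # 26: one CTAS plus 25 INSERT INTO statements
    print(len(batches[-1]))  # 25 values in the final batch
```

Sorting matters: with consecutive batches, each statement can be expressed as a single contiguous range in its WHERE clause. For ISO-8601 date strings, string order equals chronological order.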

The examples in this topic use a database named tpch100 whose data resides in the Amazon S3 bucket location s3://amzn-s3-demo-bucket/.

To use CTAS and INSERT INTO to create a table of more than 100 partitions

Use a CREATE EXTERNAL TABLE statement to create a table partitioned on the field that you want.

The following example statement partitions the data by the column l_shipdate. The table has 2525 partitions.


CREATE EXTERNAL TABLE `tpch100.lineitem_parq_partitioned`(
  `l_orderkey` int, 
  `l_partkey` int, 
  `l_suppkey` int, 
  `l_linenumber` int, 
  `l_quantity` double, 
  `l_extendedprice` double, 
  `l_discount` double, 
  `l_tax` double, 
  `l_returnflag` string, 
  `l_linestatus` string, 
  `l_commitdate` string, 
  `l_receiptdate` string, 
  `l_shipinstruct` string, 
  `l_comment` string)
PARTITIONED BY ( 
  `l_shipdate` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://amzn-s3-demo-bucket/lineitem/';

Run the SHOW PARTITIONS <table_name> command as follows to list the partitions.


SHOW PARTITIONS tpch100.lineitem_parq_partitioned;

The following are partial sample results.


/*
l_shipdate=1992-01-02
l_shipdate=1992-01-03
l_shipdate=1992-01-04
l_shipdate=1992-01-05
l_shipdate=1992-01-06

...

l_shipdate=1998-11-24
l_shipdate=1998-11-25
l_shipdate=1998-11-26
l_shipdate=1998-11-27
l_shipdate=1998-11-28
l_shipdate=1998-11-29
l_shipdate=1998-11-30
l_shipdate=1998-12-01
*/
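If you script this workflow, you may want to count the partitions that SHOW PARTITIONS returns. The Python sketch below uses a hypothetical helper, `parse_partitions`. It assumes each result row looks like the sample above (`column=value`) and that multiple partition columns appear Hive-style as `a=1/b=2`; confirm both against your own output.

```python
from typing import Dict, Iterable, List


def parse_partitions(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Convert SHOW PARTITIONS rows such as 'l_shipdate=1992-01-02'
    into dictionaries such as {'l_shipdate': '1992-01-02'}.

    Rows without '=' (blank lines, '...', comment markers) are skipped.
    Multi-column partitions written as 'a=1/b=2' (an assumption) are split on '/'.
    """
    partitions: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if "=" not in line:
            continue
        spec: Dict[str, str] = {}
        for part in line.split("/"):
            key, _, value = part.partition("=")
            spec[key] = value
        partitions.append(spec)
    return partitions


if __name__ == "__main__":
    sample = """
/*
l_shipdate=1992-01-02
l_shipdate=1992-01-03
...
l_shipdate=1998-12-01
*/
""".splitlines()
    parts = parse_partitions(sample)
    print(len(parts))  # 3
    print(parts[0])    # {'l_shipdate': '1992-01-02'}
```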

Run a CTAS query to create a partitioned table.

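The following example creates a table named my_lineitem_parq_partitioned and uses the WHERE clause to restrict the date to earlier than 1992-02-01. Because the sample dataset starts with January 1992, only partitions for January 1992 are created.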


CREATE table my_lineitem_parq_partitioned
WITH (partitioned_by = ARRAY['l_shipdate']) AS
SELECT l_orderkey,
         l_partkey,
         l_suppkey,
         l_linenumber,
         l_quantity,
         l_extendedprice,
         l_discount,
         l_tax,
         l_returnflag,
         l_linestatus,
         l_commitdate,
         l_receiptdate,
         l_shipinstruct,
         l_comment,
         l_shipdate
FROM tpch100.lineitem_parq_partitioned
WHERE cast(l_shipdate as timestamp) < DATE ('1992-02-01');
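If you generate these statements from a script rather than typing them, a small string builder keeps the column order consistent. In this Python sketch, `build_ctas_sql` and `LINEITEM_COLUMNS` are hypothetical names. It places the partition column last in the SELECT list because Athena requires partition columns to be the last columns in a CTAS SELECT. It performs no quoting or escaping, so pass only trusted identifiers and ISO dates.

```python
from typing import Sequence

# Non-partition columns of the example table, in table order.
LINEITEM_COLUMNS = [
    "l_orderkey", "l_partkey", "l_suppkey", "l_linenumber", "l_quantity",
    "l_extendedprice", "l_discount", "l_tax", "l_returnflag", "l_linestatus",
    "l_commitdate", "l_receiptdate", "l_shipinstruct", "l_comment",
]


def build_ctas_sql(target: str, source: str, columns: Sequence[str],
                   partition_col: str, upper_bound: str) -> str:
    """Build a CTAS statement that copies rows whose partition date is
    earlier than `upper_bound` (an ISO date such as '1992-02-01').

    No escaping is performed: pass only trusted identifiers and dates.
    """
    # Athena requires the partition column to be last in the SELECT list.
    select_list = ",\n       ".join([*columns, partition_col])
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (partitioned_by = ARRAY['{partition_col}']) AS\n"
        f"SELECT {select_list}\n"
        f"FROM {source}\n"
        f"WHERE cast({partition_col} as timestamp) < DATE ('{upper_bound}');"
    )


if __name__ == "__main__":
    print(build_ctas_sql("my_lineitem_parq_partitioned",
                         "tpch100.lineitem_parq_partitioned",
                         LINEITEM_COLUMNS, "l_shipdate", "1992-02-01"))
```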

Run the SHOW PARTITIONS command to verify that the table contains the partitions that you want.


SHOW PARTITIONS my_lineitem_parq_partitioned;

The partitions in the example are from January 1992.


/*
l_shipdate=1992-01-02
l_shipdate=1992-01-03
l_shipdate=1992-01-04
l_shipdate=1992-01-05
l_shipdate=1992-01-06
l_shipdate=1992-01-07
l_shipdate=1992-01-08
l_shipdate=1992-01-09
l_shipdate=1992-01-10
l_shipdate=1992-01-11
l_shipdate=1992-01-12
l_shipdate=1992-01-13
l_shipdate=1992-01-14
l_shipdate=1992-01-15
l_shipdate=1992-01-16
l_shipdate=1992-01-17
l_shipdate=1992-01-18
l_shipdate=1992-01-19
l_shipdate=1992-01-20
l_shipdate=1992-01-21
l_shipdate=1992-01-22
l_shipdate=1992-01-23
l_shipdate=1992-01-24
l_shipdate=1992-01-25
l_shipdate=1992-01-26
l_shipdate=1992-01-27
l_shipdate=1992-01-28
l_shipdate=1992-01-29
l_shipdate=1992-01-30
l_shipdate=1992-01-31
*/
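Before running a statement, you can estimate how many partitions it will write. Assuming one partition per calendar day with no gaps (as the January listing above shows), the count for a half-open date range is the number of days between its bounds. This Python sketch uses only the standard library; `expected_daily_partitions` is a hypothetical name.

```python
from datetime import date

PARTITION_LIMIT = 100  # per-query partition limit described in this topic


def expected_daily_partitions(start: str, end: str) -> int:
    """Days in the half-open range [start, end), which equals the number of
    daily partitions written if every day in the range has data."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


if __name__ == "__main__":
    january = expected_daily_partitions("1992-01-02", "1992-02-01")
    print(january)                     # 30, matching the listing above
    print(january <= PARTITION_LIMIT)  # True
```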

Use an INSERT INTO statement to add partitions to the table.

The following example adds partitions for the dates in February 1992.


INSERT INTO my_lineitem_parq_partitioned
SELECT l_orderkey,
         l_partkey,
         l_suppkey,
         l_linenumber,
         l_quantity,
         l_extendedprice,
         l_discount,
         l_tax,
         l_returnflag,
         l_linestatus,
         l_commitdate,
         l_receiptdate,
         l_shipinstruct,
         l_comment,
         l_shipdate
FROM tpch100.lineitem_parq_partitioned
WHERE cast(l_shipdate as timestamp) >= DATE ('1992-02-01')
AND cast(l_shipdate as timestamp) < DATE ('1992-03-01');
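The same approach can generate each INSERT INTO statement. In this hypothetical Python sketch, `build_insert_sql` writes a half-open range (`>=` start, `<` end), mirroring the February example above, and keeps the partition column last so that values line up by position with the table that the CTAS query created. As before, nothing is escaped.

```python
from typing import Sequence


def build_insert_sql(target: str, source: str, columns: Sequence[str],
                     partition_col: str, start: str, end: str) -> str:
    """Build an INSERT INTO statement for the half-open range [start, end).

    `columns` must match the target table's non-partition columns in order,
    because INSERT INTO without a column list maps values by position.
    No escaping is performed: pass only trusted identifiers and dates.
    """
    if not start < end:  # ISO dates compare chronologically as strings
        raise ValueError("start must be earlier than end")
    select_list = ",\n       ".join([*columns, partition_col])
    return (
        f"INSERT INTO {target}\n"
        f"SELECT {select_list}\n"
        f"FROM {source}\n"
        f"WHERE cast({partition_col} as timestamp) >= DATE ('{start}')\n"
        f"AND cast({partition_col} as timestamp) < DATE ('{end}');"
    )


if __name__ == "__main__":
    # Column list abbreviated for the demo; pass all 14 columns in practice.
    print(build_insert_sql("my_lineitem_parq_partitioned",
                           "tpch100.lineitem_parq_partitioned",
                           ["l_orderkey", "l_comment"], "l_shipdate",
                           "1992-02-01", "1992-03-01"))
```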

Run SHOW PARTITIONS again.


SHOW PARTITIONS my_lineitem_parq_partitioned;

The sample table now has partitions from both January and February 1992.


/*
l_shipdate=1992-01-02
l_shipdate=1992-01-03
l_shipdate=1992-01-04
l_shipdate=1992-01-05
l_shipdate=1992-01-06

...

l_shipdate=1992-02-20
l_shipdate=1992-02-21
l_shipdate=1992-02-22
l_shipdate=1992-02-23
l_shipdate=1992-02-24
l_shipdate=1992-02-25
l_shipdate=1992-02-26
l_shipdate=1992-02-27
l_shipdate=1992-02-28
l_shipdate=1992-02-29
*/

Continue with INSERT INTO statements that each read and add no more than 100 partitions, until you reach the number of partitions that you require.
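Continuing month by month, as the examples do, keeps each statement well under the limit, because a month has at most 31 daily partitions. The following Python sketch (the `monthly_windows` helper is hypothetical) yields the half-open bounds for every remaining month of the sample dataset; each pair would supply the `>=` and `<` bounds of one INSERT INTO statement like the February example.

```python
from datetime import date
from typing import Iterator, Tuple


def monthly_windows(first: date, last: date) -> Iterator[Tuple[str, str]]:
    """Yield half-open (start, end) ISO-date pairs, one per calendar month,
    from the month containing `first` through the month containing `last`."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        yield date(year, month, 1).isoformat(), date(next_year, next_month, 1).isoformat()
        year, month = next_year, next_month


if __name__ == "__main__":
    # January and February 1992 were loaded above; the sample data ends 1998-12-01.
    windows = list(monthly_windows(date(1992, 3, 1), date(1998, 12, 1)))
    print(len(windows))  # 82
    print(windows[0])    # ('1992-03-01', '1992-04-01')
    print(windows[-1])   # ('1998-12-01', '1999-01-01')
```

Wider windows are a possible alternative: any three consecutive months span at most 92 days, which still fits under 100 and cuts the number of statements to roughly a third.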

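Important
When setting the WHERE condition, be sure that the queries don't overlap. Otherwise, some partitions might contain duplicate data. The examples in this topic avoid overlap by using half-open ranges: each statement's lower bound (>=) equals the previous statement's upper bound (<).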
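One quick way to catch overlapping WHERE ranges before running anything is to sort the ranges and compare neighbors. This Python sketch assumes every range is half-open `[start, end)` with ISO-8601 date strings (which sort chronologically as plain strings); `find_overlaps` is a hypothetical helper.

```python
from typing import List, Tuple

Window = Tuple[str, str]  # half-open [start, end) as ISO dates


def find_overlaps(windows: List[Window]) -> List[Tuple[Window, Window]]:
    """Return neighboring pairs that overlap after sorting by start.

    If any two windows overlap, at least one neighboring pair does, so an
    empty result means the ranges are disjoint. Touching ranges, where one
    end equals the next start, do not overlap.
    """
    ordered = sorted(windows)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] < a[1]]


if __name__ == "__main__":
    ok = [("1992-02-01", "1992-03-01"), ("1992-01-01", "1992-02-01")]
    bad = [("1992-01-01", "1992-02-15"), ("1992-02-01", "1992-03-01")]
    print(find_overlaps(ok))   # []
    print(find_overlaps(bad))  # [(('1992-01-01', '1992-02-15'), ('1992-02-01', '1992-03-01'))]
```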
