使用星火EXPLAIN陳述式來疑難排解星火 SQL - Amazon Athena

本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。

使用星火EXPLAIN陳述式來疑難排解星火 SQL

您可以使用 Spark EXPLAIN 陳述式搭配 Spark SQL 來疑難排解您的 Spark 程式碼。下列程式碼和輸出範例說明了這種用法。

範例 — 星火SELECT聲明
spark.sql("select * from select_taxi_table").explain(True)

輸出

Calculation started (calculation_id=20c1ebd0-1ccf-ef14-db35-7c1844876a7e) in (session=24c1ebcb-57a8-861e-1023-736f5ae55386). Checking calculation status... Calculation completed. == Parsed Logical Plan == 'Project [*] +- 'UnresolvedRelation [select_taxi_table], [], false == Analyzed Logical Plan == VendorID: bigint, passenger_count: bigint, count: bigint Project [VendorID#202L, passenger_count#203L, count#204L] +- SubqueryAlias spark_catalog.spark_demo_database.select_taxi_table +- Relation spark_demo_database.select_taxi_table[VendorID#202L, passenger_count#203L,count#204L] csv == Optimized Logical Plan == Relation spark_demo_database.select_taxi_table[VendorID#202L, passenger_count#203L,count#204L] csv == Physical Plan == FileScan csv spark_demo_database.select_taxi_table[VendorID#202L, passenger_count#203L,count#204L] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex(1 paths) [s3://amzn-s3-demo-bucket/select_taxi], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<VendorID:bigint,passenger_count:bigint,count:bigint>
範例 – Spark 資料框架

以下範例顯示如何搭配使用 EXPLAIN 與 Spark 資料框架。

taxi1_df=taxi_df.groupBy("VendorID", "passenger_count").count() taxi1_df.explain("extended")

輸出

Calculation started (calculation_id=d2c1ebd1-f9f0-db25-8477-3effc001b309) in (session=24c1ebcb-57a8-861e-1023-736f5ae55386). Checking calculation status... Calculation completed. == Parsed Logical Plan == 'Aggregate ['VendorID, 'passenger_count], ['VendorID, 'passenger_count, count(1) AS count#321L] +- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51, passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55, PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59, extra#60,mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64, total_amount#65,congestion_surcharge#66,airport_fee#67] parquet == Analyzed Logical Plan == VendorID: bigint, passenger_count: bigint, count: bigint Aggregate [VendorID#49L, passenger_count#52L], [VendorID#49L, passenger_count#52L, count(1) AS count#321L] +- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51, passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55, PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60, mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64, total_amount#65,congestion_surcharge#66,airport_fee#67] parquet == Optimized Logical Plan == Aggregate [VendorID#49L, passenger_count#52L], [VendorID#49L, passenger_count#52L, count(1) AS count#321L] +- Project [VendorID#49L, passenger_count#52L] +- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51, passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55, PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60, mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64, total_amount#65,congestion_surcharge#66,airport_fee#67] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- HashAggregate(keys=[VendorID#49L, passenger_count#52L], functions=[count(1)], output=[VendorID#49L, passenger_count#52L, count#321L]) +- Exchange hashpartitioning(VendorID#49L, passenger_count#52L, 1000), ENSURE_REQUIREMENTS, [id=#531] +- HashAggregate(keys=[VendorID#49L, passenger_count#52L], functions=[partial_count(1)], output=[VendorID#49L, passenger_count#52L, count#326L]) +- FileScan parquet [VendorID#49L,passenger_count#52L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[s3://amzn-s3-demo-bucket/ notebooks/yellow_tripdata_2016-01.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<VendorID:bigint,passenger_count:bigint>