Use the Spark EXPLAIN statement to troubleshoot Spark SQL
You can use the Spark EXPLAIN statement with Spark SQL to troubleshoot your Spark code. The following code and output examples demonstrate this usage.
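The examples that follow assume that a DataFrame named taxi_df and a table named select_taxi_table already exist. As a hypothetical setup sketch only (the S3 paths, database name, and write options are inferred from the plan output in this topic, not taken from a verified procedure), they might be created like this:

# DataFrame used in the second example: parquet trip data.
taxi_df = spark.read.parquet(
    "s3://amzn-s3-demo-bucket/notebooks/yellow_tripdata_2016-01.parquet"
)

# One possible way the CSV-backed table queried in the first example could
# have been written; the database name and path are assumptions based on
# the plan output below.
(taxi_df.groupBy("VendorID", "passenger_count")
    .count()
    .write.mode("overwrite")
    .format("csv")
    .option("path", "s3://amzn-s3-demo-bucket/select_taxi")
    .saveAsTable("spark_demo_database.select_taxi_table"))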
Example – Spark SELECT statement
spark.sql("select * from select_taxi_table").explain(True)
Output
Calculation started (calculation_id=20c1ebd0-1ccf-ef14-db35-7c1844876a7e) in
(session=24c1ebcb-57a8-861e-1023-736f5ae55386).
Checking calculation status...
Calculation completed.
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation [select_taxi_table], [], false
== Analyzed Logical Plan ==
VendorID: bigint, passenger_count: bigint, count: bigint
Project [VendorID#202L, passenger_count#203L, count#204L]
+- SubqueryAlias spark_catalog.spark_demo_database.select_taxi_table
+- Relation spark_demo_database.select_taxi_table[VendorID#202L,
passenger_count#203L,count#204L] csv
== Optimized Logical Plan ==
Relation spark_demo_database.select_taxi_table[VendorID#202L,
passenger_count#203L,count#204L] csv
== Physical Plan ==
FileScan csv spark_demo_database.select_taxi_table[VendorID#202L,
passenger_count#203L,count#204L]
Batched: false, DataFilters: [], Format: CSV,
Location: InMemoryFileIndex(1 paths)
[s3://amzn-s3-demo-bucket/select_taxi],
PartitionFilters: [], PushedFilters: [],
ReadSchema: struct<VendorID:bigint,passenger_count:bigint,count:bigint>
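The same plan stages can also be requested through SQL directly. This sketch uses Spark SQL's EXPLAIN EXTENDED statement, which returns the plan as a single-column result set; truncation is disabled so the plan prints in full:

spark.sql("EXPLAIN EXTENDED SELECT * FROM select_taxi_table").show(truncate=False)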
Example – Spark DataFrame
The following example shows how to use EXPLAIN with a Spark DataFrame.
taxi1_df = taxi_df.groupBy("VendorID", "passenger_count").count()
taxi1_df.explain("extended")
Output
Calculation started (calculation_id=d2c1ebd1-f9f0-db25-8477-3effc001b309) in
(session=24c1ebcb-57a8-861e-1023-736f5ae55386).
Checking calculation status...
Calculation completed.
== Parsed Logical Plan ==
'Aggregate ['VendorID, 'passenger_count],
['VendorID, 'passenger_count, count(1) AS count#321L]
+- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,
passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,
PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,
extra#60,mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,
total_amount#65,congestion_surcharge#66,airport_fee#67] parquet
== Analyzed Logical Plan ==
VendorID: bigint, passenger_count: bigint, count: bigint
Aggregate [VendorID#49L, passenger_count#52L],
[VendorID#49L, passenger_count#52L, count(1) AS count#321L]
+- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,
passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,
PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60,
mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,
total_amount#65,congestion_surcharge#66,airport_fee#67] parquet
== Optimized Logical Plan ==
Aggregate [VendorID#49L, passenger_count#52L],
[VendorID#49L, passenger_count#52L, count(1) AS count#321L]
+- Project [VendorID#49L, passenger_count#52L]
+- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,
passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,
PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60,
mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,
total_amount#65,congestion_surcharge#66,airport_fee#67] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[VendorID#49L, passenger_count#52L], functions=[count(1)],
output=[VendorID#49L, passenger_count#52L, count#321L])
+- Exchange hashpartitioning(VendorID#49L, passenger_count#52L, 1000),
ENSURE_REQUIREMENTS, [id=#531]
+- HashAggregate(keys=[VendorID#49L, passenger_count#52L],
functions=[partial_count(1)], output=[VendorID#49L,
passenger_count#52L, count#326L])
+- FileScan parquet [VendorID#49L,passenger_count#52L] Batched: true,
DataFilters: [], Format: Parquet,
Location: InMemoryFileIndex(1 paths)[s3://amzn-s3-demo-bucket/
notebooks/yellow_tripdata_2016-01.parquet], PartitionFilters: [],
PushedFilters: [],
ReadSchema: struct<VendorID:bigint,passenger_count:bigint>
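Besides "extended", explain() on a DataFrame accepts other modes in Spark 3.0 and later; for example, "formatted" separates the physical plan outline from per-node details, and "codegen" shows the generated code. A brief sketch:

# Compact physical plan outline followed by a details section for each node.
taxi1_df.explain("formatted")

# Generated Java code for the query's whole-stage codegen subtrees.
taxi1_df.explain("codegen")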