

# 자습서: 개발 엔드포인트를 통해 REPL 셸 사용하기
<a name="dev-endpoint-tutorial-repl"></a>

 AWS Glue를 통해 개발 엔드포인트를 만들고 REPL(Read-Evaluate-Print Loop) 셀을 호출하여 점차 늘리면서 PySpark 코드를 실행합니다. 그러면 ETL 스크립트를 디버깅하기 전에 상호적으로 디버깅할 수 있습니다.

 개발 엔드포인트에서 REPL을 사용하려면 엔드포인트에 대한 SSH에 권한을 부여해야 합니다.

1. 로컬 컴퓨터에서 SSH 명령어를 실행할 수 있는 터미널창을 열고 편집한 SSH 명령어에 붙여 넣습니다. 명령을 실행합니다.

   개발 엔드포인트에 대해 Python 3을 사용하는 AWS Glue 버전 1.0을 수락했다고 가정하면 출력은 다음과 같습니다.

   ```
   Python 3.6.8 (default, Aug  2 2019, 17:42:44)
   [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   SLF4J: Class path contains multiple SLF4J bindings.
   SLF4J: Found binding in [jar:file:/usr/share/aws/glue/etl/jars/glue-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
   SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   2019-09-23 22:12:23,071 WARN  [Thread-5] yarn.Client (Logging.scala:logWarning(66)) - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
   2019-09-23 22:12:26,562 WARN  [Thread-5] yarn.Client (Logging.scala:logWarning(66)) - Same name resource file:/usr/lib/spark/python/lib/pyspark.zip added multiple times to distributed cache
   2019-09-23 22:12:26,580 WARN  [Thread-5] yarn.Client (Logging.scala:logWarning(66)) - Same path resource file:///usr/share/aws/glue/etl/python/PyGlue.zip added multiple times to distributed cache.
   2019-09-23 22:12:26,581 WARN  [Thread-5] yarn.Client (Logging.scala:logWarning(66)) - Same path resource file:///usr/lib/spark/python/lib/py4j-src.zip added multiple times to distributed cache.
   2019-09-23 22:12:26,581 WARN  [Thread-5] yarn.Client (Logging.scala:logWarning(66)) - Same path resource file:///usr/share/aws/glue/libs/pyspark.zip added multiple times to distributed cache.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
         /_/
   
   Using Python version 3.6.8 (default, Aug  2 2019 17:42:44)
   SparkSession available as 'spark'.
   >>>
   ```

1. `print(spark.version)`을 입력하여 REPL 셀이 올바르게 작동하는지 테스트합니다. Spark 버전을 나타내는 한 REPL은 사용할 준비되어 있습니다.

1. 셀에서 다음 샘플 스크립트를 행마다 실행하고자 합니다.

   ```
   import sys
   from pyspark.context import SparkContext
   from awsglue.context import GlueContext
   from awsglue.transforms import *
   glueContext = GlueContext(SparkContext.getOrCreate())
   persons_DyF = glueContext.create_dynamic_frame.from_catalog(database="legislators", table_name="persons_json")
   print ("Count:  ", persons_DyF.count())
   persons_DyF.printSchema()
   ```