
Step 5: Write and read Amazon Keyspaces data using the Apache Cassandra Spark Connector


In this step, you first load the data from the sample file into a DataFrame with the Spark Cassandra Connector. Next, you write the data from the DataFrame into your Amazon Keyspaces table. You can also use this part independently, for example, to migrate data into an Amazon Keyspaces table. Finally, you read the data from your table into a DataFrame using the Spark Cassandra Connector. You can also use this part independently, for example, to read data from an Amazon Keyspaces table and perform data analytics with Apache Spark.

  1. Start the Spark Shell as shown in the following example. This example uses SigV4 authentication.

    ./spark-shell --files application.conf --conf spark.cassandra.connection.config.profile.path=application.conf --packages software.aws.mcs:aws-sigv4-auth-cassandra-java-driver-plugin:4.0.5,com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions
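The command above references the application.conf driver configuration file created earlier in this tutorial. A minimal sketch of such a file is shown below; the us-east-1 service endpoint and region are example values, so adjust the contact point, local datacenter, and region for your own setup.

```
datastax-java-driver {
    basic.contact-points = ["cassandra.us-east-1.amazonaws.com:9142"]
    basic.load-balancing-policy {
        class = DefaultLoadBalancingPolicy
        local-datacenter = us-east-1
    }
    basic.request {
        consistency = LOCAL_QUORUM
    }
    advanced {
        auth-provider = {
            class = software.aws.mcs.auth.SigV4AuthProvider
            aws-region = us-east-1
        }
        ssl-engine-factory {
            class = DefaultSslEngineFactory
        }
    }
}
```

Amazon Keyspaces requires TLS and, with the SigV4 plugin, authenticates using your AWS credentials, which is why the ssl-engine-factory and auth-provider sections are present.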
  2. Import the Spark Cassandra Connector with the following code.

    import org.apache.spark.sql.cassandra._
  3. To read data from the CSV file and store it in a DataFrame, you can use the following code example.

    var df = spark.read.option("header","true").option("inferSchema","true").csv("keyspaces_sample_table.csv")

    You can display the result with the following command.

    scala> df.show();

    The output should look similar to this.

    +----------------+----+-----------+----+------------------+--------------------+-------------+
    |           award|year|   category|rank|            author|          book_title|    publisher|
    +----------------+----+-----------+----+------------------+--------------------+-------------+
    |Kwesi Manu Prize|2020|    Fiction|   1|        Akua Mansa|   Where did you go?|SomePublisher|
    |Kwesi Manu Prize|2020|    Fiction|   2|       John Stiles|           Yesterday|Example Books|
    |Kwesi Manu Prize|2020|    Fiction|   3|        Nikki Wolf|Moving to the Cha...| AnyPublisher|
    |            Wolf|2020|Non-Fiction|   1|       Wang Xiulan|    History of Ideas|Example Books|
    |            Wolf|2020|Non-Fiction|   2|Ana Carolina Silva|       Science Today|SomePublisher|
    |            Wolf|2020|Non-Fiction|   3| Shirley Rodriguez|The Future of Sea...| AnyPublisher|
    |     Richard Roe|2020|    Fiction|   1| Alejandro Rosalez|         Long Summer|SomePublisher|
    |     Richard Roe|2020|    Fiction|   2|       Arnav Desai|             The Key|Example Books|
    |     Richard Roe|2020|    Fiction|   3|     Mateo Jackson|    Inside the Whale| AnyPublisher|
    +----------------+----+-----------+----+------------------+--------------------+-------------+

    You can confirm the schema of the data in the DataFrame as shown in the following example.

    scala> df.printSchema

    The output should look like this.

    root
     |-- award: string (nullable = true)
     |-- year: integer (nullable = true)
     |-- category: string (nullable = true)
     |-- rank: integer (nullable = true)
     |-- author: string (nullable = true)
     |-- book_title: string (nullable = true)
     |-- publisher: string (nullable = true)
  4. Use the following command to write the data in the DataFrame to the Amazon Keyspaces table.

    df.write.cassandraFormat("book_awards", "catalog").mode("APPEND").save()
  5. To confirm that the data was saved, you can read it back into a DataFrame, as shown in the following example.

    var newDf = spark.read.cassandraFormat("book_awards", "catalog").load()

    Then you can show the data that is now contained in the DataFrame.

    scala> newDf.show()

    The output of that command should look like this.

    +--------------------+------------------+----------------+-----------+-------------+----+----+
    |          book_title|            author|           award|   category|    publisher|rank|year|
    +--------------------+------------------+----------------+-----------+-------------+----+----+
    |         Long Summer| Alejandro Rosalez|     Richard Roe|    Fiction|SomePublisher|   1|2020|
    |    History of Ideas|       Wang Xiulan|            Wolf|Non-Fiction|Example Books|   1|2020|
    |   Where did you go?|        Akua Mansa|Kwesi Manu Prize|    Fiction|SomePublisher|   1|2020|
    |    Inside the Whale|     Mateo Jackson|     Richard Roe|    Fiction| AnyPublisher|   3|2020|
    |           Yesterday|       John Stiles|Kwesi Manu Prize|    Fiction|Example Books|   2|2020|
    |Moving to the Cha...|        Nikki Wolf|Kwesi Manu Prize|    Fiction| AnyPublisher|   3|2020|
    |The Future of Sea...| Shirley Rodriguez|            Wolf|Non-Fiction| AnyPublisher|   3|2020|
    |       Science Today|Ana Carolina Silva|            Wolf|Non-Fiction|SomePublisher|   2|2020|
    |             The Key|       Arnav Desai|     Richard Roe|    Fiction|Example Books|   2|2020|
    +--------------------+------------------+----------------+-----------+-------------+----+----+
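Because the table contents are now available as a DataFrame, you can run ordinary Spark queries against them. The following is a minimal sketch of such analytics, assuming the newDf DataFrame from the previous step and an active spark-shell session connected to Amazon Keyspaces; the specific queries are illustrative examples.

```scala
// Run inside the same spark-shell session, using the `newDf`
// DataFrame loaded from the book_awards table in the previous step.

// Count how many award entries exist per category.
newDf.groupBy("category").count().show()

// List only the first-place winners.
newDf.filter(newDf("rank") === 1)
  .select("award", "author", "book_title")
  .show()
```

Both queries push the work to Spark, so they scale to much larger tables than the sample data used here.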
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.