Step 5: Write and read Amazon Keyspaces data using the Apache Cassandra Spark Connector


In this step, you load the data from the sample file into a Spark DataFrame. Next, you use the Spark Cassandra Connector to write the data from the DataFrame into your Amazon Keyspaces table. You can also use this part independently, for example, to migrate data into an Amazon Keyspaces table. Finally, you use the Spark Cassandra Connector to read the data from your table back into a DataFrame. You can also use this part independently, for example, to run data analytics on an Amazon Keyspaces table with Apache Spark.

Start the Spark Shell as shown in the following example. This example uses SigV4 authentication.


./spark-shell --files application.conf --conf spark.cassandra.connection.config.profile.path=application.conf --packages software.aws.mcs:aws-sigv4-auth-cassandra-java-driver-plugin:4.0.5,com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions

Import the Spark Cassandra Connector with the following code.
```
import org.apache.spark.sql.cassandra._
```

To read data from the CSV file and store it in a DataFrame, you can use the following code example.


val df = spark.read.option("header","true").option("inferSchema","true").csv("keyspaces_sample_table.csv")

You can display the result with the following command.


scala> df.show()

The output should look similar to this.


+----------------+----+-----------+----+------------------+--------------------+-------------+
|           award|year|   category|rank|            author|          book_title|    publisher|
+----------------+----+-----------+----+------------------+--------------------+-------------+
|Kwesi Manu Prize|2020|    Fiction|   1|        Akua Mansa|   Where did you go?|SomePublisher|
|Kwesi Manu Prize|2020|    Fiction|   2|       John Stiles|           Yesterday|Example Books|
|Kwesi Manu Prize|2020|    Fiction|   3|        Nikki Wolf|Moving to the Cha...| AnyPublisher|
|            Wolf|2020|Non-Fiction|   1|       Wang Xiulan|    History of Ideas|Example Books|
|            Wolf|2020|Non-Fiction|   2|Ana Carolina Silva|       Science Today|SomePublisher|
|            Wolf|2020|Non-Fiction|   3| Shirley Rodriguez|The Future of Sea...| AnyPublisher|
|     Richard Roe|2020|    Fiction|   1| Alejandro Rosalez|         Long Summer|SomePublisher|
|     Richard Roe|2020|    Fiction|   2|       Arnav Desai|             The Key|Example Books|
|     Richard Roe|2020|    Fiction|   3|     Mateo Jackson|    Inside the Whale| AnyPublisher|
+----------------+----+-----------+----+------------------+--------------------+-------------+

You can confirm the schema of the data in the DataFrame as shown in the following example.


scala> df.printSchema

The output should look like this.


root
|-- award: string (nullable = true)
|-- year: integer (nullable = true)
|-- category: string (nullable = true)
|-- rank: integer (nullable = true)
|-- author: string (nullable = true)
|-- book_title: string (nullable = true)
|-- publisher: string (nullable = true)
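
With `inferSchema`, Spark scans the file to guess the column types. As an alternative, you can declare the schema up front. The following is a minimal sketch, assuming the column names and types from the `printSchema` output above. The names `bookAwardsSchema` and `readBookAwards` are illustrative and not part of this tutorial.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Column names and types copied from the printSchema output above.
val bookAwardsSchema = StructType(Seq(
  StructField("award", StringType, nullable = true),
  StructField("year", IntegerType, nullable = true),
  StructField("category", StringType, nullable = true),
  StructField("rank", IntegerType, nullable = true),
  StructField("author", StringType, nullable = true),
  StructField("book_title", StringType, nullable = true),
  StructField("publisher", StringType, nullable = true)
))

// Hypothetical helper: reads a CSV file with a header row using the
// declared schema instead of schema inference.
def readBookAwards(session: SparkSession, path: String): DataFrame = {
  session.read
    .option("header", "true")
    .schema(bookAwardsSchema)
    .csv(path)
}
```

In the Spark Shell, you could then call `readBookAwards(spark, "keyspaces_sample_table.csv")` in place of the `inferSchema` read shown earlier.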

Use the following command to write the data in the DataFrame to the Amazon Keyspaces table.
```
df.write.cassandraFormat("book_awards", "catalog").mode("APPEND").save()
```
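
The `cassandraFormat` helper comes from the `org.apache.spark.sql.cassandra._` import. The same write can also be expressed with Spark's generic `format` and `options` calls, which may be useful when the keyspace and table names come from configuration. This is a sketch under that assumption. The names `targetTable` and `appendToKeyspaces` are illustrative, and the behavior should match the command above.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Keyspace and table names used in this tutorial.
val targetTable = Map("keyspace" -> "catalog", "table" -> "book_awards")

// Hypothetical helper: appends the rows of `data` to the target table
// through the connector's data source name.
def appendToKeyspaces(data: DataFrame): Unit = {
  data.write
    .format("org.apache.spark.sql.cassandra")
    .options(targetTable)
    .mode(SaveMode.Append)
    .save()
}
```

Calling `appendToKeyspaces(df)` would then write the DataFrame loaded from the sample file.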

To confirm that the data was saved, you can read it back into a DataFrame, as shown in the following example.


val newDf = spark.read.cassandraFormat("book_awards", "catalog").load()

Then you can show the data that is now contained in the DataFrame.


scala> newDf.show()

The output of that command should look like this.


+--------------------+------------------+----------------+-----------+-------------+----+----+
|          book_title|            author|           award|   category|    publisher|rank|year|
+--------------------+------------------+----------------+-----------+-------------+----+----+
|         Long Summer| Alejandro Rosalez|     Richard Roe|    Fiction|SomePublisher|   1|2020|
|    History of Ideas|       Wang Xiulan|            Wolf|Non-Fiction|Example Books|   1|2020|
|   Where did you go?|        Akua Mansa|Kwesi Manu Prize|    Fiction|SomePublisher|   1|2020|
|    Inside the Whale|     Mateo Jackson|     Richard Roe|    Fiction| AnyPublisher|   3|2020|
|           Yesterday|       John Stiles|Kwesi Manu Prize|    Fiction|Example Books|   2|2020|
|Moving to the Cha...|        Nikki Wolf|Kwesi Manu Prize|    Fiction| AnyPublisher|   3|2020|
|The Future of Sea...| Shirley Rodriguez|            Wolf|Non-Fiction| AnyPublisher|   3|2020|
|       Science Today|Ana Carolina Silva|            Wolf|Non-Fiction|SomePublisher|   2|2020|
|             The Key|       Arnav Desai|     Richard Roe|    Fiction|Example Books|   2|2020|
+--------------------+------------------+----------------+-----------+-------------+----+----+
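
The rows come back in a different order and with a different column order than in the source file, so comparing the two outputs by eye is error-prone. The following sketch shows one way to check that two DataFrames hold the same rows. It assumes that the column names and types of both DataFrames match and that the table contained no other rows before the write. The helper name `sameRows` is illustrative.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// True when both DataFrames hold the same rows (duplicates included),
// regardless of row order or column order.
def sameRows(expected: DataFrame, actual: DataFrame): Boolean = {
  // Reorder the columns of `actual` to match `expected` before comparing.
  val aligned = actual.select(expected.columns.map(col): _*)
  expected.exceptAll(aligned).isEmpty && aligned.exceptAll(expected).isEmpty
}
```

In the Spark Shell, `sameRows(df, newDf)` compares the data loaded from the CSV file with the data read back from the table.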
