

# Tutorial: Loading data into Amazon Keyspaces using DSBulk
<a name="dsbulk-upload"></a>

This step-by-step tutorial guides you through migrating data from Apache Cassandra to Amazon Keyspaces using the DataStax Bulk Loader (DSBulk) available on [GitHub](https://github.com/datastax/dsbulk.git). DSBulk is a good fit for uploading datasets to Amazon Keyspaces for academic or test purposes. For more information about how to migrate production workloads, see [Offline migration process: Apache Cassandra to Amazon Keyspaces](migrating-offline.md). In this tutorial, you complete the following steps.

**Prerequisites** – Set up an AWS account with credentials, create a JKS trust store file for the certificates, configure `cqlsh`, download and install DSBulk, and configure an `application.conf` file.

1. **Create source CSV and target table** – Prepare a CSV file as the source data and create the target keyspace and table in Amazon Keyspaces.

1. **Prepare the data** – Randomize the data in the CSV file and analyze it to determine the average and maximum row sizes.

1. **Set throughput capacity** – Calculate the required write capacity units (WCUs) based on the data size and desired load time, and configure the table's provisioned capacity.

1. **Configure DSBulk settings** – Create a DSBulk configuration file with settings like authentication, SSL/TLS, consistency level, and connection pool size.

1. **Run the DSBulk load command** – Run the DSBulk load command to upload the data from the CSV file to the Amazon Keyspaces table, and monitor the progress.

**Topics**
+ [Prerequisites: Steps you have to complete before you can upload data with DSBulk](dsbulk-upload-prequs.md)
+ [Step 1: Create the source CSV file and a target table for the data upload using DSBulk](dsbulk-upload-source.md)
+ [Step 2: Prepare the data to upload using DSBulk](dsbulk-upload-prepare-data.md)
+ [Step 3: Set the throughput capacity for the target table](dsbulk-upload-capacity.md)
+ [Step 4: Configure `DSBulk` settings to upload data from the CSV file to the target table](dsbulk-upload-config.md)
+ [Step 5: Run the DSBulk `load` command to upload data from the CSV file to the target table](dsbulk-upload-run.md)

# Prerequisites: Steps you have to complete before you can upload data with DSBulk
<a name="dsbulk-upload-prequs"></a>

You must complete the following tasks before you can start this tutorial.

1. If you have not already done so, sign up for an AWS account by following the steps at [Setting up AWS Identity and Access Management](accessing.md#SettingUp.IAM).

1. Create credentials by following the steps at [Create and configure AWS credentials for Amazon Keyspaces](access.credentials.md).

1. Create a JKS trust store file.

   1.  Download the following digital certificates and save the files locally or in your home directory.

      1. AmazonRootCA1

      1. AmazonRootCA2

      1. AmazonRootCA3

      1. AmazonRootCA4

      1. Starfield Class 2 Root (optional – for backward compatibility)

      To download the certificates, you can use the following commands.

      ```
      curl -O https://www.amazontrust.com/repository/AmazonRootCA1.pem
      curl -O https://www.amazontrust.com/repository/AmazonRootCA2.pem
      curl -O https://www.amazontrust.com/repository/AmazonRootCA3.pem
      curl -O https://www.amazontrust.com/repository/AmazonRootCA4.pem
      curl -O https://certs.secureserver.net/repository/sf-class2-root.crt
      ```
**Note**  
Amazon Keyspaces previously used TLS certificates anchored to the Starfield Class 2 CA. AWS is migrating all AWS Regions to certificates issued under Amazon Trust Services (Amazon Root CAs 1–4). During this transition, configure clients to trust both Amazon Root CAs 1–4 and the Starfield root to ensure compatibility across all Regions.

   1. Convert each digital certificate to DER format and import it into the `cassandra_truststore.jks` trust store.

      ```
      openssl x509 -outform der -in AmazonRootCA1.pem -out temp_file.der
      keytool -import -alias amazon-root-ca-1 -keystore cassandra_truststore.jks -file temp_file.der
      
      openssl x509 -outform der -in AmazonRootCA2.pem -out temp_file.der
      keytool -import -alias amazon-root-ca-2 -keystore cassandra_truststore.jks -file temp_file.der
      
      openssl x509 -outform der -in AmazonRootCA3.pem -out temp_file.der
      keytool -import -alias amazon-root-ca-3 -keystore cassandra_truststore.jks -file temp_file.der
      
      openssl x509 -outform der -in AmazonRootCA4.pem -out temp_file.der
      keytool -import -alias amazon-root-ca-4 -keystore cassandra_truststore.jks -file temp_file.der
                   
      openssl x509 -outform der -in sf-class2-root.crt -out temp_file.der
      keytool -import -alias cassandra -keystore cassandra_truststore.jks -file temp_file.der
      ```

      When you import the first certificate, you create a password for the trust store. Then, for each certificate, you confirm that you trust it. The interactive output looks like the following.

      ```
      Enter keystore password:  
      Re-enter new password: 
      Owner: CN=Amazon Root CA 1, O=Amazon, C=US
      Issuer: CN=Amazon Root CA 1, O=Amazon, C=US
      Serial number: 66c9fcf99bf8c0a39e2f0788a43e696365bca
      Valid from: Tue May 26 00:00:00 UTC 2015 until: Sun Jan 17 00:00:00 UTC 2038
      Certificate fingerprints:
           SHA1: 8D:A7:F9:65:EC:5E:FC:37:91:0F:1C:6E:59:FD:C1:CC:6A:6E:DE:16
           SHA256: 8E:CD:E6:88:4F:3D:87:B1:12:5B:A3:1A:C3:FC:B1:3D:70:16:DE:7F:57:CC:90:4F:E1:CB:97:C6:AE:98:19:6E
      Signature algorithm name: SHA256withRSA
      Subject Public Key Algorithm: 2048-bit RSA key
      Version: 3
      
      Extensions: 
      
      #1: ObjectId: 2.5.29.19 Criticality=true
      BasicConstraints:[
        CA:true
        PathLen:2147483647
      ]
      
      #2: ObjectId: 2.5.29.15 Criticality=true
      KeyUsage [
        DigitalSignature
        Key_CertSign
        Crl_Sign
      ]
      
      #3: ObjectId: 2.5.29.14 Criticality=false
      SubjectKeyIdentifier [
      KeyIdentifier [
      0000: 84 18 CC 85 34 EC BC 0C   94 94 2E 08 59 9C C7 B2  ....4.......Y...
      0010: 10 4E 0A 08                                        .N..
      ]
      ]
      
      Trust this certificate? [no]:  yes
      Certificate was added to keystore
      Enter keystore password:  
      Owner: CN=Amazon Root CA 2, O=Amazon, C=US
      Issuer: CN=Amazon Root CA 2, O=Amazon, C=US
      Serial number: 66c9fd29635869f0a0fe58678f85b26bb8a37
      Valid from: Tue May 26 00:00:00 UTC 2015 until: Sat May 26 00:00:00 UTC 2040
      Certificate fingerprints:
           SHA1: 5A:8C:EF:45:D7:A6:98:59:76:7A:8C:8B:44:96:B5:78:CF:47:4B:1A
           SHA256: 1B:A5:B2:AA:8C:65:40:1A:82:96:01:18:F8:0B:EC:4F:62:30:4D:83:CE:C4:71:3A:19:C3:9C:01:1E:A4:6D:B4
      Signature algorithm name: SHA384withRSA
      Subject Public Key Algorithm: 4096-bit RSA key
      Version: 3
      
      Extensions: 
      
      #1: ObjectId: 2.5.29.19 Criticality=true
      BasicConstraints:[
        CA:true
        PathLen:2147483647
      ]
      
      #2: ObjectId: 2.5.29.15 Criticality=true
      KeyUsage [
        DigitalSignature
        Key_CertSign
        Crl_Sign
      ]
      
      #3: ObjectId: 2.5.29.14 Criticality=false
      SubjectKeyIdentifier [
      KeyIdentifier [
      0000: B0 0C F0 4C 30 F4 05 58   02 48 FD 33 E5 52 AF 4B  ...L0..X.H.3.R.K
      0010: 84 E3 66 52                                        ..fR
      ]
      ]
      
      Trust this certificate? [no]:  yes
      Certificate was added to keystore
      Enter keystore password:  
      Owner: CN=Amazon Root CA 3, O=Amazon, C=US
      Issuer: CN=Amazon Root CA 3, O=Amazon, C=US
      Serial number: 66c9fd5749736663f3b0b9ad9e89e7603f24a
      Valid from: Tue May 26 00:00:00 UTC 2015 until: Sat May 26 00:00:00 UTC 2040
      Certificate fingerprints:
           SHA1: 0D:44:DD:8C:3C:8C:1A:1A:58:75:64:81:E9:0F:2E:2A:FF:B3:D2:6E
           SHA256: 18:CE:6C:FE:7B:F1:4E:60:B2:E3:47:B8:DF:E8:68:CB:31:D0:2E:BB:3A:DA:27:15:69:F5:03:43:B4:6D:B3:A4
      Signature algorithm name: SHA256withECDSA
      Subject Public Key Algorithm: 256-bit EC (secp256r1) key
      Version: 3
      
      Extensions: 
      
      #1: ObjectId: 2.5.29.19 Criticality=true
      BasicConstraints:[
        CA:true
        PathLen:2147483647
      ]
      
      #2: ObjectId: 2.5.29.15 Criticality=true
      KeyUsage [
        DigitalSignature
        Key_CertSign
        Crl_Sign
      ]
      
      #3: ObjectId: 2.5.29.14 Criticality=false
      SubjectKeyIdentifier [
      KeyIdentifier [
      0000: AB B6 DB D7 06 9E 37 AC   30 86 07 91 70 C7 9C C4  ......7.0...p...
      0010: 19 B1 78 C0                                        ..x.
      ]
      ]
      
      Trust this certificate? [no]:  yes
      Certificate was added to keystore
      Enter keystore password:  
      Owner: CN=Amazon Root CA 4, O=Amazon, C=US
      Issuer: CN=Amazon Root CA 4, O=Amazon, C=US
      Serial number: 66c9fd7c1bb104c2943e5717b7b2cc81ac10e
      Valid from: Tue May 26 00:00:00 UTC 2015 until: Sat May 26 00:00:00 UTC 2040
      Certificate fingerprints:
           SHA1: F6:10:84:07:D6:F8:BB:67:98:0C:C2:E2:44:C2:EB:AE:1C:EF:63:BE
           SHA256: E3:5D:28:41:9E:D0:20:25:CF:A6:90:38:CD:62:39:62:45:8D:A5:C6:95:FB:DE:A3:C2:2B:0B:FB:25:89:70:92
      Signature algorithm name: SHA384withECDSA
      Subject Public Key Algorithm: 384-bit EC (secp384r1) key
      Version: 3
      
      Extensions: 
      
      #1: ObjectId: 2.5.29.19 Criticality=true
      BasicConstraints:[
        CA:true
        PathLen:2147483647
      ]
      
      #2: ObjectId: 2.5.29.15 Criticality=true
      KeyUsage [
        DigitalSignature
        Key_CertSign
        Crl_Sign
      ]
      
      #3: ObjectId: 2.5.29.14 Criticality=false
      SubjectKeyIdentifier [
      KeyIdentifier [
      0000: D3 EC C7 3A 65 6E CC E1   DA 76 9A 56 FB 9C F3 86  ...:en...v.V....
      0010: 6D 57 E5 81                                        mW..
      ]
      ]
      
      Trust this certificate? [no]:  yes
      Certificate was added to keystore
      Enter keystore password:  
      Owner: OU=Starfield Class 2 Certification Authority, O="Starfield Technologies, Inc.", C=US
      Issuer: OU=Starfield Class 2 Certification Authority, O="Starfield Technologies, Inc.", C=US
      Serial number: 0
      Valid from: Tue Jun 29 17:39:16 UTC 2004 until: Thu Jun 29 17:39:16 UTC 2034
      Certificate fingerprints:
           SHA1: AD:7E:1C:28:B0:64:EF:8F:60:03:40:20:14:C3:D0:E3:37:0E:B5:8A
           SHA256: 14:65:FA:20:53:97:B8:76:FA:A6:F0:A9:95:8E:55:90:E4:0F:CC:7F:AA:4F:B7:C2:C8:67:75:21:FB:5F:B6:58
      Signature algorithm name: SHA1withRSA (weak)
      Subject Public Key Algorithm: 2048-bit RSA key
      Version: 3
      
      Extensions: 
      
      #1: ObjectId: 2.5.29.35 Criticality=false
      AuthorityKeyIdentifier [
      KeyIdentifier [
      0000: BF 5F B7 D1 CE DD 1F 86   F4 5B 55 AC DC D7 10 C2  ._.......[U.....
      0010: 0E A9 88 E7                                        ....
      ]
      [OU=Starfield Class 2 Certification Authority, O="Starfield Technologies, Inc.", C=US]
      SerialNumber: [    00]
      ]
      
      #2: ObjectId: 2.5.29.19 Criticality=false
      BasicConstraints:[
        CA:true
        PathLen:2147483647
      ]
      
      #3: ObjectId: 2.5.29.14 Criticality=false
      SubjectKeyIdentifier [
      KeyIdentifier [
      0000: BF 5F B7 D1 CE DD 1F 86   F4 5B 55 AC DC D7 10 C2  ._.......[U.....
      0010: 0E A9 88 E7                                        ....
      ]
      ]
      
      
      Warning:
      The input uses the SHA1withRSA signature algorithm which is considered a security risk. This algorithm will be disabled in a future update.
      
      Trust this certificate? [no]:  yes
      Certificate was added to keystore
      ```

1. Set up the Cassandra Query Language shell (cqlsh) connection and confirm that you can connect to Amazon Keyspaces by following the steps at [Using `cqlsh` to connect to Amazon Keyspaces](programmatic.cqlsh.md). 

1. Download and install DSBulk. 
**Note**  
The version shown in this tutorial might not be the latest version available. Before you download DSBulk, check the [DataStax Bulk Loader download page](https://downloads.datastax.com/#bulk-loader) for the latest version, and update the version number in the following commands accordingly.

   1. To download DSBulk, you can use the following command.

      ```
      curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.8.0.tar.gz
      ```

   1. Then unpack the tar file and add DSBulk to your `PATH` as shown in the following example.

      ```
      tar -zxvf dsbulk-1.8.0.tar.gz
      # add the DSBulk directory to the path
      export PATH=$PATH:./dsbulk-1.8.0/bin
      ```

   1. Create an `application.conf` file to store the settings used by DSBulk. You can save the following example as `./dsbulk_keyspaces.conf`. If the Cassandra cluster isn't running on the same host, replace `localhost` with the cluster's contact point, for example a DNS name or IP address. Take note of the file name and path, because you need to specify them later in the `dsbulk load` command.

      ```
      datastax-java-driver {
        basic.contact-points = [ "localhost"]
        advanced.auth-provider {
              class = software.aws.mcs.auth.SigV4AuthProvider
              aws-region = us-east-1
        }
      }
      ```

   1. To enable SigV4 support, download the shaded `jar` file from [GitHub](https://github.com/aws/aws-sigv4-auth-cassandra-java-driver-plugin/releases/) and place it in the DSBulk `lib` folder as shown in the following example.

      ```
      curl -O -L https://github.com/aws/aws-sigv4-auth-cassandra-java-driver-plugin/releases/download/4.0.6-shaded-v2/aws-sigv4-auth-cassandra-java-driver-plugin-4.0.6-shaded.jar
      mv aws-sigv4-auth-cassandra-java-driver-plugin-4.0.6-shaded.jar dsbulk-1.8.0/lib/
      ```

# Step 1: Create the source CSV file and a target table for the data upload using DSBulk
<a name="dsbulk-upload-source"></a>

For this tutorial, we use a comma-separated values (CSV) file with the name `keyspaces_sample_table.csv` as the source file for the data migration. The provided sample file contains a few rows of data for a table with the name `book_awards`.

1. Create the source file. You can choose one of the following options:
   + Download the sample CSV file (`keyspaces_sample_table.csv`) contained in the following archive file [samplemigration.zip](samples/samplemigration.zip). Unzip the archive and take note of the path to `keyspaces_sample_table.csv`.
   + To populate a CSV file with your own data stored in an Apache Cassandra database, you can export the data by using `dsbulk unload` as shown in the following example.

     ```
     dsbulk unload -k mykeyspace -t mytable -f ./my_application.conf > keyspaces_sample_table.csv
     ```

     Make sure the CSV file you create meets the following requirements:
     + The first row contains the column names.
     + The column names in the source CSV file match the column names in the target table.
     + The data is delimited with a comma.
     + All data values are valid Amazon Keyspaces data types. See [Data types](cql.elements.md#cql.data-types).
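
     For reference, a source CSV file that meets these requirements might look like the following. These rows are hypothetical sample values for the `book_awards` schema used in this tutorial.

     ```
     year,award,rank,category,book_title,author,publisher
     2021,Example Award,1,Fiction,Example Book Title,Example Author,Example Publisher
     2021,Example Award,2,Fiction,Another Book Title,Another Author,Another Publisher
     ```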

1. Create the target keyspace and table in Amazon Keyspaces.

   1. Connect to Amazon Keyspaces using `cqlsh`, replacing the service endpoint, user name, and password in the following example with your own values.

      ```
      cqlsh cassandra.us-east-1.amazonaws.com 9142 -u "111122223333" -p "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" --ssl
      ```

   1. Create a new keyspace with the name `catalog` as shown in the following example. 

      ```
      CREATE KEYSPACE catalog WITH REPLICATION = {'class': 'SingleRegionStrategy'};
      ```

   1. After the new keyspace has a status of available, use the following code to create the target table `book_awards`. To learn more about asynchronous resource creation and how to check if a resource is available, see [Check keyspace creation status in Amazon Keyspaces](keyspaces-create.md).

      ```
      CREATE TABLE catalog.book_awards (
         year int,
         award text,
         rank int, 
         category text,
         book_title text,
         author text, 
         publisher text,
         PRIMARY KEY ((year, award), category, rank)
         );
      ```
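
      Because Amazon Keyspaces creates resources asynchronously, you can confirm that the new table is active before you load data. One way to check, assuming the `system_schema_mcs` system tables that Amazon Keyspaces exposes, is to run the following query.

      ```
      SELECT status FROM system_schema_mcs.tables
      WHERE keyspace_name = 'catalog' AND table_name = 'book_awards';
      ```

      The table is ready for the data upload when the status is `ACTIVE`.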

   If Apache Cassandra is your original data source, a simple way to create an Amazon Keyspaces target table with a matching schema is to generate the `CREATE TABLE` statement from the source table, as shown in the following statement.

   ```
   cqlsh localhost 9042  -u "username" -p "password" --execute "DESCRIBE TABLE mykeyspace.mytable;"
   ```

   Then create the target table in Amazon Keyspaces with the column names and data types matching the description from the Cassandra source table.

# Step 2: Prepare the data to upload using DSBulk
<a name="dsbulk-upload-prepare-data"></a>

Preparing the source data for an efficient transfer is a two-step process. First, you randomize the data. In the second step, you analyze the data to determine the appropriate `dsbulk` parameter values and required table settings.

**Randomize the data**  
The `dsbulk` command reads and writes data in the same order that it appears in the CSV file. If you use the `dsbulk` command to create the source file, the data is written in key-sorted order in the CSV. Internally, Amazon Keyspaces partitions data using partition keys. Although Amazon Keyspaces has built-in logic to help load balance requests for the same partition key, loading the data is faster and more efficient if you randomize the order. This is because you can take advantage of the built-in load balancing that occurs when Amazon Keyspaces is writing to different partitions.

To spread the writes across the partitions evenly, you must randomize the data in the source file. You can write an application to do this or use an open-source tool, such as [shuf](https://en.wikipedia.org/wiki/Shuf). The `shuf` tool is freely available on Linux distributions, on macOS (by installing coreutils with [Homebrew](https://brew.sh)), and on Windows (by using Windows Subsystem for Linux (WSL)). One extra step is required to prevent the header row with the column names from being shuffled.

To randomize the source file while preserving the header, enter the following code.

```
tail -n +2 keyspaces_sample_table.csv | shuf -o keyspace.table.csv && (head -1 keyspaces_sample_table.csv && cat keyspace.table.csv ) > keyspace.table.csv1 && mv keyspace.table.csv1 keyspace.table.csv
```

`shuf` writes the shuffled data to a new CSV file called `keyspace.table.csv`. You can delete the `keyspaces_sample_table.csv` file because you no longer need it.
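
If you want to see how this header-preserving shuffle works before running it on your real data, you can try the same pattern on a small throwaway file. The file names here are only for illustration.

```shell
# create a tiny CSV with a header row and three data rows
printf 'col1,col2\na,1\nb,2\nc,3\n' > demo.csv

# shuffle only the data rows, then put the header back on top
tail -n +2 demo.csv | shuf -o demo_body.csv
( head -1 demo.csv && cat demo_body.csv ) > demo_shuffled.csv

# the header line is unchanged and the total line count is the same
head -1 demo_shuffled.csv
wc -l < demo_shuffled.csv
```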

**Analyze the data**  
Determine the average and maximum row size by analyzing the data.

You do this for the following reasons:
+ The average row size helps to estimate the total amount of data to be transferred.
+ You need the average row size to provision the write capacity needed for the data upload.
+ You can make sure that each row is less than 1 MB in size, which is the maximum row size in Amazon Keyspaces.

**Note**  
This quota refers to row size, not partition size. Unlike Apache Cassandra partitions, Amazon Keyspaces partitions can be virtually unbounded in size. Partition keys and clustering columns require additional storage for metadata, which you must add to the raw size of the rows. For more information, see [Estimate row size in Amazon Keyspaces](calculating-row-size.md).

The following code uses [AWK](https://en.wikipedia.org/wiki/AWK) to analyze a CSV file and print the average and maximum row size.

```
awk -F, 'BEGIN {samp=10000;max=-1;}{if(NR>1){len=length($0);t+=len;avg=t/(NR-1);max=(len>max ? len : max)}}NR==samp{exit}END{printf("{lines: %d, average: %d bytes, max: %d bytes}\n",NR,avg,max);}' keyspace.table.csv
```

Running this code results in the following output.

```
{lines: 10000, average: 123 bytes, max: 225 bytes}
```

Make sure that your maximum row size doesn't exceed 1 MB. If it does, you have to break up the row or compress the data to bring the row size below 1 MB. In the next step of this tutorial, you use the average row size to provision the write capacity for the table. 
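
If you prefer a scripted version of the 1 MB check, the following sketch runs the same idea against a small stand-in file. Replace `check.csv` with your real CSV file, and note that `awk`'s `length` counts characters, which equals bytes only for single-byte data.

```shell
# toy file standing in for the real CSV
printf 'short line\nanother line\n' > check.csv

# find the longest line (in characters)
max_bytes=$(awk 'length($0) > m { m = length($0) } END { print m }' check.csv)
echo "$max_bytes"

# 1 MB = 1,048,576 bytes is the Amazon Keyspaces row size quota
[ "$max_bytes" -le 1048576 ] && echo "OK: rows fit under 1 MB"
```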

# Step 3: Set the throughput capacity for the target table
<a name="dsbulk-upload-capacity"></a>

This tutorial shows you how to tune DSBulk to load data within a set time range. Because you know how many reads and writes you perform in advance, use provisioned capacity mode. After you finish the data transfer, you should set the capacity mode of the table to match your application’s traffic patterns. To learn more about capacity management, see [Managing serverless resources in Amazon Keyspaces (for Apache Cassandra)](serverless_resource_management.md).

With provisioned capacity mode, you specify how much read and write capacity you want to provision to your table in advance. Write capacity is billed hourly and metered in write capacity units (WCUs). Each WCU is enough write capacity to support writing 1 KB of data per second. When you load the data, the write rate must be under the max WCUs (parameter: `write_capacity_units`) that are set on the target table. 

By default, you can provision up to 40,000 WCUs to a table and 80,000 WCUs across all the tables in your account. If you need additional capacity, you can request a quota increase in the [Service Quotas](https://console.aws.amazon.com/servicequotas/home#!/services/cassandra/quotas) console. For more information about quotas, see [Quotas for Amazon Keyspaces (for Apache Cassandra)](quotas.md).

**Calculate the average number of WCUs required for an insert**  
Inserting 1 KB of data per second requires 1 WCU. If your CSV file has 360,000 rows and you want to load all the data in 1 hour, you must write 100 rows per second (360,000 rows / 60 minutes / 60 seconds = 100 rows per second). If each row has up to 1 KB of data, to insert 100 rows per second, you must provision 100 WCUs to your table. If each row has 1.5 KB of data, you need two WCUs to insert one row per second. Therefore, to insert 100 rows per second, you must provision 200 WCUs.

To determine how many WCUs you need to insert one row per second, divide the average row size in bytes by 1024 and round up to the nearest whole number.

For example, if the average row size is 3000 bytes, you need three WCUs to insert one row per second.

```
ROUNDUP(3000 / 1024) = ROUNDUP(2.93) = 3 WCUs
```
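
The same round-up can be done with integer arithmetic in the shell, which is convenient if you script the capacity planning. The 3000-byte row size is just the example value from above.

```shell
# WCUs to insert one row per second = ceil(row_bytes / 1024)
row_bytes=3000
wcus_per_row=$(( (row_bytes + 1023) / 1024 ))
echo "$wcus_per_row"    # 3
```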

**Calculate data load time and capacity**  
Now that you know the average size and number of rows in your CSV file, you can calculate how many WCUs you need to load the data in a given amount of time, and the approximate time it takes to load all the data in your CSV file using different WCU settings.

For example, if each row in your file is 1 KB and you have 1,000,000 rows in your CSV file, to load the data in 1 hour, you need to provision at least 278 WCUs to your table for that hour.

```
1,000,000 rows * 1 KB = 1,000,000 KB
1,000,000 KB / 3600 seconds = 277.8 KB per second, rounded up to 278 WCUs
```
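
You can sketch this calculation with shell integer arithmetic as well. The row count, row size, and one-hour load window are the example values from above.

```shell
rows=1000000
row_kb=1          # average row size, rounded up to whole KB
seconds=3600      # desired load window: 1 hour
# WCUs to provision = ceil(total KB / seconds)
wcus=$(( (rows * row_kb + seconds - 1) / seconds ))
echo "$wcus"    # 278
```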

**Configure provisioned capacity settings**  
You can set a table’s write capacity settings when you create the table or by using the `ALTER TABLE` command. The following is the syntax for altering a table’s provisioned capacity settings with the `ALTER TABLE` command.

```
ALTER TABLE catalog.book_awards WITH custom_properties={'capacity_mode':{'throughput_mode': 'PROVISIONED', 'read_capacity_units': 100, 'write_capacity_units': 278}} ;  
```

For the complete language reference, see [CREATE TABLE](cql.ddl.table.md#cql.ddl.table.create) and [ALTER TABLE](cql.ddl.table.md#cql.ddl.table.alter).

# Step 4: Configure `DSBulk` settings to upload data from the CSV file to the target table
<a name="dsbulk-upload-config"></a>

This section outlines the steps required to configure DSBulk for data upload to Amazon Keyspaces. You configure DSBulk by using a configuration file. You specify the configuration file directly from the command line.

1. Create a DSBulk configuration file for the migration to Amazon Keyspaces. In this example, we use the file name `dsbulk_keyspaces.conf`. Specify the following settings in the DSBulk configuration file.

   1. *`PlainTextAuthProvider`* – Create the authentication provider with the `PlainTextAuthProvider` class. `ServiceUserName` and `ServicePassword` should match the user name and password you obtained when you generated the service-specific credentials by following the steps at [Create credentials for programmatic access to Amazon Keyspaces](programmatic.credentials.md).

   1. *`local-datacenter`* – Set the value for `local-datacenter` to the AWS Region that you're connecting to. For example, if the application is connecting to `cassandra.us-east-1.amazonaws.com`, then set the local data center to `us-east-1`. For all available AWS Regions, see [Service endpoints for Amazon Keyspaces](programmatic.endpoints.md). To avoid replicas, set `slow-replica-avoidance` to `false`.

   1. *`SSLEngineFactory`* – To configure SSL/TLS, initialize the `SSLEngineFactory` by adding a section in the configuration file with a single line that specifies the class with `class = DefaultSslEngineFactory`. Provide the path to `cassandra_truststore.jks` and the password that you created previously.

   1. *`consistency`* – Set the consistency level to `LOCAL_QUORUM`. Other write consistency levels are not supported. For more information, see [Supported Apache Cassandra read and write consistency levels and associated costs](consistency.md).

   1. The number of connections per pool is configurable in the Java driver. For this example, set `advanced.connection.pool.local.size` to 3.

   The following is the complete sample configuration file.

   ```
   datastax-java-driver {
   basic.contact-points = [ "cassandra.us-east-1.amazonaws.com:9142"]
   advanced.auth-provider {
       class = PlainTextAuthProvider
       username = "ServiceUserName"
       password = "ServicePassword"
   }
   
   basic.load-balancing-policy {
       local-datacenter = "us-east-1"
       slow-replica-avoidance = false           
   }
   
   basic.request {
       consistency = LOCAL_QUORUM
       default-idempotence = true
   }
   advanced.ssl-engine-factory {
       class = DefaultSslEngineFactory
       truststore-path = "./cassandra_truststore.jks"
       truststore-password = "my_password"
       hostname-validation = false
     }
   advanced.connection.pool.local.size = 3
   }
   ```

1. Review the parameters for the DSBulk `load` command.

   1. *`executor.maxPerSecond`* – The maximum number of rows that the `load` command attempts to process per second. If unset, this setting defaults to -1, which means the rate is not limited.

      Set `executor.maxPerSecond` based on the number of WCUs that you provisioned to the target table. The `executor.maxPerSecond` of the `load` command isn't a hard limit, it's a target average. This means that the write rate can (and often does) burst above the number you set. To allow for bursts and to make sure that enough capacity is in place to handle the data load requests, set `executor.maxPerSecond` to 90% of the table's write capacity.

      ```
      executor.maxPerSecond = WCUs * .90
      ```

      In this tutorial, we set `executor.maxPerSecond` to 5.
**Note**  
If you are using DSBulk 1.6.0 or higher, you can use `dsbulk.engine.maxConcurrentQueries` instead.

   1. Configure these additional parameters for the DSBulk `load` command.
      + *`batch.mode`* – This parameter tells the system to group operations by partition key. We recommend disabling batch mode, because it can result in hot key scenarios and cause `WriteThrottleEvents`.
      + *`driver.advanced.retry-policy.max-retries`* – This determines how many times to retry a failed query. If unset, the default is 10. You can adjust this value as needed.
      + *`driver.basic.request.timeout`* – How long the system waits for a query to return. If unset, the default is "5 minutes". You can adjust this value as needed.
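
Applied to the 278 WCUs provisioned in Step 3, the 90% rule of thumb for `executor.maxPerSecond` works out as follows. For the small sample file in this tutorial, the much lower value of 5 is sufficient.

```shell
# target rate = 90% of the table's provisioned write capacity
wcus=278
max_per_second=$(( wcus * 90 / 100 ))
echo "$max_per_second"    # 250
```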

# Step 5: Run the DSBulk `load` command to upload data from the CSV file to the target table
<a name="dsbulk-upload-run"></a>

In the final step of this tutorial, you upload the data into Amazon Keyspaces.

To run the DSBulk `load` command, complete the following steps.

1. Run the following code to upload the data from your CSV file to your Amazon Keyspaces table. Make sure to update the path to the application configuration file you created earlier.

   ```
   dsbulk load -f ./dsbulk_keyspaces.conf  --connector.csv.url keyspace.table.csv -header true --batch.mode DISABLED --executor.maxPerSecond 5 --driver.basic.request.timeout "5 minutes" --driver.advanced.retry-policy.max-retries 10 -k catalog -t book_awards
   ```

1. The output includes the location of a log file that details successful and unsuccessful operations. The file is stored in the following directory.

   ```
   Operation directory: /home/user_name/logs/LOAD_20210308-202317-801911
   ```

1. The log file entries include metrics, as in the following example. Check to make sure that the number of rows is consistent with the number of rows in your CSV file.

   ```
   total | failed | rows/s | p50ms | p99ms | p999ms
      200 |      0 |    200 | 21.63 | 21.89 |  21.89
   ```
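
To cross-check the `total` reported in the log, compare it with the number of data rows in your source file, which is the line count minus the header. The following example uses a hypothetical stand-in file.

```shell
# a stand-in CSV: one header line plus three data rows
printf 'header\nrow1\nrow2\nrow3\n' > sample.csv
data_rows=$(( $(wc -l < sample.csv) - 1 ))
echo "$data_rows"    # 3
```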

**Important**  
Now that you have transferred your data, adjust the capacity mode settings of your target table to match your application’s regular traffic patterns. You incur charges at the hourly rate for your provisioned capacity until you change it. For more information, see [Configure read/write capacity modes in Amazon Keyspaces](ReadWriteCapacityMode.md).