Upgrade the solution - Clickstream Analytics on AWS

Upgrade the solution

Note

Be advised that upgrading directly from version 1.0.x to 1.1.6(+) version is not supported. It is necessary to upgrade to version 1.1.5 first.

Note

By upgrading the web console from earlier 1.1 versions before 1.1.6, you could continue to view the dashboards of the project. However, you could not explore the existing Clickstream data due the changes of data schemas. If you wish to continue to use the Explorations, you will need to upgrade the data pipeline as well and migrate the existing data to new data schemas (if you want to explore historical data).

Planning and Preparation

  1. Backup of Modified QuickSight Analysis and Dashboard: The solution upgrade may involve modifying the out-of-the-box analysis and dashboard. In this case, you can back them up following this documentation.

  2. Data Processing Interval (only applicable to upgrade from v1.0.x): The pipeline upgrade will take about 20 minutes. Make sure no data processing job is running while upgrading the existing pipeline. You can update the existing pipeline to increase the interval and view whether there are running jobs of the EMR Serverless application in the console.

Upgrade Process

Upgrade the web console stack

  1. Log in to AWS CloudFormation console, select your existing web console stack, and choose Update.

  2. Select Replace current template.

  3. Under Specify template:

    • Select Amazon S3 URL.

    • Refer to the table below to find the link for your deployment type.

    • Paste the link in the Amazon S3 URL box.

    • Choose Next.

    Template Description
    Use Cognito for authentication Deploy as public service in AWS Regions
    Use Cognito for authentication with custom domain Deploy as public service with custom domain in AWS Regions
    Use OIDC for authentication Deploy as public service in AWS Regions
    Use OIDC for authentication with custom domain Deploy as public service with custom domain in AWS Regions
    Use OIDC for authentication within VPC Deploy as private service within VPC in AWS Regions
    Use OIDC for authentication with custom domain in AWS China Deploy as public service with custom domain in AWS China Regions
    Use OIDC for authentication within VPC in AWS China Deploy as private service within VPC in AWS China Regions
  4. Under Parameters, review the parameters for the template and modify them as necessary. Refer to Deployment for details about the parameters.

  5. Choose Next.

  6. On the Configure stack options page, choose Next.

  7. On the Review page, review and confirm the settings. Be sure to check the box acknowledging that the template might create (IAM) resources.

  8. Choose View change set and verify the changes.

  9. Choose Execute change set to deploy the stack.

You can view the status of the stack in the AWS CloudFormation console in the Status column. You should receive an UPDATE_COMPLETE status after a few minutes.

Upgrade the pipeline of project

Important

If you encounter any issues during the upgrade process, refer to Troubleshooting for more information.

  1. Log in to the web console of the solution.

  2. Go to Projects, and choose the project to be upgraded.

  3. Choose project id or View Details button.

  4. In the project details page, choose the Upgrade button. You will be prompted to confirm the upgrade action.

  5. Choose Confirm.

You can view the status of the pipeline in the solution console in the Status column. After a few minutes, you can receive an Active status.

Post-Upgrade Actions

This section provides instructions for post-upgrade actions.

Ingestion

As of version 1.1.7, this solution uses launch templates. After upgrading the data ingestion module, complete the following steps to replace the Amazon EC2 instances used by Amazon ECS with the new launch template configuration.

  1. Increase the desired task number by updating the Amazon ECS service.

  2. After the newly added Amazon ECS tasks have started successfully, manually stop the old tasks.

  3. Manually terminate the old Amazon EC2 instances.

Data Modeling

Upgrade the Data Schema and Out-of-the-box Dashboards

The solution automatically and asynchronously upgrades the views and materialized views used by the dashboard after upgrading the pipeline of the project. The duration of the update depends on the workload of the Redshift cluster and the existing data volume, and can take minutes to hours. You can track the progress in the Redshift Schemas section in the Processing tab of the Pipeline Detail page. If the post-configuration job fails, you can access the execution of the workflow through its link and rerun the job via Actions - Redrive or New execution with the input unchanged.

Migrate the existing data (only applicable when upgrading from version earlier than v1.1.6)

  1. Open Redshift query editor v2. For more information, refer to Working with query editor v2 to log in and query data using Redshift query editor v2.

    Note

    The data migration process is CPU-intensive and will incur additional cost. Before starting the migration, ensure that the load on your Redshift is low. It's also advisable to consider temporarily increasing the RPUs of Redshift Serverless or the cluster size when migrating large volumes of data.

    In our benchmark, we migrated 100 million events in 25 minutes using 32 RPUs of Redshift Serverless.

    • Average number of events per day: **10 million**

    • Total events for 30 days: **300 million**

    • RedShift RPU: **32 RPUs**

    • Total duration: **4 hours 45 minutes**

    • Total cost: **$47.77**

  2. Select the Serverless workgroup or provisioned cluster, <project-id>-><app-id>->Tables, and make sure tables for the appId are listed there.

  3. Create a new SQL Editor, select your project's schema.

  4. Execute below SQL in editor. Customize the date range as desired, and execute the following SQL in the editor to migrate events from the past 30 days, or any number of days up to the present, to the new tables.

    -- please replace `<app-id>` with your actual app id -- update the day range based on your need CALL "<app-id>".sp_migrate_data_v2(30);
  5. Wait for the SQL to complete. The execution time depends on the volume of data in events table.

  6. Execute the following SQL to check the stored procedure execution log; ensure there are no errors. If there are any interruptions, timeouts, or other errors, you can re-execute step 4 to continue the data migration.

    -- please replace `<app-id>` with your actual app id SELECT * FROM "<app-id>"."clickstream_log" WHERE log_name = 'sp_migrate_event_to_v2' ORDER BY log_date DESC; SELECT * FROM "<app-id>"."clickstream_log" WHERE log_name = 'sp_migrate_user_to_v2' ORDER BY log_date DESC; SELECT * FROM "<app-id>"."clickstream_log" WHERE log_name = 'sp_migrate_item_to_v2' ORDER BY log_date DESC; SELECT * FROM "<app-id>"."clickstream_log" WHERE log_name = 'sp_migrate_session_to_v2' ORDER BY log_date DESC; SELECT * FROM "<app-id>"."clickstream_log" WHERE log_name = 'sp_migrate_data_to_v2' ORDER BY log_date DESC;
  7. Populate the event data to clickstream_event_base_view table.

    -- please replace `<app-id>` with your actual app id -- update the day range (30 days in below example based on your need CALL "<app-id>".clickstream_event_base_view_sp(NULL, NULL,24*30);
    Note

    It is recommended to refresh the clickstream_event_base_view in batches, especially in the following scenarios:

    • When there are new event load jobs coming in before the migration job completes.

    • When there are new event load jobs coming in before the migration job completes.

    Refreshing the data in batches needs to be done based on the event timestamp. Call the following stored procedure multiple times, in order from old to new event timestamps using below SQL command.

    call "schema".clickstream_event_base_view(start_event_timestamp, end_event_timestamp, 1);

    For example, to refresh data between 2024-05-10 00:00:00 and 2024-05-12 00:00:00, execute the following SQL:

    call "schema".clickstream_event_base_view_sp(TIMESTAMP 'epoch' + 1715270400 * INTERVAL '1 second', TIMESTAMP 'epoch' + 1715443200 * INTERVAL '1 second', 1);

  8. Follow this guide to calculate metrics for the new preset dashboard based on the migrated data.

  9. If your applications no longer use the legacy tables and views, run the SQLs below to clean them to save the storage of Redshift.

    -- please replace `<app-id>` with your actual app id DROP TABLE "<app-id>".event CASCADE; DROP TABLE "<app-id>".item CASCADE; DROP TABLE "<app-id>".user CASCADE; DROP TABLE "<app-id>".event_parameter CASCADE; DROP PROCEDURE "<app-id>".sp_migrate_event_to_v2(nday integer); DROP PROCEDURE "<app-id>".sp_migrate_item_to_v2(nday integer); DROP PROCEDURE "<app-id>".sp_clear_expired_events(retention_range_days integer); DROP PROCEDURE "<app-id>".sp_migrate_data_to_v2(nday integer); DROP PROCEDURE "<app-id>".sp_migrate_user_to_v2(); DROP PROCEDURE "<app-id>".sp_migrate_session_to_v2(); DROP PROCEDURE "<app-id>".sp_clear_item_and_user();