Migrate an on-premises ThoughtSpot Falcon database to Amazon Redshift
Created by Battulga Purevragchaa (AWS) and Antony Prasad Thevaraj (AWS)
Environment: PoC or pilot
Source: On-premises ThoughtSpot Falcon database
Target: Amazon Redshift
R Type: Re-architect
Workload: All other workloads
Technologies: Migration; Databases
AWS services: AWS DMS; Amazon Redshift
Summary
On-premises data warehouses require significant administration time and resources, particularly for large datasets. The financial cost of building, maintaining, and growing these warehouses is also very high. To help manage costs, keep extract, transform, and load (ETL) complexity low, and deliver performance as your data grows, you must constantly choose which data to load and which data to archive.
By migrating your on-premises ThoughtSpot Falcon databases to Amazon Redshift, you can take advantage of a fully managed data warehouse that scales with your data and reduces this administrative overhead.
This pattern describes the process for migrating a ThoughtSpot Falcon database from an on-premises data center to an Amazon Redshift database on the AWS Cloud.
Prerequisites and limitations
Prerequisites
An active AWS account
A ThoughtSpot Falcon database hosted in an on-premises data center
Product versions
ThoughtSpot version 7.0.1
Architecture
The diagram shows the following workflow:
Data is hosted in an on-premises relational database.
AWS Schema Conversion Tool (AWS SCT) converts the source database's data definition language (DDL) into a format that is compatible with Amazon Redshift.
After the tables are created, you can migrate the data by using AWS Database Migration Service (AWS DMS).
The data is loaded into Amazon Redshift.
The data is stored in Amazon Simple Storage Service (Amazon S3) if you use Redshift Spectrum or already host the data in Amazon S3.
Tools
AWS DMS – AWS Database Migration Service (AWS DMS) helps you quickly and securely migrate databases to AWS.
Amazon Redshift – Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
AWS SCT – AWS Schema Conversion Tool (AWS SCT) converts your existing database schema from one database engine to another.
Epics
Task | Description | Skills required |
---|---|---|
Identify the appropriate Amazon Redshift configuration. | Identify the appropriate Amazon Redshift cluster configuration based on your requirements and data volume. For more information, see Amazon Redshift clusters in the Amazon Redshift documentation. | DBA |
Research Amazon Redshift to evaluate if it meets your requirements. | Use the Amazon Redshift FAQs to evaluate whether Amazon Redshift meets your requirements. | DBA |
Task | Description | Skills required |
---|---|---|
Create an Amazon Redshift cluster. | Sign in to the AWS Management Console, open the Amazon Redshift console, and then create an Amazon Redshift cluster in a virtual private cloud (VPC). For more information, see Creating a cluster in a VPC in the Amazon Redshift documentation. | DBA |
Conduct a PoC for your Amazon Redshift database design. | Follow Amazon Redshift best practices by conducting a proof of concept (PoC) for your database design. For more information, see Conducting a proof of concept for Amazon Redshift in the Amazon Redshift documentation. | DBA |
Create database users. | Create the users in your Amazon Redshift database and grant the appropriate privileges for access to the schema and tables (see the sketch after this table). For more information, see Grant access privileges for a user or user group in the Amazon Redshift documentation. | DBA |
Apply configuration settings to the target database. | Apply configuration settings to the Amazon Redshift database according to your requirements. For more information about enabling database, session, and server-level parameters, see the Configuration reference in the Amazon Redshift documentation. | DBA |
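The following is a minimal sketch of creating a user and granting schema access. The user name `analyst`, the schema name `sales`, and the password are hypothetical placeholders; substitute your own values.

```sql
-- Create a database user (hypothetical name and password; replace with your own).
CREATE USER analyst PASSWORD 'Str0ngPassw0rd!';

-- Allow the user to access objects in the sales schema (hypothetical schema name).
GRANT USAGE ON SCHEMA sales TO analyst;

-- Grant read access to all current tables in the schema.
GRANT SELECT ON ALL TABLES IN SCHEMA sales TO analyst;
```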
Task | Description | Skills required |
---|---|---|
Manually create tables with DDL in Amazon Redshift. | (Optional) If you use AWS SCT, the tables are created automatically. However, if any DDL statements fail to replicate, you must create those tables manually (see the DDL sketch after this table). | DBA |
Create external tables for Redshift Spectrum. | Create an external table with an external schema for Amazon Redshift Spectrum (see the sketch after this table). To create external tables, you must be the owner of the external schema or a database superuser. For more information, see Creating external tables for Amazon Redshift Spectrum in the Amazon Redshift documentation. | DBA |
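The following sketches show manually creating a table with distribution and sort keys, and creating an external schema and table for Redshift Spectrum. The table definitions, the AWS Glue catalog database, the IAM role ARN, and the S3 path are all hypothetical; in practice, use the DDL that AWS SCT generates from your Falcon schema.

```sql
-- Manually create a table if AWS SCT fails to replicate the DDL
-- (hypothetical table; use the DDL generated from your Falcon schema).
CREATE TABLE sales.orders (
    order_id    BIGINT        NOT NULL,
    order_date  DATE          NOT NULL,
    customer_id INTEGER,
    amount      DECIMAL(12,2)
)
DISTKEY (customer_id)
SORTKEY (order_date);

-- Create an external schema backed by the AWS Glue Data Catalog
-- (hypothetical catalog database name and IAM role ARN).
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Create an external table over data already stored in Amazon S3
-- (hypothetical S3 location).
CREATE EXTERNAL TABLE spectrum_schema.orders_archive (
    order_id   BIGINT,
    order_date DATE,
    amount     DECIMAL(12,2)
)
STORED AS PARQUET
LOCATION 's3://amzn-s3-demo-bucket/orders-archive/';
```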
Task | Description | Skills required |
---|---|---|
Use AWS DMS to migrate the data. | After you create the DDL of the tables in the Amazon Redshift database, migrate your data to Amazon Redshift by using AWS DMS. For detailed steps and instructions, see Using an Amazon Redshift database as a target for AWS DMS in the AWS DMS documentation. | DBA |
Use the COPY command to load the data. | Use the Amazon Redshift COPY command to load the data from Amazon S3 (see the sketch after this table). For more information, see Using the COPY command to load from Amazon S3 in the Amazon Redshift documentation. | DBA |
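A minimal COPY sketch, assuming your files are CSV with a header row; the bucket, prefix, and IAM role ARN are hypothetical placeholders.

```sql
-- Load data from Amazon S3 into the target table
-- (hypothetical bucket, prefix, and IAM role ARN).
COPY sales.orders
FROM 's3://amzn-s3-demo-bucket/orders/'
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1
REGION 'us-east-1';  -- needed only if the bucket is in a different AWS Region
```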
Task | Description | Skills required |
---|---|---|
Validate the source and target records. | Compare the table and row counts between the source system and Amazon Redshift to confirm that all records were loaded (see the validation query after this table). | DBA |
Implement Amazon Redshift best practices for performance tuning. | Implement Amazon Redshift best practices for table and database design. For more information, see the blog post Top 10 performance tuning techniques for Amazon Redshift. | DBA |
Optimize query performance. | Amazon Redshift uses SQL-based queries to interact with data and objects in the system. Data manipulation language (DML) is the subset of SQL that you can use to view, add, change, and delete data. DDL is the subset of SQL that you use to add, change, and delete database objects such as tables and views. For more information, see Tuning query performance in the Amazon Redshift documentation. | DBA |
Implement WLM. | You can use workload management (WLM) to define multiple query queues and route queries to appropriate queues at runtime. For more information, see Implementing workload management in the Amazon Redshift documentation. | DBA |
Work with concurrency scaling. | By using the Concurrency Scaling feature, you can support virtually unlimited concurrent users and concurrent queries, with consistently fast query performance. For more information, see Working with concurrency scaling in the Amazon Redshift documentation. | DBA |
Use Amazon Redshift best practices for table design. | When you plan your database, certain important table design decisions can strongly influence overall query performance. For more information about choosing the most appropriate table design option, see Amazon Redshift best practices for designing tables in the Amazon Redshift documentation. | DBA |
Create materialized views in Amazon Redshift. | A materialized view contains a precomputed result set based on an SQL query over one or more base tables. You can issue SELECT statements to query a materialized view in the same way that you query other tables or views in the database (see the sketch after this table). For more information, see Creating materialized views in Amazon Redshift in the Amazon Redshift documentation. | DBA |
Define joins between the tables. | To search more than one table at the same time in ThoughtSpot, you must define joins between the tables by specifying columns that contain matching data across the two tables. These columns typically represent the primary and foreign keys of the join. You can define the joins in the ThoughtSpot user interface. For more information, see the ThoughtSpot documentation. | DBA |
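The following sketches illustrate the row-count validation and a materialized view; the table, column, and view names are hypothetical and continue the earlier examples.

```sql
-- Compare this count with the row count reported by the source Falcon table.
SELECT COUNT(*) AS target_row_count FROM sales.orders;

-- Precompute an aggregate that dashboards query frequently
-- (hypothetical view over the hypothetical sales.orders table).
CREATE MATERIALIZED VIEW sales.mv_daily_revenue AS
SELECT order_date, SUM(amount) AS total_revenue
FROM sales.orders
GROUP BY order_date;

-- Refresh the view after new data is loaded into the base table.
REFRESH MATERIALIZED VIEW sales.mv_daily_revenue;
```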
Task | Description | Skills required |
---|---|---|
Add an Amazon Redshift connection. | Add an Amazon Redshift connection to your on-premises ThoughtSpot Falcon database. For more information, see Add an Amazon Redshift connection in the ThoughtSpot documentation. | DBA |
Edit the Amazon Redshift connection. | You can edit the Amazon Redshift connection to add tables and columns. For more information, see Edit an Amazon Redshift connection in the ThoughtSpot documentation. | DBA |
Remap the Amazon Redshift connection. | Modify the connection parameters by editing the source mapping .yaml file that was created when you added the Amazon Redshift connection. For example, you can remap an existing table or column to a different table or column in an existing database connection. ThoughtSpot recommends that you check the dependencies before and after you remap a table or column in a connection to ensure that they display as required. For more information, see Remap an Amazon Redshift connection in the ThoughtSpot documentation. | DBA |
Delete a table from the Amazon Redshift connection. | (Optional) If you attempt to remove a table in an Amazon Redshift connection, ThoughtSpot checks for dependencies and shows a list of dependent objects. You can choose the listed objects to delete them or remove the dependency. You can then remove the table. For more information, see Delete a table from an Amazon Redshift connection in the ThoughtSpot documentation. | DBA |
Delete a table with dependent objects from an Amazon Redshift connection. | (Optional) If you try to delete a table with dependent objects, the operation is blocked. A message appears that lists the dependent objects; delete those objects or remove the dependencies before you delete the table. For more information, see Delete a table with dependent objects from an Amazon Redshift connection in the ThoughtSpot documentation. | DBA |
Delete an Amazon Redshift connection. | (Optional) Because a connection can be used in multiple data sources or visualizations, you must delete all of the sources and tasks that use that connection before you can delete the Amazon Redshift connection. For more information, see Delete an Amazon Redshift connection in the ThoughtSpot documentation. | DBA |
Check the connection reference for Amazon Redshift. | Make sure that you provide the required information for your Amazon Redshift connection by using the Connection reference in the ThoughtSpot documentation. | DBA |