Assess query performance for migrating SQL Server databases to MongoDB Atlas on AWS
Created by Battulga Purevragchaa (AWS), Krishnakumar Sathyanarayana (PeerIslands US Inc), and Babu Srinivasan (MongoDB)
Environment: PoC or pilot | Source: Microsoft SQL Server | Target: MongoDB Atlas or MongoDB Enterprise Advanced |
R Type: Replatform | Workload: Microsoft | Technologies: Databases; Migration |
Summary
This pattern provides guidance for loading MongoDB with near real-world data and assessing MongoDB query performance that is as close to the production scenario as possible. The assessment provides input to help you plan your migration to MongoDB from a relational database. The pattern uses PeerIslands Test Data Generator and Performance Analyzer
This pattern is particularly useful for Microsoft SQL Server migration to MongoDB, because performing schema transformations and loading data from current SQL Server instances to MongoDB can be very complex. Instead, you can load near real-world data into MongoDB, understand MongoDB performance, and fine-tune the schema design before you start the actual migration.
Prerequisites and limitations
Prerequisites
An active AWS account
Familiarity with MongoDB Atlas
Target MongoDB schema
Typical query patterns
Limitations
Data load times and performance will be limited by the MongoDB cluster instance size. We recommend that you choose instances that are recommended for production use to understand real-world performance.
PeerIslands Test Data Generator and Performance Analyzer currently supports only online data loads and queries. Offline batch processing (for example, loading data into MongoDB by using Spark connectors) isn’t yet supported.
PeerIslands Test Data Generator and Performance Analyzer supports field relations within a collection. It doesn’t support relationships across collections.
Product editions
This pattern supports both MongoDB Atlas
and MongoDB Enterprise Advanced .
Architecture
Target technology stack
MongoDB Atlas or MongoDB Enterprise Advanced
Architecture
PeerIslands Test Data Generator and Performance Analyzer is built by using Java and Angular, and stores its generated data on Amazon Elastic Block Store (Amazon EBS). The tool consists of two workflows: test data generation and performance testing.
In test data generation, you create a template, which is the JSON representation of the data model that has to be generated. After you create the template, you can generate the data in a target collection, as defined by the load generation configuration.
In performance testing, you create a profile. A profile is a multi-stage testing scenario where you can configure create, read, update, and delete (CRUD) operations, aggregation pipelines, the weightage for each operation, and the duration of each stage. After you create the profile, you can run performance testing on the target database based on the configuration.
PeerIslands Test Data Generator and Performance Analyzer stores its data on Amazon EBS, so you can connect Amazon EBS to MongoDB by using any MongoDB-supported connection mechanism, including peering, allow lists, and private endpoints. By default, the tool doesn’t include operational components; however, it can be configured with Amazon Managed Service for Prometheus, Amazon Managed Grafana, Amazon CloudWatch, and AWS Secrets Manager if required.
Tools
PeerIslands Test Data Generator and Performance Analyzer
includes two components. The Test Data Generator component helps you generate highly customer-specific, real-world data based on your MongoDB schema. The tool is fully UI-driven with a rich data library and can be used to quickly generate billions of records on MongoDB. The tool also provides capabilities to implement relationships between fields in the MongoDB schema. The Performance Analyzer component helps you generate highly customer-specific queries and aggregations, and perform realistic performance testing on MongoDB. You can use the Performance Analyzer to test MongoDB performance with rich load profiles and parameterized queries for your specific use case.
Best practices
See the following resources:
MongoDB Schema Design Best Practices
(MongoDB Developer website) Best Practices of Deploying MongoDB Atlas on AWS
(MongoDB website) Connecting Applications Securely to a MongoDB Atlas Data Plane with AWS PrivateLink
(AWS blog post) Best Practices Guide for MongoDB Performance
(MongoDB website)
Epics
Task | Description | Skills required |
---|---|---|
Understand the database footprint of the current SQL Server source. | Understand your current SQL Server footprint. This can be achieved by running queries against the | DBA |
Understand the source schema. | Determine the table schema and the business representation of the data (for example, zip codes, names, and currency). Use your existing entity relationship (ER) diagram or generate the ER diagram from the existing database. For more information, see the blog post SQL2Mongo: Data Migration Journey | DBA |
Understand query patterns. | Document the top 10 SQL queries you use. You can use the performance_schema.events_statements_summary_by_digest tables that are available in the database to understand the top queries. For more information, see the blog post SQL2Mongo: Data Migration Journey | DBA |
Understand SLA commitments. | Document the target service-level agreements (SLAs) for database operations. Typical measures include query latency and queries per second. The measures and their targets are typically available in non-functional requirements (NFR) documents. | DBA |
Task | Description | Skills required |
---|---|---|
Define the target schema. | Define various options for the target MongoDB schema. For more information, see Schemas | MongoDB engineer |
Define target query patterns. | Define MongoDB queries and aggregation pipelines. These queries are the equivalent of the top queries you captured for your SQL Server workload. To understand how to construct MongoDB aggregation pipelines, see the MongoDB documentation | MongoDB engineer |
Define the MongoDB instance type. | Determine the size of the instance that you plan to use for testing. For guidance, see the MongoDB documentation | MongoDB engineer |
Task | Description | Skills required |
---|---|---|
Set up the MongoDB Atlas cluster. | To set up a MongoDB cluster on AWS, follow the instructions in the MongoDB documentation | MongoDB engineer |
Create users in the target database. | Configure the MongoDB Atlas cluster for access and network security by following the instructions in the MongoDB documentation | MongoDB engineer |
Create appropriate roles in AWS and configure role-based access control for Atlas. | If required, set up additional users by following the instructions in the MongoDB documentation | MongoDB engineer |
Set up Compass for MongoDB Atlas access. | Set up the MongoDB Compass GUI utility | MongoDB engineer |
Task | Description | Skills required |
---|---|---|
Install Test Data Generator. | Install PeerIsland Test Data Generator | MongoDB engineer |
Configure Test Data Generator to generate the appropriate data. | Create a template by using the data library to generate specific data for each field in the MongoDB schema. For more information, see the MongoDB Data Generator & Perf. Analyzer | MongoDB engineer |
Horizontally scale Test Data Generator to generate the required load. | Use the template you created to start the load generation against the target collection by configuring the required parallelism. Determine the time frames and scale to generate the necessary data. | MongoDB engineer |
Validate the load in MongoDB Atlas. | Check the data loaded into MongoDB Atlas. | MongoDB engineer |
Generate required indexes on MongoDB. | Define indexes as required, based on query patterns. For best practices, see the MongoDB documentation | MongoDB engineer |
Task | Description | Skills required |
---|---|---|
Set up load profiles in Performance Analyzer. | Create a performance testing profile in Performance Analyzer by configuring specific queries and their corresponding weightage, duration of the test run, and stages. For more information, see the MongoDB Data Generator & Perf. Analyzer | MongoDB engineer |
Run performance testing. | Use the performance testing profile you created to start the test against the target collection by configuring the required parallelism. Horizontally scale the performance test tool to run queries against MongoDB Atlas. | MongoDB engineer |
Record test results. | Record P95, P99 latency for the queries. | MongoDB engineer |
Tune your schema and query patterns. | Modify indexes and query patterns to address any performance issues. | MongoDB engineer |
Task | Description | Skills required |
---|---|---|
Shut down temporary AWS resources. | Delete all temporary resources that you used for Test Data Generator and Performance Analyzer. | AWS administrator |
Update performance test results. | Understand MongoDB query performance and compare it against your SLAs. If necessary, fine-tune the MongoDB schema and rerun the process. | MongoDB engineer |
Conclude the project. | Close out the project and provide feedback. | MongoDB engineer |
Related resources
GitHub repository: S3toAtlas
Schema: MongoDB Schema design
Aggregation pipelines : MongoDB aggregation pipelines
MongoDB Atlas sizing : Sizing tier selection
Video: MongoDB Data Generator
& Perf. Analyzer References: MongoDB documentation
Tutorials: MongoDB developer guide,
MongoDB Jumpstart AWS Marketplace: MongoDB Atlas on AWS Marketplace
AWS Partner Solutions: MongoDB Atlas on AWS Reference Deployment
Additional resources: