What is Timestream for InfluxDB?

Amazon Timestream for InfluxDB is a managed time-series database engine that makes it easy for application developers and DevOps teams to run InfluxDB databases on AWS for real-time time-series applications using open-source APIs. With Amazon Timestream for InfluxDB, it is easy to set up, operate, and scale time-series workloads that can answer queries with single-digit millisecond query response time.

Amazon Timestream for InfluxDB gives you access to the capabilities of the familiar open source version of InfluxDB on its 2.x branch. This means that the code, applications, and tools you already use today with your existing InfluxDB open-source databases should work seamlessly with Amazon Timestream for InfluxDB. Amazon Timestream for InfluxDB can automatically back up your database and keep your database software up to date with the latest version. In addition, Amazon Timestream for InfluxDB makes it easy to use replication to enhance database availability, and improve data durability. As with all AWS services, there are no upfront investments required, and you pay only for the resources you use.

DB instances

A DB instance is an isolated database environment running in the cloud. It is the basic building block of Amazon Timestream for InfluxDB. A DB instance can contain multiple user-created databases (or organizations and buckets in the case of InfluxDB 2.x databases), and can be accessed using the same client tools and applications you might use to access a standalone self-managed InfluxDB instance. DB instances are simple to create and modify with the AWS command line tools, Amazon Timestream InfluxDB API operations, or the AWS Management Console.

Note

Amazon Timestream for InfluxDB supports access to databases using the Influx API operations and Influx UI. Amazon Timestream for InfluxDB does not allow direct host access.

You can have up to 40 Amazon Timestream for InfluxDB instances.

Each DB instance has a DB instance name. This customer-supplied name uniquely identifies the DB instance when interacting with the Amazon Timestream for InfluxDB API and AWS CLI commands. The DB instance name must be unique for that customer in an AWS Region.

The DB instance name forms part of the DNS hostname allocated to your instance by Timestream for InfluxDB. For example, if you specify influxdb1 as the DB instance name, Timestream will automatically allocate a DNS endpoint for your instance. An example endpoint is influxdb1-3ksj4dla5nfjhi.us-east-1.timestream-influxdb.amazonaws.com, where influxdb1 is your instance name.

In the example endpoint influxdb1-3ksj4dla5nfjhi.us-east-1.timestream-influxdb.amazonaws.com, the string 3ksj4dla5nfjhi is a unique account identifier generated by AWS. The identifier 3ksj4dla5nfjhi in the example doesn't change for a given account in a given AWS Region. Therefore, all your DB instances created by this account share the same fixed identifier. Consider the following features of the fixed identifier:

  • Currently Timestream for InfluxDB does not support DB instance renaming.

  • If you delete and re-create a DB instance with the same DB instance identifier, the endpoint is the same.

  • If you use the same account to create a DB instance in a different Region, the internally generated identifier is different because the Region is different, as in influxdb2-4a3j5du5ks7md2.us-west-1.timestream-influxdb.amazonaws.com.

Each DB instance supports only one Timestream for InfluxDB database engine.

When creating a DB instance, InfluxDB requires that an organization name be specified. A DB instance can host multiple organizations and multiple buckets associated with each organization.

Amazon Timestream for InfluxDB allows you to create a master user account and password for your DB instance as part of the creation process. This master user has permissions to create organizations and buckets, and to perform read, write, delete, and upsert operations on your data. You can also access the Influx UI and retrieve your operator token on your first login. From there, you can manage all of your access tokens as well. You must set the master user password when you create a DB instance, but you can change it at any time using the Influx API, Influx CLI, or the Influx UI.
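
For illustration, the following minimal sketch shows how an application might connect to a DB instance once you have retrieved a token from the Influx UI. It assumes the open-source influxdb-client Python package; the endpoint, organization, and token values are placeholders, not real identifiers.

    # Minimal connection sketch using the open-source influxdb-client package.
    # The endpoint, token, and organization below are placeholders.
    from influxdb_client import InfluxDBClient

    url = "https://influxdb1-3ksj4dla5nfjhi.us-east-1.timestream-influxdb.amazonaws.com:8086"
    token = "YOUR_INFLUXDB_TOKEN"  # for example, the operator token shown on first login
    org = "my-org"

    with InfluxDBClient(url=url, token=token, org=org) as client:
        print(client.ping())  # returns True if the instance endpoint is reachable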

DB instance classes

The DB instance class determines the computation and memory capacity of an Amazon Timestream for InfluxDB DB instance. The DB instance class that you need depends on your processing power and memory requirements.

A DB instance class consists of both the DB instance class type and the size. For example, db.influx is a memory-optimized DB instance class type suitable for the high performance memory requirements related to running InfluxDB workloads. Within the db.influx instance class type, db.influx.2xlarge is a DB instance class. The size of this class is 2xlarge.

For more information about instance class pricing, see Amazon Timestream for InfluxDB pricing.

DB instance class types

Amazon Timestream for InfluxDB supports DB instance classes optimized for the following InfluxDB use cases.

  • db.influx—These instance classes are ideal for running memory-intensive workloads in open-source InfluxDB databases.

Hardware specifications for DB instance classes

The following terminology describes the hardware specifications for DB instance classes:

  • vCPU

    The number of virtual central processing units (CPUs). A virtual CPU is a unit of capacity that you can use to compare DB instance classes.

  • Memory (GiB)

    The RAM, in gibibytes, allocated to the DB instance. There is often a consistent ratio between memory and vCPU. As an example, take the db.influx instance class, which has a memory to vCPU ratio similar to the EC2 r7g instance class.

  • Influx-Optimized

    The DB instance uses an optimized configuration stack and provides additional, dedicated capacity for I/O. This optimization provides the best performance by minimizing contention between I/O and other traffic from your instance.

  • Network bandwidth

    The network speed relative to other DB instance classes.

In the following table, you can find hardware details about the Amazon Timestream for InfluxDB instance classes.

Instance class vCPU Memory (GiB) Storage type Network bandwidth (Gbps)
db.influx.medium 1 8 Influx IOPS Included 10
db.influx.large 2 16 Influx IOPS Included 10
db.influx.xlarge 4 32 Influx IOPS Included 10
db.influx.2xlarge 8 64 Influx IOPS Included 10
db.influx.4xlarge 16 128 Influx IOPS Included 10
db.influx.8xlarge 32 256 Influx IOPS Included 12
db.influx.12xlarge 48 384 Influx IOPS Included 20
db.influx.16xlarge 64 512 Influx IOPS Included 25

InfluxDB instance storage

DB instances for Amazon Timestream for InfluxDB use Influx IOPS Included volumes for databases and log storage.

In some cases, your database workload might not be able to achieve 100 percent of the IOPS that you have provisioned. For more information, see Factors that affect storage performance. For more information about Timestream for InfluxDB storage pricing, see Amazon Timestream pricing.

Amazon Timestream for InfluxDB storage types

Amazon Timestream for InfluxDB provides support for one storage type, Influx IOPS Included. You can create Timestream for InfluxDB instances with up to 16 tebibytes (TiB) of storage.

Here is a brief description of the available storage type:

  • Influx IOPS Included storage: Storage performance is the combination of I/O operations per second (IOPS) and how fast the storage volume can perform reads and writes (storage throughput). On Influx IOPS Included storage volumes, Amazon Timestream for InfluxDB provides three storage tiers that come preconfigured with the IOPS and throughput required for different types of workloads.

InfluxDB instance sizing

The optimal configuration of a Timestream for InfluxDB instance depends on many factors, including ingestion rate, batch sizes, time-series cardinality, concurrent queries, and query types. To provide sizing recommendations, we focus on an example workload with the following characteristics:

  • Data is collected and written by a fleet of Telegraf agents that gather system, CPU, memory, disk, and I/O metrics from a data center.

    Each write request contains 5000 lines.

  • The queries executed on the system are categorized as “moderate complexity” queries. This category of queries has the following characteristics:

    • Have multiple functions and one or two regular expressions

    • May also have group by clauses or sample a time range of multiple weeks.

    • Typically take a few hundred milliseconds to a couple of seconds to execute.

    • Query performance is driven mainly by CPU.

Instance class Storage Type Writes (lines per second) Reads (Queries per second)
db.influx.large Influx IO Included 3K ~50,000 <10
db.influx.2xlarge Influx IO Included 3K ~150,000 <25
db.influx.4xlarge Influx IO Included 3K ~200,000 ~25
db.influx.4xlarge Influx IO Included 12K ~250,000 ~35
db.influx.8xlarge Influx IO Included 12K ~500,000 ~50
db.influx.12xlarge Influx IO Included 12K <750,000 <100

AWS Regions and Availability Zones

Amazon cloud computing resources are hosted in multiple locations world-wide. These locations are composed of AWS Regions and Availability Zones. Each AWS Region is a separate geographic area. Each AWS Region has multiple, isolated locations known as Availability Zones.

Note

For information about finding the Availability Zones for an AWS Region, see Regions and Zones in the Amazon EC2 User Guide.

Amazon Timestream for InfluxDB enables you to place resources, such as DB instances, and data in multiple locations.

Amazon operates state-of-the-art, highly-available data centers. Although rare, failures can occur that affect the availability of DB instances that are in the same location. If you host all your DB instances in one location that is affected by such a failure, none of your DB instances will be available.

It is important to remember that each AWS Region is completely independent. Any Amazon Timestream for InfluxDB activity you initiate (for example, creating database instances or listing available database instances) runs only in your current default AWS Region. The default AWS Region can be changed in the console, or by setting the AWS_DEFAULT_REGION environment variable. Or it can be overridden by using the --region parameter with the AWS Command Line Interface (AWS CLI). For more information, see Configuring the AWS Command Line Interface, specifically the sections about environment variables and command line options.

To create or work with an Amazon Timestream for InfluxDB DB instance in a specific AWS Region, use the corresponding regional service endpoint.
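
For example, with the AWS SDK for Python (Boto3) you choose the Region when you create the service client. The following is a hedged sketch: the boto3 service name "timestream-influxdb", the list_db_instances operation, and the response field names follow the Timestream for InfluxDB API but should be verified against your SDK version's documentation.

    # Sketch: create the SDK client in a specific AWS Region and list DB instances.
    # Service name, operation, and response fields should be checked against the
    # current SDK documentation.
    import boto3

    client = boto3.client("timestream-influxdb", region_name="us-west-2")
    response = client.list_db_instances()
    for instance in response.get("items", []):
        print(instance.get("name"), instance.get("status"))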

AWS Region availability

The following table shows the AWS Regions where Amazon Timestream for InfluxDB is currently available and the endpoint for each Region.

AWS Region name Region Endpoint Protocol
US East (N. Virginia) us-east-1 timestream-influxdb.us-east-1.amazonaws.com HTTPS
US East (Ohio) us-east-2 timestream-influxdb.us-east-2.amazonaws.com HTTPS
US West (Oregon) us-west-2 timestream-influxdb.us-west-2.amazonaws.com HTTPS
Asia Pacific (Mumbai) ap-south-1 timestream-influxdb.ap-south-1.amazonaws.com HTTPS
Asia Pacific (Singapore) ap-southeast-1 timestream-influxdb.ap-southeast-1.amazonaws.com HTTPS
Asia Pacific (Sydney) ap-southeast-2 timestream-influxdb.ap-southeast-2.amazonaws.com HTTPS
Asia Pacific (Tokyo) ap-northeast-1 timestream-influxdb.ap-northeast-1.amazonaws.com HTTPS
Europe (Frankfurt) eu-central-1 timestream-influxdb.eu-central-1.amazonaws.com HTTPS
Europe (Ireland) eu-west-1 timestream-influxdb.eu-west-1.amazonaws.com HTTPS
Europe (Stockholm) eu-north-1 timestream-influxdb.eu-north-1.amazonaws.com HTTPS

AWS Regions design

Each AWS Region is designed to be isolated from the other AWS Regions. This design achieves the greatest possible fault tolerance and stability.

When you view your resources, you see only the resources that are tied to the AWS Region that you specified. This is because AWS Regions are isolated from each other, and we don't automatically replicate resources across AWS Regions.

AWS Availability Zones

When you create a DB instance, Amazon Timestream for InfluxDB chooses an Availability Zone for you randomly based on your subnet configuration. An Availability Zone is represented by an AWS Region code followed by a letter identifier (for example, us-east-1a).

Use the describe-availability-zones Amazon EC2 command as follows to describe the Availability Zones within the specified Region that are enabled for your account.

aws ec2 describe-availability-zones --region region-name

For example, to describe the Availability Zones within the US East (N. Virginia) Region (us-east-1) that are enabled for your account, run the following command:

aws ec2 describe-availability-zones --region us-east-1

You can't choose the Availability Zones for the primary and secondary DB instances in a Multi-AZ DB deployment. Amazon Timestream for InfluxDB chooses them for you randomly. For more information about Multi-AZ deployments, see Configuring and managing a multi-AZ deployment.

DB Instance billing for Amazon Timestream for InfluxDB

Amazon Timestream for InfluxDB instances are billed based on the following components:

  • DB instance hours (per hour) — Based on the DB instance class of the DB instance, for example, db.influx.large. Pricing is listed on a per-hour basis, but bills are calculated down to the second and show times in decimal form. Amazon Timestream for InfluxDB usage is billed in 1-second increments, with a minimum of 10 minutes. For more information, see DB instance classes.

  • Storage (per GiB per month) — Storage capacity that you have provisioned to your DB instance. For more information, see InfluxDB instance storage.

  • Data transfer (per GB) — Data transfer in and out of your DB instance from or to the internet and other AWS Regions.

For Amazon Timestream for InfluxDB pricing information, see the Amazon Timestream for InfluxDB pricing page.

Setting up Amazon Timestream for InfluxDB

Before you use Amazon Timestream for InfluxDB for the first time, complete the following tasks:

If you already have an AWS account, know your Amazon Timestream for InfluxDB requirements, and prefer to use the defaults for IAM and VPC, skip ahead to Getting started with Amazon Timestream for InfluxDB.

Sign up for an AWS account

If you do not have an AWS account, complete the following steps to create one.

To sign up for an AWS account

  • Go to the AWS Sign in page.

  • Choose Create a new account and then follow the instructions.

    Note

    Part of the sign-up procedure involves receiving a phone call and entering a verification code on the phone keypad.

When you sign up for an AWS account, an AWS account root user is created. The root user has access to all AWS services and resources in the account. As a security best practice, assign administrative access to an administrative user, and use only the root user to perform tasks that require root user access.

AWS sends you a confirmation email after the sign-up process is complete. At any time, you can view your current account activity and manage your account by going to https://aws.amazon.com/ and choosing My Account.

User Management

Create an administrative user

After you sign up for an AWS account, create an administrative user so that you don't use the root user for everyday tasks.

Secure your AWS account root user

Sign in to the AWS Management Console as the account owner by choosing Root user and entering your AWS account email address. On the next page, enter your password. For help signing in by using the root user, see Signing in as the root user in the AWS Sign-In User Guide.

Turn on multi-factor authentication (MFA) for your root user. For instructions, see Enable a virtual MFA device for your AWS account root user (console) in the IAM User Guide.

Grant programmatic access

Users need programmatic access if they want to interact with AWS outside of the AWS Management Console. The way to grant programmatic access depends on the type of user that's accessing AWS.

To grant users programmatic access, choose one of the following options:

  • Workforce identity (users managed in IAM Identity Center): Use temporary credentials to sign programmatic requests to the AWS CLI, AWS SDKs, or AWS APIs. Follow the instructions for the interface that you want to use:

    • For the AWS CLI, see Configuring the AWS CLI to use AWS IAM Identity Center in the AWS Command Line Interface User Guide.

    • For AWS SDKs, tools, and AWS APIs, see IAM Identity Center authentication in the AWS SDKs and Tools Reference Guide.

  • IAM: Use temporary credentials to sign programmatic requests to the AWS CLI, AWS SDKs, and AWS APIs. Follow the instructions in Using temporary credentials with AWS resources in the IAM User Guide.

  • IAM (Not recommended): Use long-term credentials to sign programmatic requests to the AWS CLI, AWS SDKs, and AWS APIs. Follow the instructions for the interface that you want to use:

    • For the AWS CLI, see Authenticating using IAM user credentials in the AWS Command Line Interface User Guide.

    • For AWS SDKs and tools, see Authenticate using long-term credentials in the AWS SDKs and Tools Reference Guide.

    • For AWS APIs, see Managing access keys for IAM users in the IAM User Guide.

Determine requirements

The basic building block of Amazon Timestream for InfluxDB is the DB instance. In a DB instance, you create your buckets. A DB instance provides a network address called an endpoint. Your applications use this endpoint to connect to your DB instance. You also access the Influx UI from your browser using this same endpoint. When you create a DB instance, you specify details like storage, memory, database engine and version, network configuration, and security. You control network access to a DB instance through a security group.

Before you create a DB instance and a security group, you must know your DB instance and network needs. Here are some important things to consider:

  • Resource requirements — What are the memory and processor requirements for your application or service? You use these settings to help you determine what DB instance class to use. For specifications about DB instance classes, see DB instance classes.

  • VPC and security group — Your DB instance will most likely be in a virtual private cloud (VPC). To connect to your DB instance, you need to set up security group rules. These rules are set up differently depending on what kind of VPC you use and how you use it. For example, you can use a default VPC or a user-defined VPC.

    The following list describes the rules for each VPC option:

    • Default VPC — If your AWS account has a default VPC in the current AWS Region, that VPC is configured to support DB instances. If you specify the default VPC when you create the DB instance, make sure to create a VPC security group that authorizes connections from the application or service to the Amazon Timestream for InfluxDB DB instance. Use the Security Group option on the VPC console or the AWS CLI to create VPC security groups. For more information, see Step 3: Create a VPC security group.

  • User-defined VPC — If you want to specify a user-defined VPC when you create a DB instance, be aware of the following:

    • Make sure to create a VPC security group that authorizes connections from the application or service to the Amazon Timestream for InfluxDB DB instance. Use the Security Group option on the VPC console or the AWS CLI to create VPC security groups. For information, see Step 3: Create a VPC security group.

    • The VPC must meet certain requirements in order to host DB instances, such as having at least two subnets, each in a separate Availability Zone. For information, see Amazon VPC VPCs and Amazon Timestream for InfluxDB.

  • High availability — Do you need failover support? On Amazon Timestream for InfluxDB, a Multi-AZ deployment creates a primary DB instance and a secondary standby DB instance in another Availability Zone for failover support. We recommend Multi-AZ deployments for production workloads to maintain high availability. For development and test purposes, you can use a deployment that isn't Multi-AZ. For more information, see Multi-AZ DB instance deployments.

  • IAM policies — Does your AWS account have policies that grant the permissions needed to perform Amazon Timestream for InfluxDB operations? If you are connecting to AWS using IAM credentials, your IAM account must have IAM policies that grant the permissions required to perform Amazon Timestream for InfluxDB control plane operations. For more information, see Identity and Access Management for Amazon Timestream for InfluxDB.

  • Open ports — What TCP/IP port does your database listen on? The firewalls at some companies might block connections to the default port for your database engine. The default for Timestream for InfluxDB is 8086.

  • AWS Region — What AWS Region do you want your database in? Having your database in close proximity to your application or web service can reduce network latency. For more information, see AWS Regions and Availability Zones.

  • DB disk subsystem — What are your storage requirements? Amazon Timestream for InfluxDB provides three configurations for its Influx IOPS Included storage type:

    • Influx IOPS Included 3K IOPS (SSD)

    • Influx IOPS Included 12K IOPS (SSD)

    • Influx IOPS Included 25K IOPS (SSD)

    For more information on Amazon Timestream for InfluxDB storage, see Amazon Timestream for InfluxDB DB instance storage.

When you have the information you need to create the security group and the DB instance, continue to the next step.

Provide access to your DB instance in your VPC by creating a security group

VPC security groups provide access to DB instances in a VPC. They act as a firewall for the associated DB instance, controlling both inbound and outbound traffic at the DB instance level. DB instances are created by default with a firewall and a default security group that protect the DB instance.

Before you can connect to your DB instance, you must add rules to a security group that enable you to connect. Use your network and configuration information to create rules to allow access to your DB instance.

For example, suppose that you have an application that accesses a database on your DB instance in a VPC. In this case, you must add a custom TCP rule that specifies the port range and IP addresses that your application uses to access the database. If you have an application on an Amazon EC2 instance, you can use the security group that you set up for the Amazon EC2 instance.

Creating a security group for VPC access

To create a VPC security group, sign in to the AWS Management Console and choose VPC.

Note

Make sure you are in the VPC console, not the Amazon Timestream for InfluxDB console.

  • In the upper-right corner of the AWS Management Console, choose the AWS Region where you want to create your VPC security group and DB instance. In the list of Amazon VPC resources for that AWS Region, you should see at least one VPC and several subnets. If you don't, you don't have a default VPC in that AWS Region.

  • In the navigation pane, choose Security Groups.

  • Choose Create security group.

  • In the Basic details section of the security group page, enter the Security group name and Description. For VPC, choose the VPC that you want to create your DB instance in.

  • In Inbound rules, choose Add rule.

    • For Type, choose Custom TCP.

    • For Source, choose a Security group name or enter the IP address range (CIDR value) from where you access the DB instance. If you choose My IP, this allows access to the DB instance from the IP address detected in your browser.

  • (Optional) In Outbound rules, add rules for outbound traffic. By default, all outbound traffic is allowed.

  • Choose Create security group.

You can use this VPC security group as the security group for your DB instance when you create it.
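
If you prefer to script this step, the following sketch creates an equivalent security group and inbound rule with the AWS SDK for Python (Boto3) EC2 APIs. The VPC ID, CIDR range, and names are placeholders; adjust them to your own network before running it.

    # Sketch: create a VPC security group that allows inbound TCP 8086 (the default
    # InfluxDB port). The VPC ID and CIDR range are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    sg = ec2.create_security_group(
        GroupName="timestream-influxdb-access",
        Description="Allow client access to Timestream for InfluxDB",
        VpcId="vpc-0123456789abcdef0",
    )

    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[
            {
                "IpProtocol": "tcp",
                "FromPort": 8086,
                "ToPort": 8086,
                "IpRanges": [{"CidrIp": "203.0.113.0/24", "Description": "Application subnet"}],
            }
        ],
    )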

Note

If you use a default VPC, a default subnet group spanning all of the VPC's subnets is created for you. When you create a DB instance, you can choose the default VPC and choose default for DB Subnet Group.

After you have completed the setup requirements, you can create a DB instance using your requirements and security group. To do so, follow the instructions in Creating a DB instance.
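
If you create the DB instance programmatically instead of through the console, a scripted version of this step might look like the following sketch using the AWS SDK for Python (Boto3). The parameter names mirror the CreateDbInstance API, but treat them and all identifiers as illustrative assumptions to verify against the current API reference.

    # Sketch: create a DB instance with the Timestream for InfluxDB API. Parameter
    # names are illustrative; all IDs and secrets are placeholders.
    import boto3

    client = boto3.client("timestream-influxdb", region_name="us-east-1")

    response = client.create_db_instance(
        name="influxdb1",
        username="admin",
        password="REPLACE_WITH_A_STRONG_PASSWORD",
        organization="my-org",
        bucket="my-bucket",
        dbInstanceType="db.influx.large",
        allocatedStorage=400,  # storage in GiB
        vpcSubnetIds=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
        vpcSecurityGroupIds=["sg-0123456789abcdef0"],
        publiclyAccessible=False,
    )
    print(response.get("endpoint"))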

Security best practices for Timestream for InfluxDB

Optimize writes to InfluxDB

Like any other time-series database, InfluxDB is built to ingest and process data in real time. To keep the system performing at its best, we recommend the following optimizations when writing data to InfluxDB (a client-side sketch combining these settings follows this list):

  • Batch Writes: When writing data to InfluxDB, write data in batches to minimize the network overhead related to every write request. The optimal batch size is 5000 lines of line protocol per write request. To write multiple lines in one request, each line of line protocol must be delimited by a new line (\n).

  • Sort tags by key: Before writing data points to InfluxDB, sort tags by key in lexicographic order.

    measurement,tagC=therefore,tagE=am,tagA=i,tagD=i,tagB=think fieldKey=fieldValue 1562020262

    # Optimized line protocol example with tags sorted by key
    measurement,tagA=i,tagB=think,tagC=therefore,tagD=i,tagE=am fieldKey=fieldValue 1562020262
  • Use the coarsest time precision possible: InfluxDB writes data in nanosecond precision; however, if your data isn’t collected in nanoseconds, there is no need to write at that precision. For better performance, use the coarsest precision possible for timestamps. You can specify the write precision in the following ways:

    • When using the SDK you can specify the WritePrecision when setting the time attribute of your point. For more information on InfluxDB client libraries, see the InfluxDB Documentation.

    • When using Telegraf, you configure the time precision in the Telegraf agent configuration. Precision is specified as an interval with an integer and a unit (for example, 0s, 10ms, 2us, 4s). Valid time units are “ns”, “us”, “ms”, and “s”.

      [agent]
        interval = "10s"
        metric_batch_size = 5000
        precision = "0s"
  • Use gzip compression: Use gzip compression to speed up writes to InfluxDB and reduce network bandwidth. Benchmarks have shown up to a 5x speed improvement when data is compressed.

    • When using Telegraf, in the influxdb_v2 output plugin configuration in your telegraf.conf, set the content_encoding option to gzip:

      [[outputs.influxdb_v2]]
        urls = ["http://localhost:8086"]
        # ...
        content_encoding = "gzip"
    • When using client libraries, each InfluxDB client library provides options for compressing write requests or enforces compression by default. The method for enabling compression is different for each library. For specific instructions, see the InfluxDB Documentation.

    • When using the InfluxDB API /api/v2/write endpoint to write data, compress the data with gzip and set the Content-Encoding header to gzip.
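
To show how these recommendations combine on the client side, here is a sketch using the influxdb-client Python package with batched, gzip-compressed writes and second-level precision. The endpoint, token, organization, and bucket values are placeholders.

    # Sketch: batched, gzip-compressed writes with coarse (second) precision using
    # the influxdb-client Python package. Connection values are placeholders.
    from influxdb_client import InfluxDBClient, Point, WritePrecision
    from influxdb_client.client.write_api import WriteOptions

    url = "https://your-instance-endpoint:8086"
    token = "YOUR_INFLUXDB_TOKEN"
    org = "my-org"
    bucket = "my-bucket"

    # enable_gzip compresses write requests; batch_size groups lines per request
    client = InfluxDBClient(url=url, token=token, org=org, enable_gzip=True)
    write_api = client.write_api(write_options=WriteOptions(batch_size=5000, flush_interval=10000))

    point = (
        Point("measurement")
        .tag("tagA", "i")
        .field("fieldKey", 1.0)
        .time(1562020262, WritePrecision.S)  # second precision instead of nanoseconds
    )
    write_api.write(bucket=bucket, record=point)

    write_api.close()  # flush any buffered points before shutting down
    client.close()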

Design for performance

Design your schema for simpler and more performant queries. The following guidelines help ensure that your schema is easy to query and maximizes query performance:

  • Design to query: Choose measurements, tag keys, and field keys that are easy to query. To achieve this goal, follow these principles:

    • Use measurements that have a simple name and accurately describe the schema.

    • Avoid using the same name for a tag key and field key within the same schema.

    • Avoid using reserved Flux keywords and special characters in tag and field keys.

    • Tags store metadata that describe the fields and are common across many data points.

    • Fields store unique or highly variable data, usually numeric data points.

    • Measurements and keys should not contain data; they should be used to aggregate or describe data. Data is stored in tag and field values.

  • Keep your time-series cardinality under control: High series cardinality is one of the main causes of decreased write and read performance in InfluxDB. In the context of InfluxDB, high cardinality refers to the presence of a very large number of unique tag values. Tag values are indexed in InfluxDB, which means that a very high number of unique values generates a larger index, which can slow down data ingestion and query performance.

    To better understand and resolve potential high-cardinality issues, you can follow these steps:

    • Understand the causes of high cardinality

    • Measure the cardinality of your buckets

    • Take action to resolve high cardinality

  • Causes of high series cardinality: InfluxDB indexes data based on measurements and tags to speed up data reads. Each set of indexed data elements forms a series key. Tags containing highly variable information such as unique IDs, hashes, and random strings lead to a large number of series, also known as high series cardinality. High series cardinality is the primary driver of high memory usage in InfluxDB.

  • Measuring series cardinality: If you experience performance slowdowns or see ever-increasing memory usage in your Timestream for InfluxDB instance, we recommend measuring the series cardinality of your buckets.

    InfluxDB provides functions that allow you to measure series cardinality in both Flux and InfluxQL.

    • In Flux, use the influxdb.cardinality() function

    • In InfluxQL, use the SHOW SERIES CARDINALITY command

    In both cases, the engine returns the number of unique series keys in your data. Keep in mind that it is not recommended to have more than 10 million series keys on any of your Timestream for InfluxDB instances. (A Python sketch for running this check from a client appears at the end of this section.)

  • Resolving high series cardinality: If you find that any of your buckets have high cardinality, there are a few corrective steps you can take:

    • Review your tags: Ensure that your workloads don’t generate cases where tags have unique values for most entries. This could happen when the number of unique tag values always grows over time, or when log-type messages are written to the database and every message has a unique combination of timestamp, tags, and so on. You can use the following Flux code to help you figure out which tags are contributing most to your high-cardinality issues:

      // Count unique values for each tag in a bucket
      import "influxdata/influxdb/schema"

      cardinalityByTag = (bucket) =>
          schema.tagKeys(bucket: bucket)
              |> map(
                  fn: (r) => ({
                      tag: r._value,
                      _value: if contains(set: ["_stop", "_start"], value: r._value) then
                          0
                      else
                          (schema.tagValues(bucket: bucket, tag: r._value)
                              |> count()
                              |> findRecord(fn: (key) => true, idx: 0))._value,
                  }),
              )
              |> group(columns: ["tag"])
              |> sum()

      cardinalityByTag(bucket: "example-bucket")

      If you’re experiencing very high cardinality, the query above may time out. If you experience a timeout, run the queries below – one at a time.

      Generate a list of tags:

      // Generate a list of tags
      import "influxdata/influxdb/schema"

      schema.tagKeys(bucket: "example-bucket")

      Count unique tag values for each tag:

      // Run the following for each tag to count the number of unique tag values
      import "influxdata/influxdb/schema"

      tag = "example-tag-key"

      schema.tagValues(bucket: "my-bucket", tag: tag)
          |> count()

      We recommend that you run these at different points in time to identify which tag is growing faster.

    • Improve your schema: Follow the schema design recommendations discussed earlier in Design for performance.

    • Remove or aggregate older data to reduce cardinality: Consider whether your use case needs all of the data that is causing your high-cardinality issues. If this data is no longer needed or accessed frequently, you can aggregate it, delete it, or export it to another engine such as Timestream for Live Analytics for long-term storage and analysis.
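
The cardinality check described above can also be run from a client. The following sketch, assuming the influxdb-client Python package and placeholder connection values, executes the Flux influxdb.cardinality() function and prints the result.

    # Sketch: measure the series cardinality of a bucket by running the Flux
    # influxdb.cardinality() function. Connection values are placeholders.
    from influxdb_client import InfluxDBClient

    url = "https://your-instance-endpoint:8086"
    token = "YOUR_INFLUXDB_TOKEN"
    org = "my-org"

    flux = '''
    import "influxdata/influxdb"

    influxdb.cardinality(bucket: "example-bucket", start: -30d)
    '''

    with InfluxDBClient(url=url, token=token, org=org) as client:
        for table in client.query_api().query(flux):
            for record in table.records:
                print("series cardinality:", record.get_value())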