Best practices for Amazon OpenSearch Ingestion
This topic provides best practices for creating and managing Amazon OpenSearch Ingestion pipelines, along with general guidelines that apply to many use cases. Every workload has its own characteristics, so no generic recommendation fits every use case exactly.
General best practices
The following general best practices apply to creating and managing pipelines.
- To ensure high availability, configure VPC pipelines with two or three subnets. If you deploy a pipeline in only one subnet and that Availability Zone goes down, you won't be able to ingest data.
- Within each pipeline, we recommend limiting the number of sub-pipelines to 5 or fewer.
- If you're using the S3 source plugin, use evenly sized S3 files for optimal performance.
- If you're using the S3 source plugin, add 30 seconds of additional visibility timeout for every 0.25 GB of file size in the S3 bucket for optimal performance. For example, a 1 GB file would need about 120 seconds of additional visibility timeout.
- Include a dead-letter queue (DLQ) in your pipeline configuration so that you can offload failed events and make them accessible for analysis. If your sinks reject data due to incorrect mappings or other issues, you can route the data to the DLQ to troubleshoot and fix the issue. A configuration sketch follows this list.
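The following is a minimal sketch of creating a pipeline whose OpenSearch sink writes failed events to an S3 DLQ, using the AWS SDK for Python (Boto3). The pipeline name, bucket, role ARN, domain endpoint, subnets, and security group are placeholders, and the exact YAML keys and Boto3 parameters should be verified against the current pipeline configuration reference.

```python
import boto3

# Pipeline body is standard OpenSearch Ingestion YAML; the dlq block on the
# OpenSearch sink offloads failed events to an S3 bucket for later analysis.
# All names, ARNs, and endpoints below are placeholders.
pipeline_body = """
version: "2"
log-pipeline:
  source:
    http:
      path: "/logs/ingest"
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: "application-logs"
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/pipeline-role"
          region: "us-east-1"
        dlq:
          s3:
            bucket: "my-dlq-bucket"
            key_path_prefix: "log-pipeline/dlq/"
            region: "us-east-1"
            sts_role_arn: "arn:aws:iam::123456789012:role/pipeline-role"
"""

osis = boto3.client("osis")
osis.create_pipeline(
    PipelineName="log-pipeline",
    MinUnits=1,
    MaxUnits=4,
    PipelineConfigurationBody=pipeline_body,
    # For high availability, VPC pipelines should span two or three subnets.
    VpcOptions={
        "SubnetIds": ["subnet-0abc1234", "subnet-0def5678"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)
```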
Recommended CloudWatch alarms
CloudWatch alarms perform an action when a CloudWatch metric exceeds a specified value for some amount of time. For example, you might want AWS to email you if your cluster health status is red for longer than one minute. This section includes some recommended alarms for Amazon OpenSearch Ingestion and how to respond to them.
For more information about configuring alarms, see Creating Amazon CloudWatch Alarms in the Amazon CloudWatch User Guide.
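As a sketch of what one of these alarms might look like when created with Boto3, the example below alarms on sustained high CPU usage. The AWS/OSIS namespace, the metric name, and the PipelineName dimension are assumptions; confirm the exact metric names and dimensions that your pipeline publishes in the CloudWatch console before creating the alarm.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Sketch only: namespace, metric name, and dimension are assumptions;
# verify them against the metrics your pipeline actually emits.
cloudwatch.put_metric_alarm(
    AlarmName="osis-log-pipeline-high-cpu",
    Namespace="AWS/OSIS",                 # assumed OpenSearch Ingestion namespace
    MetricName="system.cpu.usage.value",  # assumed CPU usage metric name
    Dimensions=[{"Name": "PipelineName", "Value": "log-pipeline"}],
    Statistic="Average",
    Period=300,              # evaluate 5-minute windows
    EvaluationPeriods=3,     # sustained for 15 minutes
    Threshold=85.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:pipeline-alerts"],
)
```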
| Alarm | Issue |
|---|---|
| | The pipeline has reached the maximum capacity and might require a maxUnits update. Increase the maximum capacity of your pipeline. |
| | The pipeline is unable to write to the OpenSearch sink. Check the pipeline permissions and confirm that the domain or collection is healthy. You can also check the dead-letter queue (DLQ) for failed events, if it's configured. |
| | The pipeline is experiencing high latency sending data to the OpenSearch sink. This is likely due to the sink being undersized, or a poor sharding strategy, which is causing the sink to fall behind. Sustained high latency can impact pipeline performance and will likely lead to backpressure on the clients. |
| | Ingestion requests are not being authenticated. Confirm that all clients have Signature Version 4 authentication enabled correctly. |
| | Sustained high CPU usage can be problematic. Consider increasing the maximum capacity for the pipeline. |
| | Sustained high buffer usage can be problematic. Consider increasing the maximum capacity for the pipeline. |
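Several of these alarms call for raising the pipeline's maximum capacity. A minimal sketch of that update with Boto3 follows; the pipeline name and unit counts are placeholders, so choose values that fit your workload and account limits.

```python
import boto3

osis = boto3.client("osis")

# Sketch: increase the capacity range (OCUs) of an existing pipeline.
# "log-pipeline" and the unit counts are placeholders.
osis.update_pipeline(
    PipelineName="log-pipeline",
    MinUnits=2,
    MaxUnits=8,
)
```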
Other alarms you might consider
Consider configuring the following alarms depending on which Amazon OpenSearch Ingestion features you regularly use.
| Alarm | Issue |
|---|---|
| | The attempt to trigger an export to Amazon S3 failed. |
| | The EndtoEndLatency is higher than desired for reading from DynamoDB streams. This could be caused by an underscaled OpenSearch cluster or a maximum pipeline OCU capacity that is too low for the WCU throughput on the DynamoDB table. EndtoEndLatency will be higher after an export but should decrease over time as it catches up to the latest DynamoDB streams. |
| | No records are being gathered from DynamoDB streams. This could be caused by no activity on the table, or an issue accessing DynamoDB streams. |
| | A larger number of records are being sent to the DLQ than to the OpenSearch sink. Review the OpenSearch sink plugin metrics to investigate and determine the root cause. |
| | All data is timing out while the Grok processor is trying to pattern match. This is likely impacting performance and slowing your pipeline down. Consider adjusting your patterns to reduce timeouts. |
| | The Grok processor is failing to match patterns to the data in the pipeline, resulting in errors. Review your data and Grok plugin configurations to ensure the pattern matching is expected. |
| | The Grok processor is unable to match patterns to the data in the pipeline. Review your data and Grok plugin configurations to ensure the pattern matching is expected. |
| | The Date processor is unable to match any patterns to the data in the pipeline. Review your data and Date plugin configurations to ensure the pattern is expected. |
| | This issue is occurring either because the S3 object doesn't exist or because the pipeline has insufficient privileges. Review the s3ObjectsNotFound.count and s3ObjectsAccessDenied.count metrics to determine the root cause. Confirm that the S3 object exists, or update the permissions. |
| | The S3 plugin failed to process an Amazon SQS message. If you have a DLQ enabled on your SQS queue, review the failed message. The queue might be receiving invalid data that the pipeline is attempting to process. |
| | The client is sending a bad request. Confirm that all clients are sending the proper payload. |
| | Requests from the HTTP source plugin contain too much data, which is exceeding the buffer capacity. Adjust the batch size for your clients. |
| | The HTTP source plugin is having trouble receiving events. |
| | Source timeouts are likely the result of the pipeline being underprovisioned. Consider increasing the pipeline maxUnits to handle additional workload. |
| | The client is sending a bad request. Confirm that all clients are sending the proper payload. |
| | Requests from the Otel Trace source plugin contain too much data, which is exceeding the buffer capacity. Adjust the batch size for your clients. |
| | The Otel Trace source plugin is having trouble receiving events. |
| | Source timeouts are likely the result of the pipeline being underprovisioned. Consider increasing the pipeline maxUnits to handle additional workload. |
| | Source timeouts are likely the result of the pipeline being underprovisioned. Consider increasing the pipeline maxUnits to handle additional workload. |
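When a DLQ-related alarm fires, the failed events are written to the S3 bucket configured in the pipeline's dlq block. The following is a minimal sketch for pulling them down for inspection with Boto3; the bucket and key prefix are the placeholders used earlier in this topic, and the assumption that each object contains JSON-formatted failed events should be verified against your own DLQ output.

```python
import json

import boto3

s3 = boto3.client("s3")

# Placeholders: must match the dlq block in your pipeline configuration.
bucket = "my-dlq-bucket"
prefix = "log-pipeline/dlq/"

# List the DLQ objects the pipeline has written, then print each failed
# event so you can see why the sink rejected it.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        try:
            failed = json.loads(body)  # assumed JSON payload; verify the format
        except json.JSONDecodeError:
            print(f"{obj['Key']}: not JSON, inspect manually")
            continue
        for event in failed if isinstance(failed, list) else [failed]:
            print(json.dumps(event, indent=2))
```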