Troubleshooting
This section provides troubleshooting instructions for deploying and using the solution.
If these instructions don’t address your issue, Contact AWS Support provides instructions for opening an AWS Support case for this solution.
Problem: Deployment failure due to "Invalid Logging Configuration: The CloudWatch Logs Resource Policy size was exceeded"
If you encounter a deployment failure due to creating CloudWatch log group with an error message like the one below,
Cannot enable logging. Policy document length breaking Cloudwatch Logs Constraints, either
< 1 or > 5120 (Service: AmazonApiGatewayV2; Status Code: 400; Error Code:
BadRequestException; Request ID: xxx-yyy-zzz; Proxy: null)
Resolution:
CloudWatch Logs resource policies are limited to 5120 characters. The remediation is merging or removing useless policies, then updating the resource policies of CloudWatch logs to reduce the number of policies.
Below is a sample command to reset resource policy of CloudWatch logs:
aws logs put-resource-policy --policy-name AWSLogDeliveryWrite20150319 \ --policy-document ' { "Version": "2012-10-17", "Statement": [ { "Sid": "AWSLogDeliveryWrite2", "Effect": "Allow", "Principal": { "Service": "delivery.logs.amazonaws.com" }, "Action": [ "logs:CreateLogStream", "logs:PutLogEvents" ], "Resource": [ ], "Condition": { "StringEquals": { "aws:SourceAccount": "<your AWS account id>" }, "ArnLike": { "aws:SourceArn": "arn:aws:logs:<AWS region>:<your AWS account id>:*" } } } ] } '
Problem: Cannot delete the CloudFormation stacks created for the Clickstream pipeline
If you encounter a failure with an error message like the one below when deleting the CloudFormation stacks created for the Clickstream pipeline,
Role arn:aws:iam::<your AWS account id>:role/<stack
nam>-ClickStreamApiStackActionSta-<random suffix> is
invalid or cannot be assumed
Resolution:
It results from deleting the web console stack for this solution before the CloudFormation stacks are made for the Clickstream pipeline.
Please create a new IAM role with the identical name mentioned in the above error message and trust the CloudFormation service with sufficient permission to delete those stacks.
Note
You can delete the IAM role after successfully removing those CloudFormation stacks.
Problem: Reporting stack(Clickstream-Reporting-xxx) deployment fail
Reporting stack deployment failed with message like
Connection attempt timed out
And it happened when creating DataSource(AWS::QuickSight::DataSource).
Resolution:
Login solution web console and click "Retry" button in pipeline detail information page.
Problem: Clickstream-DataModelingRedshift-xxxxx stack upgrade failed in UPDATE_ROLLBACK_FAILED
When upgrading from 1.0.x to the latest version, if the CloudFormation stack
Clickstream-DataModelingRedshift-xxxxx
is in the
UPDATE_ROLLBACK_FAILED
state, you need to manually fix it by following the
steps below.
Resolution:
-
In the Cloudformation Resource tab, find the Lambda Function name with logical ID
CreateApplicationSchemasCreateSchemaForApplicationsFn
. -
Update the
fn_name
andaws_region
in below script and execute it in a shell terminal. Make sure you have AWS CLI installed and configured.aws_region=<us-east-1> # replace this with your AWS region, and remove '<', '>' fn_name=<fn_name_to_replace> # replace this with actual function name in step 1 and remove '<', '>' cat <<END | > ./index.mjs export const handler = async (event) => { console.log('No ops!') const response = { Status: 'SUCCESS', Data: { DatabaseName: '', RedshiftBIUsername: '' } }; return response; }; END rm ./noops-lambda.zip > /dev/null 2>&1 zip ./noops-lambda.zip ./index.mjs aws lambda update-function-code --function-name ${fn_name} \ --zip-file fileb://./noops-lambda.zip \ --region ${aws_region} | tee /dev/null
-
In the Cloudformation web console, choose Stack actions and Continue update rollback.
-
Wait until the stack status is
UPDATE_ROLLBACK_COMPLETE
. -
Retry the upgrade from the solution web console.
Problem: Can not sink data to MSK cluster, got "InvalidReplicationFactor (Broker: Invalid replication factor)" log in Ingestion Server
If you notice that data can not be sunk into S3 through MSK cluster, and the error message in log of Ingestion Server (ECS) worker task is as below:
Message production error: InvalidReplicationFactor (Broker: Invalid replication
factor)
Resolution:
This is caused by replication factor larger than available brokers, please edit the MSK
cluster configuration, set default.replication.factor
not
larger than the total number of brokers.
Problem: data processing job failure
If the data processing job implemented by EMR serverless fails with the below errors:
-
IOException: No space left on device
Job failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: Caused by: java.io.IOException: No space left on device Exception in thread "main" org.apache.spark.SparkException:
-
ExecutorDeadException
Job failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: Caused by: org.apache.spark.ExecutorDeadException: The relative remote executor(Id: 34), which maintains the block data to fetch is dead. org.apache.spark.shuffle.FetchFailedException Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage
-
Could not find CoarseGrainedScheduler
Job failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
You need to tune the EMR job default configuration. For more information, refer to configure execution parameters
Problem: data loading workflow failure due to meeting the 25,000 events limit in a single execution history
It's caused by the large volume of data to be loaded or the Redshift load being very high. You could mitigate this error by increasing the compute resources of Redshift (for example, RPUs for Redshift serverless) or reducing the data processing interval. Then restart the data-loading workflow.