Troubleshoot issues with Argo CD capabilities
Note
EKS Capabilities are fully managed and run outside your cluster. You do not have direct access to controller namespaces. You can configure controller log delivery for visibility into controller behavior. See Access EKS Capabilities controller logs. Troubleshooting focuses on capability health, application status, and configuration.
Capability is ACTIVE but applications are not syncing
If your Argo CD capability shows ACTIVE status but applications are not syncing, check the capability health and application status.
Check capability health:
You can view capability health and status issues in the EKS console or using the AWS CLI.
Console:
- Open the Amazon EKS console at https://console.aws.amazon.com/eks/home#/clusters.
- Select your cluster name.
- Choose the Observability tab.
- Choose Monitor cluster.
- Choose the Capabilities tab to view health and status for all capabilities.
AWS CLI:
    # View capability status and health
    aws eks describe-capability \
      --region region-code \
      --cluster-name my-cluster \
      --capability-name my-argocd
    # Look for issues in the health section
Common causes:
- Repository not configured: Git repository not added to Argo CD
- Authentication failed: SSH key, token, or CodeCommit credentials invalid
- Application not created: No Application resources exist in the cluster
- Sync policy: Manual sync required (auto-sync not enabled)
- IAM permissions: Missing permissions for CodeCommit or Secrets Manager
Check application status:
    # List applications
    kubectl get application -n argocd
    # View sync status
    kubectl get application my-app -n argocd -o jsonpath='{.status.sync.status}'
    # View application health
    kubectl get application my-app -n argocd -o jsonpath='{.status.health}'
Check application conditions:
    # Describe application to see detailed status
    kubectl describe application my-app -n argocd
    # View application conditions
    kubectl get application my-app -n argocd -o jsonpath='{.status.conditions}'
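When many applications exist, it can help to narrow the list to those that need attention. The following is a minimal sketch, not part of the Argo CD or EKS tooling: `filter_problem_apps` is a hypothetical helper name, and it assumes tab-separated input of application name, sync status, and health status, as produced by the `kubectl` jsonpath query shown in the comment.

```shell
# filter_problem_apps (hypothetical helper): reads lines of
# "NAME<TAB>SYNC_STATUS<TAB>HEALTH_STATUS" on stdin and prints only the
# applications that are not both Synced and Healthy.
filter_problem_apps() {
  awk -F'\t' '$2 != "Synced" || $3 != "Healthy" {
    print $1 ": sync=" $2 ", health=" $3
  }'
}

# Example usage (assumes Applications live in the argocd namespace):
# kubectl get application -n argocd \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}' \
#   | filter_problem_apps
```

An empty result suggests every application reports Synced and Healthy, in which case the capability health output is the better place to look.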
Applications stuck in "Progressing" state
If an application shows Progressing but never reaches Healthy, check the application’s resource status and events.
Check resource health:
    # View application resources
    kubectl get application my-app -n argocd -o jsonpath='{.status.resources}'
    # Check for unhealthy resources
    kubectl describe application my-app -n argocd | grep -A 10 "Health Status"
Common causes:
- Deployment not ready: Pods failing to start or readiness probes failing
- Resource dependencies: Resources waiting for other resources to be ready
- Image pull errors: Container images not accessible
- Insufficient resources: Cluster lacks CPU or memory for pods
Verify target cluster configuration (for multi-cluster setups):
    # List registered clusters
    kubectl get secret -n argocd -l argocd.argoproj.io/secret-type=cluster
    # View cluster secret details
    kubectl get secret cluster-secret-name -n argocd -o yaml
Repository authentication failures
If Argo CD cannot access your Git repositories, verify the authentication configuration.
For CodeCommit repositories:
Verify the IAM Capability Role has CodeCommit permissions:
    # View IAM policies
    aws iam list-attached-role-policies --role-name my-argocd-capability-role
    aws iam list-role-policies --role-name my-argocd-capability-role
    # Get specific policy details
    aws iam get-role-policy --role-name my-argocd-capability-role --policy-name policy-name
The role needs codecommit:GitPull permission for the repositories.
For private Git repositories:
Verify repository credentials are correctly configured:
    # Check repository secret exists
    kubectl get secret -n argocd repo-secret-name -o yaml
Ensure the secret contains the correct authentication credentials (SSH key, token, or username/password).
For repositories using Secrets Manager:
    # Verify IAM Capability Role has Secrets Manager permissions
    aws iam list-attached-role-policies --role-name my-argocd-capability-role
    # Test secret retrieval
    aws secretsmanager get-secret-value --secret-id arn:aws:secretsmanager:region-code:111122223333:secret:my-secret
Multi-cluster deployment issues
If applications are not deploying to remote clusters, verify the cluster registration and access configuration.
Check cluster registration:
    # List registered clusters
    kubectl get secret -n argocd -l argocd.argoproj.io/secret-type=cluster
    # Verify cluster secret format
    kubectl get secret CLUSTER_SECRET_NAME -n argocd -o yaml
Ensure the server field contains the EKS cluster ARN, not the Kubernetes API URL.
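Because Secret values are base64-encoded in the `-o yaml` output, the server field is easier to check after decoding. The sketch below is an illustration only: `is_eks_cluster_arn` is a hypothetical helper that performs a loose shape check (it does not validate the Region, account ID, or cluster name), and the usage comment assumes a `base64` command that accepts `-d`.

```shell
# is_eks_cluster_arn (hypothetical helper): succeeds (exit 0) when the
# argument has the general shape of an EKS cluster ARN, for example
# arn:aws:eks:region-code:111122223333:cluster/my-cluster.
# This is a loose pattern match, not a full ARN validation.
is_eks_cluster_arn() {
  case "$1" in
    arn:aws*:eks:*:*:cluster/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (CLUSTER_SECRET_NAME is the placeholder used on this page):
# server="$(kubectl get secret CLUSTER_SECRET_NAME -n argocd \
#   -o jsonpath='{.data.server}' | base64 -d)"
# is_eks_cluster_arn "$server" || echo "server is not an EKS cluster ARN: $server"
```

A value that starts with `https://` fails the check, which is the misconfiguration this section describes.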
Verify target cluster Access Entry:
On the target cluster, check that the Argo CD Capability Role has an Access Entry:
    # List access entries (run on target cluster or use AWS CLI)
    aws eks list-access-entries --cluster-name target-cluster
    # Describe specific access entry
    aws eks describe-access-entry \
      --cluster-name target-cluster \
      --principal-arn arn:aws:iam::111122223333:role/my-argocd-capability-role
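For scripted checks across several target clusters, the access entry lookup can be reduced to a yes/no test. This is a rough sketch under stated assumptions: `has_access_entry` is a hypothetical helper, and the usage comment assumes that `--query 'accessEntries[]' --output text` prints the principal ARNs separated by tabs or newlines.

```shell
# has_access_entry (hypothetical helper): reads principal ARNs on stdin
# (tab- or newline-separated) and succeeds only if one of them exactly
# equals the ARN passed as the first argument.
has_access_entry() {
  tr '\t' '\n' | grep -qxF -- "$1"
}

# Example usage (cluster and role names are the placeholders used on this page):
# aws eks list-access-entries --cluster-name target-cluster \
#   --query 'accessEntries[]' --output text \
#   | has_access_entry arn:aws:iam::111122223333:role/my-argocd-capability-role \
#   || echo "Capability Role has no Access Entry on target-cluster"
```

The match is exact on the whole ARN, so a role whose name merely starts with the same prefix does not count as a match.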
Check IAM permissions for cross-account:
For cross-account deployments, verify the Argo CD Capability Role has an Access Entry on the target cluster. The managed capability uses EKS Access Entries for cross-account access, not IAM role assumption.
For more on multi-cluster configuration, see Register target clusters.
Increased application sync time
If your applications are syncing but taking longer than expected, use the following diagnostic steps to identify the cause.
Check last sync time
Confirm the delay by reviewing when applications last synced:
    # View last sync time for all applications
    kubectl get application -n argocd -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.operationState.finishedAt}{"\n"}{end}'
    # View last sync time for a specific application
    kubectl get application my-app -n argocd -o jsonpath='{.status.operationState.finishedAt}'
Check application conditions
Review application conditions for reconciliation queue delays:
    # Check conditions on an application
    kubectl get application my-app -n argocd -o jsonpath='{.status.conditions}'
Check targetRevision configuration
Applications using targetRevision: HEAD invalidate the manifest cache on every commit to the repository, which slows sync times:
    # List applications using HEAD as targetRevision
    kubectl get application -n argocd -o jsonpath='{range .items[?(@.spec.source.targetRevision=="HEAD")]}{.metadata.name}{"\n"}{end}'
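Argo CD treats an omitted targetRevision as HEAD for Git sources, so a filter that matches only the literal string can miss applications. The following sketch covers both cases. It assumes single-source applications (spec.source rather than spec.sources), and `filter_head_revision` is a hypothetical helper name.

```shell
# filter_head_revision (hypothetical helper): reads lines of
# "NAME<TAB>TARGET_REVISION" on stdin and prints the names of applications
# whose targetRevision is either the literal HEAD or empty (unset).
filter_head_revision() {
  awk -F'\t' '$2 == "" || $2 == "HEAD" { print $1 }'
}

# Example usage (assumes Applications live in the argocd namespace):
# kubectl get application -n argocd \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.source.targetRevision}{"\n"}{end}' \
#   | filter_head_revision
```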
Common causes
- No webhook configuration: Without webhooks, Argo CD polls repositories at the default interval of 6 minutes. This delays detection of new commits.
- targetRevision set to HEAD: Every commit to the repository invalidates the manifest cache. Argo CD then regenerates manifests on each reconciliation.
- Large or complex Git repositories: Monorepos or complex Helm charts cause slow manifest generation because of the volume of files and templates to process.
- High number of Kubernetes resources in a single application: Applications managing many resources cause slow cluster cache sync because Argo CD must track the state of each resource.
Mitigations
- Configure Git webhooks: Webhooks notify Argo CD immediately when changes are pushed, bypassing the default polling interval. For configuration steps, see Argo CD considerations.
- Use specific branch names or commit SHAs: Set targetRevision to a branch name or commit SHA instead of HEAD to preserve the manifest cache between syncs.
- Split large monorepos: Divide large repositories into smaller, focused repositories to reduce manifest generation time.
- Reduce resources per application: Split applications with many Kubernetes resources into multiple smaller applications to reduce cluster cache sync time.
- Enable controller log delivery: Controller logs provide visibility into reconciliation behavior and queue processing. For configuration steps, see Access EKS Capabilities controller logs.
Applications repeatedly syncing or stuck out of sync
If your application syncs and then immediately goes OutOfSync, or if it stays stuck in a sync loop, the cause is usually drift between what Git defines and what exists in the cluster. Start with baseline diagnostics.
Gather diagnostic information
    # View current sync and health status
    argocd app get my-app
    # Show exact fields that differ between Git and live state
    argocd app diff my-app
    # Check whether the app has ever reached a stable state
    argocd app history my-app
The argocd app diff command is the most useful starting point. It shows you exactly which fields cause the application to appear out of sync.
Self-managed certificates cause drift
Controllers such as cert-manager, OPA Gatekeeper, and KEDA generate certificates at runtime. These runtime values are not in Git, so Argo CD detects drift on every reconciliation.
The symptoms are:
- Application syncs, then immediately shows OutOfSync
- The diff shows changes on a webhook caBundle field or a TLS Secret data field
To resolve this, add ignoreDifferences for the affected fields and enable RespectIgnoreDifferences in your sync options:
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app
    spec:
      ignoreDifferences:
        - group: admissionregistration.k8s.io
          kind: ValidatingWebhookConfiguration
          jsonPointers:
            - /webhooks/0/clientConfig/caBundle
        - group: ""
          kind: Secret
          jsonPointers:
            - /data/tls.crt
            - /data/tls.key
      syncPolicy:
        syncOptions:
          - RespectIgnoreDifferences=true
Self-heal interrupts slow-starting workloads
When selfHeal is enabled, Argo CD re-syncs the application when it detects drift. If your workload takes 30–60 seconds to start, the self-heal triggers before the workload becomes Healthy. With prune enabled, this might tear down partially-started resources.
To resolve this, first fix the underlying drift (see "Self-managed certificates cause drift" on this page). If drift is not the cause, consider disabling self-heal for workloads that you manage exclusively through Git:
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app
    spec:
      syncPolicy:
        automated:
          selfHeal: false
          prune: false
Note
Self-heal backoff timing is an instance-level controller setting. If you need to adjust self-heal timing rather than disabling it, open an AWS Support case.
ApplicationSet or resource ownership collisions
If two Applications or ApplicationSets manage the same Kubernetes resource, Argo CD shows a SharedResourceWarning. The resource never reaches a stable state. This commonly happens when a shared resource name is not scoped per environment or cluster.
To resolve this:
- Make the contended resource unique per owner. Add an environment or cluster suffix to the resource name.
- When renaming an ApplicationSet, set preserveResourcesOnDeletion: true first to avoid destructive teardown of existing resources:
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: my-appset
    spec:
      syncPolicy:
        preserveResourcesOnDeletion: true
Stuck deletion from resource finalizers
If an application is stuck in Terminating state or shows "N objects remaining for deletion", the resources-finalizer.argocd.argoproj.io finalizer blocks removal until all managed resources are deleted. A managed resource whose own finalizer is never processed blocks the deletion indefinitely.
To confirm, list resources that have a deletion timestamp but have not been removed:
    kubectl get all -n my-namespace -o json | \
      jq '.items[] | select(.metadata.deletionTimestamp != null) | {name: .metadata.name, kind: .kind, finalizers: .metadata.finalizers}'
To resolve this:
- Make sure the controller that owns the blocking finalizer is healthy and running.
- If the owning controller is healthy but the finalizer is not being processed, remove the blocking finalizer from the stuck resource:
    kubectl patch resource-kind resource-name -n my-namespace \
      --type json -p '[{"op": "remove", "path": "/metadata/finalizers/0"}]'
Failed sync does not auto-retry to the same revision
After a sync to a specific revision fails, Argo CD does not auto-retry the same revision. This commonly happens because of a manifest defect such as a ComparisonError from a duplicate environment variable key.
Confirm by checking the application status:
    argocd app get my-app
    # Look for:
    #   Operation: Sync
    #   Phase: Failed
    #   Revision: <sha>
To resolve this, fix the manifest defect in your Git repository and push a new commit. Alternatively, trigger a manual sync:
    argocd app sync my-app
Monorepo commit churn triggers broad regeneration
If many applications track HEAD on the same repository, any commit to that repository changes HEAD for all applications. This triggers manifest regeneration for every application, even those whose files did not change. For more information about targetRevision and caching, see the "Increased application sync time" section on this page.
To scope regeneration to only the files each application uses, add the manifest-generate-paths annotation:
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app
      annotations:
        argocd.argoproj.io/manifest-generate-paths: /apps/my-app
    spec:
      source:
        repoURL: https://github.com/my-org/my-monorepo.git
        targetRevision: HEAD
        path: apps/my-app
With this annotation, Argo CD only regenerates manifests when files under the specified path change. For shared libraries used across applications, you can specify multiple paths separated by semicolons (;).
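As an illustration of the multi-path form, the sketch below builds a semicolon-separated annotation value from a list of paths. `manifest_paths_value` is a hypothetical helper, and `/libs/shared` is an assumed example path for a shared library, not a path from this guide.

```shell
# manifest_paths_value (hypothetical helper): joins its arguments with
# semicolons to form a value for the manifest-generate-paths annotation.
# The subshell keeps the IFS change from leaking into the caller.
manifest_paths_value() {
  ( IFS=';'; printf '%s\n' "$*" )
}

# Example usage (/libs/shared is an assumed shared-library path):
# kubectl annotate application my-app -n argocd --overwrite \
#   "argocd.argoproj.io/manifest-generate-paths=$(manifest_paths_value /apps/my-app /libs/shared)"
```

If the Application manifest itself is managed in Git, setting the annotation there rather than with `kubectl annotate` avoids introducing drift on the Application resource.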
Where possible, pin targetRevision to a branch name or tag instead of HEAD.
Kubernetes defaulting and mutating webhooks cause phantom diffs
If your application shows OutOfSync immediately after a sync, check the diff for fields you never set (such as terminationGracePeriodSeconds, dnsPolicy, or /spec/replicas). The Kubernetes API server or a mutating webhook added those fields at apply time.
To resolve this for fields managed by another controller (such as /spec/replicas when an HPA manages scaling), add ignoreDifferences:
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app
    spec:
      ignoreDifferences:
        - group: apps
          kind: Deployment
          jsonPointers:
            - /spec/replicas
      syncPolicy:
        syncOptions:
          - RespectIgnoreDifferences=true
For fields added by Kubernetes defaulting or mutating webhooks, you can enable server-side diff on the application:
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app
      annotations:
        argocd.argoproj.io/compare-options: ServerSideDiff=true,IncludeMutationWebhook=true
Server-side diff performs a dry-run apply per resource, which increases load on the Kubernetes API server. Test this on a small number of applications before you enable it broadly.
High-churn controller-owned resources
Some controllers generate large numbers of short-lived or frequently-updated resources. Examples include Karpenter node objects, Cilium identity and endpoint objects, and Kyverno policy reports. If these resources generate a high volume of watch events and cause sync churn, you can reduce the load by excluding those resource kinds or filtering watch events. These changes require instance-level controller configuration.
On the managed capability, open an AWS Support case to request resource exclusions or watch-event filtering for these resource kinds.
Best practices
- Use application diff first: Run argocd app diff as the first diagnostic step for any repeated-sync issue. It shows you the exact cause of drift.
- Prefer narrow ignoreDifferences: Target specific fields on specific resource kinds. Avoid broad ignore rules that can mask real configuration drift.
- Pair ignoreDifferences with RespectIgnoreDifferences: Always add the RespectIgnoreDifferences=true sync option. Without it, syncs still overwrite the ignored fields.
- Keep resource names unique: Scope resource names per environment and cluster to avoid ownership collisions between Applications or ApplicationSets.
- Be cautious with prune and selfHeal: Do not enable both on workloads that take a long time to start. The self-heal can tear down resources before they become healthy.
- Pin targetRevision and scope manifest paths: For applications in large shared repositories, use a branch or tag instead of HEAD and add the manifest-generate-paths annotation.
When to contact AWS Support
Open an AWS Support case in the following situations:
- Instance-level controller tuning seems necessary (processor counts, self-heal timing, or resource exclusions).
- Repo-server or controller capacity seems insufficient for your application count.
- Workload configuration, drift, ownership, or finalizers do not explain the behavior.
Include the output of argocd app get and argocd app diff for affected applications in your support case.
Next steps
- Argo CD considerations: Considerations and best practices for Argo CD
- Working with Argo CD: Create and manage Argo CD Applications
- Register target clusters: Configure multi-cluster deployments
- Troubleshooting EKS Capabilities: General capability troubleshooting guidance