Auditing and logging
Collecting and analyzing [audit] logs is useful for a variety of different reasons. Logs can help with root cause analysis and attribution, i.e. ascribing a change to a particular user. When enough logs have been collected, they can be used to detect anomalous behaviors too. On EKS, the audit logs are sent to Amazon Cloudwatch Logs. The audit policy for EKS is as follows:
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
# Log full request and response for changes to aws-auth ConfigMap in kube-system namespace
- level: RequestResponse
namespaces: ["kube-system"]
verbs: ["update", "patch", "delete"]
resources:
- group: "" # core
resources: ["configmaps"]
resourceNames: ["aws-auth"]
omitStages:
- "RequestReceived"
# Do not log watch operations performed by kube-proxy on endpoints and services
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: "" # core
resources: ["endpoints", "services", "services/status"]
# Do not log get operations performed by kubelet on nodes and their statuses
- level: None
users: ["kubelet"] # legacy kubelet identity
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
# Do not log get operations performed by the system:nodes group on nodes and their statuses
- level: None
userGroups: ["system:nodes"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
# Do not log get and update operations performed by controller manager, scheduler, and endpoint-controller on endpoints in kube-system namespace
- level: None
users:
- system:kube-controller-manager
- system:kube-scheduler
- system:serviceaccount:kube-system:endpoint-controller
verbs: ["get", "update"]
namespaces: ["kube-system"]
resources:
- group: "" # core
resources: ["endpoints"]
# Do not log get operations performed by apiserver on namespaces and their statuses/finalizations
- level: None
users: ["system:apiserver"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["namespaces", "namespaces/status", "namespaces/finalize"]
# Do not log get and list operations performed by controller manager on metrics.k8s.io resources
- level: None
users:
- system:kube-controller-manager
verbs: ["get", "list"]
resources:
- group: "metrics.k8s.io"
# Do not log access to health, version, and swagger non-resource URLs
- level: None
nonResourceURLs:
- /healthz*
- /version
- /swagger*
# Do not log events resources
- level: None
resources:
- group: "" # core
resources: ["events"]
# Log request for updates/patches to nodes and pods statuses by kubelet and node problem detector
- level: Request
users: ["kubelet", "system:node-problem-detector", "system:serviceaccount:kube-system:node-problem-detector"]
verbs: ["update", "patch"]
resources:
- group: "" # core
resources: ["nodes/status", "pods/status"]
omitStages:
- "RequestReceived"
# Log request for updates/patches to nodes and pods statuses by system:nodes group
- level: Request
userGroups: ["system:nodes"]
verbs: ["update", "patch"]
resources:
- group: "" # core
resources: ["nodes/status", "pods/status"]
omitStages:
- "RequestReceived"
# Log delete collection requests by namespace-controller in kube-system namespace
- level: Request
users: ["system:serviceaccount:kube-system:namespace-controller"]
verbs: ["deletecollection"]
omitStages:
- "RequestReceived"
# Log metadata for secrets, configmaps, and tokenreviews to protect sensitive data
- level: Metadata
resources:
- group: "" # core
resources: ["secrets", "configmaps"]
- group: authentication.k8s.io
resources: ["tokenreviews"]
omitStages:
- "RequestReceived"
# Log requests for serviceaccounts/token resources
- level: Request
resources:
- group: "" # core
resources: ["serviceaccounts/token"]
# Log get, list, and watch requests for various resource groups
- level: Request
verbs: ["get", "list", "watch"]
resources:
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "scheduling.k8s.io"
- group: "settings.k8s.io"
- group: "storage.k8s.io"
omitStages:
- "RequestReceived"
# Default logging level for known APIs to log request and response
- level: RequestResponse
resources:
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "scheduling.k8s.io"
- group: "settings.k8s.io"
- group: "storage.k8s.io"
omitStages:
- "RequestReceived"
# Default logging level for all other requests to log metadata only
- level: Metadata
omitStages:
- "RequestReceived"
Recommendations
Enable audit logs
The audit logs are part of the EKS managed Kubernetes control plane logs that are managed by EKS. Instructions for enabling/disabling the control plane logs, which includes the logs for the Kubernetes API server, the controller manager, and the scheduler, along with the audit log, can be found here, https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html#enabling-control-plane-log-export.
Note
When you enable control plane logging, you will incur costs
Warning
The maximum size for a CloudWatch Logs entry is 256KB whereas the maximum Kubernetes API request size is 1.5MiB. Log entries greater than 256KB will either be truncated or only include the request metadata.
Utilize audit metadata
Kubernetes audit logs include two annotations that indicate whether or
not a request was authorized authorization.k8s.io/decision
and the
reason for the decision authorization.k8s.io/reason
. Use these
attributes to ascertain why a particular API call was allowed.
Create alarms for suspicious events
Create an alarm to automatically alert you where there is an increase in
403 Forbidden and 401 Unauthorized responses, and then use attributes
like host
, sourceIPs
, and k8s_user.username
to find out
where those requests are coming from.
Analyze logs with Log Insights
Use CloudWatch Log Insights to monitor changes to RBAC objects, e.g. Roles, RoleBindings, ClusterRoles, and ClusterRoleBindings. A few sample queries appear below:
Lists updates to the aws-auth
ConfigMap:
fields @timestamp, @message
| filter @logStream like "kube-apiserver-audit"
| filter verb in ["update", "patch"]
| filter objectRef.resource = "configmaps" and objectRef.name = "aws-auth" and objectRef.namespace = "kube-system"
| sort @timestamp desc
Lists creation of new or changes to validation webhooks:
fields @timestamp, @message
| filter @logStream like "kube-apiserver-audit"
| filter verb in ["create", "update", "patch"] and responseStatus.code = 201
| filter objectRef.resource = "validatingwebhookconfigurations"
| sort @timestamp desc
Lists create, update, delete operations to Roles:
fields @timestamp, @message
| sort @timestamp desc
| limit 100
| filter objectRef.resource="roles" and verb in ["create", "update", "patch", "delete"]
Lists create, update, delete operations to RoleBindings:
fields @timestamp, @message
| sort @timestamp desc
| limit 100
| filter objectRef.resource="rolebindings" and verb in ["create", "update", "patch", "delete"]
Lists create, update, delete operations to ClusterRoles:
fields @timestamp, @message
| sort @timestamp desc
| limit 100
| filter objectRef.resource="clusterroles" and verb in ["create", "update", "patch", "delete"]
Lists create, update, delete operations to ClusterRoleBindings:
fields @timestamp, @message
| sort @timestamp desc
| limit 100
| filter objectRef.resource="clusterrolebindings" and verb in ["create", "update", "patch", "delete"]
Plots unauthorized read operations against Secrets:
fields @timestamp, @message
| sort @timestamp desc
| limit 100
| filter objectRef.resource="secrets" and verb in ["get", "watch", "list"] and responseStatus.code="401"
| stats count() by bin(1m)
List of failed anonymous requests:
fields @timestamp, @message, sourceIPs.0
| sort @timestamp desc
| limit 100
| filter user.username="system:anonymous" and responseStatus.code in ["401", "403"]
Audit your CloudTrail logs
AWS APIs called by pods that are utilizing IAM Roles for Service Accounts (IRSA) are automatically logged to CloudTrail along with the name of the service account. If the name of a service account that wasn’t explicitly authorized to call an API appears in the log, it may be an indication that the IAM role’s trust policy was misconfigured. Generally speaking, Cloudtrail is a great way to ascribe AWS API calls to specific IAM principals.
Use CloudTrail Insights to unearth suspicious activity
CloudTrail insights automatically analyzes write management events from
CloudTrail trails and alerts you of unusual activity. This can help you
identify when there’s an increase in call volume on write APIs in your
AWS account, including from pods that use IRSA to assume an IAM role.
See
Announcing
CloudTrail Insights: Identify and Response to Unusual API Activity
Additional resources
As the volume of logs increases, parsing and filtering them with Log
Insights or another log analysis tool may become ineffective. As an
alternative, you might want to consider running
Sysdig Falco
Yet another option might be to store the audit logs in S3 and use the SageMaker Random Cut Forest algorithm to anomalous behaviors that warrant further investigation.
Tools and resources
The following commercial and open source projects can be used to assess your cluster’s alignment with established best practices:
-
kube-scan
Assigns a risk score to the workloads running in your cluster in accordance with the Kubernetes Common Configuration Scoring System framework -
Kubescape
Kubescape is an open source kubernetes security tool that scans clusters, YAML files, and Helm charts. It detects misconfigurations according to multiple frameworks (including NSA-CISA and MITRE ATT&CK® .)