Hardening Kubernetes for Production: 12 Security Controls We Apply to Every Cluster

A default Kubernetes installation is convenient, not safe. Permissive RBAC, no network policies, root containers, no image scanning, no audit logging — out of the box, your cluster is one stolen credential away from a full compromise.

We've hardened dozens of production clusters across AWS, GCP, Azure, and bare metal. The same 12 controls show up every time. None of them are exotic, but together they raise the bar from "trivially exploitable" to "would require a serious, targeted attack."

1. Pod Security Standards (Restricted profile)

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is the built-in Pod Security Admission controller, which enforces the Pod Security Standards. Apply the restricted profile to every namespace running workloads: it blocks privileged containers, host network/PID/IPC namespaces, and hostPath volumes, and it requires non-root execution.

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest

Use baseline for namespaces that genuinely need it (legacy workloads), but never privileged outside of kube-system.
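
For those legacy namespaces, one option is to enforce baseline while surfacing what would fail under restricted, so the migration backlog stays visible. A minimal sketch, assuming a hypothetical namespace called legacy-apps:

```yaml
# Sketch only: "legacy-apps" is a hypothetical namespace name.
apiVersion: v1
kind: Namespace
metadata:
  name: legacy-apps
  labels:
    # Reject pods that violate the baseline profile.
    pod-security.kubernetes.io/enforce: baseline
    # Warn on kubectl apply, and annotate audit events, for anything
    # that would be rejected under the restricted profile.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

The warn and audit modes never block a pod, so they are safe to add before tightening enforce.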

2. Run containers as non-root with read-only root filesystems

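Every workload should set runAsNonRoot: true and readOnlyRootFilesystem: true. Add a tmpfs emptyDir (medium: Memory) for any directories that need write access. Together these settings shut down entire classes of container escape and persistence techniques, because an attacker can neither run as root nor write tooling into the container's root filesystem.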

# Container-level securityContext (spec.containers[].securityContext)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
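
To show where that block sits and how the writable tmpfs directory fits in, here is a sketch of a complete Pod. The pod name, image reference, mount path, and size limit are all illustrative assumptions, not values from a real deployment:

```yaml
# Sketch only: name, image, mountPath, and sizeLimit are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: api
  namespace: production
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0.0
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault
      volumeMounts:
        # The only writable path in the container.
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir:
        medium: Memory   # tmpfs, counted against the pod's memory
        sizeLimit: 64Mi
```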

3. Default-deny network policies

By default, every pod can talk to every other pod. That's how a single compromised service becomes a full cluster compromise. Apply a default-deny ingress and egress policy in every namespace, then explicitly allow only what's needed.

This is non-negotiable for SOC 2 and HIPAA clusters, and it's good hygiene everywhere else. We covered the patterns in detail in our Kubernetes network policies guide.
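
A minimal sketch of the starting point, assuming a namespace called production and a CNI plugin that actually enforces NetworkPolicy. The second policy re-allows DNS, which a blanket egress deny would otherwise break. The k8s-app: kube-dns label is the conventional one for CoreDNS, but check what your distribution uses:

```yaml
# Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}            # empty selector = all pods
  policyTypes: ["Ingress", "Egress"]
---
# Re-allow DNS lookups to the cluster DNS pods (label is an assumption).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```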

4. RBAC with least privilege

No human user should have cluster-admin for daily work. No service account should have permissions it doesn't need. We audit RBAC with these questions:

  • Which subjects have * verbs on any resource?
  • Which service accounts can list secrets cluster-wide?
  • Which can create pods? (pod creation = arbitrary code execution as the pod's service account)
  • Which can create RoleBindings or ClusterRoleBindings? (= privilege escalation)

Tools like rbac-tool, kubectl-who-can, and krane make this auditing tractable.
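
For contrast with the wildcard grants those questions look for, here is a sketch of a narrowly scoped Role. The service account api and the ConfigMap app-config are hypothetical names used for illustration:

```yaml
# Sketch only: "api" and "app-config" are hypothetical names.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["app-config"]   # one named object, not the whole type
    verbs: ["get"]                  # no list, no watch, no write
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-configmap-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: api
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: configmap-reader
```

A namespaced Role with a RoleBinding keeps the grant out of every other namespace, which a ClusterRoleBinding would not.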

5. Secrets management — not Kubernetes Secrets

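Kubernetes Secrets are base64-encoded, not encrypted, and by default they are stored unencrypted in etcd. Anyone with get or list permission on secrets in a namespace can dump them. Keep the source of truth outside plain Secret manifests with one of: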

  • External Secrets Operator — pulls from AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, or Vault
  • Sealed Secrets — encrypted at rest in git, decrypted in-cluster
  • SOPS + Helm Secrets — for GitOps workflows

These tools still materialize native Secret objects inside the cluster, so always enable etcd encryption at rest with a KMS provider as well. EKS, GKE, and AKS all offer this as a cluster-level setting.
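
On self-managed clusters, the equivalent is an EncryptionConfiguration file passed to the API server with --encryption-provider-config. A sketch, assuming a KMS v2 plugin is already running on the control plane nodes. The plugin name and socket path are hypothetical:

```yaml
# Sketch only: the plugin name and socket path are placeholders.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # First provider is used for writes: encrypt via the KMS plugin.
      - kms:
          apiVersion: v2
          name: example-kms-plugin
          endpoint: unix:///var/run/kmsplugin/socket.sock
          timeout: 3s
      # Fallback so Secrets written before encryption was enabled
      # can still be read.
      - identity: {}
```

Existing Secrets are only encrypted once they are rewritten, so a one-off rewrite of all Secrets is usually needed after enabling this.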

6. Image scanning in CI and at admission

Scan images at two points:

  • In CI — Trivy or Grype on every build, fail the pipeline on Critical/High CVEs in your application or base images
  • At admission — Kyverno or OPA Gatekeeper to reject images that don't come from your registry, aren't signed, or have critical vulnerabilities

Pin base images to specific digests, not tags. node:22-alpine changes; node:22-alpine@sha256:... doesn't.
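
As one way to express the registry restriction at admission, here is a sketch of a Kyverno ClusterPolicy. It assumes Kyverno is installed and uses registry.example.com as a placeholder for your registry. Field names follow the kyverno.io/v1 API, and newer Kyverno releases move the failure action to the rule level, so check it against the version you run:

```yaml
# Sketch only: registry.example.com is a placeholder.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce   # start with Audit to see what would break
  background: true
  rules:
    - name: allow-only-internal-registry
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            # =(...) means: validate only if the field is present.
            =(initContainers):
              - image: "registry.example.com/*"
            containers:
              - image: "registry.example.com/*"
```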

7. Image signing and verification (Sigstore/Cosign)

Sign every image you build with Cosign. Verify signatures at admission time with a policy controller. This makes supply chain attacks much harder: an attacker who pushes a malicious image to your registry can't get it deployed without your signing key (or, with keyless signing, your signing identity).
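
Admission-time verification can be sketched with a Kyverno verifyImages rule. This assumes Kyverno as the policy controller and a key-based Cosign setup. The registry name is a placeholder and the public key body is deliberately left elided:

```yaml
# Sketch only: registry.example.com is a placeholder and the key is elided.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  background: false            # signature checks only make sense at admission
  webhookTimeoutSeconds: 30
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"
          attestors:
            - count: 1
              entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      ...
                      -----END PUBLIC KEY-----
```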

8. Audit logging — and actually look at it

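Enable Kubernetes audit logs at RequestResponse level for RBAC changes, and at Metadata level for everything else, including secrets, exec, attach, and port-forward. Keep secrets at Metadata: RequestResponse would write the secret values themselves into the audit log. Ship the logs to your SIEM. Alert on: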

  • kubectl exec into production pods
  • Secret reads outside of normal service accounts
  • RoleBinding/ClusterRoleBinding creation
  • Privileged pod creation
  • Failed authentication attempts
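
A sketch of an audit policy along these lines, for clusters where you control the API server's --audit-policy-file. Rules are evaluated top to bottom and the first match wins. Note that this sketch keeps Secrets, ConfigMaps, and TokenReviews at Metadata, because logging their request and response bodies would copy sensitive values into the audit log:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the RequestReceived stage to avoid duplicate events per request.
omitStages: ["RequestReceived"]
rules:
  # Never log payloads that contain secret material.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
      - group: "authentication.k8s.io"
        resources: ["tokenreviews"]
  # Full request and response bodies for RBAC changes.
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Interactive access to pods: who, when, which pod.
  - level: Metadata
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach", "pods/portforward"]
  # Catch-all for everything else.
  - level: Metadata
```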

9. Runtime threat detection (Falco)

Audit logs catch API actions. They don't catch what happens inside a running container. Falco watches syscalls and alerts on suspicious behavior — shell spawned in a container, sensitive file read, unexpected network connection, package manager run in production.

Tune the rules carefully (the defaults are noisy), but the value is enormous: you'll know within seconds if someone gets a shell in production.
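
As an example of a tuned rule, here is a sketch of a custom Falco rule scoped to one namespace. It assumes the macros spawned_process, container, and shell_procs from Falco's default ruleset are loaded, and that Kubernetes metadata enrichment is enabled so k8s.ns.name is populated. The namespace name is a placeholder:

```yaml
# Sketch only: relies on macros from Falco's default rules file.
- rule: Shell spawned in production container
  desc: Detect a shell process starting inside a container in the production namespace.
  condition: >
    spawned_process and container and shell_procs
    and k8s.ns.name = "production"
  output: >
    Shell spawned in container
    (user=%user.name pod=%k8s.pod.name ns=%k8s.ns.name
    container=%container.name cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```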

10. Restrict egress and use a service mesh for mTLS

Most clusters allow unrestricted egress. That means a compromised pod can exfiltrate data to anywhere on the internet. Use network policies to restrict egress to known destinations (your databases, your APIs, your registries), and route everything else through an egress gateway you control.

For service-to-service traffic, mTLS via Istio, Linkerd, or Cilium ensures that even if someone gets onto the network, they can't impersonate a service or eavesdrop on traffic.
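
Both halves can be sketched in a few lines. The first manifest limits a hypothetical api workload to a database subnet. The CIDR, port, and labels are illustrative assumptions. The second assumes Istio with istio-system as the root namespace and turns on strict mTLS mesh-wide. Recent Istio releases also serve this resource as security.istio.io/v1:

```yaml
# Sketch only: CIDR, port, and labels are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-allowlist
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes: ["Egress"]
  egress:
    - to:
        - ipBlock:
            cidr: 10.20.0.0/24   # database subnet
      ports:
        - protocol: TCP
          port: 5432
---
# Reject any plaintext service-to-service traffic in the mesh.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```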

11. Patch the control plane and nodes — automatically

Unpatched nodes are one of the most common, and most avoidable, routes to cluster compromise. Use:

  • Managed control plane — let AWS/GCP/Azure handle the API server, etcd, and scheduler patches
  • Auto-upgrading node pools — GKE auto-upgrade, EKS managed node groups, or Karpenter with drift detection
  • Surge upgrades — to keep capacity during rolling updates
  • kured — for self-managed clusters, to drain and reboot nodes when kernel updates require it
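
Automated upgrades drain nodes, so each workload should state how much disruption it tolerates. A PodDisruptionBudget is the usual way to do that. A sketch for a hypothetical api Deployment:

```yaml
# Sketch only: the name and label are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
  namespace: production
spec:
  # Node drains may evict at most one api pod at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api
```

Avoid budgets that allow zero disruptions, since they block drains and stall the upgrade.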

12. Backups, disaster recovery, and tested restores

Security isn't only about preventing compromise — it's also about recovering from one. Use Velero to back up cluster state and persistent volumes to off-cluster storage (a different cloud account, ideally). And then actually test the restore at least quarterly. A backup you've never restored is a hope, not a plan.
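
A sketch of a nightly Velero schedule, assuming Velero is installed in the velero namespace and that a BackupStorageLocation named offsite (a hypothetical name) points at storage in a separate account:

```yaml
# Sketch only: "offsite" is a hypothetical BackupStorageLocation.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-nightly
  namespace: velero
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  template:
    includedNamespaces: ["production"]
    snapshotVolumes: true          # include persistent volumes
    storageLocation: offsite
    ttl: 720h0m0s                  # keep each backup for 30 days
```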

What this gets you

None of these controls are silver bullets. Together, they put you in a position where:

  • A stolen developer credential can't escalate to cluster-admin
  • A compromised pod has very few paths to move laterally or exfiltrate data
  • A malicious image won't make it past admission
  • You'll know within seconds if something starts behaving suspiciously
  • You can prove all of this to a SOC 2, HIPAA, or PCI DSS auditor

This is the baseline we apply to every production cluster we manage — whether it's a 3-node startup workload or a multi-region platform. Security is cumulative; every control you skip makes the next compromise cheaper.

"After our DevOps Team engagement, our pentesters reported that our cluster was the hardest target they'd tested all year. We went from 'wide open' to 'audit-ready' in five weeks."

Need a Kubernetes security review?

We'll audit your cluster against this checklist and give you a prioritized remediation plan — for free.
