Kubernetes Security - Production Cluster Hardening Guide
Secure production Kubernetes clusters with RBAC policies, network segmentation, pod security standards, secret management with Vault, and runtime monitoring using Falco and Tetragon.
Introduction
Kubernetes security represents one of the most critical challenges facing platform engineering teams in 2026, with 96% of organizations either using or evaluating Kubernetes in production environments and container-related security incidents increasing 47% year-over-year according to the latest CNCF State of Cloud Native Security report. The shift-left security paradigm—catching vulnerabilities before production deployment—proves insufficient when clusters themselves remain vulnerable to privilege escalation, lateral movement, and data exfiltration attacks exploiting misconfigurations in role-based access control (RBAC), network policies, and pod security standards. This comprehensive guide provides battle-tested security hardening strategies for production Kubernetes clusters, covering authentication and authorization with RBAC, network segmentation using Calico and Cilium policies, Pod Security Standards enforcement with admission controllers, secure secret management with external vaults, image scanning and supply chain security, runtime threat detection with Falco and Tetragon, and compliance automation for SOC 2 and ISO 27001 requirements.
Organizations running Kubernetes at scale face an expanding attack surface as clusters grow to hundreds of nodes managing thousands of pods across multiple namespaces, with each additional service mesh integration, CI/CD pipeline connection, and third-party operator deployment introducing new vectors for compromise. The Kubernetes security model relies on defense-in-depth—layered controls ensuring that a single misconfiguration doesn't compromise the entire cluster—but default installations prioritize developer productivity over security, leaving clusters vulnerable to well-known exploits like the Ingress NGINX custom-snippet vulnerability (CVE-2021-25742), which allowed any user able to create Ingress objects to read secrets cluster-wide. This guide assumes you have a working Kubernetes cluster (version 1.28+) and provides copy-paste-ready configurations with explanations for each security control, enabling security teams to systematically harden clusters while maintaining developer velocity through automated policy enforcement and GitOps workflows.
Authentication and Authorization with RBAC
Understanding Kubernetes RBAC
Kubernetes Role-Based Access Control (RBAC) governs what actions users and service accounts can perform against the Kubernetes API server, forming the foundation of cluster security by preventing unauthorized access to resources. RBAC operates through four primary resource types: Roles (namespace-scoped permissions), ClusterRoles (cluster-wide permissions), RoleBindings (grant Role permissions to subjects), and ClusterRoleBindings (grant ClusterRole permissions cluster-wide). The default Kubernetes installation includes overly permissive RBAC rules—notably the system:discovery and system:public-info-viewer ClusterRoles that allow unauthenticated users to query API server metadata, exposing cluster version information attackers can use to identify known vulnerabilities.
Principle of Least Privilege:
Every service account and user should receive only the minimum permissions required to perform their function. The default service account in each namespace automatically mounts a token granting API server access—a dangerous default since most pods don't need Kubernetes API access. Disabling automatic token mounting globally and explicitly enabling it only for pods that require API access eliminates a large share of potential API server abuse vectors.
Service Account Token Best Practices:
# Disable automountServiceAccountToken on the namespace default service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: production
automountServiceAccountToken: false
---
# Create a dedicated service account with limited scope for an app requiring API access
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-collector
  namespace: production
automountServiceAccountToken: true
---
# Role granting only read access to pods (nodes are cluster-scoped,
# so reading them would require a ClusterRole instead)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: metrics-reader
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
# Bind the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-collector-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: metrics-collector
  namespace: production
roleRef:
  kind: Role
  name: metrics-reader
  apiGroup: rbac.authorization.k8s.io
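Once the manifests above are applied, the grant can be sanity-checked by impersonating the service account with kubectl auth can-i (the expected yes/no answers assume the role and binding were accepted as written):

```shell
# Should return "yes" — the role allows listing pods in production
kubectl auth can-i list pods \
  --as=system:serviceaccount:production:metrics-collector -n production

# Should return "no" — no write verbs were granted
kubectl auth can-i delete pods \
  --as=system:serviceaccount:production:metrics-collector -n production
```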
Implementing Namespace-Level Isolation
Namespaces provide logical isolation for multi-tenant clusters, but without RBAC enforcement, users with access to one namespace can potentially access resources in other namespaces. Create namespaces for each environment (development, staging, production) and team, then enforce isolation using RoleBindings that scope permissions to specific namespaces:
# Development team role - namespace scoped
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: dev-team-a
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "jobs", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"] # exec is a POST to the pods/exec subresource
---
# Production team role - read-only with limited write access
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: production-sre
  namespace: production
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments", "services"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments/scale"]
  verbs: ["update"] # Allow scaling only
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list"]
# Note: NO delete permissions, NO secret access, NO exec access
Audit Overly Permissive RBAC:
Use kubectl together with jq to audit existing permissions and identify overly broad grants:
# List all ClusterRoleBindings granting cluster-admin
kubectl get clusterrolebindings -o json | jq '.items[] | select(.roleRef.name=="cluster-admin") | {name:.metadata.name, subjects:.subjects}'

# Find all roles with wildcard verbs (the []? guards handle roles with null rules)
kubectl get roles,clusterroles --all-namespaces -o json | jq '.items[] | select([.rules[]?.verbs[]?] | index("*")) | {namespace:.metadata.namespace, name:.metadata.name, rules:.rules}'

# Audit which service accounts can create pods (potential for privilege escalation)
kubectl auth can-i create pods --as=system:serviceaccount:default:default
Integrating External Identity Providers
Kubernetes supports OpenID Connect (OIDC) integration for centralized authentication using corporate identity providers like Okta, Azure AD, or Google Workspace. Configure the API server to validate JWT tokens issued by your IdP, enabling single sign-on and centralized access revocation:
API Server Configuration (kube-apiserver flags):
--oidc-issuer-url=https://accounts.google.com
--oidc-client-id=kubernetes-auth.example.com
--oidc-username-claim=email
--oidc-groups-claim=groups
--oidc-ca-file=/etc/kubernetes/pki/oidc-ca.pem
After OIDC configuration, create ClusterRoleBindings mapping IdP groups to Kubernetes roles:
# Map Google Workspace group to cluster-admin role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: Group
  name: kubernetes-admins@example.com
  apiGroup: rbac.authorization.k8s.io
Benefits of OIDC Integration:
- Centralized Access Control: Revoke Kubernetes access by disabling user in IdP
- Audit Trails: Identity provider logs capture all authentication attempts
- MFA Enforcement: Leverage IdP's multi-factor authentication policies
- Time-Limited Access: JWT tokens expire automatically (typically 1-hour expiration)
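On the client side, users typically authenticate through an exec credential plugin rather than pasting tokens into kubeconfig. A sketch using the community kubelogin plugin (invoked as kubectl oidc-login) — the issuer URL and client ID mirror the API server flags above and are placeholder values:

```shell
kubectl config set-credentials oidc-user \
  --exec-api-version=client.authentication.k8s.io/v1beta1 \
  --exec-command=kubectl \
  --exec-arg=oidc-login \
  --exec-arg=get-token \
  --exec-arg=--oidc-issuer-url=https://accounts.google.com \
  --exec-arg=--oidc-client-id=kubernetes-auth.example.com

kubectl config set-context --current --user=oidc-user
```

With this in place, the first kubectl call opens a browser login against the IdP and caches the resulting short-lived token.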
Network Segmentation and Policies
Implementing Zero-Trust Networking
Kubernetes default networking allows all pods to communicate with all other pods across namespaces—a security anti-pattern enabling lateral movement after initial compromise. Network policies enforce zero-trust segmentation by explicitly allowing only required communications and denying all other traffic by default. Kubernetes network policies require a CNI (Container Network Interface) plugin with network policy support—Calico and Cilium provide production-ready implementations with policy enforcement at the kernel level using eBPF or iptables.
Default Deny-All Network Policy:
Apply to every namespace to establish zero-trust baseline:
# Deny all ingress and egress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {} # Applies to all pods in namespace
  policyTypes:
  - Ingress
  - Egress
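One quick way to confirm the deny-all baseline is to probe a service from a scratch pod and expect the connection to time out (the pod and service names here are illustrative):

```shell
# With default-deny in place, this should time out and print the fallback message
kubectl run netcheck --rm -it --restart=Never --image=busybox:1.36 -n production \
  -- wget -qO- --timeout=3 http://backend-api:8080 || echo "blocked as expected"
```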
After applying default-deny, pods cannot communicate until you explicitly allow traffic with targeted network policies:
Allow Frontend to Backend Communication:
# Allow traffic from frontend pods to backend API on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
Allow DNS Resolution:
All pods require DNS for service discovery—explicitly allow DNS egress:
# Allow DNS queries to kube-dns/CoreDNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    # Combined selector: kube-dns pods in the kube-system namespace
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
Advanced Network Policies with Cilium
Cilium leverages eBPF (extended Berkeley Packet Filter) for high-performance network policy enforcement at the kernel level, providing Layer 7 (HTTP/gRPC) aware policies that traditional iptables-based solutions cannot match. Cilium enables API-level access control—for example, allowing GET requests to /api/users but denying POST/PUT/DELETE operations:
Layer 7 HTTP Policy Example:
# Allow only HTTP GET requests to /api/users endpoint
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-api-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: user-service
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/users"
DNS-Based Egress Filtering:
Cilium's DNS-aware policies enable FQDN-based egress control—allowing pods to reach specific external services while blocking all other internet egress:
# Allow egress only to specific external APIs
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-external-apis
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: data-sync-service
  egress:
  - toFQDNs:
    - matchName: "api.stripe.com"
    - matchName: "api.github.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
  # DNS traffic must pass through Cilium's DNS proxy so the agent can learn
  # which IPs resolve from the FQDNs above
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"
Service Mesh Security with Mutual TLS
Service meshes like Istio and Linkerd provide automatic mutual TLS (mTLS) encryption for pod-to-pod communication, authenticating both client and server using X.509 certificates issued by the mesh control plane. mTLS eliminates network sniffing attacks and ensures zero-trust identity verification at the transport layer:
Istio PeerAuthentication (Enforce mTLS Cluster-Wide):
# Require mTLS for all traffic in production namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT # Reject plaintext connections
Authorization Policies (L7 Access Control):
Istio authorization policies enforce identity-based access control using Envoy proxy filters:
# Allow only frontend service account to call backend API
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend-api
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
Pod Security Standards
Enforcing Pod Security Admission
Kubernetes 1.25+ replaces deprecated PodSecurityPolicy with Pod Security Admission, a built-in admission controller enforcing three predefined security profiles: Privileged (unrestricted), Baseline (prevents known privilege escalations), and Restricted (heavily restricted, follows security hardening best practices). Configure namespace labels to enforce security standards:
Namespace Configuration:
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
Restricted Profile Requirements:
Pods in namespaces with restricted enforcement must satisfy:
- No privilege escalation: allowPrivilegeEscalation: false
- No privileged containers: privileged: false
- Non-root user: runAsNonRoot: true
- Dropped capabilities: drop ALL Linux capabilities and add back only NET_BIND_SERVICE if required
- Seccomp profile: RuntimeDefault or a custom seccomp profile
- No host namespaces: hostNetwork, hostPID, and hostIPC disallowed
- Immutable root filesystem: readOnlyRootFilesystem: true (not mandated by the profile itself, but a recommended companion control)
Compliant Pod Specification:
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10000
    fsGroup: 10000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE # Only if app needs to bind ports < 1024
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}
Implementing Policy Enforcement with Kyverno
Kyverno provides Kubernetes-native policy management using CRDs (Custom Resource Definitions) for policy-as-code enforcement without learning a new policy language like OPA's Rego. Kyverno policies validate, mutate, and generate Kubernetes resources, enabling automated security guardrails:
Require Resource Limits on All Pods:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: check-cpu-memory-limits
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "CPU and memory limits are required"
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "?*"
                cpu: "?*"
Automatically Add Seccomp Profile:
Mutate pods lacking seccomp profiles to automatically add RuntimeDefault:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-seccomp
spec:
  background: false
  rules:
  - name: add-seccomp
    match:
      any:
      - resources:
          kinds:
          - Pod
    mutate:
      patchStrategicMerge:
        spec:
          securityContext:
            +(seccompProfile):
              +(type): RuntimeDefault
Block Privileged Containers:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-privileged
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Privileged containers are not allowed"
      pattern:
        spec:
          containers:
          - =(securityContext):
              =(privileged): false
Secret Management
Kubernetes Secrets Limitations
Kubernetes Secrets store sensitive data like API keys, database passwords, and TLS certificates, but the default implementation has critical weaknesses: secret values are base64-encoded (not encrypted), stored unencrypted in etcd unless encryption at rest is configured, and visible to anyone with RBAC permission to read secrets in the namespace. Etcd encryption provides defense-in-depth but doesn't solve key management—encryption keys stored on control plane nodes remain vulnerable if attackers gain node access.
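The base64 point is worth internalizing: encoding is not encryption, and anyone who can read a Secret object can recover the plaintext in one command. A local illustration:

```shell
# Encode a value the way Kubernetes stores Secret data
encoded=$(printf 'hunter2' | base64)
echo "$encoded"                      # aHVudGVyMg==

# Anyone with read access reverses it trivially
printf '%s' "$encoded" | base64 -d   # hunter2
```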
Enable Etcd Encryption at Rest:
Configure API server with encryption provider configuration:
# /etc/kubernetes/enc/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {} # Fallback for reading not-yet-encrypted data
API server flag: --encryption-provider-config=/etc/kubernetes/enc/encryption-config.yaml
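The <base64-encoded-32-byte-key> placeholder must be a freshly generated random key, never a passphrase. One way to produce it:

```shell
# Generate a random 32-byte key and base64-encode it for the aescbc provider
head -c 32 /dev/urandom | base64
```

Store the key securely (ideally in a KMS) — anyone holding it can decrypt every secret in etcd.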
Rotate Encryption Keys:
# Encrypt all existing secrets with new key
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
External Secret Management with Vault
HashiCorp Vault and cloud provider secret managers (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager) provide production-grade secret management with audit logging, automatic rotation, and fine-grained access control. External Secrets Operator (ESO) synchronizes secrets from external vaults into Kubernetes secrets, enabling GitOps workflows where secret references (not secret values) live in Git:
Install External Secrets Operator:
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace
Configure AWS Secrets Manager Backend:
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-backend
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
# ExternalSecret maps remote secret to Kubernetes secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-backend
    kind: SecretStore
  target:
    name: db-secret
    creationPolicy: Owner
  data:
  - secretKey: password
    remoteRef:
      key: prod/database/postgres
      property: password
Benefits of External Secret Management:
- Centralized Audit Logs: All secret access logged in Vault/cloud provider
- Automatic Rotation: Secrets rotated on schedule without pod restarts (using CSI driver)
- Dynamic Secrets: Generate short-lived database credentials per pod
- Encryption Key Management: Cloud KMS manages encryption keys, not stored in cluster
Secrets CSI Driver (Runtime Secret Injection)
Secrets Store CSI Driver mounts secrets directly from external vaults into pod filesystems, avoiding Kubernetes Secret objects entirely. Secrets are never persisted in etcd and are automatically updated when rotated in the vault:
Install Secrets Store CSI Driver:
helm repo add secrets-store-csi-driver https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts
helm install csi-secrets-store secrets-store-csi-driver/secrets-store-csi-driver --namespace kube-system
AWS Secrets Manager Provider:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: aws-secrets
  namespace: production
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "prod/api-keys"
        objectType: "secretsmanager"
        jmesPath:
          - path: stripe_key
            objectAlias: stripeKey
  # Sync the mounted object into a Kubernetes Secret so the env var
  # reference below resolves
  secretObjects:
  - secretName: app-secrets
    type: Opaque
    data:
    - objectName: stripeKey
      key: stripeKey
---
# Pod mounting secrets via CSI driver
apiVersion: v1
kind: Pod
metadata:
  name: app-with-secrets
  namespace: production
spec:
  serviceAccountName: app-sa
  containers:
  - name: app
    image: myapp:1.0
    volumeMounts:
    - name: secrets
      mountPath: "/mnt/secrets"
      readOnly: true
    env:
    - name: STRIPE_API_KEY
      valueFrom:
        secretKeyRef:
          name: app-secrets
          key: stripeKey
  volumes:
  - name: secrets
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "aws-secrets"
Runtime Security and Threat Detection
Implementing Runtime Security with Falco
Falco provides runtime threat detection for Kubernetes by monitoring system calls using eBPF or kernel modules, detecting anomalous behavior like unexpected process execution, sensitive file access, and privilege escalation attempts. Falco rules trigger alerts when containers exhibit suspicious behavior deviating from established baselines:
Install Falco:
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco --namespace falco --create-namespace \
  --set driver.kind=modern_ebpf \
  --set falco.grpc.enabled=true \
  --set falco.grpc_output.enabled=true
Custom Falco Rules:
# /etc/falco/rules.d/custom-rules.yaml
- rule: Unauthorized Process in Container
  desc: Detect processes not in approved binary list
  condition: >
    spawned_process and container and not proc.name in (node, java, python, ruby)
  output: "Unauthorized process started (user=%user.name command=%proc.cmdline container=%container.name)"
  priority: WARNING

- rule: Sensitive File Access
  desc: Detect reads of /etc/shadow or SSH keys
  condition: >
    open_read and container and (fd.name=/etc/shadow or fd.name glob /root/.ssh/*)
  output: "Sensitive file accessed (user=%user.name file=%fd.name container=%container.name)"
  priority: CRITICAL

- rule: Reverse Shell Detected
  desc: Detect common reverse shell patterns
  condition: >
    spawned_process and container and (proc.name in (nc, ncat, netcat, socat) or (proc.name=bash and proc.args contains "-i"))
  output: "Potential reverse shell (command=%proc.cmdline container=%container.name)"
  priority: CRITICAL
Forward Falco Alerts to SIEM:
Configure Falco output to send alerts to Elasticsearch, Splunk, or cloud SIEM:
# Falco Helm values (keys follow falco.yaml's snake_case configuration)
falco:
  json_output: true
  http_output:
    enabled: true
    url: "https://siem.example.com/api/events"
Tetragon for Deep Process Visibility
Cilium Tetragon provides eBPF-based security observability and runtime enforcement, enabling fine-grained process execution policies. Unlike Falco (detection only), Tetragon can block malicious actions in real-time:
Install Tetragon:
helm repo add cilium https://helm.cilium.io
helm install tetragon cilium/tetragon -n kube-system
Tracing Policy (Monitor Binary Execution):
# TracingPolicyNamespaced scopes enforcement to a single Kubernetes namespace
apiVersion: cilium.io/v1alpha1
kind: TracingPolicyNamespaced
metadata:
  name: monitor-exec
  namespace: production
spec:
  kprobes:
  - call: "sys_execve"
    syscall: true
    args:
    - index: 0
      type: "string"
    selectors:
    - matchBinaries:
      - operator: "NotIn"
        values:
        - "/usr/bin/node"
        - "/usr/local/bin/python"
      matchActions:
      - action: Sigkill # Kill unauthorized processes
Image Security and Supply Chain
Container Image Scanning
Scan container images for CVEs (Common Vulnerabilities and Exposures) before deployment using Trivy, Grype, or cloud provider scanners (AWS ECR scanning, GCP Artifact Analysis). Integrate scanning into CI/CD pipelines to fail builds when critical vulnerabilities are detected:
Trivy Scan in CI/CD:
# Scan image and fail if HIGH or CRITICAL vulnerabilities found
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:1.0
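In a pipeline, the same command becomes a build gate. A minimal sketch as a GitHub Actions job using the aquasecurity/trivy-action wrapper (repository layout and image tag are placeholders):

```yaml
jobs:
  image-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Scan with Trivy, fail on HIGH/CRITICAL
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          severity: HIGH,CRITICAL
          exit-code: "1"
```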
Admission Controller Enforcement:
Implement admission controller blocking deployment of vulnerable images using Kyverno:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: check-image-vulnerabilities
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
  - name: scan-image
    match:
      any:
      - resources:
          kinds:
          - Pod
    verifyImages:
    - imageReferences:
      - "*"
      attestations:
      - type: "https://trivy.aquasec.com/vulnerability-scan/v1"
        conditions:
        - all:
          - key: "{{ vulnerabilities.critical }}"
            operator: Equals
            value: 0
          - key: "{{ vulnerabilities.high }}"
            operator: LessThan
            value: 5
Image Signing with Cosign and Sigstore
Sign container images with Cosign to ensure only verified images from trusted registries deploy to production. Admission controllers verify signatures before pod creation:
Sign Image with Cosign:
# Generate key pair
cosign generate-key-pair

# Sign image
cosign sign --key cosign.key myregistry.io/myapp:1.0
Verify Signatures with Kyverno:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
  - name: check-signature
    match:
      any:
      - resources:
          kinds:
          - Pod
    verifyImages:
    - imageReferences:
      - "myregistry.io/*"
      attestors:
      - count: 1
        entries:
        - keys:
            publicKeys: |-
              -----BEGIN PUBLIC KEY-----
              MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
              -----END PUBLIC KEY-----
Software Bill of Materials (SBOM)
Generate SBOMs for container images documenting all dependencies, enabling rapid response to newly disclosed vulnerabilities (e.g., Log4Shell). Syft generates SBOMs in SPDX or CycloneDX formats:
# Generate SBOM for image
syft myapp:1.0 -o spdx-json > sbom.json

# Scan SBOM for vulnerabilities
grype sbom:./sbom.json
Compliance and Audit Logging
Kubernetes Audit Logging
Enable Kubernetes audit logging to track all API server requests, providing forensic evidence for security incidents and compliance audits (SOC 2, PCI DSS, HIPAA). Configure audit policy to log critical events while minimizing storage overhead:
Audit Policy Configuration:
# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log all secret access
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
# Log exec and port-forward (potential data exfiltration)
- level: Metadata
  resources:
  - group: ""
    resources: ["pods/exec", "pods/portforward"]
# Log authentication failures
- level: Metadata
  omitStages:
  - RequestReceived
  userGroups:
  - system:unauthenticated
# Log modifications to critical resources
- level: RequestResponse
  verbs: ["create", "update", "patch", "delete"]
  resources:
  - group: ""
    resources: ["pods", "services", "secrets"]
  - group: "apps"
    resources: ["deployments", "daemonsets", "statefulsets"]
# Don't log read-only GET requests (reduces log volume)
- level: None
  verbs: ["get", "list", "watch"]
API Server Configuration:
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxbackup=10
--audit-log-maxsize=100
Forward Audit Logs to SIEM:
Use Fluent Bit or Fluentd to ship audit logs to centralized logging:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
data:
  fluent-bit.conf: |
    [INPUT]
        Name    tail
        Path    /var/log/kubernetes/audit.log
        Parser  json
        Tag     audit.*

    [OUTPUT]
        Name    es
        Match   audit.*
        Host    elasticsearch.logging.svc
        Port    9200
        Index   k8s-audit
        Type    _doc
Compliance Scanning with Kubescape
Kubescape scans clusters against NSA/CISA Kubernetes Hardening Guide, CIS Benchmarks, and MITRE ATT&CK framework, identifying misconfigurations and compliance gaps:
Install and Run Kubescape:
# Install Kubescape
curl -s https://raw.githubusercontent.com/kubescape/kubescape/master/install.sh | /bin/bash

# Scan cluster against NSA framework
kubescape scan framework nsa --exclude-namespaces kube-system,kube-public

# Generate compliance report
kubescape scan framework cis --format html --output report.html
Common Issues Detected:
- Anonymous authentication enabled: API server accessible without credentials
- Insecure port enabled: API server listening on insecure port 8080
- Admission controllers disabled: Missing PodSecurityPolicy, NodeRestriction
- Audit logging disabled: No forensic trail for security incidents
- kubelet authentication disabled: Unauthenticated kubelet API access
Incident Response and Recovery
Forensics with kubectl debug
Kubernetes 1.23+ includes kubectl debug for ephemeral debugging containers with different security contexts, enabling root-level troubleshooting without modifying pod security:
# Attach ephemeral debug container with root privileges
kubectl debug -it pod-name --image=busybox --target=container-name -- sh

# Debug crashed pod (create copy with debugging tools)
kubectl debug pod-name -it --copy-to=debug-pod --container=app -- sh

# Debug node (run privileged pod with host filesystem access)
kubectl debug node/worker-node-1 -it --image=ubuntu
Backup and Disaster Recovery
Implement cluster backup strategy using Velero for disaster recovery and migration:
Install Velero:
velero install \
--provider aws \
--bucket kubernetes-backups \
--secret-file ./credentials-velero \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1
Scheduled Backups:
# Daily backup of all namespaces
velero schedule create daily-backup --schedule="0 2 * * *"

# Backup critical namespace with retention
velero backup create production-backup --include-namespaces production --ttl 720h
Restore from Backup:
# Restore entire cluster state
velero restore create --from-backup production-backup

# Restore specific namespace
velero restore create --from-backup production-backup --include-namespaces production
Conclusion and Security Checklist
Securing production Kubernetes clusters requires defense-in-depth across authentication, network policies, pod security, secret management, runtime monitoring, and compliance automation. Use this checklist to systematically harden clusters:
Essential Security Controls:
- RBAC configured with least-privilege service accounts
- Default service account automountServiceAccountToken disabled
- OIDC integration for centralized authentication
- Network policies enforcing default-deny with explicit allow rules
- Pod Security Admission enforcing Restricted profile in production namespaces
- Policy enforcement with Kyverno/OPA blocking privileged containers
- Etcd encryption-at-rest enabled
- External secret management (Vault/AWS Secrets Manager) configured
- Runtime security monitoring (Falco/Tetragon) deployed
- Container image scanning integrated into CI/CD
- Image signature verification enforced in production
- Kubernetes audit logging enabled and forwarded to SIEM
- Regular compliance scanning with Kubescape
- Disaster recovery backups automated with Velero
- Security incident response runbook documented
Advanced Hardening:
- Service mesh (Istio/Linkerd) mTLS enforced cluster-wide
- Cilium Layer 7 network policies for API-level access control
- Secrets CSI driver for runtime secret injection
- Admission controller blocking images with critical CVEs
- Software Bill of Materials (SBOM) generated for all images
- eBPF-based runtime enforcement (Tetragon) blocking unauthorized processes
- Multi-tenancy isolation with hierarchical namespaces
- Node-level security with SELinux/AppArmor profiles
Security is a continuous process requiring regular audits, patch management, and threat intelligence integration. Automate security scanning in CI/CD pipelines, enforce policies with admission controllers, and implement monitoring to detect runtime anomalies before they escalate to breaches. The layered security approach outlined in this guide provides production-ready hardening strategies protecting Kubernetes clusters against evolving threats while maintaining developer productivity through policy automation and GitOps workflows.
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.