Kubernetes Security Guide: High-Level K8s Hardening Guide


As more organizations have begun to embrace cloud-native technologies, Kubernetes adoption has become the industry standard for container orchestration. This shift toward Kubernetes has largely automated and simplified the deployment, scaling, and management of containerized applications, providing numerous benefits over the legacy processes used to manage traditional monolithic systems. However, securely managing Kubernetes at scale comes with a unique set of challenges, including hardening the cluster, securing the supply chain, and detecting threats at runtime.

This introduction to Kubernetes security combines best practices from the Cloud Native Computing Foundation (CNCF), National Security Agency (NSA), and Cybersecurity and Infrastructure Security Agency (CISA) to help organizations mitigate risks and adopt a multi-layered security approach.

Cluster Setup and Hardening 

Securing a Kubernetes environment starts with hardening the cluster. For users of a managed Kubernetes service (e.g., GKE, EKS, AKS), the respective cloud provider manages the security of the control plane and implements various secure-by-default settings for the cluster. GKE Autopilot takes additional measures, implementing GKE hardening guidelines and GCP security best practices. But even for GKE Standard or EKS/AKS users, there are guidelines maintained by the cloud providers to secure access to the Kubernetes API server, container access to cloud resources, and Kubernetes upgrades:

GKE Hardening Guide

EKS Best Practices Guide for Security

AKS Cluster Security 

For self-managed Kubernetes clusters (e.g., those bootstrapped with kubeadm or kops), kube-bench can be used to test whether the cluster meets the security guidelines laid out in the CIS Kubernetes Benchmark. Key recommendations include encrypting the secrets stored in etcd at rest, protecting control plane communication with TLS certificates, and turning on audit logging.
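
For illustration, kube-bench is typically run as a Kubernetes Job on the target node. The sketch below is a simplified version of the job manifest published in the kube-bench repository; the image tag and host-path mounts are assumptions and vary by setup:

apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true                   # kube-bench inspects processes running on the node
      restartPolicy: Never
      containers:
      - name: kube-bench
        image: docker.io/aquasec/kube-bench:latest
        command: ["kube-bench"]
        volumeMounts:
        - name: etc-kubernetes
          mountPath: /etc/kubernetes
          readOnly: true              # config files kube-bench audits against the CIS Benchmark
      volumes:
      - name: etc-kubernetes
        hostPath:
          path: /etc/kubernetes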

Network and Resource Policies

By default, Kubernetes allows communication from any pod to another pod within the same cluster. While this is ideal for service discovery, it provides zero network separation, allowing bad actors or compromised systems unlimited access to all resources. This becomes extremely problematic for teams using namespaces as the primary means of multi-tenancy inside Kubernetes. 

To control the traffic flow between pods, namespaces, and external endpoints, use a CNI plugin that supports the Network Policy API (e.g., Calico, Cilium, or a cloud-specific CNI) for network isolation. Following the zero-trust model, the best practice is to implement a default deny-all policy to block all ingress and egress traffic unless it is specifically allowed by another policy.
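
A minimal default deny-all policy might look like the following sketch (the dev namespace is a placeholder):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dev
spec:
  podSelector: {}      # an empty selector matches every pod in the namespace
  policyTypes:
  - Ingress
  - Egress
  # no ingress or egress rules are listed, so all traffic is denied by default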

In addition to network policies, Kubernetes provides two resource-level policies: LimitRange and ResourceQuota. A LimitRange constrains individual resource usage (e.g., a maximum of 2 CPUs per pod), whereas a ResourceQuota controls aggregate resource usage (e.g., a total of 20 CPUs in the dev namespace).
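
A sketch of both policies using the figures above (the object names and the dev namespace are placeholders):

apiVersion: v1
kind: LimitRange
metadata:
  name: pod-cpu-limit
  namespace: dev
spec:
  limits:
  - type: Pod
    max:
      cpu: "2"          # no single pod in the namespace may exceed 2 CPUs
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-cpu-quota
  namespace: dev
spec:
  hard:
    limits.cpu: "20"    # aggregate CPU limits across the whole namespace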

RBAC and Service Accounts

With strong network and resource policies in place, the next step is to enforce RBAC authorization to restrict access. Kubernetes admins can apply RBAC rules to control which users and groups can access the cluster, as well as restrict which resources services can reach both within and outside the cluster (e.g., cloud-hosted databases).
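
For example, a Role and RoleBinding granting a hypothetical user read-only access to pods in a single namespace might look like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]       # "" refers to the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane            # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io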

Exercise caution in using the default service account, which is mounted into every pod upon creation. Depending on the permissions given to the default service account, the pod may be granted more permissions than it requires. If the workload does not need to communicate with the Kubernetes API, set automountServiceAccountToken to false to prevent the token from being mounted.
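
A minimal sketch of opting a pod out of token mounting (the pod name and image are placeholders); the same field can also be set on the ServiceAccount object itself:

apiVersion: v1
kind: Pod
metadata:
  name: no-api-access
spec:
  automountServiceAccountToken: false   # the service account token is never mounted into the pod
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 1h"]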

System Hardening

Now that the cluster is secure, the next step is to minimize the attack surface on the systems themselves. This applies to the OS running on the nodes as well as the kernel-level controls applied to containers. Instead of general-purpose Linux nodes, opt for a specialized OS optimized for running containers, such as AWS Bottlerocket or Google's Container-Optimized OS (COS).

Next, take advantage of Linux kernel security features, such as SELinux, AppArmor (beta since 1.4), and/or seccomp (stable since 1.19). AppArmor uses per-program profiles to confine programs to a limited set of resources. Once an AppArmor profile is defined and loaded on the node, pods that reference it via AppArmor annotations will enforce those rules.

apiVersion: v1
kind: Pod
metadata:
  name: apparmor
  annotations:
    container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write
spec:
  containers:
  - name: hello
    image: busybox
    command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]

Seccomp, on the other hand, restricts a container’s syscalls. As long as the profile is available on the underlying Kubernetes node, a seccomp profile can be referenced under the securityContext section:

apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"

Even if seccomp profiles are not available, users can still restrict the container from various privilege escalation attacks. Under security contexts, Kubernetes allows configuring whether the container can run as privileged, run as root, or escalate privileges to root. Users can also restrict hostPID, hostIPC, hostNetwork, and hostPath usage. All of these settings can be enforced via the Pod Security Policy (which was deprecated in v1.21) or with other open-source tools, such as K-Rail, Kyverno, and OPA/Gatekeeper.
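
A sketch of a pod spec applying these restrictions directly (the image and user ID are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: restricted-pod
spec:
  hostPID: false                        # no access to the host PID namespace
  hostIPC: false
  hostNetwork: false
  securityContext:
    runAsNonRoot: true                  # refuse to start if the image would run as root
    runAsUser: 1000                     # placeholder non-root UID
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false   # block setuid binaries and similar escalations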

Finally, if additional security assurance is required, a custom RuntimeClass can be configured to take advantage of sandboxed runtimes, some backed by hardware virtualization (e.g., gVisor, Kata Containers). Configure the corresponding handler on the nodes, define the RuntimeClass, and specify it in the pod definition:

apiVersion: node.k8s.io/v1 # RuntimeClass is defined in the node.k8s.io API group
kind: RuntimeClass
metadata:
  name: myclass # The name the RuntimeClass will be referenced by
  # RuntimeClass is a non-namespaced resource
handler: myconfiguration # The name of the corresponding CRI configuration

---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  runtimeClassName: myclass

Supply Chain Security

Even if the cluster and system are secure, to ensure end-to-end security of the entire application, the supply chain must also be taken into consideration. For applications developed in house, follow the best practices for creating containers: use a minimal base image to reduce the attack surface, pin package versions, and use multi-stage builds to create small images. Also, define a non-root user for the container to run as, or build rootless containers with Podman to restrict root access.

Next, scan all images for vulnerabilities using open-source tools (e.g., Trivy, Clair, Anchore) or commercial tools (e.g., JFrog Xray or the container scanning built into your cloud provider's build process). Some tools also allow signing images and verifying the signatures to ensure that the containers were not tampered with during the build and upload process. Finally, define a whitelist of registries that Kubernetes can pull images from using ImagePolicyWebhook or any of the policy enforcement tools mentioned above.
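
As one example, image sources can be restricted with a Kyverno ClusterPolicy. The sketch below uses a hypothetical registry name, and the exact schema varies across Kyverno versions:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-registries
spec:
  validationFailureAction: Enforce      # reject non-conforming pods instead of just auditing
  rules:
  - name: trusted-registry-only
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Images must come from the trusted registry."
      pattern:
        spec:
          containers:
          - image: "registry.example.com/*"   # hypothetical trusted registry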

Monitoring, Logging, and Runtime Security

At this point, we have a secure cluster with a locked-down supply chain that produces clean, verified images with limited permissions. However, environments are dynamic, and security teams must be able to respond to incidents in running environments. First and foremost, ensure immutability of containers at runtime by setting readOnlyRootFilesystem to true and writing temporary files and logs to an emptyDir volume.
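
A sketch of an immutable container with a writable scratch volume for temporary files (names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: immutable-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      readOnlyRootFilesystem: true   # the container cannot modify its own filesystem
    volumeMounts:
    - name: tmp
      mountPath: /tmp                # writable scratch space for temporary files and logs
  volumes:
  - name: tmp
    emptyDir: {}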

On top of the typical application monitoring (e.g., Prometheus/Grafana) or logging (e.g., EFK), analyse syscall processes and Kubernetes API logs using Falco or Sysdig. Both tools can parse Linux system calls from the kernel at runtime and trigger alerts when a rule is violated. Example rules include alerting when privilege escalation occurs, when read/write events are detected on well-known directories, or when a shell is invoked. Finally, integrate Kubernetes API audit logs with existing log aggregation and alerting tools to monitor all activities in the cluster. These include API request history, performance metrics, deployments, resource consumption, OS calls, and network traffic. 
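
For illustration, a Falco rule covering the shell-invocation case might look like the following sketch (spawned_process and container are macros provided by Falco's default ruleset):

- rule: Shell Spawned in Container
  desc: Detect a shell being started inside a container
  condition: spawned_process and container and proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in container (user=%user.name container=%container.name
    command=%proc.cmdline)
  priority: WARNING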

Conclusion 

Due to the complex nature of cloud-native systems, a multi-layered approach is required to secure a Kubernetes environment. Kubernetes recommends the 4Cs of cloud-native security: cloud, cluster, container, and code. Start by hardening the cluster and following best practices for cloud security. Then, lock down the container, reduce the attack surface, limit access, and ensure immutability at runtime. Next, secure the supply chain and analyse the code and container for vulnerabilities. Finally, monitor all activity at runtime to build defence into every layer of your software running inside Kubernetes.



