Series
K8s with Divine
Kubernetes deep dives covering the operational details most engineers miss: eviction order, resource requests, DaemonSets, and more.
9 posts so far
Upgrading Kubernetes in Production Without Downtime. The Order of Operations Is Everything.
The commands are the easy part. Sequencing them so workloads stay live is what matters. Pre-flight checks, control plane first, drain with PDBs, plus the gotchas managed-K8s docs leave out.
6 min read
I Built a Production-Shaped EKS Cluster with Terraform. Here's Everything That Bit Me.
From-scratch EKS with Terraform. Subnet placement, OIDC + IRSA, cross-account ENIs, and the two settings that hide until kubectl times out from your laptop.
7 min read
Kubernetes Service Accounts Should Be Boring. Most Teams Make Them a Risk.
Every pod gets a service account token mounted by default. That token is an identity, and identities can be escalated. Here's how to lock it down.
2 min read
etcd Is the Brain of Your Cluster, Here's My 10-Minute Backup Routine
etcd is your cluster's source of truth: every Secret, Deployment, and config lives there. Here's the 10-minute backup routine I set once and never skip.
2 min read
Kubernetes Will Evict Your Pods in a Specific Order
Most engineers think it's random. It's not. Eviction order closely follows a pod's QoS class, which shapes what gets killed first when a node runs out of resources.
2 min read
DaemonSets Aren't Just for Logging, Three Production Use Cases
Most engineers think DaemonSets are for logs. They're for any node-level concern: monitoring, network policy, runtime security.
2 min read
We Didn't Set Resource Requests on Our Pods in Production. Here's Exactly What Happened.
Without requests, the scheduler places pods blindly and noisy neighbors take down everything on the node. Without limits, there's no ceiling.
2 min read
PVC to PV Is a One-to-One Relationship, Here's What That Means in Production
Two storage behaviors catch people completely off guard in production: the sizing trap, and the Released state trap. Know both before they bite you.
2 min read
The First Thing I Check When a Pod Is in CrashLoopBackOff. It's Not the Logs.
Logs only help if the app ran long enough to produce output. The exit code tells you why it died at the OS level, even when no logs were ever written.
1 min read