Orchestration (Kubernetes)
The core K8s objects (Pod, Deployment, Service, Ingress, ConfigMap, Secret) and a hands-on deploy on local K8s.
Kubernetes runs containers for you across a cluster of machines: you declare the desired state (“I want 3 replicas of this image, reachable on port 80”), and a control loop continuously works to make reality match. That reconciliation loop is the whole idea — you never tell Kubernetes how to get there, only what you want, and it diffs and converges. The API is a small set of objects that compose; learn the six below and how they stack and you can read and write almost any manifest.
How the objects stack
Read it bottom-up: a Deployment manages Pods; a Service gives those Pods a stable address; an Ingress routes outside HTTP traffic to the Service; ConfigMaps/Secrets feed configuration into the Pods.
| Object | Solves | Key fields | Watch out |
|---|---|---|---|
| Pod | Running one+ co-located containers | containers, volumes | Ephemeral + no self-healing alone — don't create directly |
| Deployment | Replicas, self-healing, rolling updates | replicas, selector, template, strategy | selector labels must match the Pod template labels |
| Service | Stable endpoint + LB over Pods | selector, ports, type | ClusterIP is internal-only; need LoadBalancer/Ingress for external |
| Ingress | External HTTP routing + TLS | rules (host/path), tls | Does nothing without an ingress controller deployed |
| ConfigMap | Non-secret config, decoupled from image | data | Plain text — never put credentials here |
| Secret | Sensitive data injection | data (base64), stringData | base64 ≠ encryption — enable encryption-at-rest + RBAC |
Deployment vs StatefulSet — stateless or not
A Deployment treats its Pods as interchangeable cattle: any replica is as good as any other, they get random names, and rolling updates swap them freely. That’s exactly right for a stateless web tier — but wrong for databases, queues, or anything where each replica has a stable identity and its own persistent disk. For those you reach for a StatefulSet, which gives Pods stable, ordered names (db-0, db-1) and stable per-Pod storage.
| Aspect | Deployment | StatefulSet |
|---|---|---|
| Pod identity | interchangeable, random names | stable, ordered (app-0, app-1) |
| Storage | shared or none | stable per-Pod PersistentVolume |
| Scaling/updates | any order, in parallel (rolling updates replace Pods in batches) | ordered, one at a time |
| Use for | stateless web/API services | databases, Kafka, anything with per-replica state |
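To make the StatefulSet column concrete, here is a minimal sketch of a two-replica StatefulSet with its headless Service. The `postgres:16` image, port 5432, the 1Gi volume size, and the reuse of the `web-secret` Secret for the password are all assumptions chosen for illustration; the point is the shape (`serviceName`, `volumeClaimTemplates`), not a production database setup.

```yaml
# Headless Service: clusterIP None gives each Pod its own DNS record
# (db-0.db and db-1.db inside the namespace).
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None
  selector: { app: db }
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db              # must name the headless Service above
  replicas: 2                  # Pods are created in order: db-0, then db-1
  selector:
    matchLabels: { app: db }
  template:
    metadata:
      labels: { app: db }
    spec:
      containers:
        - name: db
          image: postgres:16   # assumption: stands in for any stateful workload
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef: { name: web-secret, key: DATABASE_PASSWORD }
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:        # one PersistentVolumeClaim per Pod: data-db-0, data-db-1
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

If `db-0` is rescheduled, it comes back with the same name and reattaches to the `data-db-0` claim, which is the behavior a Deployment cannot give you.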
A deployable manifest set
Here is the full set to deploy hands-on for a simple web app: Deployment + Service + Ingress + ConfigMap + Secret.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: web-config
    data:
      APP_GREETING: "hello from k8s"
      LOG_LEVEL: "info"
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: web-secret
    type: Opaque
    stringData:                       # stringData is auto-base64-encoded for you
      DATABASE_PASSWORD: "s3cr3t"
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels: { app: web }     # must match template labels below
      template:
        metadata:
          labels: { app: web }
        spec:
          containers:
            - name: web
              image: myorg/web:1.4.2
              ports:
                - containerPort: 8080
              envFrom:
                - configMapRef: { name: web-config }
                - secretRef: { name: web-secret }
              readinessProbe:
                httpGet: { path: /healthz, port: 8080 }
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      selector: { app: web }          # routes to Pods with label app=web
      ports:
        - port: 80
          targetPort: 8080
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: web
    spec:
      rules:
        - host: web.local
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: web
                    port: { number: 80 }

The chain: an external request hits web.local, the Ingress routes it to the Service on port 80, and the Service load-balances to one of the 3 Deployment Pods on targetPort 8080. Each Pod gets APP_GREETING/LOG_LEVEL from the ConfigMap and DATABASE_PASSWORD from the Secret as environment variables. The readinessProbe keeps a Pod out of the Service's rotation until /healthz passes, so rollouts don't send traffic to a Pod that isn't ready yet.
Hands-on: deploy on local Kubernetes
You don't need a cloud account to practice: kind (Kubernetes in Docker) or minikube spins up a real cluster on your laptop. The loop:

    kind create cluster                  # or: minikube start
    kubectl apply -f manifests.yaml      # create everything declaratively
    kubectl get pods,svc,ingress         # see it converge to desired state
    kubectl rollout status deploy/web    # confirm the rollout finished
    kubectl logs deploy/web              # print logs from one of the Deployment's Pods (add -f to stream)
    kubectl rollout undo deploy/web      # roll back if a release is bad

kubectl apply is declarative: you describe the end state and Kubernetes diffs and reconciles. Editing replicas: 3 to 5 and re-applying scales up; bumping the image tag triggers a rolling update (new Pods come up and pass readiness before old ones are torn down). For an Ingress to actually serve traffic, install an ingress controller first (e.g. kubectl apply the ingress-nginx manifests, or minikube addons enable ingress).
Knowledge check
- (medium) Which K8s object gives a stable network endpoint for a set of pods?
- (hard) True or false: a Kubernetes Secret encrypts its data by default.
Interview questions
What gets asked on this topic, with how to approach each question, the follow-ups, and the trap.
Explain the core Kubernetes objects: Pod, Deployment, Service, and Ingress. How do they relate?
A Pod is the smallest deployable unit — one or more containers sharing a network namespace and storage. Pods are ephemeral; you rarely create them directly.
A Deployment is the controller you actually use: you declare a desired replica count and a pod template, and it manages a ReplicaSet to keep that many pods running, replacing crashed ones and handling rolling updates.
A Service gives that fluid set of pods a single stable virtual IP and DNS name, load-balancing across the matching pods (selected by labels) so callers do not chase changing pod IPs.
Ingress sits in front of Services to route external HTTP(S) traffic — host/path routing and TLS termination — to the right Service. So: Ingress -> Service -> Pods, with the Deployment keeping the pods alive underneath.
Follow-ups they push on:
- How does a Service know which pods to send traffic to?
- What is the difference between a Service of type ClusterIP, NodePort, and LoadBalancer?
Red flag: Conflating a Service with an Ingress. A Service does L4 load-balancing inside the cluster; an Ingress does L7 HTTP routing and TLS at the edge.
source: Kubernetes docs, Concepts
What is the difference between a ConfigMap and a Secret? Is a Secret actually encrypted?
Both inject configuration into pods (as env vars or mounted files) and both keep config out of the image. The difference is intent: ConfigMaps hold non-sensitive config (feature flags, URLs); Secrets hold sensitive values (passwords, tokens, keys).
The gotcha: a Secret is only base64-encoded, not encrypted — base64 is trivially reversible, so anyone who can read the Secret object sees the value. To actually protect Secrets you must enable encryption-at-rest for etcd, lock down access with RBAC, and avoid committing Secret manifests to git. Many teams go further with an external secret store (Vault, cloud secret managers) and pull values in at runtime.
Follow-ups they push on:
- What two things must you configure to make Secrets meaningfully secure?
- Why is putting a Secret YAML in git dangerous even though it 'looks encoded'?
Red flag: Claiming a Kubernetes Secret is encrypted by default. It is base64, which is encoding, not encryption. Without encryption-at-rest + RBAC it offers essentially no confidentiality.
source: Kubernetes docs, Secrets
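The two hardening steps named in this answer can be sketched as configuration. This is a hedged illustration, not a complete setup: the first document is a file handed to the API server through its `--encryption-provider-config` flag (it is not applied with kubectl, and managed or local clusters may not let you set it), the key is a placeholder you must generate yourself, and the Role name `web-secret-reader` and the `default` namespace are assumptions.

```yaml
# 1) Encryption at rest: API server config file, NOT a kubectl-applied object.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder: generate your own
      - identity: {}    # fallback so Secrets written before encryption stay readable
---
# 2) RBAC: allow reading only the one Secret the app needs.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-secret-reader     # hypothetical name
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["web-secret"]
    verbs: ["get"]
```

The Role grants nothing until a RoleBinding attaches it to a user or ServiceAccount; keeping `list` and `watch` out of the verbs avoids exposing every Secret in the namespace.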
What is the difference between a liveness probe and a readiness probe? What breaks if you confuse them?
A liveness probe answers 'is this container healthy?' If it fails, the kubelet restarts the container. A readiness probe answers 'can this pod take traffic right now?' If it fails, the pod is pulled out of the Service's endpoints but is NOT restarted.
Use readiness for slow startup or temporary unavailability (warming a cache, waiting on a dependency); use liveness only for unrecoverable hangs.
The classic mistake: pointing a liveness probe at a deep health check that also depends on a database. When the DB hiccups, every pod fails liveness and gets restarted simultaneously — turning a transient blip into a full self-inflicted outage. There is also a startupProbe for slow-booting apps so liveness does not kill them before they finish starting.
Follow-ups they push on:
- Why should a liveness probe usually NOT check downstream dependencies?
- When would you add a startupProbe?
Red flag: Using a liveness probe that depends on a database or downstream service. A transient outage then triggers a restart storm across all pods, amplifying the incident instead of riding it out.
source: Kubernetes docs, Configure Liveness, Readiness and Startup Probes
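A minimal sketch of the three probes side by side on one container. The `/livez` and `/readyz` paths and all timing values are assumptions for illustration (the source only defines `/healthz`); the idea is that liveness stays shallow while readiness is allowed to look at dependencies.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo               # hypothetical name
spec:
  containers:
    - name: web
      image: myorg/web:1.4.2
      ports:
        - containerPort: 8080
      startupProbe:              # holds off the other probes until the app has booted
        httpGet: { path: /healthz, port: 8080 }
        periodSeconds: 5
        failureThreshold: 30     # allows up to 30 x 5s = 150s to start
      livenessProbe:             # failure: the kubelet restarts the container
        httpGet: { path: /livez, port: 8080 }    # assumed shallow endpoint, no DB check
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:            # failure: removed from Service endpoints, not restarted
        httpGet: { path: /readyz, port: 8080 }   # assumed endpoint, may check dependencies
        periodSeconds: 5
```

With this split, a database hiccup makes pods temporarily unready rather than triggering a restart storm.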
How does a rolling update work in a Deployment, and how do you roll back a bad release?
When you change a Deployment's pod template, the Deployment controller creates a new ReplicaSet and shifts pods gradually: it scales the new ReplicaSet up and the old one down, governed by `maxSurge` (how many extra pods above desired during the update) and `maxUnavailable` (how many can be missing). With readiness probes in place, traffic only moves to new pods once they report ready, so there is no downtime.
Kubernetes keeps the old ReplicaSets around, so rollback is just `kubectl rollout undo deployment/<name>`, which scales the previous ReplicaSet back up. You watch progress with `kubectl rollout status`. Tune `maxSurge`/`maxUnavailable` to trade rollout speed against capacity headroom.
Follow-ups they push on:
- What do maxSurge and maxUnavailable control?
- Why does a rolling update need readiness probes to be safe?
- How is a rolling update different from blue-green or canary?
Red flag: Rolling out without readiness probes. Kubernetes considers a pod 'available' as soon as the container starts and sends it traffic before the app can actually serve, causing a wave of errors mid-rollout.
source: Kubernetes docs, Performing a Rolling Update
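The rollout knobs discussed in this answer live under `spec.strategy`. Below is a sketch that extends the `web` Deployment from earlier in this lesson; the specific values (`maxSurge: 1`, `maxUnavailable: 0`, `revisionHistoryLimit: 5`) are assumptions picked to favor capacity over speed, not recommendations from the source.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  revisionHistoryLimit: 5        # old ReplicaSets kept around for rollout undo
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                # at most 1 extra Pod above replicas during the update
      maxUnavailable: 0          # never drop below 3 ready Pods
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
        - name: web
          image: myorg/web:1.4.2
          ports:
            - containerPort: 8080
          readinessProbe:        # gates each step of the rollout
            httpGet: { path: /healthz, port: 8080 }
```

With these values the rollout proceeds one Pod at a time: a new Pod must pass readiness before an old one is removed, which is slower but never reduces serving capacity.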
A pod is stuck in CrashLoopBackOff. Walk me through how you debug it.
CrashLoopBackOff means the container keeps starting and exiting, and Kubernetes is backing off between restarts. Work the evidence:
- `kubectl describe pod <pod>`: read the Events and the last container state (exit code, OOMKilled, reason).
- `kubectl logs <pod> --previous`: the logs from the crashed instance (current logs may be empty because it just restarted).
Common causes: the app crashes on startup (bad config / missing env var / unreachable dependency, visible in logs); exit code 137 / OOMKilled means it exceeded its memory limit (raise the limit or fix the leak); a failing liveness probe restarting a healthy-but-slow app (add a startupProbe); or a bad image/command. Fix the root cause rather than just restarting the pod.
Follow-ups they push on:
- Why use `kubectl logs --previous` here?
- What does exit code 137 tell you?
Red flag: Reading only `kubectl logs <pod>` (which shows the freshly restarted container, often empty) instead of `--previous`, and missing that an OOMKill or a too-aggressive liveness probe is the actual cause.
source: Kubernetes docs, Debug Running Pods
What is the difference between resource requests and limits, and how do they affect scheduling and stability?
A request is the amount of CPU/memory a container is guaranteed; the scheduler uses requests to decide which node a pod fits on. A limit is the hard ceiling the container may not exceed.
The behaviors differ by resource. Exceed a memory limit and the container is OOMKilled. Exceed a CPU limit and the container is throttled (slowed), not killed. If you set no requests, the scheduler packs pods blindly and nodes get oversubscribed; if requests are far below real usage, you overcommit and nodes thrash. The senior point is the QoS class: pods whose containers all have requests == limits for both CPU and memory are Guaranteed and evicted last under node memory pressure; pods with no requests/limits are BestEffort and evicted first.
Follow-ups they push on:
- What happens when a container exceeds its CPU limit vs its memory limit?
- How do requests and limits determine a pod's QoS class and eviction order?
Red flag: Omitting both requests and limits, or setting requests far below real usage. The scheduler cannot reason about capacity, leading to oversubscribed nodes and BestEffort pods that are the first to be evicted under pressure. (Setting only a limit is a different case: Kubernetes copies the limit into the request.)
source: Kubernetes docs, Resource Management for Pods and Containers
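As a sketch of what requests and limits look like on a container, with made-up numbers (the Pod name and every quantity below are assumptions for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resources-demo       # hypothetical name
spec:
  containers:
    - name: web
      image: myorg/web:1.4.2
      resources:
        requests:            # what the scheduler reserves when placing the Pod
          cpu: 250m
          memory: 256Mi
        limits:              # hard ceiling enforced at runtime
          cpu: 500m          # above this the container is throttled
          memory: 512Mi      # above this the container is OOMKilled
```

Because requests are lower than limits here, the Pod lands in the Burstable QoS class; setting requests equal to limits for both resources would make it Guaranteed.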
Walk me through what happens, end to end, when you run `kubectl apply -f deployment.yaml`.
`kubectl` sends the manifest to the API server, which authenticates, authorizes (RBAC), runs admission controllers, and persists the desired state to etcd. Nothing is running yet; you have only recorded intent.
Controllers then reconcile. The Deployment controller sees a new Deployment and creates a ReplicaSet; the ReplicaSet controller creates Pod objects to reach the desired replica count. The scheduler watches for unscheduled pods and binds each to a suitable node based on requests, affinity, and taints. On each chosen node, the kubelet sees a pod assigned to it, pulls the image, and starts the container via the container runtime, reporting status back to the API server.
The whole system is a declarative control loop: you state the desired state, and independent controllers continuously drive the actual state toward it.
Follow-ups they push on:
- Which component decides which node a pod runs on?
- Why is this described as a reconciliation/control loop rather than imperative execution?
Red flag: Describing it as imperative ('kubectl starts the container'). kubectl only records desired state; controllers and the kubelet asynchronously reconcile reality toward it.
source: Kubernetes docs, Kubernetes Components
What are the differences between a Service of type ClusterIP, NodePort, and LoadBalancer?
They form a ladder of increasing external exposure, and each builds on the previous.
ClusterIP (the default) gives the Service a stable virtual IP reachable only inside the cluster, perfect for service-to-service traffic that should never be public. NodePort opens a fixed high port (30000-32767 by default) on every node, so external traffic to `nodeIP:nodePort` reaches the Service; it builds on ClusterIP and is mostly a dev/debug or building-block mechanism, not a polished production front door. LoadBalancer provisions an external cloud load balancer (an AWS NLB, a GCP LB) that fronts the Service with a single external IP: the production way to expose one Service to the internet.
The senior nuance: one LoadBalancer per Service gets expensive, so for many HTTP services you front them with a single Ingress (L7 routing/TLS) backed by one load balancer instead of a LoadBalancer Service each.
What a strong answer covers:
- ClusterIP (default): internal-only stable virtual IP, for service-to-service traffic.
- NodePort: opens a fixed port on every node; builds on ClusterIP, mainly dev/building-block.
- LoadBalancer: provisions a cloud load balancer with an external IP, for production single-service exposure.
- Each type is a superset of the previous (LoadBalancer → NodePort → ClusterIP under the hood).
- Many HTTP services? Use one Ingress instead of a LoadBalancer per Service to save cost.
Quick self-check: You need internal-only communication between two microservices in the cluster. Which Service type?
- Correct: it gives a stable in-cluster virtual IP with no external exposure, exactly right for service-to-service traffic.
- Opens a port on every node to the outside world, which is unnecessary exposure for purely internal traffic.
- Provisions an external cloud load balancer, which is overkill and externally exposed for internal-only traffic.
- Just maps the Service to an external DNS name; not how you connect two in-cluster services.
Follow-ups they push on:
- Why would you front many services with an Ingress instead of a LoadBalancer each?
- What range do NodePorts fall in, and why isn't NodePort a great production front door?
- How does a LoadBalancer Service actually get its external IP?
Red flag: Reaching for a LoadBalancer Service per microservice. Each provisions (and bills for) a separate cloud load balancer; route many HTTP services through a single Ingress instead.
source: Kubernetes docs, Service (publishing types)
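For comparison with the ClusterIP Service in the manifest set earlier, here is a sketch of the other two types pointing at the same `app: web` Pods. The Service names and the `nodePort` value 30080 are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport         # hypothetical name
spec:
  type: NodePort
  selector: { app: web }
  ports:
    - port: 80               # ClusterIP port (still created under the hood)
      targetPort: 8080
      nodePort: 30080        # must fall in the node port range; omit to auto-assign
---
apiVersion: v1
kind: Service
metadata:
  name: web-lb               # hypothetical name
spec:
  type: LoadBalancer         # asks the cloud provider for an external load balancer
  selector: { app: web }
  ports:
    - port: 80
      targetPort: 8080
```

On a local cluster without a load balancer implementation, the `web-lb` Service typically shows its external IP as pending, because nothing fulfils the request; that is a useful way to see that the external IP comes from the cloud integration, not from Kubernetes itself.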
How does the Horizontal Pod Autoscaler work, and why does it need resource requests set?
The HPA is a control loop (default every 15s) that scales a Deployment's replica count up or down to keep an observed metric near a target. The classic case: target 50% average CPU. It reads current per-pod usage from the metrics server and applies roughly `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`.
The catch interviewers probe: CPU/memory targets are expressed as a percentage of the pod's resource request. If you set no CPU request, there is no denominator, so the HPA cannot compute utilization and will not scale on CPU. So requests are a prerequisite, not optional.
Discuss the rest: HPA changes *replica count* (horizontal), distinct from the Vertical Pod Autoscaler which resizes a pod; it can scale on custom/external metrics (queue depth, RPS) not just CPU; and you add a stabilization window to prevent flapping (rapid scale up/down thrash) on noisy metrics.
What a strong answer covers:
- HPA control loop adjusts replica count to keep a metric near target: `ceil(replicas × current/target)`.
- CPU/memory targets are a percentage of the pod's request: no request means no denominator, no scaling.
- Horizontal (more pods) vs Vertical Pod Autoscaler (bigger pods): different tools.
- Can scale on custom/external metrics (queue depth, RPS), not just CPU.
- A stabilization window prevents flapping on noisy/bursty metrics.
Follow-ups they push on:
- Why does an HPA on CPU silently do nothing if you forgot to set CPU requests?
- When would you scale on a custom metric like queue length instead of CPU?
- How is HPA different from the cluster autoscaler?
Red flag: Configuring an HPA on CPU but omitting CPU resource requests. Utilization is computed relative to the request, so with no request the HPA has nothing to divide by and never scales.
source: Kubernetes docs, Horizontal Pod Autoscaling
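The 50% CPU case from this answer can be sketched as an `autoscaling/v2` manifest targeting the `web` Deployment. The replica bounds and the 300-second scale-down window are assumptions for illustration, and the manifest only has an effect if the target Pods set a CPU request and a metrics server is running.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50         # percent of each Pod's CPU request
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # wait before shrinking, to avoid flapping
```

Plugging numbers into the formula: with 3 replicas averaging 80% of their request against a 50% target, the HPA wants ceil(3 × 80 / 50) = ceil(4.8) = 5 replicas.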
What is a StatefulSet, and how is it different from a Deployment? When do you need one?
A Deployment treats its pods as interchangeable, fungible replicas — random names, no stable identity, no per-pod storage. That is exactly right for stateless app servers.
A StatefulSet gives each pod a stable, sticky identity: a stable ordinal name (`db-0`, `db-1`), stable network identity (a headless Service gives each a predictable DNS name), and its own persistent volume that survives reschedule and follows the pod. By default, pods are created and scaled up in order (0, 1, 2 …) and terminated in reverse order, which matters for clustered systems that need a known startup/teardown sequence.
You need a StatefulSet for stateful, clustered workloads where identity matters: databases, Kafka, ZooKeeper, Elasticsearch, anything where pod `db-0` must keep being `db-0` with the same data. For stateless web/API tiers, always use a Deployment. The senior caveat: running databases in-cluster at all is a real decision; many teams prefer a managed database over a StatefulSet.
What a strong answer covers:
- Deployment pods are fungible (random names, shared/no per-pod storage), for stateless apps.
- StatefulSet gives each pod a stable ordinal identity (`db-0`), stable DNS, and its own PVC.
- Pods come up / scale / terminate in order, which clustered systems rely on.
- Use it for databases, Kafka, ZooKeeper, Elasticsearch: workloads where identity + data stick to the pod.
- Caveat: consider a managed database instead of running stateful systems in-cluster.
Quick self-check: Which workload genuinely requires a StatefulSet rather than a Deployment?
- Correct: stable ordinal identity, stable DNS, and per-pod persistent storage are exactly what a StatefulSet provides.
- Pods are interchangeable with no per-pod state, so a Deployment is the right (and simpler) choice.
- That is a Job, not a long-running StatefulSet.
- That is a CronJob; it needs no stable pod identity or storage.
Follow-ups they push on:
- Why does a database need stable identity and per-pod storage that a web server doesn't?
- What role does the headless Service play for a StatefulSet?
- When would you avoid a StatefulSet and use a managed service instead?
Red flag: Running a stateful, clustered system (a database, Kafka) under a plain Deployment. Pods get random identities and can share/lose storage, so a rescheduled pod comes back as a different cluster member with the wrong (or no) data.
source: Kubernetes docs, StatefulSets
How do you control which node a pod lands on? Explain taints/tolerations vs node affinity.
Two mechanisms that work from opposite directions. Node affinity (and the simpler `nodeSelector`) is a pod-side attraction: the pod says 'schedule me on nodes with label `gpu=true`'. It can be hard (`requiredDuringSchedulingIgnoredDuringExecution`) or soft/preferred (`preferredDuringSchedulingIgnoredDuringExecution`).
Taints and tolerations are a node-side repulsion: you taint a node (`kubectl taint nodes node1 gpu=true:NoSchedule`) so it repels all pods by default, and only pods that carry a matching toleration are allowed on. So a taint reserves a node; a toleration is a pod's permission slip to land on a tainted node.
The key distinction: affinity *attracts* a pod toward nodes; a taint *repels* pods away from a node unless they tolerate it, and a toleration alone does not *force* a pod onto that node (you pair it with affinity for that). Use taints to dedicate expensive/special nodes (GPU, spot) and affinity to steer pods toward the right hardware; add pod anti-affinity to spread replicas across nodes/zones for HA.
What a strong answer covers:
- Node affinity / nodeSelector: pod-side *attraction* toward nodes with matching labels.
- Taints: node-side *repulsion*; a tainted node rejects pods unless they tolerate the taint.
- Tolerations: a pod's permission to schedule onto a tainted node (but doesn't force it there).
- Combine: taint dedicates a node (GPU/spot), affinity steers the right pods to it.
- Pod anti-affinity spreads replicas across nodes/zones for availability.
Follow-ups they push on:
- Why doesn't a toleration alone guarantee a pod runs on the tainted node?
- How would you dedicate GPU nodes so only ML workloads land there?
- How does pod anti-affinity improve availability?
Red flag: Assuming a toleration *attracts* a pod to a tainted node. A toleration only lets the pod tolerate the taint; to actually steer it there you also need node affinity/nodeSelector.
source: Kubernetes docs, Taints and Tolerations
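The pairing of toleration and affinity might look like the Pod below. It assumes the node was tainted with `gpu=true:NoSchedule` as in the answer and also carries a `gpu=true` label (labels and taints are set separately); the Pod name and the `myorg/trainer:1.0` image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer                    # hypothetical name
spec:
  tolerations:                     # permission slip: allowed onto the tainted node
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  affinity:
    nodeAffinity:                  # attraction: must land on a node labeled gpu=true
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu
                operator: In
                values: ["true"]
  containers:
    - name: trainer
      image: myorg/trainer:1.0     # hypothetical image
```

Remove the `affinity` block and the Pod may still schedule onto any untainted node; remove the `tolerations` block and it stays Pending, because the only nodes it is attracted to repel it.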
What is a namespace in Kubernetes, and what problems does it actually solve (and not solve)?
A namespace is a virtual cluster-within-a-cluster: a scope for naming and a boundary for applying policy. It lets you partition one physical cluster among teams or environments (`team-a`, `staging`) so names don't collide and you can attach ResourceQuotas (cap CPU/memory per namespace), RBAC (who can do what, where), and NetworkPolicies per slice.
What it is good for: organization, quota, and access control on a shared cluster. What it is not: a hard security/isolation boundary. By default, pods in different namespaces can still reach each other over the network; namespaces alone do not isolate traffic, and you need NetworkPolicies for that. And some objects are cluster-scoped (nodes, PersistentVolumes, namespaces themselves), so they live outside any namespace.
The senior point: namespaces are an organizational and policy primitive, not a substitute for multi-tenancy isolation between untrusted parties.
What a strong answer covers:
- A namespace scopes names and is the unit for ResourceQuota, RBAC, and NetworkPolicy.
- Great for partitioning a shared cluster by team or environment.
- Not a network isolation boundary: cross-namespace pod traffic is allowed by default.
- Use NetworkPolicies to actually restrict traffic between namespaces.
- Some objects are cluster-scoped (nodes, PVs, namespaces) and aren't namespaced.
Quick self-check: By default, can a pod in namespace `a` reach a pod in namespace `b` over the network?
- Correct: namespaces scope names and policy objects, but pod-to-pod traffic is open across them unless restricted.
- Wrong: that isolation requires explicit NetworkPolicies, not namespaces alone.
- Deployment membership has nothing to do with cross-namespace network reachability.
- Regular pods can reach across namespaces by default; this is simply incorrect.
Follow-ups they push on:
- Why don't namespaces stop pods in different namespaces from talking to each other?
- What do you add to get real network isolation between namespaces?
- Name a couple of resources that are cluster-scoped, not namespaced.
Red flag: Treating namespaces as a security boundary for untrusted tenants. Without NetworkPolicies (and often stronger isolation), pods across namespaces can still reach each other on the network.
source: Kubernetes docs, Namespaces
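A sketch of a namespace with the two policy objects mentioned in this answer attached. The quota numbers and object names are assumptions for illustration, and the NetworkPolicy is only enforced if the cluster's network plugin supports NetworkPolicy.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota             # hypothetical name
  namespace: team-a
spec:
  hard:                          # caps on the sum across all Pods in team-a
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only      # hypothetical name
  namespace: team-a
spec:
  podSelector: {}                # applies to every Pod in team-a
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector: {}        # allow traffic only from Pods in this same namespace
```

Without the NetworkPolicy, the namespace and quota still exist but a Pod in `staging` could reach Pods in `team-a` freely, which is the gap the answer warns about.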
Why do you set both a readiness probe and a preStop hook + terminationGracePeriod for zero-downtime shutdown?
When a pod is deleted (a rolling update, a scale-down), two things happen in parallel, which is the source of the race: Kubernetes sends the container `SIGTERM`, and it (asynchronously) removes the pod from Service endpoints. Because endpoint removal propagates through kube-proxy/iptables with a small delay, the load balancer can keep sending new requests to a pod that has already started shutting down, causing dropped connections mid-rollout.
The fix is to give that propagation time to win the race. A `preStop` hook that sleeps a few seconds delays the actual shutdown (the kubelet runs preStop before it sends SIGTERM) so in-flight endpoint removal completes before the app stops accepting connections. The `terminationGracePeriodSeconds` must be long enough to cover the preStop sleep plus the app draining in-flight requests after SIGTERM, before Kubernetes escalates to SIGKILL. Readiness probes handle the *startup* side (no traffic until ready); preStop + grace period handle the *shutdown* side.
The app must also handle SIGTERM to stop accepting new work and finish in-flight requests; otherwise it gets SIGKILLed and drops connections regardless.
What a strong answer covers:
- On pod deletion, SIGTERM and endpoint removal happen in parallel: that's the race.
- Endpoint removal propagates with a delay, so traffic can still arrive at a terminating pod.
- A `preStop` sleep delays shutdown until endpoint removal propagates (drains the LB).
- `terminationGracePeriodSeconds` must cover preStop + in-flight drain before SIGKILL.
- The app must catch SIGTERM and finish in-flight requests, or it gets force-killed.
Follow-ups they push on:
- Why can a pod still receive traffic after it gets SIGTERM?
- What happens if the grace period is shorter than your preStop + drain time?
- Why isn't a readiness probe alone enough for graceful shutdown?
Red flag: Relying on SIGTERM handling alone and skipping the preStop delay. Endpoint removal hasn't propagated yet, so the load balancer keeps routing new requests to the dying pod and connections drop mid-rollout.
source: Kubernetes docs, Pod Lifecycle (termination)
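The shutdown-side settings from this answer can be sketched on a single Pod. The 10-second sleep and 45-second grace period are assumed values, and the `exec` hook assumes the container image ships a `sleep` binary.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo                  # hypothetical name
  labels: { app: web }
spec:
  terminationGracePeriodSeconds: 45    # must cover the 10s preStop sleep plus drain time
  containers:
    - name: web
      image: myorg/web:1.4.2
      ports:
        - containerPort: 8080
      readinessProbe:                  # startup side: no traffic until ready
        httpGet: { path: /healthz, port: 8080 }
      lifecycle:
        preStop:                       # shutdown side: runs before SIGTERM is sent
          exec:
            command: ["sleep", "10"]   # gives endpoint removal time to propagate
```

The grace period clock starts when deletion begins, so the sleep eats into it: with these numbers the app has roughly 35 seconds after SIGTERM to finish in-flight requests before SIGKILL.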