1. Node NotReady – how to diagnose?

Check kubelet status, node conditions, disk pressure, and network plugin. Use kubectl describe node, journalctl -u kubelet, and verify CNI pod health.

2. Pod stuck in CrashLoopBackOff

Inspect logs (kubectl logs), previous logs, events, and container exit codes. Common causes: missing config, startup probe failure, or OOMKilled.

3. Upgrade control plane without downtime

Upgrade kubeadm, run kubeadm upgrade plan, then kubeadm upgrade apply on the first control plane node (kubeadm upgrade node on the others). Drain each node, upgrade kubelet/kubectl, restart kubelet, and uncordon. For single-master, expect brief API downtime.

4. Persistent volume stuck in Pending

Check StorageClass, PVC size vs PV, access modes, and CSI driver. kubectl describe pvc reveals events. May need manual PV binding or dynamic provisioning fix.

5. Pods can't reach service via cluster IP

Verify kube-proxy mode (iptables/ipvs), endpoint slices, network policy, and DNS. Use kubectl get endpoints, iptables-save, or test connectivity with a debug pod.

6. Restore etcd cluster after backup

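Stop the API server and etcd, restore the snapshot with etcdctl snapshot restore into a new data directory, swap it in for the old data dir, then restart etcd and the API server. For kubeadm, stop and start the static pods by moving their manifests, and confirm the result with etcdctl member list.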

7. Pods across nodes can't communicate

Check CNI (Calico/Flannel) configuration, overlay network, firewall rules (UDP 8285/8472 for Flannel; UDP 4789, IP protocol 4, or TCP 179 for Calico), and IP pools. Also verify podCIDR ranges don't overlap and host routes exist.

8. Scale HPA not working

Check metrics-server, HPA metrics (kubectl describe hpa), target average utilization, and custom metrics API. Ensure resource requests are set; also verify no scaleTargetRef errors.

9. Certificates expired – recovery

Back up the old certs, run kubeadm certs renew all, then restart the control plane components. On releases before v1.20 the command was kubeadm alpha certs renew. For an external CA, manually rotate and update secrets.

10. Pod security – restrict privileged containers

Implement Pod Security Admission (PSA) with enforce: baseline/restricted, or use OPA/Gatekeeper or other admission webhooks. PodSecurityPolicy was deprecated and then removed in v1.25, so it applies only to older clusters. Also use seccomp and AppArmor profiles.

10 Scenario-Based Kubernetes Interview Questions for Administrators

By AEM Institute Kolkata – India’s leading training hub for DevOps & Cloud Native technologies | Updated: April 2026

🎯 Best Kubernetes Interview Questions and Answers by AEM Institute Kolkata

Kubernetes has become the de‑facto orchestration platform for containerized applications. As a Kubernetes Administrator, you are expected to troubleshoot real‑world failures, upgrade clusters safely, and secure workloads. AEM Institute Kolkata has curated 10 scenario‑based interview questions that test your hands‑on knowledge – not just theoretical definitions. Each question presents a production‑like problem followed by an expert answer. Use these to prepare for your next K8s admin interview or to evaluate candidates.

1. Node NotReady – The Silent Worker

🎭 Scenario: One of your worker nodes has shown status NotReady in kubectl get nodes for 10 minutes. Pods that were running on it are stuck in Terminating or Unknown state. How do you systematically diagnose the root cause?

✅ Answer (Administrator’s approach):
First, get node details: kubectl describe node <node-name> – check Conditions (Ready, DiskPressure, MemoryPressure, PIDPressure, NetworkUnavailable). Then SSH into the node and:
• systemctl status kubelet – is it active?
• journalctl -u kubelet -n 50 --no-pager – look for errors like “failed to get sandbox image”, “CNI config uninitialized”, or “node not authorized”.
• Check disk usage with df -h and the container runtime with systemctl status containerd (or docker on older clusters).
• Verify that kubelet can reach API server (check kubelet config and certificates).
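• If the node stays unreachable, the node controller taints it (node.kubernetes.io/unreachable or node.kubernetes.io/not-ready) and its pods are evicted once their toleration expires; the default tolerationSeconds is 300, i.e. 5 minutes. Fix the underlying issue (e.g., network plugin crash, out of memory) and restart kubelet; drain and delete the node only if it cannot be recovered.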
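The condition check in the first step can be scripted. The sketch below is only an illustration: the function names are made up for this example, and it assumes the usual meaning of node conditions (Ready should be True, every other condition should be False).

```shell
# Print every node condition that is not in its healthy state.
# Reads "<Type>=<Status>" lines on stdin, e.g. "Ready=Unknown".
unhealthy_conditions() {
  while IFS='=' read -r ctype cstatus; do
    case "$ctype" in
      "")    continue ;;
      Ready) healthy=True ;;   # Ready must be True
      *)     healthy=False ;;  # pressure/network conditions must be False
    esac
    if [ "$cstatus" != "$healthy" ]; then
      echo "$ctype=$cstatus"
    fi
  done
  return 0
}

# Emit the conditions of one node in that format (needs cluster access).
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Usage: node_conditions <node-name> | unhealthy_conditions
```

An empty result means the API server sees no unhealthy condition; in that case the kubelet logs on the node are the next place to look.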

2. CrashLoopBackOff – The Restarting Nightmare

🎭 Scenario: A developer deployed a new microservice. The pod enters CrashLoopBackOff immediately. You see no obvious error in kubectl logs <pod> because the container restarts too fast. How do you capture the root cause?

✅ Answer:
1. Get previous container logs: kubectl logs <pod> --previous – this shows why the last container exited.
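2. Check pod events: kubectl describe pod <pod> – look at Last State, Reason, and Exit Code (0 = clean exit, non-zero = error, 137 = SIGKILL, often OOMKilled). Common causes: missing environment variable, wrong command, or inability to connect to a dependency (DB, Redis).
3. Keep the container alive for inspection: create a copy of the pod with the command overridden, kubectl debug <pod> --copy-to=<pod>-debug --container=<container> -- sleep 3600, then kubectl exec into the copy. Where ephemeral containers are supported, kubectl debug <pod> -it --image=busybox --target=<container> attaches a debug container instead. Alternatively, edit the Deployment to command: ["sleep","3600"] and exec in to inspect the filesystem and config.
4. Verify liveness/readiness probe settings – an aggressive liveness probe restarts an app that is slow to start; a startupProbe or a longer initialDelaySeconds avoids this.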
5. If OOMKilled appears, increase memory limits or fix memory leak.
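Exit codes are easier to act on once translated. The helper below is a small sketch that follows the common shell convention that codes above 128 mean "killed by signal (code - 128)"; the function names are illustrative, and last_exit_code assumes the failing container is the first one in the pod.

```shell
# Translate a container exit code into a likely meaning.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "SIGKILL (128+9), check for OOMKilled" ;;
    143) echo "SIGTERM (128+15)" ;;
    *)
      if [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
      else
        echo "application error"
      fi ;;
  esac
}

# Exit code of the previous run of the first container (needs cluster access).
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Usage: explain_exit_code "$(last_exit_code <pod>)"
```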

3. Upgrading a Production Control Plane

🎭 Scenario: You manage a 3‑node control plane cluster (kubeadm) running v1.27. The business requires upgrading to v1.28 with zero downtime for the API server. What steps do you follow?

✅ Answer:
For multi-master HA setup:
• Upgrade kubeadm on the first control plane node: apt-get update && apt-get install -y kubeadm='1.28.x-*' (or the yum equivalent).
• Pre-upgrade checks: kubeadm upgrade plan to verify compatibility and see the upgrade path.
• Apply the upgrade: kubeadm upgrade apply v1.28.x (only on the first control plane node; etcd is upgraded as well because --etcd-upgrade defaults to true).
• Drain that node: kubectl drain <node-name> --ignore-daemonsets. Draining evicts regular workloads only; the static pods (API server, etcd) keep running, and etcd remains safe as long as quorum is maintained.
• Upgrade kubelet and kubectl on that node, then run systemctl daemon-reload && systemctl restart kubelet.
• Uncordon the node.
• Repeat the same process on the second and third control plane nodes, using kubeadm upgrade node instead of kubeadm upgrade apply.
• Finally, upgrade worker nodes one by one: upgrade kubeadm → kubeadm upgrade node → drain → upgrade and restart kubelet → uncordon.
• Zero downtime is achieved because the API servers sit behind a load balancer and at least one control plane node remains available throughout the rolling upgrade. For etcd, keep at least 2 of the 3 members online.
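The "keep at least 2 members online" rule is plain quorum arithmetic: an etcd cluster of n members needs floor(n/2) + 1 of them to accept writes. A minimal sketch, with illustrative function names:

```shell
# Members that must be healthy for etcd to accept writes.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can be down at the same time without losing quorum.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Usage: etcd_quorum 3; etcd_fault_tolerance 3
```

For the 3-node control plane in this scenario the tolerance is one member, which is why control plane nodes are upgraded strictly one at a time.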

4. PersistentVolumeClaim Stuck in Pending

🎭 Scenario: A StatefulSet needs storage. The PVC is created but remains Pending indefinitely. You have a default StorageClass. What troubleshooting steps do you take?

✅ Answer:
• kubectl describe pvc <pvc-name> – events often show the reason (e.g., “no persistent volumes available for this claim”, “storageclass.storage.k8s.io not found”).
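• Verify the StorageClass exists: kubectl get sc – check that it is marked default and has a provisioner (like ebs.csi.aws.com, kubernetes.io/gce-pd). Also check its volumeBindingMode: with WaitForFirstConsumer the PVC stays Pending by design until a pod that uses it is scheduled.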
• Ensure the CSI controller pods are running in kube-system.
• Check PVC’s requested storage size and accessModes (RWO, RWX). The underlying storage backend may not support the requested mode.
• If there is no dynamic provisioning, manually create a PV whose storageClassName, capacity, and access modes satisfy the claim (plus any label selector the PVC sets), or fix the StorageClass parameters (e.g., wrong zone, missing cloud permissions).
• For local storage, ensure the node has the expected directory and the local PV provisioner is running.
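The "is there exactly one default StorageClass" check can be sketched as below. The function names are illustrative; the jsonpath query reads the standard storageclass.kubernetes.io/is-default-class annotation.

```shell
# Summarise which StorageClasses are marked default.
# Reads "<name> <is-default>" lines on stdin; the second field may be empty.
check_default_class() {
  awk '$2 == "true" { n++; names = names " " $1 }
       END {
         if (n == 1)      print "default StorageClass:" names
         else if (n == 0) print "no default StorageClass"
         else             print "multiple default StorageClasses:" names
       }'
}

# Emit every StorageClass with its default annotation (needs cluster access).
list_storage_classes() {
  kubectl get sc -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}'
}

# Usage: list_storage_classes | check_default_class
```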

5. Service Unreachable via ClusterIP

🎭 Scenario: Pod A cannot reach Pod B using the Service’s ClusterIP (e.g., curl http://my-svc.default.svc.cluster.local:8080 times out). Both pods are healthy and endpoints exist. How do you isolate the network issue?

✅ Answer:
• First, confirm endpoints: kubectl get endpoints my-svc – must list pod IPs.
• Test connectivity from a debug pod: kubectl run tmp --rm -it --image=nicolaka/netshoot -- /bin/bash, then curl -v http://<cluster-ip>:port and also curl -v http://<pod-ip>:port (bypass service). If pod IP works but cluster IP does not → issue with kube-proxy.
• On a node, check iptables rules: iptables-save | grep <cluster-ip> or for IPVS: ipvsadm -L -n.
• Verify kube-proxy is running: kubectl get pods -n kube-system | grep kube-proxy. Look at its logs.
• Ensure no NetworkPolicy blocks ingress/egress. Also check if the service selector matches pod labels exactly.
• If DNS is the issue, test with nslookup my-svc.default.svc.cluster.local from the debug pod.
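The isolation logic above is a short decision tree: test the pod IP, then the ClusterIP, then the DNS name, and stop at the first failure. The sketch below encodes only that reasoning; the function name and the "ok"/"fail" convention are assumptions made for this example.

```shell
# Arguments are "ok" or "fail" for three probes run from a debug pod:
#   $1 direct pod IP, $2 Service ClusterIP, $3 Service DNS name.
isolate_service_issue() {
  pod=$1; vip=$2; dns=$3
  if [ "$pod" != "ok" ]; then
    echo "pod network or NetworkPolicy"
  elif [ "$vip" != "ok" ]; then
    echo "kube-proxy or Service rules"
  elif [ "$dns" != "ok" ]; then
    echo "cluster DNS"
  else
    echo "no fault found"
  fi
}

# Usage: isolate_service_issue ok fail ok
```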

6. etcd Snapshot Recovery After Disaster

🎭 Scenario: Your etcd database got corrupted after a sudden power outage. You have a recent snapshot backup (snapshot.db). Walk through the exact steps to restore the entire cluster (single etcd member).

✅ Answer:
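For a single-master cluster (kubeadm):
1. Stop kube-apiserver and etcd. On kubeadm they run as static pods, so move their manifests out of /etc/kubernetes/manifests and wait for the kubelet to stop the containers; systemctl stop kube-apiserver etcd applies only where they run as systemd services.
2. Restore snapshot using etcdctl (v3): ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-restored – also specify --name, --initial-cluster and --initial-advertise-peer-urls if needed. On etcd v3.5 and later, etcdutl snapshot restore is the preferred equivalent.
3. Replace the existing etcd data directory: mv /var/lib/etcd /var/lib/etcd.bak && mv /var/lib/etcd-restored /var/lib/etcd.
4. Where etcd runs under systemd as a dedicated user, fix ownership: chown -R etcd:etcd /var/lib/etcd. This is normally not needed for kubeadm static pods, which run etcd as root by default.
5. Restart etcd and the API server: move the static pod manifests back (or systemctl start etcd kube-apiserver).
6. Verify: kubectl get nodes should return the cluster state at backup time. For multi-member etcd, restore the same snapshot on every member, each with its own --name and --initial-advertise-peer-urls but an identical --initial-cluster and --initial-cluster-token, then start all members together.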
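The single-member restore can be rehearsed as a dry run before touching a real node. The sketch below prints the commands by default and only executes them with DRY_RUN=0; the manifests-parked directory and the function names are hypothetical, and the paths assume a default kubeadm layout.

```shell
# Print commands by default; set DRY_RUN=0 to execute them (as root on the node).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

restore_etcd() {
  snapshot=$1
  manifests=/etc/kubernetes/manifests
  parked=/etc/kubernetes/manifests-parked   # hypothetical holding directory

  run mkdir -p "$parked"
  # Moving the manifests away makes the kubelet stop the static pods.
  run mv "$manifests/kube-apiserver.yaml" "$manifests/etcd.yaml" "$parked/"
  # In a real run, wait here until the etcd container has exited
  # (for example, check with crictl ps) before touching the data directory.
  run env ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir /var/lib/etcd-restored
  run mv /var/lib/etcd /var/lib/etcd.bak
  run mv /var/lib/etcd-restored /var/lib/etcd
  # Moving the manifests back restarts etcd and the API server on the restored data.
  run mv "$parked/kube-apiserver.yaml" "$parked/etcd.yaml" "$manifests/"
}

# Usage: restore_etcd snapshot.db
```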

7. Cross‑Node Pod Communication Failure

🎭 Scenario: Pods on the same node can communicate, but Pods on different nodes cannot ping each other. The CNI is Calico. What are the top three checks you perform?

✅ Answer:
1. CNI configuration: Check /etc/cni/net.d/ on each node. Ensure the CNI plugin binary exists and the IP pools are correctly defined (no overlapping podCIDRs).
2. Overlay network ports: For Calico, verify that the firewall allows UDP port 4789 (VXLAN mode) or IP protocol 4 (IPIP mode). For Flannel, UDP 8285 (udp backend) or 8472 (VXLAN backend).
3. BGP route propagation (if using Calico BGP): Check that the bird daemon is running, that TCP port 179 is open between nodes, and that routes are exchanged: calicoctl node status and ip route show to see the remote pod CIDR blocks. If routes are missing, restart the calico-node pods. Also check the host's rp_filter (sysctl net.ipv4.conf.all.rp_filter): by default Calico's Felix expects 0 or 1 and reports an error in loose mode (2).
Additional: kube-proxy's strictARP setting (IPVS mode) matters mainly for load-balancer add-ons such as MetalLB, so check it only if such a component is involved.
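The route check in step 3 comes down to asking whether the node has a route for a remote node's pod block. A minimal sketch follows; the function name is illustrative, and the sample route line in the usage note is made up for the example.

```shell
# Succeed if the routing table on stdin has an entry for the given CIDR.
# Matches the first field exactly, so 10.244.1.0/26 does not match 10.244.1.0/24.
has_route() {
  awk -v cidr="$1" '$1 == cidr { found = 1 } END { exit !found }'
}

# Usage on a node:
#   ip route show | has_route <remote-pod-cidr> && echo "route present"
```

On a healthy Calico node in IPIP mode, a remote block typically shows up as a line such as "10.244.1.0/26 via 192.168.1.11 dev tunl0 proto bird onlink" (addresses here are illustrative).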

8. HorizontalPodAutoscaler Won’t Scale

🎭 Scenario: The HPA is configured to scale based on CPU utilization (target 50%). Even under load, the number of replicas remains 1. HPA events show “unable to get metrics”. How do you fix it?

✅ Answer:
• The most common cause: missing Metrics Server. Deploy it: kubectl apply -f components.yaml from the official release.
• Check HPA status: kubectl describe hpa <hpa-name>. Look for “FailedGetResourceMetric” or “missing request for cpu”.
• Ensure each pod has resources.requests.cpu defined. HPA uses request percentage, not limits.
• Verify metrics API: kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes should return data.
• If using custom metrics, ensure the Prometheus adapter or other provider is correctly installed and the metric name matches.
• Check for scaleTargetRef pointing to the right Deployment/StatefulSet.
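• Also check that maxReplicas has not already been reached and that spec.behavior.scaleUp policies in the HPA are not throttling scale-up. The --horizontal-pod-autoscaler-downscale-stabilization flag on kube-controller-manager only delays scale-down, so it does not explain a missing scale-up.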
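The reason resource requests matter becomes clear from the arithmetic. Utilization is usage divided by the request, and the HPA computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). The sketch below uses integer math, illustrative function names, and ignores the controller's tolerance band.

```shell
# CPU utilization as a percentage of the request (both in millicores).
cpu_utilization_pct() {
  usage_m=$1; request_m=$2
  if [ "$request_m" -le 0 ]; then
    echo "unknown"   # no request set: utilization cannot be computed
  else
    echo $(( usage_m * 100 / request_m ))
  fi
}

# desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
desired_replicas() {
  current=$1; util=$2; target=$3
  echo $(( (current * util + target - 1) / target ))
}

# Usage: desired_replicas 1 "$(cpu_utilization_pct 450 500)" 50
```

With one replica using 450m of a 500m request (90%) against a 50% target, the result is 2 replicas; without a request the first step already fails, which matches the "missing request for cpu" event.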

9. Expired Kubernetes Certificates

🎭 Scenario: After one year, users cannot run kubectl commands. The error says “certificate has expired or is not yet valid”. The cluster was set up with kubeadm. How do you recover without rebuilding?

✅ Answer:
• First, check cert expiration: kubeadm certs check-expiration.
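• Renew all certificates (including admin.conf, apiserver, etcd) using: kubeadm certs renew all. This rewrites the certificates in place but does not restart anything: restart the control plane static pods afterwards, for example by moving their manifests out of /etc/kubernetes/manifests, waiting about 20 seconds, and moving them back.
• Then update the local kubeconfig: renew all also refreshes the client certificate embedded in /etc/kubernetes/admin.conf, so copy that file to the user's config (cp /etc/kubernetes/admin.conf ~/.kube/config) or test it directly with kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes. If admin.conf itself is missing, re-generate it: kubeadm init phase kubeconfig admin --config ...
• If kubelets cannot authenticate because their client certificates expired (normally these rotate automatically), regenerate kubelet.conf for the affected node from a control plane node with kubeadm kubeconfig user --org system:nodes --client-name system:node:<node-name>, copy it to /etc/kubernetes/kubelet.conf on that node, and restart the kubelet.
• For external etcd, renew etcd certificates separately. After renewal, restart all control plane components.
• Renewal is signed with the CA on local disk, so it can be run even while the API server is down; if the certificates live outside the default /etc/kubernetes/pki, pass --cert-dir to kubeadm certs renew.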
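The renewal sequence can be previewed as a dry run. The sketch below prints each command unless DRY_RUN=0 is set; the manifests-parked directory and the function names are hypothetical, the 20-second pause is an approximation of the kubelet's manifest re-scan interval, and the paths assume a default kubeadm layout with stacked etcd.

```shell
# Print commands by default; set DRY_RUN=0 to execute them (as root on a control plane node).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

renew_control_plane_certs() {
  manifests=/etc/kubernetes/manifests
  parked=/etc/kubernetes/manifests-parked   # hypothetical holding directory
  components="kube-apiserver kube-controller-manager kube-scheduler etcd"

  run kubeadm certs check-expiration
  run kubeadm certs renew all

  # Restart the static pods so they load the renewed certificates.
  run mkdir -p "$parked"
  for c in $components; do
    run mv "$manifests/$c.yaml" "$parked/"
  done
  run sleep 20   # give the kubelet time to notice and stop the pods
  for c in $components; do
    run mv "$parked/$c.yaml" "$manifests/"
  done

  # admin.conf now embeds the renewed client certificate.
  run cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
}

# Usage: renew_control_plane_certs
```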

10. Enforce Pod Security – No Privileged Containers

🎭 Scenario: Your security team demands that no container runs as privileged (privileged: true) and that root user is forbidden. How would you enforce this across the entire cluster without blocking developers immediately?

✅ Answer:
• Use Pod Security Admission (PSA) (built in and stable since Kubernetes v1.25). Apply labels to namespaces: pod-security.kubernetes.io/enforce=baseline (prevents privileged containers, hostPID, etc.) or restricted (even stricter, no root).
• For a phased rollout: start with warn and audit modes before enforcing.
• Example: kubectl label ns default pod-security.kubernetes.io/enforce=restricted – this rejects any new pod that is privileged or may run as root (unless an exemption is configured). Pods that are already running are not evicted.
• Alternatively, use OPA Gatekeeper or Kyverno for fine-grained policies (e.g., block containers with securityContext.privileged=true).
• Also enforce read-only root filesystem, drop all capabilities, and run with non‑root user via securityContext.runAsNonRoot: true.
• For legacy workloads, gradually update the container images to meet the policies.
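The phased rollout can be sketched as two label sets per namespace. The phase names "observe" and "enforce" and the function name are invented for this example; the label keys are the standard Pod Security Admission ones.

```shell
# Print the namespace labels for one rollout phase.
#   observe: warn + audit only, nothing is rejected
#   enforce: additionally reject violating pods
psa_labels() {
  phase=$1; level=$2
  prefix=pod-security.kubernetes.io
  case "$phase" in
    observe) echo "$prefix/warn=$level $prefix/audit=$level" ;;
    enforce) echo "$prefix/enforce=$level $prefix/warn=$level $prefix/audit=$level" ;;
    *)       echo "unknown phase: $phase" >&2; return 1 ;;
  esac
}

# Usage (needs cluster access; the substitution is left unquoted on purpose
# so that each label becomes a separate argument):
#   kubectl label --overwrite ns <namespace> $(psa_labels observe restricted)
#   kubectl label --dry-run=server --overwrite ns <namespace> \
#     pod-security.kubernetes.io/enforce=restricted
```

The server-side dry run in the last usage line reports which existing pods would violate the level without changing anything, which is a low-risk way to decide when a namespace is ready for the enforce phase.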

🎓 Why AEM Institute Kolkata for Kubernetes Training?

At AEM Institute Kolkata, we don’t just teach YAML syntax – we build real‑world problem solvers. Our Kubernetes Administrator program includes live labs, disaster recovery drills, and scenario‑based mock interviews exactly like the ones above. With a placement record of 94% in DevOps roles, we are recognized as the best Kubernetes training institute in Kolkata. Whether you’re preparing for CKA, CKAD, or an enterprise admin interview, our curriculum covers:

Cluster installation (kubeadm, kops, EKS, AKS)
Advanced troubleshooting (etcd, CNI, kubelet)
Security (RBAC, PSA, network policies)
Storage and Stateful workloads
CI/CD with ArgoCD & GitOps

📢 Upcoming batch: [Check website for dates] – Limited seats. Get hands‑on with 20+ real‑world scenarios.

Devraj Sarkar

Cybersecurity Architect | Cloud-Native Defense | AI/ML Security | DevSecOps

𝐖𝐢𝐭𝐡 𝟐𝟑+ 𝐲𝐞𝐚𝐫𝐬 𝐨𝐟 𝐞𝐱𝐩𝐞𝐫𝐭𝐢𝐬𝐞 𝐢𝐧 𝐜𝐲𝐛𝐞𝐫𝐬𝐞𝐜𝐮𝐫𝐢𝐭𝐲 𝐚𝐧𝐝 𝐜𝐥𝐨𝐮𝐝-𝐧𝐚𝐭𝐢𝐯𝐞 𝐝𝐞𝐟𝐞𝐧𝐬𝐞, 𝐈 𝐚𝐫𝐜𝐡𝐢𝐭𝐞𝐜𝐭 𝐫𝐞𝐬𝐢𝐥𝐢𝐞𝐧𝐭 𝐝𝐢𝐠𝐢𝐭𝐚𝐥 𝐞𝐜𝐨𝐬𝐲𝐬𝐭𝐞𝐦𝐬 𝐛𝐲 𝐢𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐧𝐠 𝐙𝐞𝐫𝐨 𝐓𝐫𝐮𝐬𝐭, 𝐭𝐡𝐫𝐞𝐚𝐭 𝐢𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞, 𝐚𝐧𝐝 𝐩𝐫𝐨𝐚𝐜𝐭𝐢𝐯𝐞 𝐫𝐢𝐬𝐤 𝐦𝐢𝐭𝐢𝐠𝐚𝐭𝐢𝐨𝐧 𝐢𝐧𝐭𝐨 𝐞𝐯𝐞𝐫𝐲 𝐥𝐚𝐲𝐞𝐫 𝐨𝐟 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞.

My journey began in network security (firewalls, IDS/IPS) and evolved through Linux/Windows hardening, IAM, and DevSecOps—bridging security with agile development. Today, I specialize in securing multi-cloud (AWS/Azure/GCP) environments.

𝐀𝐬 𝐚 𝐭𝐫𝐮𝐬𝐭𝐞𝐝 𝐚𝐝𝐯𝐢𝐬𝐨𝐫, 𝐈 𝐡𝐞𝐥𝐩 𝐨𝐫𝐠𝐚𝐧𝐢𝐳𝐚𝐭𝐢𝐨𝐧𝐬:

✔️ Align security investments with business objectives (reducing TCO while maximizing cyber ROI).

✔️ Prioritize risks executives care about—translating technical vulnerabilities into financial/operational impact.

✔️ Optimize team workflows by merging DevSecOps agility with governance rigor—no more “security vs. speed” trade-offs.

𝐂𝐨𝐫𝐞 𝐒𝐭𝐫𝐞𝐧𝐠𝐭𝐡𝐬 & 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭𝐢𝐚𝐭𝐢𝐨𝐧:

𝘌𝘯𝘥-𝘵𝘰-𝘦𝘯𝘥 𝘴𝘦𝘤𝘶𝘳𝘪𝘵𝘺 𝘢𝘳𝘤𝘩𝘪𝘵𝘦𝘤𝘵𝘶𝘳𝘦—𝘧𝘳𝘰𝘮 𝘯𝘦𝘵𝘸𝘰𝘳𝘬 𝘩𝘢𝘳𝘥𝘦𝘯𝘪𝘯𝘨 𝘵𝘰 𝘈𝘐-𝘥𝘳𝘪𝘷𝘦𝘯 𝘵𝘩𝘳𝘦𝘢𝘵 𝘥𝘦𝘵𝘦𝘤𝘵𝘪𝘰𝘯.

𝐌𝐮𝐥𝐭𝐢-𝐂𝐥𝐨𝐮𝐝 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲: Deep expertise in AWS/Azure/GCP security tools (Kubernetes, CSPM, CWPP).

𝐓𝐡𝐫𝐞𝐚𝐭 𝐈𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞 & 𝐅𝐨𝐫𝐞𝐧𝐬𝐢𝐜𝐬: Proactive hunting, incident response, and post-breach analysis.

𝐙𝐞𝐫𝐨 𝐓𝐫𝐮𝐬𝐭 & 𝐈𝐀𝐌: Architecting least-privilege access, PKI, and micro-segmentation.

𝐀𝐈/𝐌𝐋 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲: Securing LLMs, MLOps pipelines, and data lakes against adversarial attacks.

𝐑𝐞𝐜𝐞𝐧𝐭 𝐂𝐨𝐧𝐬𝐮𝐥𝐭𝐢𝐧𝐠 𝐏𝐫𝐨𝐣𝐞𝐜𝐭𝐬 – 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐀𝐈 & 𝐀𝐈 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲:

✔️ Led security architecture for a GenAI‑powered Agentic AI system (autonomous task‑planning agents using LangChain & AutoGPT). Designed guardrails against prompt injection, tool‑calling abuse, and data exfiltration via agent‑to‑agent communication. Result: Zero security breaches across 10k+ agentic transactions.

✔️ Advised a fintech firm on AI supply chain security – hardened their LLM fine‑tuning pipeline (Hugging Face + AWS SageMaker) against model poisoning and backdoor attacks. Implemented real‑time anomaly detection for model inputs using statistical outlier scoring.

Let’s connect and discuss the future of secure, intelligent infrastructure.
