10 Kubernetes Interview Questions with Answers for Working Professionals – Updated April 2026
10 Scenario-Based Kubernetes Interview Questions for Administrators
By AEM Institute Kolkata, India's leading training hub for DevOps & Cloud Native technologies | Updated: April 2026
Best Kubernetes Interview Questions and Answers by AEM Institute Kolkata
Kubernetes has become the de facto orchestration platform for containerized applications. As a Kubernetes administrator, you are expected to troubleshoot real-world failures, upgrade clusters safely, and secure workloads. AEM Institute Kolkata has curated 10 scenario-based interview questions that test hands-on knowledge, not just theoretical definitions. Each question presents a production-like problem followed by an expert answer. Use them to prepare for your next K8s admin interview or to evaluate candidates.
1. Node NotReady: The Silent Worker
Scenario: One of your worker nodes has shown status NotReady in kubectl get nodes for 10 minutes. Pods that were running on it are stuck in Terminating or Unknown state. How do you systematically diagnose the root cause?
Answer (Administrator's approach):
First, get node details: kubectl describe node <node-name> and check Conditions (Ready, DiskPressure, MemoryPressure, PIDPressure, NetworkUnavailable). Then SSH into the node and:
• systemctl status kubelet: is it active?
• journalctl -u kubelet -n 50 --no-pager: look for errors like "failed to get sandbox image", "CNI config uninitialized", or "node not authorized".
• Check disk usage with df -h and the container runtime with systemctl status containerd (or docker).
• Verify that kubelet can reach the API server (check the kubelet config and certificates).
• After fixing the underlying issue (e.g., network plugin crash, out of memory), restart kubelet, or drain and delete the node if it cannot be recovered. If the node stays unreachable, its pods are evicted after about 5 minutes by default: the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints carry a default tolerationSeconds of 300, which replaced the older pod-eviction-timeout setting.
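The condition check in the first step can be scripted. The sketch below is a minimal illustration, assuming the node object has already been exported with kubectl get node <node-name> -o json; the sample data and the function name unhealthy_conditions are hypothetical.

```python
import json

# Conditions where status "True" signals a problem (Ready is the inverse).
PRESSURE_CONDITIONS = {"DiskPressure", "MemoryPressure", "PIDPressure", "NetworkUnavailable"}


def unhealthy_conditions(node: dict) -> list[str]:
    """Return human-readable problems found in a node's status.conditions."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        reason = cond.get("reason", "no reason given")
        if ctype == "Ready" and status != "True":
            # "False" means kubelet reports not ready; "Unknown" means no heartbeat.
            problems.append(f"Ready={status}: {reason}")
        elif ctype in PRESSURE_CONDITIONS and status == "True":
            problems.append(f"{ctype}=True: {reason}")
    return problems


# Hypothetical sample, shaped like the output of `kubectl get node <node-name> -o json`.
sample_node = json.loads("""
{"status": {"conditions": [
  {"type": "MemoryPressure", "status": "False", "reason": "KubeletHasSufficientMemory"},
  {"type": "DiskPressure", "status": "True", "reason": "KubeletHasDiskPressure"},
  {"type": "Ready", "status": "Unknown", "reason": "NodeStatusUnknown"}
]}}
""")

for problem in unhealthy_conditions(sample_node):
    print(problem)
```

A Ready status of Unknown usually points at the kubelet or the network path to the API server, while a pressure condition points at resources on the node itself.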
2. CrashLoopBackOff: The Restarting Nightmare
Scenario: A developer deployed a new microservice. The pod enters CrashLoopBackOff immediately. You see no obvious error in kubectl logs <pod> because the container restarts too fast. How do you capture the root cause?
Answer:
1. Get the previous container's logs: kubectl logs <pod> --previous shows why the last container exited.
2. Check pod events: kubectl describe pod <pod> and look at Last State and Exit Code (0 = normal exit, non-zero = error). Common causes: missing environment variable, wrong command, or inability to connect to a dependency (DB, Redis).
3. Inspect the pod from the inside. If ephemeral containers are supported, attach a debug container: kubectl debug -it <pod> --image=busybox --target=<container> -- sh. Otherwise, temporarily edit the deployment to command: ["sleep","3600"], then exec in to inspect the filesystem and config.
4. Verify liveness/readiness probe settings: an aggressive liveness probe causes restarts if the app takes long to start. A startupProbe or a longer initialDelaySeconds avoids this.
5. If OOMKilled appears (exit code 137), increase memory limits or fix the memory leak.
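Exit codes follow common shell conventions, and decoding them narrows the search quickly. The helper below is a small sketch of those conventions, not an official Kubernetes mapping; explain_exit_code is a hypothetical name, and the signal lookup assumes a Linux host.

```python
import signal


def explain_exit_code(code: int) -> str:
    """Map a container exit code to a likely meaning (conventions, not guarantees)."""
    if code == 0:
        return "exited normally"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found (check command/args and the image)"
    if code > 128:
        # Shell convention: 128 + N means the process was terminated by signal N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        hint = " (often OOMKilled; confirm with kubectl describe pod)" if signum == 9 else ""
        return f"terminated by {name}{hint}"
    return "application error (read kubectl logs --previous)"


for code in (0, 1, 127, 137, 143):
    print(code, "->", explain_exit_code(code))
```

Exit code 137 is 128 + 9 (SIGKILL), which is what the kernel's OOM killer sends; 143 is 128 + 15 (SIGTERM), typical of a normal pod shutdown.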
3. Upgrading a Production Control Plane
Scenario: You manage a 3-node control plane cluster (kubeadm) running v1.27. The business requires upgrading to v1.28 with zero downtime for the API server. What steps do you follow?
Answer:
For a multi-master HA setup:
• Upgrade kubeadm on the first control plane node: apt-get update && apt-get install kubeadm=1.28.x (or the yum equivalent).
• Pre-upgrade checks: run kubeadm upgrade plan on that node to verify compatibility and see the upgrade path.
• Drain that node: kubectl drain <node-name> --ignore-daemonsets. Static pods such as etcd and kube-apiserver are not evicted by a drain; they restart during the upgrade, which is safe because etcd is clustered and quorum remains.
• Apply the upgrade: kubeadm upgrade apply v1.28.x --etcd-upgrade=true (only on the first control plane node).
• Upgrade kubelet and kubectl on that node, then restart kubelet (systemctl daemon-reload && systemctl restart kubelet).
• Uncordon the node.
• Repeat the process for the second and third control plane nodes, using kubeadm upgrade node instead of kubeadm upgrade apply.
• Finally, upgrade worker nodes one by one: drain, upgrade kubeadm and kubelet, run kubeadm upgrade node, uncordon.
• Zero downtime is achieved because the API servers sit behind a load balancer and at least one control plane node remains available during the rolling upgrade. For etcd, keep at least 2 of the 3 members online to preserve quorum.
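The quorum rule and the one-minor-version-at-a-time rule behind this procedure are simple arithmetic. The following sketch illustrates both; the function names are hypothetical and patch versions are deliberately not compared.

```python
def etcd_quorum(members: int) -> int:
    """Minimum number of healthy members etcd needs to accept writes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be offline at once without losing quorum."""
    return members - etcd_quorum(members)


def is_supported_kubeadm_upgrade(current: str, target: str) -> bool:
    """kubeadm upgrades within a minor version or to the next minor; it does not skip minors."""
    cur_major, cur_minor = (int(p) for p in current.lstrip("v").split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.lstrip("v").split(".")[:2])
    return cur_major == tgt_major and 0 <= tgt_minor - cur_minor <= 1


print(etcd_quorum(3), tolerated_failures(3))
print(is_supported_kubeadm_upgrade("v1.27.4", "v1.28.2"))
print(is_supported_kubeadm_upgrade("v1.26.0", "v1.28.0"))
```

With three members the cluster tolerates exactly one member being down, which is why control plane nodes are upgraded strictly one at a time.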
4. PersistentVolumeClaim Stuck in Pending
Scenario: A StatefulSet needs storage. The PVC is created but remains Pending indefinitely. You have a default StorageClass. What troubleshooting steps do you take?
Answer:
• kubectl describe pvc <pvc-name>: events often show the reason (e.g., "no persistent volumes available for this claim", "storageclass.storage.k8s.io not found").
• Verify the StorageClass exists: kubectl get sc. Check that it is marked default and has a provisioner (like ebs.csi.aws.com, kubernetes.io/gce-pd). If its volumeBindingMode is WaitForFirstConsumer, the PVC stays Pending until a pod that uses it is scheduled, which is expected.
• Ensure the CSI controller pods are running (usually in kube-system) and check their logs.
• Check the PVC's requested storage size and accessModes (RWO, RWX). The underlying storage backend may not support the requested mode.
• Without dynamic provisioning, manually create a PV whose storageClassName, capacity, and access modes satisfy the claim (plus any label selector the PVC sets), or fix the StorageClass parameters (e.g., wrong zone, missing cloud permissions).
• For local storage, ensure the node has the expected directory and the local PV provisioner is running.
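When binding to a manually created PV fails, comparing the claim and the volume field by field usually explains it. The sketch below approximates that comparison on simplified, hypothetical dictionaries (not the full Kubernetes object schema) and only understands binary size suffixes.

```python
# Binary suffixes used by Kubernetes quantities; decimal suffixes are omitted in this sketch.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '10Gi' to bytes (binary suffixes or plain integers only)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def why_pv_does_not_match(pvc: dict, pv: dict) -> list[str]:
    """List the reasons a PV cannot satisfy a PVC; an empty list means it is a candidate."""
    reasons = []
    if pvc.get("storageClassName") != pv.get("storageClassName"):
        reasons.append("storageClassName differs")
    if to_bytes(pv["capacity"]) < to_bytes(pvc["request"]):
        reasons.append("PV capacity is smaller than the request")
    if not set(pvc["accessModes"]) <= set(pv["accessModes"]):
        reasons.append("PV does not offer the requested access modes")
    return reasons


# Hypothetical, flattened claim and volume.
pvc = {"storageClassName": "gp3", "request": "20Gi", "accessModes": ["ReadWriteMany"]}
pv = {"storageClassName": "gp3", "capacity": "10Gi", "accessModes": ["ReadWriteOnce"]}
print(why_pv_does_not_match(pvc, pv))
```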
5. Service Unreachable via ClusterIP
Scenario: Pod A cannot reach Pod B using the Service's ClusterIP (e.g., curl http://my-svc.default.svc.cluster.local:8080 times out). Both pods are healthy and endpoints exist. How do you isolate the network issue?
Answer:
• First, confirm endpoints: kubectl get endpoints my-svc must list pod IPs.
• Test connectivity from a debug pod: kubectl run tmp --rm -it --image=nicolaka/netshoot -- /bin/bash, then curl -v http://<cluster-ip>:<port> and also curl -v http://<pod-ip>:<port> (bypassing the Service). If the pod IP works but the cluster IP does not, the issue is with kube-proxy.
• On a node, check iptables rules: iptables-save | grep <cluster-ip>, or for IPVS: ipvsadm -L -n.
• Verify kube-proxy is running: kubectl get pods -n kube-system | grep kube-proxy. Look at its logs.
• Ensure no NetworkPolicy blocks ingress/egress. Also check that the Service selector matches the pod labels exactly and that targetPort matches the port the container listens on.
• If DNS is the issue, test with nslookup my-svc.default.svc.cluster.local from the debug pod.
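The selector check is an exact key-and-value comparison, which makes typos easy to miss by eye. A minimal sketch of the rule, with hypothetical label data and function names:

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """A Service selects a pod only if every selector key/value appears in the pod's labels."""
    # A Service without a selector does not select any pods automatically.
    return bool(selector) and all(pod_labels.get(k) == v for k, v in selector.items())


def mismatched_keys(selector: dict[str, str], pod_labels: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (wanted, actual)} for each selector entry the pod does not satisfy."""
    return {k: (v, pod_labels.get(k)) for k, v in selector.items() if pod_labels.get(k) != v}


# Hypothetical labels: the pod uses "tier: back-end" while the Service expects "backend".
service_selector = {"app": "orders", "tier": "backend"}
pod_labels = {"app": "orders", "tier": "back-end", "version": "v2"}

print(selector_matches(service_selector, pod_labels))
print(mismatched_keys(service_selector, pod_labels))
```

Extra labels on the pod (such as version above) are harmless; only the keys named in the selector have to match.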
6. etcd Snapshot Recovery After Disaster
Scenario: Your etcd database got corrupted after a sudden power outage. You have a recent snapshot backup (snapshot.db). Walk through the exact steps to restore the entire cluster (single etcd member).
Answer:
For a single-master cluster (kubeadm):
1. Stop kube-apiserver and etcd. In a kubeadm cluster they run as static pods, so move their manifests out of /etc/kubernetes/manifests; on systemd-managed installs use systemctl stop kube-apiserver etcd.
2. Restore the snapshot using etcdctl (v3): ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-restored. Also specify --initial-cluster and --initial-advertise-peer-urls if needed. On etcd v3.5 and later, etcdutl snapshot restore is the preferred equivalent.
3. Replace the existing etcd data directory: mv /var/lib/etcd /var/lib/etcd.bak && mv /var/lib/etcd-restored /var/lib/etcd.
4. If etcd runs under a dedicated etcd user, fix ownership: chown -R etcd:etcd /var/lib/etcd.
5. Restart etcd and the API server: move the static pod manifests back (or run systemctl start etcd kube-apiserver).
6. Verify: kubectl get nodes should return the cluster state at backup time. For multi-member etcd, restore the same snapshot on every member, each with its own --name and --initial-advertise-peer-urls but the same --initial-cluster and --initial-cluster-token, then start all members together.
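For the multi-member case, the per-member restore commands differ only in a couple of flags, so generating them avoids copy-paste mistakes. The sketch below only builds command strings and does not run anything; the member names, peer URLs, and cluster token are hypothetical placeholders.

```python
import shlex


def restore_commands(snapshot: str, members: dict[str, str], token: str, data_dir: str) -> dict[str, str]:
    """Build one `etcdctl snapshot restore` command per member, all from the same snapshot."""
    initial_cluster = ",".join(f"{name}={url}" for name, url in members.items())
    commands = {}
    for name, peer_url in members.items():
        args = [
            "etcdctl", "snapshot", "restore", snapshot,
            "--name", name,                              # unique per member
            "--data-dir", data_dir,
            "--initial-cluster", initial_cluster,        # identical on every member
            "--initial-cluster-token", token,            # identical on every member
            "--initial-advertise-peer-urls", peer_url,   # unique per member
        ]
        commands[name] = "ETCDCTL_API=3 " + shlex.join(args)
    return commands


# Hypothetical three-member control plane.
members = {
    "cp1": "https://10.0.0.1:2380",
    "cp2": "https://10.0.0.2:2380",
    "cp3": "https://10.0.0.3:2380",
}
cmds = restore_commands("snapshot.db", members, "restored-cluster", "/var/lib/etcd-restored")
for name, cmd in cmds.items():
    print(f"[{name}] {cmd}")
```

Run each generated command on its own node, then swap the data directory into place as in step 3.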
7. Cross-Node Pod Communication Failure
Scenario: Pods on the same node can communicate, but Pods on different nodes cannot ping each other. The CNI is Calico. What are the top three checks you perform?
Answer:
1. CNI configuration: Check /etc/cni/net.d/ on each node. Ensure the CNI plugin binary exists (typically under /opt/cni/bin) and the IP pools are correctly defined (no overlapping podCIDRs).
2. Overlay network ports: For Calico, verify that firewalls allow UDP port 4789 (VXLAN) or IP protocol 4 (IPIP), plus TCP 179 when BGP is used. For Flannel, the ports are UDP 8285 (udp backend) and UDP 8472 (VXLAN).
3. BGP route propagation (if using Calico BGP): Check that the bird daemon is running and routes are exchanged between nodes: calicoctl node status, and ip route show to see routes to remote podCIDRs. If routes are missing, restart the calico-node pods. Also check the host's rp_filter setting (0 disables the check, 1 is strict, 2 is loose): strict reverse-path filtering can drop packets that arrive over an unexpected interface.
Additional: If kube-proxy runs in IPVS mode, check whether its strictARP setting matches what your network add-ons require.
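Overlapping pod CIDRs are easy to verify with the standard library. The sketch below assumes the per-node ranges have already been collected (for example from each node's spec.podCIDR); the node names and ranges are hypothetical.

```python
import ipaddress
from itertools import combinations


def overlapping_cidrs(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of names whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [(a, b) for a, b in combinations(networks, 2) if networks[a].overlaps(networks[b])]


# Hypothetical per-node pod CIDRs; node-c was assigned a range inside node-b's.
pod_cidrs = {
    "node-a": "10.244.0.0/24",
    "node-b": "10.244.1.0/24",
    "node-c": "10.244.1.128/25",
}
print(overlapping_cidrs(pod_cidrs))
```

The same check works for comparing the pod network against the service CIDR or the node network, which must not overlap either.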
8. HorizontalPodAutoscaler Won't Scale
Scenario: The HPA is configured to scale based on CPU utilization (target 50%). Even under load, the number of replicas remains 1. HPA events show "unable to get metrics". How do you fix it?
Answer:
• The most common cause is a missing Metrics Server. Deploy it with kubectl apply -f components.yaml from the official release.
• Check HPA status: kubectl describe hpa <hpa-name>. Look for "FailedGetResourceMetric" or "missing request for cpu".
• Ensure every container in each pod has resources.requests.cpu defined. HPA computes utilization as a percentage of the request, not the limit.
• Verify the metrics API: kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes should return data.
• If using custom metrics, ensure the Prometheus adapter or other provider is correctly installed and the metric name matches.
• Check that scaleTargetRef points to the right Deployment/StatefulSet.
• Also confirm that maxReplicas is above 1 and review any spec.behavior policies. The --horizontal-pod-autoscaler-downscale-stabilization flag and the scaleDown stabilization window only delay scale-down, so they do not explain a failure to scale up.
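Once metrics are available, the scaling decision follows the documented formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The sketch below reproduces that arithmetic in simplified form; it ignores stabilization windows, unready pods, and scaling policies, and the function names are hypothetical.

```python
import math


def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Utilization is measured against the CPU request, so a missing request is an error."""
    if not request_millicores:
        raise ValueError("missing request for cpu")
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int, current_utilization: float, target_utilization: float,
                     min_replicas: int = 1, max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA calculation: scale by the usage/target ratio, clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


usage = cpu_utilization_percent(usage_millicores=450, request_millicores=500)
print(usage, desired_replicas(1, usage, 50))
```

In the scenario, one replica at 90% utilization against a 50% target gives a ratio of 1.8, so the HPA would ask for 2 replicas as soon as it can read the metric.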
9. Expired Kubernetes Certificates
Scenario: After one year, users cannot run kubectl commands. The error says "certificate has expired or is not yet valid". The cluster was set up with kubeadm. How do you recover without rebuilding?
Answer:
• First, check cert expiration: kubeadm certs check-expiration.
• Renew all certificates (including admin.conf, apiserver, etcd) on each control plane node: kubeadm certs renew all. This rewrites the certificates in place but does not restart anything; restart the control plane static pods afterwards, for example by moving their manifests out of /etc/kubernetes/manifests, waiting for the kubelet to stop them, and moving the manifests back.
• Then update the local kubeconfig: test with kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes and copy the renewed admin.conf to the user's config ($HOME/.kube/config). If admin.conf itself must be re-generated, use kubeadm init phase kubeconfig admin --config ...
• If a kubelet cannot reach the API server because its own client certificate has expired, that node's kubelet.conf has to be regenerated and the kubelet restarted. Kubelet client certificates normally rotate automatically, so this mostly affects nodes that were offline for a long time.
• For external etcd, renew etcd certificates separately. After renewal, restart all control plane components.
• kubeadm certs renew signs with the CA files on disk, so it can run while the API server is down. If the certificates are not in the default /etc/kubernetes/pki directory, pass --cert-dir.
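Expiry is best caught before users notice. The sketch below parses the notAfter line printed by openssl x509 -enddate -noout -in <certificate> and reports the remaining days; the sample lines are hypothetical, and the parser assumes the default C/English month names.

```python
from datetime import datetime, timezone


def parse_not_after(line: str) -> datetime:
    """Parse a line such as 'notAfter=Apr 10 12:00:00 2026 GMT' from `openssl x509 -enddate`."""
    value = line.strip().split("=", 1)[1]
    # openssl prints this timestamp in GMT, so drop the suffix and attach UTC explicitly.
    stamp = value.removesuffix(" GMT")
    return datetime.strptime(stamp, "%b %d %H:%M:%S %Y").replace(tzinfo=timezone.utc)


def days_until_expiry(line: str, now: datetime) -> int:
    """Whole days left before the certificate expires; negative once it has expired."""
    return (parse_not_after(line) - now).days


# Hypothetical reference time and certificate end dates.
now = datetime(2026, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
for line in ("notAfter=Apr 10 12:00:00 2026 GMT", "notAfter=Mar 31 12:00:00 2026 GMT"):
    remaining = days_until_expiry(line, now)
    print(line, "->", "EXPIRED" if remaining < 0 else f"{remaining} days left")
```

Wiring a check like this into monitoring, with an alert threshold of a few weeks, leaves time to run kubeadm certs renew all in a planned window.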
10. Enforce Pod Security: No Privileged Containers
Scenario: Your security team demands that no container runs as privileged (privileged: true) and that root user is forbidden. How would you enforce this across the entire cluster without blocking developers immediately?
Answer:
• Use Pod Security Admission (PSA), which is built in and stable since Kubernetes v1.25. Apply labels to namespaces: pod-security.kubernetes.io/enforce=baseline (prevents privileged containers, hostPID, etc.) or restricted (even stricter, no root).
• For a phased rollout, start with the warn and audit modes before enforcing, so developers see violations without being blocked.
• Example: kubectl label ns default pod-security.kubernetes.io/enforce=restricted rejects any new pod that sets the privileged flag or may run as root (unless an exemption is configured). Pods that are already running are not evicted.
• Alternatively, use OPA Gatekeeper or Kyverno for fine-grained policies (e.g., block containers with securityContext.privileged=true).
• Also enforce a read-only root filesystem, drop all capabilities, and run as a non-root user via securityContext.runAsNonRoot: true.
• For legacy workloads, gradually update the container images to meet the policies.
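Before switching a namespace to enforce, it helps to know which workloads would fail. The sketch below checks a pod spec for the two rules from the scenario only (privileged and runAsNonRoot); it is a rough approximation for illustration, not the actual Pod Security Admission logic, and it ignores initContainers and the other restricted-profile rules. The sample spec is hypothetical.

```python
def security_violations(pod_spec: dict) -> list[str]:
    """Flag containers that are privileged or not guaranteed to run as non-root."""
    violations = []
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext", {})
        name = container.get("name", "<unnamed>")
        if ctx.get("privileged") is True:
            violations.append(f"{name}: privileged container")
        # A container-level runAsNonRoot overrides the pod-level value when it is set.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if run_as_non_root is not True:
            violations.append(f"{name}: runAsNonRoot is not true")
    return violations


# Hypothetical pod spec: one compliant container, one privileged sidecar.
pod_spec = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [
        {"name": "api", "securityContext": {"privileged": False}},
        {"name": "agent", "securityContext": {"privileged": True, "runAsNonRoot": False}},
    ],
}
print(security_violations(pod_spec))
```

For an authoritative answer against a live cluster, the warn and audit labels report the real admission results without rejecting anything.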
Why AEM Institute Kolkata for Kubernetes Training?
At AEM Institute Kolkata, we don't just teach YAML syntax; we build real-world problem solvers. Our Kubernetes Administrator program includes live labs, disaster recovery drills, and scenario-based mock interviews exactly like the ones above. With a placement record of 94% in DevOps roles, we are recognized as the best Kubernetes training institute in Kolkata. Whether you're preparing for CKA, CKAD, or an enterprise admin interview, our curriculum covers:
- Cluster installation (kubeadm, kops, EKS, AKS)
- Advanced troubleshooting (etcd, CNI, kubelet)
- Security (RBAC, PSA, network policies)
- Storage and Stateful workloads
- CI/CD with ArgoCD & GitOps
Upcoming batch: [Check website for dates]. Limited seats. Get hands-on with 20+ real-world scenarios.

