10 Scenario-Based Kubernetes Interview Questions with Answers for Working Professionals and Administrators – Updated April 2026
By AEM Institute Kolkata – India’s leading training hub for DevOps & Cloud Native technologies | Updated: April 2026
Kubernetes has become the de facto orchestration platform for containerized applications. As a Kubernetes administrator, you are expected to troubleshoot real-world failures, upgrade clusters safely, and secure workloads. AEM Institute Kolkata has curated 10 scenario-based interview questions that test hands-on knowledge, not just theoretical definitions. Each question presents a production-like problem followed by an expert answer. Use these to prepare for your next K8s admin interview or to evaluate candidates.
1. Node NotReady – The Silent Worker
🎭 Scenario: One of your worker nodes has been reporting NotReady in kubectl get nodes for 10 minutes. Pods that were running on it are stuck in Terminating or Unknown state. How do you systematically diagnose the root cause?
✅ Answer (Administrator’s approach):
First, get node details: kubectl describe node <node-name> – check Conditions (Ready, DiskPressure, MemoryPressure, PIDPressure, NetworkUnavailable). Then SSH into the node and:
• systemctl status kubelet – is it active?
• journalctl -u kubelet -n 50 --no-pager – look for errors like “failed to get sandbox image”, “CNI config uninitialized”, or “node not authorized”.
• Check disk usage: df -h and docker/containerd status.
• Verify that kubelet can reach API server (check kubelet config and certificates).
• If the node is unreachable for more than 5 minutes, its pods are evicted once the pod eviction timeout (or, with taint-based eviction, the node.kubernetes.io/unreachable toleration) expires. Restart the kubelet, or drain and delete the node after fixing the underlying issue (e.g., a network plugin crash or out-of-memory condition).
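The checks above can be collected into a short triage sequence (a sketch; the node name and SSH access are assumptions):

```shell
# Triage for a NotReady node. NODE is a hypothetical name; adjust to your cluster.
NODE=worker-1

# From an admin workstation: node conditions and related events
kubectl describe node "$NODE" | sed -n '/Conditions:/,/Addresses:/p'
kubectl get events -A --field-selector involvedObject.name="$NODE"

# On the node itself (assumes SSH access): kubelet, runtime, and disk
ssh "$NODE" '
  systemctl is-active kubelet containerd
  journalctl -u kubelet -n 50 --no-pager
  df -h /var /var/lib/kubelet
'
```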
2. CrashLoopBackOff – The Restarting Nightmare
🎭 Scenario: A developer deployed a new microservice. The pod enters CrashLoopBackOff immediately. You see no obvious error in kubectl logs <pod> because the container restarts too fast. How do you capture the root cause?
✅ Answer:
1. Get previous container logs: kubectl logs <pod> --previous – this shows why the last container exited.
2. Check pod events: kubectl describe pod <pod> – look at the Exit Code (0 = clean exit, non-zero = error; 137 usually means OOMKilled). Common causes: a missing environment variable, a wrong command, or inability to connect to a dependency (DB, Redis).
3. Keep the container alive for inspection: kubectl debug <pod> -it --image=busybox -- sh (if ephemeral containers are supported), or edit the Deployment to set command: ["sleep","3600"] and then kubectl exec in to inspect the filesystem and config.
4. Verify liveness/readiness probe settings – an aggressive liveness probe can cause restarts if app takes long to start.
5. If OOMKilled appears, increase memory limits or fix memory leak.
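If editing the Deployment is the easiest route, the override from step 3 looks like this (a sketch; the names and image are placeholders, and any liveness probe should be removed while debugging):

```yaml
# Deployment fragment: keep the container idle so you can exec in and inspect it.
spec:
  template:
    spec:
      containers:
        - name: my-service                          # placeholder name
          image: registry.example.com/my-service:1  # placeholder image
          command: ["sleep", "3600"]                # overrides the entrypoint
          # livenessProbe removed temporarily, otherwise it will restart the pod
```

Once the pod settles, kubectl exec -it <pod> -- sh and check environment variables, config files, and connectivity to dependencies.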
3. Upgrading a Production Control Plane
🎭 Scenario: You manage a 3‑node control plane cluster (kubeadm) running v1.27. The business requires upgrading to v1.28 with zero downtime for the API server. What steps do you follow?
✅ Answer:
For multi‑master HA setup:
• Pre‑upgrade checks: kubeadm upgrade plan to verify compatibility and see the upgrade path.
• Drain one control plane node: kubectl drain <cp-node> --ignore-daemonsets. Because etcd is clustered across the three nodes, draining a single member is safe as long as quorum (2 of 3) remains.
• Upgrade kubeadm on that node: apt-get update && apt-get install kubeadm='1.28.x-*' (or the yum/dnf equivalent).
• Apply upgrade: kubeadm upgrade apply v1.28.x --etcd-upgrade=true (only on first CP).
• Upgrade kubelet and kubectl on that node, then restart kubelet.
• Uncordon the node.
• Repeat the same process for the second and third control plane nodes (using kubeadm upgrade node on subsequent nodes).
• Finally, upgrade worker nodes one by one with drain → upgrade kubeadm/kubelet → uncordon.
• Zero downtime is achieved because API servers are behind a load balancer and at least one control plane remains available during the rolling upgrade. For etcd, maintain ≥2 members online.
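On the first control plane node, the sequence above can be sketched as follows (Debian/Ubuntu packages; the exact patch version and node names are placeholders):

```shell
# Rolling upgrade of the first control plane node (kubeadm, v1.27 -> v1.28).
kubectl drain cp-1 --ignore-daemonsets

apt-get update
apt-get install -y kubeadm='1.28.*'   # pin to the target minor release

kubeadm upgrade plan                  # confirm the upgrade path
kubeadm upgrade apply v1.28.0         # only on the first control plane node

apt-get install -y kubelet='1.28.*' kubectl='1.28.*'
systemctl daemon-reload && systemctl restart kubelet

kubectl uncordon cp-1
# On cp-2 / cp-3: same flow, but run `kubeadm upgrade node` instead of `apply`.
```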
4. PersistentVolumeClaim Stuck in Pending
🎭 Scenario: A StatefulSet needs storage. The PVC is created but remains Pending indefinitely. You have a default StorageClass. What troubleshooting steps do you take?
✅ Answer:
• kubectl describe pvc <pvc-name> – events often show the reason (e.g., “no persistent volumes available for this claim”, “storageclass.storage.k8s.io not found”).
• Verify the StorageClass exists: kubectl get sc – check if it is marked default and has a provisioner (like ebs.csi.aws.com, kubernetes.io/gce-pd).
• Ensure the CSI controller pods are running in kube-system.
• Check PVC’s requested storage size and accessModes (RWO, RWX). The underlying storage backend may not support the requested mode.
• If no dynamic provisioning, manually create a PV with matching labels/selectors or fix the StorageClass parameters (e.g., wrong zone, missing cloud permissions).
• For local storage, ensure the node has the expected directory and the local PV provisioner is running.
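When dynamic provisioning is not available, a manually created PV like the sketch below can satisfy the claim (the NFS server, path, and sizes are placeholders):

```yaml
# Statically provisioned PV for a Pending PVC. All values are illustrative.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv-0
spec:
  capacity:
    storage: 10Gi               # must be >= the PVC's request
  accessModes:
    - ReadWriteOnce             # must match the PVC's accessModes
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual      # the PVC must name the same storageClassName
  nfs:
    server: 10.0.0.5            # placeholder backend
    path: /exports/data
```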
5. Service Unreachable via ClusterIP
🎭 Scenario: Pod A cannot reach Pod B using the Service’s ClusterIP (e.g., curl http://my-svc.default.svc.cluster.local:8080 times out). Both pods are healthy and endpoints exist. How do you isolate the network issue?
✅ Answer:
• First, confirm endpoints: kubectl get endpoints my-svc – must list pod IPs.
• Test connectivity from a debug pod: kubectl run tmp --rm -it --image=nicolaka/netshoot -- /bin/bash, then curl -v http://<cluster-ip>:port and also curl -v http://<pod-ip>:port (bypass service). If pod IP works but cluster IP does not → issue with kube-proxy.
• On a node, check iptables rules: iptables-save | grep <cluster-ip> or for IPVS: ipvsadm -L -n.
• Verify kube-proxy is running: kubectl get pods -n kube-system | grep kube-proxy. Look at its logs.
• Ensure no NetworkPolicy blocks ingress/egress. Also check if the service selector matches pod labels exactly.
• If DNS is the issue, test with nslookup my-svc.default.svc.cluster.local from the debug pod.
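A compact way to run these checks from a throwaway debug pod (the service name, namespace, pod IP, and port are placeholders):

```shell
# Does the service path fail while the direct pod path works?
kubectl get endpoints my-svc -n default   # must list pod IPs

kubectl run tmp --rm -it --image=nicolaka/netshoot --restart=Never -- sh -c '
  curl -sv --max-time 5 http://my-svc.default.svc.cluster.local:8080 || echo SVC-FAIL
  curl -sv --max-time 5 http://10.244.1.23:8080 || echo POD-FAIL   # an IP from the endpoints list
  nslookup my-svc.default.svc.cluster.local || echo DNS-FAIL
'

# On a node: is kube-proxy programming rules for this ClusterIP?
iptables-save | grep "$(kubectl get svc my-svc -n default -o jsonpath={.spec.clusterIP})"
```

If the pod IP works but the service fails, suspect kube-proxy; if only the lookup fails, suspect CoreDNS.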
6. etcd Snapshot Recovery After Disaster
🎭 Scenario: Your etcd database got corrupted after a sudden power outage. You have a recent snapshot backup (snapshot.db). Walk through the exact steps to restore the entire cluster (single etcd member).
✅ Answer:
For a single‑master cluster (kubeadm):
1. Stop the API server and etcd. On a kubeadm cluster they run as static pods, so move kube-apiserver.yaml and etcd.yaml out of /etc/kubernetes/manifests; on systemd-managed installs, systemctl stop kube-apiserver etcd.
2. Restore snapshot using etcdctl (v3): ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-restored – also specify --initial-cluster and --initial-advertise-peer-urls if needed.
3. Replace the existing etcd data directory: mv /var/lib/etcd /var/lib/etcd.bak && mv /var/lib/etcd-restored /var/lib/etcd.
4. Ensure ownership: chown -R etcd:etcd /var/lib/etcd.
5. Restart etcd and API server: systemctl start etcd kube-apiserver (or restore static pod manifests).
6. Verify: kubectl get nodes should return the cluster state at backup time. For multi‑member etcd, restore on each node with different --name and rebuild the cluster using etcdctl member add.
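On a kubeadm control plane, the restore steps map to the following sketch (paths are the kubeadm defaults; the snapshot location is a placeholder):

```shell
# Single-member etcd restore. Static pods stop when their manifests are moved away.
mv /etc/kubernetes/manifests/kube-apiserver.yaml /root/
mv /etc/kubernetes/manifests/etcd.yaml /root/

ETCDCTL_API=3 etcdctl snapshot restore /backup/snapshot.db \
  --data-dir /var/lib/etcd-restored

mv /var/lib/etcd /var/lib/etcd.bak           # keep the corrupted dir for forensics
mv /var/lib/etcd-restored /var/lib/etcd

mv /root/etcd.yaml /etc/kubernetes/manifests/            # etcd comes back first
mv /root/kube-apiserver.yaml /etc/kubernetes/manifests/  # then the API server
kubectl get nodes                                        # state as of the snapshot
```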
7. Cross‑Node Pod Communication Failure
🎭 Scenario: Pods on the same node can communicate, but Pods on different nodes cannot ping each other. The CNI is Calico. What are the top three checks you perform?
✅ Answer:
1. CNI configuration: Check /etc/cni/net.d/ on each node. Ensure the CNI plugin binary exists and the IP pools are correctly defined (no overlapping podCIDRs).
2. Overlay network ports: For Calico (VXLAN or IPIP), verify that firewall allows UDP ports 4789 (VXLAN) or IPIP protocol 4. For Flannel, UDP 8285/8472.
3. BGP route propagation (if using Calico BGP): check that the bird daemon is running and routes are exchanged between nodes: calicoctl node status, then ip route show to confirm routes to remote podCIDRs. If routes are missing, restart the calico-node pods. Also verify that the host's rp_filter is not in strict mode (net.ipv4.conf.all.rp_filter should be 0 or 2; 2 is loose mode).
Additional: Check if kube-proxy’s strict ARP is enabled for certain network plugins.
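The three checks can be run per node along these lines (interface names are the usual Calico defaults, and firewall rule text varies by distribution):

```shell
# Cross-node pod traffic triage on a Calico cluster. Run on each node.
ls /etc/cni/net.d/                           # CNI config present?
calicoctl node status                        # BGP peers should show "Established"
ip route show | grep -E 'tunl0|vxlan|bird'   # routes to remote podCIDRs?
sysctl net.ipv4.conf.all.rp_filter           # 0 or 2; strict mode (1) breaks Calico
iptables-save | grep -iE '4789|ipip'         # is the overlay traffic allowed?
```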
8. HorizontalPodAutoscaler Won’t Scale
🎭 Scenario: The HPA is configured to scale based on CPU utilization (target 50%). Even under load, the number of replicas remains 1. HPA events show “unable to get metrics”. How do you fix it?
✅ Answer:
• The most common cause is a missing Metrics Server. Deploy it from the official release manifest: kubectl apply -f components.yaml.
• Check HPA status: kubectl describe hpa <hpa-name>. Look for “FailedGetResourceMetric” or “missing request for cpu”.
• Ensure each pod has resources.requests.cpu defined. HPA uses request percentage, not limits.
• Verify metrics API: kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes should return data.
• If using custom metrics, ensure the Prometheus adapter or other provider is correctly installed and the metric name matches.
• Check for scaleTargetRef pointing to the right Deployment/StatefulSet.
• Also, if the stabilization window or minReplicas prevents scaling, adjust the controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization or, with autoscaling/v2, spec.behavior.scaleDown.stabilizationWindowSeconds in the HPA.
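A minimal working setup, assuming the workload is a Deployment named web (all names and numbers are illustrative), looks like:

```yaml
# The HPA can only compute CPU utilization if the pod declares a CPU request,
# i.e. the Deployment's container carries:
#   resources:
#     requests:
#       cpu: 100m
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # must point at the workload to scale
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # percent of the CPU request
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # per-HPA alternative to the controller flag
```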
9. Expired Kubernetes Certificates
🎭 Scenario: After one year, users cannot run kubectl commands. The error says “certificate has expired or is not yet valid”. The cluster was set up with kubeadm. How do you recover without rebuilding?
✅ Answer:
• First, check cert expiration: kubeadm certs check-expiration.
• Renew all certificates (including admin.conf, apiserver, and etcd certs) with kubeadm certs renew all. This updates the certificates in place, but the control plane components must then be restarted to pick them up – for static pods, move the manifests out of /etc/kubernetes/manifests and back, or restart their containers.
• Then update the local kubeconfig: copy the renewed /etc/kubernetes/admin.conf to ~/.kube/config. If admin.conf was not renewed, regenerate it with kubeadm init phase kubeconfig admin.
• If the API server is inaccessible due to expired client certificates on the kubelet, you may need to manually copy renewed certificates to nodes or run kubeadm upgrade node after renewing on control plane.
• For external etcd, renew etcd certificates separately. After renewal, restart all control plane components.
• As a last resort, regenerate certificates with kubeadm init phase certs (passing --cert-dir if they live in a non-default location) and rotate any Secrets that embed the old certificates.
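A recovery sequence on the control plane node might look like this sketch (the temporary directory is arbitrary; static-pod restart timing varies):

```shell
# Renew kubeadm-managed certificates and restart the control plane.
kubeadm certs check-expiration
kubeadm certs renew all

# Static pods only reload certs on restart: move manifests out and back.
mkdir -p /root/cp-restart
mv /etc/kubernetes/manifests/*.yaml /root/cp-restart/
sleep 20                                  # give kubelet time to stop the pods
mv /root/cp-restart/*.yaml /etc/kubernetes/manifests/

# Refresh the admin kubeconfig and verify
cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
kubectl get nodes
```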
10. Enforce Pod Security – No Privileged Containers
🎭 Scenario: Your security team demands that no container runs as privileged (privileged: true) and that root user is forbidden. How would you enforce this across the entire cluster without blocking developers immediately?
✅ Answer:
• Use Pod Security Admission (PSA), stable since Kubernetes v1.25. Apply labels to namespaces: pod-security.kubernetes.io/enforce=baseline (blocks privileged containers, hostPID, etc.) or restricted (stricter still: no root, mandatory securityContext hardening).
• For a phased rollout: start with warn and audit modes before enforcing.
• Example: kubectl label ns default pod-security.kubernetes.io/enforce=restricted – this will reject any pod with privileged flag or running as root (unless explicitly allowed).
• Alternatively, use OPA Gatekeeper or Kyverno for fine-grained policies (e.g., block containers with securityContext.privileged=true).
• Also enforce read-only root filesystem, drop all capabilities, and run with non‑root user via securityContext.runAsNonRoot: true.
• For legacy workloads, gradually update the container images to meet the policies.
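The phased rollout described above reduces to a pair of labeling steps (the namespace name is a placeholder):

```shell
# Phase 1: surface violations without blocking anything.
kubectl label ns payments \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted

# Review API warnings and audit logs; fix offending workloads.

# Phase 2: once clean, start rejecting non-compliant pods.
kubectl label ns payments \
  pod-security.kubernetes.io/enforce=restricted --overwrite
```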
🎓 Why AEM Institute Kolkata for Kubernetes Training?
At AEM Institute Kolkata, we don’t just teach YAML syntax – we build real‑world problem solvers. Our Kubernetes Administrator program includes live labs, disaster recovery drills, and scenario‑based mock interviews exactly like the ones above. With a placement record of 94% in DevOps roles, we are recognized as the best Kubernetes training institute in Kolkata. Whether you’re preparing for CKA, CKAD, or an enterprise admin interview, our curriculum covers:
- Cluster installation (kubeadm, kops, EKS, AKS)
- Advanced troubleshooting (etcd, CNI, kubelet)
- Security (RBAC, PSA, network policies)
- Storage and Stateful workloads
- CI/CD with ArgoCD & GitOps
📢 Upcoming batch: [Check website for dates] – Limited seats. Get hands‑on with 20+ real‑world scenarios.

Cybersecurity Architect | Cloud-Native Defense | AI/ML Security | DevSecOps
With over 23 years of experience in cybersecurity, I specialize in building resilient, zero-trust digital ecosystems across multi-cloud (AWS, Azure, GCP) and Kubernetes (EKS, AKS, GKE) environments. My journey began in network security—firewalls, IDS/IPS—and expanded into Linux/Windows hardening, IAM, and DevSecOps automation using Terraform, GitLab CI/CD, and policy-as-code tools like OPA and Checkov.
Today, my focus is on securing AI/ML adoption through MLSecOps, protecting models from adversarial attacks with tools like Robust Intelligence and Microsoft Counterfit. I integrate AISecOps for threat detection (Darktrace, Microsoft Security Copilot) and automate incident response with forensics-driven workflows (Elastic SIEM, TheHive).
Whether it’s hardening cloud-native stacks, embedding security into CI/CD pipelines, or safeguarding AI systems, I bridge the gap between security and innovation—ensuring defense scales with speed.
Let’s connect and discuss the future of secure, intelligent infrastructure.
