Pod Terminating Stuck OpenShift
There might be situations where you have already deleted pods (or already removed the deployment
configuration) but the pods are stuck in the Terminating state.
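To clear a pod that is stuck like this, we can force the deletion and, if a finalizer is holding it back, remove the finalizer. A minimal sketch (<pod-name> and <namespace> are placeholders):

# Confirm the pod is stuck in Terminating
oc get pod <pod-name> -n <namespace>

# Force the deletion, skipping the grace period
oc delete pod <pod-name> -n <namespace> --grace-period=0 --force

# If it still hangs, clear any finalizers blocking the deletion
oc patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'

Use the force options with care: the API object is removed immediately, but the kubelet may still be cleaning up the container on the node.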
Network File System (NFS): An nfs volume allows an existing NFS share to
be mounted into a Pod. Unlike emptyDir, which is erased when a Pod is removed, the
contents of an nfs volume are preserved and the volume is merely unmounted. This means
that an NFS volume can be pre-populated with data, and that data can be shared between
pods. NFS can be mounted by multiple writers simultaneously.
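As a minimal sketch of how such a volume is declared, the pod below mounts a share from a hypothetical NFS server (the server address, export path, and image are assumptions):

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nfs-example
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: nfs-vol
      mountPath: /data        # contents survive pod deletion
  volumes:
  - name: nfs-vol
    nfs:
      server: nfs.example.com # hypothetical NFS server
      path: /exports/shared   # hypothetical export path
EOF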
The simplicity you can obtain by using Trident for dynamically creating PVCs, coupled with
its production-grade CSI drivers and data management capabilities, makes it a key option for
stateful storage requirements on OpenShift. Applications generate data, and access to
storage should be painless and on-demand.
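With Trident in place, dynamic provisioning comes down to requesting a PVC against a Trident-backed StorageClass. A minimal sketch, assuming a StorageClass named trident-nas already exists (the class name and size are assumptions):

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: trident-nas  # assumed Trident-backed class
EOF

Trident sees the claim and provisions a matching PersistentVolume on the backend automatically.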
Trident is also capable of orchestrating across multiple platforms at the same time through a unified
interface.
To reboot a node without causing an outage for applications running on the platform, it is important
to first evacuate the pods.
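A typical evacuate-and-reboot sequence looks like this sketch (<node-name> is a placeholder; on older clusters the second flag is spelled --delete-local-data):

# Mark the node unschedulable and evict its pods
oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Reboot the node, then bring it back into rotation
oc adm uncordon <node-name>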
For pods that are made highly available by the routing tier, nothing else needs to be done. For other
pods needing storage, typically databases, it is critical to ensure that they can remain in operation
with one pod temporarily going offline.
Currently, the easiest way to manage node reboots is to ensure that there are at least three nodes
available to run infrastructure. The nodes that run this infrastructure (for example, the registry and
the router) are called infrastructure nodes.
The scenario below demonstrates a common mistake that can lead to service interruptions for the
applications running on OpenShift Container Platform when only two nodes are available.
Node A is marked unschedulable and all of its pods are evacuated. The registry pod running on that
node is now redeployed on node B, which means node B is running both registry pods.
When node B is evacuated in turn, the service exposing the two pod endpoints on node B loses all
endpoints for a brief period of time, until they are redeployed to node A.
The same process using three infrastructure nodes does not result in a service disruption.
However, due to pod scheduling, the last node that is evacuated and brought back into rotation is
left running zero registries; the other two nodes run two and one registries respectively. The
best solution is to rely on pod anti-affinity.
Pod anti-affinity: with this in place, if only two infrastructure nodes are available and one is rebooted,
the container image registry pod is prevented from running on the other node. oc get pods reports
the pod as unready until a suitable node is available. Once a node is available and all pods are back
in a ready state, the next node can be restarted.
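A minimal sketch of such an anti-affinity rule, assuming the registry pods carry the label docker-registry=default (the label and image are assumptions; in practice the rule goes into the registry's deployment configuration):

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: registry-anti-affinity-example
  labels:
    docker-registry: default
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            docker-registry: default   # keep registry replicas apart
        topologyKey: kubernetes.io/hostname
  containers:
  - name: registry
    image: openshift/origin-docker-registry
EOF

With this rule, two pods carrying the docker-registry=default label can never land on the same node, which is exactly the behaviour described above.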
Scenarios
1. Check whether the docker service and atomic-openshift-node.service are running, along with a
few other basic node checks; if either of these is not running, there is a problem with the node
(see the commands after this list).
2. Evacuate a node? (We drain the node when the docker service is not running, when a pod is stuck
in Pending, or when there is a disk-space issue on the node; the commands are shown after this list.)
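The checks and the evacuation can be done along these lines on an OpenShift 3.x node (<node-name> is a placeholder; on newer clusters oc adm drain alone is the supported path):

# On the node: verify the container runtime and the node agent are up
systemctl status docker
systemctl status atomic-openshift-node

# Mark the node unschedulable and evacuate its pods
oc adm manage-node <node-name> --schedulable=false
oc adm drain <node-name> --ignore-daemonsets

# When the node is healthy again, bring it back
oc adm manage-node <node-name> --schedulable=true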
If a Pod is stuck in Pending it means that it cannot be scheduled onto a node. Generally
this is because there are insufficient resources of one type or another that prevent scheduling.
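The scheduler's reasoning shows up in the pod's events, so that is the first place to look (<pod-name> and <namespace> are placeholders):

# The Events section at the bottom explains why scheduling failed
oc describe pod <pod-name> -n <namespace>

# Or scan recent events across the whole namespace
oc get events -n <namespace> --sort-by=.metadata.creationTimestamp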
Reason 1: If the cluster is full and no nodes are available to schedule the pod, we need to add more
nodes to the cluster.
Reason 2: If the issue is a resource quota limit, increase the quota of the project.
Reason 3: If the issue is a PVC stuck in Pending, fix the underlying PV binding so the claim can be
mounted (and reference the claim in the deployment config file).
Reason 4: Check whether the pod has been assigned to a node. If it is assigned to a node and still
Pending, the atomic-openshift-node service on that node is likely not running or in an error state; if
the pod is not assigned to any node, the issue is with the scheduler. The checks after this list help
narrow each of these down.
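A few quick checks that map to the reasons above (names are placeholders):

# Reason 2: inspect the project's quota and current usage
oc describe quota -n <namespace>

# Reason 3: look for claims stuck in Pending
oc get pvc -n <namespace>

# Reason 4: the NODE column shows whether the pod was assigned
oc get pod <pod-name> -n <namespace> -o wide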
If a Pod is stuck in the Waiting state, then it has been scheduled to a worker node, but it can't
run on that machine.
Step 1: Check the logs of the pod:
oc logs -f <pod-name>
If we can see the error in the log, we can fix the issue in the code.
Step 2: Check whether the image is pulled from a private or a public registry. If we are pulling
images from a private registry, we need to configure the pull secret explicitly (see the sketch after
these steps); if we are pulling from a public registry and still getting ImagePullBackOff, there is a
problem with either (a) internet connectivity or (b) the atomic-openshift-node service.
Step 3: Check whether we have forgotten the CMD instruction in the Dockerfile.
Step 4: Check whether the pod is restarting frequently (fix the issue in the code or in the liveness
probe; see the sketch after these steps).
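For step 2, a pull secret for a private registry can be created and linked roughly like this (the secret name, registry address, and credentials are placeholders); for step 4, the RESTARTS column points at crash loops:

# Step 2: create a pull secret and link it to the default service account
oc create secret docker-registry my-pull-secret \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password>
oc secrets link default my-pull-secret --for=pull

# Step 4: a high RESTARTS count suggests a crash loop or an
# over-aggressive liveness probe
oc get pod <pod-name>
oc describe pod <pod-name>   # shows probe config and last state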