Monitoring Kubernetes Pods in Google Cloud

Time:01-12

We have an application deployed on GKE, with a total of 10 pods running and serving it. I am trying to find a metric I can use to create an alert when a pod goes down. Alternatively, is there a way to check the status of the pods so that I can set up an alert based on that condition?

I looked through the GCP documentation but couldn't find anything suitable. The only metric I found is the one below, but I don't know exactly what it measures. To me it looks like the number of times Kubernetes decided a pod had died and restarted it.

Metric: kubernetes.io/container/restart_count
Resource type: k8s_container

Any advice is highly appreciated, as we would like to improve our monitoring based on this metric.

GCP alerting policy creation

CodePudding user response:

You are right: that metric is the restart count. The documentation describes it as follows:

Number of times the container has restarted. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.

Read more at: https://cloud.google.com/monitoring/api/metrics_kubernetes
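As a rough illustration of how an alert on this metric could be expressed, here is a minimal sketch that builds an alert policy body as a plain Python dict. The field names follow the Cloud Monitoring AlertPolicy REST resource as I understand it; the function name `restart_alert_policy`, the display names, the 5-minute window, and the threshold are assumptions for illustration, so check the result against the official reference before using it.

```python
def restart_alert_policy(window_seconds=300, threshold=0):
    """Build a (hypothetical) alert policy body that fires when a container
    restarts more than `threshold` times within `window_seconds`."""
    # Metric and resource type are the ones named in the question.
    metric_filter = (
        'metric.type="kubernetes.io/container/restart_count" '
        'AND resource.type="k8s_container"'
    )
    return {
        "displayName": "Container restarts (sketch)",  # assumed name
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "restart_count increased",  # assumed name
                "conditionThreshold": {
                    "filter": metric_filter,
                    # restart_count is cumulative, so compare the change
                    # per alignment window rather than the raw value.
                    "aggregations": [
                        {
                            "alignmentPeriod": f"{window_seconds}s",
                            "perSeriesAligner": "ALIGN_DELTA",
                        }
                    ],
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": "0s",
                },
            }
        ],
    }
```

Serializing the returned dict with `json.dumps` gives a body that could be reviewed by hand before it is submitted through the console, API, or CLI.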

Or

You can use Prometheus to collect the metrics and monitor them with Grafana:

sum(kube_pod_container_status_restarts_total{cluster="$cluster",namespace="$namespace",pod=~"$service.*"})

This returns the total restart count for the matching pods.
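Because the restart count is a cumulative counter, an alert usually cares about how much it grew over a period rather than its absolute value. The sketch below shows that idea on two snapshots of per-pod counts. It is a simplification and not how Prometheus computes increases internally; the function name `restart_increase` and the snapshot format (a dict of pod name to count) are assumptions for illustration.

```python
def restart_increase(previous, current):
    """Return {pod: new restarts} between two snapshots of a cumulative counter.

    Both arguments are hypothetical dicts mapping pod name to restart count.
    """
    increases = {}
    for pod, count in current.items():
        before = previous.get(pod, 0)
        # A value lower than before means the counter was reset
        # (for example, the pod was recreated), so count from zero.
        delta = count - before if count >= before else count
        if delta > 0:
            increases[pod] = delta
    return increases
```

Any pod that appears in the result restarted at least once between the two snapshots, which is the condition an alert would trigger on.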

Or

You can also use BotKube: https://www.botkube.io/installation/

You can configure it to send a notification (to Slack, for example) when a readiness or liveness probe fails.

Or

You can write your own script and run it on Kubernetes to monitor the cluster and notify you when any pod restarts.

Example on GitHub: https://github.com/harsh4870/Slack-Post-On-POD-Ready-State

This script posts to Slack when a pod becomes ready after a deployment; you can adapt it to monitor the restart count.
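A custom check along these lines could start from the pod list that the Kubernetes API returns. The following is a minimal sketch, not the linked repository's code: it assumes the input is the parsed JSON from `kubectl get pods -o json`, and the function name `pods_with_restarts` and the `min_restarts` parameter are hypothetical. Sending the notification is left out.

```python
def pods_with_restarts(pod_list, min_restarts=1):
    """Return (pod name, total restarts) for pods at or above `min_restarts`.

    `pod_list` is assumed to be a parsed Kubernetes PodList object.
    """
    results = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        # containerStatuses may be absent while a pod is still being scheduled.
        statuses = pod.get("status", {}).get("containerStatuses", [])
        total = sum(s.get("restartCount", 0) for s in statuses)
        if total >= min_restarts:
            results.append((name, total))
    return results
```

The input can be produced by passing the output of `kubectl get pods -o json` through `json.loads`; each returned pair is a candidate for a notification.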

I would recommend the Prometheus and Grafana option, although Stackdriver is good too (I am not a Google employee).

CodePudding user response:

Why do you want to monitor when a pod is down? Kubernetes will immediately try to start it again on the same node, or on a different one if that node is down for whatever reason.

Instead, there are other metrics you should monitor, such as restart_count, which could indicate that pods are not coming back online. Other useful metrics include:

  • kube_pod_container_status_restarts_total
  • kube_pod_status_phase
  • kube_pod_container_status_running
  • kube_pod_status_phase vs kube_node_status_capacity_pods
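To make the pod phase and running status checks concrete, here is a small sketch that counts pods that are both in the Running phase and marked Ready, and compares that number with the expected replica count. It assumes the input is a parsed PodList (for example from `kubectl get pods -o json`); the function names and the default of 10 expected pods, taken from the question, are assumptions for illustration.

```python
def count_ready_pods(pod_list):
    """Count pods whose phase is Running and whose Ready condition is True."""
    ready = 0
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            continue
        conditions = status.get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions):
            ready += 1
    return ready


def should_alert(pod_list, expected=10):
    """Return True when fewer pods than expected are ready to serve."""
    return count_ready_pods(pod_list) < expected
```

Alerting on the number of ready pods falling below the expected count, rather than on a single pod going down, matches the point that Kubernetes restarts individual pods on its own.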

This article covers many useful metrics to monitor: https://medium.com/google-cloud/gke-monitoring-84170ea44833
