Kubernetes kubelet monitoring
The Kubernetes kubelet is the primary node agent responsible for ensuring that the containers on each node are running and healthy. Monitoring the kubelet helps you troubleshoot pod startup failures, track container runtime errors, detect PLEG staleness, and ensure consistent pod life cycle management across your cluster. Site24x7 automatically discovers and monitors the kubelet running on each Kubernetes node.
How it works
The Site24x7 agent collects the following categories of kubelet metrics from each node:
- Runtime and process-level metrics for the kubelet process.
- Pod life cycle and container state metrics.
- Container runtime operation counts, errors, and durations.
- Pod startup and worker operation timing.
- Storage operation success and failure tracking.
- Cgroup manager operation durations.
- PLEG (Pod Lifecycle Event Generator) relist intervals and durations.
These metrics help you detect scheduling delays, startup failures, container runtime errors, and degradation in pod life cycle events.
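
As an illustration of what such node-level data can look like, the kubelet exposes its metrics in the Prometheus text format. The sketch below parses a small hand-written sample of that format into a dictionary. It is not the Site24x7 agent's implementation: `parse_metrics` and the `SAMPLE` text are hypothetical, and the metric names in the sample are assumptions about what a kubelet typically exposes, not a statement of what the agent reads.

```python
import re

# Matches a sample line such as: name{label="v",other="w"} 12.5
_SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair inside the braces: key="value"
_LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples.

    Returns a dict keyed by (metric_name, sorted_label_tuple) with float values.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything that does not look like a sample
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL_PAIR.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples


# Hypothetical sample text, written by hand for this example.
SAMPLE = """\
# TYPE kubelet_running_pods gauge
kubelet_running_pods 12
kubelet_running_containers{container_state="running"} 30
kubelet_running_containers{container_state="exited"} 4
"""

metrics = parse_metrics(SAMPLE)
running = metrics[("kubelet_running_containers", (("container_state", "running"),))]
print(metrics[("kubelet_running_pods", ())], running)
```

Keying each sample by name plus labels keeps per-state or per-operation series (for example, containers in the running and exited states) separate, which mirrors how the tables later in this document break metrics down.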
Prerequisites
- Install the Site24x7 Kubernetes agent, version 22.1.00 or above.
- If you are an existing customer, upgrade your Kubernetes agent to version 22.1.00 or above.
The Site24x7 Kubernetes agent must be installed and running on the cluster before enabling kubelet monitoring.
View your Kubernetes kubelet monitor
Once the Site24x7 Kubernetes agent is installed or upgraded to a supported version, it starts collecting all the kubelet metrics automatically.
To view your Kubernetes kubelet monitor:
- Log in to your Site24x7 account.
- Navigate to K8s. Then, select the cluster name.
- Click Kubelet.
This opens the list of kubelet monitors in that cluster. Select a monitor to view its detailed metrics.

Supported metrics
The following tables list all kubelet metrics collected at every poll interval.
Utilization
| Metric name | Description | Units |
|---|---|---|
| Running Pods | The total number of pods that had a running pod sandbox at the last poll. | Count |
| Pod Start Count | The number of pods that went from first being seen by the kubelet to running during the last poll period. | Count |
| Total Rest Client Requests | The total number of HTTP requests made during the last poll period. | Count |
| Container Log File System Usage | The total bytes used by container logs on the filesystem at the last poll. | Bytes |
| Process CPU Time | The CPU time consumed by the kubelet process during the last poll period. | Seconds |
| Process Open File Descriptors | The number of file descriptors held open by the kubelet process at the last poll. | Count |
| Container States | ||
| Containers in Created State | The number of containers in the created state at the last poll. | Count |
| Containers in Exited State | The number of containers in the exited state at the last poll. | Count |
| Containers in Running State | The number of containers in the running state at the last poll. | Count |
| Pod Starts Duration | ||
| Average Pod Start Duration | The average time per pod, from the kubelet first seeing the pod to the pod starting to run, during the last poll period. | Seconds |
| Total Pod Starts Duration | The total time across all pods, from the kubelet first seeing a pod to the pod starting to run, during the last poll period. | Seconds |
| Runtime Operations | ||
| Kubelet Runtime Operations | The total number of runtime operations during the last poll period. | Count |
| Kubelet Runtime Operations Errors | The total number of runtime operation errors during the last poll period. | Count |
| Runtime Operations Duration | ||
| Average Runtime Operation Duration | The average time taken per runtime operation during the last poll period. | Seconds |
| Total Runtime Operations Duration | The total time taken for the runtime operations during the last poll period. | Seconds |
| Process Memory Usage | ||
| Process Resident Memory | The resident memory used by the kubelet process, as of the last poll. | Bytes |
| Process Virtual Memory | The virtual memory used by the kubelet process, as of the last poll. | Bytes |
| Go Usage | ||
| Go Threads | The number of OS threads created by the Go runtime in the kubelet process, as of the last poll. | Count |
| Goroutines | The number of goroutines that exist in the kubelet process, as of the last poll. | Count |
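
The table lists both a total and an average for pod start and runtime operation durations. A minimal sketch of how such a pair can relate, assuming the average is simply the period's total duration divided by the period's operation count (the source does not state the exact formula, and `average_duration` is a hypothetical helper):

```python
def average_duration(total_seconds, count):
    """Average seconds per operation for one poll period.

    Assumption: average = total duration / number of operations.
    Returns None when nothing happened in the period, so that
    "no data" is not confused with a genuine average of zero.
    """
    if count <= 0:
        return None
    return total_seconds / count


# Example: 4 pods started, taking 12 seconds in total.
print(average_duration(12.0, 4))
```

Reading the two values together is useful: a high total with a low average suggests many fast operations, while a high average points to individual operations that are slow.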
Operations
| Metric name | Description | Units |
|---|---|---|
| Pod Worker Starts | The total count of pods that started a worker during the last poll period. | Count |
| Cgroup Manager Operations | The total count of cgroup manager operations during the last poll period. | Count |
| Pod Worker Operations | ||
| Pod Worker Operations - Create | The total number of pod sync operations of type Create performed by the kubelet during the last poll period. | Count |
| Pod Worker Operations - Sync | The total number of pod sync operations of type Sync performed by the kubelet during the last poll period. | Count |
| Pod Worker Operations - Update | The total number of pod sync operations of type Update performed by the kubelet during the last poll period. | Count |
| Pod Worker Operations Duration | ||
| Average Pod Worker Operation Duration - Create | The average time the kubelet took per pod to complete a Create sync operation during the last poll period. | Seconds |
| Average Pod Worker Operation Duration - Sync | The average time the kubelet took per pod to complete a Sync operation during the last poll period. | Seconds |
| Average Pod Worker Operation Duration - Update | The average time the kubelet took per pod to complete an Update sync operation during the last poll period. | Seconds |
| Pod Worker Start Duration | ||
| Average Pod Worker Start Duration | The average time per pod, from the kubelet seeing the pod to starting a worker for it, during the last poll period. | Seconds |
| Total Pod Worker Starts Duration | The total time across all pods, from the kubelet seeing a pod to starting a worker for it, during the last poll period. | Seconds |
| Storage Operations | ||
| Storage Operations - Success | The total count of storage operations with status success during the last poll period. | Count |
| Storage Operations - Failure | The total count of storage operations with status fail-unknown during the last poll period. | Count |
| Storage Operations Duration | ||
| Average Storage Operation Duration - Success | The average time taken per storage operation with status success during the last poll period. | Seconds |
| Average Storage Operation Duration - Failure | The average time taken per storage operation with status fail-unknown during the last poll period. | Seconds |
| Cgroup Manager Operation Duration | ||
| Average Cgroup Manager Operation Duration | The average time taken per cgroup manager operation during the last poll period. | Seconds |
| Total Cgroup Manager Operations Duration | The total time taken for cgroup manager operations during the last poll period. | Seconds |
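
Most counts in these tables are reported per poll period. If the underlying values are cumulative counters, as is typical for operation counts, a per-period figure can be derived by subtracting the previous reading from the current one. The sketch below shows that idea, including the case where the counter restarts from zero (for example, after a kubelet restart). This is an assumption about how per-period values could be computed, not a description of the agent's internals, and `period_delta` is a hypothetical helper.

```python
def period_delta(previous, current):
    """Change in a cumulative counter between two polls.

    previous: reading from the earlier poll, or None on the first poll.
    current:  reading from the latest poll.
    """
    if previous is None:
        return 0.0  # first poll: no baseline to compare against
    if current < previous:
        # The counter went backwards, so it was reset; everything
        # counted since the reset belongs to this period.
        return float(current)
    return float(current - previous)


# Example: 120 storage operations at the previous poll, 135 now.
print(period_delta(120, 135))
```

Without the reset check, a restart would produce a large negative value for the period instead of a small positive one.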
Pod Lifecycle Event Generator (PLEG)
| Metric name | Description | Units |
|---|---|---|
| PLEG Relists | The total count of relisting in PLEG during the last poll period. | Count |
| PLEG Last Seen | The timestamp in seconds when PLEG was last seen active. | Timestamp |
| Time Since PLEG Last Seen | The elapsed time since PLEG was last seen active. A large or steadily growing value indicates potential PLEG staleness. | Seconds |
| PLEG Discard Events | The number of discard events in PLEG during the last poll period. | Count |
| PLEG Relist Interval | ||
| Average PLEG Relist Interval | The average interval between PLEG relists during the last poll period. | Seconds |
| Total PLEG Relists Interval | The sum of the intervals between PLEG relists during the last poll period. | Seconds |
| PLEG Relist Duration | ||
| Average PLEG Relist Duration | The average time taken for relisting pods in PLEG during the last poll period. | Seconds |
| Total PLEG Relists Duration | The total time taken for relisting pods in PLEG during the last poll period. | Seconds |
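
The relationship between PLEG Last Seen and Time Since PLEG Last Seen can be sketched as a subtraction from the current time, followed by a threshold check. The helper names are hypothetical, the timestamp is assumed to be in Unix epoch seconds, and the 180-second default is an assumption based on the three-minute relist threshold the kubelet itself uses to report PLEG as unhealthy; choose a threshold that suits your own alerting.

```python
import time

# Assumed default: the kubelet reports PLEG as unhealthy after 3 minutes.
PLEG_STALE_THRESHOLD_SECONDS = 180.0


def time_since_pleg_last_seen(last_seen_ts, now=None):
    """Seconds elapsed since the PLEG Last Seen timestamp (epoch seconds)."""
    if now is None:
        now = time.time()
    # Clamp at zero in case of small clock differences.
    return max(0.0, now - last_seen_ts)


def is_pleg_stale(last_seen_ts, now=None,
                  threshold=PLEG_STALE_THRESHOLD_SECONDS):
    """True when PLEG has not been seen active within the threshold."""
    return time_since_pleg_last_seen(last_seen_ts, now) > threshold


# Example with fixed values: last seen 200 seconds before "now".
print(is_pleg_stale(last_seen_ts=1_000.0, now=1_200.0))
```

A stale PLEG usually goes together with long relist durations, so it is worth checking the Average PLEG Relist Duration for the same period.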
