Alerting for your Kubernetes monitor
Effective Kubernetes monitoring relies on proactively identifying and addressing potential issues before they impact application performance and availability. Site24x7 provides robust Kubernetes monitoring capabilities, and configuring thresholds is a key part of using them effectively.
Thresholds allow you to define acceptable performance limits for various Kubernetes resources and receive alerts when these limits are breached, enabling you to take corrective action promptly.
Why are thresholds important in Kubernetes?
Using thresholds for Kubernetes monitoring offers several benefits:
- Early detection of performance bottlenecks: Identify resource constraints (CPU, memory, disk I/O) before they impact application performance.
- Proactive alerting for potential issues: Receive notifications about potential problems before they escalate into major incidents.
- Reduced downtime and improved application availability: Resolve issues quickly and minimize downtime by receiving alerts as soon as a threshold is breached.
- Improved resource management: Optimize resource allocation based on performance data and threshold violations.
Threshold configuration
Site24x7 allows you to configure thresholds for a wide range of Kubernetes metrics for every supported resource type. You can create a new threshold profile or associate an existing one with each Kubernetes resource you monitor.
Follow the steps below to create a threshold profile:
- Log in to Site24x7.
- Navigate to K8s > select the cluster > go to the Kubernetes resource monitor you'd like to associate with a threshold profile.
- Hover over the hamburger icon next to the display name and click Edit.
- Under Configuration Profiles > Threshold and Availability, click the plus icon (+) to create a new profile, or click the pencil icon to edit an existing profile.
- Click Save.
Configuring thresholds for specific Kubernetes resources
Site24x7 allows you to configure thresholds at different levels of your Kubernetes environment.
The following sample configurations use pods, which are a key component. Critical pod metrics include CPU Usage, Memory Usage, Restart Count, and Readiness Probe Failures.
Sample configuration 1:
To get alerted when a pod's restart count crosses an acceptable limit, configure a Restart Count threshold as described below. You can set thresholds on resource utilization metrics in the same way, so that you are also alerted when utilization breaches its defined limit.
- Navigate to the pod monitor and click the hamburger icon > Edit.
- Under Configuration Profiles > Threshold and Availability, click the plus icon (+) to create a new profile, or click the pencil icon to edit an existing profile.
- In the Threshold Profile pop-up window, under Threshold Configuration, select Restart Counts from the Set Threshold Values drop-down menu.
- Select the following specifications for this threshold configuration:
  - Metric: Restart Count
  - Threshold Type: Static Threshold
  - Severity Level: Critical
  - Threshold Count: 5
  - Condition: Above
- Set the condition, threshold count, and poll value, and then click Save.
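The following minimal Python sketch shows how a static threshold with a condition and a poll value might be evaluated. It makes two assumptions. First, "Above" means strictly greater than the threshold. Second, the poll value is the number of consecutive polls that must breach before an alert fires. The `StaticThreshold` class is hypothetical and is not part of any Site24x7 API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StaticThreshold:
    """Hypothetical model of a single static threshold rule (not a Site24x7 API)."""
    metric: str
    severity: str        # "Down", "Critical", or "Trouble"
    condition: str       # "Above" or "Below"
    value: float
    poll_count: int = 1  # assumed: consecutive polls that must breach

    def breached(self, sample: float) -> bool:
        """Check one polled value against the condition (assumed strict comparison)."""
        if self.condition == "Above":
            return sample > self.value
        if self.condition == "Below":
            return sample < self.value
        raise ValueError(f"unsupported condition: {self.condition}")

    def evaluate(self, samples: Iterable[float]) -> bool:
        """Return True once `poll_count` consecutive samples breach the threshold."""
        streak = 0
        for sample in samples:
            # Extend the streak on a breach, reset it on a healthy poll.
            streak = streak + 1 if self.breached(sample) else 0
            if streak >= self.poll_count:
                return True
        return False


# Restart Count above 5, assuming a poll value of 2.
restart_rule = StaticThreshold("Restart Count", "Critical", "Above", 5, poll_count=2)
print(restart_rule.evaluate([3, 6, 4, 7]))  # breaches are not consecutive
print(restart_rule.evaluate([3, 6, 7]))     # two consecutive polls above 5
```

Requiring consecutive breaches filters out one-off spikes. The trade-off is that each additional required poll delays the alert by one polling interval.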
Sample configuration 2:
Similarly, you can get alerted when a pod's CPU usage exceeds 90%. Open the Edit Kubernetes monitor page using the steps above, and select the following specifications for this threshold configuration:
- Metric: Pod CPU Usage
- Threshold Type: Static Threshold
- Severity Level: Critical
- Threshold Value: 90%
- Condition: Above
With this configuration, you will be alerted when the pod's CPU usage exceeds 90%.
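The percentage in this rule depends on what the usage is measured against. The sketch below assumes CPU usage is expressed as a percentage of the pod's CPU limit. It parses Kubernetes CPU quantities such as `250m` (millicores) or the nanocore values with an `n` suffix returned by the Kubernetes Metrics API. The helper names are hypothetical, and Site24x7 may compute Pod CPU Usage on a different basis.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "250m", "1.5", or "123456789n" to cores."""
    suffixes = {"n": 1e-9, "u": 1e-6, "m": 1e-3}  # nano-, micro-, millicores
    if quantity and quantity[-1] in suffixes:
        return float(quantity[:-1]) * suffixes[quantity[-1]]
    return float(quantity)  # plain number of cores


def cpu_percent(usage: str, limit: str) -> float:
    """Usage as a percentage of the pod's CPU limit (assumed basis for the metric)."""
    return parse_cpu(usage) / parse_cpu(limit) * 100


def exceeds(percent: float, threshold: float = 90.0) -> bool:
    """Condition "Above": strictly greater than the threshold value."""
    return percent > threshold


pct = cpu_percent("950000000n", "1")  # 0.95 cores used against a 1-core limit
print(f"{pct:.1f}% ->", "alert" if exceeds(pct) else "ok")
```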
You can also set thresholds for the same metric at different levels of your Kubernetes environment. For example, you can track CPU usage at the pod, deployment, namespace, node, or cluster level for holistic monitoring.
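The following sketch illustrates multi-level tracking in a simplified way. It sums per-pod CPU usage (in millicores) by deployment, namespace, node, and cluster, and then checks the node level against a threshold. The pod records and the 1000m node threshold are made-up sample data. A pod without a Deployment owner, such as a StatefulSet pod, is skipped at the deployment level.

```python
from collections import defaultdict

# Hypothetical pod samples: CPU usage in millicores plus the objects each pod belongs to.
pods = [
    {"pod": "web-1", "deployment": "web", "namespace": "shop", "node": "node-a", "cpu_m": 450},
    {"pod": "web-2", "deployment": "web", "namespace": "shop", "node": "node-b", "cpu_m": 500},
    {"pod": "db-0", "deployment": None, "namespace": "data", "node": "node-a", "cpu_m": 700},
]


def rollup(samples: list, level: str) -> dict:
    """Sum pod CPU usage per object at `level`; "cluster" sums every pod."""
    totals = defaultdict(int)
    for s in samples:
        key = "cluster" if level == "cluster" else s[level]
        if key is not None:  # e.g. pods not owned by a Deployment
            totals[key] += s["cpu_m"]
    return dict(totals)


for level in ("deployment", "namespace", "node", "cluster"):
    print(level, rollup(pods, level))

node_limit_m = 1000  # hypothetical node-level threshold in millicores
breaching = [node for node, m in rollup(pods, "node").items() if m > node_limit_m]
print(breaching)  # ['node-a']
```

No single pod here exceeds 1000m, yet node-a does once its pods are combined. Tracking several levels catches this kind of breach.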
Similarly, you can create alerts for the threshold attributes of every supported component.
Setting severity levels
When configuring a threshold, you must assign a severity level (Down, Critical, or Trouble) to the alert that is triggered when the threshold is breached. Choose the severity level based on the impact of the issue:
- Down: Use for issues that directly impact application availability or performance and require immediate attention.
- Critical: Use for potential problems that should be investigated but do not currently impact application availability.
- Trouble: Use for minor issues or deviations from normal behavior that may require monitoring but do not warrant immediate action.
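Several thresholds on one monitor may breach at the same time. One way to reason about the resulting status is to take the most severe level. The sketch below encodes that ordering in Python. Both the ordering (Down > Critical > Trouble > Up) and the rule that the most severe breach wins are assumptions for illustration, not documented Site24x7 behavior.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    """A larger value means a more severe status (assumed ordering)."""
    UP = 0
    TROUBLE = 1
    CRITICAL = 2
    DOWN = 3


def overall_status(breached: Iterable[Severity]) -> Severity:
    """Return the most severe breached level, or UP when nothing is breached."""
    return max(breached, default=Severity.UP)


print(overall_status([Severity.TROUBLE, Severity.CRITICAL]).name)  # CRITICAL
print(overall_status([]).name)  # UP
```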
Alerting mechanisms
Site24x7 offers various alerting mechanisms, including email, SMS, webhooks, and voice calls, to notify you when thresholds are breached.
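Webhooks let your own tooling react to a breach. The following is a minimal, hypothetical receiver built on Python's standard `http.server` module that accepts POSTed JSON and logs it. It makes no assumptions about field names, because the payload depends on how the webhook is configured in Site24x7. The port and the function names are placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(body: bytes) -> dict:
    """Decode a JSON webhook body; field names depend on your webhook configuration."""
    payload = json.loads(body.decode("utf-8") or "{}")  # empty body -> empty object
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            self.send_response(400)
            self.end_headers()
            return
        print("alert received:", alert)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (blocks the calling thread)."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Call `serve()` to start listening, then point the webhook URL at that host and port.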
Configuring thresholds is a critical aspect of effective Kubernetes monitoring with Site24x7. By setting appropriate thresholds, you can proactively identify and address potential issues before they impact application performance and availability.