{"id":96985,"date":"2024-07-31T16:13:24","date_gmt":"2024-07-31T10:43:24","guid":{"rendered":"https:\/\/www.whizlabs.com\/blog\/?p=96985"},"modified":"2024-07-31T16:13:24","modified_gmt":"2024-07-31T10:43:24","slug":"monitoring-kubernetes-cluster-performance","status":"publish","type":"post","link":"https:\/\/www.whizlabs.com\/blog\/monitoring-kubernetes-cluster-performance\/","title":{"rendered":"Monitoring Kubernetes Cluster Performance &#8211; Metrics and Best Practices"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Kubernetes has become an essential DevOps tool for running and managing containerized workloads. It automates various container-related tasks including deploying, managing, and configuring containerized applications. It is widely used in large-scale enterprise environments to run and manage microservices.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Effective monitoring of your Kubernetes cluster is key to ensuring the optimal performance of your services. It gives you visibility into your cluster\u2019s health. It helps pick out underlying cluster issues by tracking uptime, cluster utilization metrics (including disk, memory, and CPU utilization), and metrics of cluster components such as APIs, pods, and containers. 
Alerts help operation teams troubleshoot critical issues including unavailable nodes, pod crashes, control plane component failures, high resource utilization, and misconfigurations, to mention a few.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To become proficient in these aspects, consider the <\/span><a href=\"https:\/\/www.whizlabs.com\/certified-kubernetes-administrator\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Certified Kubernetes Administrator (CKA)<\/span><\/a><span style=\"font-weight: 400;\"> certification course for a comprehensive guide on mastering Kubernetes.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Key_Kubernetes_Metrics_to_Measure\"><\/span><span style=\"font-weight: 400;\">Key Kubernetes Metrics to Measure<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Monitoring a Kubernetes cluster entails getting insights into salient cluster components including containers, pods, deployments, nodes, services, and replica sets. 
Let\u2019s delve into these resource metrics in more detail.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Cluster_Node_Metrics\"><\/span><span style=\"font-weight: 400;\">Cluster &amp; Node Metrics<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">A <\/span><a href=\"https:\/\/www.whizlabs.com\/blog\/kubernetes-cluster\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Kubernetes cluster <\/span><\/a><span style=\"font-weight: 400;\">typically comprises two main types of nodes:<\/span><\/p>\n<p><b>Master nodes: <\/b><span style=\"font-weight: 400;\">The Kubernetes master node (also called a control plane node) hosts the control plane, which manages the cluster, including the worker nodes and pods. It schedules pods onto worker nodes, manages cluster resources, and ensures the smooth running of the cluster.<\/span><\/p>\n<p><strong>Worker nodes<\/strong><b>: <\/b><span style=\"font-weight: 400;\">Worker nodes host the pods that run containerized workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Cluster monitoring is an umbrella term for monitoring the health of the entire Kubernetes cluster: the master node and the worker nodes. In doing so, you monitor the health, performance, and resource usage of various cluster components including the control plane and worker nodes. It helps you derive valuable insights from the metrics of control plane components such as the API server, scheduler, controller manager, and etcd. Additionally, you will get information about the health status of persistent volumes, track storage I\/O operations, and monitor ingress\/egress traffic.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Cluster monitoring also offers visibility into containers, pods, nodes, and workload resources such as DaemonSets, ReplicaSets, and ReplicationControllers. For example, you can view the number of nodes, containers, and pods. 
You can track applications running inside pods and view pod and container metrics such as memory and CPU usage.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">At the node level, you gather insights about individual nodes, including their health and performance, the number of containers and pods running per node, storage metrics, network traffic I\/O, and much more.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Container_Metrics\"><\/span><span style=\"font-weight: 400;\">Container Metrics<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Containers have become the go-to way of shipping and deploying applications to improve app portability and performance. Tracking container metrics is therefore essential to ensuring the health and performance of your applications.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Pay particular attention to memory and CPU usage per container, container state (Running, Waiting, Terminated), container restarts, and container logs for debugging in the event of errors. These are just a few metrics that will enable you to address issues proactively and avoid service outages.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Application_Metrics\"><\/span><span style=\"font-weight: 400;\">Application Metrics<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Containers and nodes might be running as expected, but just as important is the health status of the applications running in the pods. You need to ensure applications run optimally and use just the right amount of computing resources. 
Limit the amount of computing resources (such as memory and CPU) each application uses to avoid resource depletion on the node, which can impact other applications and services.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Consider monitoring metrics such as request latency and the frequency of app errors to evaluate performance and address resource constraints and potential failures. In addition, depending on the type of application, you could monitor other aspects such as database connections, network traffic I\/O, queue size, and cache hit rate to gain visibility into your applications\u2019 response times.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"API_Server_Metrics\"><\/span><span style=\"font-weight: 400;\">API Server Metrics<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">One of the core components of a Kubernetes cluster is the API server. It exposes the Kubernetes API to users and to other components in the cluster, and it receives and processes API requests. Simply put, the API server allows users and other parts of the Kubernetes cluster to communicate seamlessly.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Because the API server is a central component of the control plane and of the cluster in general, monitoring it proactively is crucial to the cluster\u2019s health. One of the key metrics to monitor is the latency of API requests sent to the cluster. Latency is the time taken to serve an API request. Tracking the latency of API requests tells you how well the API server is responding. Higher latencies point to performance issues within the API server components.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Request rate is another metric to keep an eye on. It indicates the traffic that is handled by the API server. Consider evaluating requests involving various resources including nodes, pods, deployments, namespaces, and other API services. 
These include HTTP requests using verbs such as POST and GET.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In addition, consider monitoring the logs for API server errors. Responses with 5xx status codes may indicate unavailable services, internal server errors, or bad gateways. Error logs provide a convenient way to pick up issues related to the API server or the control plane as a whole.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Ingress_Metrics\"><\/span><span style=\"font-weight: 400;\">Ingress Metrics<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Monitoring Ingress metrics helps ensure effective communication between Kubernetes services and external endpoints. It involves monitoring Ingress controller traffic metrics to track traffic statistics and the health of workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you are running the NGINX Ingress controller, it&#8217;s good practice to monitor request rates, response times, error rates, and the controller&#8217;s resource utilization. High request latency and unusual traffic spikes might point to insufficient pod replicas or configuration problems.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Network_Metrics\"><\/span><span style=\"font-weight: 400;\">Network Metrics<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Network metrics provide insights into network performance and possible latencies that may impact uptime and availability of services. 
It is therefore a priority to have visibility into network latency, which will give you insights into possible causes and appropriate remediation measures.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In addition, evaluate traffic volume between pods and services and keep an eye on dropped network packets to identify anomalies before they snowball into service disruptions.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Kubernetes_Monitoring_Best_Practices\"><\/span><span style=\"font-weight: 400;\">Kubernetes Monitoring Best Practices<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Effective monitoring of your Kubernetes cluster provides actionable insights and supports robust incident management for prompt resolution of anomalies. Employing best practices in monitoring your infrastructure lets you easily identify resource constraints, application crashes, service failures, and any issues affecting the nodes and workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let&#8217;s go over some Kubernetes monitoring best practices.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Adopt_a_holistic_monitoring_approach\"><\/span><span style=\"font-weight: 400;\">Adopt a holistic monitoring approach<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Having a comprehensive monitoring strategy that collects metrics at both cluster and granular levels (nodes, pods, namespaces, DaemonSets, etc.) is highly recommended. At the cluster level, key metrics from the API server, scheduler, and etcd, together with network I\/O and resource utilization, will give you an overview of your cluster&#8217;s overall health status.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Monitoring nodes, pods, containers, replica sets, daemon sets, and other workloads will help you identify resource spikes and anomalies, thereby enabling faster 
issue resolution.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This holistic approach to monitoring will help reveal system-wide patterns and issues, allowing you to plan ahead and mitigate problems in time.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Implement_proactive_alerting_and_incident_response\"><\/span><span style=\"font-weight: 400;\">Implement proactive alerting and incident response<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Configuring alert thresholds is essential to proactive incident management. Well-chosen thresholds alert you to impending issues before they cause failures and help avert possible service interruption or downtime. For example, you may consider setting an alert threshold at 70% CPU usage if a specific pod regularly crashes at around 80% CPU usage. This will allow you time to take the necessary interventions to avert imminent failure. Escalation policies should be well defined, with alert notifications channeled to the appropriate monitoring teams.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Enable_service_auto-discovery\"><\/span><span style=\"font-weight: 400;\">Enable service auto-discovery<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">To get an accurate picture of all the services in your Kubernetes cluster, consider implementing automatic service discovery to detect and monitor new services. 
This will give you visibility into all the services on your cluster, including newly spawned ones, and help you scale your resources accordingly.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Implement_centralized_monitoring_and_logging\"><\/span><span style=\"font-weight: 400;\">Implement centralized monitoring and logging<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">In addition to collecting and monitoring metrics, centralized logging from diverse sources including nodes, pods, containers, application-level logs, and system-wide Kubernetes logs is recommended. Centralized monitoring and log collection let you easily keep track of all the resources and troubleshoot anomalies from a central point, saving the time and effort that would otherwise be spent analyzing metrics and logs on different servers.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Incorporate_monitoring_with_the_CI_CD_pipeline\"><\/span><span style=\"font-weight: 400;\">Incorporate monitoring into the CI\/CD pipeline<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The goal of CI\/CD is to ensure efficient building and shipping of software. It automates the delivery of code to testing and eventually to production environments. As a result, products are rolled out to the market faster and bug fixes are applied efficiently.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Incorporating monitoring within your CI\/CD pipeline automates the collection and analysis of metrics from your Kubernetes cluster. 
This makes it easier and more efficient to stay informed about the health status of your cluster and mitigate issues before they escalate.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Take_advantage_of_Kubernetes-native_Monitoring_Tools\"><\/span><span style=\"font-weight: 400;\">Take advantage of Kubernetes-native Monitoring Tools<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Kubernetes provides native monitoring tools to monitor various metrics actively. A good example is Kubernetes Dashboard, a web-based UI for monitoring Kubernetes clusters. With Kubernetes Dashboard, you can monitor a myriad of metrics including system resource utilization and Kubernetes components such as pods, deployments, ReplicaSets, DaemonSets, Ingress, ConfigMaps, and much more. Administrators can easily get an overview of the cluster, including running applications, and troubleshoot errors. Other popular Kubernetes cluster monitoring tools include Grafana and Prometheus. These are discussed in more detail in the next section.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Tools_for_Monitoring_Kubernetes_Cluster_Performance\"><\/span><span style=\"font-weight: 400;\">Tools for Monitoring Kubernetes Cluster Performance<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Here are a few monitoring solutions you should consider to stay on top of your monitoring game. 
Most of these are open-source tools that provide excellent visibility into your Kubernetes infrastructure.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Kubernetes_Dashboard\"><\/span><span style=\"font-weight: 400;\">Kubernetes Dashboard<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/kubernetes.io\/docs\/tasks\/access-application-cluster\/web-ui-dashboard\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">Kubernetes Dashboard<\/span><\/a><span style=\"font-weight: 400;\"> is an open-source native tool for monitoring Kubernetes clusters. It gives you a bird\u2019s-eye view of your cluster\u2019s health and resource usage. You can view and manage various cluster components including your Deployments, DaemonSets, StatefulSets, ReplicaSets, and much more.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In addition, the UI lets you explore namespaces, examine persistent volumes and config maps, and monitor running services.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A built-in log viewer lets you inspect logs from containers running in pods and provides insights in case of anomalies or underlying issues.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Prometheus\"><\/span><span style=\"font-weight: 400;\">Prometheus<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Written in Go, Prometheus is a popular monitoring and alerting tool for collecting and storing time-series data. It collects metrics by scraping HTTP endpoints exposed by its targets. 
It leverages service discovery or static configuration to detect and monitor targets.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Prometheus_comprises_3_components\"><\/span><span style=\"font-weight: 400;\">Prometheus comprises three main components:<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><b>Prometheus server:<\/b><span style=\"font-weight: 400;\"> The Prometheus server scrapes time-series metrics from exporters and instrumented applications and stores them in a time-series database.<\/span><\/p>\n<p><b>Exporters: <\/b><span style=\"font-weight: 400;\">Exporters are programs that fetch and parse metrics from various systems and expose them over HTTP for the Prometheus server to scrape, process, and store.<\/span><\/p>\n<p><b>Alertmanager: <\/b><span style=\"font-weight: 400;\">Alertmanager receives the alerts fired by alerting rules on the Prometheus server, then groups them and routes notifications to operations teams.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Prometheus is well suited to monitoring Kubernetes clusters because it can gather application metrics separately from node-level statistics. It offers a wide array of client libraries for gathering service metrics, and node exporters for retrieving resource usage stats including memory and CPU utilization, bandwidth metrics, disk space usage, and much more.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Prometheus is often used alongside Grafana for data visualization on elegant dashboards.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Grafana\"><\/span><span style=\"font-weight: 400;\">Grafana<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/grafana.com\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">Grafana<\/span><\/a><span style=\"font-weight: 400;\"> is an open-source data analytics platform that offers immersive and interactive dashboards for data visualization. 
It queries data from multiple data sources and displays it on customizable charts. Grafana supports a wide range of data sources, including Prometheus, which stores time-series data. With Prometheus as a data source, data is fetched and displayed on charts for visualization and easy analysis.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The Grafana-Prometheus pairing is a strong combination for getting a complete view of the Kubernetes cluster, from the cluster level down to application monitoring. Metrics are displayed on color-coded dashboards for faster incident identification and management. Charts are displayed side by side showing various metrics including cluster resource utilization, network I\/O, filesystem usage, individual pod and container statistics, and much more.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Elastic_Stack_ELK\"><\/span><span style=\"font-weight: 400;\">Elastic Stack (ELK)<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">If your goal is log aggregation and monitoring, consider <\/span><a href=\"https:\/\/www.elastic.co\/elastic-stack\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">Elastic Stack<\/span><\/a><span style=\"font-weight: 400;\">. Although it is no longer open-source, Elastic Stack is one of the go-to tools for monitoring Kubernetes logs. Elastic Stack is commonly abbreviated as ELK. It is a tech stack comprising three components: Elasticsearch, Logstash, and Kibana.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Elasticsearch is a search and analytics engine that stores and indexes logs. It provides a powerful RESTful HTTP API for performing fast searches in near real time.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Logstash is a data collection engine that ingests data from multiple endpoints and processes it. 
It then sends it to Elasticsearch for indexing and storage.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Kibana is a rich visualization tool that presents data stored in Elasticsearch on charts and dashboards.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Each of these components can be deployed in a Kubernetes environment via Helm charts or as individual pods.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Closing_Thoughts\"><\/span><span style=\"font-weight: 400;\">Closing Thoughts<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">We cannot stress enough how crucial Kubernetes monitoring is in tracking the health of your cluster. It offers insightful metrics that help you optimize performance and resource allocation and navigate the various issues associated with the cluster. In this guide, we have shared useful tips on Kubernetes monitoring best practices and some monitoring tools to help you keep an eye on your cluster performance. If you&#8217;re wondering <\/span><a href=\"https:\/\/www.whizlabs.com\/blog\/kubernetes-certifications-list\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Which Kubernetes Certification is Right for You?<\/span><\/a><span style=\"font-weight: 400;\">, Whizlabs provides a comprehensive overview to help you choose the certification that best fits your career goals.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Kubernetes has become an essential DevOps tool for running and managing containerized workloads. It automates various container-related tasks including deploying, managing, and configuring containerized applications. It is widely used in large-scale enterprise environments to run and manage microservices. Effective monitoring of your Kubernetes cluster performance is key to ensuring the optimal performance of your services. It gives you visibility of your cluster\u2019s health. 
It helps pick out underlying cluster issues by tracking uptime, cluster utilization metrics ( including disk, memory, and CPU utilization ), and metrics of cluster components such as APIs, pods, and containers. Alerts help operation teams troubleshoot [&hellip;]<\/p>\n","protected":false},"author":408,"featured_media":96988,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"default","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[1862,4823],"tags":[5203,5202],"class_list":["post-96985","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops","category-kuberenetes","tag-kubernetes-cluster-performance","tag-kubernetes-metrics"],"uagb_featured_image_src":{"full":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-scaled.webp",2560,1286,false],"thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-150x150.webp",150,150,true],"medium":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-300x151.webp",300,151,true],"medium_large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-768x386.webp",768,386,true],"large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-1024x515.webp",1024,515,true],"1536x1536":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-1536x772.webp",1536,772,true],"2048x2048":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-2048x1029.webp",2048,1029,true],"profile_24":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-scaled.webp",24,12,false],"profile_48":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-scaled.webp",48,24,false],"profile_96":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-scaled.webp",96,48,false],"profile_150":["htt
ps:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-scaled.webp",150,75,false],"profile_300":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-scaled.webp",300,151,false],"tptn_thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-250x250.webp",250,250,true],"web-stories-poster-portrait":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-640x853.webp",640,853,true],"web-stories-publisher-logo":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-96x96.webp",96,96,true],"web-stories-thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2024\/07\/Monitoring-Kubernetes-Cluster-Performance-150x75.webp",150,75,true]},"uagb_author_info":{"display_name":"Anitha Dorairaj","author_link":"https:\/\/www.whizlabs.com\/blog\/author\/anitha-dorairaj\/"},"uagb_comment_info":6,"uagb_excerpt":"Kubernetes has been an essential DevOps tool for running and managing containerized workloads. It automates various container-related tasks including deploying, managing, and configuring containerized applications. It is widely used in largest-scale enterprise environments to run and manage microservices. 
Effective monitoring of your Kubernetes cluster performance is key to ensuring the optimal performance of your services.&hellip;","_links":{"self":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/96985","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/users\/408"}],"replies":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/comments?post=96985"}],"version-history":[{"count":5,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/96985\/revisions"}],"predecessor-version":[{"id":96991,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/96985\/revisions\/96991"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media\/96988"}],"wp:attachment":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media?parent=96985"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/categories?post=96985"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/tags?post=96985"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}