Cluster Monitoring Operator


OpenShift Cluster Monitoring Operator: I'm not telling you!

slide: https://hackmd.io/p/OonUQ9QKQ7-7JPBd1N9tOA?both


We have a collaborative session

Please have a laptop or smartphone ready to join!


Who am I?

  • Jason Li
  • SRE/Backend developer
  • ❤️ Kubernetes, Go, Rust
  • 🐱 lover
  • Constantly going from getting started to giving up

Agenda

  • Background
  • Related Work
  • Method
  • Conclusion

Background

Prometheus Operator, Prometheus, Prometheus Adapter, kube-state-metrics, etc.

In order to manage such diverse components, a centralized management configuration file is required.


  • UI
  • Prometheus
  • Metrics
  • Thanos

UI

  • Grafana

Prometheus

  • Prometheus Operator
  • Prometheus-k8s
    👎 - Prometheus-user-workload
  • Alertmanager

Prometheus Operator

  • Provides Kubernetes-native deployment and management of Prometheus and related monitoring components.

  • Automates the configuration of a Prometheus-based monitoring stack for Kubernetes clusters.

    • Prometheus
    • Alertmanager
    • Related components
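
A minimal sketch of what "Kubernetes-native" management can look like: with the Prometheus Operator, scrape targets are declared as custom resources instead of being edited into a Prometheus configuration file by hand. The ServiceMonitor below is illustrative only. The names example-app and example-ns, the app label, and the port name metrics are assumptions, not taken from the talk.

```yaml
# Hypothetical ServiceMonitor: asks the operator to scrape every Service
# labeled app=example-app on its port named "metrics".
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app        # assumed name
  namespace: example-ns    # assumed namespace
spec:
  selector:
    matchLabels:
      app: example-app     # assumed Service label
  endpoints:
    - port: metrics        # assumed port name on the Service
      interval: 30s        # scrape every 30 seconds
```

The operator watches resources like this one and regenerates the Prometheus scrape configuration from them.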

Prometheus Operator (cont’d)


Metrics

  • node-exporter
  • kube-state-metrics
  • openshift-state-metrics

👎 prometheus-adapter
👎 Telemeter Client
👎 configuration sharing


node-exporter

  • Exporter for hardware and OS metrics exposed by *NIX kernels.
  • Its scrape output includes a wide variety of system metrics, all prefixed with node_.

# HELP node_network_transmit_queue_length transmit_queue_length value of /sys/class/net/<iface>.
# TYPE node_network_transmit_queue_length gauge
node_network_transmit_queue_length{device="br0"} 1000
node_network_transmit_queue_length{device="eth0"} 1000
node_network_transmit_queue_length{device="lo"} 1000
node_network_transmit_queue_length{device="ovs-system"} 1000
node_network_transmit_queue_length{device="tun0"} 1000
node_network_transmit_queue_length{device="veth24377b8e"} 0
node_network_transmit_queue_length{device="veth58bd788d"} 0
...


kube-state-metrics

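  • Focused on the health of the objects inside Kubernetes, such as deployments, nodes and pods, rather than on the Kubernetes components themselves.
  • Exposes raw data unmodified from the Kubernetes API.
  • Designed to be consumed either by Prometheus or by a scraper compatible with a Prometheus client endpoint.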

openshift-state-metrics

  • Expands upon kube-state-metrics by adding metrics for OpenShift-specific resources.
  • Exposes cluster-level metrics for those resources.

openshift-state-metrics (cont’d)

  • BuildConfig Metrics
  • Build Metrics
  • DeploymentConfig Metrics
  • ClusterResourceQuota Metrics
  • Route Metrics
  • Group Metrics

ref: https://github.com/openshift/openshift-state-metrics


Thanos

  • Thanos
  • Thanos Querier
  • Thanos Ruler

Method

Component                  Key
Prometheus Operator        prometheusOperator
Prometheus                 prometheusK8s
Alertmanager               alertmanagerMain
kube-state-metrics         kubeStateMetrics
openshift-state-metrics    openshiftStateMetrics
Grafana                    grafana
Telemeter Client           telemeterClient
Prometheus Adapter         k8sPrometheusAdapter
Thanos Querier             thanosQuerier

Method (cont’d)

  • Only Prometheus and Alertmanager have extensive configuration options.
  • Other components usually provide only the nodeSelector field.
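
As a hedged sketch of how the keys fit together, the fragment below places two of the component keys into a single ConfigMap. It assumes the ConfigMap is named cluster-monitoring-config in the openshift-monitoring namespace, as in the OpenShift documentation, and that prometheusK8s accepts a retention field. Field availability varies by OpenShift version, and the foo: bar label is a placeholder.

```yaml
# Sketch only: ConfigMap name and namespace are assumed from the OpenShift docs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h        # assumed field: how long Prometheus keeps data
      nodeSelector:
        foo: bar            # placeholder node label
    kubeStateMetrics:
      nodeSelector:         # most other components expose only this field
        foo: bar
```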

Method (cont’d)

Move components to specific nodes

data:
  config.yaml: |
    # Schedule each component only on nodes labeled foo=bar
    prometheusOperator:
      nodeSelector:
        foo: bar
    prometheusK8s:
      nodeSelector:
        foo: bar

Persistent volume claim

data:
  config.yaml: |
    prometheusK8s:
      # PVC template used for the Prometheus data volume
      volumeClaimTemplate:
        metadata:
          name: localpvc
        spec:
          storageClassName: local-storage   # must match an existing StorageClass
          resources:
            requests:
              storage: 40Gi
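
The claim template above refers to a storage class named local-storage, which has to exist before the claim can bind. A minimal sketch of such a class for statically provisioned local volumes follows. It assumes the class is not already created by another tool such as a local storage operator, and that matching PersistentVolumes are provisioned separately.

```yaml
# Sketch only: a StorageClass for pre-created local PersistentVolumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                      # matches storageClassName in the claim template
provisioner: kubernetes.io/no-provisioner  # no dynamic provisioning; PVs are created by hand
volumeBindingMode: WaitForFirstConsumer    # delay binding until a pod is scheduled
```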

Custom Alertmanager configuration

  • At this stage, the cluster monitoring configuration file does not provide Alertmanager settings.
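
For context, a custom Alertmanager configuration is a separate YAML document in Alertmanager's own format. The sketch below shows a minimal one. The receiver name and webhook URL are hypothetical, and where the file is stored is an assumption to verify for your version: OpenShift documentation describes keeping it in the alertmanager-main Secret in openshift-monitoring rather than in config.yaml.

```yaml
# Sketch only: minimal Alertmanager configuration with one catch-all route.
global:
  resolve_timeout: 5m
route:
  receiver: default          # hypothetical receiver name
  group_by: ['alertname']    # batch alerts that share the same name
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
  - name: default
    webhook_configs:
      - url: http://example.invalid/alert-hook   # placeholder endpoint
```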

Conclusion

💯 💪 🎉

Wrap up

  • A self-updating monitoring stack based on Prometheus and its wider ecosystem
  • Provides monitoring of cluster components
  • Each component is expected to be managed through the configuration file 🎉

Thank you! 🐑


Meng Ze Li
Kubernetes / DevOps / Backend