Developing a Prometheus Exporter for Interface Monitoring

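Black-box monitoring is probably familiar to most readers. We often use blackbox_exporter for it; for black-box monitoring in K8s, see the links at the end of this article.

Since a mature tool already exists, why try to build another one?

Would you believe me if I said it's for learning?

Since the goal is learning, the overall logic doesn't need to be complicated. The exporter mainly needs to do the following:

Add monitoring targets through a configuration file
Expose metrics that Prometheus can scrape
Support tcp and http probes
Support a configurable check frequency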

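Before We Start

Before getting started, here is a brief introduction to Prometheus and Prometheus Exporters.

Prometheus is an open-source monitoring tool hosted by the CNCF and one of the most popular open-source projects of recent years. In cloud-native environments it is commonly used for metrics monitoring.

Prometheus supports 4 metric types:

Counter: a metric that only increases, such as a request count that goes up by 1 for every request.
Gauge: a metric that can go up and down, such as CPU usage.
Histogram: a metric describing the distribution of samples. It counts observations into configurable buckets and exposes the cumulative count for each bucket, along with the total number of observations and their sum.
Summary: similar to a Histogram in that it describes the distribution of samples, but instead of buckets it exposes quantiles (for example 0.5, 0.9, 0.99) computed on the client side, along with the total number of observations and their sum.

In practice these types are often combined to get a better view of a system's state and performance.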
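To make the Histogram description a bit more concrete, here is a minimal sketch of cumulative bucket counting that uses only the standard library. It does not use client_golang: the `histogram` type and its `observe` and `cumulative` methods are hypothetical names made up for illustration, and the real library's internals differ.

```go
package main

import (
    "fmt"
    "sort"
)

// histogram is a hypothetical, simplified model of a Prometheus-style histogram.
type histogram struct {
    upperBounds []float64 // sorted bucket upper bounds (the "le" values)
    counts      []uint64  // per-bucket, non-cumulative counts
    sum         float64   // sum of all observed values
    count       uint64    // total number of observations (the +Inf bucket)
}

func newHistogram(bounds []float64) *histogram {
    b := append([]float64(nil), bounds...)
    sort.Float64s(b)
    return &histogram{upperBounds: b, counts: make([]uint64, len(b))}
}

// observe records v in the first bucket whose upper bound is >= v.
// Values above the last bound only show up in the total count.
func (h *histogram) observe(v float64) {
    i := sort.SearchFloat64s(h.upperBounds, v)
    if i < len(h.counts) {
        h.counts[i]++
    }
    h.sum += v
    h.count++
}

// cumulative returns the bucket counts as they would be exposed:
// each bucket includes every observation from the smaller buckets.
func (h *histogram) cumulative() []uint64 {
    out := make([]uint64, len(h.counts))
    var running uint64
    for i, c := range h.counts {
        running += c
        out[i] = running
    }
    return out
}

func main() {
    h := newHistogram([]float64{10, 20, 50})
    for _, v := range []float64{5, 10, 15, 95} {
        h.observe(v)
    }
    for i, c := range h.cumulative() {
        fmt.Printf("le=%v count=%d\n", h.upperBounds[i], c)
    }
    fmt.Printf("le=+Inf count=%d sum=%v\n", h.count, h.sum)
}
```

The point of the cumulative layout is that the server side can estimate quantiles later from the bucket counts, whereas a Summary fixes its quantiles on the client.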

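Where do these metrics come from?

A Prometheus Exporter is a tool that collects and exposes metrics. Typically the exporter collects and exposes the metrics, Prometheus scrapes and stores them, and Grafana or the Prometheus UI is used to query and display them.

A Prometheus Exporter consists of two main parts:

Collector: gathers metrics from an application or another system and converts them into metrics Prometheus can recognize.
Exporter: takes the metric data from the Collector and serves it in a format Prometheus can read.

So how does a Prometheus Exporter produce the 4 metric types Prometheus supports (Counter, Gauge, Histogram, Summary)?

Prometheus provides the client package github.com/prometheus/client_golang, which lets you declare metrics of the different types. For example:

(1) Counter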

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Create a Counter metric
    counterMetric := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "example_counter",            // metric name
        Help: "An example counter metric.", // help text
    })

    // Register the metric
    prometheus.MustRegister(counterMetric)

    // Increment the metric
    counterMetric.Inc()

    // Create an HTTP handler to expose the metrics
    http.Handle("/metrics", promhttp.Handler())

    // Start the web server
    http.ListenAndServe(":8080", nil)
}

(2) Gauge

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Create a Gauge metric
    gaugeMetric := prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "example_gauge",            // metric name
        Help: "An example gauge metric.", // help text
    })

    // Register the metric
    prometheus.MustRegister(gaugeMetric)

    // Set the metric value
    gaugeMetric.Set(100)

    // Create an HTTP handler to expose the metrics
    http.Handle("/metrics", promhttp.Handler())

    // Start the web server
    http.ListenAndServe(":8080", nil)
}

(3) Histogram

package main

import (
    "math/rand"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Create a Histogram metric
    histogramMetric := prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "example_histogram",                 // metric name
        Help:    "An example histogram metric.",      // help text
        Buckets: prometheus.LinearBuckets(0, 10, 10), // 10 buckets of width 10, starting at 0
    })

    // Register the metric
    prometheus.MustRegister(histogramMetric)

    // Record a new observation every second
    go func() {
        for {
            time.Sleep(time.Second)
            histogramMetric.Observe(rand.Float64() * 100)
        }
    }()

    // Create an HTTP handler to expose the metrics
    http.Handle("/metrics", promhttp.Handler())

    // Start the web server
    http.ListenAndServe(":8080", nil)
}

(4) Summary

package main

import (
    "math/rand"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Create a Summary metric
    summaryMetric := prometheus.NewSummary(prometheus.SummaryOpts{
        Name:       "example_summary",                                       // metric name
        Help:       "An example summary metric.",                            // help text
        Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001}, // quantiles and their allowed error
    })

    // Register the metric
    prometheus.MustRegister(summaryMetric)

    // Record a new observation every second
    go func() {
        for {
            time.Sleep(time.Second)
            summaryMetric.Observe(rand.Float64() * 100)
        }
    }()

    // Create an HTTP handler to expose the metrics
    http.Handle("/metrics", promhttp.Handler())

    // Start the web server
    http.ListenAndServe(":8080", nil)
}

The examples above declare the metric description at the moment the metric is created. We can also declare the description first and create the metric afterwards, for example:

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// 1. Define a struct that holds the metric descriptor
type Exporter struct {
    summaryDesc *prometheus.Desc
}

// 2. Define a Collector interface with the two required methods, Describe and Collect
// (this mirrors prometheus.Collector, which client_golang already provides)
type Collector interface {
    Describe(chan<- *prometheus.Desc)
    Collect(chan<- prometheus.Metric)
}

// 3. Implement the two required methods, Describe and Collect
func (e *Exporter) Describe(ch chan<- *prometheus.Desc) {
    // Send the descriptor to the channel
    ch <- e.summaryDesc
}

func (e *Exporter) Collect(ch chan<- prometheus.Metric) {
    // Collect the business metric data
    ch <- prometheus.MustNewConstSummary(
        e.summaryDesc, // binds the metric data to the custom descriptor
        4711, 403.34,  // the sample count and the sample sum of this Summary
        map[float64]float64{0.5: 42.3, 0.9: 323.3}, // quantiles and their values: 50% of samples are below 42.3, 90% are below 323.3
        "200", "get", // label values; they correspond to the variable label names passed as the third argument to prometheus.NewDesc in NewExporter
    )
}

// 4. Define a constructor that builds the exporter and its descriptor
func NewExporter() *Exporter {
    return &Exporter{
        summaryDesc: prometheus.NewDesc(
            "example_summary",                   // metric name
            "An example summary metric.",        // help text
            []string{"code", "method"},          // variable label names; their values change per sample
            prometheus.Labels{"owner": "joker"}, // constant labels; fixed
        ),
    }
}

func main() {
    // Instantiate the exporter
    exporter := NewExporter()

    // Register it
    prometheus.MustRegister(exporter)

    // Create an HTTP handler to expose the metrics
    http.Handle("/metrics", promhttp.Handler())

    // Start the web server
    http.ListenAndServe(":8080", nil)
}

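With that introduction, you should have a rough idea of how to create a Prometheus Exporter. It boils down to these steps:

Define an Exporter struct to hold the metric descriptors
Implement the Collector interface
Instantiate the exporter
Register it
Expose the metrics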
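To illustrate why Describe and Collect take channels, here is a simplified, standard-library-only model of how a registry might drain several collectors at scrape time. The `metric`, `collector`, `staticCollector`, and `gather` names are hypothetical stand-ins, not client_golang types, and the real registry does considerably more than this.

```go
package main

import (
    "fmt"
    "sync"
)

// metric is a hypothetical stand-in for a collected sample.
type metric struct {
    name  string
    value float64
}

// collector is a hypothetical, reduced version of the Collect contract.
type collector interface {
    Collect(ch chan<- metric)
}

// staticCollector emits a fixed list of metrics each time it is collected.
type staticCollector struct {
    metrics []metric
}

func (s staticCollector) Collect(ch chan<- metric) {
    for _, m := range s.metrics {
        ch <- m
    }
}

// gather runs every collector in its own goroutine and drains the shared
// channel. The channel is closed only after all Collect calls have returned,
// which is why a Collect method must not block forever.
func gather(cs ...collector) []metric {
    ch := make(chan metric)
    var wg sync.WaitGroup
    for _, c := range cs {
        wg.Add(1)
        go func(c collector) {
            defer wg.Done()
            c.Collect(ch)
        }(c)
    }
    go func() {
        wg.Wait()
        close(ch)
    }()

    var out []metric
    for m := range ch {
        out = append(out, m)
    }
    return out
}

func main() {
    a := staticCollector{metrics: []metric{{"up", 1}}}
    b := staticCollector{metrics: []metric{{"requests_total", 42}, {"errors_total", 3}}}
    for _, m := range gather(a, b) {
        fmt.Printf("%s %v\n", m.name, m.value)
    }
}
```

Because the collectors run concurrently, the order of the printed lines can vary between runs.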

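Let's Begin

With the basics covered, let's build our own Exporter.

First, a quick recap of the features we need:

Add monitoring targets through a configuration file
Expose metrics that Prometheus can scrape
Support tcp and http probes
Support a configurable check frequency

(1) The targets are loaded from a configuration file, so we can start by settling on its format. This is what I have in mind:

- url: "http://www.baidu.com"
  name: "Baidu test"
  protocol: "http"
  check_interval: 2s
- url: "localhost:2222"
  name: "Local port 2222 check"
  protocol: "tcp"

check_interval is the check frequency; if omitted, it defaults to 1s.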

We need to parse the contents of the configuration file, so we first define a struct for it:

// InterfaceConfig defines the configuration of one monitored interface
type InterfaceConfig struct {
    Name          string        `yaml:"name"`
    URL           string        `yaml:"url"`
    Protocol      string        `yaml:"protocol"`
    CheckInterval time.Duration `yaml:"check_interval,omitempty"`
}

The configuration is in YAML format and saved as config.yaml, so we need to read that file and then parse its contents.

// loadConfig loads the interface configuration from the config file
func loadConfig(configFile string) ([]InterfaceConfig, error) {
    config := []InterfaceConfig{}

    // Read the file (os.ReadFile replaces the deprecated ioutil.ReadFile since Go 1.16)
    data, err := os.ReadFile(configFile)
    if err != nil {
        return nil, err
    }

    // Parse the YAML content
    err = yaml.Unmarshal(data, &config)
    if err != nil {
        return nil, err
    }

    // Default the check interval to 1s when it is not set
    for i := range config {
        if config[i].CheckInterval == 0 {
            config[i].CheckInterval = time.Second
        }
    }

    return config, nil
}

Because there can be multiple monitoring targets, []InterfaceConfig{} is used to hold them.

(2) Define the HealthCollector struct for the probes; it will implement the Prometheus Collector interface

type HealthCollector struct {
    interfaceConfigs []InterfaceConfig
    healthStatus     *prometheus.Desc
}

The parsed configuration is stored in the struct as well, so that it gets loaded when the HealthCollector is initialized.

// NewHealthCollector creates a HealthCollector instance
func NewHealthCollector(configFile string) (*HealthCollector, error) {
    // Load the interface configuration from the config file
    config, err := loadConfig(configFile)
    if err != nil {
        return nil, err
    }

    // Initialize the HealthCollector
    collector := &HealthCollector{
        interfaceConfigs: config,
        healthStatus: prometheus.NewDesc(
            "interface_health_status",
            "Health status of the interfaces",
            []string{"name", "url", "protocol"},
            nil,
        ),
    }

    return collector, nil
}

The variable labels []string{"name", "url", "protocol"} are defined here to make it easier to query the metric with PromQL and to build alerts on it.

(3) Implement the Describe and Collect methods of the Prometheus Collector interface

// Describe implements the Describe method of the Prometheus Collector interface
func (c *HealthCollector) Describe(ch chan<- *prometheus.Desc) {
    ch <- c.healthStatus
}

// Collect implements the Collect method of the Prometheus Collector interface
func (c *HealthCollector) Collect(ch chan<- prometheus.Metric) {
    var wg sync.WaitGroup

    for _, iface := range c.interfaceConfigs {
        wg.Add(1)

        go func(iface InterfaceConfig) {
            defer wg.Done()

            // Probe the interface
            healthy := c.checkInterfaceHealth(iface)

            // Build the Prometheus metric: 1 = healthy, 0 = unhealthy
            var metricValue float64
            if healthy {
                metricValue = 1
            } else {
                metricValue = 0
            }
            ch <- prometheus.MustNewConstMetric(
                c.healthStatus,
                prometheus.GaugeValue,
                metricValue,
                iface.Name,
                iface.URL,
                iface.Protocol,
            )
        }(iface)
    }

    wg.Wait()
}

In Collect, we call checkInterfaceHealth to get the status of each target and then create the corresponding Prometheus metric, where 1 means healthy and 0 means unhealthy. Note that Collect runs on every scrape, so as written the probes fire at Prometheus's scrape interval; the check_interval value loaded from the configuration is not consulted here.

(4) Implement the http and tcp checks

// checkInterfaceHealth checks the health of an interface
func (c *HealthCollector) checkInterfaceHealth(iface InterfaceConfig) bool {
    switch iface.Protocol {
    case "http":
        return c.checkHTTPInterfaceHealth(iface)
    case "tcp":
        return c.checkTCPInterfaceHealth(iface)
    default:
        return false
    }
}

// checkHTTPInterfaceHealth checks the health of an HTTP interface
func (c *HealthCollector) checkHTTPInterfaceHealth(iface InterfaceConfig) bool {
    client := &http.Client{
        Timeout: 5 * time.Second,
    }

    resp, err := client.Get(iface.URL)
    if err != nil {
        return false
    }
    defer resp.Body.Close()

    return resp.StatusCode == http.StatusOK
}

// checkTCPInterfaceHealth checks the health of a TCP interface
func (c *HealthCollector) checkTCPInterfaceHealth(iface InterfaceConfig) bool {
    conn, err := net.DialTimeout("tcp", iface.URL, 5*time.Second)
    if err != nil {
        return false
    }
    defer conn.Close()

    return true
}

The http and tcp checks here are fairly crude: the http check sends a single request and looks at the status code (only 200 counts as healthy), and the tcp check just tests whether a connection can be established.

(5) Write the main function to finish the exporter

func main() {
    // Parse command-line flags; config.yaml in the current directory is the default
    configFile := flag.String("config", "config.yaml", "Path to the config file")
    flag.Parse()

    // Load the config file and create the collector
    collector, err := NewHealthCollector(*configFile)
    if err != nil {
        fmt.Println("Failed to create collector:", err)
        os.Exit(1)
    }

    // Register the HealthCollector
    prometheus.MustRegister(collector)

    // Start the HTTP server and expose the Prometheus metrics
    http.Handle("/metrics", promhttp.Handler())
    err = http.ListenAndServe(":2112", nil)
    if err != nil {
        fmt.Println("Failed to start HTTP server:", err)
        os.Exit(1)
    }
}

Command-line flag parsing is added here so that the configuration file can be specified with --config; if it is not specified, config.yaml is used by default.

That completes the development. We didn't strictly follow the steps outlined in the introductory section, but the overall flow is much the same.

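Deployment

If what you build never goes live, it might as well not exist: your KPI is 0, and your boss cares about results, not about the process. So good or bad, getting it running comes first.

(1) Write a Dockerfile. Naturally, the application will run in a container.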

FROM golang:1.19 AS build-env  
ENV GOPROXY https://goproxy.cn  
ADD . /go/src/app  
WORKDIR /go/src/app  
RUN go mod tidy  
RUN GOOS=linux GOARCH=386 go build -v -o /go/src/app/go-interface-health-check  
  
FROM alpine  
COPY --from=build-env /go/src/app/go-interface-health-check /usr/local/bin/go-interface-health-check  
COPY --from=build-env /go/src/app/config.yaml /opt/  
WORKDIR /opt  
EXPOSE 2112  
CMD [ "go-interface-health-check","--config=/opt/config.yaml" ]

(2) Write a docker-compose file. We deploy directly with docker-compose here, which is simpler and quicker than writing K8s YAML.

version: '3.8'  
services:  
  interface-health-check:
    image: go-interface-health-check:v0.3  
    container_name: interface-health-check  
    network_mode: host  
    restart: unless-stopped  
    command: [ "go-interface-health-check","--config=/opt/config.yaml" ]  
    volumes:  
      - /u01/interface-health-check:/opt  
      - /etc/localtime:/etc/localtime:ro  
    user: root  
    logging:  
      driver: json-file  
      options:  
        max-size: 20m  
        max-file: 100

After starting the container with docker-compose up -d, you can view the metrics with curl http://127.0.0.1:2112/metrics.
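If you want to check the /metrics output programmatically rather than by eye, a small parser is enough. The sketch below uses only the standard library; `unhealthySeries` is a hypothetical helper, the sample text in main is made up for illustration, and it assumes samples carry no trailing timestamp, so the value is the last space-separated field.

```go
package main

import (
    "bufio"
    "fmt"
    "strconv"
    "strings"
)

const metricName = "interface_health_status"

// unhealthySeries scans text in the Prometheus exposition format and returns
// the label sets of interface_health_status samples whose value is 0.
func unhealthySeries(exposition string) ([]string, error) {
    var out []string
    sc := bufio.NewScanner(strings.NewReader(exposition))
    for sc.Scan() {
        line := strings.TrimSpace(sc.Text())
        if !strings.HasPrefix(line, metricName+"{") {
            continue // skips # HELP / # TYPE comments and other metrics
        }
        idx := strings.LastIndex(line, " ")
        if idx < 0 {
            continue
        }
        v, err := strconv.ParseFloat(line[idx+1:], 64)
        if err != nil {
            return nil, fmt.Errorf("bad sample %q: %w", line, err)
        }
        if v == 0 {
            out = append(out, line[len(metricName):idx])
        }
    }
    return out, sc.Err()
}

func main() {
    // Made-up sample output, shaped like what the exporter would serve.
    sample := `# HELP interface_health_status Health status of the interfaces
# TYPE interface_health_status gauge
interface_health_status{name="web",protocol="http",url="http://example.com"} 1
interface_health_status{name="local-2222",protocol="tcp",url="localhost:2222"} 0
`
    bad, err := unhealthySeries(sample)
    if err != nil {
        fmt.Println("parse error:", err)
        return
    }
    for _, labels := range bad {
        fmt.Println("unhealthy:", labels)
    }
}
```

In a real smoke test the sample string would be replaced by the body of an HTTP GET against the exporter's /metrics endpoint.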

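Collection and Display

Setting up Prometheus is not covered here; if anything is unclear, see the links at the end of this article.

Add a scrape job to the Prometheus configuration: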

    scrape_configs:
      - job_name: 'interface-health-check'
        static_configs:
          - targets: ['127.0.0.1:2112']

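After updating the configuration, reload Prometheus and check whether the scraped target is up.

Finally, for easier viewing, you can build a Grafana dashboard on top of this metric.

You can also create alerting rules as needed; interface_health_status == 0 means the interface is unhealthy.

Wrapping Up

That completes our own Prometheus Exporter. The example above is deliberately simple and crude, so adjust it to your actual needs.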

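A couple of days ago I came across a quote from Feng Tang: "The lower someone's position, the worse they are at handling relationships. The higher you climb, the more you realize that the people you thought were studying problems all day are actually studying people."

What do you make of it?

Links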

[1] https://www.yuque.com/coolops/kubernetes/dff1cg
[2] https://www.yuque.com/coolops/kubernetes/wd2vts
[3] https://github.com/prometheus/client_golang/blob/main/prometheus/examples_test.go
[4] https://www.cnblogs.com/0x00000/p/17557743.html
