2018-05-17 Prometheus Multi-Datacenter in Practice

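Preface: this article configures only a single Prometheus slave node. For a multi-node setup, please follow the reference documents and explore on your own.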

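Prometheus servers can scrape one another (federation), which makes it possible to synchronize data across multiple data centers. The guide is here: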

https://www.robustperception.io/scaling-and-federating-prometheus/

The Chinese translation is here:

https://www.robustperception.io/scaling-and-federating-prometheus/

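Following the guidance of these two experts, the hands-on steps are as follows:

Slave configuration: on the node that scrapes the data, add the following to the configuration: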

scrape_configs:
  - job_name: 'node'
    metrics_path: /metrics
    static_configs:
      - targets: ['ip1:port1', 'ip2:port2', 'ip3:port3', 'ip4:port4']
    # Added on top of the original config to make this node a slave
    relabel_configs:
      - source_labels: [__address__]
        modulus:       1    # 1 slave
        target_label:  __tmp_hash
        action:        hashmod
      - source_labels: [__tmp_hash]
        regex:         ^0$  # This is the 1st slave
        action:        keep
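
With modulus: 1 every target hashes to 0, so the single slave above keeps all four targets. As a rough sketch of how the same two rules would split targets across more slaves, following the pattern in the referenced article, each slave would use the same modulus and keep a different remainder. The count of two slaves below is an assumption for illustration and is not something configured in this post:

```yaml
# Sketch for slave 0 of 2; slave 1 would use the same block with regex ^1$.
scrape_configs:
  - job_name: 'node'
    metrics_path: /metrics
    static_configs:
      - targets: ['ip1:port1', 'ip2:port2', 'ip3:port3', 'ip4:port4']
    relabel_configs:
      - source_labels: [__address__]
        modulus:       2          # assumption: 2 slaves in total
        target_label:  __tmp_hash
        action:        hashmod    # __tmp_hash = hash(__address__) mod 2
      - source_labels: [__tmp_hash]
        regex:         ^0$        # keep only targets whose remainder is 0
        action:        keep
```

Every slave lists the full set of targets; the keep rule decides which share each one actually scrapes.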

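Master configuration: add a job to prometheus.yml that scrapes the slave as if it were a single complete instance. The match[] selector filters what is pulled, keeping only the series whose metric name starts with node: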

  - job_name: 'node-1st-slave'
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{__name__=~"^node.*"}'   # Request all series whose metric name starts with node
    static_configs:
      - targets:
        - slave-ip:9090             # address of the slave Prometheus
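
If more slaves are added later, one possible layout is to list all of them as targets of a single federation job on the master. This is only a sketch: the job name and both addresses are hypothetical placeholders, not values from this setup.

```yaml
  - job_name: 'node-slaves'          # hypothetical job name
    honor_labels: true               # keep the labels reported by each slave
    metrics_path: /federate
    params:
      match[]:
        - '{__name__=~"^node.*"}'
    static_configs:
      - targets:
        - slave1-ip:9090             # placeholder address of the first slave
        - slave2-ip:9090             # placeholder address of the second slave
```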

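After the configuration is done, reload Prometheus with kill -HUP PrometheusPID (PrometheusPID being the process ID of the Prometheus server). The target then shows as up, and clicking its endpoint URL shows that the data is being collected normally. Because the whole slave machine is treated as a single exporter instance, the up metric computed by the master only tells you whether the slave Prometheus itself is down, not whether the individual instances behind the slave are down. This can be worked around by pulling those states separately, and Consul service discovery can be used to adapt the setup to a production environment.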
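
One way to pull the per-instance state separately, offered as an assumption rather than something tested in this post, is to add a second match[] selector so the master also federates the up series that the slave records for its own node job. With honor_labels: true these series keep the slave's job and instance labels, so on the master up{job="node"} would reflect each target as seen by the slave, while up{job="node-1st-slave"} still reflects the slave itself.

```yaml
  - job_name: 'node-1st-slave'
    honor_labels: true               # keep the job/instance labels reported by the slave
    metrics_path: /federate
    params:
      match[]:
        - '{__name__=~"^node.*"}'
        - 'up{job="node"}'           # assumption: also pull the slave's per-target up series
    static_configs:
      - targets:
        - slave-ip:9090              # placeholder for the slave's address
```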
