Prometheus seems to have very solid monitoring capabilities, so I set one up myself to try monitoring a running MySQL instance.
1. Download and install
    $ wget https://github.com/prometheus/prometheus/releases/download/v1.7.1/prometheus-1.7.1.linux-amd64.tar.gz
    $ mkdir app
    $ tar zxvf prometheus-1.7.1.linux-amd64.tar.gz -C app
2. Edit the configuration file
    $ cd app/prometheus-1.7.1.linux-amd64
    $ vim prometheus.yml
The initial configuration file looks something like this:
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: 'codelab-monitor'
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first.rules"
      # - "second.rules"
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']
You can see there is one default job: this is Prometheus monitoring its own status.
Next, add a new job below it, under scrape_configs:
      - job_name: 'linux'
        static_configs:
          - targets: ['127.0.0.1:9100']
            labels:
              instance: db1
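Here 127.0.0.1 is the IP of the server I want to monitor (I am monitoring the local machine), and 9100 is the port Prometheus scrapes, i.e. the exporter's port.
Note that YAML does not allow tab characters for indentation; always use spaces.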
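Since a stray tab is easy to miss in an editor, a quick check before starting Prometheus can help. The following is only a minimal sketch: the function name has_tabs is hypothetical, and it relies on nothing beyond POSIX grep and printf.

```shell
# has_tabs FILE: exit 0 and print the offending lines (with line numbers)
# if FILE contains at least one tab character; exit non-zero otherwise.
has_tabs() {
    tab=$(printf '\t')
    grep -n "$tab" "$1"
}

# Hypothetical usage against the config edited above:
#   if has_tabs prometheus.yml; then echo "replace the tabs with spaces"; fi
```

grep prints each matching line with its number, so the tabs can be located and replaced directly.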
2.1 Write the systemd unit file
Sooner or later the process will have to be managed by systemd, so here is the unit file.
    # /etc/systemd/system/prometheus.service
    [Unit]
    Description=Prometheus Server
    Documentation=https://prometheus.io/docs/introduction/overview/
    After=network-online.target
    [Service]
    User=bot
    ExecStart=/home/bot/app/prometheus/prometheus \
      -config.file=/home/bot/app/prometheus/prometheus.yml \
      -storage.local.path=/home/bot/data/prometheus
    [Install]
    WantedBy=multi-user.target
The directory given as storage.local.path above has to be created beforehand, with permissions that let the service user write to it.
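Preparing that directory could look roughly like the sketch below. The helper name prepare_data_dir and the 750 mode are my own assumptions, not something Prometheus requires; the path and the bot user come from the unit file above.

```shell
# prepare_data_dir DIR OWNER: create DIR (and any missing parents), hand it
# to OWNER, and restrict access to the owner and group (mode 750 is an assumption).
prepare_data_dir() {
    mkdir -p "$1" &&
    chown "$2" "$1" &&
    chmod 750 "$1"
}

# Run as root, using the path and user from the unit file above:
#   prepare_data_dir /home/bot/data/prometheus bot
#   systemctl daemon-reload
#   systemctl enable prometheus
#   systemctl start prometheus
```

systemctl daemon-reload makes systemd pick up the new unit file before the service is enabled and started.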
3. Start
    $ ./prometheus -config.file=./prometheus.yml
    INFO[0000] Starting prometheus (version=1.7.1,branch=master,revision=3afb3fffa3a29c3de865e1172fb740442e9d0133) source="main.go:88"
    INFO[0000] Build context (go=go1.8.3,user=root@0aa1b7fc430d,date=20170612-11:44:05) source="main.go:89"
    INFO[0000] Host details (Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 l-test (none)) source="main.go:90"
    INFO[0000] Loading configuration file ./prometheus.yml source="main.go:252"
    INFO[0000] Loading series map and head chunks... source="storage.go:428"
    INFO[0000] 0 series loaded. source="storage.go:439"
    INFO[0000] Starting target manager... source="targetmanager.go:63"
    INFO[0000] Listening on :9090 source="web.go:259"
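4. Access
Next, open http://prometheus_host:9090 in a browser.
You should see the Prometheus web UI.
Under Status -> Targets you can see the linux job that was just added. Nothing is listening on that job's port yet, so it shows as down.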
5. Monitor the server
To monitor a server we need an exporter. The exporters provided by Prometheus are listed at https://prometheus.io/download/. The one for monitoring machines is node_exporter.
    # First, download the exporter
    $ wget https://github.com/prometheus/node_exporter/releases/download/v0.14.0/node_exporter-0.14.0.linux-amd64.tar.gz
    $ tar zxvf node_exporter-0.14.0.linux-amd64.tar.gz
    $ cd node_exporter-0.14.0.linux-amd64/
    $ ls
    LICENSE node_exporter NOTICE
    $ ./node_exporter
    INFO[0000] Starting node_exporter (version=0.14.0,revision=840ba5dcc71a084a3bc63cb6063003c1f94435a6) source="node_exporter.go:140"
    INFO[0000] Build context (go=go1.7.5,user=root@bb6d0678e7f3,date=20170321-12:12:54) source="node_exporter.go:141"
    INFO[0000] No directory specified,see --collector.textfile.directory source="textfile.go:57"
    INFO[0000] Enabled collectors: source="node_exporter.go:160"
    INFO[0000] - infiniband source="node_exporter.go:162"
    INFO[0000] - edac source="node_exporter.go:162"
    INFO[0000] - entropy source="node_exporter.go:162"
    INFO[0000] - loadavg source="node_exporter.go:162"
    INFO[0000] - mdadm source="node_exporter.go:162"
    INFO[0000] - meminfo source="node_exporter.go:162"
    INFO[0000] - netstat source="node_exporter.go:162"
    INFO[0000] - textfile source="node_exporter.go:162"
    INFO[0000] - vmstat source="node_exporter.go:162"
    INFO[0000] - diskstats source="node_exporter.go:162"
    INFO[0000] - zfs source="node_exporter.go:162"
    INFO[0000] - filefd source="node_exporter.go:162"
    INFO[0000] - filesystem source="node_exporter.go:162"
    INFO[0000] - hwmon source="node_exporter.go:162"
    INFO[0000] - netdev source="node_exporter.go:162"
    INFO[0000] - stat source="node_exporter.go:162"
    INFO[0000] - uname source="node_exporter.go:162"
    INFO[0000] - wifi source="node_exporter.go:162"
    INFO[0000] - conntrack source="node_exporter.go:162"
    INFO[0000] - time source="node_exporter.go:162"
    INFO[0000] - sockstat source="node_exporter.go:162"
    INFO[0000] Listening on :9100 source="node_exporter.go:186"
node_exporter listens on port 9100, which is why the Prometheus configuration earlier used port 9100.
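To confirm the exporter is really serving data on that port, its /metrics endpoint can be fetched by hand. The sketch below assumes curl and awk are installed; the helper name metric_value is hypothetical, and node_load1 is used only as an example of a metric I would expect from the loadavg collector listed in the log above.

```shell
# metric_value NAME: read Prometheus text-format metrics on stdin and print
# the value of the unlabelled sample called NAME. Comment lines (# HELP,
# # TYPE) and labelled samples never match because their first field differs.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Hypothetical usage against the exporter started above:
#   curl -s http://127.0.0.1:9100/metrics | metric_value node_load1
```

If the command prints nothing, either the exporter is not reachable or the metric name does not exist in this exporter version.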
Now go back to Status -> Targets: the job that was down earlier has changed to up.
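The same state can also be read without the web UI, by asking the Prometheus HTTP API for the built-in up metric, which is 1 for a target whose last scrape succeeded and 0 otherwise. This is only a sketch: the helper name up_query_url is hypothetical, and prometheus_host is the same placeholder as in the URL earlier.

```shell
# up_query_url BASE: print the instant-query URL for the 'up' metric on the
# Prometheus server at BASE (one trailing slash on BASE is tolerated).
up_query_url() {
    printf '%s/api/v1/query?query=up\n' "${1%/}"
}

# Hypothetical usage:
#   curl -s "$(up_query_url http://prometheus_host:9090)"
```

The JSON response contains one entry per target, labelled with its job and instance, so the linux job from the configuration should appear there with instance db1.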
While we are at it, here is the systemd unit file for node_exporter as well.
    [Unit]
    Description=Prometheus node exporter
    After=local-fs.target network-online.target network.target
    Wants=local-fs.target network-online.target network.target
    [Service]
    ExecStart=/home/bot/app/prometheus_exporter/node_exporter/node_exporter
    Type=simple
    [Install]
    WantedBy=multi-user.target