Installing Prometheus + Grafana in a Production Environment

Installing Prometheus

wget https://github.com/prometheus/prometheus/releases/download/v2.34.0/prometheus-2.34.0.linux-amd64.tar.gz
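Before unpacking, it can be worth checking the archive against the checksums published with the release. The following is a minimal sketch, not part of the original procedure: it assumes the release offers a sha256sums.txt asset next to the tarball and that GNU sha256sum is installed. verify_release is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify a downloaded release archive against a checksum list.
# verify_release is a hypothetical helper, not part of Prometheus.
verify_release() {
  local archive="$1" sums="$2"
  local name expected actual
  name="$(basename "$archive")"
  # Pick the checksum recorded for this file name.
  # Lines in the list are expected to look like "<sha256>  <file>".
  expected="$(awk -v f="$name" '$2 == f { print $1 }' "$sums")"
  if [ -z "$expected" ]; then
    echo "no checksum listed for $name" >&2
    return 2
  fi
  actual="$(sha256sum "$archive" | awk '{ print $1 }')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $name"
  else
    echo "MISMATCH: $name" >&2
    return 1
  fi
}

# Usage (the sha256sums.txt URL is an assumption; run after the wget above):
#   wget https://github.com/prometheus/prometheus/releases/download/v2.34.0/sha256sums.txt
#   verify_release prometheus-2.34.0.linux-amd64.tar.gz sha256sums.txt
```

A non-zero exit status means the archive should not be unpacked.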

tar -zxvf prometheus-2.34.0.linux-amd64.tar.gz
mv prometheus-2.34.0.linux-amd64 prometheus

cd prometheus
vim prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
           - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/host_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
  - job_name: 'agent-web01'
    static_configs:
      - targets: ['172.31.32.104:9100']
  - job_name: 'agent-web02'
    static_configs:
      - targets: ['172.31.29.223:9100']
  - job_name: 'java'
    static_configs:
      - targets: ['172.31.29.223:8100']
    metrics_path: '/actuator/prometheus'

Creating the alert rules

mkdir -p /root/prometheus/rules

cat /root/prometheus/rules/host_rules.yml

groups:
- name: System resource alert rules
  rules:
  - alert: CPU usage alert
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server {{ $labels.instance }}: CPU usage is above 80%! (current value: {{ humanize $value }}%)"
  - alert: Memory usage alert
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server {{ $labels.instance }}: memory usage is above 80%! (current value: {{ humanize $value }}%)"
  - alert: Disk usage alert
    expr: 100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 70
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server {{ $labels.instance }}: disk usage is above 70%! (current value: {{ humanize $value }}%)"
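The Prometheus tarball also ships promtool, which can validate both files before the server is started. A small sketch, assuming it is run from the prometheus directory; check_prometheus_files and the PROMTOOL override variable are illustrative names, not something the original setup defines.

```shell
#!/usr/bin/env bash
# Sketch: validate prometheus.yml and the rule file before (re)starting.
# check_prometheus_files and PROMTOOL are illustrative names.
check_prometheus_files() {
  local promtool="${PROMTOOL:-./promtool}"
  local config="${1:-prometheus.yml}"
  local rules="${2:-rules/host_rules.yml}"
  # "check config" parses the main file and the rule files it references.
  "$promtool" check config "$config" || return 1
  # "check rules" validates the rule file on its own.
  "$promtool" check rules "$rules" || return 1
  echo "configuration and rules look valid"
}

# Usage (from the prometheus directory):
#   check_prometheus_files
```

Running the check after every edit catches indentation mistakes and PromQL typos before they prevent a start or reload.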

Starting Prometheus

nohup ./prometheus &
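A process started with nohup is not restarted after a crash or a reboot. On hosts that use systemd, a unit file is a common alternative. The sketch below only prints such a unit; the install directory /root/prometheus is an assumption inferred from the rules path used earlier, and render_prometheus_unit is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print a systemd unit for Prometheus.
# The default install directory is an assumption; pass your own as $1.
render_prometheus_unit() {
  local home="${1:-/root/prometheus}"
  cat <<EOF
[Unit]
Description=Prometheus
After=network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=${home}
ExecStart=${home}/prometheus --config.file=${home}/prometheus.yml --storage.tsdb.path=${home}/data
ExecReload=/bin/kill -HUP \$MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   render_prometheus_unit > /etc/systemd/system/prometheus.service
#   systemctl daemon-reload
#   systemctl enable --now prometheus
```

Prometheus re-reads its configuration on SIGHUP, which is what the ExecReload line relies on, so `systemctl reload prometheus` picks up edited rules without a restart.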

Installing Alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz

tar -zxvf alertmanager-0.24.0.linux-amd64.tar.gz

mv alertmanager-0.24.0.linux-amd64 alertmanager

cd alertmanager
vim alertmanager.yml

The values for the parameters below can be looked up in the WeCom (WeChat Work) admin console.

global:
  resolve_timeout: 2m
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: '<your WeCom secret>'
  wechat_api_corp_id: '<your WeCom corp ID>'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - send_resolved: true
    to_party: '1'
    agent_id: '<your WeCom application agent ID>'
templates:
  # This path must match the directory that actually holds wechat.tmpl.
  - '/alertmanager/*.tmpl'

Checking the configuration syntax

./amtool check-config alertmanager.yml

WeCom alert template

cat wechat.tmpl

{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
======== Alert firing ========
Alert name: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
Host: {{ $alert.Labels.instance }} {{ $alert.Labels.device }}
Details: {{ $alert.Annotations.description }}
Started at: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
========== END ==========
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
======== Alert resolved ========
Alert name: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
Host: {{ $alert.Labels.instance }}
Details: {{ $alert.Annotations.description }}
Started at: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
Resolved at: {{ $alert.EndsAt.Format "2006-01-02 15:04:05" }}
========== END ==========
{{- end }}
{{- end }}
{{- end }}

Starting Alertmanager

nohup ./alertmanager &
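Once Alertmanager is running, the route, the WeCom receiver and the template can be exercised without waiting for a real incident by pushing a synthetic alert with amtool. This is a sketch under assumptions: send_test_alert and the AMTOOL override variable are illustrative names, and the label values are made up for the test.

```shell
#!/usr/bin/env bash
# Sketch: push a synthetic alert into Alertmanager to test notifications.
# send_test_alert and AMTOOL are illustrative names.
send_test_alert() {
  local amtool="${AMTOOL:-./amtool}"
  local url="${1:-http://localhost:9093}"
  # Both annotations are set so the message shows details whichever one
  # the notification template reads.
  "$amtool" --alertmanager.url="$url" alert add \
    --annotation=summary="Synthetic alert sent by amtool" \
    --annotation=description="Synthetic alert sent by amtool" \
    alertname=TestAlert severity=warning instance=test-host
}

# Usage (from the alertmanager directory):
#   send_test_alert
```

With group_wait set to 10s, the notification should arrive shortly afterwards. Because the alert is not refreshed, it is typically reported as resolved once resolve_timeout has passed.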

Installing Grafana

wget https://dl.grafana.com/oss/release/grafana-8.4.6.linux-amd64.tar.gz

tar -zxvf grafana-8.4.6.linux-amd64.tar.gz

mv grafana-8.4.6.linux-amd64 grafana

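The configuration is left at its defaults here; adjust it to suit your environment if needed.

Starting Grafana

cd grafana
nohup ./bin/grafana-server &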

Configuring Prometheus as the data source

Prometheus runs on the same host as Grafana, so use localhost in the data source URL (http://localhost:9090).

Save and test
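The same data source can also be declared in a provisioning file, which Grafana reads at startup, so the setup is reproducible on a rebuilt host. A minimal sketch, assuming the tarball layout with its conf/provisioning directory; render_datasource is a hypothetical helper name and the data source name is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: print a Grafana data source provisioning file for Prometheus.
# render_datasource is an illustrative helper; pass another URL as $1.
render_datasource() {
  local url="${1:-http://localhost:9090}"
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Usage (from the grafana directory, then restart Grafana):
#   render_datasource > conf/provisioning/datasources/prometheus.yaml
```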

Importing dashboards

Grafana dashboard catalog:

https://grafana.com/grafana/dashboards

11074: 1 Node Exporter for Prometheus Dashboard EN 20201010

1860: Node Exporter Full

4701: JVM (Micrometer)

This article comes from cnblogs; author: Devinhao. When reposting, please cite the original link: https://www.cnblogs.com/Devinhao/p/16184823.html