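Chapter 1 Zabbix Overview and Components
1.1 Zabbix overview
Zabbix is an enterprise-grade open-source solution with a web interface that provides distributed system monitoring and network monitoring. It can monitor a wide range of network parameters to help keep servers running safely, and it offers a flexible notification mechanism so that system administrators can quickly locate and resolve problems.
1.2 Zabbix components
The core components are Zabbix server and Zabbix agent; Zabbix proxy is an optional component.
Remote servers and network state can be monitored, and data collected, through SNMP, Zabbix agent, fping, port checks and similar methods. Linux and Unix-like platforms are supported; on Windows only the client (the monitored side) can be installed.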
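Chapter 2 Installing the Zabbix 3.0.13 Server
Zabbix server 3.0 cannot be installed with yum on CentOS 6, so we install it with yum on CentOS 7. If you have to install it on CentOS 6, building from source is strongly recommended, and you also need to watch the PHP version (the Zabbix 3.0 frontend requires PHP 5.4 or later, while CentOS 6 ships PHP 5.3).
Note: although Zabbix server 3.0 cannot be installed with yum on CentOS 6, Zabbix agent 3.0 can be.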
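Chapter 3 Summary of Web Interface Errors
3.1 Problem 1: Zabbix alerter processes more than 75% busy
Cause:
The alerter (mail-sending) processes on the Zabbix server are overloaded, usually because the interval configured for an action is too short. In unusual situations a very large number of alerts is generated, for example the server is partway through sending tens of thousands of emails and the mail process hangs.
Solution:
01. Resolve it by deleting the data from the database (risky; not recommended).
02. Modify the mail script so that the mail action only prints a timestamp, wait until the queued mail has drained completely, then change the script back. For example:
    [root@m01 ~]# cat /usr/lib/zabbix/alertscripts/sms
    #!/bin/bash
    # Temporary stub: log a timestamp instead of sending the message
    echo "$(date)" >> /tmp/sms.txt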
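The swap-and-restore step above can be sketched as two small helpers. This is a minimal sketch, not the author's procedure: the function names `stub_alert_script` and `restore_alert_script` and the `.orig` backup suffix are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helpers: replace an alert script with a logging stub,
# then put the real script back once the alert queue has drained.

stub_alert_script() {
    local script="$1" log="$2"
    # Refuse to run twice, otherwise the backup would be overwritten by the stub.
    if [ -e "$script.orig" ]; then
        echo "backup already exists: $script.orig" >&2
        return 1
    fi
    # Keep the real script (and its permissions) so it can be restored.
    cp -p "$script" "$script.orig" || return 1
    # Unquoted heredoc: $log is expanded now, \$(date) is left for the stub to run.
    cat > "$script" <<EOF
#!/bin/bash
echo "\$(date)" >> "$log"
EOF
    chmod +x "$script"
}

restore_alert_script() {
    local script="$1"
    # Move the saved copy back over the stub.
    mv -f "$script.orig" "$script"
}
```

With the paths used in this article, the calls would be `stub_alert_script /usr/lib/zabbix/alertscripts/sms /tmp/sms.txt` and, once `/tmp/sms.txt` stops growing, `restore_alert_script /usr/lib/zabbix/alertscripts/sms`.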
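3.2 Problem 2: Zabbix discoverer processes more than 75% busy
Cause:
01. Discovery (automatic discovery) rules are configured. Each discovery rule occupies one process for a certain period, while zabbix_server.conf starts only 1 discoverer by default (the line is commented out, so the default applies).
02. To verify the discovery result quickly, the rule's "Delay" was changed from the default 3600s to 60s.
Solution:
01. Change the number of StartDiscoverers processes in the configuration file: remove the leading # and set the value to 5, then restart the service.
(Note: depending on the hardware, a higher value can be used, but the allowed range is 0~250.)
    [root@m01 ~]# grep 'StartDiscoverers' /etc/zabbix/zabbix_server.conf
    ### Option: StartDiscoverers
    StartDiscoverers=5
    [root@m01 ~]# systemctl restart zabbix-server.service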
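The edit described above (uncomment the parameter and set a new value) can also be scripted. The sketch below is one possible way to do it; `set_conf_param` and `set_start_discoverers` are hypothetical helper names, and the sketch assumes that keys and values contain no characters special to sed, such as `|` or `&`.

```shell
#!/bin/bash
# set_conf_param FILE KEY VALUE
# Uncomment (if needed) and set KEY=VALUE in a key=value style config file.
# Appends the line when the key is not present at all.
set_conf_param() {
    local file="$1" key="$2" value="$3" tmp
    if grep -Eq "^#?[[:space:]]*${key}=" "$file"; then
        tmp=$(mktemp) || return 1
        # Rewrite "KEY=..." or "# KEY=..." lines; "### Option: KEY" lines do not match.
        sed -E "s|^#?[[:space:]]*${key}=.*|${key}=${value}|" "$file" > "$tmp" \
            || { rm -f "$tmp"; return 1; }
        # Write back in place so the file keeps its owner and permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# set_start_discoverers FILE N
# Enforces the 0-250 range quoted above before touching the file.
set_start_discoverers() {
    local file="$1" n="$2"
    case "$n" in
        ''|*[!0-9]*) echo "not a non-negative integer: $n" >&2; return 1 ;;
    esac
    if [ "$n" -gt 250 ]; then
        echo "StartDiscoverers must be within 0-250" >&2
        return 1
    fi
    set_conf_param "$file" StartDiscoverers "$n"
}
```

A call such as `set_start_discoverers /etc/zabbix/zabbix_server.conf 5` followed by `systemctl restart zabbix-server.service` would correspond to the manual steps shown above.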
02. Add a scheduled job that restarts zabbix_server to reduce the load.
    [root@m01 ~]# crontab -e
    # Restart the Zabbix server once a day to clear zombie processes and free memory
    @daily /usr/bin/systemctl restart zabbix-server.service > /dev/null 2>&1
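3.3 Problem 3: Zabbix poller processes more than 75% busy
Cause:
01. A device monitored through Zabbix agent has crashed, or the agent has died for some other reason, so the server cannot fetch data from it.
02. Fetching data from an agent takes too long and exceeds the timeout configured on the server.
Solution:
01. Increase the number of poller processes that Zabbix server starts at initialization.
    ### Option: StartPollers
    # How far to raise this depends on server performance and the number of monitored items;
    # with enough memory it can be set higher.
    StartPollers=10
02. In the template's discovery rules, set "Keep lost resources period" to 0.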
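3.4 Problem 4: Zabbix housekeeper processes more than 75% busy
Cause:
To stop the database from growing indefinitely, Zabbix has a mechanism that deletes old history automatically: the housekeeper. MySQL performance drops while the rows are being deleted, which triggers this alert.
Solution:
Adjust the HousekeepingFrequency parameter.
    # Interval between housekeeping runs, in hours
    HousekeepingFrequency=12
    # Maximum number of rows deleted per task in one housekeeping cycle
    MaxHousekeeperDelete=1000000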
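The comments in the sample zabbix_server.conf in section 3.7 state that, for each item, no more than 4 times HousekeepingFrequency hours of outdated information are deleted in one housekeeping cycle, and that the parameter's range is 0-24. The arithmetic can be sketched as follows; `housekeeping_window_hours` is a hypothetical helper name, and the special case HousekeepingFrequency=0 (manual runs only) is deliberately rejected here rather than modelled.

```shell
#!/bin/bash
# housekeeping_window_hours FREQ
# Prints how many hours of outdated data one housekeeping cycle may delete
# per item, using the "4 x HousekeepingFrequency" rule from the config comments.
housekeeping_window_hours() {
    local freq="$1"
    case "$freq" in
        ''|*[!0-9]*) echo "not a non-negative integer: $freq" >&2; return 1 ;;
    esac
    # 0 means "run only via runtime control" and follows a different rule.
    if [ "$freq" -lt 1 ] || [ "$freq" -gt 24 ]; then
        echo "expected a value within 1-24, got: $freq" >&2
        return 1
    fi
    # 10# forces base 10 so that an input like 08 is not read as octal.
    echo $(( 4 * 10#$freq ))
}
```

For the value used above, `housekeeping_window_hours 12` prints 48: each cycle may remove up to two days of outdated data per item.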
3.5 Problem 5: Zabbix server runs out of memory and cannot start
Cause:
After Zabbix had been running for a while, a further batch of switches was added to monitoring and zabbix-server could no longer start. The log shows the following (it reports an out-of-memory condition; the CacheSize setting in zabbix_server.conf has to be raised):
    2816:20170725:174352.675 [file:dbconfig.c,line:652] zbx_mem_realloc(): out of memory (requested 162664 bytes)
    2816:20170725:174352.675 [file:dbconfig.c,line:652] zbx_mem_realloc(): please increase CacheSize configuration parameter
Solution:
    vim zabbix_server.conf
    # Default is 8M
    CacheSize=1024M
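The log message itself names the parameter to raise, so a quick way to find which cache is too small is to extract that name from the server log. A minimal sketch follows; `params_to_increase` is a hypothetical helper name, and the pattern is based only on the two log lines quoted above.

```shell
#!/bin/bash
# params_to_increase LOGFILE
# Lists each parameter named in a
# "please increase <Name> configuration parameter" message, one per line.
params_to_increase() {
    sed -n -E 's/.*please increase ([A-Za-z]+) configuration parameter.*/\1/p' "$1" | sort -u
}
```

Usage: `params_to_increase /path/to/zabbix_server.log`, where the path is whatever LogFile is set to in zabbix_server.conf.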
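3.6 Problem 6: PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 11 bytes)
Cause:
Some Zabbix pages cannot be opened. The PHP log shows that PHP runs out of memory when those pages are requested.
Solution:
It is unclear whether this is a memory leak; the simplest fix is to raise the memory available to the PHP process, here from the default 128M to 512M.
    [root@zabbix-master ~]# grep 'memory_limit' /etc/httpd/conf.d/zabbix.conf
    php_value memory_limit 512M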
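The byte count in the error message is the old limit expressed in bytes. As a worked conversion (the helper names `bytes_to_mib` and `mib_to_bytes` are hypothetical), 134217728 bytes is exactly the 128M default, and the new 512M limit corresponds to 536870912 bytes:

```shell
#!/bin/bash
# Convert between bytes and MiB using integer arithmetic.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

bytes_to_mib 134217728   # limit reported in the PHP error, in MiB
mib_to_bytes 512         # new limit, in bytes
```

If the same error appears again after the change, the number in the message should read 536870912, which confirms that the new limit is in effect.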
3.7 Summary
The author provides a configuration file here; adjust the parameter values in it to suit your own environment.
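Several parameters in this file are shared-memory cache sizes, and the values used are large, so it may be worth totalling them before copying the file to a smaller machine. The sketch below sums every uncommented `*CacheSize=` line; `total_cache_mib` is a hypothetical helper name, and the sketch assumes values carry a K, M or G suffix or are plain byte counts.

```shell
#!/bin/bash
# total_cache_mib FILE
# Sums all active (uncommented) *CacheSize= values in a config file
# and prints the total in MiB.
total_cache_mib() {
    awk -F= '
        /^[A-Za-z]*CacheSize=/ {
            n = $2 + 0                       # numeric prefix, e.g. 2048 from "2048M"
            unit = substr($2, length($2))    # last character: K, M, G or a digit
            if (unit == "G")      n *= 1024
            else if (unit == "K") n /= 1024
            else if (unit != "M") n /= 1048576   # no suffix: value is in bytes
            total += n
        }
        END { printf "%d\n", total }
    ' "$1"
}
```

Applied to the six cache values set in this file (8G, 2048M, 2048M, 512M, 16G and 256M), the total is 29440 MiB, a little under 29 GiB of shared memory.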
    # This is a configuration file for Zabbix server daemon
    # To get more information about Zabbix, visit http://www.zabbix.com
    ############ GENERAL PARAMETERS #################
    ### Option: ListenPort
    # Listen port for trapper.
    #
    # Mandatory: no
    # Range: 1024-32767
    # Default:
    # Note: port the server listens on to receive data collected by proxies or directly connected agents
    # ListenPort=10051
    ### Option: SourceIP
    # Source IP address for outgoing connections.
    #
    # Mandatory: no
    # Default:
    # Note: source IP the server uses for outgoing connections; setting it explicitly is recommended
    # SourceIP=
    ### Option: LogType
    # Specifies where log messages are written to:
    # system - syslog
    # file - file specified with LogFile parameter
    # console - standard output
    #
    # Mandatory: no
    # Default:
    # LogType=file
    ### Option: LogFile
    # Log file name for LogType 'file' parameter.
    #
    # Mandatory: no
    # Default:
    # LogFile=
    # Note: path of the Zabbix server log; set it as appropriate
    LogFile=/tmp/zabbix_server.log
    ### Option: LogFileSize
    # Maximum size of log file in MB.
    # 0 - disable automatic log rotation.
    #
    # Mandatory: no
    # Range: 0-1024
    # Default:
    # Note: the log is rotated once it reaches this many MB; with 0 it is never rotated and keeps growing, so enabling rotation is recommended
    LogFileSize=50
    ### Option: DebugLevel
    # Specifies debug level:
    # 0 - basic information about starting and stopping of Zabbix processes
    # 1 - critical information (critical errors only; low log volume)
    # 2 - error information (error level; more log volume than critical)
    # 3 - warnings (warning level; more log volume than error)
    # 4 - for debugging (produces lots of information; more log volume than warning)
    # 5 - extended debugging (produces even more information)
    #
    # Mandatory: no
    # Range: 0-5
    # Default:
    # Note: from level 0 to 4 the amount of log output per unit of time keeps increasing
    DebugLevel=3
    ### Option: PidFile
    # Name of PID file.
    #
    # Mandatory: no
    # Default:
    # Note: path of the Zabbix server PID file
    PidFile=/tmp/zabbix_server.pid
    ### Option: DBHost
    # Database host name.
    # If set to localhost, socket is used for MySQL.
    # If set to empty string, socket is used for PostgreSQL.
    #
    # Mandatory: no
    # Default:
    # Note: database host. For MySQL, localhost means the connection goes over the socket (used together with DBSocket), otherwise over IP; for PostgreSQL, an empty DBHost means the socket is used
    # DBHost=localhost
    DBHost=
    ### Option: DBName
    # Database name.
    # For SQLite3 path to database file must be provided. DBUser and DBPassword are ignored.
    #
    # Mandatory: yes
    # Default:
    # DBName=
    # Note: name of the database the server connects to
    DBName=
    ### Option: DBSchema
    # Schema name. Used for IBM DB2 and PostgreSQL.
    #
    # Mandatory: no
    # Default:
    # Note: schema name, used for IBM DB2 and PostgreSQL connections
    # DBSchema=
    ### Option: DBUser
    # Database user. Ignored for SQLite.
    #
    # Mandatory: no
    # Default:
    # Note: user for the database connection
    # DBUser=
    DBUser=
    ### Option: DBPassword
    # Database password. Ignored for SQLite.
    # Comment this line if no password is used.
    #
    # Mandatory: no
    # Default:
    # Note: password for the database connection
    DBPassword=
    ### Option: DBSocket
    # Path to MySQL socket.
    #
    # Mandatory: no
    # Default:
    # Note: path of the MySQL socket
    DBSocket=/tmp/mysql.sock
    ### Option: DBPort
    # Database port when not using local socket. Ignored for SQLite.
    #
    # Mandatory: no
    # Range: 1024-65535
    # Default (for MySQL):
    # Note: port for the database connection; 3306 by default
    DBPort=3306
    ############ ADVANCED PARAMETERS ################
    # Advanced parameters
    ### Option: StartPollers
    # Number of pre-forked instances of pollers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # Note: number of poller child processes started at initialization; more pollers give the server higher throughput but consume more system resources
    StartPollers=300
    ### Option: StartIPMIPollers
    # Number of pre-forked instances of IPMI pollers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    #
    # Note: mainly used to read hardware status over IPMI; if there are no IPMI items, 0 is recommended
    # StartIPMIPollers=0
    ### Option: StartPollersUnreachable
    # Number of pre-forked instances of pollers for unreachable hosts (including IPMI and Java).
    # At least one poller for unreachable hosts must be running if regular, IPMI or Java pollers
    # are started.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # Note: by default Zabbix starts dedicated processes to probe unreachable hosts (including IPMI); keep the default if proxies are in use; if many agents connect directly, adjust as needed
    StartPollersUnreachable=50
    ### Option: StartTrappers
    # Number of pre-forked instances of trappers.
    # Trappers accept incoming connections from Zabbix sender, active agents and active proxies.
    # At least one trapper process must be running to display server availability and view queue
    # in the frontend.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # Note: number of processes that receive submitted data, for example in SNMP trapper scenarios; increase it if many clients use SNMP traps
    StartTrappers=50
    ### Option: StartPingers
    # Number of pre-forked instances of ICMP pingers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # Note: number of processes used to ping hosts over ICMP; increase it if a single proxy manages more than 500 machines
    # StartPingers=10
    ### Option: StartDiscoverers
    # Number of pre-forked instances of discoverers.
    #
    # Mandatory: no
    # Range: 0-250
    # Default:
    # Note: number of host discovery processes; consider increasing it if a single proxy manages more than 500 machines (applies only when agents connect directly)
    StartDiscoverers=15
    ### Option: StartHTTPPollers
    # Number of pre-forked instances of HTTP pollers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # Note: number of web monitoring (HTTP check) processes; increase or decrease as needed
    # StartHTTPPollers=1
    ### Option: StartTimers
    # Number of pre-forked instances of timers.
    # Timers process time-based trigger functions and maintenance periods.
    # Only the first timer process handles the maintenance periods.
    #
    # Mandatory: no
    # Range: 1-1000
    # Default:
    # Note: number of timer instances, used mainly for triggers and for hosts in maintenance; only the first timer handles maintenance periods
    # StartTimers=1
    ### Option: StartEscalators
    # Number of pre-forked instances of escalators.
    #
    # Mandatory: no
    # Range: 0-100
    # Default:
    # Note: processes that handle the steps of actions; increase it when there are many actions
    StartEscalators=30
    ### Option: JavaGateway
    # IP address (or hostname) of Zabbix Java gateway.
    # Only required if Java pollers are started.
    #
    # Mandatory: no
    # Default:
    # Note: used with the Java gateway
    JavaGateway=10.238.0.180
    ### Option: JavaGatewayPort
    # Port that Zabbix Java gateway listens on.
    #
    # Mandatory: no
    # Range: 1024-32767
    # Default:
    # Note: used with the Java gateway
    JavaGatewayPort=10052
    ### Option: StartJavaPollers
    # Number of pre-forked instances of Java pollers.
    #
    # Mandatory: no
    # Range: 0-1000
    # Default:
    # Note: used with the Java gateway
    StartJavaPollers=30
    ### Option: StartVMwareCollectors
    # Number of pre-forked vmware collector instances.
    #
    # Mandatory: no
    # Range: 0-250
    # Default:
    # Note: used for monitoring VMware ESXi hosts; 0 disables it, and monitoring ESXi needs at least 1; size it to the number of ESXi hosts monitored
    # StartVMwareCollectors=0
    ### Option: VMwareFrequency
    # How often Zabbix will connect to VMware service to obtain a new data.
    #
    # Mandatory: no
    # Range: 10-86400
    # Default:
    # Note: how often the VMware service is queried, in seconds
    # VMwareFrequency=60
    ### Option: VMwarePerfFrequency
    # How often Zabbix will connect to VMware service to obtain performance data.
    #
    # Mandatory: no
    # Range: 10-86400
    # Default:
    # VMwarePerfFrequency=60
    ### Option: VMwareCacheSize
    # Size of VMware cache, in bytes.
    # Shared memory size for storing VMware data.
    # Only used if VMware collectors are started.
    #
    # Mandatory: no
    # Range: 256K-2G
    # Default:
    # Note: amount of shared memory set aside for storing VMware data
    VMwareCacheSize=256M
    ### Option: VMwareTimeout
    # Specifies how many seconds vmware collector waits for response from VMware service.
    #
    # Mandatory: no
    # Range: 1-300
    # Default:
    # Note: maximum time to wait for VMware to return data
    VMwareTimeout=10
    ### Option: SNMPTrapperFile
    # Temporary file used for passing data from SNMP trap daemon to the server.
    # Must be the same as in zabbix_trap_receiver.pl or SNMPTT configuration file.
    #
    # Mandatory: no
    # Default:
    # Note: temporary file for SNMP traps, used when the SNMP trapper is enabled
    # SNMPTrapperFile=/tmp/zabbix_traps.tmp
    ### Option: StartSNMPTrapper
    # If 1, SNMP trapper process is started.
    #
    # Mandatory: no
    # Range: 0-1
    # Default:
    # Note: whether to enable the SNMP trapper; 0 = disabled (default), 1 = enabled (used together with SNMPTrapperFile)
    # StartSNMPTrapper=0
    ### Option: ListenIP
    # List of comma delimited IP addresses that the trapper should listen on.
    # Trapper will listen on all network interfaces if this parameter is missing.
    #
    # Mandatory: no
    # Default:
    # Note: IP address(es) the receiving side (trapper) listens on
    # ListenIP=0.0.0.0
    ListenIP=10.238.0.180
    ### Option: HousekeepingFrequency
    # How often Zabbix will perform housekeeping procedure (in hours).
    # Housekeeping is removing outdated information from the database.
    # To prevent Housekeeper from being overloaded, no more than 4 times HousekeepingFrequency
    # hours of outdated information are deleted in one housekeeping cycle, for each item.
    # To lower load on server startup housekeeping is postponed for 30 minutes after server start.
    # With HousekeepingFrequency=0 the housekeeper can be only executed using the runtime control option.
    # In this case the period of outdated information deleted in one housekeeping cycle is 4 times the
    # period since the last housekeeping cycle, but not less than 4 hours and not greater than 4 days.
    #
    # Mandatory: no
    # Range: 0-24
    # Default:
    # Note: how often, in hours, history, alerts and alarms are purged from the database to keep it small; keeping the default is recommended
    HousekeepingFrequency=24
    ### Option: MaxHousekeeperDelete
    # The table "housekeeper" contains "tasks" for housekeeping procedure in the format:
    # [housekeeperid], [tablename], [field], [value].
    # No more than 'MaxHousekeeperDelete' rows (corresponding to [tablename], [field], [value])
    # will be deleted per one task in one housekeeping cycle.
    # SQLite3 does not use this parameter, deletes all corresponding rows without a limit.
    # If set to 0 then no limit is used at all. In this case you must know what you are doing!
    #
    # Mandatory: no
    # Range: 0-1000000
    # Default:
    # Note: maximum number of rows deleted per housekeeper task; supported since 1.8.2
    MaxHousekeeperDelete=1000000
    ### Option: SenderFrequency
    # How often Zabbix will try to send unsent alerts (in seconds).
    #
    # Mandatory: no
    # Range: 5-3600
    # Default:
    # Note: how many seconds to wait before retrying alerts that failed to send
    SenderFrequency=30
    ### Option: CacheSize
    # Size of configuration cache, in bytes.
    # Shared memory size for storing host, item and trigger data.
    #
    # Mandatory: no
    # Range: 128K-8G
    # Default:
    # Note: shared memory reserved at startup for configuration data (hosts, items, triggers); size it to the number of hosts and items monitored; 32M or more is recommended
    CacheSize=8G
    ### Option: CacheUpdateFrequency
    # How often Zabbix will perform update of configuration cache, in seconds.
    #
    # Mandatory: no
    # Range: 1-3600
    # Default:
    # Note: how often Zabbix refreshes the configuration cache; if the configuration is rarely changed in the frontend, consider raising it
    CacheUpdateFrequency=300
    ### Option: StartDBSyncers
    # Number of pre-forked instances of DB Syncers.
    #
    # Mandatory: no
    # Range: 1-100
    # Default:
    # Note: number of processes that sync collected data from the cache to the database; tune it to the database server's I/O load and write capacity. A higher value writes faster but puts more I/O pressure on the database server.
    StartDBSyncers=20
    ### Option: HistoryCacheSize
    # Size of history cache, in bytes.
    # Shared memory size for storing history data.
    #
    # Mandatory: no
    # Range: 128K-2G
    # Default:
    # Note: shared memory set aside for collected history data; the larger it is, the lower the read pressure on the database
    HistoryCacheSize=2048M
    ### Option: HistoryIndexCacheSize
    # Size of history index cache, in bytes.
    # Shared memory size for indexing history cache.
    #
    # Mandatory: no
    # Range: 128K-2G
    # Default:
    # Note: supported since 3.0.0; size of the history index; one item needs about 100 bytes
    HistoryIndexCacheSize=2048M
    ### Option: TrendCacheSize
    # Size of trend cache, in bytes.
    # Shared memory size for storing trends data.
    #
    # Mandatory: no
    # Range: 128K-2G
    # Default:
    # Note: shared memory set aside for calculated trend data; this value affects database read pressure to some extent
    TrendCacheSize=512M
    ### Option: ValueCacheSize
    # Size of history value cache, in bytes.
    # Shared memory size for caching item history data requests.
    # Setting to 0 disables value cache.
    #
    # Mandatory: no
    # Range: 0,128K-64G
    # Default:
    # Note: shared memory set aside for caching requested item history data; increase it if there are many items
    ValueCacheSize=16G
    ### Option: Timeout
    # Specifies how long we wait for agent, SNMP device or external check (in seconds).
    #
    # Mandatory: no
    # Range: 1-30
    # Default:
    # Timeout=3
    # Note: timeout, in seconds, for communication with agents, SNMP devices and other external checks; consider raising it if collected data is incomplete, the network is busy, or the frontend shows client status changing frequently. If you raise it, check whether StartPollers needs to be raised as well.
    Timeout=10
    ### Option: TrapperTimeout
    # Specifies how many seconds trapper may spend processing new data.
    #
    # Mandatory: no
    # Range: 1-300
    # Default:
    # Note: processing timeout for trapper processes; adjust as needed
    TrapperTimeout=50
    ### Option: UnreachablePeriod
    # After how many seconds of unreachability treat a host as unavailable.
    #
    # Mandatory: no
    # Range: 1-3600
    # Default:
    # Note: how many seconds an agent may stay unreachable before the host is treated as unavailable; set it according to your situation. If the value is too small, a busy application on the agent side may cause false alerts
    # UnreachablePeriod=45
    ### Option: UnavailableDelay
    # How often host is checked for availability during the unavailability period, in seconds.
    #
    # Mandatory: no
    # Range: 1-3600
    # Default:
    # Note: how often, in seconds, a host is checked while it is unavailable. If data is collected normally but the frontend shows the agent status as abnormal, or the status stays abnormal although network and port are fine, consider increasing this value
    # UnavailableDelay=60
    ### Option: UnreachableDelay
    # How often host is checked for availability during the unreachability period, in seconds.
    #
    # Mandatory: no
    # Range: 1-3600
    # Default:
    # Note: when an agent is unreachable, how many seconds to wait before retrying; keeping the default is recommended, though it can be lowered while agents are being brought online and debugged
    # UnreachableDelay=15
    ### Option: AlertScriptsPath
    # Full path to location of custom alert scripts.
    # Default depends on compilation options.
    #
    # Mandatory: no
    # Default:
    # Note: path of the alert scripts; changing it is not recommended unless you are a developer
    # AlertScriptsPath=${datadir}/zabbix/alertscripts
    AlertScriptsPath=/home/zabbix/bin
    ### Option: ExternalScripts
    # Full path to location of external scripts.
    # Default depends on compilation options.
    #
    # Mandatory: no
    # Default:
    # Note: path of custom external scripts; changing it is not recommended unless you are a developer
    # ExternalScripts=${datadir}/zabbix/