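Preface
I maintained ELK years ago, but the site's log volume and server count were small back then, so a single machine was basically enough.
Today, however, there are 400+ web servers alone, Nginx logs exceed 50 GB per day, and other business systems add their own logs on top, so the old single-machine ELK clearly cannot support the current workload.
Planning
The business currently runs on Alibaba Cloud plus a self-built data center: Alibaba Cloud hosts the production services and the self-built data center serves as the disaster-recovery site. The goal is to ship production logs to the self-built data center in as close to real time as possible for data analysis.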
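Architecture diagram
Overview
1. Centralized log processing
At first I collected and processed logs on every machine with filebeat + logstash and sent them to elasticsearch. Logstash is a Java application and is memory-hungry, and the maintenance cost was high.
Later I switched to rsyslog: every server sends its logs to a single Alibaba Cloud server, which stores them and relays them via rsyslog to the self-built data center. This gives roughly millisecond-level synchronization.
2. Decoupling
Originally Logstash wrote logs straight into Elasticsearch, so whenever the Elasticsearch cluster had to be restarted or maintained the logs had nowhere to go, and some loss was unavoidable.
With Redis in between, data can be buffered even while elasticsearch is under maintenance. It also draws a clear line between Logstash-shipper and Logstash-indexer.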
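The buffering idea can be illustrated with a small self-contained Python sketch. It models the Redis list with an in-memory deque rather than a real Redis client, and the names `ListBuffer`, `ship`, and `index_batch` are hypothetical; they only mirror the shipper and indexer roles described here.

```python
from collections import deque


class ListBuffer:
    """In-memory stand-in for a Redis list: push on one end, pop from the other."""

    def __init__(self):
        self._items = deque()

    def rpush(self, item):
        self._items.append(item)

    def lpop(self):
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


def ship(buffer, lines):
    """Shipper role: push raw log lines, regardless of the indexer's state."""
    for line in lines:
        buffer.rpush(line)


def index_batch(buffer, sink, limit):
    """Indexer role: drain up to `limit` events from the buffer into `sink`."""
    moved = 0
    while moved < limit:
        event = buffer.lpop()
        if event is None:
            break
        sink.append(event)
        moved += 1
    return moved


buffer, indexed = ListBuffer(), []
ship(buffer, ["line-1", "line-2", "line-3"])  # indexer side is down: events queue up
backlog = len(buffer)
index_batch(buffer, indexed, limit=10)        # indexer side is back: backlog drains in order
```

The shipper never needs to know whether the indexer is running, which is the point of putting a queue between them.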
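3. Multiple instances per host
The self-built data center set aside three servers with 256 GB of RAM each for the ELK cluster. However, the official recommendation is to keep the JVM heap below 32 GB: roughly speaking, above that threshold the JVM can no longer use compressed object pointers, and the uncompressed pointers cost more memory and CPU resources.
For example, a 48 GB heap can even perform worse than a 20 GB one; see the official link for the details.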
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
So, to avoid wasting resources, I deployed two Elasticsearch nodes on each machine, six nodes in total.
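The sizing arithmetic behind this choice can be written out as a short Python sketch. The 31 GB heap value is the one set in jvm.options later in this article; treating all remaining RAM as available to the operating system's file cache is a simplifying assumption, since logstash and the other services on the same hosts also need memory.

```python
RAM_GB = 256         # RAM per server, from the server plan
HEAP_GB = 31         # heap per Elasticsearch node, kept under the ~32 GB threshold
NODES_PER_HOST = 2   # Elasticsearch instances per server
HOSTS = 3            # elk01, elk02, elk03

heap_per_host = HEAP_GB * NODES_PER_HOST      # 62 GB of heap per server
left_for_os = RAM_GB - heap_per_host          # 194 GB left for the OS and other services
total_nodes = NODES_PER_HOST * HOSTS          # 6 nodes in the cluster

print(f"heap per host: {heap_per_host} GB, left for OS: {left_for_os} GB, nodes: {total_nodes}")
```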
4. Log types collected
- nginx-access.log
- nginx-error.log
- php-error.log
- php-slow.log
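- action.log
The first four are self-explanatory. action.log is a JSON-formatted log, emitted by the application, that records key user actions in the admin back end.
5. Server plan
Hostnames and public IPs have been anonymized. The ELK stack runs on 3 physical machines: 6 Elasticsearch cluster nodes, 6 logstash-indexer instances, 1 logstash-shipper, 1 redis, 1 kibana, and 1 nginx.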
Hostname | Spec | Purpose | Notes |
rsyslog-relay | 8 cores, 16 GB RAM, 1 TB disk | Collects the required logs from all servers, stores them centrally, and forwards them. | CentOS 7.3 |
elk01 | 16 cores, 256 GB RAM, 9.8 TB disk | nginx, kibana, rsyslog-server, logstash-shipper, elk01-indexer, elk01-indexer2, elk01-elasticsearch, elk01-elasticsearch2 | All three ELK servers have the same spec and run CentOS 7.3. Storage is 12 x 1.8 TB 10k SAS disks in RAID 0+1. |
elk02 | 16 cores, 256 GB RAM, 9.8 TB disk | elk02-indexer, elk02-indexer2, elk02-elasticsearch, elk02-elasticsearch2 | |
elk03 | 16 cores, 256 GB RAM, 9.8 TB disk | elk03-indexer, elk03-indexer2, elk03-elasticsearch, elk03-elasticsearch2 | |
Deployment
rsyslog
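Upgrade
Logs are stored centrally in the local data center. CentOS 7.3 ships rsyslog v7, so upgrade to v8 first, because the v8 rsyslog-relp module has a retransmission mechanism that guards against data loss.
Remove the old version, add the v8 yum repository, and install the new version.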
[root@rsyslog-relay ~]# rpm -qa|grep rsyslog
rsyslog-7.4.7-16.el7.x86_64
[root@rsyslog-relay ~]# yum remove -y rsyslog-7.4.7-16.el7.x86_64
vim /etc/yum.repos.d/rsyslog_v8.repo
[rsyslog_v8]
baseurl = http://rpms.adiscon.com/v8-stable/epel-$releasever/$basearch
enabled = 1
gpgcheck = 1
gpgkey = http://rpms.adiscon.com/RPM-GPG-KEY-Adiscon
name = Rsyslog version 8 repository
[root@rsyslog-relay ~]# yum install rsyslog rsyslog-relp -y
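rsyslog client configuration
The nginx log files are named by business area (admin back-end access log, front-end access log, payment log), but they are tagged by type with only two tags: nginx-access and nginx-error.
The configuration file is /etc/rsyslog.conf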
$ModLoad imuxsock
$ModLoad imjournal
$ModLoad imfile
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IncludeConfig /etc/rsyslog.d/*.conf
*.info;mail.none;authpriv.none;cron.none /var/log/messages
authpriv.* /var/log/secure
mail.* -/var/log/maillog
cron.* /var/log/cron
*.emerg :omusrmsg:*
uucp,news.crit /var/log/spooler
local7.* /var/log/boot.log

##########Start Nginx Log File#################
# Each monitored file needs its own state file name, so the state file
# names below carry the log name as a suffix while the tags stay shared.
$InputFileName /var/log/nginx/access-admin.log
$InputFileTag site-web1-nginx-access:
$InputFileStateFile site-web1-nginx-access-admin
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1

$InputFileName /var/log/nginx/error-admin.log
$InputFileTag site-web1-nginx-error:
$InputFileStateFile site-web1-nginx-error-admin
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1

$InputFileName /var/log/nginx/access-frontend.log
$InputFileTag site-web1-nginx-access:
$InputFileStateFile site-web1-nginx-access-frontend
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1

$InputFileName /var/log/nginx/error-frontend.log
$InputFileTag site-web1-nginx-error:
$InputFileStateFile site-web1-nginx-error-frontend
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1

$InputFileName /var/log/nginx/access-pay.log
$InputFileTag site-web1-nginx-access:
$InputFileStateFile site-web1-nginx-access-pay
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1

$InputFileName /var/log/nginx/error-pay.log
$InputFileTag site-web1-nginx-error:
$InputFileStateFile site-web1-nginx-error-pay
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1
######################End Of Nginx Log File################

######################Start Of Action Log File#############
$InputFileName /var/log/php-fpm/action_log.log
$InputFileTag site-web1-action:
$InputFileStateFile site-web1-action
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1
######################End Of Action Log File###############

#####################Start PHP Log File###################
$InputFileName /var/log/php-fpm/www-slow.log
$InputFileTag site-web1-php-slow:
$InputFileStateFile site-web1-php-slow
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1
$InputFileReadMode 2

$InputFileName /var/log/php-fpm/error.log
$InputFileTag site-web1-php-error:
$InputFileStateFile site-web1-php-error
$InputFileSeverity debug
$InputRunFileMonitor
$InputFilePollInterval 1

$WorkDirectory /var/lib/rsyslog
$ActionQueueType LinkedList
$ActionQueueFileName srvrfwd
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
####################End Of PHP log File#####################

###################Start Log Forward##############################################
# Replace <rsyslog-relay-internal-ip> with the internal address of the Alibaba Cloud rsyslog server.
if $programname == 'site-web1-nginx-access' then @@<rsyslog-relay-internal-ip>:514
if $programname == 'site-web1-nginx-error' then @@<rsyslog-relay-internal-ip>:514
if $programname == 'site-web1-php-slow' then @@<rsyslog-relay-internal-ip>:514
if $programname == 'site-web1-php-error' then @@<rsyslog-relay-internal-ip>:514
if $programname == 'site-web1-action' then @@<rsyslog-relay-internal-ip>:514
###################End Of log Forward##############################################
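With 400+ web servers, writing these repetitive imfile blocks by hand is error-prone. The following Python sketch shows one way the nginx part could be generated. It is not how the original configuration was produced; the helper name `imfile_block` and the choice to suffix each state file with the log name are assumptions made for illustration. The sketch leaves out the poll interval directive for brevity.

```python
def imfile_block(path, tag, state_suffix):
    """Render one legacy-syntax imfile input with a state file name unique to the file."""
    return "\n".join([
        f"$InputFileName {path}",
        f"$InputFileTag {tag}:",
        f"$InputFileStateFile {tag}-{state_suffix}",
        "$InputFileSeverity debug",
        "$InputRunFileMonitor",
    ])


HOST = "site-web1"
NGINX_LOGS = ["admin", "frontend", "pay"]   # business areas named in the article

blocks = []
for name in NGINX_LOGS:
    for kind in ("access", "error"):
        blocks.append(imfile_block(
            path=f"/var/log/nginx/{kind}-{name}.log",
            tag=f"{HOST}-nginx-{kind}",     # shared tag per log type
            state_suffix=name,              # keeps state file names distinct
        ))

config = "\n\n".join(blocks)
state_files = [line.split()[1] for line in config.splitlines()
               if line.startswith("$InputFileStateFile")]
print(config)
```

Changing `HOST` and rerunning yields the same block layout for another server.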
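rsyslog relay server configuration on Alibaba Cloud (rsyslog-relay)
1. The main configuration file is /etc/rsyslog.conf. The per-server configuration files are pulled in via include from /etc/rsyslog.d/; their names end in .conf, matching the $IncludeConfig directive on line 7 of the main configuration file.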
$ModLoad omrelp
$ModLoad imudp
$UDPServerRun 514
$ModLoad imtcp
$InputTCPServerRun 514
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IncludeConfig /etc/rsyslog.d/*.conf
$umask 0022
*.info;mail.none;authpriv.none;cron.none /var/log/messages
authpriv.* /var/log/secure
mail.* -/var/log/maillog
cron.* /var/log/cron
*.emerg :omusrmsg:*
uucp,news.crit /var/log/spooler
local7.* /var/log/boot.log
# Replace <local-datacenter-rsyslog-server> with the rsyslog server in the local data center.
*.* :omrelp:<local-datacenter-rsyslog-server>:20514
2. One configuration file per server. The example for site-web1 is as follows:
$template site-web1-nginx-access,"/data/rsyslog/nginx/site/site-web1-nginx-access.log"
$template site-web1-nginx-error,"/data/rsyslog/nginx/site/site-web1-nginx-error.log"
$template site-web1-php-slow,"/data/rsyslog/php/site/site-web1-php-slow.log"
$template site-web1-php-error,"/data/rsyslog/php/site/site-web1-php-error.log"
$template site-web1-action,"/data/rsyslog/php/site/site-web1-action.log"

if $programname == 'site-web1-nginx-access' then ?site-web1-nginx-access
if $programname == 'site-web1-nginx-error' then ?site-web1-nginx-error
if $programname == 'site-web1-php-slow' then ?site-web1-php-slow
if $programname == 'site-web1-php-error' then ?site-web1-php-error
if $programname == 'site-web1-action' then ?site-web1-action
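Since every web server needs one such file on the relay, the per-host file lends itself to generation. Below is a minimal Python sketch under the assumption that all hosts follow the same five-tag layout as site-web1; `relay_conf` and `LOG_KINDS` are hypothetical names, not part of rsyslog.

```python
# Tag suffix -> subdirectory under /data/rsyslog, taken from the site-web1 example.
LOG_KINDS = {
    "nginx-access": "nginx",
    "nginx-error": "nginx",
    "php-slow": "php",
    "php-error": "php",
    "action": "php",
}


def relay_conf(host, site="site"):
    """Render the templates and routing rules for one host."""
    templates, rules = [], []
    for kind, subdir in LOG_KINDS.items():
        tag = f"{host}-{kind}"
        templates.append(f'$template {tag},"/data/rsyslog/{subdir}/{site}/{tag}.log"')
        rules.append(f"if $programname == '{tag}' then ?{tag}")
    return "\n".join(templates + [""] + rules) + "\n"


conf = relay_conf("site-web1")
print(conf)
```

Writing the result to /etc/rsyslog.d/site-web1.conf would reproduce the example for that host; the file name itself is an assumption, as the article only states that names end in .conf.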
rsyslog configuration in the local data center (elk01)
The main configuration file /etc/rsyslog.conf is shown below. The site configuration files under /etc/rsyslog.d/ are identical to the ones on the relay server.
$ModLoad imrelp
$ModLoad omrelp
$InputRELPServerRun 20514
$WorkDirectory /var/lib/rsyslog
$DirCreateMode 0755
$FileCreateMode 0644
$FileOwner logstash
$DirOwner logstash
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IncludeConfig /etc/rsyslog.d/*.conf
$OmitLocalLogging on
$IMJournalStateFile imjournal.state
*.info;mail.none;authpriv.none;cron.none /var/log/messages
authpriv.* /var/log/secure
mail.* -/var/log/maillog
cron.* /var/log/cron
*.emerg :omusrmsg:*
uucp,news.crit /var/log/spooler
local7.* /var/log/boot.log
$PrivDropToGroup logstash
At this point, the logs from all servers are collected centrally through rsyslog.
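ELK installation
Everything was installed via yum at the latest 6.x version. Per the plan, nginx on elk01 and redis on elk02 were also installed with yum, so I won't go through them one by one.
elasticsearch yum repository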
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
logstash yum repository
[logstash-6.x]
name=Elastic repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
kibana yum repository
[kibana-6.x]
name=Kibana repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
logstash-shipper configuration
After installation, logstash's configuration directory defaults to /etc/logstash. Copy it to serve as the logstash-shipper configuration directory.
cp -r /etc/logstash /etc/logstash-shipper
chown -R logstash.logstash /etc/logstash-shipper
Main configuration file /etc/logstash-shipper/logstash.yml
path.data: /var/lib/logstash-shipper
path.config: /etc/logstash-shipper/conf.d
path.logs: /var/log/logstash/shipper
Create the required directories and give ownership to the logstash user
mkdir -p /var/lib/logstash-shipper && chown logstash.logstash /var/lib/logstash-shipper
mkdir -p /var/log/logstash/shipper && chown logstash.logstash /var/log/logstash/shipper
Site configuration file /etc/logstash-shipper/conf.d/shipper.conf
The site configuration has grown to more than 3,000 lines so far, so posting all of it would be redundant. Here is one site's configuration for reference.
input {
  file {
    path => "/data/rsyslog/php/site/site-*-php-error.log"
    type => "site-php-error"
    sincedb_path => "/data/sincedb/site"
  }
  file {
    path => "/data/rsyslog/php/site/site-*-php-slow.log"
    type => "site-php-slow"
    sincedb_path => "/data/sincedb/site"
  }
  file {
    path => "/data/rsyslog/nginx/site/site-*-nginx-error.log"
    type => "site-nginx-error"
    sincedb_path => "/data/sincedb/site"
  }
  file {
    path => "/data/rsyslog/nginx/site/site-*-nginx-access.log"
    type => "site-nginx-access"
    sincedb_path => "/data/sincedb/site"
  }
  file {
    path => "/data/rsyslog/php/site/site*action.log"
    type => "site-action"
    sincedb_path => "/data/sincedb/site"
  }
}
output {
  redis {
    host => "<elk02-internal-ip>"
    port => "6379"
    db => "8"
    data_type => "list"
    # Must be the same key the indexers read from.
    key => "kosun-log"
  }
}
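To see how a file path ends up with a given `type`, the glob patterns of the shipper can be tried out in Python. This is only an approximation: it uses Python's `fnmatch`, whose `*` also matches path separators, so it is not an exact model of Logstash's own glob handling. The function name `event_type` is hypothetical.

```python
from fnmatch import fnmatch

# (glob pattern, type) pairs copied from the shipper's file inputs.
INPUTS = [
    ("/data/rsyslog/php/site/site-*-php-error.log", "site-php-error"),
    ("/data/rsyslog/php/site/site-*-php-slow.log", "site-php-slow"),
    ("/data/rsyslog/nginx/site/site-*-nginx-error.log", "site-nginx-error"),
    ("/data/rsyslog/nginx/site/site-*-nginx-access.log", "site-nginx-access"),
    ("/data/rsyslog/php/site/site*action.log", "site-action"),
]


def event_type(path):
    """Return the type of the first input whose pattern matches, or None."""
    for pattern, type_name in INPUTS:
        if fnmatch(path, pattern):
            return type_name
    return None


for sample in ("/data/rsyslog/nginx/site/site-web1-nginx-access.log",
               "/data/rsyslog/php/site/site-web1-action.log"):
    print(sample, "->", event_type(sample))
```

The indexers later select events with conditions such as `[type] =~ '^.+nginx-access'`, which is why the type assigned here matters downstream.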
Service startup script: /etc/systemd/system/logstash-shipper.service
[Unit]
Description=logstash-shipper

[Service]
Type=simple
User=logstash
Group=logstash
# Load env vars from /etc/default/ and /etc/sysconfig/ if they exist.
# Prefixing the path with '-' makes it try to load, but if the file doesn't
# exist, it continues onward.
EnvironmentFile=-/etc/default/logstash
EnvironmentFile=-/etc/sysconfig/logstash
ExecStart=/usr/share/logstash/bin/logstash "--path.settings" "/etc/logstash-shipper"
Restart=always
WorkingDirectory=/
Nice=19
LimitNOFILE=16384

[Install]
WantedBy=multi-user.target
logstash-indexer configuration
Each server runs two indexers
logstash-indexer1
Copy the configuration directory
cp -r /etc/logstash /etc/logstash-indexer
chown -R logstash.logstash /etc/logstash-indexer
Main configuration file /etc/logstash-indexer/logstash.yml
path.data: /var/lib/logstash
path.config: /etc/logstash-indexer/conf.d
path.logs: /var/log/logstash/indexer
Create the required directories and give ownership to the logstash user
mkdir -p /var/log/logstash/indexer && chown logstash.logstash /var/log/logstash/indexer
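Each log type has its own configuration file, placed under /etc/logstash-indexer/conf.d
/etc/logstash-indexer/conf.d
├── action.conf          custom application log of user actions
├── nginx_access.conf    nginx access log
├── nginx_error.conf     nginx error log
├── php_error.conf       php error log
└── php_slow.conf        php-slow log
The configuration files are as follows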
action.conf
input {
  redis {
    host => "<elk02-internal-ip>"
    port => "6379"
    db => "8"
    data_type => "list"
    key => "kosun-log"
  }
}
filter {
  if [type] =~ '^.+action' {
    # Strip the syslog prefix so that only the JSON payload remains.
    mutate {
      gsub => [ "message", "^.+-action: ", "" ]
    }
    json {
      source => "message"
    }
    date {
      match => ["time", "yyyy-MM-dd HH:mm:ss"]
      target => "@timestamp"
      locale => "en"
      timezone => "Asia/Shanghai"
      remove_field => ["time"]
    }
  }
}
output {
  if [type] =~ '^.+action' {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "logstash-action-%{+YYYY.MM.dd}"
    }
  }
}
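What the action filter does to one event can be mirrored in a few lines of Python: strip the syslog prefix, parse the JSON payload, and turn the local `time` field into a UTC timestamp. The sample line and its `user` and `op` fields are made up for illustration, and the sketch assumes Asia/Shanghai is a fixed UTC+8 offset without daylight saving.

```python
import json
import re
from datetime import datetime, timedelta, timezone

SHANGHAI = timezone(timedelta(hours=8))  # assumption: fixed UTC+8, no DST


def parse_action(message):
    """Mirror the mutate/gsub, json, and date steps for one log line."""
    payload = re.sub(r"^.+-action: ", "", message)   # drop everything up to the tag
    event = json.loads(payload)                       # expand the JSON fields
    local = datetime.strptime(event.pop("time"), "%Y-%m-%d %H:%M:%S")
    event["@timestamp"] = local.replace(tzinfo=SHANGHAI).astimezone(timezone.utc)
    return event


# Hypothetical line as it might look after passing through rsyslog.
sample = ('Jan 20 10:00:00 web1 site-web1-action: '
          '{"time": "2018-01-20 10:00:00", "user": "admin", "op": "login"}')
event = parse_action(sample)
print(event)
```

The `time` field is removed after conversion, matching `remove_field => ["time"]` in the configuration.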
nginx_access.conf
input {
  redis {
    host => "<elk02-internal-ip>"
    port => "6379"
    db => "8"
    data_type => "list"
    key => "kosun-log"
  }
}
filter {
  if [type] =~ '^.+nginx-access' {
    # The fingerprint is used as the document id in the output below.
    fingerprint {
      method => "SHA1"
      key => "^.+nginx-access"
    }
    grok {
      match => [ "message" , "%{COMBINEDAPACHELOG} %{DATA:msec} %{QUOTEDSTRING:x_forward_ip} %{DATA:server_name} %{DATA:request_time} %{DATA:upstream_response_time} %{DATA:scheme} %{GREEDYDATA:extra_fields}" ]
      overwrite => [ "message" ]
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS Z"]
      #target => "@timestamp"
      locale => "en"
      timezone => "Asia/Shanghai"
    }
    # Strip double quotes from the quoted fields.
    mutate {
      gsub => [
        "agent", "\"", "",
        "referrer", "\"", "",
        "x_forward_ip", "\"", "",
        "extra_fields", "\"", ""
      ]
    }
    if [extra_fields] =~ /^{.*}$/ {
      mutate {
        gsub => ["extra_fields", "\"","",
          "extra_fields", "\\x0A","",
          "extra_fields", "\\x22",'\"',
          "extra_fields", "(\\)",""
        ]
      }
      json {
        source => "extra_fields"
        target => "extra_fields_json"
      }
    }
    geoip {
      source => "clientip"
      fields => ["city_name","location"]
    }
  }
}
output {
  if [type] =~ '^.+nginx-access' {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "logstash-%{type}-%{+YYYY.MM.dd}"
      document_type => "%{type}"
      # flush_size => 50000
      # idle_flush_time => 10
      sniffing => true
      template_overwrite => true
      document_id => "%{fingerprint}"
    }
  }
}
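Using the fingerprint as `document_id` means that a log line delivered twice overwrites itself instead of creating a duplicate document. The Python sketch below illustrates the idea with a keyed SHA1 digest over the message. It assumes the plugin's keyed SHA1 mode behaves like an HMAC over the `message` field; the dictionary `store` stands in for an index, and the sample lines are invented.

```python
import hashlib
import hmac

KEY = "^.+nginx-access"  # the key string used in the fingerprint block


def fingerprint(message, key=KEY):
    """Keyed SHA1 digest of the message as a 40-character hex string."""
    return hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.sha1).hexdigest()


def index_events(store, messages):
    """Write each message under its fingerprint; a repeated message overwrites itself."""
    for message in messages:
        store[fingerprint(message)] = message
    return store


line_a = '203.0.113.7 - - [20/Jan/2018:10:00:00 +0800] "GET / HTTP/1.1" 200 612'
line_b = '203.0.113.8 - - [20/Jan/2018:10:00:01 +0800] "GET /pay HTTP/1.1" 200 128'
store = index_events({}, [line_a, line_b, line_a])   # line_a arrives twice
print(len(store), "documents stored")
```

One consequence of this design is that two genuinely distinct requests with byte-identical log lines would also collapse into one document.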
nginx_error.conf
input {
  redis {
    host => "<elk02-internal-ip>"
    port => "6379"
    db => "8"
    data_type => "list"
    key => "kosun-log"
  }
}
filter {
  if [type] =~ '^.+nginx-error' {
    fingerprint {
      method => "SHA1"
      key => "^.+nginx-error"
    }
    # The second pattern is a fallback for lines the first one does not match.
    grok {
      match => {
        "message" => [
          "(?<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[%{DATA:err_severity}\] (%{NUMBER:pid:int}#%{NUMBER}: \*%{NUMBER}|\*%{NUMBER}) %{DATA:err_message}(?:, client: (?<clientip>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server})(?:, request: %{QS:request})?(?:, host: %{QS:client_ip})?(?:, referrer: \"%{URI:referrer})?",
          "%{DATESTAMP:timestamp} \[%{DATA:err_severity}\] %{GREEDYDATA:err_message}"
        ]
      }
    }
    date {
      match => ["timestamp" , "YYYY/MM/dd HH:mm:ss"]
      locale => "en"
      timezone => "Asia/Shanghai"
      remove_field => [ "timestamp" ]
    }
  }
}
output {
  if [type] =~ '^.+nginx-error' {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "logstash-nginx-error-%{+YYYY.MM.dd}"
      document_id => "%{fingerprint}"
    }
  }
}
php_error.conf
input {
  redis {
    host => "<elk02-internal-ip>"
    port => "6379"
    db => "8"
    data_type => "list"
    key => "kosun-log"
  }
}
output {
  if [type] =~ '^.+php-error' {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "logstash-php-error-%{+YYYY.MM.dd}"
    }
  }
}
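The date suffix in the index name is derived from the event's @timestamp, which Logstash keeps in UTC. An event logged in the early morning Shanghai time therefore lands in the previous day's index. A short Python sketch of that naming rule follows; the function name `index_name` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def index_name(prefix, timestamp):
    """Build a daily index name from the UTC date of a timezone-aware timestamp."""
    return f"{prefix}-{timestamp.astimezone(timezone.utc):%Y.%m.%d}"


shanghai = timezone(timedelta(hours=8))                    # fixed UTC+8 offset
morning = datetime(2018, 1, 20, 7, 30, tzinfo=shanghai)   # 07:30 local time
name = index_name("logstash-php-error", morning)
print(name)  # logstash-php-error-2018.01.19
```

This is worth keeping in mind when deleting or querying indices by date.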
php_slow.conf
input {
  redis {
    host => "<elk02-internal-ip>"
    port => "6379"
    db => "8"
    data_type => "list"
    key => "kosun-log"
  }
}
filter {
  if [type] =~ '^.+slow' {
    # Lines that do not start a new entry are appended to the previous event.
    multiline {
      pattern => "\[\d{2}-"
      negate => true
      what => "previous"
    }
  }
}
output {
  if [type] =~ '^.+slow' {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "logstash-php-slow-%{+YYYY.MM.dd}"
    }
  }
}
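The multiline settings (`negate => true`, `what => "previous"`) mean: any line that does not match the pattern is glued onto the event before it. The Python sketch below reproduces that rule with the same regular expression. The sample slow-log lines are invented, and treating a non-matching first line as its own event is an assumption of this sketch.

```python
import re

START = re.compile(r"\[\d{2}-")   # same pattern as in the multiline block


def join_multiline(lines):
    """Group lines into events; non-matching lines are appended to the previous event."""
    events = []
    for line in lines:
        if START.search(line) or not events:
            events.append(line)            # a new entry begins here
        else:
            events[-1] += "\n" + line      # continuation of the previous entry
    return events


lines = [
    "[20-Jan-2018 10:00:00]  [pool www] pid 1234",
    "script_filename = /var/www/index.php",
    "[0x00007f1a2b3c4d5e] sleep() /var/www/index.php:3",
    "[20-Jan-2018 10:00:05]  [pool www] pid 1240",
    "script_filename = /var/www/pay.php",
]
events = join_multiline(lines)
print(len(events), "events")
```

Note that the stack-frame line starting with `[0x` does not match the pattern, because the pattern requires two digits followed by a hyphen.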
Service startup script /etc/systemd/system/logstash-indexer.service
[Unit]
Description=logstash-indexer

[Service]
Type=simple
User=logstash
Group=logstash
# Load env vars from /etc/default/ and /etc/sysconfig/ if they exist.
# Prefixing the path with '-' makes it try to load, but if the file doesn't
# exist, it continues onward.
EnvironmentFile=-/etc/default/logstash
EnvironmentFile=-/etc/sysconfig/logstash
ExecStart=/usr/share/logstash/bin/logstash "--path.settings" "/etc/logstash-indexer"
Restart=always
WorkingDirectory=/
Nice=19
LimitNOFILE=16384

[Install]
WantedBy=multi-user.target
logstash-indexer2 uses the same configuration files; only the corresponding directories and the startup script need to be changed.
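Elasticsearch cluster deployment
Each server runs two elasticsearch instances: one installed via yum, the other unpacked from the downloaded zip archive.
The yum installation places its configuration files in /etc/elasticsearch
Taking elk01 as the example: elasticsearch instance 1 configuration
Set the heap size in jvm.options to 31g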
-Xms31g
-Xmx31g
Main configuration file /etc/elasticsearch/elasticsearch.yml
#============================cluster setting==============================
cluster.name: elk
cluster.routing.allocation.same_shard.host: true
#============================node setting=================================
node.name: elk01
node.master: true
node.data: true
#============================path setting=================================
path.data: /data/es-data
path.logs: /var/log/elasticsearch
#============================memory setting===============================
bootstrap.memory_lock: false
#============================network setting==============================
network.host: <elk01-internal-ip>
http.port: 9200
transport.tcp.port: 9300
#============================thread_pool setting==========================
thread_pool.search.queue_size: 10000
#============================discovery setting============================
discovery.zen.ping.unicast.hosts: ["elk01:9300", "elk02:9300", "elk03:9300"]
discovery.zen.minimum_master_nodes: 2
#============================gateway setting==============================
gateway.recover_after_nodes: 4
gateway.recover_after_time: 5m
gateway.expected_nodes: 5
indices.recovery.max_bytes_per_sec: 800mb
http.cors.enabled: true
http.cors.allow-origin: "*"
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.graph.enabled: false
xpack.watcher.enabled: false
Create the data directory and set ownership
mkdir -p /data/es-data && chown elasticsearch.elasticsearch /data/es-data
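Service startup script: shipped with the yum package, no changes needed: /usr/lib/systemd/system/elasticsearch.service
elk01 elasticsearch instance 2 configuration
This instance was set up from the zip archive
Download the archive, unpack it, move it to the target directory, and set ownership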
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.1.2.zip
unzip elasticsearch-6.1.2.zip
mv elasticsearch-6.1.2 /usr/local/
chown -R elasticsearch.elasticsearch /usr/local/elasticsearch-6.1.2
Set the heap size in /usr/local/elasticsearch-6.1.2/config/jvm.options to 31g
Configuration file /usr/local/elasticsearch-6.1.2/config/elasticsearch.yml
#============================cluster setting==============================
cluster.name: elk
cluster.routing.allocation.same_shard.host: true
#node.max_local_storage_nodes: 2
#============================node setting=================================
node.name: elk01-2
node.master: false
node.data: true
#============================path setting=================================
path.data: /data/es-data2
path.logs: /var/log/elasticsearch2
#============================memory setting===============================
bootstrap.memory_lock: false
#============================network setting==============================
network.host: <elk01-internal-ip>
http.port: 9201
transport.tcp.port: 9301
#============================thread_pool setting==========================
thread_pool.search.queue_size: 10000
#============================discovery setting============================
discovery.zen.ping.unicast.hosts: ["elk01:9300", "elk02:9300", "elk03:9300"]
discovery.zen.minimum_master_nodes: 2
#============================gateway setting==============================
gateway.recover_after_nodes: 4
gateway.recover_after_time: 5m
gateway.expected_nodes: 5
indices.recovery.max_bytes_per_sec: 800mb
http.cors.enabled: true
http.cors.allow-origin: "*"
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.graph.enabled: false
xpack.watcher.enabled: false
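The value `discovery.zen.minimum_master_nodes: 2` follows the usual majority rule for master-eligible nodes. The Python sketch below works it out for this cluster. Only the elk01 and elk01-2 settings are shown in this article; that elk02 and elk03 follow the same pattern (first instance master-eligible, second instance data-only) is an assumption.

```python
def minimum_master_nodes(master_eligible):
    """Quorum size: a strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


# node.master per node; the elk02/elk03 entries are assumed, not taken from the article.
NODE_MASTER = {
    "elk01": True, "elk01-2": False,
    "elk02": True, "elk02-2": False,
    "elk03": True, "elk03-2": False,
}

eligible = sum(NODE_MASTER.values())       # 3 master-eligible nodes
quorum = minimum_master_nodes(eligible)    # 2
print(f"{eligible} master-eligible nodes, quorum {quorum}")
```

With one master-eligible node per physical server, losing a single server still leaves a quorum of two.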
Create the required directories and set ownership
mkdir -p /data/es-data2 && chown elasticsearch.elasticsearch /data/es-