Collecting a directory into HDFS

Using Flume to collect a directory requires the HDFS cluster to be running.

The spooldir source monitors a specified directory: whenever a new file appears in that directory, it is collected.
vi spool-hdfs.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# Note: never drop a file with a duplicate name into the monitored directory
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/logs2
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-
# Control how often the target folder is rolled
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# Control how often files are rolled: by time (seconds), size (bytes), and event count
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# File type of the generated files: the default is SequenceFile;
# DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
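A quick sketch (not Flume code, and the minute value is hypothetical) of what round = true, roundValue = 10, roundUnit = minute do to the %H%M escape in the sink path: the event's minute is rounded down to the nearest multiple of 10, so events from 14:30–14:39 all land in one directory.

```shell
# Round an event's minute down to its 10-minute bucket, the way the
# HDFS sink rounds the %H%M path escape when roundValue=10, roundUnit=minute.
minute=37                          # hypothetical event minute (14:37)
bucket=$(( minute / 10 * 10 ))     # integer division floors to the bucket
printf '%%H%%M resolves to 14%02d, not 1437\n' "$bucket"
# → %H%M resolves to 1430, not 1437
```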
mkdir /root/logs2
Note!!! The monitored directory must never receive two files with the same name. As soon as a duplicate filename appears, the source throws an error and stops working.
Start command:
bin/flume-ng agent -c ./conf -f ./conf/spool-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
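Because of the duplicate-name restriction above, a common pattern is to give every file dropped into the spool directory a unique name. A minimal sketch, using a temporary directory in place of /root/logs2 and GNU date's nanosecond timestamp for uniqueness:

```shell
# Drop uniquely named files into the spool directory so the spooldir
# source never sees a repeated filename. A temp dir stands in for /root/logs2.
SPOOL=$(mktemp -d)
for i in 1 2 3; do
  # nanosecond timestamp + loop counter => a unique name on every iteration
  echo "log line $i" > "$SPOOL/events-$(date +%s%N)-$i.log"
done
ls "$SPOOL" | wc -l   # → 3
```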
Collecting a file into HDFS
vi tail-hdfs.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/logs/test.log

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/tailout/%y-%m-%d/%H-%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# File type of the generated files: the default is SequenceFile;
# DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
mkdir /root/logs
Start command:
bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1
The exec source runs a shell command (e.g. tail -F sx.log) and collects changes to the file's data in real time.
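A minimal illustration of the data the exec source sees: tail -F follows a file by name and emits lines as they are appended. A temporary file stands in for /root/logs/test.log, and timeout(1) is used only to bound the demo, since tail -F never exits on its own.

```shell
# Show what tail -F hands to the exec source: the appended lines.
LOG=$(mktemp)                     # stand-in for /root/logs/test.log
echo "2021-01-01 event A" >> "$LOG"
echo "2021-01-01 event B" >> "$LOG"
# tail -F blocks forever, so bound this demo to 1 second
timeout 1 tail -F "$LOG" 2>/dev/null | head -n 2
rm -f "$LOG"
```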
Script to simulate data generation:
while true; do date >> /root/logs/test.log; sleep 0.5; done

or, as a script:

#!/bin/bash
while true
do
    date >> /root/logs/test.log
    sleep 1
done