python demo t.py
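import json

# Build three sample records; "date" is a Unix epoch timestamp in milliseconds.
lst = []
for i in range(3):
    lst.append(dict(a=i + 1, b=4.3, c='world', date=1554972722911))

fi = '/Applications/logstash-5.6.3/test1.json'
with open(fi, 'w') as f:
    # The body of this block is missing from the original post.
    # Assumption: one JSON object per line.
    for item in lst:
        f.write(json.dumps(item) + '\n')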
Logstash demo t.conf
input {
  file {
    path => "/Applications/logstash-5.6.3/test1.json"
    start_position => "beginning"
    codec => "json"
  }
}
filter {
  date {
    match => [ "date", "MMM dd yyyy HH:mm:ss", "UNIX_MS" ]
    timezone => "Asia/Shanghai"
    target => "date"
  }
}
output {
  elasticsearch {
    hosts => ["xx.xx.xx.xx:8080"]
    index => "t_l-%{+YYYY.MM.dd}"
    document_type => "tt"
    manage_template => false
    template_name => "t_temp"
  }
}
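For reference, the date value in the sample records is a Unix epoch in milliseconds, which is the form the UNIX_MS pattern in the date filter is meant to match. The following is a minimal Python sketch of that conversion, not what Logstash runs internally. The helper name epoch_ms_to_datetime is made up for illustration, and a fixed UTC+8 offset stands in for Asia/Shanghai, an assumption that holds for dates in 2019.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset used as a stand-in for the "Asia/Shanghai" zone (assumption).
SHANGHAI = timezone(timedelta(hours=8))

def epoch_ms_to_datetime(epoch_ms, tz=timezone.utc):
    """Convert a Unix epoch in milliseconds to a timezone-aware datetime."""
    seconds, millis = divmod(epoch_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=tz) + timedelta(milliseconds=millis)

utc_time = epoch_ms_to_datetime(1554972722911)
local_time = epoch_ms_to_datetime(1554972722911, SHANGHAI)
print(utc_time.isoformat())    # the instant expressed in UTC
print(local_time.isoformat())  # the same instant expressed in UTC+8
```

Both values describe the same instant; only the displayed wall-clock time differs.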
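At first I assumed I had misconfigured the parameters of the file input plugin. Searching the official community forum turned up this thread: https://discuss.elastic.co/t/file-plugin-doesnt-read-file/49052/3. The poster had exactly the same problem as me, and a reply blamed the ignore_older parameter; once the poster changed its value, the problem went away. I thought I had found my answer and went to read the definition of the ignore_older parameter.
With no other leads, I reread the file input plugin documentation from the top. The section Tracking_of_current_position_in_watched_files explains that the plugin uses a file called sincedb to record which files Logstash has already read, along with the position where the last read stopped, so that the next read can resume from there. So sincedb was the place to start investigating. I found the sincedb file under the data directory: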
➜ file pwd
➜ file ll
total 8
drwxr-xr-x 3 ivanli admin 102 4 19 16:39 ./
drwxr-xr-x 3 ivanli admin 102 4 18 15:12 ../
-rw-r--r-- 1 ivanli admin 17 4 19 16:39 .sincedb_34488dda9a79102a6b4436bfb0f592d2
➜ file cat .sincedb_34488dda9a79102a6b4436bfb0f592d2
13326659 1 2 172
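Each line in the file holds four numbers: the inode number, the major device number, the minor device number, and the current byte offset.
The inode, major device, and minor device are file system concepts of the operating system. The current byte offset is the position where the last read of the file stopped.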
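To make the four fields concrete, here is a small Python sketch that parses one sincedb line. It assumes the four-field layout shown above; other plugin versions may write a different layout. The names SincedbEntry and parse_sincedb_line are made up for illustration and are not part of Logstash.

```python
from collections import namedtuple

# Field names follow the description above; the type itself is hypothetical.
SincedbEntry = namedtuple(
    "SincedbEntry", "inode major_device minor_device byte_offset"
)

def parse_sincedb_line(line):
    """Split one four-field sincedb line into named integer fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    return SincedbEntry(*(int(x) for x in fields))

entry = parse_sincedb_line("13326659 1 2 172")
print(entry.inode, entry.byte_offset)
```

Comparing entry.inode with the st_ino value from os.stat on the watched file is a quick way to check whether the sincedb record still refers to that file.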
Be aware of the INODE reuse problem. Background: the sincedb tracks read content position by INODE, because during file rotation (different techniques have different effects), the name of the log file may change but its INODE does not. Eventually as files are created and deleted inevitably an INODE will be reused and if by chance that INODE has been seen before via a log file name that once satisfied the glob pattern, LS will think it has read the file before. Two things can happen here 1) the new file is smaller than the last-read point then LS will detect this and start from the beginning (assumes rotation) and 2) the new file is bigger than the last-read point then LS will read from the last-read point on. This random unfortunate situation is difficult to foresee and to code for and leads to very confused OPS people.
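- If the file is smaller than the last-read position, LS reads the file from the beginning;
- If the file is larger than the last-read position, LS resumes reading from the last-read position.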
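The two cases above, plus the case where the size is unchanged, can be sketched as a small decision function. This models the described behaviour for a file whose inode is already recorded in sincedb; it is not the plugin's actual code, and resume_position is a made-up name.

```python
def resume_position(file_size, recorded_offset):
    """Return the byte position to read from, or None if there is nothing new.

    Models the behaviour described above for an inode already in sincedb.
    """
    if file_size < recorded_offset:
        return 0                # file shrank: assumed rotated, reread from the start
    if file_size > recorded_offset:
        return recorded_offset  # file grew: continue from the last-read point
    return None                 # same size: nothing new to read

print(resume_position(100, 172))  # smaller than the recorded offset
print(resume_position(572, 172))  # larger than the recorded offset
print(resume_position(172, 172))  # unchanged size
```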
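So in my earlier local tests with the Python demo, every run overwrote the same file with exactly the same content. The file size never changed, so LS did not read the file. LS reads data again as soon as an update changes the file size, but that still does not meet my needs. Every write is entirely new content, yet it may be identical to the previous write (a business requirement), in which case LS reads nothing. And even when the content is not identical, if the new content is larger than the offset, LS returns incomplete data, because it resumes from the previous offset and skips everything before it.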
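The situation can be reproduced without Logstash. The sketch below overwrites a scratch file twice with the same content, the way open(fi, 'w') does in the demo, and compares the inode and size after each write. The scratch path and the overwrite helper are hypothetical, and the result assumes an ordinary POSIX file system.

```python
import os
import tempfile

def overwrite(path, text):
    """Overwrite path in place and return its (inode, size) afterwards."""
    with open(path, "w") as f:
        f.write(text)
    st = os.stat(path)
    return st.st_ino, st.st_size

# Hypothetical scratch file standing in for test1.json.
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "test1.json")
payload = '{"a": 1, "b": 4.3, "c": "world", "date": 1554972722911}\n'

first = overwrite(path, payload)
second = overwrite(path, payload)
print(first, second)  # same inode and same size after both writes
```

With neither the inode nor the size changing, a reader that tracks only those two values has nothing to tell the second write apart from the first.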
The file input rereads a file if
its inode number changes, or
if the file shrinks (indicating that it was rotated via copy/truncate).
The file input does not support other means of detecting changes in a file.
input {
  file {
    path => "/Applications/logstash-5.6.3/test*.json"
    start_position => "beginning"
    codec => "json"
  }
}
➜ logstash-5.6.3 stat test1.json
16777218 13326659 -rw-r--r-- 1 ivanli admin 0 572 "Apr 19 17:48:39 2019" "Apr 19 17:48:34 2019" "Apr 19 17:48:34 2019" "Apr 19 16:39:35 2019" 4096 8 0 test1.json
➜ logstash-5.6.3 vim test1.json
➜ logstash-5.6.3 stat test1.json
16777218 13334123 -rw-r--r-- 1 ivanli admin 0 572 "Apr 19 18:16:47 2019" "Apr 19 18:16:46 2019" "Apr 19 18:16:46 2019" "Apr 19 18:16:46 2019" 4096 8 0 test1.json
The file's inode number has changed to 13334123, so to LS it is a new file and all of its content gets read. As for why vim changes the file's inode number:
So the inode is unchanged. In Vim, as cjm has already stated, the choice is controlled by the backup, backupcopy and writebackup options. By default, Vim renames the old file, then writes a new file with the original name, if it thinks it can re-create the original file's attributes.
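A similar effect can be had from the Python side by writing to a temporary file and renaming it over the target, instead of overwriting in place. This is a sketch of that idea, not Vim's exact sequence and not something the Logstash documentation prescribes. The helper name replace_with_new_inode and the scratch path are made up for illustration. The temporary file gets a tmp prefix and a .tmp suffix, so it would not match the test*.json glob.

```python
import os
import tempfile

def replace_with_new_inode(path, text):
    """Write text to a new temp file in the same directory, then rename it over path.

    The temp file exists alongside the old file, so it has a different inode;
    after os.replace, path refers to that new inode.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp_path, path)  # rename over the old file on the same file system
    except BaseException:
        os.unlink(tmp_path)         # clean up the temp file if anything failed
        raise
    return os.stat(path).st_ino

# Hypothetical scratch file standing in for test1.json.
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "test1.json")
payload = '{"a": 1}\n'

with open(path, "w") as f:
    f.write(payload)
inode_before = os.stat(path).st_ino
inode_after = replace_with_new_inode(path, payload)
print(inode_before, inode_after)  # the two inode numbers differ
```

Two caveats. mkstemp creates the file with owner-only permissions, which may need adjusting if Logstash runs as a different user. The inode reuse problem quoted earlier also still applies: a new inode number can coincide with one already recorded in sincedb.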