Real-time visitor IP collection with nginx + Lua + Kafka, consumed and stored for analysis (Part 1)

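Recently my blog has been hit repeatedly by attacks from certain IPs, so I wanted a feature that bans attacking IPs dynamically. My first idea was to use Redis, but since I happen to be learning Kafka, I used Kafka instead. Each visitor IP is sent to the backend in real time; the backend stores it, looks up its geolocation, and analyzes whether the IP shows attack behavior; a script then bans the attacker's IP dynamically. This gives the site a basic layer of protection. Since it is only for my own site, I haven't optimized much. Today I'll share the first step: getting visitor IPs into the database.

Let's look at how it's done. First, install OpenResty, an nginx distribution with Lua scripting support. Installation guides are easy to find online and it isn't today's focus, so I won't cover it here. We'll start with installing Kafka.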

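1: Install ZooKeeper

Kafka needs ZooKeeper. Kafka does bundle its own ZooKeeper, and you can use that one if you prefer; the setup I demonstrate today uses an external ZooKeeper.

On Linux, download the ZooKeeper package, extract it, edit the configuration file, and start the server. The process is simple, so a quick overview is enough.

My package is zookeeper-3.4.14. Once installed, start it with: ./zkServer.sh start

Mine is already running. Run ./zkServer.sh status; if it prints the server status (Mode: standalone for a single node like mine), ZooKeeper started successfully. Note that mine is a standalone instance.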

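2: Install Kafka

Kafka's website is [http://kafka.apache.org/](http://kafka.apache.org/). Download the latest version; you can fetch it directly with wget.

The version I installed is kafka_2.12-2.5.0.tgz.

Extract it with: tar -zxvf kafka_2.12-2.5.0.tgz

Go to Kafka's config directory and edit the configuration file, server.properties.

The main settings are listeners, advertised.listeners, and zookeeper.connect; everything else can stay at its default. Below is my configuration file, with some of the comments removed.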

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://<your-ip-address>:9092

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://<your-ip-address>:9092
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/usr/local/software/kafka/logs/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

zookeeper.connect=localhost:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=18000

group.initial.rebalance.delay.ms=0

Next, start the services. Always start ZooKeeper first, then Kafka.

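To start Kafka, run the kafka-server-start.sh script from the bin directory and pass it the path to server.properties, for example:

./kafka-server-start.sh ../config/server.properties

You can also wrap the same command in nohup to start it in the background, for example:

nohup ./kafka-server-start.sh ../config/server.properties > kafka.out 2>&1 &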

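After a successful start, run jps; a Kafka process in the list means the broker is up. If startup fails, the usual error on a small server is insufficient memory, and in that case you need to lower the JVM heap used at startup. Kafka defaults to 1 GB; 512 MB is enough here.

To change the heap size, edit bin/kafka-server-start.sh and set Xmx and Xms to 512M instead of the original 1G. Don't set them too low, or Kafka will start very slowly.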
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx512M -Xms512M"
fi

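After changing the heap size, start Kafka again; barring other problems it should come up. Now we can create a topic. Below are the commands I used to create and test one. If the topic already exists, the create command reports that it exists; on success it prints "Created". Then run the second command to start a producer, open another terminal window, and run the third command

to start a consumer. You can now send test messages in the producer window and watch the other window consume them. It's simple enough that I won't demonstrate it here.

Create the topic:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Start a producer:
./kafka-console-producer.sh --broker-list <your-ip>:9092 --topic test
Start a consumer:
./kafka-console-consumer.sh --bootstrap-server <your-ip>:9092 --topic test --from-beginning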
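
The Spring Boot application in the next section has to reach the broker at the address set in advertised.listeners, so it can help to confirm that port 9092 is reachable from the application host before debugging anything else. Here is a minimal sketch of such a check in plain Java. The class name PortProbe and the timeout parameter are my own choices for illustration, not part of Kafka or of the original project, and a successful TCP connect only shows that something is listening on the port, not that the broker is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper for illustration; not part of Kafka or the original project. */
final class PortProbe {

    private PortProbe() {}

    /**
     * Tries to open a TCP connection to host:port within the given timeout.
     * Returns true if the connection succeeds, false on refusal or timeout.
     */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For example, calling PortProbe.isReachable with the broker's IP, port 9092, and a timeout of 2000 ms should return true once the broker is listening on that address.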

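3: Integrating Kafka with Spring Boot

Our goal is for Kafka to deliver the messages and for the backend to receive them, extract the IP, look up its geolocation, and store it in the database. So let's first see how to integrate Kafka into a Spring project.

Add the dependencies: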

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>2.3.4.RELEASE</version>
</dependency>

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.3.1</version>
</dependency>

Configure Kafka:

spring.kafka.bootstrap-servers=<your-ip>:9092
spring.kafka.producer.retries=0
# Kafka client defaults: 16 KB batches, 32 MB send buffer
spring.kafka.producer.batch-size=16384
spring.kafka.producer.buffer-memory=33554432
# Producer key/value serializers
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
# Default consumer group id
spring.kafka.consumer.group-id=blog_app

spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.enable-auto-commit=true

# Consumer key/value deserializers
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer

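That completes the basic configuration. What remains is writing a producer to send messages and a consumer to receive them; Spring's KafkaTemplate handles the sending side.

Today's task is to collect the visitor IPs from nginx in real time and store them, so we need a consumer class. My code is below. Receiving the message is boilerplate: declare your own topic in the @KafkaListener annotation, then read and convert the payload. The code is fairly simple.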

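// JSON.parseObject matches fastjson's API, so fastjson is assumed here.
import com.alibaba.fastjson.JSON;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import org.apache.commons.lang3.StringUtils;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

import java.io.UnsupportedEncodingException;
import java.util.Date;
import java.util.HashMap;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

// Project classes (NingxRequestLogService, NingxRequestLog, NginxLogParam, IpRegion,
// HttpUtil, CommonUrls, RedisCommonUtils) come from the blog's own packages.
@Component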
public class KafkaReceiver {

    private static final Logger logger = LoggerFactory.getLogger(KafkaReceiver.class);

    private Gson gson = new GsonBuilder().create();

    @Autowired
    private NingxRequestLogService ningxRequestLogService;

    @Autowired
    private RedisTemplate redisTemplate;

    @KafkaListener(topics = {"access-log"})
    public void listen(ConsumerRecord<?, ?> record) {
        Optional<?> kafkaMessage = Optional.ofNullable(record.value());
        if (kafkaMessage.isPresent()) {
            String nginxLog = String.valueOf(kafkaMessage.get());
            logger.info("------------------ current access info = {}", nginxLog);
            try {
                NginxLogParam nginxLogParam = gson.fromJson(nginxLog, NginxLogParam.class);
                if (null != nginxLogParam && StringUtils.isNoneBlank(nginxLogParam.getClientIP())){

                    // Look up the IP's geolocation (served from Redis when cached)
                    IpAddr ipAddr = checkAttribution(nginxLogParam);
                    NingxRequestLog ningxRequestLog = new NingxRequestLog();
                    ningxRequestLog.setRequestIp(nginxLogParam.getClientIP());
                    ningxRequestLog.setRequestMethod(nginxLogParam.getMethod());
                    ningxRequestLog.setRequestHttpVersion("1.1");
                    ningxRequestLog.setCreateTime(new Date());
                    ningxRequestLog.setRequestCity(ipAddr.getCity());
                    ningxRequestLog.setRequestProvince(ipAddr.getProvince());

                    this.ningxRequestLogService.add(ningxRequestLog);
                    logger.info("Stored visitor ip [{}] successfully", ningxRequestLog.getRequestIp());
                }else{
                    logger.info("Could not extract an ip from the current access info");
                }
            } catch (Exception e) {
                // Pass the exception itself so the stack trace is logged
                logger.error("Failed to store nginx visitor ip", e);
            }

        }

    }

    private IpAddr checkAttribution(NginxLogParam nginxLogParam) {
        String clientIp = nginxLogParam.getClientIP();
        String key = RedisCommonUtils.REQUEST_IP_PREIFIX + clientIp;

        // hasKey can return null (e.g. inside a pipeline or transaction), so avoid auto-unboxing
        if (Boolean.TRUE.equals(this.redisTemplate.hasKey(key))) {
            Object cached = this.redisTemplate.opsForValue().get(key);
            // The key may expire between hasKey and get; fall through to a fresh lookup then
            if (cached instanceof IpAddr) {
                logger.info("Geolocation for ip [{}] found in cache", clientIp);
                return (IpAddr) cached;
            }
        }

        // Default to empty values so a failed lookup never yields null fields
        IpAddr ipAddr = new IpAddr();
        ipAddr.setProvince(StringUtils.EMPTY);
        ipAddr.setCity(StringUtils.EMPTY);

        // Query the remote geolocation service
        HashMap<String, String> regionParamMap = new HashMap<>();
        regionParamMap.put("lang","zh-CN");
        String s;
        try {
            s = HttpUtil.sendGet(CommonUrls.XINLANG_REGIN_URL + clientIp, regionParamMap);
        } catch (UnsupportedEncodingException e) {
            logger.error("Geolocation lookup failed for ip [{}]", clientIp, e);
            return ipAddr; // do not cache a failed lookup
        }

        IpRegion ipRegion = JSON.parseObject(s, IpRegion.class);
        if (ipRegion == null) {
            return ipAddr; // empty or unparsable response
        }
        ipAddr.setProvince(StringUtils.isBlank(ipRegion.getRegionName()) ? StringUtils.EMPTY : ipRegion.getRegionName());
        ipAddr.setCity(ipRegion.getCity());
        // Cache the result for 24 hours
        this.redisTemplate.opsForValue().set(key,ipAddr,24*60,TimeUnit.MINUTES);

        return ipAddr;

    }

    private static class IpAddr {

        public IpAddr() {}

        private String city;
        private String province;

        public String getCity() {
            return city;
        }

        public void setCity(String city) {
            this.city = city;
        }

        public String getProvince() {
            return province;
        }

        public void setProvince(String province) {
            this.province = province;
        }
    }
}
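
The checkAttribution method above follows the cache-aside pattern: check the cache, fall back to the remote service on a miss, then store the result with an expiry. The sketch below shows that flow in isolation. It is a simplified stand-in, not the project's code: RegionCache is a hypothetical class, a ConcurrentHashMap replaces Redis, and the remote HTTP call is passed in as a Function so it can be swapped out. It also shows one possible way to handle a failed lookup, namely returning an empty string without caching it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for the Redis-backed lookup; not the original code. */
final class RegionCache {

    /** Cached value plus the time (epoch millis) after which it is stale. */
    private static final class Entry {
        final String region;
        final long expiresAtMillis;

        Entry(String region, long expiresAtMillis) {
            this.region = region;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, String> remoteLookup; // stands in for the HTTP call
    private final long ttlMillis;

    RegionCache(Function<String, String> remoteLookup, long ttlMillis) {
        this.remoteLookup = remoteLookup;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the region for an IP, or "" when the remote lookup fails. */
    String regionOf(String ip) {
        long now = System.currentTimeMillis();
        Entry hit = cache.get(ip);
        if (hit != null && hit.expiresAtMillis > now) {
            return hit.region;               // cache hit: no remote call
        }
        String region;
        try {
            region = remoteLookup.apply(ip); // cache miss: ask the remote service
        } catch (RuntimeException e) {
            return "";                       // failure: fall back, do not cache
        }
        if (region == null) {
            return "";
        }
        cache.put(ip, new Entry(region, now + ttlMillis));
        return region;
    }
}
```

Not caching failures means a temporary outage of the geolocation service does not pin an empty result to an IP for the full expiry period; the next message from that IP simply triggers another lookup.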

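4: Sending nginx access logs to Kafka in real time with a Lua script

Next comes the question of how to get the messages out of nginx. We send them from a Lua script. A Lua client library for Kafka already exists (lua-resty-kafka), so we download it, put it under the OpenResty directory, and then write a Lua script that uses it. Let's take a look.

Step 1: Download the Kafka Lua library, unzip it, and copy it to the target directory. Adjust the paths to match your own installation.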

wget -O lua-resty-kafka-master.zip https://github.com/doujiang24/lua-resty-kafka/archive/master.zip

yum install -y unzip

unzip lua-resty-kafka-master.zip

cp -rf /usr/local/lua-resty-kafka-master/lib/resty /usr/hello/lualib

Step 2: Write the Kafka script

--- Generated by EmmyLua(https://github.com/EmmyLua)
--- Created by renxiaole.
--- DateTime: 2020/8/1 09:48
---
local cjson = require("cjson")
local producer = require("resty.kafka.producer")

local broker_list = {
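    { host = "<your-ip>", port = 9092 }
}
-- Table that holds the fields to send
local log_json = {}
-- Read the request headers
local headers=ngx.req.get_headers()
-- Resolve the client IP: proxy headers first, then the socket address
local ip=headers["X-REAL-IP"] or headers["X_FORWARDED_FOR"] or ngx.var.remote_addr or "0.0.0.0"
log_json["ip"]=ip
-- Encode the message as JSON
local message = cjson.encode(log_json);
-- Message key taken from the query string; nil when the parameter is absent
local productId = ngx.req.get_uri_args()["productId"]
-- Create the async producer
local async_producer = producer:new(broker_list, { producer_type = "async" })
-- Send the message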
local ok, err = async_producer:send("access-log", productId, message)

if not ok then
    ngx.log(ngx.ERR, "kafka send err:", err)
    return
end
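
One thing to keep in mind on the consuming side: X-Forwarded-For can carry a comma-separated chain of addresses (client, then each proxy), so the value the script sends is not always a single IP. Below is a minimal sketch, in Java to match the consumer, of normalizing that value before the geolocation lookup. ClientIpNormalizer and firstIpv4 are hypothetical names that are not part of the original project, and the sketch only handles IPv4.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the original project. */
final class ClientIpNormalizer {

    // Dotted-quad IPv4 with each octet in the range 0-255.
    private static final Pattern IPV4 = Pattern.compile(
            "^((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)\\.){3}(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)$");

    private ClientIpNormalizer() {}

    /**
     * Returns the first address of a possibly comma-separated header value,
     * or empty if the value is missing or not a valid IPv4 address.
     */
    static Optional<String> firstIpv4(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        // Keep only the left-most entry, which is the original client in a proxy chain
        String first = headerValue.split(",", 2)[0].trim();
        return IPV4.matcher(first).matches() ? Optional.of(first) : Optional.empty();
    }
}
```

In the listener, a value that fails this check could be logged and skipped instead of being sent to the geolocation service.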

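Step 3: Add the script to the nginx configuration file

access_by_lua_file /usr/local/software/openresty/openresty-1.13.6.1/nginx/conf/lua/kafka_log.lua;

Step 4: Restart nginx

And that's it. Commit the code and deploy the project, and the data should start flowing into the database. Checking the database confirms that the records are being stored: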

id   ip               method  http  created              province  city
459  203.208.60.71    GET     1.1   2020-08-01 19:19:46  北京市    海淀
460  183.136.225.56   GET     1.1   2020-08-01 20:05:07  浙江省    诸暨
461  47.100.64.86     GET     1.1   2020-08-01 20:07:45  浙江省    西湖
462  150.137.27.107   GET     1.1   2020-08-01 20:49:27  夏威夷州  檀香山
464  7.62.153.81      GET     1.1   2020-08-01 20:49:33  俄亥俄州  Columbus
465  47.101.198.15    GET     1.1   2020-08-01 21:18:06  浙江省    西湖
466  47.101.198.126   GET     1.1   2020-08-01 21:26:50  浙江省    西湖

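This post showed how to capture visitor IPs in real time with Kafka and Lua; there are plenty of other ways to do it. Next time we'll look at how to compute the access frequency of these IPs and ban them dynamically. That's all for this post. My knowledge is limited, so if you spot anything wrong, please leave a comment; I welcome all feedback and suggestions. Thanks!
You can also visit my site, 任小乐技术博客 (Ren Xiaole's tech blog), where I now publish most of my new articles. Thanks!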