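This was my first attempt at using Flink to read data from Kafka and write it to HBase, and I fell into some deep pits along the way.
1. My program uses Flink 1.9.1 to read data from a local Kafka and write it to a local HBase. The local ZooKeeper and Kafka services were both up. I started the program and got no error messages, but it never read anything from Kafka. I had already typed 10 messages into the Kafka producer console and kept wondering why it still had not started reading. I had not set a time window either. Spooky.
Answer: 99% of the time this is because the version of your Kafka connector dependency is wrong. If you are currently on 1.1, try switching to 0.9 and it may start reading; if you are on 0.9, try going the other way.
Note: remember to also use "FlinkKafkaConsumer09" in the Flink code when you call addSource. If you change the dependency but forget to change this, IDEA will flag it for you, so it is no big deal.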
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.9_2.11</artifactId>
<version>1.9.1</version>
</dependency>
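2. The program runs fine and reads data from Kafka, but it hangs at the step that connects to HBase, neither failing nor reporting an error. At first I thought it was the hbase-client version, so I went to the Maven site to check which versions are supported and found nothing wrong. Why me?
Answer: 99% of the time this is because the ZooKeeper host cannot be found. The program keeps retrying the host you configured and just cannot connect. Infuriating, right? But someone like me, with no server of my own, is connecting to the local address "127.0.0.1", so why would this still happen? Can it not even find localhost? At this point, check whether you are running some kind of proxy tool. If you were simply on plain WiFi this would not happen. Turn the proxy off and try again, and it may well connect.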
configuration.set("hbase.zookeeper.quorum", "127.0.0.1");
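Because the HBase client retries silently, it can help to confirm that the ZooKeeper port is reachable at all before starting the job. The following is only a minimal sketch using the JDK's own socket API; the class name ZkReachability and the 3-second timeout are assumptions made for illustration, not part of the original program. It only checks that a TCP connection can be opened and says nothing about whether HBase is registered in ZooKeeper.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the original job
final class ZkReachability {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: all treated as "not reachable"
            return false;
        }
    }

    public static void main(String[] args) {
        // Same host and port as hbase.zookeeper.quorum and the client port used by the job
        boolean ok = canConnect("127.0.0.1", 2181, 3000);
        System.out.println("ZooKeeper reachable: " + ok);
    }
}
```

If this prints false while ZooKeeper is running locally, something between the program and 127.0.0.1 is getting in the way, and a proxy tool is a reasonable first suspect.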
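3. There is one more situation that rarely occurs when running locally, but I have to mention it because it easily goes wrong in production. Here the ZooKeeper address in the HBase write configuration is changed to the server's address and the program is run. Reading from Kafka works without any problem, but writing to HBase throws a NullPointerException and nothing gets written, no matter what. Many blog posts discuss this, but most of them either do not solve the problem or do not apply to our case.
Answer: the program probably cannot find HBase's directory (znode) in ZooKeeper, because it differs from the default, /hbase. It is best to look up the actual HBase directory in the zk client first and then fill in this parameter, to be safe.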
configuration.set("zookeeper.znode.parent","/hbase-unsecure");
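Finally, here is my rough code, for reference only. Once you get it running you may notice that HBase only ever contains one record. That is because I hard-coded the row_key, column family, and column name, so the value keeps being overwritten. You can make the row_key variable; the most common choice is the current timestamp.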
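One way to make the row key variable is sketched here, assuming a key made of the current timestamp plus a per-JVM counter, so that two records arriving in the same millisecond do not overwrite each other. The class name RowKeys and the "timestamp-sequence" layout are illustrative assumptions, not something the original code defines.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of the original job
final class RowKeys {

    // Per-JVM counter: keeps keys distinct when several records share one millisecond
    private static final AtomicLong SEQ = new AtomicLong();

    /** Builds a key of the form "timestamp-sequence". */
    static String build(long timestampMillis, long sequence) {
        return timestampMillis + "-" + sequence;
    }

    /** Returns a new key based on the current time and the next counter value. */
    static String next() {
        return build(System.currentTimeMillis(), SEQ.getAndIncrement());
    }

    public static void main(String[] args) {
        System.out.println(next());
        System.out.println(next());
    }
}
```

In processElement the hard-coded key could then become new Put(Bytes.toBytes(RowKeys.next())). The counter is local to one JVM, so with parallelism above 1 or after a restart the keys are no longer guaranteed to be unique. Keys that only ever increase also tend to concentrate writes on a single region, so treat this as a starting point rather than a production key design.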
pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.wy</groupId>
<artifactId>flink2hbase</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.14.1-HBase-1.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>1.9.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.11</artifactId>
<version>1.9.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>1.9.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.9_2.11</artifactId>
<version>1.9.1</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.4</version>
</dependency>
</dependencies>
</project>
Main program:
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.*;
import java.util.Properties;
public class flinkhbase {
public static Configuration configuration;
public static Connection connection;
public static Admin admin;
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
FlinkKafkaConsumer09<String> consumer = new FlinkKafkaConsumer09<String>("sinkTest", new SimpleStringSchema(), properties);
// Start consuming from the earliest offset
consumer.setStartFromEarliest();
DataStream<String> stream = env.addSource(consumer);
stream.print();
stream.process(new HbaseProcess());
env.execute();
}
}
Writing to HBase:
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
@Slf4j
public class HbaseProcess extends ProcessFunction<String, String> {
private static final long serialVersionUID = 1L;
private Connection connection = null;
private Table table = null;
@Override
public void open(org.apache.flink.configuration.Configuration parameters) throws Exception {
try {
// Load the HBase configuration
Configuration configuration = HBaseConfiguration.create();
// Set the connection parameters
configuration.set("hbase.zookeeper.quorum", "127.0.0.1");
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.setInt("hbase.rpc.timeout", 30000);
configuration.setInt("hbase.client.operation.timeout", 30000);
configuration.setInt("hbase.client.scanner.timeout.period", 30000);
// configuration.set("zookeeper.znode.parent","/hbase-unsecure");
configuration.set("hbase.master","localhost:60010");
connection = ConnectionFactory.createConnection(configuration);
TableName tableName = TableName.valueOf("ygc_test");
// Get the table object
table = connection.getTable(tableName);
// Obtain Admin from the connection rather than the deprecated HBaseAdmin constructor, and close it after use
try (Admin admin = connection.getAdmin()) {
System.out.println(admin.tableExists(tableName));
}
System.out.println("[HbaseSink] : open HbaseSink finished");
} catch (Exception e) {
System.out.println(e);
}
}
@Override
public void close() throws Exception {
System.out.println("close...");
if (null != table) table.close();
if (null != connection) connection.close();
}
@Override
public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
try {
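System.out.println("Input value: " + value);
// Sample input: row1:cf:a:aaa (the split result is not used yet)
String[] split = value.split(":");
// Create a Put request, used to add or update data
// Note: the row key "1002" is hard-coded, so every record overwrites the same cell
Put put = new Put(Bytes.toBytes("1002"));
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("a"), Bytes.toBytes(value));
table.put(put);
System.out.println("Insert succeeded");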
} catch (Exception e) {
System.out.println(e);
}
}
}
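The comment "row1:cf:a:aaa" in processElement hints at a message layout of row key, column family, column name, and value, but the split result is never actually used. As a rough sketch, and assuming that layout really is what the messages look like, the parsing could be pulled into a small helper. The class name CellMessage and its field names are made up for illustration.

```java
// Hypothetical helper, not part of the original job
final class CellMessage {
    final String rowKey;
    final String family;
    final String qualifier;
    final String value;

    private CellMessage(String rowKey, String family, String qualifier, String value) {
        this.rowKey = rowKey;
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }

    /** Parses "rowKey:family:qualifier:value"; the value part may itself contain ':'. */
    static CellMessage parse(String line) {
        // Limit of 4 keeps any further ':' characters inside the value
        String[] parts = line.split(":", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                    "expected rowKey:family:qualifier:value but got: " + line);
        }
        return new CellMessage(parts[0], parts[1], parts[2], parts[3]);
    }

    public static void main(String[] args) {
        CellMessage m = parse("row1:cf:a:aaa");
        System.out.println(m.rowKey + " " + m.family + " " + m.qualifier + " " + m.value);
    }
}
```

The parsed fields could then feed new Put(Bytes.toBytes(m.rowKey)) and put.addColumn(...) in place of the hard-coded "1002", "info", and "a". The column family named in a message has to exist in the table already, otherwise the put will fail.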