1. 版本
组件 | 版本 |
---|---|
hudi | 10.0 |
flink | 13.5 |
2.hudi 源码下载
https://github.com/apache/hudi/releases
2.1 需要改flink 版本为13.5
根目录下面的pom 文件
<flink.version>1.13.5</flink.version>
<hive.version>3.1.0</hive.version>
<hadoop.version>3.1.1</hadoop.version>
2.2 编译命令
mvn clean package -DskipTests
# 或者指定scala 版本
#编译后的包
包的路径在packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.12-*.*.*-SNAPSHOT.jar
2.3编译遇到一个错误
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Hudi 0.10.0:
[INFO]
[INFO] Hudi ............................................... SUCCESS [ 1.642 s]
[INFO] hudi-common ........................................ SUCCESS [ 9.808 s]
[INFO] hudi-aws ........................................... SUCCESS [ 1.306 s]
[INFO] hudi-timeline-service .............................. SUCCESS [ 1.623 s]
[INFO] hudi-client ........................................ SUCCESS [ 0.082 s]
[INFO] hudi-client-common ................................. SUCCESS [ 8.027 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [ 2.825 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 13.891 s]
[INFO] hudi-sync-common ................................... SUCCESS [ 0.718 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [ 3.027 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [ 0.066 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [ 7.706 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [ 9.535 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 25.923 s]
[INFO] hudi-utilities_2.12 ................................ FAILURE [ 2.638 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SKIPPED
[INFO] hudi-cli ........................................... SKIPPED
[INFO] hudi-java-client ................................... SKIPPED
[INFO] hudi-flink-client .................................. SKIPPED
[INFO] hudi-spark3_2.12 ................................... SKIPPED
[INFO] hudi-dla-sync ...................................... SKIPPED
[INFO] hudi-sync .......................................... SKIPPED
[INFO] hudi-hadoop-mr-bundle .............................. SKIPPED
[INFO] hudi-hive-sync-bundle .............................. SKIPPED
[INFO] hudi-spark-bundle_2.12 ............................. SKIPPED
[INFO] hudi-presto-bundle ................................. SKIPPED
[INFO] hudi-timeline-server-bundle ........................ SKIPPED
[INFO] hudi-hadoop-docker ................................. SKIPPED
[INFO] hudi-hadoop-base-docker ............................ SKIPPED
[INFO] hudi-hadoop-namenode-docker ........................ SKIPPED
[INFO] hudi-hadoop-datanode-docker ........................ SKIPPED
[INFO] hudi-hadoop-history-docker ......................... SKIPPED
[INFO] hudi-hadoop-hive-docker ............................ SKIPPED
[INFO] hudi-hadoop-sparkbase-docker ....................... SKIPPED
[INFO] hudi-hadoop-sparkmaster-docker ..................... SKIPPED
[INFO] hudi-hadoop-sparkworker-docker ..................... SKIPPED
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SKIPPED
[INFO] hudi-hadoop-presto-docker .......................... SKIPPED
[INFO] hudi-integ-test .................................... SKIPPED
[INFO] hudi-integ-test-bundle ............................. SKIPPED
[INFO] hudi-examples ...................................... SKIPPED
[INFO] hudi-flink_2.12 .................................... SKIPPED
[INFO] hudi-kafka-connect ................................. SKIPPED
[INFO] hudi-flink-bundle_2.12 ............................. SKIPPED
[INFO] hudi-kafka-connect-bundle .......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:29 min
[INFO] Finished at: 2022-02-06T17:59:02+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.10.0: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.4 in aliyunmaven (https://maven.aliyun.com/repository/public) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :hudi-utilities_2.12
以上错误需要手动下载包后添加本地仓库
mvn install:install-file -Dfile=/opt/myjar/common-config-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.0 -Dpackaging=jar
mvn install:install-file -Dfile=/opt/myjar/common-utils-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.0 -Dpackaging=jar
mvn install:install-file -Dfile=/opt/myjar/kafka-avro-serializer-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.0 -Dpackaging=jar
mvn install:install-file -Dfile=/opt/myjar/kafka-schema-registry-client-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.0 -Dpackaging=jar
[INFO] Reactor Summary for Hudi 0.10.0:
[INFO]
[INFO] Hudi ............................................... SUCCESS [ 1.370 s]
[INFO] hudi-common ........................................ SUCCESS [ 10.813 s]
[INFO] hudi-aws ........................................... SUCCESS [ 1.394 s]
[INFO] hudi-timeline-service .............................. SUCCESS [ 1.404 s]
[INFO] hudi-client ........................................ SUCCESS [ 0.072 s]
[INFO] hudi-client-common ................................. SUCCESS [ 7.295 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [ 2.848 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 15.158 s]
[INFO] hudi-sync-common ................................... SUCCESS [ 0.681 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [ 2.856 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [ 0.054 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [ 7.296 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [ 10.521 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 26.299 s]
[INFO] hudi-utilities_2.12 ................................ SUCCESS [ 11.262 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SUCCESS [01:39 min]
[INFO] hudi-cli ........................................... SUCCESS [ 15.297 s]
[INFO] hudi-java-client ................................... SUCCESS [ 2.267 s]
[INFO] hudi-flink-client .................................. SUCCESS [01:06 min]
[INFO] hudi-spark3_2.12 ................................... SUCCESS [ 6.117 s]
[INFO] hudi-dla-sync ...................................... SUCCESS [ 6.830 s]
[INFO] hudi-sync .......................................... SUCCESS [ 0.061 s]
[INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [ 8.565 s]
[INFO] hudi-hive-sync-bundle .............................. SUCCESS [ 1.131 s]
[INFO] hudi-spark-bundle_2.12 ............................. SUCCESS [ 11.139 s]
[INFO] hudi-presto-bundle ................................. SUCCESS [ 38.706 s]
[INFO] hudi-timeline-server-bundle ........................ SUCCESS [ 8.251 s]
[INFO] hudi-hadoop-docker ................................. SUCCESS [ 1.166 s]
[INFO] hudi-hadoop-base-docker ............................ SUCCESS [ 0.649 s]
[INFO] hudi-hadoop-namenode-docker ........................ SUCCESS [ 0.649 s]
[INFO] hudi-hadoop-datanode-docker ........................ SUCCESS [ 0.627 s]
[INFO] hudi-hadoop-history-docker ......................... SUCCESS [ 0.659 s]
[INFO] hudi-hadoop-hive-docker ............................ SUCCESS [ 7.320 s]
[INFO] hudi-hadoop-sparkbase-docker ....................... SUCCESS [ 0.731 s]
[INFO] hudi-hadoop-sparkmaster-docker ..................... SUCCESS [ 0.638 s]
[INFO] hudi-hadoop-sparkworker-docker ..................... SUCCESS [ 0.667 s]
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SUCCESS [ 0.671 s]
[INFO] hudi-hadoop-presto-docker .......................... SUCCESS [ 0.704 s]
[INFO] hudi-integ-test .................................... SUCCESS [ 36.320 s]
[INFO] hudi-integ-test-bundle ............................. SUCCESS [01:47 min]
[INFO] hudi-examples ...................................... SUCCESS [ 8.120 s]
[INFO] hudi-flink_2.12 .................................... SUCCESS [ 38.207 s]
[INFO] hudi-kafka-connect ................................. SUCCESS [ 19.832 s]
[INFO] hudi-flink-bundle_2.12 ............................. SUCCESS [ 27.658 s]
[INFO] hudi-kafka-connect-bundle .......................... SUCCESS [ 14.287 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:30 min
[INFO] Finished at: 2022-02-06T20:29:29+08:00
[INFO] ------------------------------------------------------------------------
[root@node01 hudi-0.10.0]#
3. 编译的包目录
[root@node01 packaging]# pwd
/opt/module/hudi/hudi-0.10.0/packaging
[root@node01 packaging]# ll
总用量 4
drwxr-xr-x 4 501 games 46 2月 6 20:41 hudi-flink-bundle
drwxr-xr-x 4 501 games 46 2月 6 20:38 hudi-hadoop-mr-bundle
drwxr-xr-x 4 501 games 46 2月 6 20:38 hudi-hive-sync-bundle
drwxr-xr-x 4 501 games 46 2月 6 20:39 hudi-integ-test-bundle
drwxr-xr-x 4 501 games 46 2月 6 20:41 hudi-kafka-connect-bundle
drwxr-xr-x 4 501 games 46 2月 6 20:38 hudi-presto-bundle
drwxr-xr-x 4 501 games 46 2月 6 20:38 hudi-spark-bundle
drwxr-xr-x 4 501 games 101 2月 6 20:38 hudi-timeline-server-bundle
drwxr-xr-x 4 501 games 46 2月 6 20:37 hudi-utilities-bundle
-rw-r--r-- 1 501 games 2206 12月 8 10:38 README.md
[root@node01 packaging]#
4.flink 整合hudi 所需要的jar 包
主要是
hudi-flink-bundle_2.12-0.10.0.jar
hudi-hadoop-mr-bundle-0.10.0.jar
[root@node01 lib]# pwd
/opt/module/flink/flink-1.13.5/lib
[root@node01 lib]# ll
总用量 316964
-rw-r--r-- 1 root root 7802399 1月 1 08:27 doris-flink-1.0-SNAPSHOT.jar
-rw-r--r-- 1 root root 249571 12月 27 23:32 flink-connector-jdbc_2.12-1.13.5.jar
-rw-r--r-- 1 root root 359138 1月 1 10:17 flink-connector-kafka_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007 92315 12月 15 08:23 flink-csv-1.13.5.jar
-rw-r--r-- 1 hive 1007 106535830 12月 15 08:29 flink-dist_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007 148127 12月 15 08:23 flink-json-1.13.5.jar
-rw-r--r-- 1 root root 43317025 2月 6 20:51 flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
-rw-r--r-- 1 hive 1007 7709740 12月 15 06:57 flink-shaded-zookeeper-3.4.14.jar
-rw-r--r-- 1 hive 1007 35051557 12月 15 08:28 flink-table_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007 38613344 12月 15 08:28 flink-table-blink_2.12-1.13.5.jar
-rw-r--r-- 1 root root 62447468 2月 6 20:44 hudi-flink-bundle_2.12-0.10.0.jar
-rw-r--r-- 1 root root 17276348 2月 6 20:51 hudi-hadoop-mr-bundle-0.10.0.jar
-rw-r--r-- 1 root root 1893564 1月 1 10:17 kafka-clients-2.0.0.jar
-rw-r--r-- 1 hive 1007 207909 12月 15 06:56 log4j-1.2-api-2.16.0.jar
-rw-r--r-- 1 hive 1007 301892 12月 15 06:56 log4j-api-2.16.0.jar
-rw-r--r-- 1 hive 1007 1789565 12月 15 06:56 log4j-core-2.16.0.jar
-rw-r--r-- 1 hive 1007 24258 12月 15 06:56 log4j-slf4j-impl-2.16.0.jar
-rw-r--r-- 1 root root 724213 12月 27 23:23 mysql-connector-java-5.1.9.jar
[root@node01 lib]#
5. 进入到flink sql 中
./sql-client.sh embedded shell
# 在SQL Cli设置分析结果展示模式
set execution.result-mode=tableau;
6. 建表语句
CREATE TABLE t1(
uuid VARCHAR(20),
name VARCHAR(10),
age INT,
ts TIMESTAMP(3),
`partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
'connector' = 'hudi',
'path' = 'hdfs://192.168.1.161:8020/hudi-warehouse/hudi-t1',
'write.tasks' = '1',
'compaction.tasks' = '1',
'table.type' = 'MERGE_ON_READ'
);
6.1 插入数据
INSERT INTO t1 VALUES('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1');
INSERT INTO t1 VALUES
('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
## 展示
Flink SQL> INSERT INTO t1 VALUES
> ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
> ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
> ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
> ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
> ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
> ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: d6c70e43969b0f2b5124104468c5e065
Flink SQL> select * from t1;
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| op | uuid | name | age | ts | partition |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| +I | id6 | Emma | 20 | 1970-01-01 00:00:06.000 | par3 |
| +I | id5 | Sophia | 18 | 1970-01-01 00:00:05.000 | par3 |
| +I | id8 | Han | 56 | 1970-01-01 00:00:08.000 | par4 |
| +I | id7 | Bob | 44 | 1970-01-01 00:00:07.000 | par4 |
| +I | id2 | Stephen | 33 | 1970-01-01 00:00:02.000 | par1 |
| +I | id1 | Danny | 28 | 1970-01-01 00:00:01.000 | par1 |
| +I | id4 | Fabian | 31 | 1970-01-01 00:00:04.000 | par2 |
| +I | id3 | Julian | 53 | 1970-01-01 00:00:03.000 | par2 |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
Received a total of 8 rows
7 更新 操作 更新就是需要从新插入数据
将年龄更改为18
INSERT INTO t1 VALUES('id1','Danny',18,TIMESTAMP '1970-01-01 00:00:01','par1');
查询如下
Flink SQL> select * from t1;
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| op | uuid | name | age | ts | partition |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| +I | id8 | Han | 56 | 1970-01-01 00:00:08.000 | par4 |
| +I | id7 | Bob | 44 | 1970-01-01 00:00:07.000 | par4 |
| +I | id4 | Fabian | 31 | 1970-01-01 00:00:04.000 | par2 |
| +I | id3 | Julian | 53 | 1970-01-01 00:00:03.000 | par2 |
| +I | id2 | Stephen | 33 | 1970-01-01 00:00:02.000 | par1 |
| +I | id1 | Danny | 18 | 1970-01-01 00:00:01.000 | par1 |
| +I | id6 | Emma | 20 | 1970-01-01 00:00:06.000 | par3 |
| +I | id5 | Sophia | 18 | 1970-01-01 00:00:05.000 | par3 |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
Received a total of 8 rows
Flink SQL>