Exporting MySQL data to HDFS with DataX
Note: in this setup DataX runs on the same machine as MySQL, while HDFS runs on a separate machine.
1. Create the MySQL table and insert test data
1.1 Create the table in the test database
CREATE TABLE `tbtest` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`Dep` int(11),
PRIMARY KEY (`id`) USING BTREE)
ENGINE = InnoDB AUTO_INCREMENT = 10000 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
1.2 Insert data
INSERT INTO tbtest VALUES (1, '张飞', 7);
INSERT INTO tbtest VALUES (2, '关羽', 7);
INSERT INTO tbtest VALUES (3, '刘备', 7);
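Before running DataX, the rows can be verified from the shell on this machine (a quick sketch; it assumes the mysql client is installed locally and reuses the root credentials that appear in the job file below):
# Confirm the three test rows exist in test.tbtest
mysql -uroot -p'Test*123' -e "SELECT id, name, Dep FROM test.tbtest;"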
2. Edit and run the JSON job file
python /data/datax/datax/bin/datax.py -r mysqlreader -w hdfswriter
This prints a JSON job template for the mysqlreader/hdfswriter pairing.
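The template can be saved straight into the job file used in step 2.2 and edited from there (a sketch; depending on the DataX version a text banner is printed before the JSON, so delete any non-JSON lines from the saved file):
# Dump the generated template into the job file, then edit it
python /data/datax/datax/bin/datax.py -r mysqlreader -w hdfswriter > /data/datax/mysqlhdfsjob.json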
2.1 The JSON job file is as follows:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "id",
                            "name",
                            "Dep"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://127.0.0.1:3306/test"
                                ],
                                "table": [
                                    "tbtest"
                                ]
                            }
                        ],
                        "password": "Test*123",
                        "username": "root",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "id",
                                "type": "int"
                            },
                            {
                                "name": "name",
                                "type": "char"
                            },
                            {
                                "name": "Dep",
                                "type": "int"
                            }
                        ],
                        "compress": "NONE",
                        "defaultFS": "hdfs://<Hadoop machine IP>:9000",
                        "fieldDelimiter": "\t",
                        "fileName": "tbtest.txt",
                        "fileType": "orc",
                        "path": "/data/hadoop/hadoopdata",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
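hdfswriter writes into a directory that already exists on HDFS, so before running the job it is worth making sure the path from the config is present (a small sketch, run on the Hadoop machine; adjust the path if yours differs):
# Create the target directory if it does not exist yet, then confirm it is there
hdfs dfs -mkdir -p /data/hadoop/hadoopdata
hdfs dfs -ls /data/hadoop/hadoopdata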
2.2 Run the job
python /data/datax/datax/bin/datax.py /data/datax/mysqlhdfsjob.json
On success, DataX finishes with a job summary that reports the start/end time, the total number of records read, and a read/write failure count of 0.
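The result can then be checked from the Hadoop machine (a minimal sketch; hdfswriter appends a random suffix to the configured fileName, so list the directory rather than guessing the exact file name):
# The exported file(s) start with the fileName prefix from the job file, e.g. tbtest.txt__...
hdfs dfs -ls /data/hadoop/hadoopdata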
3. Problems encountered during execution
Problem 1: the job kept failing with JSON format errors. Solution: format and validate the file with Sublime Text's Pretty JSON plugin, which is very convenient.
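If Sublime Text is not at hand, Python's built-in json.tool module does the same check from the command line and reports the position of any syntax error:
# Validate and pretty-print the job file
python -m json.tool /data/datax/mysqlhdfsjob.json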
Problem 2: a java.net.ConnectException: Connection refused error. Rule out network issues first (connectivity, firewall), then run
netstat -nltp on the Hadoop machine and check which address port 9000 is bound to.
If port 9000 turns out to be listening on 127.0.0.1 only, the Hadoop configuration has to be changed.
Edit core-site.xml:
<configuration>
    <property>
        <name>fs.default.name</name>
        <!-- was localhost; change it to 0.0.0.0 or this machine's own IP -->
        <value>hdfs://0.0.0.0:9000</value>
    </property>
</configuration>
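The new binding only takes effect after HDFS is restarted; afterwards port 9000 should no longer be listening on 127.0.0.1 (a sketch, assuming the standard Hadoop sbin scripts are on the PATH of the Hadoop machine):
# Restart HDFS so the changed core-site.xml is picked up
stop-dfs.sh
start-dfs.sh
# Port 9000 should now show 0.0.0.0 (or the machine's IP) instead of 127.0.0.1
netstat -nltp | grep 9000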
4. For details of the individual JSON options, refer to the official documentation
Please refer to the mysqlreader document:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
Please refer to the hdfswriter document:
https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md
**Give a rose and the fragrance stays on your hand. If this post helped, please give it a like!**