title: sqoop2 shell 使用
tags: 新建,模板,小书匠
grammar_cjkRuby: true
从MySQL数据库中导入数据到HDFS中
显示 可用创建link的connector
sqoop:000> show connector
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name | Version | Class | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
| kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
| ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
| hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
+------------------------+---------+------------------------------------------------------------+----------------------+
创建mysqllink
sqoop:000> update link -n mysqllink
Updating link with name mysqllink
Please update link:
Name: mysqllink
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master:3306/forsqoop?useSSL=false
Username: root
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose: `
link was successfully updated with status OK
创建 hdfslink
sqoop:000> update link -n hdfslink
Updating link with name hdfslink
Please update link:
Name: hdfslink
HDFS cluster
URI: hdfs://master:9000
Conf directory: /usr/hadoop-2.6.4/etc/hadoop/
Additional configs::
There are currently 0 values in the map:
entry#
link was successfully updated with status OK
显示创建成功的link
sqoop:000> show link
+-----------+------------------------+---------+
| Name | Connector Name | Enabled |
+-----------+------------------------+---------+
| mysqllink | generic-jdbc-connector | true |
| hdfslink | hdfs-connector | true |
+-----------+------------------------+---------+
更新 job
sqoop:000> update job -n frommysqljob
Updating job with name frommysqljob
Please update job:
Name: frommysqljob
Database source
Schema name: forsqoop
Table name: sds
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column: SD_ID
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /user/root/sqoop/frommysql
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
Job was successfully updated with status OK
开启任务
sqoop:000> start job -n frommysqljob
Submission details
Job Name: frommysqljob
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-06-25 14:43:47 CST
Lastly updated by: root
External ID: job_1498352893725_0013
http://master:8088/proxy/application_1498352893725_0013/
2017-06-25 14:43:47 CST: BOOTING - Progress is not available
查看执行任务的结果
易出现问题的地方
1 mysql 没有打开远程链接,导致can't get a connector 错误的发生
进入mysql开启远程访问
@'192.168.137.121'可以替换为@‘%’就可任意ip访问,当然我们也可以直接用 UPDATE 更新 root 用户 Host, 但不推荐,123456是密码 SQL如下:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.137.121' IDENTIFIED BY '123456' WITH GRANT OPTION;
flush privileges;
2 配置 generic-jdbc-connector link时 Identifier enclose: `参数要设置为`或者是空格
3 mysql导入hdfs时表结构如果没有主键需要显示指定Partition column: 参数为表中的一个字段分区,否则会因为没有主键导致导入报错。
4 MySQL中的Connection String:参数配置路径不能使用localhost因为你不知道yarn会分配任务给那一台机器执行此操作,使用master从机后都配过master的IP地址,这样就不用担心从机获取不到master主机上的数据库数据了。
从HDFS导出数据到MySQL数据库
创建 tomysqllink
sqoop:000> create link -n generic-jdbc-connector
Invalid command invocation: Unknown option encountered: -n
sqoop:000> create link generic-jdbc-connector
Invalid command invocation: Missing required option: c
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: tomysqllink
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master:3306/forsqoop?useSSL=false
Username: root
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose: `
New link was successfully created with validation status OK and name tomysqllink
创建fromhdfslink link
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: fromhdfslink
HDFS cluster
URI: hdfs://master:9000
Conf directory: /usr/hadoop-2.6.4/etc/hadoop/
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name fromhdfslink
展示创建成功的link
sqoop:000> show link
+--------------+------------------------+---------+
| Name | Connector Name | Enabled |
+--------------+------------------------+---------+
| mysqllink | generic-jdbc-connector | true |
| tomysqllink | generic-jdbc-connector | true |
| hdfslink | hdfs-connector | true |
| fromhdfslink | hdfs-connector | true |
+--------------+------------------------+---------+
创建 tomysqljob job
sqoop:000> create job --f fromhdfslink --t tomysqllink
Creating job for links with from name fromhdfslink and to name tomysqllink
Please fill following values to create new job object
Name: tomysqljob
Input configuration
Input directory: /user/root/sqoop/frommysql
Override null value: N
Null value: ture
Incremental import
Incremental type:
0 : NONE
1 : NEW_FILES
Choose: 0
Last imported date:
Database target
Schema name: forsqoop
Table name: sds
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name tomysqljob
展示创建成功的job
sqoop:000> show job
+----+--------------+------------------------------------+--------------------------------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+--------------+------------------------------------+--------------------------------------+---------+
| 1 | frommysqljob | mysqllink (generic-jdbc-connector) | hdfslink (hdfs-connector) | true |
| 2 | tomysqljob | fromhdfslink (hdfs-connector) | tomysqllink (generic-jdbc-connector) | true |
+----+--------------+------------------------------------+--------------------------------------+---------+
执行导入MySQL数据库表的job
sqoop:000> start job -n tomysqljob
Submission details
Job Name: tomysqljob
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-06-25 15:03:10 CST
Lastly updated by: root
External ID: job_1498352893725_0014
http://master:8088/proxy/application_1498352893725_0014/
2017-06-25 15:03:10 CST: BOOTING - Progress is not available
查看job的执行情况
相关链接
Connector-GenericJDBC 配置
Connector-HDFS 配置
其中的 FROM Job Configuration
与 TO Job Configuration
分别会在创建job的时候用到里面的参数
自定义 Connector Development 默认支持导入导出CSV