The project needed MySQL data synced into Hive. I had used Sqoop for this before, so I'm writing it down here for future reference.
The command is as follows:
sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod --username root --password 123456 --table dw_wy_drop_customized_drilldown_table_daily --direct --fields-terminated-by "\t" --lines-terminated-by "\n" --delete-target-dir --hive-import --create-hive-table --hive-database test --hive-table test1 --num-mappers 1
Parameter notes
delete-target-dir : if the HDFS target path already exists from a previous import, the import would fail; with this option Sqoop deletes the existing target directory before importing, so repeated imports don't error out
num-mappers : the number of map tasks; set it according to your data volume (with more than one mapper, the table needs a primary key or a --split-by column to split on)
create-hive-table : creates the Hive table based on the MySQL table structure
direct : MySQL-specific option that uses the mysqldump fast path to speed up the import
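One thing worth noting: the password is passed on the command line here, and Sqoop itself warns about that in the output below. A sketch of a safer invocation, assuming a password file at /user/ericsson/mysql.pwd (hypothetical path, readable only by the submitting user, commonly created with echo -n so there is no trailing newline):

sqoop import \
  --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod \
  --username root \
  --password-file /user/ericsson/mysql.pwd \
  --table dw_wy_drop_customized_drilldown_table_daily \
  --direct \
  --fields-terminated-by "\t" --lines-terminated-by "\n" \
  --delete-target-dir \
  --hive-import --create-hive-table \
  --hive-database test --hive-table test1 \
  --num-mappers 1

--password-file reads the password from a file on HDFS or the local filesystem instead of exposing it in the process list and shell history; -P, which prompts interactively, is another option.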
Execution output
[ericsson@dlbdn3 runtu]$ sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod --username root --password 123456 --table dw_wy_drop_customized_drilldown_table_daily --direct --fields-terminated-by "\t" --lines-terminated-by "\n" --delete-target-dir --hive-import --create-hive-table --hive-database test --hive-table test1 --num-mappers 1
Warning: /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/12/13 17:55:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.0
18/12/13 17:55:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/12/13 17:55:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/12/13 17:55:49 INFO tool.CodeGenTool: Beginning code generation
18/12/13 17:55:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
18/12/13 17:55:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
18/12/13 17:55:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-ericsson/compile/0f7e6d0f0c9ff6fc9fffb7d3d6412651/dw_wy_drop_customized_drilldown_table_daily.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/12/13 17:55:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ericsson/compile/0f7e6d0f0c9ff6fc9fffb7d3d6412651/dw_wy_drop_customized_drilldown_table_daily.jar
18/12/13 17:55:54 INFO tool.ImportTool: Destination directory dw_wy_drop_customized_drilldown_table_daily deleted.
18/12/13 17:55:54 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
18/12/13 17:55:54 INFO mapreduce.ImportJobBase: Beginning import of dw_wy_drop_customized_drilldown_table_daily
18/12/13 17:55:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/12/13 17:55:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/12/13 17:55:54 INFO client.RMProxy: Connecting to ResourceManager at dlbdn3/192.168.123.4:8032
18/12/13 17:55:56 INFO db.DBInputFormat: Using read commited transaction isolation
18/12/13 17:55:56 INFO mapreduce.JobSubmitter: number of splits:1
18/12/13 17:55:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543800485319_1069
18/12/13 17:55:57 INFO impl.YarnClientImpl: Submitted application application_1543800485319_1069
18/12/13 17:55:57 INFO mapreduce.Job: The url to track the job: http://dlbdn3:8088/proxy/application_1543800485319_1069/
18/12/13 17:55:57 INFO mapreduce.Job: Running job: job_1543800485319_1069
18/12/13 17:56:05 INFO mapreduce.Job: Job job_1543800485319_1069 running in uber mode : false
18/12/13 17:56:05 INFO mapreduce.Job: map 0% reduce 0%
18/12/13 17:56:15 INFO mapreduce.Job: map 100% reduce 0%
18/12/13 17:56:15 INFO mapreduce.Job: Job job_1543800485319_1069 completed successfully
18/12/13 17:56:15 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=153436
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=562
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=6137
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=6137
Total vcore-milliseconds taken by all map tasks=6137
Total megabyte-milliseconds taken by all map tasks=6284288
Map-Reduce Framework
Map input records=1
Map output records=6
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=48
CPU time spent (ms)=1510
Physical memory (bytes) snapshot=328478720
Virtual memory (bytes) snapshot=1694789632
Total committed heap usage (bytes)=824180736
Peak Map Physical memory (bytes)=328478720
Peak Map Virtual memory (bytes)=1694789632
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=562
18/12/13 17:56:15 INFO mapreduce.ImportJobBase: Transferred 562 bytes in 21.2223 seconds (26.4815 bytes/sec)
18/12/13 17:56:15 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/12/13 17:56:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
18/12/13 17:56:15 WARN hive.TableDefWriter: Column DATE_TIME had to be cast to a less precise type in Hive
18/12/13 17:56:15 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/hive-common-1.1.0-cdh5.11.0.jar!/hive-log4j.properties
OK
Time taken: 3.832 seconds
Loading data to table test.test1
Table test.test1 stats: [numFiles=1, totalSize=562]
OK
Time taken: 0.691 seconds
[ericsson@dlbdn3 runtu]$
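After the job finishes, a quick sanity check from the Hive side confirms the rows landed. A minimal check, assuming the hive CLI is on the path (beeline works the same way):

hive -e "SELECT COUNT(*) FROM test.test1;"
hive -e "SELECT * FROM test.test1 LIMIT 5;"

With the run above this should report 6 rows, matching the "Retrieved 6 records" line in the Sqoop output.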
With the --direct option, the log shows Sqoop taking the mysqldump fast path (this requires the mysqldump binary to be available on the nodes that run the map tasks):
18/12/13 17:55:54 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
Sqoop also generates a Java class file named after the table in the working directory:
[ericsson@dlbdn3 runtu]$ ll
total 32
-rw-rw-r-- 1 ericsson ericsson 32198 Dec 13 17:37 dw_wy_drop_customized_drilldown_table_daily.java
[ericsson@dlbdn3 runtu]$
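If you'd rather not have these generated .java files pile up in the working directory, the code-generation output can be redirected with --outdir (generated source) and --bindir (compiled classes and jar). A sketch, assuming /tmp/sqoop-gen (hypothetical path) is writable:

sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod --username root -P \
  --table dw_wy_drop_customized_drilldown_table_daily \
  --outdir /tmp/sqoop-gen --bindir /tmp/sqoop-gen \
  --delete-target-dir --hive-import --hive-database test --hive-table test1 --num-mappers 1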