The project needed MySQL data synced into Hive. I had used Sqoop for this before, so I'm writing it down here for future reference.
The command is as follows:
sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod --username root --password 123456 --table dw_wy_drop_customized_drilldown_table_daily --direct --fields-terminated-by "\t" --lines-terminated-by "\n" --delete-target-dir --hive-import --create-hive-table --hive-database test --hive-table test1 --num-mappers 1
Parameter notes
delete-target-dir : if the HDFS target path already exists from a previous import, the import would fail; with this option Sqoop deletes the existing target directory before importing, so repeated imports don't error out
num-mappers : the number of map tasks; set it according to your data volume (with more than one mapper, the table needs a primary key or a --split-by column to split on)
create-hive-table : creates the Hive table based on the MySQL table structure
direct : MySQL-specific option that uses the mysqldump fast path to speed up the import
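One thing worth noting: the password is passed on the command line here, and Sqoop itself warns about that in the output below. A sketch of a safer invocation, assuming a password file at /user/ericsson/mysql.pwd (hypothetical path, readable only by the submitting user, commonly created with echo -n so there is no trailing newline):

sqoop import \
  --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod \
  --username root \
  --password-file /user/ericsson/mysql.pwd \
  --table dw_wy_drop_customized_drilldown_table_daily \
  --direct \
  --fields-terminated-by "\t" --lines-terminated-by "\n" \
  --delete-target-dir \
  --hive-import --create-hive-table \
  --hive-database test --hive-table test1 \
  --num-mappers 1

--password-file reads the password from a file on HDFS or the local filesystem instead of exposing it in the process list and shell history; -P, which prompts interactively, is another option.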
Execution output
[ericsson@dlbdn3 runtu]$ sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod --username root --password 123456 --table dw_wy_drop_customized_drilldown_table_daily --direct --fields-terminated-by "\t" --lines-terminated-by "\n" --delete-target-dir --hive-import --create-hive-table --hive-database test --hive-table test1 --num-mappers 1
Warning: /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/12/13 17:55:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.0
18/12/13 17:55:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/12/13 17:55:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/12/13 17:55:49 INFO tool.CodeGenTool: Beginning code generation
18/12/13 17:55:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
18/12/13 17:55:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
18/12/13 17:55:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-ericsson/compile/0f7e6d0f0c9ff6fc9fffb7d3d6412651/dw_wy_drop_customized_drilldown_table_daily.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/12/13 17:55:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ericsson/compile/0f7e6d0f0c9ff6fc9fffb7d3d6412651/dw_wy_drop_customized_drilldown_table_daily.jar
18/12/13 17:55:54 INFO tool.ImportTool: Destination directory dw_wy_drop_customized_drilldown_table_daily deleted.
18/12/13 17:55:54 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
18/12/13 17:55:54 INFO mapreduce.ImportJobBase: Beginning import of dw_wy_drop_customized_drilldown_table_daily
18/12/13 17:55:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/12/13 17:55:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/12/13 17:55:54 INFO client.RMProxy: Connecting to ResourceManager at dlbdn3/192.168.123.4:8032
18/12/13 17:55:56 INFO db.DBInputFormat: Using read commited transaction isolation
18/12/13 17:55:56 INFO mapreduce.JobSubmitter: number of splits:1
18/12/13 17:55:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543800485319_1069
18/12/13 17:55:57 INFO impl.YarnClientImpl: Submitted application application_1543800485319_1069
18/12/13 17:55:57 INFO mapreduce.Job: The url to track the job: http://dlbdn3:8088/proxy/application_1543800485319_1069/
18/12/13 17:55:57 INFO mapreduce.Job: Running job: job_1543800485319_1069
18/12/13 17:56:05 INFO mapreduce.Job: Job job_1543800485319_1069 running in uber mode : false
18/12/13 17:56:05 INFO mapreduce.Job: map 0% reduce 0%
18/12/13 17:56:15 INFO mapreduce.Job: map 100% reduce 0%
18/12/13 17:56:15 INFO mapreduce.Job: Job job_1543800485319_1069 completed successfully
18/12/13 17:56:15 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=153436
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=562
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=6137
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=6137
Total vcore-milliseconds taken by all map tasks=6137
Total megabyte-milliseconds taken by all map tasks=6284288
Map-Reduce Framework
Map input records=1
Map output records=6
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=48
CPU time spent (ms)=1510
Physical memory (bytes) snapshot=328478720
Virtual memory (bytes) snapshot=1694789632
Total committed heap usage (bytes)=824180736
Peak Map Physical memory (bytes)=328478720
Peak Map Virtual memory (bytes)=1694789632
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=562
18/12/13 17:56:15 INFO mapreduce.ImportJobBase: Transferred 562 bytes in 21.2223 seconds (26.4815 bytes/sec)
18/12/13 17:56:15 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/12/13 17:56:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
18/12/13 17:56:15 WARN hive.TableDefWriter: Column DATE_TIME had to be cast to a less precise type in Hive
18/12/13 17:56:15 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/hive-common-1.1.0-cdh5.11.0.jar!/hive-log4j.properties
OK
Time taken: 3.832 seconds
Loading data to table test.test1
Table test.test1 stats: [numFiles=1, totalSize=562]
OK
Time taken: 0.691 seconds
[ericsson@dlbdn3 runtu]$
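After the job finishes, a quick sanity check from the Hive side confirms the rows landed. A minimal check, assuming the hive CLI is on the path (beeline works the same way):

hive -e "SELECT COUNT(*) FROM test.test1;"
hive -e "SELECT * FROM test.test1 LIMIT 5;"

With the run above this should report 6 rows, matching the "Retrieved 6 records" line in the Sqoop output.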
With the --direct option, the log shows Sqoop taking the mysqldump fast path (this requires the mysqldump binary to be available on the nodes that run the map tasks):
18/12/13 17:55:54 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
Sqoop also generates a Java class file named after the table in the working directory:
[ericsson@dlbdn3 runtu]$ ll
total 32
-rw-rw-r-- 1 ericsson ericsson 32198 Dec 13 17:37 dw_wy_drop_customized_drilldown_table_daily.java
[ericsson@dlbdn3 runtu]$
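If you'd rather not have these generated .java files pile up in the working directory, the code-generation output can be redirected with --outdir (generated source) and --bindir (compiled classes and jar). A sketch, assuming /tmp/sqoop-gen (hypothetical path) is writable:

sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod --username root -P \
  --table dw_wy_drop_customized_drilldown_table_daily \
  --outdir /tmp/sqoop-gen --bindir /tmp/sqoop-gen \
  --delete-target-dir --hive-import --hive-database test --hive-table test1 --num-mappers 1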