Importing Data from MySQL into Hive with Sqoop

1. Overview

A database named sitelight has already been created in MySQL and loaded with data; the table is t_site_formal and holds 168992 rows. In Hive, a database also named sitelight has been created; no field delimiter was specified, so the default '\001' is used. Before running the import command, the Hadoop cluster, the HiveServer2 service, and the Hive metastore service must all be running. Sqoop itself is not a long-running service; it needs no startup and can be invoked directly.
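As a quick sanity check before importing (a sketch, assuming the mysql client and the hive CLI are available; not part of the original run), you can confirm both sides:

mysql> use sitelight;
mysql> select count(*) from t_site_formal;
-- should match the row count noted above

hive> create database if not exists sitelight;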

2. Starting the Services

  1. Start the HDFS service
[hadoop@hadoop01 sbin]$ pwd
/home/hadoop/hadoop-2.10.1/sbin
[hadoop@hadoop01 sbin]$ ./start-dfs.sh
  2. Start HiveServer2 (the Hive server); it can be run from any directory
[hadoop@hadoop01 sbin]$ nohup hive --service hiveserver2 &
  3. Start the Hive metastore service; it can be run from any directory
[hadoop@hadoop01 sbin]$ nohup hive --service metastore &
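Once these are up, a quick jps check (not part of the original transcript, but a handy verification) should show the HDFS daemons plus two RunJar processes, one for HiveServer2 and one for the metastore:

[hadoop@hadoop01 sbin]$ jps
# expect NameNode / SecondaryNameNode / DataNode plus two RunJar entries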

3. The Import Script

sqoop import  \
--connect jdbc:mysql://172.16.100.19:3306/sitelight  \
--username queryuser  \
--password abcde12345  \
--table t_site_formal  \
--hive-import  \
--hive-overwrite  \
--create-hive-table  \
--delete-target-dir \
--hive-database  sitelight \
--hive-table t_site_formal \
-m 1
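Note that Sqoop itself warns in the log below that passing --password on the command line is insecure. As a hedged alternative (standard Sqoop 1.4.7 options, not what was used in this run), you can prompt for the password with -P, or point --password-file at a file only you can read:

# prompt interactively instead of embedding the password
sqoop import --connect jdbc:mysql://172.16.100.19:3306/sitelight \
  --username queryuser -P --table t_site_formal ...

# or keep the password in a locked-down file
echo -n 'abcde12345' > /home/hadoop/.mysql-pass
chmod 400 /home/hadoop/.mysql-pass
sqoop import ... --password-file file:///home/hadoop/.mysql-pass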

4. Errors Encountered While Running the Import Script

  1. Error 1
[hadoop@hadoop01 sbin]$ sqoop import   \
> --connect jdbc:mysql://172.16.100.19:3306/sitelight   \
> --username queryuser  \
> --password abcde12345   \
> --table t_site_formal   \
> --hive-import \
> -m 1
Warning: /home/hadoop/sqoop-1.4.7/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop-1.4.7/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.7/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.7/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
21/12/07 15:32:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
21/12/07 15:32:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/12/07 15:32:53 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
21/12/07 15:32:53 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
21/12/07 15:32:53 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/12/07 15:32:53 INFO tool.CodeGenTool: Beginning code generation
21/12/07 15:32:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
21/12/07 15:32:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
21/12/07 15:32:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.10.1
Note: /tmp/sqoop-hadoop/compile/672d74685e64a0fe1025ce9b2b875e46/t_site_formal.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
21/12/07 15:32:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/672d74685e64a0fe1025ce9b2b875e46/t_site_formal.jar
21/12/07 15:32:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
21/12/07 15:32:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
21/12/07 15:32:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
21/12/07 15:32:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
21/12/07 15:32:54 INFO mapreduce.ImportJobBase: Beginning import of t_site_formal
21/12/07 15:32:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
21/12/07 15:32:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
21/12/07 15:32:54 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/172.16.100.26:8032
21/12/07 15:32:56 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/12/07 15:32:57 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/12/07 15:32:58 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
  • Cause: the job hangs at "INFO client.RMProxy: Connecting to ResourceManager at hadoop01/172.16.100.26:8032" and keeps retrying. The ResourceManager is YARN's master daemon; the Hadoop configuration names YARN as the MapReduce framework, so Sqoop tries to submit its job there, but YARN is not running.
  • Fix: option 1, if you only want Hadoop's HDFS storage and do not need YARN, stop declaring it: the framework is configured in mapred-site.xml, i.e. /home/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml; rename that file to something else so Hadoop ignores it, then restart the cluster. Option 2, simply start the YARN service from the sbin directory, i.e. /home/hadoop/hadoop-2.10.1/sbin/start-yarn.sh, as sketched below.
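For option 2 the steps look roughly like this (a sketch; the exact jps output depends on your node layout):

[hadoop@hadoop01 sbin]$ cd /home/hadoop/hadoop-2.10.1/sbin
[hadoop@hadoop01 sbin]$ ./start-yarn.sh
[hadoop@hadoop01 sbin]$ jps
# ResourceManager should now be listed and listening on port 8032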
  2. Error 2
21/12/07 15:39:42 INFO hive.HiveImport: Loading uploaded data into Hive
21/12/07 15:39:42 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
21/12/07 15:39:42 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
    at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
    at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
    at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
    at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
    ... 12 more
  • Cause: "Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly." The ClassNotFoundException makes it plain that Hive's jars are not on Hadoop's classpath, so Sqoop cannot load the HiveConf class.
  • Fix: append Hive's libraries to the HADOOP_CLASSPATH environment variable for the current user, then reload the file as shown after the snippet:
[hadoop@hadoop01 sbin]$  vim ~/.bashrc

export HADOOP_CLASSPATH=$HIVE_HOME/lib/*
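After saving, reload ~/.bashrc so the variable takes effect in the current shell, then verify it points at Hive's lib directory:

[hadoop@hadoop01 sbin]$ source ~/.bashrc
[hadoop@hadoop01 sbin]$ echo "$HADOOP_CLASSPATH"
# should print the Hive lib path, e.g. /home/hadoop/apache-hive-2.3.9-bin/lib/*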
  3. Error 3
21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Transferred 40.7139 MB in 11.4171 seconds (3.566 MB/sec)
21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Retrieved 168237 records.
21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table t_site_formal
21/12/07 17:25:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
21/12/07 17:25:05 WARN hive.TableDefWriter: Column create_date had to be cast to a less precise type in Hive
21/12/07 17:25:05 WARN hive.TableDefWriter: Column audit_date had to be cast to a less precise type in Hive
21/12/07 17:25:05 INFO hive.HiveImport: Loading uploaded data into Hive
21/12/07 17:25:05 INFO conf.HiveConf: Found configuration file file:/home/hadoop/apache-hive-2.3.9-bin/conf/hive-site.xml
2021-12-07 17:25:06,664 main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:585)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanTrustPermission(DefaultMBeanServerInterceptor.java:1848)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:322)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
    at org.apache.logging.log4j.core.jmx.Server.register(Server.java:380)
    at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:165)
    at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:138)
    at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:507)
    at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:249)
    at org.apache.logging.log4j.core.async.AsyncLoggerContext.start(AsyncLoggerContext.java:86)
    at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:239)
    at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:157)
    at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:130)
    at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:100)
    at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:187)
    at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:154)
    at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:90)
    at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:82)
    at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:65)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:702)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
    at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
  • Cause: "ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")". The message states explicitly that Java's security policy forbids registering the MBean, so the JDK's security configuration must be modified.
  • Fix: locate the JDK installation directory and edit its security policy file (the placement is illustrated after the snippet). My JDK is installed system-wide, so the edit requires the root user:
[hadoop@hadoop01 sbin]$  echo $JAVA_HOME
/usr/local/java
[hadoop@hadoop01 sbin]$ exit
exit
[root@hadoop01 hadoop-2.10.1]$ cd /usr/local/java/jre/lib/security/
[root@hadoop01 security]$ vim java.policy 

//add this line inside the grant {} block
permission javax.management.MBeanTrustPermission "register";
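For orientation, the default grant block then looks roughly like this (hedging slightly, since the stock java.policy varies between JDK builds):

grant {
    // ...existing default permissions...
    permission javax.management.MBeanTrustPermission "register";
};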

5. Successful Run

LOAD DATA INPATH 'hdfs://hadoop01:8082/user/hadoop/t_site_formal' OVERWRITE INTO TABLE `sitelight`.`t_site_formal`
21/12/07 18:14:47 INFO ql.Driver: Semantic Analysis Completed
21/12/07 18:14:47 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
21/12/07 18:14:47 INFO ql.Driver: Completed compiling command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303); Time taken: 0.115 seconds
21/12/07 18:14:47 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
21/12/07 18:14:47 INFO ql.Driver: Executing command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303): 
LOAD DATA INPATH 'hdfs://hadoop01:8082/user/hadoop/t_site_formal' OVERWRITE INTO TABLE `sitelight`.`t_site_formal`
21/12/07 18:14:47 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table sitelight.t_site_formal
21/12/07 18:14:47 INFO exec.Task: Loading data to table sitelight.t_site_formal from hdfs://hadoop01:8082/user/hadoop/t_site_formal
21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
21/12/07 18:14:47 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://hadoop01:8082/usr/local/warehouse/sitelight.db/t_site_formal
21/12/07 18:14:47 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
21/12/07 18:14:47 INFO exec.StatsTask: Executing stats task
21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
21/12/07 18:14:47 INFO exec.StatsTask: Table sitelight.t_site_formal stats: [numFiles=1, numRows=0, totalSize=42691574, rawDataSize=0]
21/12/07 18:14:47 INFO ql.Driver: Completed executing command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303); Time taken: 0.291 seconds
OK
21/12/07 18:14:47 INFO ql.Driver: OK
Time taken: 0.406 seconds
21/12/07 18:14:47 INFO CliDriver: Time taken: 0.406 seconds
21/12/07 18:14:47 INFO conf.HiveConf: Using the default value passed in for log id: bb608b6a-b9af-4129-bae0-739683648882
21/12/07 18:14:47 INFO session.SessionState: Resetting thread name to  main
21/12/07 18:14:47 INFO conf.HiveConf: Using the default value passed in for log id: bb608b6a-b9af-4129-bae0-739683648882
21/12/07 18:14:47 INFO session.SessionState: Deleted directory: /tmp/hive/hadoop/bb608b6a-b9af-4129-bae0-739683648882 on fs with scheme hdfs
21/12/07 18:14:47 INFO session.SessionState: Deleted directory: /tmp/hadoop/bb608b6a-b9af-4129-bae0-739683648882 on fs with scheme file
21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
21/12/07 18:14:47 INFO hive.HiveImport: Hive import complete.
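To confirm the load, a row count from the Hive side is a quick check (not part of the original log):

hive> select count(*) from sitelight.t_site_formal;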

These were the only problems I hit, and I recorded them while they were fresh. The same error message can stem from different causes, so if these fixes don't work for you, keep digging!
