Spark Hive

Version: 2.3.0

Preparation

Make sure the Hive packages are present on every Spark node.

Copy Hive's configuration file, hive-site.xml, into Spark's conf directory.

Configuration

The hive-site.xml parameter hive.metastore.warehouse.dir has been deprecated since Spark 2.0.0. Use spark.sql.warehouse.dir instead to specify the default warehouse directory, and make sure that directory is readable and writable.

(In practice, however, hive.metastore.warehouse.dir still takes effect, and tables created through spark-sql do show up under the corresponding directory.)
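The recommended setting can be sketched as a spark-defaults.conf entry like the one below; the warehouse path here is only an example and should be adjusted to your cluster, not necessarily the path used in this walkthrough:

```
# conf/spark-defaults.conf (sketch; adjust the path to your environment)
# spark.sql.warehouse.dir supersedes hive.metastore.warehouse.dir since Spark 2.0.0
spark.sql.warehouse.dir    hdfs:///user/hive/warehouse
```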

Start Hive

hive  --service  metastore &  
hiveserver2 & 

Start Spark

start-all.sh  
start-history-server.sh  

Start the Spark SQL client to verify the setup

./spark-sql --master spark://node202.hmbank.com:7077   

18/05/31 12:00:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/31 12:00:31 WARN HiveConf: HiveConf of name hive.server2.thrift.client.user does not exist
18/05/31 12:00:31 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
18/05/31 12:00:31 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
18/05/31 12:00:31 WARN HiveConf: HiveConf of name hive.server2.thrift.client.password does not exist
18/05/31 12:00:32 INFO metastore: Trying to connect to metastore with URI thrift://node203.hmbank.com:9083
18/05/31 12:00:32 INFO metastore: Connected to metastore.
18/05/31 12:00:32 INFO SessionState: Created local directory: /var/hive/iotmp/0d80c963-6383-42b5-89c6-9c82cbd4e15c_resources
18/05/31 12:00:32 INFO SessionState: Created HDFS directory: /tmp/hive/root/0d80c963-6383-42b5-89c6-9c82cbd4e15c
18/05/31 12:00:32 INFO SessionState: Created local directory: /var/hive/iotmp/hive/0d80c963-6383-42b5-89c6-9c82cbd4e15c
18/05/31 12:00:32 INFO SessionState: Created HDFS directory: /tmp/hive/root/0d80c963-6383-42b5-89c6-9c82cbd4e15c/_tmp_space.db
18/05/31 12:00:32 INFO SparkContext: Running Spark version 2.3.0
18/05/31 12:00:32 INFO SparkContext: Submitted application: SparkSQL::10.30.16.204
18/05/31 12:00:32 INFO SecurityManager: Changing view acls to: root
18/05/31 12:00:32 INFO SecurityManager: Changing modify acls to: root
18/05/31 12:00:32 INFO SecurityManager: Changing view acls groups to: 
18/05/31 12:00:32 INFO SecurityManager: Changing modify acls groups to: 
18/05/31 12:00:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
18/05/31 12:00:33 INFO Utils: Successfully started service 'sparkDriver' on port 33733.
18/05/31 12:00:33 INFO SparkEnv: Registering MapOutputTracker
18/05/31 12:00:33 INFO SparkEnv: Registering BlockManagerMaster
18/05/31 12:00:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/05/31 12:00:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/05/31 12:00:33 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f425c261-2aa0-4063-8fa2-2ff4f106d948
18/05/31 12:00:33 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/05/31 12:00:33 INFO SparkEnv: Registering OutputCommitCoordinator
18/05/31 12:00:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/05/31 12:00:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://node204.hmbank.com:4040
18/05/31 12:00:33 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://node202.hmbank.com:7077...
18/05/31 12:00:33 INFO TransportClientFactory: Successfully created connection to node202.hmbank.com/10.30.16.202:7077 after 30 ms (0 ms spent in bootstraps)
18/05/31 12:00:33 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20180531120033-0000
18/05/31 12:00:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37444.
18/05/31 12:00:33 INFO NettyBlockTransferService: Server created on node204.hmbank.com:37444
18/05/31 12:00:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/05/31 12:00:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, node204.hmbank.com, 37444, None)
18/05/31 12:00:33 INFO BlockManagerMasterEndpoint: Registering block manager node204.hmbank.com:37444 with 366.3 MB RAM, BlockManagerId(driver, node204.hmbank.com, 37444, None)
18/05/31 12:00:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, node204.hmbank.com, 37444, None)
18/05/31 12:00:33 INFO BlockManager: external shuffle service port = 7338
18/05/31 12:00:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, node204.hmbank.com, 37444, None)
18/05/31 12:00:33 INFO EventLoggingListener: Logging events to hdfs://hmcluster/user/spark/eventLog/app-20180531120033-0000
18/05/31 12:00:33 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
18/05/31 12:00:34 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
18/05/31 12:00:34 INFO SharedState: loading hive config file: file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/conf/hive-site.xml
18/05/31 12:00:34 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/bin/spark-warehouse').
18/05/31 12:00:34 INFO SharedState: Warehouse path is 'file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/bin/spark-warehouse'.
18/05/31 12:00:34 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/05/31 12:00:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/bin/spark-warehouse
18/05/31 12:00:34 INFO metastore: Mestastore configuration hive.metastore.warehouse.dir changed from /user/hive/warehouse to file:/usr/lib/apacheori/spark-2.3.0-bin-hadoop2.6/bin/spark-warehouse
18/05/31 12:00:34 INFO metastore: Trying to connect to metastore with URI thrift://node203.hmbank.com:9083
18/05/31 12:00:34 INFO metastore: Connected to metastore.
18/05/31 12:00:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
spark-sql> show databases;
18/05/31 12:02:22 INFO CodeGenerator: Code generated in 171.318399 ms
default
hivecluster
Time taken: 1.947 seconds, Fetched 2 row(s)
18/05/31 12:02:22 INFO SparkSQLCLIDriver: Time taken: 1.947 seconds, Fetched 2 row(s)

Everything works as expected.
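For scripted checks, the same query can be run non-interactively with spark-sql's `-e` option (inherited from the Hive CLI). A minimal sketch of building that invocation, using the master URL from this walkthrough; the helper function itself is hypothetical, not part of Spark:

```python
def spark_sql_cmd(master: str, query: str) -> list:
    """Build the argv list for a one-shot spark-sql query.

    -e runs a single query string and exits, instead of opening
    the interactive spark-sql> prompt used above.
    """
    return ["./spark-sql", "--master", master, "-e", query]


cmd = spark_sql_cmd("spark://node202.hmbank.com:7077", "show databases")
print(" ".join(cmd))
```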

Create a table and insert data

> create table spark2 (id int , seq int ,name string) using hive options(fileFormat 'parquet');
Time taken: 0.358 seconds
18/05/31 14:10:47 INFO SparkSQLCLIDriver: Time taken: 0.358 seconds
spark-sql> 
         > desc spark2;
id  int NULL
seq int NULL
name    string  NULL
Time taken: 0.061 seconds, Fetched 3 row(s)
18/05/31 14:10:54 INFO SparkSQLCLIDriver: Time taken: 0.061 seconds, Fetched 3 row(s)
spark-sql> insert into spark2 values( 1,1, 'nn');
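The `USING hive OPTIONS(fileFormat 'parquet')` DDL above can be assembled programmatically when creating several such tables. A small sketch; the helper is hypothetical and simply reproduces the statement from the session, it is not a Spark API:

```python
def hive_parquet_ddl(table, columns):
    """Build a CREATE TABLE ... USING hive DDL for a Parquet-backed table.

    columns is a dict of {column_name: sql_type}, emitted in order.
    """
    cols = ", ".join("{} {}".format(name, dtype) for name, dtype in columns.items())
    return ("create table {} ({}) "
            "using hive options(fileFormat 'parquet')").format(table, cols)


ddl = hive_parquet_ddl("spark2", {"id": "int", "seq": "int", "name": "string"})
print(ddl)
# → create table spark2 (id int, seq int, name string) using hive options(fileFormat 'parquet')
```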

Query the table

spark-sql> select * from spark2;
18/05/31 14:12:08 INFO FileSourceStrategy: Pruning directories with: 
18/05/31 14:12:08 INFO FileSourceStrategy: Post-Scan Filters: 
18/05/31 14:12:08 INFO FileSourceStrategy: Output Data Schema: struct<id: int, seq: int, name: string ... 1 more fields>
18/05/31 14:12:08 INFO FileSourceScanExec: Pushed Filters: 
18/05/31 14:12:08 INFO CodeGenerator: Code generated in 33.608151 ms
18/05/31 14:12:08 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 249.2 KB, free 365.9 MB)
18/05/31 14:12:08 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 24.6 KB, free 365.8 MB)
18/05/31 14:12:08 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on node204.hmbank.com:37444 (size: 24.6 KB, free: 366.2 MB)
18/05/31 14:12:08 INFO SparkContext: Created broadcast 6 from processCmd at CliDriver.java:376
18/05/31 14:12:08 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
18/05/31 14:12:09 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
18/05/31 14:12:09 INFO DAGScheduler: Got job 5 (processCmd at CliDriver.java:376) with 1 output partitions
18/05/31 14:12:09 INFO DAGScheduler: Final stage: ResultStage 3 (processCmd at CliDriver.java:376)
18/05/31 14:12:09 INFO DAGScheduler: Parents of final stage: List()
18/05/31 14:12:09 INFO DAGScheduler: Missing parents: List()
18/05/31 14:12:09 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[21] at processCmd at CliDriver.java:376), which has no missing parents
18/05/31 14:12:09 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 10.1 KB, free 365.8 MB)
18/05/31 14:12:09 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 4.6 KB, free 365.8 MB)
18/05/31 14:12:09 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on node204.hmbank.com:37444 (size: 4.6 KB, free: 366.2 MB)
18/05/31 14:12:09 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1039
18/05/31 14:12:09 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[21] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
18/05/31 14:12:09 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
18/05/31 14:12:09 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, 10.30.16.202, executor 1, partition 0, ANY, 8395 bytes)
18/05/31 14:12:09 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 10.30.16.202:36243 (size: 4.6 KB, free: 366.2 MB)
18/05/31 14:12:09 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 10.30.16.202:36243 (size: 24.6 KB, free: 366.2 MB)
18/05/31 14:12:09 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 595 ms on 10.30.16.202 (executor 1) (1/1)
18/05/31 14:12:09 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
18/05/31 14:12:09 INFO DAGScheduler: ResultStage 3 (processCmd at CliDriver.java:376) finished in 0.603 s
18/05/31 14:12:09 INFO DAGScheduler: Job 5 finished: processCmd at CliDriver.java:376, took 0.607284 s
1   1   nn
Time taken: 0.793 seconds, Fetched 1 row(s)
18/05/31 14:12:09 INFO SparkSQLCLIDriver: Time taken: 0.793 seconds, Fetched 1 row(s)

Check the Hive warehouse directory on HDFS:

(screenshot: the table's data files under the Hive warehouse directory on HDFS)

The data has been written into Hive correctly.

Check how the table's storage is recorded in the Hive metastore database:

(screenshot: the table's storage metadata in the Hive metastore database)

Done!
