hive 相关
搭建hadoop和hive,mysql的环境,过程截图
1.hadoop install
sunyonggang@gg01:~/hadoop-2.6.0$ ./sbin/start-dfs.sh
Starting namenodes on [gg01]
gg01: starting namenode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-namenode-gg01.out
ggg03: starting datanode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-datanode-ggg03.out
ggg02: starting datanode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-datanode-ggg02.out
Starting secondary namenodes [gg01]
gg01: starting secondarynamenode, logging to /home/sunyonggang/hadoop-2.6.0/logs/hadoop-sunyonggang-secondarynamenode-gg01.out
sunyonggang@gg01:~/hadoop-2.6.0$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-resourcemanager-gg01.out
ggg02: starting nodemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-nodemanager-ggg02.out
ggg03: starting nodemanager, logging to /home/sunyonggang/hadoop-2.6.0/logs/yarn-sunyonggang-nodemanager-ggg03.out
sunyonggang@gg01:~/hadoop-2.6.0$ jps
1915 NameNode
2118 SecondaryNameNode
2260 ResourceManager
2514 Jps
2.mysql install
sunyonggang@gg01:~$ sudo service mysql status
mysql start/running, process 7483
sunyonggang@gg01:~$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 43
Server version: 5.5.49-0ubuntu0.14.04.1 (Ubuntu)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)
3.hive install(ps: 这边安装的时候需要讲my.cof: #bind-address = 127.0.0.1)
sunyonggang@gg01:/etc/mysql$ hive
16/04/26 19:14:49 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/home/sunyonggang/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.487 seconds, Fetched: 1 row(s)
hive>
利用上节课的数据,建表并导入数据
hive> create external table sg(time char(14),
> cookie char(32),
> keyword varchar(256),
> rank int,
> click int,
> ref varchar(512)
> ) row format delimited fields terminated by '\t' location '/hdfs/user/data';
OK
Time taken: 0.05 seconds
hive> select * from sg limit 5;
OK
20111230000005 57375476989eea12893c0c3811607bcf 奇艺高清 1 1 http://www.qiyi.com/
20111230000005 66c5bb7774e31d0a22278249b26bc83a 凡人修仙传 3 1 http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1
20111230000007 b97920521c78de70ac38e3713f524b50 本本联盟 1 1 http://www.bblianmeng.com/
20111230000008 6961d0c97fe93701fc9c0d861d096cd9 华南师范大学图书馆 1 1 http://lib.scnu.edu.cn/
20111230000008 f2f5a21c764aebde1e8afcc2871e086f 在线代理 2 1 http://proxyie.cn/
Time taken: 0.239 seconds, Fetched: 5 row(s)
最热门的查询词排行 top10
- 按词分组,按出现的次数倒序排列
- sql语句与结果
hive> select count(*) as times, keyword from sg group by keyword order by times desc limit 10;
Total MapReduce CPU Time Spent: 1 minutes 25 seconds 60 msec
OK
77627 百度
36564 baidu
29598 人体艺术
23306 4399小游戏
20847 优酷
20677 qq空间
19205 新亮剑
17842 馆陶县县长闫宁的父亲
16612 公安卖萌
15212 百度一下 你就知道
用户查询排行 top10
- 按照cookie分组,按出现次数排序,选取前10
- sql语句与结果
hive> select count(*) as times, cookie from sg group by cookie order by times desc limit 10;
Total MapReduce CPU Time Spent: 1 minutes 36 seconds 110 msec
OK
20385 ac65768b987c20b3b25cd35612f61892
11653 9faa09e57c277063e6eb70d178df8529
11528 02a8557754445a9b1b22a37b40d6db38
2571 cc7063efc64510c20bcdd604e12a3b26
2355 b64b0ec03efd0ca9cef7642c4921658b
1292 7a28a70fe4aaff6c35f8517613fb5c67
1277 b1e371de5729cdda9270b7ad09484c4f
1241 f656e28e7c3e10c2b733e6b68385d5a2
1181 7eab8caf9708d68e6964220e2f89e80d
1120 c72ce1164bcd263ba1f69292abdfdf7c
Time taken: 187.017 seconds, Fetched: 10 row(s)
搜索结果排名第1,但是点击次序排在第2的数据
- rank = 1 && click = 2
- sql语句与结果
hive> select count(*) from sg where rank=1 and click=2;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1461738251729_0004, Tracking URL = http://gg01:8088/proxy/application_1461738251729_0004/
Kill Command = /home/sunyonggang/hadoop-2.6.0/bin/hadoop job -kill job_1461738251729_0004
Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1
2016-04-27 14:46:23,653 Stage-1 map = 0%, reduce = 0%
2016-04-27 14:47:07,452 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 16.51 sec
2016-04-27 14:47:11,596 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 19.3 sec
2016-04-27 14:47:12,622 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 20.03 sec
2016-04-27 14:47:13,677 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 20.66 sec
2016-04-27 14:47:18,928 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 23.04 sec
2016-04-27 14:47:19,964 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 23.69 sec
2016-04-27 14:47:23,044 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 24.86 sec
MapReduce Total cumulative CPU time: 24 seconds 860 msec
Ended Job = job_1461738251729_0004
MapReduce Jobs Launched:
Job 0: Map: 5 Reduce: 1 Cumulative CPU: 24.86 sec HDFS Read: 1147077971 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 24 seconds 860 msec
OK
200885
Time taken: 248.801 seconds, Fetched: 1 row(s)