Notes from working through the Kylin demo, jotted down along the way.
1. Create a database
[root@5f2929d8f8f4 admin]# hive
ls: cannot access /home/admin/spark-2.3.1-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory
Logging initialized using configuration in jar:file:/home/admin/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> create database kylin_flat_db;
OK
Time taken: 0.745 seconds
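A quick sanity check (not part of the original session) confirms the database was created:

show databases;   -- should now list kylin_flat_db alongside default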
2. Create the tables
hive> create table kylin_flat_db.web_access_fact_tbl
> (
> day date,
> cookieid string,
> regionid string,
> cityid string,
> siteid string,
> os string,
> pv bigint
> ) row format delimited
> fields terminated by '|' stored as textfile;
OK
Time taken: 0.291 seconds
hive> create table kylin_flat_db.region_tbl
> (
> regionid string,
> regionname string
> ) row format delimited
> fields terminated by '|' stored as textfile;
OK
Time taken: 0.064 seconds
hive> create table kylin_flat_db.city_tbl
> (
> regionid string,
> cityid string,
> cityname string
> ) row format delimited
> fields terminated by '|' stored as textfile;
OK
Time taken: 0.083 seconds
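To verify the definitions, Hive can list and describe the new tables (a quick check, not from the original session):

show tables in kylin_flat_db;
describe kylin_flat_db.web_access_fact_tbl;   -- prints each column with its type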
3. Load data from files
hive> use kylin_flat_db;
OK
Time taken: 0.613 seconds
hive> load data local inpath '/home/admin/fact_data.txt' into table web_access_fact_tbl;
Loading data to table kylin_flat_db.web_access_fact_tbl
Table kylin_flat_db.web_access_fact_tbl stats: [numFiles=1, totalSize=274]
OK
Time taken: 0.684 seconds
hive> load data local inpath '/home/admin/city.txt' into table city_tbl;
Loading data to table kylin_flat_db.city_tbl
Table kylin_flat_db.city_tbl stats: [numFiles=1, totalSize=170]
OK
Time taken: 0.221 seconds
hive> load data local inpath '/home/admin/region.txt' into table region_tbl;
Loading data to table kylin_flat_db.region_tbl
Table kylin_flat_db.region_tbl stats: [numFiles=1, totalSize=55]
OK
Time taken: 0.21 seconds
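load data local inpath copies the local file into the table's warehouse directory. A couple of spot checks (not from the original session) confirm the rows arrived:

select count(*) from web_access_fact_tbl;   -- expect 5, given the sample file below
select * from region_tbl limit 3;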
Contents of fact_data.txt:
2016-07-19|GBSYO1IMQ7GHQXOVTP|G03|G0302|810|Mac OS|2
2016-07-03|NTJ95UHFUD3UECNS0U|G05|G0502|3171|Mac OS|4
2016-07-20|ZR27L7C79CCJGTN1F7|G04|G0402|8793|Mac OS|2
2016-07-01|C17QEB0560LUZHD26P|G04|G0402|9793|Android 5.0|5
2016-07-01|N9LRCVTU6PGSUDJ9RB|G03|G0301|1292|Mac OS|1
Contents of city.txt:
G01|G0101|朝阳
G01|G0102|海淀
G02|G0201|南京
G02|G0202|宿迁
G03|G0301|杭州
G03|G0302|嘉兴
G04|G0401|徐汇
G04|G0402|虹口
G05|G0501|广州
G05|G0502|海珠
Contents of region.txt:
G01|北京
G02|江苏
G03|浙江
G04|上海
G05|广州
4. Query syntax
hive> select "DAY",regionname,cityname,sum(pv),count(distinct cookieid)
> from WEB_ACCESS_FACT_TBL a
> left join CITY_TBL b
> on a.cityid = b.cityid
> left join REGION_TBL c
> on c.regionid = a.regionid
> group by "DAY", regionname, cityname;
Query ID = root_20220503070353_b89fc40c-029e-4ce0-89e0-725a64e01dcf
Total jobs = 1
22/05/03 07:03:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/root/root_20220503070353_b89fc40c-029e-4ce0-89e0-725a64e01dcf.log
2022-05-03 07:03:56 Starting to launch local task to process map join; maximum memory = 477626368
2022-05-03 07:03:57 Dump the side-table for tag: 1 with group count: 5 into file: file:/tmp/root/2e677d02-1443-469d-9eeb-552adeb1df12/hive_2022-05-03_07-03-53_842_8251926770007504339-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2022-05-03 07:03:57 Uploaded 1 File to: file:/tmp/root/2e677d02-1443-469d-9eeb-552adeb1df12/hive_2022-05-03_07-03-53_842_8251926770007504339-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (405 bytes)
2022-05-03 07:03:57 Dump the side-table for tag: 1 with group count: 10 into file: file:/tmp/root/2e677d02-1443-469d-9eeb-552adeb1df12/hive_2022-05-03_07-03-53_842_8251926770007504339-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2022-05-03 07:03:57 Uploaded 1 File to: file:/tmp/root/2e677d02-1443-469d-9eeb-552adeb1df12/hive_2022-05-03_07-03-53_842_8251926770007504339-1/-local-10006/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (573 bytes)
2022-05-03 07:03:57 End of local task; Time Taken: 0.751 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1651552208349_0016, Tracking URL = http://5f2929d8f8f4:8088/proxy/application_1651552208349_0016/
Kill Command = /home/admin/hadoop-2.7.0/bin/hadoop job -kill job_1651552208349_0016
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2022-05-03 07:04:03,388 Stage-3 map = 0%, reduce = 0%
2022-05-03 07:04:07,585 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.28 sec
2022-05-03 07:04:12,765 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 3.16 sec
MapReduce Total cumulative CPU time: 3 seconds 160 msec
Ended Job = job_1651552208349_0016
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 3.16 sec HDFS Read: 17457 HDFS Write: 136 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 160 msec
OK
DAY 上海 虹口 7 2
DAY 广州 海珠 4 1
DAY 浙江 嘉兴 2 1
DAY 浙江 杭州 1 1
Time taken: 20.04 seconds, Fetched: 4 row(s)
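Note that in HiveQL double quotes produce a string literal, so "DAY" in the select and group by is the constant string DAY, not the day column; that is why every result row starts with the literal text DAY and the totals are aggregated across all dates rather than per day. To group by the actual column, escape the identifier with backticks (a corrected sketch, not from the original session):

select `day`, regionname, cityname, sum(pv), count(distinct cookieid)
from web_access_fact_tbl a
left join city_tbl b on a.cityid = b.cityid
left join region_tbl c on c.regionid = a.regionid
group by `day`, regionname, cityname;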
5. Finally, the Java code used to generate the demo data
package com.jsz;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class CreateData {

    public static void main(String[] args) {
        // character pool for the 18-character cookie id
        String[] cookieChars = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                "Z", "Y", "X", "W", "V", "U", "T", "S", "R", "Q", "P", "O",
                "N", "M", "L", "K", "J", "I", "H", "G", "F", "E", "D", "C",
                "B", "A" };
        String[] regionIds = { "G01", "G05", "G04", "G03", "G02" };
        String[] osIds = { "Android 5.0", "Mac OS", "Windows 7" };

        // Open table.txt once in append mode; FileWriter creates the file if
        // needed, and try-with-resources closes the writer when done.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("table.txt", true))) {
            for (int i = 0; i < 100; i++) {
                // random day in July 2016, zero-padded to two digits (01..30)
                int day = (int) (Math.random() * 30) + 1;
                String date = day < 10 ? "2016-07-0" + day : "2016-07-" + day;

                // 18-character random cookie id; indexing with the array length
                // covers the whole pool (the original * 35 skipped the last entry)
                StringBuilder cookieId = new StringBuilder();
                for (int c = 0; c < 18; c++) {
                    cookieId.append(cookieChars[(int) (Math.random() * cookieChars.length)]);
                }

                // region, then one of its two cities: "<region>01" or "<region>02"
                // (the original * 4 and * 2 bounds never picked the last region/OS)
                String region = regionIds[(int) (Math.random() * regionIds.length)];
                String city = region + "0" + ((int) (Math.random() * 2) + 1);

                String siteId = String.valueOf((int) (Math.random() * 10000)); // 0..9999
                String os = osIds[(int) (Math.random() * osIds.length)];
                String pv = String.valueOf((int) (Math.random() * 9) + 1);     // 1..9

                writer.write(date + "|" + cookieId + "|" + region + "|" + city
                        + "|" + siteId + "|" + os + "|" + pv + "\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
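The generator writes table.txt in the same pipe-delimited layout as fact_data.txt, so its output can be loaded the same way (the path is assumed; adjust it to wherever the file is written):

load data local inpath '/home/admin/table.txt' into table kylin_flat_db.web_access_fact_tbl;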