Hive SQL(1)

Part 1:

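Fuzzy-search table names in Hive: show tables like '*name*';
View a table's structure: desc table_name;
View a table's partitions: show partitions table_name;
Load a local file: load data local inpath '/xxx/test.txt' overwrite into table dm.table_name;
Insert query results into a table: insert overwrite table table_name partition(dt) select * from table_name; (partition(dt) without a value is a dynamic partition: the value of dt is taken from the last column of the select list, and set hive.exec.dynamic.partition.mode=nonstrict; is required when all partition columns are dynamic)
Export data to the local filesystem: insert overwrite local directory '/tmp/text' select a.* from table_name a order by 1;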
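The dynamic-partition insert above can be modeled in a few lines of Python. This is a simplified sketch, not Hive code: it assumes each selected row's last column is the dt value and that each distinct value becomes a dt=<value> subdirectory of the table's directory. The function name and the /warehouse/table_name path are hypothetical.

```python
from collections import defaultdict


def route_dynamic_partitions(rows, table_dir):
    """Group selected rows by their dynamic partition value.

    Simplified model of: insert overwrite table t partition(dt) select ...
    The last column of each row supplies dt and is not stored in the data files;
    each distinct dt value becomes a dt=<value> subdirectory of table_dir.
    """
    partitions = defaultdict(list)
    for row in rows:
        *data, dt = row  # the partition column comes last in the select list
        partitions[f"{table_dir}/dt={dt}"].append(tuple(data))
    return dict(partitions)


rows = [("a", 1, "2021-11-30"), ("b", 2, "2021-12-01"), ("c", 3, "2021-11-30")]
for path, data in route_dynamic_partitions(rows, "/warehouse/table_name").items():
    print(path, data)
```

This is also why select * works for the self-insert: for a partitioned table, select * returns the partition column last.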
Some properties you can specify when creating a table:

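Field delimiter: row format delimited fields terminated by '\t'
Line delimiter: row format delimited lines terminated by '\n' (Hive currently only supports '\n' here; when combined with a field delimiter, write row format delimited only once, e.g. row format delimited fields terminated by '\t' lines terminated by '\n')
Store as plain text files: stored as textfile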

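Command-line operations:
hive -e 'select table_column from table': runs one query and shows the MapReduce progress in the terminal; when the query finishes, the result is printed to the terminal and the hive process exits without entering interactive mode.
hive -S -e 'select table_column from table': -S (silent mode) suppresses the MapReduce progress output; only the query result is printed when the query finishes.
Rename a table: alter table old_table_name rename to new_table_name;
Copy a table's structure: create table new_table_name like table_name;
Add a column: alter table table_name add columns(columns_values bigint comment 'comm_text');
Change a column (name, type, comment): alter table table_name change old_column new_column string comment 'comm_text';
Drop a partition: alter table table_name drop partition(dt='2021-11-30');
Add a partition: alter table table_name add partition (dt='2021-11-30');
Drop an empty database: drop database myhive2;
Force-drop a database together with its tables: drop database myhive2 cascade;
Drop a table: drop table score5;
Empty a table: truncate table score6; (only works on managed tables; Hive refuses to truncate an external table)
Loading data into Hive tables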

Insert rows directly into a partitioned table: insert into table score partition(month ='202107') values ('001','002','100');
Load data with load: load data local inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');
Load data with a query: insert overwrite table score2 partition(month = '202106') select s_id,c_id,s_score from score1;
Create a table from a query and load its data (CTAS): create table score2 as select * from score1;
Point the table at its data path with location when creating it: create external table score6 (s_id string,c_id string,s_score int) row format delimited fields terminated by ',' location '/myscore';
Export and import Hive table data with export/import (managed-table operations):

create table techer2 like techer; -- create a table from an existing table's structure

export table techer to '/export/techer';

import table techer2 from '/export/techer';

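Exporting data from Hive tables

Exporting with insert

Export query results to the local filesystem: insert overwrite local directory '/export/servers/exporthive' select * from score; (without a row format clause, fields are separated by the default \001 (Ctrl-A) character)

Export query results to the local filesystem in a specified format: insert overwrite local directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from student;

Export query results to HDFS (no local keyword): insert overwrite directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from score;

Copy the result file to the local filesystem with a Hadoop command (run inside the Hive CLI): dfs -get /export/servers/exporthive/000000_0 /export/servers/exporthive/local.txt;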
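A file produced by the exports above is plain text, so it can be read back without Hive. A minimal Python sketch, assuming the row format used in the examples (fields terminated by '\t', collection items terminated by '#') and Hive's default text representation of NULL as \N; the function name and the collection_cols parameter are hypothetical, since which columns are arrays depends on your schema.

```python
def parse_export_line(line, field_sep="\t", item_sep="#", collection_cols=()):
    """Split one line of an exported text file back into columns.

    field_sep: set by 'fields terminated by' ('\\x01', i.e. Ctrl-A, if no row format was given)
    item_sep: set by 'collection items terminated by'
    collection_cols: indexes of array columns (hypothetical; depends on your schema)
    NULL is written as the two characters \\N in text output.
    """
    values = []
    for i, field in enumerate(line.rstrip("\n").split(field_sep)):
        if field == "\\N":
            values.append(None)
        elif i in collection_cols:
            values.append(field.split(item_sep))
        else:
            values.append(field)
    return values


# Reading the file fetched with dfs -get (path from the example):
# with open("/export/servers/exporthive/local.txt", encoding="utf-8") as f:
#     rows = [parse_export_line(line, collection_cols={2}) for line in f]
```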
Exporting with hive shell commands

Basic syntax: hive -f/-e <script or statement> > file, for example: hive -e "select * from myhive.score;" > /export/servers/exporthive/score.txt

hive -f export.sh > /export/servers/exporthive/score.txt

Export to HDFS with export: export table score to '/export/exporthive/score';

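Hive query statements

GROUP BY (grouping): select s_id ,avg(s_score) avgscore from score group by s_id having avgscore > 85; use having to filter the grouped results.
join: inner join (inner join); left join (left outer join); right join (right outer join); full join (full outer join).
order by (global sort): ASC (ascending, the default); DESC (descending). All rows pass through a single reducer to produce one totally ordered result.
sort by (local sort): sorts the rows within each reducer; the overall result set is not globally sorted.
distribute by (partitioning): like the partitioner in MapReduce, it decides which reducer each row goes to; it is usually combined with sort by.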
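The difference between order by and distribute by + sort by can be simulated in Python. This is a sketch, not Hive's implementation: Hive assigns rows to reducers by a hash of the distribute key, while this sketch uses k % num_reducers on an integer key so the result is predictable. The column names follow the score table; the functions are hypothetical.

```python
def order_by(rows, sort_key):
    """order by: one reducer, one totally ordered output."""
    return sorted(rows, key=lambda r: r[sort_key])


def distribute_sort(rows, dist_key, sort_key, num_reducers):
    """distribute by dist_key sort by sort_key across num_reducers reducers.

    Simplification: integer key k goes to reducer k % num_reducers
    (Hive uses a hash of the key instead).
    """
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[row[dist_key] % num_reducers].append(row)
    # sort by: each reducer sorts only its own rows
    return [sorted(part, key=lambda r: r[sort_key]) for part in reducers]


rows = [
    {"s_id": 1, "s_score": 90},
    {"s_id": 2, "s_score": 70},
    {"s_id": 1, "s_score": 60},
    {"s_id": 2, "s_score": 95},
]
print([r["s_score"] for r in order_by(rows, "s_score")])  # [60, 70, 90, 95]
for i, part in enumerate(distribute_sort(rows, "s_id", "s_score", 2)):
    print(i, [r["s_score"] for r in part])  # 0 [70, 95] then 1 [60, 90]
```

Read back reducer by reducer, the distribute by + sort by output is 70, 95, 60, 90: sorted inside each reducer, but not globally.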

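Hive functions

1. Aggregate functions

Count of values: count() (count(*) counts all rows; count(col) counts the non-NULL values of col)
Sum of a column: sum()
Maximum of a column: max()
Minimum of a column: min()
Average of a column: avg()
Population variance of the non-NULL values: var_pop(col)
Sample variance of the non-NULL values: var_samp(col)
Population standard deviation: stddev_pop(col)
Percentile: percentile(BIGINT col, p) (integer columns only; p must be between 0 and 1)
Median: percentile(BIGINT col, 0.5)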
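To make the aggregate definitions concrete, here is a Python sketch with None standing in for NULL. The percentile() model assumes linear interpolation between neighbouring ranks at position (n - 1) * p, which is my understanding of how Hive's exact percentile() behaves; the treatment of var_samp over fewer than two values is also an assumption. All function names are hypothetical.

```python
import math


def _non_null(xs):
    return [x for x in xs if x is not None]


def count_col(xs):
    """count(col): counts non-NULL values (count(*) would be len(xs))."""
    return len(_non_null(xs))


def var_pop(xs):
    xs = _non_null(xs)
    if not xs:
        return None  # aggregate over no values -> NULL
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def var_samp(xs):
    xs = _non_null(xs)
    if len(xs) < 2:
        return None  # assumption: undefined here; Hive may return something else
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def stddev_pop(xs):
    v = var_pop(xs)
    return None if v is None else math.sqrt(v)


def percentile(xs, p):
    """Exact percentile with linear interpolation between neighbouring ranks."""
    xs = sorted(_non_null(xs))
    if not xs:
        return None
    pos = (len(xs) - 1) * p
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


scores = [1, 2, 3, 4, None]
print(count_col(scores), var_pop(scores), percentile(scores, 0.5))  # 4 1.25 2.5
```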

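2. Relational operators

A LIKE B: SQL LIKE comparison; TRUE if string A matches the simple SQL pattern B, in which _ matches any single character and % matches any number of characters (B is not a regular expression; the result is NULL if A or B is NULL)
A RLIKE B: TRUE if any substring of A matches the Java regular expression B
A REGEXP B: same as RLIKE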
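The key difference is that LIKE must match the whole string with only two wildcards, while RLIKE searches for a regex match anywhere in the string. A Python sketch of both, with None as NULL; Python's re module stands in for Java regex (they differ in edge cases), LIKE escape sequences are not handled, and the function names are hypothetical.

```python
import re


def like(a, pattern):
    """A LIKE B: whole-string match; _ = any one char, % = any run of chars."""
    if a is None or pattern is None:
        return None
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, a, flags=re.DOTALL) is not None


def rlike(a, pattern):
    """A RLIKE B / A REGEXP B: TRUE if any substring of A matches regex B."""
    if a is None or pattern is None:
        return None
    return re.search(pattern, a) is not None


print(like("foobar", "foo"), rlike("foobar", "foo"))  # False True
```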

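3. Arithmetic operators

Supported on all numeric types: addition (+), subtraction (-), multiplication (*), division (/), modulo (%), bitwise AND (&), bitwise OR (|), bitwise XOR (^), bitwise NOT (~). By default / returns a double even for integer operands (5 / 2 = 2.5); use div for integer division.

4. Logical operators

Supported: logical AND (and), logical OR (or), logical NOT (not)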

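5. Numeric functions

Round to the nearest integer: round(double a)
Round to d decimal places: round(double a, int d)
Round down: floor(double a)
Round up: ceil(double a)
Random number in [0, 1): rand(), rand(int seed)
Natural exponential: exp(double a)
Base-10 logarithm: log10(double a)
Base-2 logarithm: log2(double a)
Logarithm with a given base: log(double base, double a)
Power: pow(double a, double p)
Square root: sqrt(double a)
Binary representation: bin(BIGINT a)
Hexadecimal representation: hex()
Absolute value: abs()
Positive modulo: pmod(int a, int b)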
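Two of these functions behave differently from Python's built-ins, which is easy to trip over when checking results locally. As far as I know, Hive's round() rounds halves away from zero (HALF_UP), and the % operator follows Java, so the remainder takes the sign of the dividend; pmod() shifts it back into the non-negative range for a positive divisor. A Python sketch of those rules (function names hypothetical):

```python
from decimal import Decimal, ROUND_HALF_UP


def hive_round(x, d=0):
    """round(a, d) with halves rounded away from zero, e.g. 2.5 -> 3.
    Python's built-in round() uses banker's rounding instead (round(2.5) == 2)."""
    quantum = Decimal(1).scaleb(-d)  # 10 ** -d, e.g. Decimal('0.01') for d = 2
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))


def java_mod(a, b):
    """Integer a % b as in Java: the result takes the sign of a (-7 % 3 = -1).
    Python's own % takes the sign of b instead (-7 % 3 == 2)."""
    r = abs(a) % abs(b)
    return r if a >= 0 else -r


def pmod(a, b):
    """pmod(a, b) = ((a % b) + b) % b, non-negative for positive b."""
    return java_mod(java_mod(a, b) + b, b)


print(hive_round(2.5), hive_round(-2.5), hive_round(3.14159, 2))  # 3.0 -3.0 3.14
print(java_mod(-7, 3), pmod(-7, 3))  # -1 2
```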

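6. Conditional functions

if(condition, value_if_true, value_if_false)
case when ... then ... else ... end
coalesce(c1,c2,c3): returns the first non-NULL argument
nvl(c1, c2): returns c2 if c1 is NULL, otherwise c1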
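The main subtlety in these functions is NULL handling: a NULL condition is not TRUE, so if() and case when fall through to the else value. A Python sketch with None as NULL; the names are hypothetical (if is a Python keyword, hence hive_if).

```python
def hive_if(condition, value_true, value_false):
    """if(cond, a, b): a only when cond is TRUE; FALSE and NULL both give b."""
    return value_true if condition is True else value_false


def case_when(*branches, else_value=None):
    """case when c1 then v1 when c2 then v2 ... else e end (first TRUE wins)."""
    for condition, value in branches:
        if condition is True:
            return value
    return else_value


def coalesce(*values):
    """First non-NULL argument, or NULL if all are NULL."""
    return next((v for v in values if v is not None), None)


def nvl(value, default):
    """nvl(c1, c2): c2 when c1 is NULL, otherwise c1."""
    return default if value is None else value


score = None  # a NULL column value
print(nvl(score, 0), coalesce(None, None, "c3"))  # 0 c3
```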

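7. Date functions

Current UNIX timestamp (seconds since 1970-01-01 00:00:00 UTC): unix_timestamp()
Timestamp to date string: from_unixtime(bigint unixtime[, string format]) (rendered in the local time zone)
Date string to timestamp: unix_timestamp(string date) (expects the format 'yyyy-MM-dd HH:mm:ss')
Datetime to date: to_date(string timestamp)
Year of a date: year(string date)
Month of a date: month(string date)
Day of a date: day(string date)
Hour of a date: hour(string date)
Minute of a date: minute(string date)
Second of a date: second(string date)
Week of the year: weekofyear(string date)
Days between two dates: datediff(string enddate, string startdate)
Add days to a date: date_add(string startdate, int days)
Subtract days from a date: date_sub(string startdate, int days)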
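A Python sketch of the date arithmetic, useful for checking expected results. Assumptions: inputs are 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss' strings, weekofyear() follows ISO-8601 weeks (consistent with the documented weekofyear('1970-11-01 00:00:00') = 44), and the two timestamp conversions use UTC, whereas Hive uses the local time zone. Function names ending in _utc are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # Hive's default 'yyyy-MM-dd HH:mm:ss'


def _date_part(s):
    """Accept 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss' and keep only the date."""
    return date.fromisoformat(s[:10])


def to_date(s):
    return _date_part(s).isoformat()


def datediff(enddate, startdate):
    """Whole days from startdate to enddate (negative if enddate is earlier)."""
    return (_date_part(enddate) - _date_part(startdate)).days


def date_add(startdate, days):
    return (_date_part(startdate) + timedelta(days=days)).isoformat()


def date_sub(startdate, days):
    return (_date_part(startdate) - timedelta(days=days)).isoformat()


def weekofyear(s):
    """ISO-8601 week number: weeks start on Monday, week 1 contains the first Thursday."""
    return _date_part(s).isocalendar()[1]


def from_unixtime_utc(ts, fmt=FMT):
    """Like from_unixtime(), but rendered in UTC instead of the local zone."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)


def unix_timestamp_utc(s, fmt=FMT):
    """Like unix_timestamp(string date), but interpreting the string as UTC."""
    return int(datetime.strptime(s, fmt).replace(tzinfo=timezone.utc).timestamp())


print(datediff("2021-12-01", "2021-11-30"), date_add("2021-11-30", 2))  # 1 2021-12-02
```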

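8. String functions

Length: length(string A)
Reverse: reverse(string A)
Concatenate: concat(string A, string B…) (returns NULL if any argument is NULL)
Concatenate with a separator: concat_ws(string SEP, string A, string B…) (NULL arguments are skipped)
Substring: substr(string A, int start, int len) (start is 1-based; a negative start counts from the end)
Upper case: upper(string A)
Lower case: lower(string A)
Trim spaces on both sides: trim(string A)
Trim leading spaces: ltrim(string A)
Trim trailing spaces: rtrim(string A)
Regex replace: regexp_replace(string A, string B, string C) (replaces every substring of A that matches the Java regex B with C)
Regex extract: regexp_extract(string subject, string pattern, int index) (index 0 is the whole match, 1 the first group, and so on)
URL parsing: parse_url(string urlString, string partToExtract [, string keyToExtract]), returns a string; partToExtract is one of HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO
JSON parsing: get_json_object(string json_string, string path) (path such as '$.store.fruit[0].type')
String of n spaces: space(int n)
Repeat a string: repeat(string str, int n)
ASCII code of the first character: ascii(string str)
Left pad: lpad(string str, int len, string pad)
Right pad: rpad(string str, int len, string pad)
Split a string: split(string str, string pat) (pat is a regular expression, so split on a literal dot with '\\.')
Find in a set: find_in_set(string str, string strList) (1-based position of str in the comma-separated strList, 0 if not found)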
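Several of these functions have NULL and indexing rules that are easy to get wrong. The Python sketch below models concat, concat_ws, substr, regexp_extract, find_in_set and parse_url, with None as NULL. Assumptions: substr with pos 0 is treated like pos 1, regexp_extract returns '' when nothing matches, Python's re stands in for Java regex, and parse_url only models HOST, PATH, QUERY and REF. All names mirror the Hive functions but are hypothetical Python helpers.

```python
import re
from urllib.parse import parse_qs, urlsplit


def concat(*args):
    """concat(): NULL if any argument is NULL."""
    return None if any(a is None for a in args) else "".join(args)


def concat_ws(sep, *args):
    """concat_ws(): NULL arguments are skipped (a NULL separator gives NULL)."""
    if sep is None:
        return None
    return sep.join(a for a in args if a is not None)


def substr(s, pos, length=None):
    """substr(): 1-based pos; negative pos counts from the end (pos 0 acts like 1)."""
    n = len(s)
    if abs(pos) > n:
        return ""
    start = pos - 1 if pos > 0 else (n + pos if pos < 0 else 0)
    end = n if length is None else min(n, start + length)
    return s[start:end]


def regexp_extract(subject, pattern, index):
    """Group `index` of the first match (0 = whole match); '' if nothing matches."""
    m = re.search(pattern, subject)
    return m.group(index) if m else ""


def find_in_set(s, str_list):
    """1-based position of s in the comma-separated str_list, 0 if absent."""
    items = str_list.split(",")
    return items.index(s) + 1 if s in items else 0


def parse_url(url, part, key=None):
    """Only HOST, PATH, QUERY (optionally one key) and REF are modeled here."""
    u = urlsplit(url)
    if part == "HOST":
        return u.hostname
    if part == "PATH":
        return u.path
    if part == "REF":
        return u.fragment
    if part == "QUERY":
        if key is None:
            return u.query
        values = parse_qs(u.query).get(key)
        return values[0] if values else None
    raise ValueError(f"part not modeled in this sketch: {part}")


url = "http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1"
print(parse_url(url, "HOST"), parse_url(url, "QUERY", "k1"))  # facebook.com v1
```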

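9. Window functions

Running sum within a group: sum(pv) over(partition by cookieid order by createtime). Pitfall: the result changes a lot depending on whether order by is present. With order by, the default frame is range between unbounded preceding and current row, so each row gets a cumulative sum (rows with the same createtime get the same value); without order by, every row gets the total of its whole partition. See Part 2 below for details.
ROW_NUMBER(): numbers the rows in each group sequentially from 1, e.g. 1, 2, 3, 4, 5, 6, 7
RANK(): tied rows share a rank and leave gaps in the numbering, e.g. 1, 2, 3, 3, 5, 6, 7
DENSE_RANK(): tied rows share a rank without leaving gaps, e.g. 1, 2, 3, 3, 4, 5, 6
NTILE(num): distributes the ordered rows of each group as evenly as possible into num buckets and returns the bucket number
LAG(col,n,DEFAULT): value of col n rows above the current row in the window (DEFAULT if there is no such row, NULL if DEFAULT is omitted)
LEAD(col,n,DEFAULT): value of col n rows below the current row in the window
FIRST_VALUE(col): after sorting within the group, the first value up to the current row
LAST_VALUE(col): after sorting within the group, the last value up to the current row
CUME_DIST(): (number of rows with a value less than or equal to the current value) / (total number of rows in the group)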
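The sum() over pitfall and the three ranking functions are easiest to see side by side. A Python sketch that works on one partition whose rows are already sorted; it models Hive's documented default frames (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW with order by, the whole partition without it). The function names and sample pv data are hypothetical.

```python
from itertools import groupby


def sum_over_order_by(rows):
    """sum(v) over (partition by ... order by k), default RANGE frame.
    rows: (k, v) pairs of ONE partition, already sorted by k.
    Rows with the same k (peers) all get the total up to the last peer."""
    out, total = [], 0
    for _, peers in groupby(rows, key=lambda r: r[0]):
        peers = list(peers)
        total += sum(v for _, v in peers)
        out.extend([total] * len(peers))
    return out


def sum_over_rows(rows):
    """Same, with an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW."""
    out, total = [], 0
    for _, v in rows:
        total += v
        out.append(total)
    return out


def sum_over_partition(rows):
    """sum(v) over (partition by ...) with no order by: the whole partition."""
    total = sum(v for _, v in rows)
    return [total] * len(rows)


def row_number(keys):
    return list(range(1, len(keys) + 1))


def rank(keys):
    """Ties share a rank; the next distinct key jumps to its row position."""
    out = []
    for i, k in enumerate(keys):
        out.append(out[-1] if i > 0 and k == keys[i - 1] else i + 1)
    return out


def dense_rank(keys):
    """Ties share a rank; the next distinct key gets the next integer."""
    out = []
    for i, k in enumerate(keys):
        if i == 0:
            out.append(1)
        else:
            out.append(out[-1] if k == keys[i - 1] else out[-1] + 1)
    return out


def ntile(n_rows, buckets):
    """Bucket sizes differ by at most one; the earlier buckets get the extra rows."""
    base, extra = divmod(n_rows, buckets)
    out = []
    for b in range(1, buckets + 1):
        out.extend([b] * (base + (1 if b <= extra else 0)))
    return out


def lag(values, n=1, default=None):
    return [values[i - n] if i - n >= 0 else default for i in range(len(values))]


def lead(values, n=1, default=None):
    return [values[i + n] if i + n < len(values) else default for i in range(len(values))]


pv = [("2021-11-28", 1), ("2021-11-29", 2), ("2021-11-29", 3), ("2021-11-30", 4)]
print(sum_over_order_by(pv), sum_over_partition(pv))  # [1, 6, 6, 10] [10, 10, 10, 10]
print(rank([100, 90, 90, 80]), dense_rank([100, 90, 90, 80]))  # [1, 2, 2, 4] [1, 2, 2, 3]
```

Note how the two rows dated 2021-11-29 both get 6 under the default RANGE frame; with an explicit ROWS frame they would get 3 and 6.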

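The following functions are only summarized here; see Part 2 for a detailed explanation.

GROUPING SETS: combines several GROUP BY aggregations in one SQL statement
CUBE: aggregates over every combination of the GROUP BY dimensions
ROLLUP: a subset of CUBE that aggregates hierarchically, starting from the leftmost dimension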
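Which grouping sets CUBE and ROLLUP expand to, and what a grouping-sets aggregation returns, can be sketched in Python. This is a model, not Hive's execution plan: each grouping set is aggregated separately, like a UNION ALL of GROUP BY queries, and dimensions outside the current set are reported as None (NULL). Hive's GROUPING__ID distinguishes those from real NULLs; this sketch does not. The month/day/uv columns and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube(dims):
    """CUBE(d1..dn): every subset of the dimensions, 2**n grouping sets."""
    return [c for r in range(len(dims), -1, -1) for c in combinations(dims, r)]


def rollup(dims):
    """ROLLUP(d1..dn): prefixes of the dimension list, n + 1 grouping sets."""
    return [tuple(dims[:i]) for i in range(len(dims), -1, -1)]


def grouping_sets_sum(rows, dims, sets, measure):
    """Sum `measure` once per grouping set; dimensions outside the set become None.
    A real NULL dimension value would be merged with a rolled-up one here."""
    out = defaultdict(int)
    for gs in sets:
        for row in rows:
            key = tuple(row[d] if d in gs else None for d in dims)
            out[key] += row[measure]
    return dict(out)


dims = ["month", "day"]
rows = [
    {"month": "2021-11", "day": "2021-11-30", "uv": 3},
    {"month": "2021-12", "day": "2021-12-01", "uv": 5},
    {"month": "2021-12", "day": "2021-12-02", "uv": 2},
]
print(cube(dims))    # [('month', 'day'), ('month',), ('day',), ()]
print(rollup(dims))  # [('month', 'day'), ('month',), ()]
for key, uv in grouping_sets_sum(rows, dims, rollup(dims), "uv").items():
    print(key, uv)
```

With ROLLUP the output has one row per (month, day), one subtotal per month and one grand total, the same rows a group by month, day with rollup query would produce.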
