Hive: Common Statements

# Databases in Hive
Create a database:
create database if not exists demo comment 'database description';

Drop an empty database:
drop database if exists demo;

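Describe a database: this essentially prints everything stored about the demo database in the DBS table of the Hive metastore database.
desc database demo;

Note: metadata about Hive databases is stored in the DBS table of the metastore.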
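
As a small sketch of the database statements above, the following creates, inspects, and drops a database. The name demo_db and its comment are illustrative assumptions, not part of the original notes.

```sql
-- Create a database with a description; no error if it already exists.
CREATE DATABASE IF NOT EXISTS demo_db COMMENT 'scratch database for examples';

-- List all databases, then print the metadata Hive keeps for demo_db.
SHOW DATABASES;
DESCRIBE DATABASE EXTENDED demo_db;

-- A plain DROP DATABASE fails if the database still contains tables;
-- CASCADE drops the contained tables first.
DROP DATABASE IF EXISTS demo_db CASCADE;
```

Use CASCADE with care: for managed tables it removes the data files as well as the metadata.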

# Creating a table
create [external] table [if not exists] tableName (
  gname    string comment 'column description',
  password string comment '',
  gid      int    comment '',
  unames   string comment ''
) comment 'table description'
partitioned by (dt string comment 'partition column description')
row format delimited
fields terminated by '\001'
lines terminated by '\n'
stored as textfile
location '/inputdata';
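
The template above uses square brackets for optional keywords, so it cannot be run as written. A concrete version might look like the sketch below. The table name user_groups, the column meanings in the comments, and the HDFS path are assumptions made for illustration.

```sql
-- External, partitioned text table; dropping it later leaves the files in place.
CREATE EXTERNAL TABLE IF NOT EXISTS user_groups (
  gname    STRING COMMENT 'group name (assumed meaning)',
  password STRING COMMENT 'group password (assumed meaning)',
  gid      INT    COMMENT 'group id (assumed meaning)',
  unames   STRING COMMENT 'member names (assumed meaning)'
)
COMMENT 'example external table'
PARTITIONED BY (dt STRING COMMENT 'load date, e.g. 2016-01-01')
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/inputdata/user_groups';
```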

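Notes:
1. What creating a table really does: Hive creates a directory for the table under the database's directory on HDFS (unless location points elsewhere) and adds the corresponding records to the metastore tables.
2. The location keyword specifies where the table's data is stored.
If you create a managed partitioned table with a location that already contains subdirectories holding data files, that data is not loaded into partitions automatically. You have to add each partition manually and specify the partition's location.
3. The external keyword creates an external table.
4. Managed (internal) tables vs. external tables:
The difference shows up when a table is dropped: dropping a managed table deletes both the table's metadata and its data, while dropping an external table deletes only the metadata and leaves the data files untouched.
Typical uses:
Managed tables: mostly temporary and intermediate tables
External tables: source data
5. Why create partitioned tables?
The data in a single table keeps growing over time. To avoid full table scans, Hive introduces partitions, which split the data of one table into separate directories according to the partition value.
6. Details of Hive partitions
The partition column is a pseudo-column: it is not stored in the table's data files, but it can be used in queries, for example for filtering.
A table can have many partitions, and each partition exists as a directory.
7. How Hive concepts appear on HDFS
A database's directory name ends with .db;
A managed table's directory is stored under its database's directory, and table directory names have no suffix;
A partition's directory is named in the form partition_column=partition_value;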
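
Notes 2, 6, and 7 can be sketched as follows, assuming a hypothetical table user_groups partitioned by dt whose location /inputdata/user_groups already contains a directory dt=2016-01-01 with data files. The table name, columns, and paths are illustrative only.

```sql
-- Register one existing directory as a partition by hand.
ALTER TABLE user_groups ADD IF NOT EXISTS
  PARTITION (dt = '2016-01-01')
  LOCATION '/inputdata/user_groups/dt=2016-01-01';

-- Alternatively, when the directories follow the column=value naming
-- convention, ask Hive to discover and register them.
MSCK REPAIR TABLE user_groups;

-- Check which partitions the metastore now knows about.
SHOW PARTITIONS user_groups;

-- The partition column is not stored in the data files, but it can be
-- used in a filter so that only the matching directory is read.
SELECT gname, gid
FROM user_groups
WHERE dt = '2016-01-01';
```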

Inspect a table:
desc tablename;
show create table tablename;

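# Importing data
Using load:
load data [local] inpath '/path' [overwrite] into table tableName partition(type='');

Notes:
1. load is a pure copy/move operation: it places the data files in the location that belongs to the Hive table without transforming them.
2. With local, the files are copied from the local file system; without local, they are moved from their current location on HDFS.
3. With the overwrite keyword, the existing data in the target table (or partition) is deleted first, and then the new data is added.
4. How Hive differs from MySQL when loading data:
Hive is schema-on-read, meaning it does not check at load time whether the data conforms to the table definition;
A relational database such as MySQL is strictly schema-on-write, so writing invalid data raises an error;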
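
The two load variants might look like the sketch below. It assumes a hypothetical table user_groups partitioned by dt; the file paths are also made up for illustration.

```sql
-- LOCAL: copy a file from the local file system into the partition,
-- appending to whatever files are already there.
LOAD DATA LOCAL INPATH '/home/hivedata/groups.txt'
INTO TABLE user_groups PARTITION (dt = '2016-01-01');

-- No LOCAL: move a file that is already on HDFS into the partition.
-- OVERWRITE removes the partition's existing files first.
LOAD DATA INPATH '/tmp/groups.txt'
OVERWRITE INTO TABLE user_groups PARTITION (dt = '2016-01-02');
```

Because Hive is schema-on-read, both statements succeed even if the file does not match the table's delimiters or column types; the mismatch only becomes visible when the data is queried, typically as NULL values.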


Using insert:
insert into table tablename [partition(type='')] select * from xxx;
insert overwrite table tablename [partition(type='')] select * from xxx;
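
A concrete sketch of both forms with a static partition, assuming a hypothetical target table user_groups partitioned by dt and a hypothetical source table staging_groups with matching columns:

```sql
-- Append the query result to partition dt='2016-01-03'.
-- With a static partition, the SELECT list contains only the
-- non-partition columns of the target table.
INSERT INTO TABLE user_groups PARTITION (dt = '2016-01-03')
SELECT gname, password, gid, unames
FROM staging_groups;

-- Replace the contents of the same partition with a filtered result.
INSERT OVERWRITE TABLE user_groups PARTITION (dt = '2016-01-03')
SELECT gname, password, gid, unames
FROM staging_groups
WHERE gid >= 1000;
```

Listing the columns explicitly is a safer habit than select *, which only works when the source columns line up exactly with the target's columns.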


Using as (creates the table and loads data into it in one step):
create table if not exists new_table_name
as
select column_a, column_b from existing_table where xxx;
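
A minimal create-table-as-select sketch, assuming a hypothetical source table user_groups with columns gname and gid. The new table's column names and types are taken from the query result.

```sql
-- Create big_groups and fill it from the query in a single statement.
-- The storage format of the new table can be chosen independently
-- of the source table's format.
CREATE TABLE IF NOT EXISTS big_groups
STORED AS ORC
AS
SELECT gname, gid
FROM user_groups
WHERE gid >= 1000;
```

Note that the target of a create-table-as-select statement cannot be an external table.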

# Exporting data:
1. From a Hive table to a local directory
2. From a Hive table to an HDFS directory
3. Redirecting query output to a file with >

1.
insert overwrite local directory '/home/hivedata/exp2'
row format delimited fields terminated by '\t'
select * from aa7;

2.
insert overwrite directory '/hivedata/exp2'
row format delimited fields terminated by ','
select * from aa7;

3.
hive -e "use database_name; select * from table_name" > /home/hivedata/exp3;

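# Altering tables

Adding partitions dynamically:
The partition can be determined dynamically from the values of the partition column in the select statement.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table target_table partition (age) select id, name, tel, age from source_table;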
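
A fuller sketch of a dynamic partition insert. The table names people_raw and people_by_age and the column types are assumptions for illustration; only the column names id, name, tel, and age come from the statement above.

```sql
-- Allow dynamic partitions, and allow all partition columns to be dynamic.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Target table: age is the partition column, so it is not in the column list.
CREATE TABLE IF NOT EXISTS people_by_age (
  id   INT,
  name STRING,
  tel  STRING
)
PARTITIONED BY (age INT);

-- The dynamic partition column must come last in the SELECT list.
-- Hive creates one partition directory per distinct age value.
INSERT INTO TABLE people_by_age PARTITION (age)
SELECT id, name, tel, age
FROM people_raw;
```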

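Change the data location of a partition (date is a reserved word in Hive 1.2.0 and later, so it is quoted with backticks here):
ALTER TABLE table_name PARTITION (`date`='2016') SET LOCATION "data_path";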

Rename a partition (that is, change its partition value):
ALTER TABLE table_name PARTITION (`date`='old_partition_value') RENAME TO PARTITION (`date`='new_partition_value');
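
The two partition statements might be used together as in the sketch below. It assumes a hypothetical table user_groups partitioned by dt; the namenode host, port, and path in the URI are placeholders, not values from the original notes.

```sql
-- Change the partition value from 2016-01-01 to 2016-01-31.
ALTER TABLE user_groups PARTITION (dt = '2016-01-01')
RENAME TO PARTITION (dt = '2016-01-31');

-- Point the renamed partition at a different directory.
-- A complete URI with scheme is the safest form for the new location.
ALTER TABLE user_groups PARTITION (dt = '2016-01-31')
SET LOCATION 'hdfs://namenode:8020/inputdata/user_groups/dt=2016-01-31';

-- Confirm the result.
SHOW PARTITIONS user_groups;
```

SET LOCATION only updates the metastore; it does not move existing files to the new directory.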

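Add columns:
ALTER TABLE table_name ADD COLUMNS (column_name data_type);
-- The new columns are added after all existing columns but before the partition columns
For example, add columns a and b to the Hive table temptable:
alter table temptable add columns (a string, b string);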

Rename a table:
alter table old_table_name rename to new_table_name;
