一.NULL值的处理——以FLUX字段为例
1.当遇到聚合时,可能为null值,可以用COALESCE(FLUX,0)将null值赋值为0
select URL,sum(COALESCE(FLUX,0)) as sf,sum(COALESCE(COUNT,0)) as sc from URLDAY
where DT>='20200510000000' and DT<='20200517000000'
group by URL
order by sf desc
limit 10
2.排序时,如果有null值,倒序会排在数值前面,此时可以将COALESCE(FLUX,0)放在order by 的之后,让排序正常。
select URL,sum(FLUX) as sf,sum(COUNT) as sc from URLDAY
where DT>='20200510000000' and DT<='20200517000000'
group by URL
order by COALESCE(sf,0) desc
limit 10
3.如果在where条件里过滤null值,则需要FLUX is not null
select URL,sum(FLUX) as sf,sum(COUNT) as sc from URLDAY
where DT>='20200510000000' and DT<='20200517000000' and FLUX is not null
group by URL
order by sf desc
limit 10
二.查询
1.打开sqlline
/usr/lib/phoenix-4.14.1/bin/sqlline.py 10.21.69.1,10.21.69.2,10.21.69.3:2181:/hbase98
sqlline
2.查看表信息
!describe CHANNEL5MINSPEED_2005
describe
3.退出sqlline
!exit
exit
三.重点函数
1.字符串相关
a.字符串截取函数
SUBSTR(字符串,起始位置,长度)
select SUBSTR(DT,1,10) from CHANNEL5MINSPEED_2005;
substr
b.去除空格
去除前后空格 TRIM(" a ")
去除最左空格 LTRIM(" b")
去除最右空格 RTRIM("b ")
c.填充字符
左边填充:LPAD(字符串,字符数,填充字符)
不要举一反三,没有RPAD,hhhhh
select LPAD(DT,20,'0') from CHANNEL5MINSPEED_2005;
LPAD
d.字符长度
LENGTH(“aaaa”)
select LENGTH(DT) from CHANNEL5MINSPEED_2005;
LENGTH
e.正则匹配
正则截取
REGEXP_SUBSTR(字符串,正则表达式,选取第几个)。省略第三个参数时,默认取第一个
正则替换
REGEXP_REPLACE(字符串,正则表达式,替换字符或字符串)
正则分割
REGEXP_SPLIT(字符串, 分割符)
重点:当分割字符相连时,会出现null值。
select REGEXP_SPLIT(DT,'5') from CHANNEL5MINSPEED_2005;
select REGEXP_SPLIT(DT,'0') from CHANNEL5MINSPEED_2005;
REGEXP_SPLIT