练习一: 各部门工资最高的员工(难度:中等)

SELECT b.name as Department,a.Name as Employee,a.Salary
FROM Employee as a
inner join Department as b
on a.DepartmentId =b.Id
WHERE (a.DepartmentId,a.Salary) in
(SELECT (DepartmentId,Max(Salary) FROM Employee GROUP BY DepartmentId)
练习二: 换座位(难度:中等)
小美是一所中学的信息科技老师,她有一张 seat 座位表,平时用来储存学生名字和与他们相对应的座位 id。
其中纵列的id是连续递增的
小美想改变相邻俩学生的座位。
你能不能帮她写一个 SQL query 来输出小美想要的结果呢?
请创建如下所示seat表:

SELECT
(CASE WHEN id = 0 or id = (SELECT COUNT(*) FROM seat) THEN id
WHEN id % 2 = 0 AND id > 0 THEN id - 1
ELSE id + 1
END
) as id,
student
FROM seat
ORDER BY id;
练习三: 分数排名(难度:中等)

2种,升序和降序,分别使用ORDER BY score_avg,ORDER BY score_avg DESC
练习四:连续出现的数字(难度:中等)

SELECT Num AS ConsecutiveNums,
FROM Logs
GROUP BY Num
WHERE count(Num)>=3
练习五:树节点 (难度:中等)


练习六:至少有五名直接下属的经理 (难度:中等)

SELECT a.Name FROM Employee a,Employee b left join a.Id=b.ManagerId WHERE b.ManagerId in (SELECT ManagerId FROM Employee GROUP BY ManagerId WHERE count(ManagerId)>=5)
练习七:查询回答率最高的问题 (难度:中等)

SELECT question_id FROM
survey_log WHERE question_id in (SELECT question_id,MAX(count(answer_id)) FROM survey_log GROUP BY question_id WHERE answer_id is NOT null)
练习八:各部门前3高工资的员工(难度:中等)

select d.Name, e.Name as Employee, e.Salary
from Department as d
inner join employee as e
on d.Id = e.DepartmentId
limit 0, 3
order by Salary desc;
练习九:平面上最近距离 (难度: 困难)

SELECT MIN(SQRT(POW(P1.x-P2.x,2)+POW(P1.y-P2.y,2))) AS shortest FROM point_2d AS P1 INNER JOIN point_2d AS P2 ON p1.x<>p1.x OR p1.y<>p2.y;
练习十:行程和用户(难度:困难)


select t.Request_at as Day,ROUND(sum((case when t.Status = "cancelled_by_driver" then 1 else 0 end))/Count(t.Client_ID),2) as Cancellation Rate
from Trips as t inner join Users as u on t.Client_ID=u.Users_ID and u.Banned = 'No'
where t.Request_at between '2013-10-01'and'2013-10-03' group by t.Request_at;
B
练习一:行转列

SELECT name,
SUM(CASE WHEN subject = 'chinese' THEN score ELSE null END) as chinese,
SUM(CASE WHEN subject = 'math' THEN score ELSE null END) as math,
SUM(CASE WHEN subject = 'english' THEN score ELSE null END) as english
FROM score
GROUP BY name;
练习二:列转行

SELECT name, 'chinese' as subject,max( score) FROM grade group by name
UNION ALL
SELECT name, 'math' as subject,max( score) FROM grade group by name
UNION ALL
SELECT name, 'english' as subject,max( score) FROM grade group by name
练习三:带货主播

SELECT count(date) FROM anchor_sales where date in (select anchor_name,date ,sales,sum(sales) from anchor_sales group by anchor_name,date where sales>0.9*sum(sales))
SELECT count(anchor_name) FROM anchor_sales where date in (select anchor_name,date ,sales,sum(sales) from anchor_sales group by anchor_name,date where sales>0.9*sum(sales))
练习四:MySQL 中如何查看sql语句的执行计划?可以看到哪些信息?解决办法
利用profiles来查看sql 语句执行计划,profiling功能可以了解到cpu io等更详细的信息。
练习五:解释一下 SQL 数据库中 ACID 是指什么
ACID指在可靠数据库中,事件应该具有的特性,它不是一种技术,而是一种思想。ACID为原子性(Atomicity)、一致性(Consistency)、隔离性(Isolation)、持久性(Durability)的总称。这个ACID可以分为数据库层面的和业务层面的来讨论,一般业务层面的都可以被转换为数据库层面的。
C

select cdate as 比赛日期
sum(case when resut = "胜" then count(result) else null end) as 胜,
sum(case when resut = "负" then count(result) else null end) as 负,
from race
group by cdate;

SELECT '比赛日期' as cdate, '胜' as result FROM grade group by '胜'
UNION ALL
SELECT '比赛日期' as cdate, '负' as result FROM grade group by '负'

练习四:hive 数据倾斜的产生原因及优化策略?
1.key分布不均匀
2.业务数据本身的特性
3.SQL语句造成数据倾斜
解决办法
1.hive设置hive.map.aggr=true和hive.groupby.skewindata=true
2.有数据倾斜的时候进行负载均衡,当选项设定为true,生成的查询计划会有两个MR Job。
第一个MR Job中,Map的输出结果集合会随机分布到Reduce中,每个Reduce做部分聚合操作,并输出结果,这样处理的结果是相同Group By Key有可能被分发到不同的Reduce中,从而达到负载均衡的目的;第二个MR Job在根据预处理的数据结果按照 Group By Key 分布到Reduce中(这个过程可以保证相同的 Group By Key 被分布到同一个Reduce中),最后完成最终的聚合操作。
3.SQL语句调整:
1.选用join key 分布最均匀的表作为驱动表。做好列裁剪和filter操作,以达到两表join的时候,数据量相对变小的效果。
2.大小表Join: 使用map join让小的维度表(1000条以下的记录条数)先进内存。在Map端完成Reduce。
3.count distinct大量相同特殊值:count distinct时,将值为空的情况单独处理,
如果是计算count distinct,可以不用处理,直接过滤,在做后结果中加1。如果还有其他计算,需要进行group by,可以先将值为空的记录单独处理,再和其他计算结果进行union.
4.大表Join大表:把空值的Key变成一个字符串加上一个随机数,
把倾斜的数据分到不同的reduce上,由于null值关联不上,处理后并不影响最终的结果。