在一家传媒公司做网站编辑_如何?,中国设计网站排行榜前十名,郑州网站制作公司,2345网址大全手机版5个小案例说清楚-窗口函数#xff08;开窗函数#xff09;
结论先行: 首先#xff0c;写开窗函数一定要先理解开窗的定义和为什么开窗#xff0c;开窗分几种场景#xff1b; 写sql#xff0c;需要心平气和#xff0c;一步一步进行推导#xff0c;这样可以找到写sql的逻…5个小案例说清楚-窗口函数开窗函数
结论先行: 首先写开窗函数一定要先理解开窗的定义和为什么开窗开窗分几种场景 写sql需要心平气和一步一步进行推导这样可以找到写sql的逻辑最终达到万变不离其宗。 over(): 就是为每条数据都开启一个窗口. 窗口的大小默认为当前数据集的大小. over(partition by…): 会按照分区字段将数据分到不同的区中. 会为每个分区中的每条数据都开启一个窗口. 窗口的大小默认为当前分区数据集的大小 over(order by …): 会为每条数据都开启一个窗口,窗口的大小默认为起点到当前行 over(rows between … and …) :划分窗口大小 over(partition by … order by … ) : 会按照分区字段将数据分到不同的区中. 会为每个分区中的每条数据都开启一个窗口. 窗口的大小默认为当前分区数据集中起点到当前行 1.相关函数说明 over():指定分析函数工作的数据窗口大小,也就是可操作的数据集大小这个数据窗口大小可能会随着行的改变而变化 current row:当前行 n preceding:往前n行数据 n following:往后n行数据 unbounded:起点 unbounded preceding:表示到前面的起点 unbounded following:表示到后面的重点 lag(col, n, default_val):往前第n行数据 lead(col, n, default_val):往后第n行数据 ntile(n):把有序窗口的行分发到指定数据的组中各个组有编号编号从1开始对于每一行ntile返回此行所属的组的编号。注意n必须为int类型 2.准备表和数据
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94--创建表
create table business(
name string,
orderdate string,
cost int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ,;
--导入数据
load data local inpath /opt/module/hive-3.1.2/datas/business.txt into table business;需求一: 查询在2017年4月份购买过的顾客及总人数 分步分析得出 a)查询2017年4月份的数据 b)查询购买过的顾客名称 c)求总人数 这里用到了开窗函数over()表示在所有的当前数据集中。 注意要一步一步分析切记上来囫囵吞枣没有搞清楚需求就开始写 --a)查询2017年4月份的数据
selectname,orderdate,cost
frombusiness
where orderdate like 2017-04%; t1
---------------------------
| name | orderdate | cost |
---------------------------
| jack | 2017-04-06 | 42 |
| mart | 2017-04-08 | 62 |
| mart | 2017-04-09 | 68 |
| mart | 2017-04-11 | 75 |
| mart | 2017-04-13 | 94 |
---------------------------
-- b)查询购买过的顾客名称
selectdistinct t1.name
fromt1 ; t2
-- c)求总人数
selectt2.name,count(t2.name) over() total_num
fromt2 ;
-- d)组装
selectt2.name,count(t2.name) over() total_num
from(
selectdistinct t1.name
from(
selectname,orderdate,cost
frombusiness
where orderdate like 2017-04%)t1 )t2 ; --结果可得
----------------------
| t2.name | total_num |
----------------------
| mart | 2 |
| jack | 2 |
----------------------需求二 查询顾客的购买明细及月购买总额 分析 每行的窗口数据为该月份本人的购买次数 所以需要按照月份进行分区 需要用到 over(partition by monthNum) 此题可以扩展成两个题 扩展1 查询顾客的购买明细及所有顾客月购买总额 selectname,orderdate,cost,sum(cost) over(partition by month(orderdate)) total_cost
frombusiness ;
--结果可得
----------------------------------------
| name | orderdate | cost | total_cost |
----------------------------------------
| jack | 2017-01-01 | 10 | 205 |
| jack | 2017-01-08 | 55 | 205 |
| tony | 2017-01-07 | 50 | 205 |
| jack | 2017-01-05 | 46 | 205 |
| tony | 2017-01-04 | 29 | 205 |
| tony | 2017-01-02 | 15 | 205 |
| jack | 2017-02-03 | 23 | 23 |
| mart | 2017-04-13 | 94 | 341 |
| jack | 2017-04-06 | 42 | 341 |
| mart | 2017-04-11 | 75 | 341 |
| mart | 2017-04-09 | 68 | 341 |
| mart | 2017-04-08 | 62 | 341 |
| neil | 2017-05-10 | 12 | 12 |
| neil | 2017-06-12 | 80 | 80 |
----------------------------------------扩展1 查询顾客的购买明细及每个顾客的月购买总额 selectname,orderdate,cost,sum(cost) over(partition by name, month(orderdate)) total_cost
frombusiness ;
--查询结果
----------------------------------------
| name | orderdate | cost | total_cost |
----------------------------------------
| jack | 2017-01-05 | 46 | 111 |
| jack | 2017-01-08 | 55 | 111 |
| jack | 2017-01-01 | 10 | 111 |
| jack | 2017-02-03 | 23 | 23 |
| jack | 2017-04-06 | 42 | 42 |
| mart | 2017-04-13 | 94 | 299 |
| mart | 2017-04-11 | 75 | 299 |
| mart | 2017-04-09 | 68 | 299 |
| mart | 2017-04-08 | 62 | 299 |
| neil | 2017-05-10 | 12 | 12 |
| neil | 2017-06-12 | 80 | 80 |
| tony | 2017-01-04 | 29 | 94 |
| tony | 2017-01-02 | 15 | 94 |
| tony | 2017-01-07 | 50 | 94 |
----------------------------------------
14 rows selected (20.778 seconds)
需求三
求每个顾客的购买明细及将每个顾客的cost按照日期进行累加分析窗口函数用到了order by 和 rows between unbounded preceding and current rowselectname,orderdate,cost,sum(cost) over(order by orderdate rows between unbounded preceding and current row)
frombusiness ;
-- 查询结果
------------------------------------------
| name | orderdate | cost | sum_window_0 |
------------------------------------------
| jack | 2017-01-01 | 10 | 10 |
| tony | 2017-01-02 | 15 | 25 |
| tony | 2017-01-04 | 29 | 54 |
| jack | 2017-01-05 | 46 | 100 |
| tony | 2017-01-07 | 50 | 150 |
| jack | 2017-01-08 | 55 | 205 |
| jack | 2017-02-03 | 23 | 228 |
| jack | 2017-04-06 | 42 | 270 |
| mart | 2017-04-08 | 62 | 332 |
| mart | 2017-04-09 | 68 | 400 |
| mart | 2017-04-11 | 75 | 475 |
| mart | 2017-04-13 | 94 | 569 |
| neil | 2017-05-10 | 12 | 581 |
| neil | 2017-06-12 | 80 | 661 |
------------------------------------------
扩展需求 求每个顾客的购买明细及 起点到当前行的累加 上一行到当前行的累加 当前行到下一行的累加 上一行到下一行的累加 当前行到终点的累加 selectname,orderdate,cost,sum(cost) over(order by orderdate rows between unbounded preceding and current row) up_c,sum(cost) over(order by orderdate rows between 1 preceding and current row) 1p_c,sum(cost) over(order by orderdate rows between current row and 1 following) c_1f,sum(cost) over(order by orderdate rows between 1 preceding and 1 following) 1p_1f,sum(cost) over(order by orderdate rows between current row and unbounded following) c_uf
frombusiness;
--执行结果
---------------------------------------------------------------
| name | orderdate | cost | up_c | 1p_c | c_1f | 1p_1f | c_uf |
---------------------------------------------------------------
| jack | 2017-01-01 | 10 | 10 | 10 | 25 | 25 | 661 |
| tony | 2017-01-02 | 15 | 25 | 25 | 44 | 54 | 651 |
| tony | 2017-01-04 | 29 | 54 | 44 | 75 | 90 | 636 |
| jack | 2017-01-05 | 46 | 100 | 75 | 96 | 125 | 607 |
| tony | 2017-01-07 | 50 | 150 | 96 | 105 | 151 | 561 |
| jack | 2017-01-08 | 55 | 205 | 105 | 78 | 128 | 511 |
| jack | 2017-02-03 | 23 | 228 | 78 | 65 | 120 | 456 |
| jack | 2017-04-06 | 42 | 270 | 65 | 104 | 127 | 433 |
| mart | 2017-04-08 | 62 | 332 | 104 | 130 | 172 | 391 |
| mart | 2017-04-09 | 68 | 400 | 130 | 143 | 205 | 329 |
| mart | 2017-04-11 | 75 | 475 | 143 | 169 | 237 | 261 |
| mart | 2017-04-13 | 94 | 569 | 169 | 106 | 181 | 186 |
| neil | 2017-05-10 | 12 | 581 | 106 | 92 | 186 | 92 |
| neil | 2017-06-12 | 80 | 661 | 92 | 80 | 92 | 80 |
---------------------------------------------------------------
14 rows selected (17.403 seconds)需求四 查询每个顾客上次 和 下次 的购买时间 分析 lag(col, n, default_val):往前第n行数据 lead(col, n, default_val):往后第n行数据 selectname,orderdate,cost,lag(orderdate, 1, 1977-01-01) over(partition by name order by orderdate) pre_ord,lead(orderdate, 1, 9999-01-01) over(partition by name order by orderdate) nex_ord
frombusiness;
--结果
-----------------------------------------------------
| name | orderdate | cost | pre_ord | nex_ord |
-----------------------------------------------------
| jack | 2017-01-01 | 10 | 1977-01-01 | 2017-01-05 |
| jack | 2017-01-05 | 46 | 2017-01-01 | 2017-01-08 |
| jack | 2017-01-08 | 55 | 2017-01-05 | 2017-02-03 |
| jack | 2017-02-03 | 23 | 2017-01-08 | 2017-04-06 |
| jack | 2017-04-06 | 42 | 2017-02-03 | 9999-01-01 |
| mart | 2017-04-08 | 62 | 1977-01-01 | 2017-04-09 |
| mart | 2017-04-09 | 68 | 2017-04-08 | 2017-04-11 |
| mart | 2017-04-11 | 75 | 2017-04-09 | 2017-04-13 |
| mart | 2017-04-13 | 94 | 2017-04-11 | 9999-01-01 |
| neil | 2017-05-10 | 12 | 1977-01-01 | 2017-06-12 |
| neil | 2017-06-12 | 80 | 2017-05-10 | 9999-01-01 |
| tony | 2017-01-02 | 15 | 1977-01-01 | 2017-01-04 |
| tony | 2017-01-04 | 29 | 2017-01-02 | 2017-01-07 |
| tony | 2017-01-07 | 50 | 2017-01-04 | 9999-01-01 |
-----------------------------------------------------
需求五
需求五: 查询前20%时间的订单信息分析ntile(n):把有序窗口的行分发到指定数据的组中各个组有编号编号从1开始对于每一行ntile返回此行所属的组的编号。注意n必须为int类型--将数据分成5组
selectname,orderdate,cost,ntile(5) over(order by orderdate) gid
frombusiness ; t1
--取第一组的数据
selectt1.name,t1.orderdate,t1.cost
fromt1
where t1.gid 1;
--组合
selectt1.name,t1.orderdate,t1.cost
from(
selectname,orderdate,cost,ntile(5) over(order by orderdate) gid
frombusiness )t1 where t1.gid 1 ;
-- 结果
-----------------------------------
| t1.name | t1.orderdate | t1.cost |
-----------------------------------
| jack | 2017-01-01 | 10 |
| tony | 2017-01-02 | 15 |
| tony | 2017-01-04 | 29 |
-----------------------------------