Big Data Development, Hive Optimization Part 4: Data Sampling in Hive (Sampling)

Note:
Hive version 2.1.1

Sampling Overview

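When the data volume is very large and processing the full data set becomes impractical, sampling is especially important. A sample lets you estimate and infer the characteristics of the whole population, which is why sampling is a cost-effective working and research method widely used in scientific experiments, quality inspection, and social surveys.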

In Hive, data sampling falls into three types:

  1. Random sampling
  2. Bucket table sampling
  3. Block sampling

I. Random Sampling

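Hive provides a random function, rand(). We can sample a table by ordering the rows on rand() and then using a LIMIT clause to cap how many sampled rows are returned.
In the second query below, DISTRIBUTE BY rand() sends each row to a randomly chosen reducer and SORT BY rand() randomly orders the rows within each reducer, so the rows are random both in how they are spread across the reducers and in their order inside each one. ORDER BY rand(), by contrast, sorts all rows globally in a single reducer.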
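To see why `order by rand() limit n` is expensive, here is a simplified Python model of its semantics (not Hive's code): every row gets a random key, the whole input is sorted on that key, and only then are the first n rows kept. In Hive that global sort runs in a single reducer, which matches "number of reducers: 1" in the first test log. The function name `order_by_rand_limit` and the toy table of integer IDs are hypothetical; the `seed` parameter exists only to make the sketch reproducible.

```python
import random


def order_by_rand_limit(rows, n, seed=None):
    """Simplified model of `ORDER BY rand() LIMIT n` (illustration only).

    Every row is tagged with a random key, ALL rows are sorted on that key
    (a global sort; Hive does this in one reducer), and the first n rows
    are returned.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]  # attach a rand() key to each row
    keyed.sort(key=lambda pair: pair[0])           # global sort on the random key
    return [row for _, row in keyed[:n]]           # LIMIT n


# Usage: draw 20 rows from a toy "table" of 1,000 hypothetical sale IDs.
table = list(range(1000))
sample = order_by_rand_limit(table, 20, seed=42)
print(sample)
```

The cost is dominated by reading and sorting every row, even though only 20 are kept.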

Code:

select * from ods_fact_sale order by rand() limit 20;
select * from ods_fact_sale where sale_date = '2011-08-16 00:00:00.0'  distribute by rand() sort by rand() limit 10;
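The second query runs in two stages, which is consistent with "Total jobs = 2" in its log: Job 1 (469 reducers) scatters the rows and shuffles them locally, and Job 2 (1 reducer) applies the final LIMIT. The Python sketch below is a simplified model of that plan as we read it, not Hive's code. `distribute_sort_rand_limit`, `num_reducers`, and the toy table are hypothetical, the WHERE filter is left out, and we assume each stage-1 reducer already keeps at most n rows, which would explain why Job 2 reads only about 200 KB from HDFS.

```python
import random


def distribute_sort_rand_limit(rows, n, num_reducers, seed=None):
    """Simplified model of `DISTRIBUTE BY rand() SORT BY rand() LIMIT n`.

    Stage 1: each row goes to a random reducer (DISTRIBUTE BY rand()); each
    reducer sorts only its own rows on a fresh random key (SORT BY rand())
    and keeps at most n of them (assumed per-reducer LIMIT).
    Stage 2: a single final task takes the first n collected rows.
    """
    rng = random.Random(seed)
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[rng.randrange(num_reducers)].append(row)  # random reducer

    stage1_output = []
    for part in partitions:
        keyed = [(rng.random(), row) for row in part]        # local random key
        keyed.sort(key=lambda pair: pair[0])                 # local sort only
        stage1_output.extend(row for _, row in keyed[:n])    # per-reducer LIMIT

    return stage1_output[:n]                                 # final LIMIT (stage 2)


# Usage: 10 rows from a toy table of 10,000 hypothetical IDs, 8 reducers.
table = list(range(10_000))
sample = distribute_sort_rand_limit(table, 10, num_reducers=8, seed=1)
print(sample)
```

The final LIMIT simply takes rows in whatever order the stage-1 output arrives; because every reducer already holds a random subset in random order, the result is still a random sample, and no single reducer has to sort the whole table.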

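Test log:
As the test log shows, random sampling performs poorly because it requires sorting: both queries still read the entire table (about 31 GB from HDFS) and took about 814 s and 1,518 s respectively. Admittedly, it is still somewhat better than querying the full data set.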

hive> 
    > select * from ods_fact_sale order by rand() limit 20;
Query ID = root_20201231105936_75f9fb76-9149-4884-8faf-4254fd1e3b30
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/31 10:59:37 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0022, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0022/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0022
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 1
2020-12-31 10:59:46,944 Stage-1 map = 0%,  reduce = 0%
2020-12-31 11:00:01,475 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 13.27 sec
2020-12-31 11:00:02,506 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 27.08 sec
2020-12-31 11:00:13,893 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 50.64 sec
2020-12-31 11:00:25,199 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 74.57 sec
2020-12-31 11:00:37,526 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 99.39 sec
2020-12-31 11:00:49,832 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 123.39 sec
2020-12-31 11:01:01,139 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 147.16 sec
2020-12-31 11:01:12,412 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 170.94 sec
2020-12-31 11:01:24,721 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 194.79 sec
2020-12-31 11:01:35,987 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 206.61 sec
2020-12-31 11:01:47,263 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 230.32 sec
2020-12-31 11:01:49,314 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 242.44 sec
2020-12-31 11:01:58,542 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 254.01 sec
2020-12-31 11:02:00,591 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 266.02 sec
2020-12-31 11:02:09,819 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 277.88 sec
2020-12-31 11:02:12,895 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 289.49 sec
2020-12-31 11:02:24,167 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 312.52 sec
2020-12-31 11:02:31,327 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 324.24 sec
2020-12-31 11:02:34,390 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 336.04 sec
2020-12-31 11:02:42,588 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 348.11 sec
2020-12-31 11:02:45,663 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 359.64 sec
2020-12-31 11:02:56,917 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 383.53 sec
2020-12-31 11:03:06,149 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 395.32 sec
2020-12-31 11:03:09,227 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 407.11 sec
2020-12-31 11:03:16,393 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 418.82 sec
2020-12-31 11:03:19,467 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 430.2 sec
2020-12-31 11:03:27,645 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 441.85 sec
2020-12-31 11:03:38,914 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 465.44 sec
2020-12-31 11:03:43,008 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 477.26 sec
2020-12-31 11:03:51,199 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 489.1 sec
2020-12-31 11:03:55,286 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 500.86 sec
2020-12-31 11:04:02,462 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 512.68 sec
2020-12-31 11:04:06,560 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 524.44 sec
2020-12-31 11:04:17,815 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 548.46 sec
2020-12-31 11:04:23,958 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 560.07 sec
2020-12-31 11:04:30,092 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 572.02 sec
2020-12-31 11:04:35,194 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 584.2 sec
2020-12-31 11:04:42,337 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 595.69 sec
2020-12-31 11:04:47,456 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 607.62 sec
2020-12-31 11:04:58,717 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 631.07 sec
2020-12-31 11:05:03,819 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 642.81 sec
2020-12-31 11:05:09,966 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 654.43 sec
2020-12-31 11:05:15,077 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 665.79 sec
2020-12-31 11:05:21,220 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 677.29 sec
2020-12-31 11:05:26,334 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 688.58 sec
2020-12-31 11:05:38,617 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 710.6 sec
2020-12-31 11:05:41,688 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 723.47 sec
2020-12-31 11:05:49,869 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 735.39 sec
2020-12-31 11:05:52,936 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 747.02 sec
2020-12-31 11:06:01,119 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 759.4 sec
2020-12-31 11:06:05,217 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 771.34 sec
2020-12-31 11:06:16,493 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 795.58 sec
2020-12-31 11:06:25,733 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 807.6 sec
2020-12-31 11:06:28,797 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 819.42 sec
2020-12-31 11:06:38,003 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 831.37 sec
2020-12-31 11:06:39,030 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 842.84 sec
2020-12-31 11:06:49,244 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 854.85 sec
2020-12-31 11:07:00,504 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 879.03 sec
2020-12-31 11:07:01,528 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 891.09 sec
2020-12-31 11:07:11,764 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 902.4 sec
2020-12-31 11:07:13,809 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 913.91 sec
2020-12-31 11:07:24,033 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 937.48 sec
2020-12-31 11:07:36,294 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 961.98 sec
2020-12-31 11:07:47,557 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 973.78 sec
2020-12-31 11:07:48,577 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 986.12 sec
2020-12-31 11:07:58,802 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 997.56 sec
2020-12-31 11:07:59,822 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 1009.51 sec
2020-12-31 11:08:10,088 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 1021.66 sec
2020-12-31 11:08:21,359 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 1045.31 sec
2020-12-31 11:08:22,387 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 1057.46 sec
2020-12-31 11:08:32,659 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 1069.52 sec
2020-12-31 11:08:34,714 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 1081.69 sec
2020-12-31 11:08:44,962 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 1093.68 sec
2020-12-31 11:08:57,248 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 1117.85 sec
2020-12-31 11:08:59,303 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 1129.42 sec
2020-12-31 11:09:08,562 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 1141.26 sec
2020-12-31 11:09:14,718 Stage-1 map = 82%,  reduce = 27%, Cumulative CPU 1142.0 sec
2020-12-31 11:09:19,840 Stage-1 map = 83%,  reduce = 27%, Cumulative CPU 1153.68 sec
2020-12-31 11:09:25,991 Stage-1 map = 83%,  reduce = 28%, Cumulative CPU 1153.9 sec
2020-12-31 11:09:31,094 Stage-1 map = 84%,  reduce = 28%, Cumulative CPU 1165.59 sec
2020-12-31 11:09:42,338 Stage-1 map = 85%,  reduce = 28%, Cumulative CPU 1177.7 sec
2020-12-31 11:10:04,855 Stage-1 map = 86%,  reduce = 28%, Cumulative CPU 1201.17 sec
2020-12-31 11:10:08,950 Stage-1 map = 86%,  reduce = 29%, Cumulative CPU 1201.22 sec
2020-12-31 11:10:16,121 Stage-1 map = 87%,  reduce = 29%, Cumulative CPU 1212.59 sec
2020-12-31 11:10:27,370 Stage-1 map = 88%,  reduce = 29%, Cumulative CPU 1224.54 sec
2020-12-31 11:10:37,625 Stage-1 map = 89%,  reduce = 29%, Cumulative CPU 1236.28 sec
2020-12-31 11:10:38,646 Stage-1 map = 89%,  reduce = 30%, Cumulative CPU 1236.32 sec
2020-12-31 11:10:48,884 Stage-1 map = 90%,  reduce = 30%, Cumulative CPU 1248.28 sec
2020-12-31 11:11:00,139 Stage-1 map = 91%,  reduce = 30%, Cumulative CPU 1260.32 sec
2020-12-31 11:11:23,680 Stage-1 map = 92%,  reduce = 30%, Cumulative CPU 1283.88 sec
2020-12-31 11:11:26,757 Stage-1 map = 92%,  reduce = 31%, Cumulative CPU 1283.92 sec
2020-12-31 11:11:34,927 Stage-1 map = 93%,  reduce = 31%, Cumulative CPU 1295.65 sec
2020-12-31 11:11:46,182 Stage-1 map = 94%,  reduce = 31%, Cumulative CPU 1308.17 sec
2020-12-31 11:11:58,462 Stage-1 map = 95%,  reduce = 31%, Cumulative CPU 1320.31 sec
2020-12-31 11:12:02,563 Stage-1 map = 95%,  reduce = 32%, Cumulative CPU 1320.36 sec
2020-12-31 11:12:08,713 Stage-1 map = 96%,  reduce = 32%, Cumulative CPU 1331.91 sec
2020-12-31 11:12:19,990 Stage-1 map = 97%,  reduce = 32%, Cumulative CPU 1343.48 sec
2020-12-31 11:12:43,556 Stage-1 map = 98%,  reduce = 32%, Cumulative CPU 1367.18 sec
2020-12-31 11:12:45,603 Stage-1 map = 98%,  reduce = 33%, Cumulative CPU 1367.24 sec
2020-12-31 11:12:55,854 Stage-1 map = 99%,  reduce = 33%, Cumulative CPU 1378.97 sec
2020-12-31 11:13:07,114 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 1390.76 sec
2020-12-31 11:13:09,162 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1392.76 sec
MapReduce Total cumulative CPU time: 23 minutes 12 seconds 760 msec
Ended Job = job_1609141291605_0022
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117  Reduce: 1   Cumulative CPU: 1392.76 sec   HDFS Read: 31436905540 HDFS Write: 1147 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 23 minutes 12 seconds 760 msec
OK
654691105       2011-05-25 00:00:00.0   PROD10  53
752859493       2011-11-08 00:00:00.0   PROD4   92
620442730       2010-06-11 00:00:00.0   PROD5   22
524983813       2011-04-11 00:00:00.0   PROD6   31
89887602        2010-08-18 00:00:00.0   PROD7   45
93701058        2011-10-31 00:00:00.0   PROD4   62
739459682       2011-01-15 00:00:00.0   PROD4   93
480818608       2010-07-12 00:00:00.0   PROD2   87
457915153       2011-09-09 00:00:00.0   PROD9   85
405422684       2011-11-23 00:00:00.0   PROD10  86
322983965       2012-04-06 00:00:00.0   PROD8   7
588940412       2010-08-15 00:00:00.0   PROD8   51
421954935       2012-01-24 00:00:00.0   PROD4   17
749374812       2010-12-12 00:00:00.0   PROD4   62
298315594       2010-06-13 00:00:00.0   PROD5   75
723116860       2011-01-17 00:00:00.0   PROD10  89
167011022       2011-01-20 00:00:00.0   PROD4   69
430667509       2011-07-07 00:00:00.0   PROD6   63
665176804       2012-08-25 00:00:00.0   PROD7   77
648219864       2012-05-15 00:00:00.0   PROD7   74
Time taken: 814.055 seconds, Fetched: 20 row(s)

hive> 
    > select * from ods_fact_sale where sale_date = '2011-08-16 00:00:00.0'  distribute by rand() sort by rand() limit 10;
Query ID = root_20201231135813_71f7d916-8e6f-4c7a-846f-49b78194da8d
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 469
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/31 13:58:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0023, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0023/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0023
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 469
2020-12-31 13:58:21,609 Stage-1 map = 0%,  reduce = 0%
2020-12-31 13:58:31,907 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 7.78 sec
2020-12-31 13:58:32,938 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 17.16 sec
2020-12-31 13:58:39,109 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 24.81 sec
2020-12-31 13:58:46,309 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 41.49 sec
2020-12-31 13:58:49,396 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 50.29 sec
2020-12-31 13:58:52,477 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 57.84 sec
2020-12-31 13:58:56,588 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 66.39 sec
2020-12-31 13:58:59,672 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 73.76 sec
2020-12-31 13:59:04,828 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 81.84 sec
2020-12-31 13:59:12,006 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 97.09 sec
2020-12-31 13:59:14,061 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 104.49 sec
2020-12-31 13:59:20,215 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 112.37 sec
2020-12-31 13:59:21,246 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 119.89 sec
2020-12-31 13:59:28,440 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 135.55 sec
2020-12-31 13:59:36,643 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 151.18 sec
2020-12-31 13:59:41,772 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 158.62 sec
2020-12-31 13:59:44,854 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 166.61 sec
2020-12-31 13:59:48,955 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 173.93 sec
2020-12-31 13:59:52,032 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 182.11 sec
2020-12-31 13:59:56,135 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 189.52 sec
2020-12-31 14:00:03,337 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 204.95 sec
2020-12-31 14:00:08,474 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 212.83 sec
2020-12-31 14:00:10,529 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 220.27 sec
2020-12-31 14:00:16,678 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 235.53 sec
2020-12-31 14:00:24,875 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 251.0 sec
2020-12-31 14:00:31,029 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 258.48 sec
2020-12-31 14:00:33,084 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 266.53 sec
2020-12-31 14:00:38,214 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 273.81 sec
2020-12-31 14:00:40,263 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 281.73 sec
2020-12-31 14:00:45,380 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 289.15 sec
2020-12-31 14:00:52,560 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 304.46 sec
2020-12-31 14:00:56,667 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 312.72 sec
2020-12-31 14:00:59,737 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 320.25 sec
2020-12-31 14:01:04,867 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 328.38 sec
2020-12-31 14:01:05,893 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 335.74 sec
2020-12-31 14:01:13,071 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 350.86 sec
2020-12-31 14:01:20,251 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 366.17 sec
2020-12-31 14:01:27,416 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 373.71 sec
2020-12-31 14:01:28,442 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 381.35 sec
2020-12-31 14:01:34,585 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 388.67 sec
2020-12-31 14:01:35,607 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 396.63 sec
2020-12-31 14:01:43,802 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 412.13 sec
2020-12-31 14:01:48,929 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 419.64 sec
2020-12-31 14:01:52,005 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 427.68 sec
2020-12-31 14:01:54,056 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 433.61 sec
2020-12-31 14:02:00,199 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 441.53 sec
2020-12-31 14:02:02,274 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 449.88 sec
2020-12-31 14:02:09,465 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 465.79 sec
2020-12-31 14:02:15,630 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 473.67 sec
2020-12-31 14:02:17,687 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 481.54 sec
2020-12-31 14:02:23,854 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 489.38 sec
2020-12-31 14:02:25,905 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 497.17 sec
2020-12-31 14:02:33,084 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 506.49 sec
2020-12-31 14:02:41,293 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 523.33 sec
2020-12-31 14:02:43,344 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 532.39 sec
2020-12-31 14:02:49,496 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 540.88 sec
2020-12-31 14:02:51,541 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 548.98 sec
2020-12-31 14:02:57,686 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 557.3 sec
2020-12-31 14:03:00,757 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 567.09 sec
2020-12-31 14:03:08,949 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 585.16 sec
2020-12-31 14:03:13,044 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 593.36 sec
2020-12-31 14:03:18,168 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 602.73 sec
2020-12-31 14:03:21,233 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 610.98 sec
2020-12-31 14:03:27,362 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 620.83 sec
2020-12-31 14:03:29,405 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 629.08 sec
2020-12-31 14:03:37,585 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 646.57 sec
2020-12-31 14:03:43,726 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 654.23 sec
2020-12-31 14:03:45,778 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 662.3 sec
2020-12-31 14:03:50,901 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 670.0 sec
2020-12-31 14:03:52,949 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 678.12 sec
2020-12-31 14:03:58,069 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 685.59 sec
2020-12-31 14:04:05,244 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 701.02 sec
2020-12-31 14:04:09,343 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 709.68 sec
2020-12-31 14:04:12,416 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 717.45 sec
2020-12-31 14:04:17,540 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 717.45 sec
2020-12-31 14:04:19,590 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 734.92 sec
2020-12-31 14:04:28,796 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 752.8 sec
2020-12-31 14:04:33,910 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 761.43 sec
2020-12-31 14:04:38,010 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 771.05 sec
2020-12-31 14:04:46,214 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 780.66 sec
2020-12-31 14:04:55,479 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 791.47 sec
2020-12-31 14:05:04,703 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 801.23 sec
2020-12-31 14:05:22,125 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 820.39 sec
2020-12-31 14:05:31,339 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 830.03 sec
2020-12-31 14:05:40,555 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 839.37 sec
2020-12-31 14:05:49,765 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 848.73 sec
2020-12-31 14:05:57,956 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 848.87 sec
2020-12-31 14:06:07,193 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 866.8 sec
2020-12-31 14:06:21,535 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 882.17 sec
2020-12-31 14:06:28,722 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 890.1 sec
2020-12-31 14:06:34,871 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 897.74 sec
2020-12-31 14:06:42,042 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 905.22 sec
2020-12-31 14:06:49,212 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 912.75 sec
2020-12-31 14:06:56,368 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 920.31 sec
2020-12-31 14:07:10,714 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 935.37 sec
2020-12-31 14:07:16,855 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 943.06 sec
2020-12-31 14:07:24,004 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 950.73 sec
2020-12-31 14:07:30,153 Stage-1 map = 100%,  reduce = 1%, Cumulative CPU 957.08 sec
2020-12-31 14:07:40,370 Stage-1 map = 100%,  reduce = 2%, Cumulative CPU 968.59 sec
2020-12-31 14:07:47,548 Stage-1 map = 100%,  reduce = 3%, Cumulative CPU 977.62 sec
2020-12-31 14:07:57,780 Stage-1 map = 100%,  reduce = 4%, Cumulative CPU 989.98 sec
2020-12-31 14:08:08,025 Stage-1 map = 100%,  reduce = 5%, Cumulative CPU 1002.35 sec
2020-12-31 14:08:16,225 Stage-1 map = 100%,  reduce = 6%, Cumulative CPU 1012.48 sec
2020-12-31 14:08:26,474 Stage-1 map = 100%,  reduce = 7%, Cumulative CPU 1023.83 sec
2020-12-31 14:08:35,686 Stage-1 map = 100%,  reduce = 8%, Cumulative CPU 1035.25 sec
2020-12-31 14:08:43,876 Stage-1 map = 100%,  reduce = 9%, Cumulative CPU 1044.42 sec
2020-12-31 14:08:54,122 Stage-1 map = 100%,  reduce = 10%, Cumulative CPU 1056.78 sec
2020-12-31 14:09:04,381 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 1068.06 sec
2020-12-31 14:09:12,577 Stage-1 map = 100%,  reduce = 12%, Cumulative CPU 1077.07 sec
2020-12-31 14:09:22,814 Stage-1 map = 100%,  reduce = 13%, Cumulative CPU 1089.5 sec
2020-12-31 14:09:32,020 Stage-1 map = 100%,  reduce = 14%, Cumulative CPU 1100.92 sec
2020-12-31 14:09:42,265 Stage-1 map = 100%,  reduce = 15%, Cumulative CPU 1112.77 sec
2020-12-31 14:09:50,446 Stage-1 map = 100%,  reduce = 16%, Cumulative CPU 1122.02 sec
2020-12-31 14:10:00,697 Stage-1 map = 100%,  reduce = 17%, Cumulative CPU 1134.19 sec
2020-12-31 14:10:10,947 Stage-1 map = 100%,  reduce = 18%, Cumulative CPU 1145.76 sec
2020-12-31 14:10:18,126 Stage-1 map = 100%,  reduce = 19%, Cumulative CPU 1154.87 sec
2020-12-31 14:10:28,387 Stage-1 map = 100%,  reduce = 20%, Cumulative CPU 1166.3 sec
2020-12-31 14:10:38,627 Stage-1 map = 100%,  reduce = 21%, Cumulative CPU 1178.67 sec
2020-12-31 14:10:46,829 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 1188.38 sec
2020-12-31 14:10:56,045 Stage-1 map = 100%,  reduce = 23%, Cumulative CPU 1199.71 sec
2020-12-31 14:11:06,291 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 1211.25 sec
2020-12-31 14:11:14,480 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 1220.47 sec
2020-12-31 14:11:24,728 Stage-1 map = 100%,  reduce = 26%, Cumulative CPU 1231.71 sec
2020-12-31 14:11:34,956 Stage-1 map = 100%,  reduce = 27%, Cumulative CPU 1243.07 sec
2020-12-31 14:11:43,155 Stage-1 map = 100%,  reduce = 28%, Cumulative CPU 1252.2 sec
2020-12-31 14:11:52,379 Stage-1 map = 100%,  reduce = 29%, Cumulative CPU 1263.42 sec
2020-12-31 14:12:02,628 Stage-1 map = 100%,  reduce = 30%, Cumulative CPU 1274.71 sec
2020-12-31 14:12:12,877 Stage-1 map = 100%,  reduce = 31%, Cumulative CPU 1285.68 sec
2020-12-31 14:12:21,081 Stage-1 map = 100%,  reduce = 32%, Cumulative CPU 1294.61 sec
2020-12-31 14:12:32,371 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 1308.38 sec
2020-12-31 14:12:40,558 Stage-1 map = 100%,  reduce = 34%, Cumulative CPU 1317.37 sec
2020-12-31 14:12:48,765 Stage-1 map = 100%,  reduce = 35%, Cumulative CPU 1326.48 sec
2020-12-31 14:13:00,064 Stage-1 map = 100%,  reduce = 36%, Cumulative CPU 1337.97 sec
2020-12-31 14:13:08,279 Stage-1 map = 100%,  reduce = 37%, Cumulative CPU 1348.92 sec
2020-12-31 14:13:16,478 Stage-1 map = 100%,  reduce = 38%, Cumulative CPU 1358.16 sec
2020-12-31 14:13:27,748 Stage-1 map = 100%,  reduce = 39%, Cumulative CPU 1369.61 sec
2020-12-31 14:13:36,954 Stage-1 map = 100%,  reduce = 40%, Cumulative CPU 1381.52 sec
2020-12-31 14:13:45,162 Stage-1 map = 100%,  reduce = 41%, Cumulative CPU 1390.5 sec
2020-12-31 14:13:56,426 Stage-1 map = 100%,  reduce = 42%, Cumulative CPU 1404.04 sec
2020-12-31 14:14:04,630 Stage-1 map = 100%,  reduce = 43%, Cumulative CPU 1413.17 sec
2020-12-31 14:14:15,907 Stage-1 map = 100%,  reduce = 44%, Cumulative CPU 1424.15 sec
2020-12-31 14:14:24,108 Stage-1 map = 100%,  reduce = 45%, Cumulative CPU 1433.07 sec
2020-12-31 14:14:33,324 Stage-1 map = 100%,  reduce = 46%, Cumulative CPU 1444.27 sec
2020-12-31 14:14:44,594 Stage-1 map = 100%,  reduce = 47%, Cumulative CPU 1457.57 sec
2020-12-31 14:14:51,765 Stage-1 map = 100%,  reduce = 48%, Cumulative CPU 1464.38 sec
2020-12-31 14:15:00,997 Stage-1 map = 100%,  reduce = 49%, Cumulative CPU 1475.56 sec
2020-12-31 14:15:12,284 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 1486.63 sec
2020-12-31 14:15:20,479 Stage-1 map = 100%,  reduce = 51%, Cumulative CPU 1495.52 sec
2020-12-31 14:15:28,690 Stage-1 map = 100%,  reduce = 52%, Cumulative CPU 1506.78 sec
2020-12-31 14:15:39,970 Stage-1 map = 100%,  reduce = 53%, Cumulative CPU 1518.01 sec
2020-12-31 14:15:48,182 Stage-1 map = 100%,  reduce = 54%, Cumulative CPU 1527.01 sec
2020-12-31 14:15:57,401 Stage-1 map = 100%,  reduce = 55%, Cumulative CPU 1538.2 sec
2020-12-31 14:16:08,678 Stage-1 map = 100%,  reduce = 56%, Cumulative CPU 1551.7 sec
2020-12-31 14:16:16,895 Stage-1 map = 100%,  reduce = 57%, Cumulative CPU 1560.42 sec
2020-12-31 14:16:25,107 Stage-1 map = 100%,  reduce = 58%, Cumulative CPU 1569.44 sec
2020-12-31 14:16:36,370 Stage-1 map = 100%,  reduce = 59%, Cumulative CPU 1580.69 sec
2020-12-31 14:16:45,596 Stage-1 map = 100%,  reduce = 60%, Cumulative CPU 1592.01 sec
2020-12-31 14:16:52,776 Stage-1 map = 100%,  reduce = 61%, Cumulative CPU 1601.0 sec
2020-12-31 14:17:05,078 Stage-1 map = 100%,  reduce = 62%, Cumulative CPU 1614.51 sec
2020-12-31 14:17:13,284 Stage-1 map = 100%,  reduce = 63%, Cumulative CPU 1623.31 sec
2020-12-31 14:17:21,491 Stage-1 map = 100%,  reduce = 64%, Cumulative CPU 1632.19 sec
2020-12-31 14:17:32,762 Stage-1 map = 100%,  reduce = 65%, Cumulative CPU 1643.45 sec
2020-12-31 14:17:40,961 Stage-1 map = 100%,  reduce = 66%, Cumulative CPU 1654.75 sec
2020-12-31 14:17:49,164 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 1663.65 sec
2020-12-31 14:18:00,460 Stage-1 map = 100%,  reduce = 68%, Cumulative CPU 1674.82 sec
2020-12-31 14:18:09,686 Stage-1 map = 100%,  reduce = 69%, Cumulative CPU 1685.96 sec
2020-12-31 14:18:18,922 Stage-1 map = 100%,  reduce = 70%, Cumulative CPU 1694.86 sec
2020-12-31 14:18:29,166 Stage-1 map = 100%,  reduce = 71%, Cumulative CPU 1706.34 sec
2020-12-31 14:18:38,369 Stage-1 map = 100%,  reduce = 72%, Cumulative CPU 1717.78 sec
2020-12-31 14:18:48,641 Stage-1 map = 100%,  reduce = 73%, Cumulative CPU 1728.87 sec
2020-12-31 14:18:56,848 Stage-1 map = 100%,  reduce = 74%, Cumulative CPU 1737.66 sec
2020-12-31 14:19:06,087 Stage-1 map = 100%,  reduce = 75%, Cumulative CPU 1748.69 sec
2020-12-31 14:19:16,359 Stage-1 map = 100%,  reduce = 76%, Cumulative CPU 1759.78 sec
2020-12-31 14:19:24,564 Stage-1 map = 100%,  reduce = 77%, Cumulative CPU 1768.65 sec
2020-12-31 14:19:34,806 Stage-1 map = 100%,  reduce = 78%, Cumulative CPU 1779.88 sec
2020-12-31 14:19:45,062 Stage-1 map = 100%,  reduce = 79%, Cumulative CPU 1791.24 sec
2020-12-31 14:19:53,273 Stage-1 map = 100%,  reduce = 80%, Cumulative CPU 1800.31 sec
2020-12-31 14:20:02,499 Stage-1 map = 100%,  reduce = 81%, Cumulative CPU 1811.28 sec
2020-12-31 14:20:12,750 Stage-1 map = 100%,  reduce = 82%, Cumulative CPU 1822.45 sec
2020-12-31 14:20:20,981 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 1831.51 sec
2020-12-31 14:20:31,219 Stage-1 map = 100%,  reduce = 84%, Cumulative CPU 1843.78 sec
2020-12-31 14:20:41,474 Stage-1 map = 100%,  reduce = 85%, Cumulative CPU 1855.11 sec
2020-12-31 14:20:48,653 Stage-1 map = 100%,  reduce = 86%, Cumulative CPU 1864.26 sec
2020-12-31 14:20:58,906 Stage-1 map = 100%,  reduce = 87%, Cumulative CPU 1875.36 sec
2020-12-31 14:21:09,160 Stage-1 map = 100%,  reduce = 88%, Cumulative CPU 1886.85 sec
2020-12-31 14:21:19,417 Stage-1 map = 100%,  reduce = 89%, Cumulative CPU 1897.92 sec
2020-12-31 14:21:26,596 Stage-1 map = 100%,  reduce = 90%, Cumulative CPU 1907.18 sec
2020-12-31 14:21:36,846 Stage-1 map = 100%,  reduce = 91%, Cumulative CPU 1918.57 sec
2020-12-31 14:21:47,109 Stage-1 map = 100%,  reduce = 92%, Cumulative CPU 1929.52 sec
2020-12-31 14:21:55,303 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 1938.42 sec
2020-12-31 14:22:05,571 Stage-1 map = 100%,  reduce = 94%, Cumulative CPU 1949.77 sec
2020-12-31 14:22:14,793 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 1960.81 sec
2020-12-31 14:22:23,001 Stage-1 map = 100%,  reduce = 96%, Cumulative CPU 1969.72 sec
2020-12-31 14:22:33,270 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 1981.0 sec
2020-12-31 14:22:43,503 Stage-1 map = 100%,  reduce = 98%, Cumulative CPU 1992.01 sec
2020-12-31 14:22:50,683 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 2000.89 sec
2020-12-31 14:23:05,030 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2016.56 sec
MapReduce Total cumulative CPU time: 33 minutes 36 seconds 560 msec
Ended Job = job_1609141291605_0023
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/31 14:23:06 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0024, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0024/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0024
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2020-12-31 14:23:17,212 Stage-2 map = 0%,  reduce = 0%
2020-12-31 14:23:24,475 Stage-2 map = 50%,  reduce = 0%, Cumulative CPU 5.37 sec
2020-12-31 14:23:25,505 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 11.9 sec
2020-12-31 14:23:30,651 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 14.14 sec
MapReduce Total cumulative CPU time: 14 seconds 140 msec
Ended Job = job_1609141291605_0024
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117  Reduce: 469   Cumulative CPU: 2016.56 sec   HDFS Read: 31438766866 HDFS Write: 79070 HDFS EC Read: 0 SUCCESS
Stage-Stage-2: Map: 2  Reduce: 1   Cumulative CPU: 14.14 sec   HDFS Read: 207188 HDFS Write: 614 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 33 minutes 50 seconds 700 msec
OK
601096637       2011-08-16 00:00:00.0   PROD10  28
7504198 2011-08-16 00:00:00.0   PROD7   22
7666912 2011-08-16 00:00:00.0   PROD7   70
393337914       2011-08-16 00:00:00.0   PROD5   55
98814403        2011-08-16 00:00:00.0   PROD4   45
744615937       2011-08-16 00:00:00.0   PROD7   73
124859277       2011-08-16 00:00:00.0   PROD3   69
212317100       2011-08-16 00:00:00.0   PROD10  48
504809117       2011-08-16 00:00:00.0   PROD3   33
268235827       2011-08-16 00:00:00.0   PROD9   91
Time taken: 1517.782 seconds, Fetched: 10 row(s)

II. Bucket Table Sampling


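Hive supports both bucket table sampling and block sampling. A bucketed table is a table whose buckets were defined with a CLUSTERED BY clause when it was created. The syntax of bucket table sampling is: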

table_sample: TABLESAMPLE (BUCKET x OUT OF y [ON colname])

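The TABLESAMPLE clause lets you write a query over a sample of the data instead of the whole table. It appears in the FROM clause and can be used with any table. Buckets are numbered starting from 1. colname names the column to sample on: it can be any one of the non-partition columns, or rand() to sample on the entire row instead of a single column. Rows are bucketed on colname into y buckets numbered 1 to y, and the rows belonging to bucket x are returned. For example, TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) returns the rows in the 3rd of 32 buckets; the example below returns the 1st of 100 buckets, i.e. roughly 1% of the rows. Only when colname matches the table's CLUSTERED BY column can Hive read just the required buckets; sampling ON rand(), as below, still scans the whole table (117 mappers in the test log).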
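The bucketing rule just described can be sketched in Python. This is a simplified model, not Hive's source: we assume bucket membership is decided by a hash-and-modulo rule of the form `(hash(colname) & 2147483647) % y = x - 1`, and that for a 32-bit integer column the hash is simply the value itself. The helpers `bucket_number` and `tablesample_bucket` and the toy `id` column are hypothetical.

```python
def bucket_number(value, y):
    """1-based bucket (out of y) for a 32-bit integer column value.

    Assumption: hash(value) == value for int columns, and the sign bit is
    masked off before the modulus, following (hash & 2147483647) % y.
    """
    h = value & 0x7FFFFFFF  # keep the 31 low bits so the hash is non-negative
    return h % y + 1        # buckets are numbered from 1


def tablesample_bucket(rows, x, y, col):
    """Rows whose `col` value lands in bucket x of y (BUCKET x OUT OF y ON col)."""
    if not 1 <= x <= y:
        raise ValueError("x must be between 1 and y")
    return [row for row in rows if bucket_number(row[col], y) == x]


# Usage: bucket 3 of 32 on a toy table with ids 1..1000.
table = [{"id": i, "amount": i % 100} for i in range(1, 1001)]
sample = tablesample_bucket(table, 3, 32, "id")
print(len(sample), [row["id"] for row in sample[:3]])
```

Under this model every row falls into exactly one bucket, so bucket 3 of 32 keeps only the ids with `id % 32 == 2`, about 1/32 of evenly spread values.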

Code:

-- randomly sample about 1/100 of the data
select * from ods_fact_sale tablesample(bucket 1 out of 100 on rand()) limit 100;
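With ON rand(), each row's bucket is effectively a fresh random draw, so bucket 1 of 100 keeps roughly 1% of the rows and LIMIT 100 then caps the output. Because the filter works row by row, it can run entirely in the mappers, which is consistent with the map-only job in the log below ("Number of reduce tasks is set to 0"). The Python sketch is a simplified model under that assumption; `tablesample_rand` is a hypothetical helper, and the seed exists only to make the sketch reproducible.

```python
import random


def tablesample_rand(rows, x, y, limit=None, seed=None):
    """Simplified model of TABLESAMPLE(BUCKET x OUT OF y ON rand()) [LIMIT limit].

    Each row is assigned to one of the buckets 1..y uniformly at random, so
    bucket x holds roughly 1/y of the rows; LIMIT then caps the output.
    No sort and no reducer are needed: it is a per-row filter.
    """
    rng = random.Random(seed)
    picked = [row for row in rows if rng.randrange(y) + 1 == x]  # random bucket 1..y
    return picked if limit is None else picked[:limit]


# Usage: 100,000 toy rows, bucket 1 of 100.
rows = list(range(100_000))
full = tablesample_rand(rows, 1, 100, seed=7)                # roughly 1,000 rows
capped = tablesample_rand(rows, 1, 100, limit=100, seed=7)   # capped at 100
print(len(full), len(capped))
```

Unlike the ORDER BY rand() approach, this avoids any global sort, although every row is still read once.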

Test log:

hive> select * from ods_fact_sale tablesample(bucket 1 out of 100 on rand()) limit 100;
Query ID = root_20210106102309_b7fd3c38-74f3-4877-bf44-d5bb24a62a93
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:23:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0029, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0029/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0029
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 0
2021-01-06 10:23:18,751 Stage-1 map = 0%,  reduce = 0%
2021-01-06 10:23:26,042 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 7.34 sec
2021-01-06 10:23:30,196 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 14.66 sec
2021-01-06 10:23:34,325 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 21.83 sec
2021-01-06 10:23:38,447 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 29.09 sec
2021-01-06 10:23:42,571 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 32.67 sec
2021-01-06 10:23:43,616 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 36.22 sec
2021-01-06 10:23:46,695 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 43.32 sec
2021-01-06 10:23:49,779 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 46.9 sec
2021-01-06 10:23:50,812 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 50.38 sec
2021-01-06 10:23:53,928 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 53.95 sec
2021-01-06 10:23:54,958 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 57.4 sec
2021-01-06 10:23:58,044 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 60.99 sec
2021-01-06 10:24:02,163 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 68.22 sec
2021-01-06 10:24:03,201 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 71.85 sec
2021-01-06 10:24:06,282 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 75.38 sec
2021-01-06 10:24:07,315 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 78.96 sec
2021-01-06 10:24:10,398 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 82.35 sec
2021-01-06 10:24:11,427 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 85.78 sec
2021-01-06 10:24:15,539 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 92.86 sec
2021-01-06 10:24:18,621 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 96.44 sec
2021-01-06 10:24:19,646 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 100.07 sec
2021-01-06 10:24:22,724 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 103.7 sec
2021-01-06 10:24:23,749 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 107.35 sec
2021-01-06 10:24:26,827 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 114.56 sec
2021-01-06 10:24:29,950 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 118.18 sec
2021-01-06 10:24:30,972 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 121.63 sec
2021-01-06 10:24:34,056 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 125.19 sec
2021-01-06 10:24:35,083 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 128.71 sec
2021-01-06 10:24:38,153 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 132.27 sec
2021-01-06 10:24:42,256 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 139.28 sec
2021-01-06 10:24:43,284 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 142.67 sec
2021-01-06 10:24:46,354 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 146.25 sec
2021-01-06 10:24:47,379 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 149.89 sec
2021-01-06 10:24:50,443 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 153.59 sec
2021-01-06 10:24:51,469 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 157.12 sec
2021-01-06 10:24:55,594 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 164.4 sec
2021-01-06 10:24:58,671 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 168.07 sec
2021-01-06 10:24:59,695 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 171.6 sec
2021-01-06 10:25:03,822 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 179.17 sec
2021-01-06 10:25:07,927 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 186.31 sec
2021-01-06 10:25:12,024 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 193.51 sec
2021-01-06 10:25:16,114 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 200.55 sec
2021-01-06 10:25:20,199 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 207.57 sec
2021-01-06 10:25:24,294 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 214.76 sec
2021-01-06 10:25:28,387 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 221.97 sec
2021-01-06 10:25:32,493 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 228.8 sec
2021-01-06 10:25:36,581 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 235.82 sec
2021-01-06 10:25:40,678 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 243.01 sec
2021-01-06 10:25:44,771 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 250.35 sec
2021-01-06 10:25:48,863 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 257.45 sec
2021-01-06 10:25:52,971 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 264.79 sec
2021-01-06 10:25:56,067 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 268.38 sec
2021-01-06 10:25:57,091 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 272.06 sec
2021-01-06 10:26:00,161 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 275.58 sec
2021-01-06 10:26:01,181 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 279.07 sec
2021-01-06 10:26:04,288 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 282.68 sec
2021-01-06 10:26:08,389 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 289.73 sec
2021-01-06 10:26:09,413 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 293.31 sec
2021-01-06 10:26:12,491 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 296.94 sec
2021-01-06 10:26:13,517 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 300.47 sec
2021-01-06 10:26:16,646 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 304.17 sec
2021-01-06 10:26:17,667 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 307.78 sec
2021-01-06 10:26:21,775 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 314.72 sec
2021-01-06 10:26:24,842 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 318.24 sec
2021-01-06 10:26:25,864 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 321.84 sec
2021-01-06 10:26:28,990 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 325.54 sec
2021-01-06 10:26:30,010 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 328.98 sec
2021-01-06 10:26:33,100 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 335.9 sec
2021-01-06 10:26:36,182 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 339.46 sec
2021-01-06 10:26:37,208 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 342.94 sec
2021-01-06 10:26:40,278 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 346.48 sec
2021-01-06 10:26:41,299 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 350.03 sec
2021-01-06 10:26:44,368 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 353.74 sec
2021-01-06 10:26:48,470 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 360.97 sec
2021-01-06 10:26:49,519 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 364.47 sec
2021-01-06 10:26:52,587 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 367.97 sec
2021-01-06 10:26:53,612 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 371.48 sec
2021-01-06 10:26:56,714 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 375.17 sec
2021-01-06 10:26:57,740 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 378.75 sec
2021-01-06 10:27:01,849 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 386.16 sec
2021-01-06 10:27:04,955 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 389.82 sec
2021-01-06 10:27:05,978 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 393.33 sec
2021-01-06 10:27:09,048 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 397.01 sec
2021-01-06 10:27:10,079 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 400.49 sec
2021-01-06 10:27:13,200 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 404.31 sec
2021-01-06 10:27:16,273 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 411.48 sec
2021-01-06 10:27:18,312 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 414.97 sec
2021-01-06 10:27:20,354 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 418.35 sec
MapReduce Total cumulative CPU time: 6 minutes 58 seconds 350 msec
Ended Job = job_1609141291605_0029
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117   Cumulative CPU: 418.35 sec   HDFS Read: 62555036 HDFS Write: 629015 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 minutes 58 seconds 350 msec
OK
169387977       2011-01-30 00:00:00.0   PROD10  87
169387995       2011-05-10 00:00:00.0   PROD6   86
169388013       2011-04-14 00:00:00.0   PROD10  46
169388092       2010-06-07 00:00:00.0   PROD3   34
169388149       2010-06-21 00:00:00.0   PROD7   16
169388210       2011-10-05 00:00:00.0   PROD3   85
169388272       2012-02-27 00:00:00.0   PROD6   65
169388359       2012-08-30 00:00:00.0   PROD10  10
169388383       2011-11-09 00:00:00.0   PROD5   95
169388414       2011-07-25 00:00:00.0   PROD2   35
169388433       2011-05-18 00:00:00.0   PROD9   85
169388697       2010-12-25 00:00:00.0   PROD9   20
169388811       2012-04-03 00:00:00.0   PROD9   49
169388872       2010-11-23 00:00:00.0   PROD7   71
169388935       2012-04-18 00:00:00.0   PROD6   62
169389026       2011-03-21 00:00:00.0   PROD10  80
169389070       2010-09-09 00:00:00.0   PROD3   90
169389083       2010-05-20 00:00:00.0   PROD3   41
169389370       2011-01-28 00:00:00.0   PROD6   39
169389409       2012-08-09 00:00:00.0   PROD2   20
169389430       2012-08-23 00:00:00.0   PROD3   47
169389517       2011-10-25 00:00:00.0   PROD8   33
169389759       2010-09-03 00:00:00.0   PROD3   14
169389802       2010-08-22 00:00:00.0   PROD3   55
169389899       2012-01-14 00:00:00.0   PROD5   80
169389935       2010-06-10 00:00:00.0   PROD9   25
169390249       2010-09-05 00:00:00.0   PROD7   89
169390332       2012-07-28 00:00:00.0   PROD9   24
169390405       2011-09-30 00:00:00.0   PROD6   82
169390432       2010-09-04 00:00:00.0   PROD6   3
169390525       2011-04-24 00:00:00.0   PROD6   50
169390529       2012-06-29 00:00:00.0   PROD4   36
169390596       2011-09-29 00:00:00.0   PROD2   69
169390726       2011-01-09 00:00:00.0   PROD4   20
169390784       2011-08-20 00:00:00.0   PROD7   19
169390821       2010-07-14 00:00:00.0   PROD4   44
169390835       2010-09-24 00:00:00.0   PROD2   15
169390858       2012-08-08 00:00:00.0   PROD5   3
169391297       2011-03-24 00:00:00.0   PROD10  75
169391461       2012-03-14 00:00:00.0   PROD4   32
169391509       2010-11-23 00:00:00.0   PROD3   28
169391526       2012-03-28 00:00:00.0   PROD6   35
169391558       2011-02-21 00:00:00.0   PROD2   79
169391632       2010-10-09 00:00:00.0   PROD9   37
169391649       2012-09-22 00:00:00.0   PROD8   80
169391761       2011-03-15 00:00:00.0   PROD7   45
169391765       2011-01-23 00:00:00.0   PROD4   71
169391951       2012-03-08 00:00:00.0   PROD3   97
169392051       2011-05-13 00:00:00.0   PROD9   27
169392357       2010-05-22 00:00:00.0   PROD4   8
169392408       2011-01-06 00:00:00.0   PROD7   31
169392481       2012-07-25 00:00:00.0   PROD10  81
169392709       2012-08-12 00:00:00.0   PROD3   75
169392782       2012-07-28 00:00:00.0   PROD2   8
169392825       2011-03-14 00:00:00.0   PROD7   89
169392843       2010-10-31 00:00:00.0   PROD3   19
169392864       2011-05-19 00:00:00.0   PROD4   88
169392979       2012-05-11 00:00:00.0   PROD4   65
169393180       2011-05-02 00:00:00.0   PROD4   99
169393214       2011-10-27 00:00:00.0   PROD7   31
169393460       2012-07-27 00:00:00.0   PROD8   63
169393613       2011-03-03 00:00:00.0   PROD9   55
169393624       2010-04-24 00:00:00.0   PROD7   80
169393740       2011-08-17 00:00:00.0   PROD8   71
169394026       2012-06-07 00:00:00.0   PROD9   76
169394117       2012-02-29 00:00:00.0   PROD4   72
169394147       2011-12-23 00:00:00.0   PROD7   53
169394177       2011-01-07 00:00:00.0   PROD7   35
169394508       2012-05-24 00:00:00.0   PROD3   88
169394552       2011-07-16 00:00:00.0   PROD4   41
169394614       2010-08-17 00:00:00.0   PROD6   98
169394631       2010-09-23 00:00:00.0   PROD10  45
169394679       2011-01-22 00:00:00.0   PROD6   57
169394778       2011-09-03 00:00:00.0   PROD10  45
169394824       2011-06-04 00:00:00.0   PROD8   82
169394827       2010-07-14 00:00:00.0   PROD9   42
169394830       2012-03-09 00:00:00.0   PROD10  36
169394864       2010-09-17 00:00:00.0   PROD9   56
169394881       2011-07-01 00:00:00.0   PROD6   7
169395019       2011-11-17 00:00:00.0   PROD6   66
169395142       2012-01-21 00:00:00.0   PROD6   54
169395197       2012-08-10 00:00:00.0   PROD5   72
169395226       2010-09-20 00:00:00.0   PROD3   88
169395253       2011-12-31 00:00:00.0   PROD4   56
169395358       2010-07-16 00:00:00.0   PROD2   75
169395367       2010-12-16 00:00:00.0   PROD4   86
169395398       2012-01-07 00:00:00.0   PROD5   18
169395418       2011-05-08 00:00:00.0   PROD7   82
169395463       2011-08-23 00:00:00.0   PROD9   44
169395636       2011-01-16 00:00:00.0   PROD8   11
169395766       2012-06-05 00:00:00.0   PROD4   43
169395909       2011-12-10 00:00:00.0   PROD5   79
169395943       2012-05-11 00:00:00.0   PROD4   27
169395960       2012-01-17 00:00:00.0   PROD7   43
169396093       2011-08-28 00:00:00.0   PROD8   60
169396142       2010-11-13 00:00:00.0   PROD7   46
169396183       2011-06-16 00:00:00.0   PROD8   88
169396195       2010-10-06 00:00:00.0   PROD3   60
169396279       2012-06-18 00:00:00.0   PROD2   65
169396328       2011-05-14 00:00:00.0   PROD5   21
Time taken: 251.799 seconds, Fetched: 100 row(s)
hive> 

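3. Block Sampling

1) tablesample(n percent) samples a proportion of the Hive table measured by data size (not by row count); the result can be saved into a new Hive table. For example, to sample 10% of the original table:
(Note: in testing, the SELECT statement could not carry a WHERE clause and did not support subqueries. This can be worked around by creating an intermediate table or by using random sampling instead.)
create table xxx_new as select * from xxx tablesample(10 percent);
2) tablesample(n M) specifies the size of the sampled data, in MB.
3) tablesample(n rows) specifies the number of rows to sample, where n applies to each map task: every map task takes n rows. The number of map tasks can be found from the output of a simple query on the Hive table (look for "number of mappers: x").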
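According to the Hive language manual, percent sampling picks at least n% of the input size and works at HDFS block granularity, so with 256 MB blocks a 100 MB target still yields 256 MB of data. The sketch below is a simplified model of that behaviour for both the percent and the MB forms: it takes whole blocks in order until the target is reached. The function names are hypothetical, and which blocks Hive actually starts from depends on its internals (the manual mentions `hive.sample.seednumber` for changing the seed).

```python
MB = 1024 * 1024


def percent_target(total_bytes: int, pct: float) -> int:
    # TABLESAMPLE(n PERCENT) asks for n% of the input size, not n% of the rows.
    return int(total_bytes * pct / 100)


def blocks_for_target(block_sizes: list[int], target_bytes: int) -> list[int]:
    """Indices of whole blocks taken, in order, until at least target_bytes
    are selected. Simplified model of TABLESAMPLE(n PERCENT) / (n M)."""
    chosen, total = [], 0
    for i, size in enumerate(block_sizes):
        if total >= target_bytes:
            break
        chosen.append(i)
        total += size
    return chosen


blocks = [256 * MB] * 4                   # a hypothetical 1 GB table, 256 MB blocks
target = percent_target(sum(blocks), 10)  # 10% of 1 GB is about 102 MB
picked = blocks_for_target(blocks, target)
print(len(picked), sum(blocks[i] for i in picked) // MB)  # 1 256

# tablesample(300 M) in the same model needs two whole blocks.
print(blocks_for_target(blocks, 300 * MB))  # [0, 1]
```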

Code:

create table sample_test1 as select * from ods_fact_sale tablesample(10000 rows);

Test log:

hive> 
    > create table sample_test1 as select * from ods_fact_sale tablesample(10000 rows);
Query ID = root_20210106103549_9aaeea0b-6414-40ea-af0b-2942c80ad3a4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:35:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0031, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0031/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0031
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 0
2021-01-06 10:43:18,970 Stage-1 map = 0%,  reduce = 0%
2021-01-06 10:43:25,150 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 2.23 sec
2021-01-06 10:43:26,183 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 4.48 sec
2021-01-06 10:43:29,274 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 6.63 sec
2021-01-06 10:43:33,375 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 10.81 sec
2021-01-06 10:43:34,415 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 13.01 sec
2021-01-06 10:43:37,513 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 15.14 sec
2021-01-06 10:43:38,545 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 17.34 sec
2021-01-06 10:43:41,660 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 22.66 sec
2021-01-06 10:43:45,757 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 27.05 sec
2021-01-06 10:43:48,836 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 29.23 sec
2021-01-06 10:43:49,866 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 31.3 sec
2021-01-06 10:43:53,953 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 36.41 sec
2021-01-06 10:43:57,029 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 38.51 sec
2021-01-06 10:44:01,131 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 42.67 sec
2021-01-06 10:44:02,159 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 44.8 sec
2021-01-06 10:44:05,239 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 47.68 sec
2021-01-06 10:44:06,263 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 49.82 sec
2021-01-06 10:44:09,337 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 51.91 sec
2021-01-06 10:44:10,363 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 54.01 sec
2021-01-06 10:44:14,485 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 58.3 sec
2021-01-06 10:44:17,602 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 60.69 sec
2021-01-06 10:44:18,629 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 62.8 sec
2021-01-06 10:44:20,695 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 62.8 sec
2021-01-06 10:44:21,722 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 67.1 sec
2021-01-06 10:44:25,807 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 72.28 sec
2021-01-06 10:44:28,901 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 74.58 sec
2021-01-06 10:44:29,928 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 76.67 sec
2021-01-06 10:44:33,007 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 78.79 sec
2021-01-06 10:44:34,028 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 80.96 sec
2021-01-06 10:44:37,102 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 83.07 sec
2021-01-06 10:44:41,245 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 87.27 sec
2021-01-06 10:44:42,273 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 89.43 sec
2021-01-06 10:44:45,358 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 91.67 sec
2021-01-06 10:44:46,384 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 93.74 sec
2021-01-06 10:44:49,455 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 95.87 sec
2021-01-06 10:44:50,475 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 97.96 sec
2021-01-06 10:44:54,573 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 102.2 sec
2021-01-06 10:44:57,641 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 104.33 sec
2021-01-06 10:44:58,664 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 106.43 sec
2021-01-06 10:45:01,731 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 108.6 sec
2021-01-06 10:45:02,748 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 110.65 sec
2021-01-06 10:45:05,815 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 112.77 sec
2021-01-06 10:45:09,914 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 117.8 sec
2021-01-06 10:45:11,961 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 120.7 sec
2021-01-06 10:45:13,062 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 122.89 sec
2021-01-06 10:45:15,114 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 125.05 sec
2021-01-06 10:45:17,165 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 127.48 sec
2021-01-06 10:45:19,206 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 129.7 sec
2021-01-06 10:45:23,292 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 134.0 sec
2021-01-06 10:45:25,332 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 136.29 sec
2021-01-06 10:45:27,388 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 138.46 sec
2021-01-06 10:45:29,446 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 140.55 sec
2021-01-06 10:45:31,492 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 142.66 sec
2021-01-06 10:45:33,543 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 144.88 sec
2021-01-06 10:45:37,635 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 149.09 sec
2021-01-06 10:45:39,684 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 151.19 sec
2021-01-06 10:45:41,722 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 153.36 sec
2021-01-06 10:45:43,772 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 155.63 sec
2021-01-06 10:45:45,845 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 157.83 sec
2021-01-06 10:45:47,898 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 160.0 sec
2021-01-06 10:45:50,964 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 164.4 sec
2021-01-06 10:45:53,011 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 166.52 sec
2021-01-06 10:45:56,082 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 169.38 sec
2021-01-06 10:45:57,100 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 171.54 sec
2021-01-06 10:45:59,150 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 173.78 sec
2021-01-06 10:46:01,196 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 175.86 sec
2021-01-06 10:46:05,279 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 180.23 sec
2021-01-06 10:46:07,323 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 182.35 sec
2021-01-06 10:46:09,367 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 184.54 sec
2021-01-06 10:46:11,417 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 186.79 sec
2021-01-06 10:46:13,466 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 189.05 sec
2021-01-06 10:46:15,512 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 191.37 sec
2021-01-06 10:46:19,604 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 196.46 sec
2021-01-06 10:46:21,656 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 198.58 sec
2021-01-06 10:46:23,700 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 200.71 sec
2021-01-06 10:46:25,743 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 202.83 sec
2021-01-06 10:46:27,790 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 205.01 sec
2021-01-06 10:46:31,884 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 210.06 sec
2021-01-06 10:46:33,933 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 212.22 sec
2021-01-06 10:46:36,001 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 215.35 sec
2021-01-06 10:46:38,047 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 217.44 sec
2021-01-06 10:46:40,097 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 219.61 sec
2021-01-06 10:46:42,146 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 221.86 sec
2021-01-06 10:46:45,215 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 226.88 sec
2021-01-06 10:46:47,259 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 229.08 sec
2021-01-06 10:46:50,350 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 231.42 sec
2021-01-06 10:46:51,376 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 233.67 sec
2021-01-06 10:46:53,421 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 235.9 sec
2021-01-06 10:46:55,456 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 238.13 sec
2021-01-06 10:46:59,543 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 242.35 sec
2021-01-06 10:47:01,588 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 244.55 sec
2021-01-06 10:47:03,636 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 246.69 sec
2021-01-06 10:47:05,701 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 248.95 sec
2021-01-06 10:47:07,755 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 251.08 sec
2021-01-06 10:47:09,798 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 253.23 sec
2021-01-06 10:47:13,877 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 257.48 sec
2021-01-06 10:47:15,930 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 259.56 sec
2021-01-06 10:47:17,973 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 262.3 sec
MapReduce Total cumulative CPU time: 4 minutes 22 seconds 300 msec
Ended Job = job_1609141291605_0031
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:47:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0032, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0032/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0032
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2021-01-06 10:47:29,293 Stage-3 map = 0%,  reduce = 0%
2021-01-06 10:47:37,518 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 5.67 sec
MapReduce Total cumulative CPU time: 5 seconds 670 msec
Ended Job = job_1609141291605_0032
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/sample_test1
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117   Cumulative CPU: 262.3 sec   HDFS Read: 61853025 HDFS Write: 47856547 HDFS EC Read: 0 SUCCESS
Stage-Stage-3: Map: 1   Cumulative CPU: 5.67 sec   HDFS Read: 47866656 HDFS Write: 47847187 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 27 seconds 970 msec
OK
Time taken: 709.589 seconds
hive> 
    > select count(*) from sample_test1;
Query ID = root_20210106105110_0c94562d-021f-45ac-bf4e-d0fa98dcf849
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
21/01/06 10:51:10 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0033, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0033/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0033
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-01-06 10:51:17,757 Stage-1 map = 0%,  reduce = 0%
2021-01-06 10:51:25,012 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.8 sec
2021-01-06 10:51:30,170 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.23 sec
MapReduce Total cumulative CPU time: 6 seconds 230 msec
Ended Job = job_1609141291605_0033
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.23 sec   HDFS Read: 47855329 HDFS Write: 107 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 230 msec
OK
1170000
Time taken: 20.625 seconds, Fetched: 1 row(s)
hive> 
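The count above is consistent with tablesample(n rows) being applied per map task: the CREATE TABLE job ran 117 mappers, and 117 × 10,000 = 1,170,000. The hypothetical helper below computes the expected total; it assumes each split contributes min(n, rows in the split), and the per-split row count used in the example is made up, since the exact figures are not in the log.

```python
def rows_sample_total(split_rows: list[int], n: int) -> int:
    # Each map task (input split) emits at most n rows.
    return sum(min(n, rows) for rows in split_rows)


mappers = 117       # "number of mappers: 117" in the log above
per_split = 10_000  # tablesample(10000 rows)

# Assumption: every split holds more than 10,000 rows (hypothetical 100,000).
print(rows_sample_total([100_000] * mappers, per_split))  # 1170000

# A split smaller than n contributes only the rows it has.
print(rows_sample_total([5_000, 20_000], per_split))  # 15000
```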
