数据仓库_数据粒度_数据分区

Granularity

Granularity refers to the level of detail or summarization of the units of data in the data warehouse.

The more detail there is, the lower the level of granularity. The less detail there is, the higher the level of granularity.

Granularity is the single most critical design issue in the data warehouse environment because it profoundly affects the volume of data that resides in the data warehouse and the type of query that can be answered.

The volume of data in a warehouse is traded off against the level of detail of a query.

In almost all cases, data comes into the data warehouse at too high a level of granularity. This means that the developer must spend a lot of design and development resources breaking the data apart before it can be stored in the data warehouse. Occasionally, though,data enters the warehouse at too low a level of granularity.

The Benefits of Granularity

The granular data found in the data warehouse is the key to reusability, because it can be used by many people in different ways.

Looking at the data in different ways is only one advantage of having a solid foundation. A related benefit is the ability to reconcile data, if needed.

Another related benefit of a low level of granularity is flexibility

Another benefit of granular data is that it contains a history of activities and events across the corporation.

Determining the level of granularity is the most important design issue in the data warehouse environment

An Example of Granularity

There is, then, a very good case for the compaction of data in a data warehouse. When data is compacted, significant savings can be realized in the amount of DASD used, the number of index entries required, and the processor resources required to manipulate data.

Put another way, with a very low level of granularity, you can answer practically any query. But a high level of granularity limits the number of questions that the data can handle.

Another consideration in designing granularity is determining which architectural entities will feed off the data warehouse.Each DSS architectural entity has its own unique considerations. The data warehouse must be designed to

feed the lowest level of granularity needed by any architectural entity.

Dual Levels of Granularity

Most of the time, there is a great need for efficiency in storing and accessing data, and for the ability to analyze data in great detail.

The data warehouse in this example contains two types of data—lightly summarized data and “true archival” detail data.

Light summarization data is detailed data that has been summarized only to a very small extent. For example, phone call information may be summarized by the hour. Or, bank checking information may be summarized by the day.

There is a significantly less volume of data in the lightly summarized database than there is in the detailed database.Of course, there is a limit to the level of detail that can be accessed in the lightly summarized database.

At the true archival level of data, all the detail coming from the operational environment is stored. There is truly a multitude of data at this level. For that reason, it makes sense to store the data on a medium such as magnetic tape or another bulk storage medium because the volume of data is so large.

If a pattern of searching the true archival level of data develops over time, the designer may want to create some new fields of data at the lightly summarized level, so that most of the processing can occur there.

Living Sample Database

The greatest asset of a living sample database is that it is very efficient to

access. Because its size is a fraction of the larger database from which it was

derived, it is correspondingly much more efficient to access and analyze.

If very high degrees of accuracy are desired, a useful technique is to formulate the request and go through the iterative processing on the living sample database. In doing so, the DSS analyst quickly formulates the request. Then, after several iterations of analysis have been done, when the request is understood, it is run one final time against the large database.

Partitioning as a Design Approach

A second major design issue of data in the warehouse (after granularity) is partitioning

Partitioning of data refers to the break up of data into separate physical units that can be handled independently.

It is often said that if both granularity and partitioning are done properly, then almost all other aspects of the data warehouse design and implementation come easily.

Data is partitioned when data of a like structure is divided into more than one physical unit of data. In addition, any given unit of data belongs to one and only one partition.

The choices for partitioning data are strictly up to the developer. In the data warehouse environment, however, it is almost mandatory that one of the criteria for partitioning be by date.

As a rule, it makes sense to partition data warehouse data at the application level.

The acid test for the partitioning of data is to ask the question, “Can an index be added to a partition with no discernible interruption to other operations?” If an index can be added at will, then the partition is fine enough. If an index cannot be added easily, then the partition needs to be broken down more finely.

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,496评论 6 501
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,407评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 162,632评论 0 353
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,180评论 1 292
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,198评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,165评论 1 299
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,052评论 3 418
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,910评论 0 274
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,324评论 1 310
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,542评论 2 332
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,711评论 1 348
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,424评论 5 343
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,017评论 3 326
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,668评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,823评论 1 269
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,722评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,611评论 2 353

推荐阅读更多精彩内容

  • 继续~ 1)感恩我的触觉让我的双脚感受到床单的软软,毛绒绒的,暖暖的~心里好舒服!谢谢!谢谢!谢谢! 2)感恩今天...
    不器zlj阅读 184评论 0 0
  • 1 早餐过后,我问母亲:“今天去干啥?” 母亲说:“咱去够红柿吧?” 一听是够红柿,我立马来了精神,可我知...
    路雨飞飞阅读 531评论 8 5
  • 小面包这几天晚上终于睡觉比较好了 一觉能睡上3,4个小时 我再也不用从晚上10,11点抱到凌晨2,3点或者4,5点了
    小面包妈妈阅读 149评论 0 0
  • 前言 你是否常常缺乏幸福感,哪怕是拥有令人羡慕的美貌、身材?或者财力、权力?或者美满的家庭、爱自己的老公、学习优异...
    恩煦阅读 269评论 0 0