9. Dimensional modeling
The chapter covers dimensional modeling concepts such as date, time, role-playing, and degenerate dimensions, as well as event and consolidated fact tables.
1.Dimensional modeling
Dimensional modeling is a logical design technique that is much better suited to business intelligence (BI) applications and data warehousing (DW). It depicts business processes throughout an enterprise and organizes that data and its structure in a logical way. The purpose of dimensional modeling is to enable BI reporting, query, and analysis.
2.Facts
1)A fact is a measurement of a business activity, such as a business event or transaction, and is generally numeric. Examples of facts are sales, expenses, and inventory levels.
2) There are three types of facts: additive, semiadditive, and nonadditive (see the sketch below).
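To make the three types concrete, here is a minimal, hypothetical Python sketch (all table and column names are invented): an additive fact such as a sales amount can be summed across every dimension; a semiadditive fact such as an account balance can be summed across accounts at one point in time but not across time; and a nonadditive fact such as a unit price cannot be meaningfully summed at all.

```python
# Hypothetical fact rows (all values invented).
rows = [
    {"date": "2024-01-31", "account": "A", "balance": 100, "sales": 40, "unit_price": 4.0},
    {"date": "2024-01-31", "account": "B", "balance": 250, "sales": 10, "unit_price": 5.0},
    {"date": "2024-02-29", "account": "A", "balance": 120, "sales": 30, "unit_price": 3.0},
    {"date": "2024-02-29", "account": "B", "balance": 240, "sales": 20, "unit_price": 4.0},
]

# Additive: sales can be summed across all dimensions (dates and accounts).
total_sales = sum(r["sales"] for r in rows)

# Semiadditive: balances can be summed across accounts for ONE date, but
# summing them across dates would double-count; take one period instead.
feb_total_balance = sum(r["balance"] for r in rows if r["date"] == "2024-02-29")

# Nonadditive: unit prices cannot be meaningfully summed; recompute the
# ratio from additive components (sales and derived quantity) instead.
avg_price = total_sales / sum(r["sales"] / r["unit_price"] for r in rows)
```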
3.Dimensions hierarchies and keys
A dimension is an entity that establishes the business context for the measures (facts) used by an enterprise. Dimensions define the who, what, where, and why of the dimensional model, and group similar attributes into a category or subject area. Examples of dimensions are product, geography, and customer.
4.Schemas
Types of schemas: star schema, snowflake schema, and multidimensional schema (a rough star-versus-snowflake sketch follows).
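As a rough illustration of the first two (table and column names are hypothetical), a star schema keeps each dimension in one denormalized table joined directly to the fact table, while a snowflake schema normalizes a dimension into further tables reached through extra joins:

```python
# Star schema: the fact table points straight at denormalized dimensions.
star = {
    "fact_sales": ["date_key", "product_key", "store_key", "sales_amount"],
    "dim_product": ["product_key", "product_name", "subcategory", "category"],
    "dim_store":   ["store_key", "store_name", "city", "region"],
    "dim_date":    ["date_key", "day", "month", "year"],
}

# Snowflake schema: dim_product is normalized, so subcategory and
# category move into their own tables reached through extra joins.
snowflake = {
    "fact_sales":      ["date_key", "product_key", "store_key", "sales_amount"],
    "dim_product":     ["product_key", "product_name", "subcategory_key"],
    "dim_subcategory": ["subcategory_key", "subcategory_name", "category_key"],
    "dim_category":    ["category_key", "category_name"],
}
```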
5.ER vs. dimensional modeling
ER and dimensional models both depict the same business processes, but the ER model has significantly more entities and relationships than the dimensional model. In the dimensional model, the dimensions are denormalized and the facts are normalized.
6.Purpose of dimensional modeling
The models can help generate clear, effective reporting and analysis that will ultimately help business people view the data and make informed decisions for the enterprise. Dimensional modeling is well-suited to the generation of exactly the kind of information business people need to see. Common business uses for these models are displaying the facts (measures) and dimensions in BI reports, OLAP analyses, or spreadsheets.
7.Fact tables
1) There are three types of fact tables in dimensional modeling (illustrated in the sketch after this list):
·Transaction fact tables record a business event or transaction, one record at a time, with all the data associated with the event.
·Periodic fact tables record snapshots of data for a specific time, such as inventory levels at the end of a quarter or account balances at the end of a month.
·Accumulating fact tables store a record for the entire lifetime of the event, showing the activity as it progresses.
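A hypothetical sketch of how the same order might appear in each fact table type (field names are invented for illustration):

```python
# Transaction fact: one row per event, captured when it happens.
transaction_fact = {"order_id": 1001, "order_date": "2024-03-01", "amount": 250.0}

# Periodic snapshot fact: one row per entity per period, whether or not
# anything changed during the period.
periodic_fact = {"account": "A-17", "month_end": "2024-03-31", "balance": 4200.0}

# Accumulating snapshot fact: one row per lifecycle, updated in place as
# milestones are reached; unreached milestones stay None/NULL.
accumulating_fact = {
    "order_id": 1001,
    "order_date": "2024-03-01",
    "ship_date": "2024-03-04",
    "delivery_date": None,   # milestone not yet reached
}
```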
8.Consistency and conforming
Data consistency leads us to one of the key differences between operational and DW/BI systems: the operational system only has to keep data consistent within itself, whereas the DW/BI system must keep data consistent across all the source systems it integrates.
Conforming in this context also means standardizing—defining standard definitions for facts, measures, or key performance indicators (KPIs).
10. BI Dimensional Modeling
This chapter gives you a strong understanding of how to develop a dimensional model and how dimensional modeling fits in your enterprise.
1.Hierarchies (balanced, ragged, unbalanced and variable-depth)
1)From a data modeling perspective, hierarchies are cascading series of many-to-one relationships at different levels. Each level corresponds to a dimension attribute. The hierarchies document relationships across levels. In an entity relationship (ER) model, each of these attributes would be an entity unto itself.
2) Dimensional models have three types of hierarchies: balanced, ragged, and unbalanced (a small rollup sketch of a balanced hierarchy follows).
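The following is a minimal sketch of a balanced hierarchy as cascading many-to-one mappings, rolling sales up from product to subcategory to category (all values invented):

```python
# Many-to-one mappings, one per hierarchy level (all values invented).
product_to_subcategory = {"widget": "gadgets", "gizmo": "gadgets", "chair": "furniture"}
subcategory_to_category = {"gadgets": "electronics", "furniture": "home"}

sales_by_product = {"widget": 100, "gizmo": 50, "chair": 75}

def roll_up(sales: dict, mapping: dict) -> dict:
    """Aggregate a lower level into the next level up the hierarchy."""
    out: dict = {}
    for key, amount in sales.items():
        out[mapping[key]] = out.get(mapping[key], 0) + amount
    return out

sales_by_subcategory = roll_up(sales_by_product, product_to_subcategory)
sales_by_category = roll_up(sales_by_subcategory, subcategory_to_category)
# {'gadgets': 150, 'furniture': 75} -> {'electronics': 150, 'home': 75}
```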
2.Outrigger tables
Outriggers are used when a dimension is referenced within another dimension. When a dimension is being designed, often all the attributes that are related to each other are gathered into that dimension. Further inspection reveals groups of attributes that are populated by different sources, at different times, and for different purposes. An outrigger eliminates the need to redundantly store and update dimensional attributes in multiple locations.
3. Slowly changing dimensions
A business’s operational and transaction processing applications keep dimensional tables updated to the most current values to support ongoing business processes. Operational applications do not track changes to dimensional values because transactions only need the most current values.
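The data warehouse therefore has to track history itself. Below is a minimal sketch of one common approach, a Type 2 slowly changing dimension, where each change expires the current row and inserts a new version with effective dates (the table, columns, and data are hypothetical):

```python
from datetime import date

# Dimension rows carry a surrogate key plus effective/expiry dates.
customer_dim = [
    {"sk": 1, "customer_id": "C1", "city": "Boston",
     "valid_from": date(2020, 1, 1), "valid_to": None, "current": True},
]

def apply_scd2(dim: list, customer_id: str, new_city: str, as_of: date) -> None:
    """Type 2 change: expire the current row and insert a new version."""
    for row in dim:
        if row["customer_id"] == customer_id and row["current"]:
            if row["city"] == new_city:
                return  # nothing changed
            row["valid_to"] = as_of
            row["current"] = False
    dim.append({
        "sk": max(r["sk"] for r in dim) + 1,   # next surrogate key
        "customer_id": customer_id, "city": new_city,
        "valid_from": as_of, "valid_to": None, "current": True,
    })

apply_scd2(customer_dim, "C1", "Chicago", date(2024, 6, 1))
# customer_dim now holds both versions, so facts joined on the surrogate
# key keep pointing at the city that was current when they occurred.
```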
4.Causal dimensions
Facts have obvious attributes such as product, customer, geography, and time. This is the “who, what, where” of the fact. But facts also have a “why” component. This is the causal dimension.
5.Multivalued dimensions
When you start to encounter more complicated data situations that require advanced data modeling techniques, you may need to use a multivalued dimension. These situations arise when a fact measurement is associated with multiple values in a dimension table.
6.Junk dimensions
Junk dimensions are a way to handle the multitude of low-cardinality transactions, system codes, flags, identifiers, and text attributes by putting them into one table so you can better manage them.
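A common way to build one is to cross-join the small sets of flag values once, give each combination a surrogate key, and have the fact table carry that single key. A minimal sketch with invented flags:

```python
from itertools import product

# Low-cardinality flags that would otherwise clutter the fact table.
payment_types = ["cash", "credit"]
order_channels = ["web", "store", "phone"]
gift_flags = [True, False]

# One row per combination, each with its own surrogate key.
junk_dim = [
    {"junk_key": i, "payment_type": p, "order_channel": c, "is_gift": g}
    for i, (p, c, g) in enumerate(product(payment_types, order_channels, gift_flags), start=1)
]
# 2 * 3 * 2 = 12 rows; the fact table stores a single junk_key instead of
# three separate flag columns.
```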
7. Value band reporting
You group the facts by a particular dimension or by a fact measurement. When grouping by a dimension, you may have a hierarchy, which is a natural grouping; for a product with a subcategory and a category, you could aggregate at each of those levels. When grouping by a fact measurement, the values are bucketed into bands.
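A small sketch of banding by a fact measurement, with invented band boundaries:

```python
from bisect import bisect_right

# Band boundaries and labels (hypothetical): <100, 100-499, 500+.
bounds = [100, 500]
labels = ["under 100", "100-499", "500 and over"]

def band(amount: float) -> str:
    """Return the value band label a measurement falls into."""
    return labels[bisect_right(bounds, amount)]

sales = [42, 120, 480, 900]
report: dict = {}
for amount in sales:
    b = band(amount)
    report[b] = report.get(b, 0) + 1
# {'under 100': 1, '100-499': 2, '500 and over': 1}
```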
8. Heterogeneous products
Heterogeneous products and services can present some challenges with dimensional models. When you have diverse sets of products and customers, you can simply have multiple fact tables and dimensions.
9.Hot swappable and custom dimension groups
Hot swappable dimensions are switched at query-time based on business needs. They got their name because someone thought it was analogous to hot swappable hardware components that are switched in real time.
11. Data Integration Design and Development
The chapter covers the DI design steps for creating each stage’s process model and determining the design specifications.
1.Data integration design and development
Data integration combines data from different sources and brings it together to ultimately provide a unified view.
2.Data integration architecture
The data integration architecture represents the workflow of data from multiple systems of record(SOR) through a series of transformations used to create consistent, conformed, comprehensive, clean,and current information for business analysis and decision making
3.Prerequisites and requirements for data integration
1) The prerequisites:
• Data architecture
• Data requirements
• Source systems analysis
• Data models for each target data schema
2) Business requirements: data velocity or latency, tracking changes to data, historical and archiving requirements, security and privacy, and regulatory compliance.
3) Data sourcing system requirements: extraction strategy for each source system, systems availability, staging areas, and database strategies (assuming relational).
4.Data integration design
Data integration design starts at the first stage that gathers data from the SORs and proceeds incrementally until you design the final stages that load the data marts. Each stage can be designed as its prerequisites (data architecture, data requirements, and data models for the sources and targets) become available.
5.Data integration workflow
1) The source-to-target mappings and data integration process models are the building blocks of the overall data integration architecture. The data integration team needs to design a data integration architecture with a workflow that controls all the data integration processes used throughout the enterprise.
2) Things to watch for in workflows: first, you have to synchronize source systems across the enterprise; second, the dependencies in the dimensional model between different table types for related data may also necessitate the design of multiple workflows, as the ordering sketch below illustrates.
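Because fact tables carry foreign keys into dimension tables, a workflow typically loads dimensions before the facts that depend on them. A minimal dependency-ordered runner (job names are hypothetical; the real jobs would be DI processes):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each job lists the jobs it depends on; facts depend on their dimensions.
dependencies = {
    "load_dim_customer": set(),
    "load_dim_product": set(),
    "load_fact_sales": {"load_dim_customer", "load_dim_product"},
}

def run(job: str) -> None:
    print(f"running {job}")  # placeholder for the real DI process

for job in TopologicalSorter(dependencies).static_order():
    run(job)
# Dimension loads run first, then the fact load that depends on them.
```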
6.Project and DI development standards.
1) One of the key drivers to developing a cost- and resource-effective data integration architecture is creating and using a broad set of design, development, and deployment standards. These standards include principles, procedures, use cases, templates, and reusable code that is applied in specific combinations.
2) The minimum set of standards that should be agreed upon for the data integration process (a key-assignment sketch follows this list):
·Assume all dimension tables have surrogate keys that function as their primary keys.
·Decide whether fact tables will use surrogate keys as their primary key.
·Decide if surrogate keys are built using a database function
·Decide what data type will be used for surrogate keys
·Implement a SOR dimension listing the SORs used in data integration processes
·Retain the primary key from the SOR as a natural key in the BI data store tables and point a foreign key to the SOR dimension if more than one SOR is used to populate the tables
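The standards above come together in the surrogate key assignment step. A minimal sketch that allocates surrogate keys while retaining the natural key and the SOR it came from (names are invented; a production version would use a database sequence or key-generation function per the standards above):

```python
# Map (source system, natural key) -> surrogate key.
key_map: dict = {}
next_sk = 1

def surrogate_key(sor: str, natural_key: str) -> int:
    """Return the surrogate key for a natural key, allocating one if new."""
    global next_sk
    if (sor, natural_key) not in key_map:
        key_map[(sor, natural_key)] = next_sk
        next_sk += 1
    return key_map[(sor, natural_key)]

# The same natural key from two SORs gets two surrogate keys; the
# dimension row keeps the natural key plus a pointer to the SOR dimension.
row = {"sk": surrogate_key("crm", "C-100"), "natural_key": "C-100", "sor": "crm"}
```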
7.Reusable components
“Write once, use many times” is the driving principle for designing and deploying a common set of reusable data integration components.
8.Historical data
When the SOR schema or business rules have changed in the past or when historical data has data quality problems, then you need to create a historical data integration process that operates independently from the ongoing, incremental process.
9.Prototyping
Prototyping is a best-practice approach to enabling the requirements discovery process.
The prototyping tasks include:
• Define the prototype scope
• Establish the prototyping environment
• Create a data integration component prototype
• Perform tests
• Review results
• Revise as necessary
• Obtain prototype signoff
10.Testing
Data integration testing is another set of tasks that should be performed iteratively and incrementally. Testing complements prototyping in the requirements discovery process by identifying differences in source system data and business rules. Test each data integration component to ensure it gathered source data completely and correctly, then proceed to transforming that data and loading it into its targets.
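A minimal sketch of such completeness and correctness checks, written as pytest-style test functions over hypothetical source and target row sets:

```python
# Hypothetical extracted source rows and loaded target rows.
source_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
target_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]

def test_row_counts_match():
    # Completeness: every source row should arrive in the target.
    assert len(target_rows) == len(source_rows)

def test_amounts_reconcile():
    # Correctness: control totals should survive the transformations.
    assert sum(r["amount"] for r in target_rows) == sum(r["amount"] for r in source_rows)

def test_no_orphan_ids():
    # No target row should reference an id missing from the source.
    source_ids = {r["id"] for r in source_rows}
    assert all(r["id"] in source_ids for r in target_rows)
```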
12. Data Integration Processes
This chapter covers why it is best to use a DI tool (as opposed to hand-coding) and how to choose the best fit.
1.Manual coding versus data integration tools
Data integration (DI) activities are the most time- and resource-consuming portion of building an enterprise business intelligence (BI) environment.
Manual coding is prevalent in small- to medium-size businesses (SMBs), but even large enterprises that adopted data integration tools as their best practice to load their enterprise data warehouses typically resort to manually coded extracts to feed their BI data stores such as data marts, online analytical processing (OLAP) cubes, reporting databases, and spreadsheets.
2.Data integration services
The first generation of ETL tools were simple code generators that were expensive and had limited functionality. Many companies that evaluated them found it more effective to develop their own custom integration code.
The second-generation ETL products offered more functionality, particularly in managing operational processes and software development. These tools, however, were still primarily batch oriented, required extensive skills to use, and needed an additional infrastructure investment to perform well.
3.Access and delivery (extract and load)
1) The purpose of access services is to gather data from one or more sources so that it can then be consumed by other data integration services; the delivery services then load that data to its targets.
2)The common functions:
• Adapters and connectors
• Sorting utilities
• Bulk import and export loading
• Custom data loaders
4.Data ingestion
After access services gather data, data ingestion services update the targets with the latest data based on business rules for tracking data as it is created and modified. The processes involved are change data capture (CDC), slowly changing dimensions (SCD), and reference lookups.
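A simplified change data capture pass, comparing a source snapshot to the target by natural key to classify rows as inserts or updates (data and key names are invented; real CDC often works from logs or timestamps instead of full comparison):

```python
def detect_changes(source: list, target: list, key: str = "id"):
    """Classify source rows as inserts or updates relative to the target."""
    current = {row[key]: row for row in target}
    inserts, updates = [], []
    for row in source:
        if row[key] not in current:
            inserts.append(row)          # brand-new record
        elif row != current[row[key]]:
            updates.append(row)          # existing record changed
    return inserts, updates

target = [{"id": 1, "status": "open"}]
source = [{"id": 1, "status": "closed"}, {"id": 2, "status": "open"}]
inserts, updates = detect_changes(source, target)
# inserts -> [{'id': 2, ...}]; updates -> [{'id': 1, 'status': 'closed'}]
# Updates would then flow through the SCD rules before loading.
```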
5.Data profiling
Data profiling, often called source system analysis, is a process of examining the structure and content of data sources.
To profile data effectively, you have to understand the source system data—how it is defined, its condition, and how it is stored and formatted.
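A basic profile of a single source column, of the kind source system analysis produces (the data is invented):

```python
from collections import Counter

values = ["MA", "ma", None, "NY", "NY", "", "CA", None]

profile = {
    "count": len(values),
    "nulls": sum(1 for v in values if v in (None, "")),
    "distinct": len({v for v in values if v not in (None, "")}),
    "top_values": Counter(v for v in values if v not in (None, "")).most_common(3),
}
# {'count': 8, 'nulls': 3, 'distinct': 4, 'top_values': [('NY', 2), ...]}
# The mixed-case 'MA'/'ma' pair is exactly the kind of condition issue
# profiling is meant to surface before transformation rules are written.
```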
6.Data transformation
The data transformation services build and populate the schema—tables, columns, and relationships—of each of these data stores.
The key data transformation services are:
• Data conversion
• Business transformation—rules and metrics
• Data structure transformation
7.Data quality
Poor data quality is the result of errors in data entry or data acquisition. Data entry or data acquisition in operational (source) systems is often blamed for data quality problems.
8. Process management
The process management services are the command-and-control center for the data integration jobs. Depending on what data integration product is being used, jobs may also be referred to as sessions, workflows, flows, job sequences, or packages. These services include (a minimal command-and-control sketch follows the list):
• Job control—manages the workflow of data integration jobs.
• Audit—records processing metadata of the data integration jobs.
• Error handling—processes errors according to predefined rules and records error events.
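A toy sketch showing the three services working together (the job functions and log structures are invented):

```python
import time

audit_log = []   # processing metadata per job
error_log = []   # recorded error events

def run_jobs(jobs: dict) -> None:
    """Job control: run each job in order, auditing outcomes and trapping errors."""
    for name, fn in jobs.items():
        start = time.time()
        try:
            rows = fn()
            audit_log.append({"job": name, "rows": rows,
                              "secs": time.time() - start, "status": "ok"})
        except Exception as exc:  # error handling: record the event and continue
            error_log.append({"job": name, "error": str(exc)})
            audit_log.append({"job": name, "status": "failed"})

run_jobs({
    "extract_orders": lambda: 1200,    # pretend row counts
    "load_fact_sales": lambda: 1200,
})
```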
9.Operations management
You can use operations management services to monitor and manage the design, development, and running of data integration services. These services need to make the data integration environment efficient in terms of utilizing resources and effective in terms of achieving business and technical objectives.
The operational management services include:
• Scheduling
• Restart and recovery
• Security and privacy
• Software management system
• Repository management
• Operations reporting
10.Data transport
These data integration vendors have integrated a wide variety of integration technologies into their tools. The integration technologies we have labeled as data transport services enable data integration to be performed across a variety of integration platforms.
This list of data transport services includes:
• ETL and ELT
• Enterprise application integration including ESB, SOA, and Web (or data) services
• Enterprise message services including XML and JSON
• EII including data virtualization and data federation
13. Business Intelligence Applications
This chapter talks about specifying the content of the BI application you are building, and the idea of using personas to make sure it resonates with your intended audience.
1.Creating BI content specifications
The project team needs to develop BI content specifications in order to design and document the BI applications they are going to build.
2. Choosing the BI application
Create the initial list of BI applications by reviewing the data and business requirements gathered earlier in the project. The BI content specification process will help you create a revised and agreed-upon list of BI application deliverables.
The revisions will evolve during the following steps:
• Assessing scope
• Consolidating BI applications
• Reviewing specifications
• Creating an agreed-upon BI deliverables list
3.Creating personas
1) There is a wide variety of BI analytical styles available to implement BI applications. The key to success, just like in marketing, is to target the BI application to the correct customer segmentation. We refer to the characteristics of a segmentation of business people as a persona.
2)The business personas are:
• Casual consumers
• Analysts
• Power users
• Data scientists
4.Best practices for BI design layout
1) BI is an enabling technology. It is the business person who performs the analysis, gains insights, and takes actions based on those insights.
2)Design elements that are standardized in this example include:
• Filters located on the upper left corner of the display.
• Comparison and contribution analysis presented in two left-hand quadrants. Data is aggregated and drillable when there is a dimensional hierarchy.
• Trending analysis presented as a line chart in the upper right quadrant.
• Sales data presented in tabular form in the lower right quadrant.
5.Data design for self-service BI
BI designers need to be concerned with the analytical functionality and the visual layout of their BI applications. Both of these aspects are being adequately addressed as BI vendors enhance their products with ever-increasing functionality and evolving best practices for designing effective BI application layouts.
6.Matching visualization and analysis types
When designing the BI application, keep in mind the type of analysis that the business person needs it to perform. This will help you choose the best type of chart to optimize visualization. The most common types of analysis performed include (a simple mapping sketch follows the list):
• Comparative
• Time-series or trending
• Contribution
• Correlation
• Geographic data
• Distribution
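One lightweight way to encode such guidance is a lookup from analysis type to commonly recommended chart types. The pairings below are conventional rules of thumb, not taken from the chapter:

```python
# Conventional analysis-to-chart pairings (rules of thumb, not rules).
chart_for_analysis = {
    "comparative": ["bar chart", "bullet chart"],
    "time-series": ["line chart", "area chart"],
    "contribution": ["stacked bar", "pie chart (few slices)"],
    "correlation": ["scatter plot", "bubble chart"],
    "geographic": ["choropleth map", "symbol map"],
    "distribution": ["histogram", "box plot"],
}

def suggest(analysis_type: str) -> list:
    """Return candidate chart types for a given analysis."""
    return chart_for_analysis.get(analysis_type, ["table"])
```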
14. BI Design and Development
This chapter includes creating and adhering to standards for the user interface and standards for how information is accessed from the perspectives of privacy and security.
1.BI design
The business intelligence (BI) content specifications document all the data processes in the BI application—access, integration, transformation, and analysis. The next step is to design the BI application’s visual layout and how it interacts with its users.
2.BI user interface
Websites are designed with UI standards that make the layout, navigation, and functionality consistent across all pages. This consistency makes us more productive; we can immediately use the Website for its intended purpose without having to figure out how it works
3. Privacy, security, access standards
Although privacy, security, and access are often intermixed, it is important to examine and establish standards for each.
First, establish standards for data that should not be disseminated because of regulations or a code of conduct. Second, create security standards on how to determine what data needs to be secured and how that will be implemented in the BI applications. Finally, with privacy and security standards in place, a BI team can work with the business to define access standards.
4.Design methods
1) Sketches: Most visual designs will start off as a sketch, as depicted in Figure 14.2. Using a whiteboard or paper is a quick, easy, and inexpensive way to draw the initial sketches and begin discussion.
2) Wireframes: Designing applications using wireframes is a technique that is used extensively in Web design. A wireframe is a much more detailed representation of the visual design than a sketch.
3) Storyboards: The storyboarding technique is used extensively in the film industry to design both live-action and animated movies.
4) Mock-ups: Mock-ups are the medium- to high-fidelity representations of the BI application’s layout.
5.Prototyping lifecycle
1) Two primary objectives for BI prototypes:
• Obtain feedback from its intended users.
• Enable developers to build BI applications in an incremental manner.
2) Success factors in BI prototyping efforts:
Dedicate time and effort.
Work incrementally.
Time-box it.
3) The BI prototyping lifecycle consists of the following tasks and feedback loops:
Define the prototype scope.
Create BI application prototype.
Conduct unit tests.
Users test prototype.
Gather user feedback.
Revise as necessary.
Prototype signoff.
6.Application development tasks
1)BI prototyping is used for one or more of the following reasons:
· Validate, expand, or gather business, data, and technical requirements
·Determine feasibility of specific functionality
·Examine effectiveness of components built
·Engage business people to work with and provide feedback
2) Tasks: create data content; create data charts or visualizations for analysis; create the overall BI application.
7.BI application testing
The following tasks need to be performed:
• Developer unit tests
• User unit tests
• Developer integration tests
• UAT, system, and performance testing
15. Advanced Analytics
This chapter shows how you can use analytics not just to learn about what has happened, but also to gauge the future and act on predictions.
1.Advanced analytics definition
Enterprises are deluged with data about their customers, prospects, business processes, suppliers, partners, and competitors. This data comes from traditional internal business systems, cloud applications, social networking, and mobile communications. With the flood of new data and analytical techniques comes the opportunity for business people to perform new types of analyses to gain greater insight into their business and customers.
2.Predictive analytics & data mining
Predictive analytics and data mining are the processes by which you perform advanced analytics for forecasting and modeling. While BI has become widely used, even to the point of being pervasive in many organizations, far fewer organizations use advanced analytics tools.
3.Analytical sandbox
Some of these special needs can be anticipated and built into a data mart, often called an exploratory data mart or OLAP cube, but even then there is a significant data gap. An exploratory data mart may be terrific, but IT also needs to create an analytical sandbox environment that enables business users to add data and derive metrics.
4.Analytical hub
The analytical sandbox and hub have similarities, but the hub is a more advanced architecture created for more advanced users and uses.
5.Big Data analytics
When working with Big Data, companies need to understand what it is, how to manage it effectively, and how to get a business return on their investments to leverage it.
6.Data visualization
Data visualization presents information pictorially, often graphically, helping communicate it in a more intuitive way. Using data visualization techniques to mix and match data can help your business group view it in new ways and ultimately glean more insights from it.
16. Data Shadow Systems
This chapter sheds light on these frequently seen departmental systems (usually spreadsheets) that business groups create and use to gather and analyze data on their own when they do not want to work with IT or cannot wait for them.
1.Identifying and triaging data shadow systems
1) Clues that business groups are using data shadow systems: people find that numbers are inconsistent, there is a lot of debate about the differences in these numbers during meetings, and business analysts spend their time gathering and reconciling data.
2) Three types of data shadow systems: one-off reports (mild), ad hoc analysis (moderate), and full-fledged analytical applications (serious or very serious).
3) Data triage: not all data shadow systems are created equal, so handle them differently based on scope, use, and risks.
2.The evolution of data shadow systems
1) Filling in a gap.
2) As they grow, it is harder to turn back.
3) The IT group’s response to data shadow systems.
3.The damages caused by data shadow systems
The problems associated with shadow systems include:
• Inconsistent data across the enterprise
• Lost productivity due to “analyst time sink”
• Lost productivity due to reconciliation
• Data error #1: imports
• Data error #2: calculations
• Data error #3: data sources change
• Data error #4: stale data
• Limited (or no) scalability
• Increased risk
• Lack of discipline
• No audit trail
• No documentation
4.The benefits of data shadow systems
The advantages include:
• Business knowledge
• Responsive
• Fast and flexible
• Fills in IT gaps
• Fills in tool gaps
• Accessible and inexpensive
• Familiar
• Effective
5.Moving beyond data shadow systems
Moving beyond data shadow systems is a process that requires communication, accountability, and compromise.
6.Renovating data shadow systems
The key to renovating a data shadow system is to split it into its data integration and analytical processes, and then work to understand each.
17. People, Process, and Politics
This chapter lays out the relationship between the business and IT groups, discussing who does what, how they interact, and how to build the project management and development teams.
1.The technology trap
The problem with our comfort level with products is that the critical success factors of a solution often lie with the other three P’s: people, processes, and politics. These are three things that are a lot harder to predict and control than technology. Technology will do whatever people make it do; however, people, the processes they create, and the politics that drive them can be weak links in a project if they’re not understood or handled properly.
2.The business and IT relationship
As with any relationship, the interactions between IT and business are vulnerable to misinterpretations, poor communication, and disagreements that can bog down a BI project.
3. Roles and responsibilities for IT and the business group
When it comes to business and IT groups, each has its main role, and then there are areas where the roles can overlap. For the most part, IT’s role is to create the infrastructure, integrate the data, and give business people what they need to perform analytics. IT handles everything in the “back office.” The business group handles the “front office,” that is, using the data IT provides to do business analysis.
4.Building the BI team
A successful BI project team is like a four-legged table—each leg holds up its share of the weight. Remove one and the project wobbles. The four legs of a team are:
• Project sponsorship and governance
• Project management
• Development team (core team)
• Extended project team
5.Project sponsorship and governance
IT and the business should work together to sponsor and govern design, development, deployment, and ongoing support. Although business sponsors obviously need to come up with the money, they also have to commit business resources to work with the BI team throughout the project. If there is just one BI project on the table, a steering committee will usually suffice; if there are several projects, additional committees may be required.
6. Building the project management team
The project management team needs extensive business knowledge, BI expertise, DW architecture background, and people management, project management, and communications skills. The project management team leadership includes three functions or members:
• Project development manager
• Business advisor
• BI/DW project advisor
7. Building the project development team
The project development team performs four core functions (Figure 17.3) and is typically split into corresponding sub-teams:
• Business analysis
• Architecture
• Data integration (DI) development
• BI application development
8. Training types and methods for IT and business people
1) One of the biggest mistakes project teams make is refusing to acknowledge the need for training. This is just one of the ways companies undermine their own efforts to take BI to its full potential.
2) Types of training:
• Vendor-agnostic foundational training
• Tool-specific training
• Training with use cases
9.Data governance
More and more companies are recognizing that they’re accumulating ever-increasing amounts of data, but not necessarily gaining business insights from it. The missing link is the transformation of data into information that is comprehensive, consistent, correct, and current. That is a problem technology cannot solve for you; it requires people.
The answer is establishing a data governance program to help your organization truly treat its data as a corporate asset and maximize its value.
18. Project management
This chapter stresses the need for an enterprise-wide BI program, which helps the BI project manager better plan and manage BI projects.
1.Establishing a BI program
Enterprises are managed strategically by their senior management and operationally by business groups or departments such as marketing, sales, finance, and human resources. This division of labor enables specialization and fosters improved productivity.
2.BI assessments
An assessment helps answer these questions and keep a project on track. It can be done either internally or by an outside consultant.
The end result of the BI assessment is a BI roadmap or program to achieve your business and technical objectives in a manner that is as cost- and resource-efficient as possible.
3.Work breakdown structure (WBS)
There are often many unknowns or high-level-only requirements at the beginning of the project, making it difficult to create a detailed project schedule.
4.Project methodologies
A systems development life cycle (SDLC), sometimes called an application development methodology, is the term used by software engineers and IT professionals to describe the process used to plan, design, develop, test, and deploy software applications.
5.Project phases
At a high level, a BI project shares the same phases as a typical IT application development project. The project phases are:
• Scope and plan
• Analyze and define
• Architect and design
• Build, test, and refine
• Implement
• Deploy and roll out
6.Project schedule
These phases represent the full life cycle of a project from planning through deployment. An effective approach to creating a BI project schedule is to decompose the BI Project WBS (Figure 18.14) into greater levels of detail until it is sufficient to schedule tasks, assign resources and track progress.
19. Centers of Excellence
This chapter discusses these organizational teams that address the problem of disconnected application and data silos across an enterprise.
1. Need for BI and DI COEs
The reasons why businesses need BI COEs are deeply ingrained in the way they operate and are organized. These factors are never going to change, so it’s almost inevitable that a BI COE is going to be needed.
2.Business case for COEs
The benefits of a BI COE are:
• Increased BI business value and ROI
• Enterprise-wide BI strategy and road map is agreed upon
• Improved business and IT productivity
• Leveraged BI skills, knowledge, and experiences across enterprise and over time
3.Skills needed
The BI COE needs people with a variety of skills: technical, business, and analytic. Ideally, you’ll have people with combinations of these skills. It is important to have both business and IT people working in the COE.
• Business skills—Some members of the team must understand what data the business groups, such as HR or finance, need and how they will use it in their jobs.
• Analytic skills—People skilled with analytics are becoming more essential in every enterprise, and especially in the BI COE. These people need to understand business issues and the data required to address those issues.
4.Organizational models
The four organizational models most often used to implement DI COEs are:
• Best practices
• Technology standards
• Shared services
• Centralized services