Echo Kanak

Data Warehousing, Business Intelligence, and Dimensional Modeling Primer

Apr 2, 2024

The book for today is The Data Warehouse Toolkit by Ralph Kimball. So for this one I am only going to keep notes for the first two chapters for now.

Overview

  • First and foremost the DW/BI system must consider the needs of the business.
  • With business needs firmly in hand, we work backwards through the logical and then physical designs along with decisions about technology and tools.

Different worlds of Data Capture and Data Analysis

One of the most important assets of any organization is its information. This asset is used for 2 purposes :

  • Operational record keeping
  • Analytical decision making

Operational systems are where you put the data in.

DW/BI system is where you get the data out.

  • Operational systems are optimized to process transactions quickly and typically don’t maintain history but rather update data to reflect the most current state
  • DW/BI systems are optimized for high-performance queries and typically demand that historical context be preserved to accurately evaluate the organization’s performance over time

Goals of Data Warehousing and Business Intelligence

  1. DW/BI system must make information easily accessible : Contents must be understandable and the data intuitive and obvious to the business user. Simple and fast
  1. DW/BI system must present information consistently : The data must be credible. Carefully assembled, cleansed, quality assured and released only when fit for user consumption
  1. DW/BI system must adapt to change : designed to handle inevitable change gracefully so that existing data and application are not disrupted
  1. DW/BI must present information in timely way
  1. DW/BI must be secure bastion that protects information
  1. DW/BI system must serve as the authoritative and trustworthy foundation for improved decision making : DW must have the right data to support decision making.

Dimensional Modeling Introduction

Dimensional modeling is widely accepted as the preferred technique for presenting analytic data because it addresses two simultaneous requirements:

■ Deliver data that’s understandable to the business users.

■ Deliver fast query performance.

3NF and Dimensional Modeling

  • Dimensional modeling makes databases simple
  • Dimensional models are different from normal form 3NF models which seek to remove data redundancies
  • Normalized 3NF structures divide data into many discrete entities, each of which becomes a relational table.
  • 3NF models are sometimes referred as ER models. ER diagrams communicate relationship between tables
  • Both 3NF and dimensional models can be represented in ERDs because both consist of joined relational tables; the key difference between 3NF and dimensional models is the degree of normalization.
  • Normalized 3NF structures are immensely useful in operational processing because an update or insert transaction touches the database in only one place.
  • Normalized models, however, are too complicated for BI queries
  • most RDBMS can’t efficiently query a normalized model; the complexity of users’ unpredictable queries overwhelms the database optimizers, resulting in disastrous query performance.
  • Dimensional modeling addresses the problem of overly complex schemas in the presentation area

🐲
A dimensional model contains the same information as a normalized model, but packages the data in a format that delivers user understandability, query performance, and resilience to change.

Star Schemas versus OLAP Cubes

  • If your DW/BI environment includes either star schemas or OLAP cubes it leverages dimensional concepts.
  • Both stars and cubes have a common logical design with recognizable dimensions however the implementation differs.

OLAP

  • Data loaded into OLAP cube is stored and indexed using techniques designed for dimensional data.
  • Performance aggregations/ precalculated summary tables often created and managed by OLAP cube engine.
  • Cubes deliver superior query performance because of the precalculations, indexing strategies and other optimizations.
    • Downside being we pay a load performance price for these capabilities especially with large data sets.
  • OLAP cubes provide more analytically robust functions that exceed those available with SQL.
  • Although capabilities of OLAP cubes are continuously improving, it is recommended that detailed, atomic information be loaded into a star schema, optional OLAP cubes are then populated from star schema.

OLAP Deployment Considerations
🌟
Dimensional models implemented in relational database management systems are referred to as star schemas because of their resemblance to a star-like structure.
🌟
Dimensional models implemented in multidimensional database environments are referred to as online analytical processing (OLAP) cubes

Fact Tables for Measurements

🗨️
Fact Table in a dimensional model stores the performance measurements resulting from an organization’s business process events.
The term fact represents a business measure.

Each row in a fact table corresponds to a measurement event. The data on each row is at a specific level of detail referred as the grain such as one row per product sold on a sales transaction.

  • The most useful facts are numeric and additive such as dollars sales amount.
  • Additivity is crucial as BI applications rarely retrieve a single fact table row, rather they bring 100’s, thousands or millions of fact table rows. And the most useful thing to do with these rows is add them up.
  • Facts are sometimes semi-additive or non-additive.
  • Semi additive facts such as account balances can’t be summed across the time dimension
  • Non-additive facts such as unit prices can never be added

Tags