Introduction

Welcome to the Dimensional Modelling Docs — a shared, practical source of truth for building dimensional (Kimball-style) data warehouses and marts. It collects the concepts, definitions, and reference models we rely on when designing facts and dimensions, written to be useful both to the analytics engineers on the team and to the LLMs that assist them.

Why dimensional modelling

Source systems like SAP are built to run the business — thousands of normalised tables optimised for fast, safe transactions. They are not built to analyse the business. Asking “what did we sell, to whom, by product line, this quarter?” against raw source tables means joining dozens of tables, decoding cryptic keys, and reconciling inconsistent values — slow to write, slow to run, and easy to get wrong.

Dimensional modelling reorganises that same data around how the business actually asks questions. Facts hold the measurements (amounts, quantities); dimensions hold the descriptive context (customer, product, date) you slice and group by. The result is a model that is:

Understandable — business users recognise the structure without a data dictionary.
Fast — star schemas are optimised for the read-heavy, aggregate queries that reporting needs.
Consistent — conformed dimensions mean “revenue by business unit” reconciles across every report.

In short: we do dimensional modelling so the business can answer its own questions quickly, correctly, and in the same language everywhere.

Why this guide

There are many excellent Kimball books and tutorials, but we kept hitting the same gaps. They were:

Not written for LLMs — hard to feed to an assistant as ground truth.
Light on concrete examples — strong on theory, thin on the exact SQL and edge cases you actually meet in production.

So this guide is deliberately pragmatic: short explanations backed by real examples we have encountered while developing facts and dimensions in production.

Who is this for

This documentation is designed to be accessible and valuable across different roles and experience levels:

Data & Analytics Engineers (Junior to Senior): To learn the technical standards, transformation patterns, and key strategies (such as hash hybrids and SCD2) required to build robust pipelines.
Data Analysts: To understand how our data marts are structured, how to query facts and dimensions correctly, and how to use business-facing views.
Business Stakeholders: To grasp the core concepts of our data models and align on shared terminology (like grain, dimensions, and facts) when defining data requirements.

How this fits with the Playbook & Templates

This guide does not exist in isolation. It acts as the conceptual bridge between our high-level processes and our low-level code templates. Here is how you can navigate the ecosystem:

The Playbook: Covers the “how-to” of our delivery process, CI/CD, and overall architecture.
This Guide: Covers the “what” and “why” of our dimensional models (Core Concepts, Transformations, Patterns, and Conventions).
Templates & Reference Catalogues: For practical, everyday implementation, refer to our Dimensions Template and Facts Template. They provide concrete examples based directly on the rules defined in this guide.

Dimension

A dimension is a table that provides the descriptive context around the measurable events stored in a fact table. Where facts answer “how much?” or “how many?”, dimensions answer “who, what, where, when, and why?” — they hold the attributes analysts use to filter, group, and label results. Together with facts they form the star schema. See Dimension Tables (Kimball Group) for the original definition.

What is a dimension

A dimension table describes a single business entity — a customer, a product, a sales territory, a date. Each row is one member of that entity, and the columns are the textual or low-cardinality attributes that describe it.

Wide and shallow. Dimensions typically have many columns but relatively few rows compared to facts. Denormalize attributes into the dimension rather than snowflaking them into sub-tables.
Attributes are the filters and labels. Anything you would group by or put on a report axis belongs in a dimension (e.g. category, region, segment).
One row per member version. With history tracking, a single business entity can occupy several rows over time — see Slowly Changing Dimension.
No measures here. Keep numeric, additive measures in the fact table; the dimension holds context, not metrics.

erDiagram
    dim_customer ||--o{ fact_invoice_lines : "customer_sk"
    dim_customer {
        bigint customer_sk PK "surrogate key"
        string customer_id "natural key"
        string customer_name "attribute"
        string segment "attribute"
        string region "attribute"
    }

Natural Keys

The natural key is the identifier the entity already carries in the source system — a customer_id from the CRM, an SKU from the product catalogue, an order number from the OLTP database.

Carries business meaning and may be reused, recycled, or reformatted by the source.
Can be composite (several columns) and can change format over time.
Stored on the dimension (often suffixed _nk) so a row can be traced back to its source record.
Not used as the primary key of the dimension: a single natural key can map to many historical versions of a row once you track history.

customer_id = "CUST-00417"

Surrogate Keys

The surrogate key is a meaningless, warehouse-generated integer that serves as the primary key of the dimension. Every row — including every historical version of an entity — gets its own surrogate key, and fact tables join to the dimension through it.

Typically a monotonically increasing BIGINT or identity column; carries no business meaning.
Decouples the warehouse from source-key changes and enables Type 2 history.
Reserve special values (e.g. -1 Unknown, -2 Not applicable) so fact foreign keys never have to be NULL.

Convention: name surrogate keys <entity>_sk, e.g. customer_sk, product_sk, date_sk.

customer_sk = 1048576   -- primary key of one specific version of a customer row

For the full set of key types and the conventions we use for each, see Kimball Keys Definitions.

References

Dimension Tables — Kimball Group
Fact Tables and Dimension Tables — foundational definitions
Surrogate Keys — why warehouses generate their own keys

Fact

tba

Grain

The grain of a fact table is the precise meaning of a single row — the level of detail it records.

In practice: Declare the grain in business terms before choosing dimensions or facts (“one row per order line”, “one row per daily account balance”). Every dimension and measure on the table must be true at that grain; mixing grains in a single table is the most common dimensional modelling mistake.

Example: fact_sales at the grain of one row per order line, where each line carries exactly one product. quantity and extended_amount are therefore recorded per line, never per whole order.
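A declared grain can be checked mechanically: group by the columns that define it and look for duplicates. The sketch below is illustrative only. It uses Python's built-in sqlite3 module so it runs anywhere, and the columns order_number and line_number are assumed names for the grain of fact_sales, not a confirmed schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (
        order_number    TEXT,
        line_number     INTEGER,
        quantity        REAL,
        extended_amount REAL
    );
    INSERT INTO fact_sales VALUES
        ('SO-1', 1, 2, 20.0),
        ('SO-1', 2, 1, 15.0),
        ('SO-2', 1, 5, 50.0),
        ('SO-2', 1, 5, 50.0);  -- duplicate line: violates the declared grain
""")

# Any (order_number, line_number) pair that appears more than once
# breaks the "one row per order line" grain.
grain_violations = con.execute("""
    SELECT order_number, line_number, COUNT(*) AS row_count
    FROM   fact_sales
    GROUP  BY order_number, line_number
    HAVING COUNT(*) > 1
""").fetchall()

print(grain_violations)
```

An empty result means every row is unique at the declared grain; a check like this is a natural candidate for an automated data-quality test.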

See also: Standard Cost (declares its grain as an explicit key tuple)

Star Schema

A star schema is the foundational pattern in dimensional modelling. It organizes data into a central fact table surrounded by dimension tables, forming a star-like shape when visualized. The fact table holds measurable events (e.g. orders, shipments, invoices), while the dimensions provide the context around those events (e.g. customer, product, territory). See Star Schema (Kimball Group) for the original definition.

This structure is optimized for analytical queries. Because every dimension joins directly to the fact table, queries are predictable — analysts always know where to look for metrics (facts) and where to look for filters and groupings (dimensions).

Example Star Schema Diagram

erDiagram
    dim_sales_territory ||--o{ fact_invoice_lines : "sales_territory_sk"
    dim_product_hierarchy ||--o{ fact_invoice_lines : "product_hierarchy_sk"
    dim_industry ||--o{ fact_invoice_lines : "industry_sk"

    fact_invoice_lines {
        string sales_territory_sk FK
        string product_hierarchy_sk FK
        string industry_sk FK
        decimal quantity
        decimal amount
    }
    dim_sales_territory {
        string sales_territory_sk PK
        string territory_code_nk
        string territory_name
        string region
    }
    dim_product_hierarchy {
        string product_hierarchy_sk PK
        string product_code_nk
        string product_name
        string category
    }
    dim_industry {
        string industry_sk PK
        string industry_code_nk
        string industry_name
        string sector
    }

Rules

One fact table per star schema

Each star schema should contain exactly one fact table at one grain. The grain is the business definition of the measurement event that creates a fact record — it should always start at the lowest, most atomic level. If you need to combine metrics from different business processes (e.g. sales and inventory), build separate star schemas rather than merging everything into a single fact table. This keeps each schema focused and avoids grain conflicts. See Four-Step Dimensional Design Process and Keep to the Grain for more on defining grain.

Do not join dimensions to other dimensions

In dimensional modelling, a dimension that is referenced from another dimension (rather than directly from the fact table) is known as an outrigger dimension. While the Kimball methodology technically allows outriggers, they are rarely necessary and they complicate the SQL: queries become harder to read, maintain, and optimize. As Kimball notes, outriggers should be used sparingly, and in most cases correlations between dimensions should be demoted to a fact table where both dimensions are represented as separate foreign keys. See also Design Tip #105: Snowflakes, Outriggers, and Bridges.

If you find yourself wanting to join two dimensions together, use a factless fact table instead. A factless fact table captures the relationship between dimensions as a fact at its own grain — it has no numeric measures, only foreign keys to the dimensions involved. This keeps the star schema clean and the joins predictable. See Design Tip #133: Factless Fact Tables for Simplification for practical examples.

Always use left joins, fact on the left

When joining dimensions to the fact table, always use left join with the fact table on the left side. This ensures that every fact record is preserved in the result, even if a matching dimension record is missing.

A missing dimension match usually indicates a data quality issue. The left join makes these gaps visible rather than silently dropping rows.
If you use inner join instead, you risk losing fact records and underreporting metrics without realizing it.
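The effect of the join type on totals can be sketched with a tiny in-memory example. It uses Python's built-in sqlite3 module purely so it is runnable; the table contents are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_sk INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE fact_invoice_lines (customer_sk INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
    -- customer_sk 3 has no dimension row: a data-quality gap
    INSERT INTO fact_invoice_lines VALUES (1, 100.0), (2, 50.0), (3, 25.0);
""")

# Fact on the left: every fact row survives the join.
left_total = con.execute("""
    SELECT SUM(f.amount)
    FROM   fact_invoice_lines f
    LEFT JOIN dim_customer d ON d.customer_sk = f.customer_sk
""").fetchone()[0]

# Inner join: the unmatched fact row is silently dropped.
inner_total = con.execute("""
    SELECT SUM(f.amount)
    FROM   fact_invoice_lines f
    INNER JOIN dim_customer d ON d.customer_sk = f.customer_sk
""").fetchone()[0]

# With the left join, the gap can be listed explicitly.
unmatched = con.execute("""
    SELECT f.customer_sk, f.amount
    FROM   fact_invoice_lines f
    LEFT JOIN dim_customer d ON d.customer_sk = f.customer_sk
    WHERE  d.customer_sk IS NULL
""").fetchall()

print(left_total, inner_total, unmatched)
```

The inner join underreports by exactly the amount on the unmatched row, and nothing in its result hints that anything is missing.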

Keep aggregate calculations in the reporting layer

The star schema should store atomic, grain-level data. Derived calculations like percentages, ratios, running totals, and year-over-year comparisons belong in the reporting or semantic layer — not in the fact table itself.

Storing pre-aggregated values in facts makes them inflexible: they can’t be re-sliced by dimensions they weren’t originally grouped by.
Let the reporting tool handle aggregation so that analysts can drill down to the detail when needed.
Store measures according to their additivity. Additive measures (e.g. quantity, amount) belong in the fact table because they can be meaningfully summed across any dimension. Semi-additive measures (e.g. balances) can be summed across some dimensions but not others (typically not across time). Non-additive measures (e.g. unit prices, ratios) should never be summed directly; store their additive components and compute the ratio at query time.
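To illustrate why a ratio is computed from additive components at query time, the sketch below compares a ratio of sums with an average of per-line ratios. It runs on Python's built-in sqlite3 module, and the two sample rows are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_invoice_lines (product TEXT, quantity REAL, amount REAL);
    INSERT INTO fact_invoice_lines VALUES
        ('A', 1, 10.0),   -- unit price 10
        ('A', 9, 45.0);   -- unit price 5
""")

row = con.execute("""
    SELECT SUM(amount) / SUM(quantity) AS avg_unit_price,      -- ratio of sums
           AVG(amount / quantity)      AS avg_of_line_prices   -- ignores volume
    FROM   fact_invoice_lines
""").fetchone()

avg_unit_price, avg_of_line_prices = row
print(avg_unit_price, avg_of_line_prices)
```

The ratio of sums weights each line by its quantity; averaging the per-line prices treats a one-unit line the same as a nine-unit line and overstates the price actually achieved.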

For more on aggregate tables as a performance optimization, see Aggregate Fact Tables (Kimball Group).

References

Dimensional Modeling Techniques — Kimball Group — complete list of techniques
A Dimensional Modeling Manifesto — the original case for dimensional modelling
Fact Tables and Dimension Tables — foundational definitions
Kimball Dimensional Modeling Techniques (PDF) — comprehensive reference document

Kimball Keys Definitions

In the Kimball dimensional modelling approach, keys are the backbone that connect fact tables to their dimensions and that let us track history correctly. This page defines the key types you will encounter and the conventions we use for each.

Quick reference

Key	Lives in	Stable?	Meaningful?	Purpose
Natural key	Source system	Not guaranteed (source may recycle or reformat it)	Yes	Identifies a business entity in the source
Durable / supernatural key	Dimension	Yes (forever)	No	Identifies an entity across all source changes
Surrogate key	Dimension	Per row version	No	Primary key of a dimension row
Foreign key	Fact	—	No	Points a fact row at a dimension row
Degenerate dimension	Fact	Yes	Yes	Operational identifier with no dimension table

Natural key

The identifier an entity carries in the source system — for example a customer_id from the CRM, an SKU from the product catalogue, or an order number from the OLTP database.

Carries business meaning and may be reused or recycled by the source.
Can be composite (several columns) and can change format over time.
Not used as the primary key of a dimension, because a single natural key can map to many historical versions of a row (see surrogate keys).

customer_id = "CUST-00417"

Durable (supernatural) key

A warehouse-assigned, never-changing identifier for a business entity. While surrogate keys change with every new version of a row, the durable key stays constant for the lifetime of the entity.

Use it to group all historical versions of the same entity.
Survives source-system migrations and natural-key reformatting.
Sometimes called a persistent or supernatural key.

customer_durable_key = 90231   -- one value for CUST-00417 across all versions

Surrogate key

A meaningless, warehouse-generated integer that serves as the primary key of a dimension table. Every row in the dimension — including every historical version of an entity — gets its own surrogate key.

Typically a monotonically increasing integer (BIGINT) or an identity column.
Carries no business meaning; never expose it to end users as a “real” id.
Decouples the warehouse from source-key changes and enables Type 2 history.

Convention: name surrogate keys <entity>_sk, e.g. customer_sk, product_sk, date_sk.

customer_sk = 1048576   -- primary key of one specific version of a customer row

Why a surrogate key instead of the natural key?

Slowly changing dimensions (SCD Type 2). When an attribute changes we add a new row with a new surrogate key, preserving the old version.
Performance. Single-column integer joins are faster and smaller than wide or composite natural keys.
Insulation. Source-system key changes don’t ripple into facts.
Late-arriving / unknown members. Reserved surrogate values can represent “Unknown” or “Not applicable” rows.

Foreign key

The column in a fact table that stores a dimension’s surrogate key, forming the join between the fact and that dimension.

One foreign key per dimension the fact relates to.
Always points at a surrogate key, never at a natural key.
Should be enforced (logically, at minimum) so every fact row resolves to a valid dimension row — including the special “Unknown” member.

fact_sales.customer_sk  -->  dim_customer.customer_sk

Degenerate dimension

A dimension key that lives in the fact table itself because it has no interesting attributes of its own and therefore no separate dimension table. Classic examples: invoice number, order number, transaction id.

Useful for grouping the line items of a single operational document.
Stored as a column on the fact, often suffixed _id or _number.

fact_invoice_lines.invoice_number = "INV-2026-008812"

Special dimension members

Reserve a handful of surrogate key values for rows that don’t map to real source records, so that fact foreign keys never have to be NULL:

Surrogate key	Member meaning
`-1`	Unknown
`-2`	Not applicable
`-3`	Missing / not yet arrived
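A minimal sketch of how a load step might fall back to the Unknown member when a lookup fails. It uses Python's built-in sqlite3 module; the staging table stg_sales and its columns are hypothetical names introduced for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (
        customer_sk   INTEGER PRIMARY KEY,
        customer_id   TEXT,
        customer_name TEXT
    );
    INSERT INTO dim_customer VALUES
        (-1,  NULL,         'Unknown'),
        (-2,  NULL,         'Not applicable'),
        (101, 'CUST-00417', 'Acme');

    -- hypothetical staging table feeding the fact load
    CREATE TABLE stg_sales (order_number TEXT, customer_id TEXT, amount REAL);
    INSERT INTO stg_sales VALUES
        ('SO-1', 'CUST-00417', 100.0),
        ('SO-2', 'CUST-99999', 40.0);   -- no matching dimension row
""")

# Resolve the natural key to a surrogate key; fall back to -1 (Unknown)
# so the fact foreign key is never NULL.
resolved = con.execute("""
    SELECT s.order_number,
           COALESCE(d.customer_sk, -1) AS customer_sk,
           s.amount
    FROM   stg_sales s
    LEFT JOIN dim_customer d ON d.customer_id = s.customer_id
    ORDER  BY s.order_number
""").fetchall()

print(resolved)
```

Because the Unknown row exists in the dimension, downstream joins still resolve and the affected amount shows up under "Unknown" rather than disappearing.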

Putting it together

erDiagram
    dim_customer ||--o{ fact_sales : "customer_sk"
    dim_customer {
        bigint customer_sk PK "surrogate key"
        bigint customer_durable_key "durable key"
        string customer_id "natural key"
        string customer_name "attribute"
        date valid_from "SCD2 validity start"
        date valid_to "SCD2 validity end"
        boolean is_current "SCD2 current-row flag"
    }
    fact_sales {
        bigint customer_sk FK "to dim_customer"
        bigint product_sk FK "to dim_product"
        bigint date_sk FK "to dim_date"
        string order_number "degenerate dimension"
        decimal quantity "additive fact"
        decimal amount "additive fact"
    }

A fact row joins to a specific version of a customer via customer_sk. To analyse an entity across all its versions, group on customer_durable_key. To trace a record back to the source, use the customer_id natural key.
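The difference between grouping by version attributes and grouping by the durable key can be sketched as follows. The example uses Python's built-in sqlite3 module; the city attribute and the sample rows are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (
        customer_sk          INTEGER PRIMARY KEY,
        customer_durable_key INTEGER,
        customer_id          TEXT,
        city                 TEXT
    );
    -- two versions of the same customer (moved from Turin to Milan)
    INSERT INTO dim_customer VALUES
        (1, 90231, 'CUST-00417', 'Turin'),
        (2, 90231, 'CUST-00417', 'Milan');

    CREATE TABLE fact_sales (customer_sk INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 30.0);
""")

# Each fact row sees the attributes of the version it was stamped with.
by_version = con.execute("""
    SELECT d.city, SUM(f.amount)
    FROM   fact_sales f
    LEFT JOIN dim_customer d ON d.customer_sk = f.customer_sk
    GROUP  BY d.city
    ORDER  BY d.city
""").fetchall()

# Grouping on the durable key collapses all versions into one entity.
by_entity = con.execute("""
    SELECT d.customer_durable_key, SUM(f.amount)
    FROM   fact_sales f
    LEFT JOIN dim_customer d ON d.customer_sk = f.customer_sk
    GROUP  BY d.customer_durable_key
""").fetchall()

print(by_version, by_entity)
```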

Prepare

Prepare is the first transformation in the pipeline. It turns raw application tables into clean staging tables that are ready for the later steps. You should never build facts and dimensions directly on top of raw tables — always stage first.

flowchart LR
    raw[Raw application tables] --> prepare[Prepare → staging]
    prepare --> join[Join per source]
    join --> union[Union all sources]
    union --> keys[Keys → surrogate keys]
    style prepare fill:#ffd54f,stroke:#f57f17,stroke-width:2px

Rules

Staging tables are 1:1 with raw tables

Each staging table maps to exactly one raw application table, with the same grain and (broadly) the same set of rows. Prepare is about cleaning, not reshaping — stay close to the original table so it remains easy to trace a staged row back to its source.

Clean, don’t aggregate

Typical SQL used to build staging tables:

CAST — fix data types (text dates → DATE, numeric strings → DECIMAL).
TRIM — strip stray whitespace from text fields.
SELECT — pick and rename the columns you actually need.
WHERE — drop obviously invalid rows (e.g. soft-deleted records, test data).

Avoid GROUP BY and aggregation at this stage. Changing the grain here makes the downstream join and union steps harder to reason about. Keep staging atomic and let aggregation happen later, in the reporting layer.
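A staging model following these rules might look like the sketch below. It runs on Python's built-in sqlite3 module so the SQL can be executed as-is; the raw table raw_customers and its columns are hypothetical. SQLite has no DECIMAL or DATE types, so the example casts to REAL where a warehouse would cast to DECIMAL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- hypothetical raw table: everything lands as text
    CREATE TABLE raw_customers (
        CUST_NO TEXT, CUST_NAME TEXT, CREDIT_LIMIT TEXT, IS_DELETED TEXT
    );
    INSERT INTO raw_customers VALUES
        ('CUST-00417', '  Acme S.p.A. ', '2500.50', '0'),
        ('CUST-00418', 'Globex',         '1000',    '0'),
        ('CUST-00419', 'Old test row',   '0',       '1');

    -- One staging view per raw table: same grain, cleaned columns only.
    CREATE VIEW stg_customers AS
    SELECT CUST_NO                    AS customer_id,    -- SELECT: pick and rename
           TRIM(CUST_NAME)            AS customer_name,  -- TRIM: strip whitespace
           CAST(CREDIT_LIMIT AS REAL) AS credit_limit    -- CAST: fix the data type
    FROM   raw_customers
    WHERE  IS_DELETED = '0';                             -- WHERE: drop soft-deleted rows
""")

staged = con.execute(
    "SELECT customer_id, customer_name, credit_limit "
    "FROM stg_customers ORDER BY customer_id"
).fetchall()

print(staged)
```

There is no GROUP BY anywhere: each surviving raw row maps to exactly one staged row, so a staged record can still be traced back to its source.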

Join

Join is the second transformation. Once the raw tables are cleaned into staging tables, there are usually many of them. We join those staging tables into a smaller number of consolidated, per-source tables.

flowchart LR
    raw[Raw application tables] --> prepare[Prepare → staging]
    prepare --> join[Join per source]
    join --> union[Union all sources]
    union --> keys[Keys → surrogate keys]
    style join fill:#ffd54f,stroke:#f57f17,stroke-width:2px

Rules

Join per source

Build one joined table per source system. If there are three sources, you should end up with three joined tables — not one giant join across everything.

Different sources have different data structures, grains, and key conventions. Joining them all at once produces a big, messy join that is hard to read and debug.
Consolidating per source first keeps each join focused and predictable, and isolates source-specific quirks before everything is brought together in the union step.

Keep the target grain in mind

Join staging tables up to the grain of the entity you are building (the fact or dimension). Use left join so you don’t silently drop rows when a lookup is missing — a missing match is usually a data-quality signal worth surfacing.

Union

Union is the third transformation. The per-source joined tables all describe the same kind of entity (e.g. customers, invoice lines) but come from different systems. Union stacks them into a single combined table.

flowchart LR
    raw[Raw application tables] --> prepare[Prepare → staging]
    prepare --> join[Join per source]
    join --> union[Union all sources]
    union --> keys[Keys → surrogate keys]
    style union fill:#ffd54f,stroke:#f57f17,stroke-width:2px

Rules

Keep a `source_name` column

Every record must carry a source_name column identifying the system it came from. When rows from multiple sources are combined, this column preserves the lineage of each record so you can:

trace any row back to the source that produced it,
filter or group metrics by source, and
debug discrepancies between sources.

Align columns before unioning

A union requires every input to share the same column set, order, and types. Make sure the prepare and join steps have already reconciled column names and data types across sources, so the union is a clean stack rather than a place to patch up mismatches.
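The union step can be sketched as below, using Python's built-in sqlite3 module. The per-source table names (int_customers_sap, int_customers_crm) and the source labels are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- hypothetical per-source joined tables, already aligned on names and types
    CREATE TABLE int_customers_sap (customer_id TEXT, customer_name TEXT);
    CREATE TABLE int_customers_crm (customer_id TEXT, customer_name TEXT);
    INSERT INTO int_customers_sap VALUES ('0000417', 'Acme');
    INSERT INTO int_customers_crm VALUES
        ('CUST-00417', 'Acme'),
        ('CUST-00500', 'Initech');
""")

# Same columns, same order, same types in every branch; source_name
# records which system each row came from.
unioned = con.execute("""
    SELECT 'sap' AS source_name, customer_id, customer_name FROM int_customers_sap
    UNION ALL
    SELECT 'crm' AS source_name, customer_id, customer_name FROM int_customers_crm
    ORDER BY source_name, customer_id
""").fetchall()

print(unioned)
```

UNION ALL is used rather than UNION so that no rows are removed by implicit de-duplication; any matching of the same customer across sources is a separate, explicit step.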

Keys

Keys is the final transformation. After sources are prepared, joined, and unioned, we add the surrogate keys that make each row uniquely addressable and that fact tables join on. See Kimball Keys Definitions for the full set of key types.

flowchart LR
    raw[Raw application tables] --> prepare[Prepare → staging]
    prepare --> join[Join per source]
    join --> union[Union all sources]
    union --> keys[Keys → surrogate keys]
    style keys fill:#ffd54f,stroke:#f57f17,stroke-width:2px

Rules

Hash the natural key

The recommended way to generate a surrogate key is to hash the natural key of the table, whether it is a fact or a dimension.

Hashing produces a stable, deterministic value: the same input always yields the same surrogate key, so re-runs are reproducible and idempotent.
It is easy to explain and audit: the key is a direct function of the business identifier rather than an opaque sequence number.
For composite natural keys, concatenate the parts (with a separator) before hashing so the combination stays unique.
For Type 2 dimensions, include the version’s effective-from date in the hash input. Hashing the natural key alone would give every historical version the same value, which cannot serve as a per-row primary key.

customer_sk = hash(customer_id)
invoice_line_sk = hash(invoice_number || '|' || line_number)

Convention: name surrogate keys <entity>_sk and the source identifier they are built from <entity>_nk (natural key).
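The hash(...) notation above is abstract. As one possible concrete form, the sketch below uses MD5 from Python's hashlib; the choice of MD5, the helper name hash_key, and the "|" separator are assumptions for illustration, not a statement of which function the warehouse uses.

```python
import hashlib

def hash_key(*parts: str, sep: str = "|") -> str:
    """Return the MD5 hex digest of the key parts joined by a separator."""
    joined = sep.join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

customer_sk = hash_key("CUST-00417")
invoice_line_sk = hash_key("INV-2026-008812", "3")

# Deterministic: hashing the same natural key again gives the same value.
is_deterministic = customer_sk == hash_key("CUST-00417")

# The separator matters: without it, ("AB", "C") and ("A", "BC") both
# hash the string "ABC" and collide.
collides_without_sep = hash_key("AB", "C", sep="") == hash_key("A", "BC", sep="")
distinct_with_sep = hash_key("AB", "C") != hash_key("A", "BC")

print(customer_sk, invoice_line_sk)
```

The separator only protects uniqueness if it cannot occur inside the key parts themselves, so pick a character the source identifiers never contain.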

Where keys live

On a dimension, the surrogate key is the primary key — one per row, including each historical version (see Slowly Changing Dimension).
On a fact, store the surrogate keys of the related dimensions as foreign keys, plus the fact’s own key built from its natural key.

Unknown member

tba

Late-arriving members

tba

Degenerate Dimension

Also known as: DD

A business identifier stored directly on a fact table that has no attributes of its own, and therefore no separate dimension table.

In practice: It usually identifies the operational transaction or document a fact row came from — an invoice number, order number, or ticket id. Keeping it on the fact lets you group the line items of a single document without a join.

Examples:

fact_invoice_lines.invoice_number = "INV-2026-008812" — one invoice spans many line-item fact rows but needs no dim_invoice.
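fact_sales.order_number: groups the line items of one order without a dim_order. By contrast, a value such as standard_cost_amount = 3.4 is not a degenerate dimension: it is a per-unit rate rather than an identifier, and it is modelled on dim_standard_cost (see Standard Cost).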

Common pitfalls:

Don’t build a dimension table for it just to “be consistent” — with no descriptive attributes, it stays on the fact.
If you later discover real attributes (status, channel), promote it to a proper dimension and replace the degenerate key with a surrogate-key foreign key.

See also: Kimball Keys Definitions

Outrigger

tba

Resolution engines (map_)

tba

Composite keys

tba

Views for business (vw_)

tba

Slowly Changing Dimension

Also known as: SCD

A dimension whose attribute values change occasionally over time, together with the technique chosen for whether to keep or overwrite the prior values.

In practice: Pick a strategy per attribute, not per table:

Type 1 — overwrite the value; no history kept.
Type 2 — add a new row with a new surrogate key and an effective-date range (valid_from / valid_to / is_current); full history preserved.
Type 3 — keep a “previous value” column alongside the current one; limited history.

Example: A customer moves city. Type 1 overwrites the city; Type 2 closes the old row (valid_to) and inserts a new current row, so historical facts still join to the address that was true at the time.
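The Type 2 "close and insert" step from the example can be sketched as follows. It uses Python's built-in sqlite3 module; the helper apply_type2_change is a hypothetical name, and because SQLite has no DATE or BOOLEAN types the dates are ISO-8601 text and is_current is 0/1. An auto-incrementing integer stands in for whatever surrogate-key strategy is actually used.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id TEXT,
        city        TEXT,
        valid_from  TEXT,
        valid_to    TEXT,
        is_current  INTEGER
    );
    INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
    VALUES ('CUST-00417', 'Turin', '2024-01-01', '9999-12-31', 1);
""")

def apply_type2_change(con, customer_id, new_city, change_date):
    """Close the current row and insert a new current row (Type 2)."""
    with con:  # one transaction: both statements succeed or neither does
        con.execute(
            """UPDATE dim_customer
               SET    valid_to = ?, is_current = 0
               WHERE  customer_id = ? AND is_current = 1""",
            (change_date, customer_id),
        )
        con.execute(
            """INSERT INTO dim_customer
                   (customer_id, city, valid_from, valid_to, is_current)
               VALUES (?, ?, ?, '9999-12-31', 1)""",
            (customer_id, new_city, change_date),
        )

apply_type2_change(con, "CUST-00417", "Milan", "2025-06-01")

versions = con.execute("""
    SELECT customer_sk, city, valid_from, valid_to, is_current
    FROM   dim_customer
    ORDER  BY customer_sk
""").fetchall()

print(versions)
```

The old row's valid_to equals the new row's valid_from, which gives half-open validity intervals with no gap and no overlap.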

See also: Kimball Keys Definitions · Standard Cost

Naming (sk, code, number, is, vw, map)

tba

Key strategy (hash hybrid, surrogate keys)

tba

Standard Cost

Type: Dimension (per-unit cost rate) · Primary home: dim_standard_cost (SCD Type 2) · Also surfaced on: dim_product (current value only, Type 1)

Summary

The standard cost is the predetermined, planned unit cost of a product. It typically includes:

materials
labour
allocated overhead

Standard costs are calculated per material, typically during a periodic cost roll (often annually or quarterly). They are used for inventory valuation, margin reporting, and variance analysis against actual cost.

Standard Cost is not the price the customer pays and not the actual cost incurred.

Natural Key

Standard Cost is defined per unit of product. The natural key tends to be:

Plant ID
Material Number
Effective Date (period in which the cost is valid)

In the warehouse this natural key maps to a cost_durable_key (stable per plant + material across every cost version) and a per-version standard_cost_sk surrogate key.

Schema

One row per: plant_id + material_number + cost version (effective period).

Column	Type	Role	Notes
`standard_cost_sk`	BIGINT	surrogate PK	one row per plant + material + cost version
`plant_id`	VARCHAR	natural key	ERP plant / costing location
`material_number`	VARCHAR	natural key	ERP material (≈ product)
`cost_durable_key`	BIGINT	durable key	stable per (plant, material) across all versions
`cost_effective_from`	DATE	natural key · validity	inclusive start of this cost version
`cost_effective_to`	DATE	validity	exclusive end; `9999-12-31` while current
`is_current`	BOOLEAN	validity	flag for the active version
`material_cost`	DECIMAL(18,4)	attribute (rate)	per-unit component; non-additive
`labour_cost`	DECIMAL(18,4)	attribute (rate)	per-unit component; non-additive
`overhead_cost`	DECIMAL(18,4)	attribute (rate)	per-unit component; non-additive
`standard_unit_cost`	DECIMAL(18,4)	attribute (rate)	= material + labour + overhead; non-additive
`currency_code`	CHAR(3)	attribute	ISO 4217; cost is per this currency
`uom_code`	VARCHAR	attribute	unit of measure the cost is expressed in

Source & lineage

ERP.COST_MASTER  ──┐
ERP.BOM_ROLLUP   ──┼──> stg_standard_cost ──> dim_standard_cost
ERP.COST_PERIODS ──┘                              │
                                                  └──> dim_product.standard_cost (current only, Type 1)

The cost roll job lands a new effective period; the staging model derives standard_unit_cost, closes the prior period’s cost_effective_to, and assigns the new standard_cost_sk.

How to use it

Current standard cost of a product

SELECT plant_id, material_number, standard_unit_cost, currency_code
FROM   dim_standard_cost
WHERE  is_current;

Margin — the dimensional way (fact carries the cost surrogate key)

The ETL stamps each fact row with the standard_cost_sk for the version in effect on the transaction date, so this is a plain equi-join that is already point-in-time correct — no date logic needed at query time.

SELECT  s.order_number,
        s.sale_date,
        s.extended_revenue,
        s.quantity * sc.standard_unit_cost            AS standard_cost_of_sale,
        s.extended_revenue
          - s.quantity * sc.standard_unit_cost        AS standard_margin
FROM    fact_sales         s
LEFT JOIN dim_standard_cost sc
  ON    sc.standard_cost_sk = s.standard_cost_sk;     -- point-in-time resolved at load

If the fact has no cost SK — point-in-time range join

When a fact only carries the natural key, match each row to the cost version that was active when it happened — never to the current cost:

SELECT  s.order_number,
        s.quantity * sc.standard_unit_cost AS standard_cost_of_sale
FROM    fact_sales        s
LEFT JOIN dim_standard_cost sc
  ON    sc.plant_id        = s.plant_id
 AND    sc.material_number = s.material_number
 AND    s.sale_date >= sc.cost_effective_from
 AND    s.sale_date <  sc.cost_effective_to;          -- half-open interval

Purchase price / cost variance (standard vs actual)

SELECT  p.plant_id,
        p.material_number,
        SUM(p.actual_unit_cost   * p.quantity)        AS actual_cost,
        SUM(sc.standard_unit_cost * p.quantity)       AS standard_cost,
        SUM((p.actual_unit_cost - sc.standard_unit_cost) * p.quantity) AS variance
FROM    fact_purchase_receipts p
LEFT JOIN dim_standard_cost sc
  ON    sc.standard_cost_sk = p.standard_cost_sk
GROUP BY p.plant_id, p.material_number;

Common Pitfalls

Standard cost is a dimension, not a fact — because its amounts are non-additive. standard_unit_cost is a per-unit rate: summing it across products, plants, or periods is meaningless (SUM(standard_unit_cost) answers no real question). You look it up and multiply by a fact quantity (quantity × standard_unit_cost) to get an additive measure — the cost of sale — which belongs in the fact/query, not here. Modelling these rates as a fact table is what tempts that erroneous SUM.
Always join point-in-time, never to the current cost. Resolve the version at ETL into standard_cost_sk, or range-join on the natural key + date (above). Using dim_product.standard_cost (the current value) to value historical sales silently restates past margins every time a cost roll runs.
Use a half-open interval [from, to) (>= from AND < to). Closed intervals (BETWEEN) double-count on the boundary day when one version ends and the next begins.
standard_cost on dim_product is Type 1 (overwrite). It exists only for convenience / current-state lookups. It carries no history — don’t report trends from it.
Currency and UoM are part of the cost. Don’t sum or compare standard_unit_cost across rows with different currency_code or uom_code. Convert first.
Standard ≠ actual ≠ average ≠ list price. Keep cost types in separate, clearly named attributes. Mixing them is the single most common reporting error.
Cost-roll timing. A roll dated the 1st but loaded on the 5th leaves a 4-day gap if effective dating isn’t backfilled. Validate, per plant and material, that each version’s cost_effective_to equals the next version’s cost_effective_from, with no gap or overlap.
Missing cost for a product. New materials may sell before a standard cost is rolled. Provide a -1 “Unknown cost” member (see Kimball Keys Definitions) and a data-quality check, rather than producing NULL margins.
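The gap and overlap validation mentioned under cost-roll timing could be sketched as a window-function query. The example runs on Python's built-in sqlite3 module (window functions need SQLite 3.25 or later, which is an assumption about the local build); dates are ISO-8601 text because SQLite has no DATE type, and the sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_standard_cost (
        plant_id            TEXT,
        material_number     TEXT,
        cost_effective_from TEXT,
        cost_effective_to   TEXT
    );
    INSERT INTO dim_standard_cost VALUES
        ('P1', 'M1', '2024-01-01', '2025-01-01'),
        ('P1', 'M1', '2025-01-01', '9999-12-31'),   -- contiguous: fine
        ('P1', 'M2', '2024-01-01', '2025-01-01'),
        ('P1', 'M2', '2025-01-05', '9999-12-31');   -- 4-day gap
""")

# For each version, look at the start of the next version of the same
# plant + material. With half-open intervals they must be equal.
issues = con.execute("""
    WITH ordered AS (
        SELECT plant_id, material_number, cost_effective_to,
               LEAD(cost_effective_from) OVER (
                   PARTITION BY plant_id, material_number
                   ORDER BY cost_effective_from
               ) AS next_from
        FROM   dim_standard_cost
    )
    SELECT plant_id, material_number, cost_effective_to, next_from,
           CASE WHEN next_from > cost_effective_to
                THEN 'gap' ELSE 'overlap' END AS issue
    FROM   ordered
    WHERE  next_from IS NOT NULL
      AND  next_from <> cost_effective_to
""").fetchall()

print(issues)
```

An empty result means every material has a continuous, non-overlapping chain of cost versions.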

Related

Kimball Keys Definitions: natural, durable, and surrogate keys used above.
fact_sales, fact_purchase_receipts: consumers; carry standard_cost_sk.
dim_product: carries the current-value convenience copy (Type 1).

Change history / SCD

dim_standard_cost is an SCD Type 2 dimension with effective dating. Each cost roll closes the current row (cost_effective_to, is_current = false) and inserts a new current row with a fresh standard_cost_sk. Facts reference the version in effect at their transaction date via that standard_cost_sk. The dim_product.standard_cost convenience copy is Type 1 (overwrite, no history).

Sales Territory

Type: Dimension · Primary home: dim_sales_territory (SCD Type 2) · Also surfaced on: fact_invoice_lines (FK sales_territory_sk)

Summary

dim_sales_territory maps each salesperson (also called a sales rep or sales district) to the nested chain of sales territories they roll up through, flattened into one row per salesperson. It lets you aggregate sales, quota, and commission at any level of the territory tree — from a single rep up to global sales — with a plain GROUP BY.

It is not a geography dimension: the levels are an internal sales hierarchy, not postal/administrative geography (a rep’s territory need not match where customers live). The salesperson is the leaf of this dimension, not a full employee/HR dimension.

Natural Key

One row represents one salesperson for an assignment period (SCD Type 2); there is exactly one current row per salesperson. The natural key is the source salesperson identifier, made unique per version by the effective date.

Natural Key Fields:

Salesperson ID (salesperson_id)
Effective date (valid_from), which distinguishes versions of the same salesperson

The territory tree is balanced and fixed at six levels: five territory levels plus the salesperson leaf. It is flattened onto each salesperson row, which is the standard Kimball treatment for a fixed-depth hierarchy (no bridge table needed). Level 1 is the root (Global Sales); level 5 is the most granular territory; the salesperson is the leaf below level 5:

L1  Global Sales
└── L2  EMEA
    └── L3  Italy
        └── L4  Northern Italy
            └── L5  District 512 – Milan
                └── Salesperson  Maria Rossi (REP-00417)

Schema

Column	Type	Role	Notes
`sales_territory_sk`	BIGINT	surrogate PK	one row per salesperson version
`salesperson_id`	VARCHAR	natural key	source sales rep / district code
`salesperson_name`	VARCHAR	attribute	leaf of the hierarchy
`territory_l5_code`	VARCHAR	attribute	level 5 — most granular territory (e.g. district)
`territory_l5_name`	VARCHAR	attribute
`territory_l4_code`	VARCHAR	attribute	level 4
`territory_l4_name`	VARCHAR	attribute
`territory_l3_code`	VARCHAR	attribute	level 3
`territory_l3_name`	VARCHAR	attribute
`territory_l2_code`	VARCHAR	attribute	level 2
`territory_l2_name`	VARCHAR	attribute
`territory_l1_code`	VARCHAR	attribute	level 1 — root; constant
`territory_l1_name`	VARCHAR	attribute	always `Global Sales`
`valid_from`	DATE	validity	inclusive start of this assignment
`valid_to`	DATE	validity	exclusive end; `9999-12-31` while current
`is_current`	BOOLEAN	validity	active-version flag

A worked row for the example above:

Column	Value
`salesperson_name`	Maria Rossi
`territory_l5_name`	District 512 – Milan
`territory_l4_name`	Northern Italy
`territory_l3_name`	Italy
`territory_l2_name`	EMEA
`territory_l1_name`	Global Sales

Source & lineage

CRM.SALES_REP        ──┐
CRM.TERRITORY_TREE   ──┼──> stg_sales_territory ──> dim_sales_territory
HR.REP_ASSIGNMENTS   ──┘                                │
                                                        └──> fact_invoice_lines.sales_territory_sk (FK, resolved at load)

TERRITORY_TREE is a parent→child recursive table. The staging model walks it, flattens the five territory levels onto each salesperson, and — on a reassignment (reorg, rep moves district) — closes the prior row’s valid_to and inserts a new current row with a fresh sales_territory_sk.
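
The walk-and-flatten step can be sketched with a recursive CTE. This is a minimal illustration in PostgreSQL-style SQL, not the actual staging model: the column names on territory_tree and sales_rep (territory_code, parent_code, district_code) and the codes such as 'GLOBAL' and 'D512' are assumptions, and it relies on the tree being balanced so that every rep attaches at level 5.

```sql
-- Assumed shapes: territory_tree(territory_code, territory_name, parent_code)
-- and sales_rep(salesperson_id, salesperson_name, district_code).
WITH RECURSIVE territory_tree (territory_code, territory_name, parent_code) AS (
    VALUES ('GLOBAL', 'Global Sales',         NULL),
           ('EMEA',   'EMEA',                 'GLOBAL'),
           ('IT',     'Italy',                'EMEA'),
           ('IT-N',   'Northern Italy',       'IT'),
           ('D512',   'District 512 – Milan', 'IT-N')
),
sales_rep (salesperson_id, salesperson_name, district_code) AS (
    VALUES ('REP-00417', 'Maria Rossi', 'D512')
),
walk (start_code, territory_name, parent_code, steps_up) AS (
    -- anchor: each territory is its own ancestor, zero steps up
    SELECT territory_code, territory_name, parent_code, 0
    FROM   territory_tree
    UNION ALL
    -- recursive step: climb one parent at a time until the root (parent_code IS NULL)
    SELECT w.start_code, t.territory_name, t.parent_code, w.steps_up + 1
    FROM   walk w
    JOIN   territory_tree t ON t.territory_code = w.parent_code
)
-- pivot the ancestors onto one row per salesperson: 0 steps up = level 5, 4 steps up = level 1
SELECT  r.salesperson_id,
        r.salesperson_name,
        MAX(CASE WHEN w.steps_up = 0 THEN w.territory_name END) AS territory_l5_name,
        MAX(CASE WHEN w.steps_up = 1 THEN w.territory_name END) AS territory_l4_name,
        MAX(CASE WHEN w.steps_up = 2 THEN w.territory_name END) AS territory_l3_name,
        MAX(CASE WHEN w.steps_up = 3 THEN w.territory_name END) AS territory_l2_name,
        MAX(CASE WHEN w.steps_up = 4 THEN w.territory_name END) AS territory_l1_name
FROM    sales_rep r
JOIN    walk w ON w.start_code = r.district_code
GROUP BY r.salesperson_id, r.salesperson_name;
```

For the sample data this yields the single Maria Rossi row shown in the worked example above. A ragged or deeper tree would need a different treatment, since the pivot hard-codes five levels.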

How to use it

Sales rolled up to any territory level

Because the hierarchy is flattened, grouping at any level is a plain GROUP BY — no bridge, no recursion:

SELECT  st.territory_l2_name        AS area,
        SUM(s.net_amount)           AS net_sales
FROM    fact_invoice_lines            s
JOIN    dim_sales_territory   st ON st.sales_territory_sk = s.sales_territory_sk
GROUP BY st.territory_l2_name
ORDER BY net_sales DESC;

The full territory chain for one salesperson

SELECT  salesperson_name,
        territory_l5_name, territory_l4_name, territory_l3_name,
        territory_l2_name, territory_l1_name
FROM    dim_sales_territory
WHERE   is_current
  AND   salesperson_id = 'REP-00417';

Point-in-time: attribute each sale to the territory in effect then

The ETL stamps each fact_invoice_lines row with the sales_territory_sk that was active on the invoice date, so this equi-join is automatically point-in-time correct; a sale stays with the rep’s territory at the time, even after a later reorg:

SELECT  st.territory_l3_name        AS country,
        d.year_month,
        SUM(s.net_amount)           AS net_sales
FROM    fact_invoice_lines    s
JOIN    dim_date              d  ON d.date_sk             = s.invoice_date_sk
JOIN    dim_sales_territory   st ON st.sales_territory_sk = s.sales_territory_sk
GROUP BY st.territory_l3_name, d.year_month;

Common Pitfalls

Mind the level direction. Level 1 = Global (root), level 5 = most granular.
Salesperson ≠ person. The leaf is a sales role/district. One human may cover several districts, and a district may pass between people over time. Keep this dimension at the territory-assignment grain; model the individual separately if HR attributes are needed.
Unassigned reps. New reps may book sales before territory setup. Point the fact FK at a -1 “Unknown territory” member (see Kimball Keys Definitions) rather than leaving the FK NULL.

Kimball Keys Definitions — surrogate, durable, and natural keys plus the special “Unknown” member used above.
Slowly Changing Dimension — the SCD Type 2 pattern this dimension uses for reassignments.
fact_invoice_lines — primary consumer; carries sales_territory_sk.

Change history / SCD

dim_sales_territory is an SCD Type 2 dimension. Each territory reassignment closes the current row (sets valid_to to the reassignment date and is_current = false) and inserts a new current row with a fresh sales_territory_sk; the salesperson_id natural key stays constant so all of a rep’s history can be grouped together. Facts reference the version in effect at their transaction date via sales_territory_sk.
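
The close-and-insert step can be sketched as follows, in PostgreSQL-style SQL against a cut-down demo table. The table name dim_sales_territory_demo, the surrogate key values, the dates, and the "District 518 – Turin" territory are all made up for illustration; a real load would take the new key from a sequence or identity column and run both statements in one transaction.

```sql
-- Cut-down demo copy of the dimension (only the columns needed here).
CREATE TEMPORARY TABLE dim_sales_territory_demo (
    sales_territory_sk BIGINT PRIMARY KEY,
    salesperson_id     VARCHAR(20),
    territory_l5_name  VARCHAR(100),
    valid_from         DATE,
    valid_to           DATE,
    is_current         BOOLEAN
);

INSERT INTO dim_sales_territory_demo VALUES
    (101, 'REP-00417', 'District 512 – Milan', DATE '2023-01-01', DATE '9999-12-31', TRUE);

-- Step 1: close the current row. valid_to is exclusive, so it is set to the
-- reassignment date, which is also the new row's valid_from.
UPDATE dim_sales_territory_demo
SET    valid_to   = DATE '2024-07-01',
       is_current = FALSE
WHERE  salesperson_id = 'REP-00417'
  AND  is_current;

-- Step 2: insert the new current row with a fresh surrogate key.
INSERT INTO dim_sales_territory_demo VALUES
    (102, 'REP-00417', 'District 518 – Turin', DATE '2024-07-01', DATE '9999-12-31', TRUE);

-- Both versions share the natural key; exactly one is current.
SELECT  sales_territory_sk, territory_l5_name, valid_from, valid_to, is_current
FROM    dim_sales_territory_demo
WHERE   salesperson_id = 'REP-00417'
ORDER BY valid_from;
```

Facts already stamped with key 101 are untouched, which is what keeps historical reports stable after the reorg.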

Invoice Lines

Type: Fact — transaction · Grain: one row per invoice line · Primary home: fact_invoice_lines

Summary

fact_invoice_lines records the billed detail of customer invoices — one row for each product line on each invoice. It is the backbone for revenue, discount, tax, and margin reporting, sliced by customer, product, date, and sales territory.

It is not an invoice-header fact. Whole-invoice charges (freight, invoice-level discounts) exist once per invoice, not once per line: keep them in a separate header fact or allocate them down to the lines. Repeating the full charge on every line double-counts it.
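
One possible way to allocate a header charge down to the lines is in proportion to each line's net_amount. The sketch below assumes PostgreSQL-style SQL; the freight_amount column and the sample invoices are hypothetical, and it ignores the rounding remainder a production allocation would need to place on one line.

```sql
-- Hypothetical header and line samples.
WITH invoice_header (invoice_number, freight_amount) AS (
    VALUES ('INV-1', 30.00)
),
invoice_lines (invoice_number, invoice_line_number, net_amount) AS (
    VALUES ('INV-1', 1, 100.00),
           ('INV-1', 2, 200.00)
)
-- Each line receives freight * (line net / invoice net).
SELECT  l.invoice_number,
        l.invoice_line_number,
        l.net_amount,
        h.freight_amount * l.net_amount
            / NULLIF(SUM(l.net_amount) OVER (PARTITION BY l.invoice_number), 0)
                                        AS allocated_freight
FROM    invoice_lines  l
JOIN    invoice_header h ON h.invoice_number = l.invoice_number
ORDER BY l.invoice_line_number;
-- line 1 gets 10, line 2 gets 20; the allocations sum back to the 30 on the header
```

Because the allocated values sum back to the header amount, they behave like any other additive line measure.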

Grain

One row per invoice line — a single product line item on a single invoice.

grain = (invoice_number, invoice_line_number)

invoice_number and invoice_line_number are degenerate dimensions (identifiers with no dimension table of their own); together they uniquely identify a row.
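
A grain check follows directly from that statement: any (invoice_number, invoice_line_number) pair that appears more than once violates the grain. A minimal sketch in PostgreSQL-style SQL, using a hypothetical inline sample instead of the real table:

```sql
-- Hypothetical sample with one deliberate duplicate.
WITH fact_invoice_lines_sample (invoice_number, invoice_line_number) AS (
    VALUES ('INV-1', 1),
           ('INV-1', 2),
           ('INV-2', 1),
           ('INV-2', 1)    -- duplicate: same invoice, same line number
)
-- Any row returned here is a grain violation.
SELECT  invoice_number,
        invoice_line_number,
        COUNT(*) AS row_count
FROM    fact_invoice_lines_sample
GROUP BY invoice_number, invoice_line_number
HAVING  COUNT(*) > 1;
-- returns one row: ('INV-2', 1, 2)
```

Run against the real fact, the expected result is zero rows.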

Schema

Column	Type	Role	Notes
`invoice_number`	VARCHAR	degenerate dimension	the operational invoice id
`invoice_line_number`	INT	degenerate dimension	line position within the invoice
`invoice_date_sk`	BIGINT	FK → `dim_date`	date the invoice was issued
`customer_sk`	BIGINT	FK → `dim_customer`	bill-to customer (version at invoice date)
`product_sk`	BIGINT	FK → `dim_product`	product sold
`sales_territory_sk`	BIGINT	FK → `dim_sales_territory`	rep / territory in effect at invoice date
`standard_cost_sk`	BIGINT	FK → `dim_standard_cost`	standard-cost version in effect at invoice date
`currency_code`	CHAR(3)	attribute	document currency (ISO 4217)
`quantity`	DECIMAL(18,4)	measure	units invoiced; additive
`gross_amount`	DECIMAL(18,4)	measure	list value before discount; additive
`discount_amount`	DECIMAL(18,4)	measure	additive
`net_amount`	DECIMAL(18,4)	measure	= gross − discount; additive
`tax_amount`	DECIMAL(18,4)	measure	additive
`standard_cost_of_sale`	DECIMAL(18,4)	measure	= `quantity × standard_unit_cost`; additive
`unit_price`	DECIMAL(18,4)	measure	net per unit; non-additive (a rate)
`load_ts`	TIMESTAMP	audit	warehouse load timestamp

Measures & additivity

Measure	Additivity	Notes
`quantity`	additive	sums across all dimensions
`gross_amount`, `discount_amount`, `net_amount`, `tax_amount`	additive	the money measures; sum freely within one `currency_code` (convert to a reporting currency before summing across currencies)
`standard_cost_of_sale`	additive	pairs with `net_amount` to give margin
`unit_price`	non-additive	a per-unit rate — never `SUM`; for an average use `SUM(net_amount) / SUM(quantity)`

Margin is derived, not stored: net_amount − standard_cost_of_sale. Because both inputs are additive, margin can be summed at any level.

Source & lineage

ERP.INVOICE_HEADER ──┐
ERP.INVOICE_LINE   ──┼──> stg_invoice_lines ──> fact_invoice_lines
ERP.FX_RATES       ──┘

The staging model joins header to line, then resolves each foreign key. The SCD Type 2 keys (customer_sk, sales_territory_sk, standard_cost_sk) are looked up as of the invoice date, so every line carries the dimension version that was in effect when it was billed — point-in-time correct (see the pitfalls).
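
The as-of lookup can be sketched as a range join on the validity columns, with a fallback to the -1 Unknown member when nothing matches. This is an illustration in PostgreSQL-style SQL, not the actual staging code: the staging columns salesperson_id and invoice_date, the key values, and the dates are assumptions.

```sql
-- Two versions of the same rep, split by a reassignment on 2024-07-01.
WITH dim_sales_territory_sample (sales_territory_sk, salesperson_id, valid_from, valid_to) AS (
    VALUES (101, 'REP-00417', DATE '2023-01-01', DATE '2024-07-01'),
           (102, 'REP-00417', DATE '2024-07-01', DATE '9999-12-31')
),
stg_invoice_lines_sample (invoice_number, invoice_line_number, salesperson_id, invoice_date) AS (
    VALUES ('INV-1', 1, 'REP-00417', DATE '2024-03-15'),   -- before the reassignment
           ('INV-2', 1, 'REP-00417', DATE '2024-09-01'),   -- after the reassignment
           ('INV-3', 1, 'REP-00999', DATE '2024-09-01')    -- rep not in the dimension yet
)
SELECT  s.invoice_number,
        s.invoice_line_number,
        -- no matching version: fall back to the Unknown member instead of NULL
        COALESCE(st.sales_territory_sk, -1) AS sales_territory_sk
FROM    stg_invoice_lines_sample s
LEFT JOIN dim_sales_territory_sample st
       ON st.salesperson_id = s.salesperson_id
      AND s.invoice_date   >= st.valid_from     -- inclusive start
      AND s.invoice_date   <  st.valid_to       -- exclusive end
ORDER BY s.invoice_number;
-- INV-1 → 101, INV-2 → 102, INV-3 → -1
```

The half-open interval (inclusive valid_from, exclusive valid_to) means an invoice dated exactly on a reassignment date matches the new version only.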

How to use it

Net sales by month and sales area

SELECT  d.year_month,
        st.territory_l2_name          AS area,
        SUM(f.net_amount)             AS net_sales
FROM    fact_invoice_lines   f
JOIN    dim_date             d  ON d.date_sk            = f.invoice_date_sk
JOIN    dim_sales_territory  st ON st.sales_territory_sk = f.sales_territory_sk
GROUP BY d.year_month, st.territory_l2_name;

Standard margin by product category

SELECT  p.product_category,
        SUM(f.net_amount)                            AS net_sales,
        SUM(f.standard_cost_of_sale)                 AS standard_cost,
        SUM(f.net_amount - f.standard_cost_of_sale)  AS standard_margin
FROM    fact_invoice_lines f
JOIN    dim_product        p ON p.product_sk = f.product_sk
GROUP BY p.product_category;

Average selling price (the non-additive measure, done right)

-- weighted average, NOT AVG(unit_price)
SELECT  product_sk,
        SUM(net_amount) / NULLIF(SUM(quantity), 0) AS avg_unit_price
FROM    fact_invoice_lines
GROUP BY product_sk;
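
To see why the weighting matters, the sketch below compares the two calculations on a tiny made-up sample (PostgreSQL-style SQL; all values are illustrative): one unit sold at 10 and 99 units sold at 5.

```sql
-- Hypothetical lines for a single product.
WITH lines (product_sk, quantity, net_amount, unit_price) AS (
    VALUES (1,  1.0,  10.0, 10.0),
           (1, 99.0, 495.0,  5.0)
)
SELECT  product_sk,
        AVG(unit_price)                            AS naive_avg_price,     -- treats both lines equally
        SUM(net_amount) / NULLIF(SUM(quantity), 0) AS weighted_avg_price   -- weights by units sold
FROM    lines
GROUP BY product_sk;
-- naive_avg_price = 7.5, weighted_avg_price = 5.05
```

The naive average overstates the price because the single high-priced unit counts as much as the 99 cheaper ones.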

Common Pitfalls

Never SUM(unit_price) (or AVG it). It is a per-unit rate — non-additive. Compute an average as SUM(net_amount) / SUM(quantity).
Header vs line grain. Whole-invoice charges (freight, invoice-level discounts) belong to the header, not each line. Repeating them per line double-counts — allocate them to lines or keep a separate header fact.
Dimension fan-out. Joining to a dimension at a coarser grain or through a multi-valued bridge multiplies rows and inflates the measures. Join on the line’s own surrogate keys, and pre-aggregate the fact before any 1-to-many join.
Point-in-time keys. customer_sk, sales_territory_sk, and standard_cost_sk are the versions in effect at the invoice date. Don’t re-derive them from the current dimension row, or history is restated on every reorg / cost roll.
Returns & credit notes. Credits arrive as negative quantity / amounts. Keep the sign convention consistent so SUM nets correctly; don’t silently filter them out of margin.
Currency. Amounts are in currency_code (document currency). Convert to a single reporting currency before summing across currencies.
Unknown members. A line that can’t resolve a dimension points at the -1 “Unknown” member (see Kimball Keys Definitions), never a NULL FK.
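
For the currency pitfall, a conversion step before aggregation might look like the sketch below. It assumes PostgreSQL-style SQL and a hypothetical fx_rates(currency_code, rate_to_eur) lookup; the rates are invented round numbers, and a real model would also match the rate to the invoice date.

```sql
-- Hypothetical lines in two document currencies.
WITH lines (invoice_number, currency_code, net_amount) AS (
    VALUES ('INV-1', 'EUR', 100.00),
           ('INV-2', 'USD', 220.00)
),
-- Made-up rates, for illustration only.
fx_rates (currency_code, rate_to_eur) AS (
    VALUES ('EUR', 1.00),
           ('USD', 0.50)
)
-- Convert each line first, then sum in the reporting currency.
SELECT  SUM(l.net_amount * r.rate_to_eur) AS net_sales_eur
FROM    lines    l
JOIN    fx_rates r ON r.currency_code = l.currency_code;
-- 100 + 110 = 210, whereas a raw SUM(net_amount) would report a meaningless 320
```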

Standard Cost — supplies standard_cost_sk / standard_unit_cost behind standard_cost_of_sale.
Sales Territory — supplies sales_territory_sk for territory rollups.
Grain and Kimball Keys Definitions — the concepts this fact builds on.
dim_date, dim_customer, dim_product — the remaining conformed dimensions.

Change history / load pattern

fact_invoice_lines is a transaction-grain fact, loaded insert-only: each invoice line is written once and never updated. Corrections and returns flow in as new (often negative) lines rather than edits, preserving an auditable history. Late-arriving invoices are appended with their historical invoice_date_sk, and their SCD Type 2 keys resolve to the version that was current then (or the -1 Unknown member until the dimension member appears).

SCD, grain, degenerate dim, conformed dim, etc.

tba
