Home > Software engineering >  In a datawarehouse can a dimension be related to another dimension?
In a datawarehouse can a dimension be related to another dimension?

Time:02-02

I am currently developing a Data Warehouse and I was wondering if it makes sense to have a dimension connected to another via a foreign key.

For example, let's say we have two dimensions 'Country' and 'City' should we store only the City Dimension Key in the fact table. And the City is aware of it's Country.

Or does it make more sense two store both foreign keys in the fact table.

But the City dimension will have to know which Country it belongs to (it looks like it defies the star schema, since we now also have links between dimensions)

Or is this purely a design choice and will have no impact in querying, etc?

CodePudding user response:

Not a straight answerm but consider these two scenarios;

A. You have one fact table at the grain of city

  • You may choose to have a star schema, which is
    • A single dimension containing city
    • This dimension contains a country column (which is repeated)

factTransactionA >- dimCity

  • Or you may choose to have a snowflake schema, which is
    • A city dimension table
    • a seperate country dimension table
    • these dimensions can be joined.

factTransactionA >- dimCity >- dimCountry

Both are valid but consider....

B. You have one fact table at the grain of city and another one at the grain of country

When you're not sure of a design decision.... look for other constraints or requirements that help you make a decision

For case B you have to have a Country dimension. You shouldn't for example "overload" the city dimension and try to make it fit a fact table at the grain of Country. So you know you must have this:

factTransactionB >- Country dimension table

So if I extend this explanation on the fly.... typically, you use "conformed" dimensions between fact tables, so when we consider both fact tables, I would actually suggest this type of schema:

factTransaction2 >- dimCountry -< factTransaction1 >- dimCity

Rather than this

factTransaction2 >- dimCountry -< dimCity -< factTransaction1

This actually means baking the dimCountry surrogate key into factTransaction1 which is actually at City level.

Because

  • My gut feel tells me we should avoid two conformed dimensions between facts
  • If you have one fact at Country dimension, then Country is probably important enough in your business to bake into other facts to make it easy to compare metrics across facts.

So I feel in this long winded explanation I have posed one reason to avoid snowflake schemas, but they are definitely valid

  •  Tags:  
  • Related