Data Modeling Whitepaper
AN INTRODUCTION TO DATA MODELS
WHAT IS A DATA MODEL?
A data model organizes data elements and standardizes their relation to each other and the
properties of real-world entities. A database model is a type of data model that determines a
database’s logical structure and how data can be stored, organized and manipulated in that
database. A normalized relational database and unnormalized star schema are two of the most
common database models, each of which has its advantages.
RELATIONAL DATABASE DATA AND DATA NORMALIZATION
E.F. Codd introduced the relational database model in 1970, which uses tables to make
databases independent of other software applications. Database normalization is the process of
structuring a relational database to reduce data redundancy and improve data integrity. It
generally consists of organizing the columns, or attributes in tables to enforce various data
integrity constraints, either by creating a new database design or improving the existing design.
Normalization is a progressive process, meaning that the requirements for one level of
normalization must be met before the next level can be achieved.
FIRST NORMAL FORM
Unnormalized data may have a primary key, meaning the table doesn’t have duplicate records.
First normal form (1NF) requires a primary key and also requires all cells in the table to have
single values. The most common solution to this problem is to move attributes that could have
multiple values to a separate table, along with a primary key to connect the new table to the old
table. Assume for this example that a table’s attributes include the title and subject of books
such that one book may have more than one subject. One way to place this table into 1NF
would be to move the subject attribute to another table, which would also include the book’s
title.
SECOND NORMAL FORM
Once a table is in 1NF, placing it in second normal form (2NF) requires removing its partial
dependencies, meaning that all attributes must depend on the entire primary key. Consider
a table that contains the attributes of title, format and price such that the same book can have a
different price depending on its format, which could be a hardback, paperback or ebook.
Furthermore, the only candidate for the primary key is the combination of the title and format
attributes. Placing this table in 2NF requires moving the format and price attributes to a second
table that also contains the title attribute. Note that the title attribute now becomes the only
candidate for the primary key in the first table.
THIRD NORMAL FORM
Third normal form (3NF) requires the removal of all transitive dependencies, meaning that
attributes may only depend on the primary key. Assume for this example that the book table
contains the genre id and genre name for the books. While both of these attributes depend on
the primary key of title, the genre name also depends on the genre id. Placing this table in 3NF
requires the genre name to be moved to another table that also contains the genre id.
POST-RELATIONAL MODELS: STAR SCHEMA
Some data models are more generalized than the relational model and are sometimes known
as post-relational models. A star schema is one such model and is the most common data
model used in data warehouses. Star schemas are unnormalized and consist of at least one
fact table that references any number of dimension tables.
FACT TABLES
Fact tables record measurements for specific events, so they usually consist of only numeric
values and a foreign key to a dimensional table containing descriptive information for those
values. Fact tables should record measurements at a low level of detail, or granularity, which
typically results in the accumulation of many records over time. Fact tables may be classified
into three types, including transaction fact tables, snapshot fact tables and accumulating
snapshot tables.
DIMENSION TABLES
Dimension tables usually have a smaller number of records compared to fact tables, although
the number of attributes in their records can be very large. These tables can define many types
of dimensions, but the most common ones include time dimensions that describe the time at
which events are recorded in the fact table. Range dimensions describe a range of measurable
quantities to simplify reporting. Other common dimension tables include tables for employees,
geography, and products.
DATABASE DESIGN CONSIDERATIONS
The primary advantages of a normalized relational database as compared to a star schema are
that a normalized database strictly reinforces data integrity, which prevents anomalies like one-
off inserts and updates. Normalized databases also have greater flexibility in performing
analytical tasks, provided they follow the database model’s business logic. Furthermore,
normalized database models support many-to-many relationships between business entities
more easily than star schemas, which must simplify these relationships to conform to the
dimensional data model.
The benefits of star schemas include simpler reporting logic compared to a normalized
database, especially as-of and period-over-period reporting. Star schemas also have simpler
join-logic, which improves performance on read-only operations like reporting and queries. This
advantage also improves the performance of aggregation operations in star schemas. In
addition, all online analytical processing (OLAP) systems use star schemas to build proprietary
cubes efficiently. An OLAP cube is a three-dimensional dataset, as opposed to a traditional two-
dimensional table. For example, an OLAP user can represent financial data by product, time
period, and location with a cube.
ER/Studio Data Architect
Document, design, and communicate data assets faster and easier.
Round-trip database support gives ER/Studio Data Architect users the power to easily reverse-
and forward-engineer, compare and merge, and visually document data assets across multiple
relational, NoSQL, and ETL platforms and data sources. Built-in facilities automate routine data
modeling tasks so users can analyze and optimize database and data warehouse designs faster
than ever.