Data Modeling for Data Governance
Better Data Models, Better Data Governance
Are your data models truly supporting data governance?
Most of us who work as data professionals have been involved in some way in data governance programs at
some point in our careers. These programs range from light governance to fully staffed, well-funded
C-level offices that ensure data quality and data protection for the enterprise.
No matter what level of data governance you have been involved with, data models provide a vital
resource for successful governance programs. In this work, we look at the following data modeling
features that need to be part of an enterprise’s data modeling portfolio to ensure compliance with
better data governance processes.
Data Governance Tasks Are Completed Faster with Better Data Models
In my experience, when teams are working toward the same goal, both individual and collaborative
tasks are completed faster. Logical data models that have had strong engagement from data stewards
represent hundreds of hours of decisions made by business users and data professionals. The results of
these collaborative efforts can save a significant amount of data governance time.
Collaborative Tasks Are Easier
When there is less contention and more trust among team members, tasks are easier to complete
because there are fewer distractions. Happier teams have better outcomes.
Data Model Quality is Data Quality
The quality of data models has a direct impact on data quality. Data models, as requirements and
specifications for data, are the standards against which we measure data quality. Barebones data
models, often just diagrams of databases, do not aid in better data governance.
Better Data Governance
Now that we have covered the benefits of better collaboration, let’s look at how we can leverage good
data modeling practices with good data governance. You may already practice a number of these, but
ensuring you are doing all of them increases the value of your data models significantly and, therefore,
your organization’s value in you.
Let’s start with a definition of data governance from the DAMA Data Management Body of
Knowledge:
“Data Governance (DG) is defined as the exercise of authority and control (planning, monitoring, and
enforcement) over the management of data assets… While the driver of data management overall is to
ensure an organization gets value out of its data, Data Governance focuses on how decisions are made
about data and how people and processes are expected to behave in relation to data.”¹
Build Tailored Model Layouts for Data Stewards and Stakeholders
Data stewards are individuals who manage data governance in the field. They may be responsible for all
the activities of data governance, or just one aspect, such as data quality, technical implementations of
requirements, or data security.
Too many project teams, due to time constraints, prepare only one display of their data models. Often,
this layout and markup are tailored for technical consumers of the models. But if data professionals
focus models on technical audiences, these models form obstacles to data steward engagement.
Stewards: Hide non-business requirements
Technical objects and specifications can be obstacles to reviewing data models for business
requirements. Some of the items that should be hidden for business reviews:
- Indexes
- PK and AK notations
- Surrogate keys
- Database owners and schemas
- Table names and properties
- Constraint names
- Database triggers
- Relationship triggers
- Technical columns (such as GUIDs, system dates)
With certain data stewards, adding more information may be warranted. For
instance, if you use generic data types in your logical models, include them. If you need to review
default values, add them. Incorporate just enough information to focus the review and approval, which
leads to better-quality data models and faster results.
If time is a constraint to managing several views of a model, leverage data modeling tool features for
templates, subject area inheritance, and macros to make preparing and managing multiple subject areas
more efficient. The payoff for data steward success through the removal of distractions is significant.
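As a rough illustration of a steward-facing layout, the sketch below filters one entity's display items by kind. The item kinds, the `ModelItem` and `EntityView` classes, and the `steward_view` function are all hypothetical names chosen for this example; they are not features of any particular data modeling tool.

```python
from dataclasses import dataclass, field

# Hypothetical item kinds. The hidden set mirrors the list above, plus two
# kinds (data types, default values) that a reviewer may opt back in to.
HIDDEN_FOR_STEWARDS = frozenset({
    "index", "key_notation", "surrogate_key", "owner_schema", "table_name",
    "constraint_name", "trigger", "technical_column",
    "data_type", "default_value",
})


@dataclass
class ModelItem:
    name: str
    kind: str  # e.g. "attribute", "definition", "index", "data_type"


@dataclass
class EntityView:
    entity: str
    items: list = field(default_factory=list)


def steward_view(view, show=frozenset()):
    """Return a copy of `view` without the items hidden for business reviews.

    `show` names hidden kinds to keep for a particular review.
    """
    hidden = HIDDEN_FOR_STEWARDS - frozenset(show)
    kept = [item for item in view.items if item.kind not in hidden]
    return EntityView(entity=view.entity, items=kept)


full = EntityView("Customer", [
    ModelItem("Customer Name", "attribute"),
    ModelItem("Customer Name: Name", "data_type"),
    ModelItem("CustomerID", "surrogate_key"),
    ModelItem("IX_Customer_Name", "index"),
    ModelItem("trg_Customer_Audit", "trigger"),
])

business = steward_view(full)
with_types = steward_view(full, show={"data_type"})
print([item.name for item in business.items])
print([item.name for item in with_types.items])
```

Keeping the hidden kinds in one shared set is one way to apply the same business layout to every subject area, with per-review exceptions passed in through `show`.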
Assign Data Privacy and Sensitivity Classifications
Consumers of data must be clear on usage requirements and constraints early on to avoid exposing the
company to risk. A critical responsibility of a data modeler is classifying entities and attributes.
Privacy and sensitivity classification involves assessing attributes for their sensitivity levels. As penalties
from newer privacy and security legislation arise, the need to classify and model these levels becomes
more significant.
The last tip discusses an additional type of data classification: grouping data by business terms.
Use Data Privacy and Sensitivity Classification Tools to Assist
Traditionally, we data professionals have inspected each data attribute to determine which elements are
Personally Identifiable Information (PII), financial data, health sensitive data, or confidential data. One
must understand the meaning of the data to know whether it is sensitive data. Who better to
understand data than the data architect and a business steward?
Now we have tools that can assist us in identifying the classification of data attributes.
Most RDBMS vendors include data classification features within their database client tools. These
classifications are usually accomplished by examining database column names to infer a sensitivity type.
Figure 2 - SQL Server Data Classification Recommendations shows an example of how Microsoft SQL
Server Management Studio (SSMS) makes data sensitivity recommendations based on column names in
a database. These types of tools are highly dependent on having meaningful column names. Data
classifications are reviewed and stored in the database as metadata. This metadata should be compared
with, and merged into, the physical and logical data models, and vice versa.
Figure 2 - SQL Server Data Classification Recommendations
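To show why name-based recommendations depend so heavily on meaningful column names, here is a minimal sketch of the general technique. The patterns, labels, and the `recommend_by_name` function are assumptions made for illustration; they are not the rules that SSMS or any other vendor tool actually uses.

```python
import re

# Hypothetical name patterns and labels; real tools ship their own rule sets.
NAME_RULES = [
    (re.compile(r"ssn|social.?security|passport|national.?id", re.I), "PII"),
    (re.compile(r"e.?mail|phone|birth|first.?name|last.?name|address", re.I), "PII"),
    (re.compile(r"credit.?card|card.?number|iban|account.?number|salary", re.I), "Financial"),
    (re.compile(r"diagnosis|allergy|medical|blood.?type", re.I), "Health"),
]


def recommend_by_name(column_name):
    """Return the first sensitivity label whose pattern matches the column name."""
    for pattern, label in NAME_RULES:
        if pattern.search(column_name):
            return label
    return None  # no recommendation; a person still has to review the column


columns = ["CustomerEmail", "CardNumber", "BloodType", "COL_17", "OrderTotal"]
recommendations = {name: recommend_by_name(name) for name in columns}
for name, label in recommendations.items():
    print(f"{name}: {label or 'no recommendation'}")
```

A column named `COL_17` gets no recommendation whatever it contains, which is why these recommendations still need review by someone who understands the data.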
Other data sensitivity tools can perform data profiling (examining data itself) to make data classification
recommendations. Both types of services are essential to efficiently classifying data. However, data
stewards, together with data modelers, still need to categorize data by inspecting each attribute. For
instance, a special meal request or a wheelchair request by a traveler may even be considered health
sensitive data.
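Profiling-based recommendations can be sketched in the same spirit. The example below assumes two illustrative value patterns and an arbitrary 80% match threshold; the `VALUE_RULES` table and the `profile_column` function are hypothetical and far simpler than a real profiling tool.

```python
import re

# Hypothetical value patterns; a real profiler would cover many more types.
VALUE_RULES = {
    "Email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "US SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def profile_column(values, threshold=0.8):
    """Recommend a label when enough non-empty sample values match a pattern."""
    non_empty = [value for value in values if value]
    if not non_empty:
        return None
    for label, pattern in VALUE_RULES.items():
        hits = sum(1 for value in non_empty if pattern.fullmatch(value))
        if hits / len(non_empty) >= threshold:
            return label
    return None


samples = {
    "COL_17": ["ana@example.com", "li@example.org", "", "sam@example.net"],
    "MealRequest": ["vegetarian", "diabetic", "kosher"],
}
profiled = {name: profile_column(values) for name, values in samples.items()}
print(profiled)
```

Profiling catches the email addresses hidden behind a meaningless column name, but it offers nothing for the meal requests. Recognizing that a meal request can reveal health information takes a steward or modeler who understands the business meaning.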
Once data classifications have been completed, data models can be updated with this critical
information.
Extend and Enhance Data Entities with Metadata
Many data models begin their lives as a reverse-engineering of a database. These models contain only
those objects and properties a database contains, namely data structures and constraints. Yet useful
data models, especially logical data models, are enhanced by business-related metadata:
Use extended definitions and notes
Time-constrained data professionals often find it difficult to develop meaningful data modeling object
definitions, but the payback for doing so is significant over time. The best time to write a meaningful and
complete definition is at the time the object is created in the model. To make entity definition writing
faster, consider using the format of:
A {noun} that {verb phrase + context}. This includes {more detail} and excludes {more detail}.
Data professionals who follow a pattern in writing definitions find that the definitions are faster to write and
better understood. One of the biggest myths about definitions is that a team does not need them if it
has good naming standards.
Refrain from listing attributes in the definition, as attributes may end up in other entities or be renamed
later. It is also recommended to exclude references to other entities, except in referring to their
business concepts: invoices, not INVOICE, as an example.
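The definition format and the advice about entity references can be sketched as two small helpers. Both function names and the sample Invoice wording are illustrative assumptions, not part of any modeling tool; the check simply looks for model entity names quoted verbatim.

```python
import re


def entity_definition(noun, verb_phrase, includes, excludes):
    """Fill in: A {noun} that {verb phrase + context}. This includes ... and excludes ..."""
    return (
        f"A {noun} that {verb_phrase}. "
        f"This includes {includes} and excludes {excludes}."
    )


def entity_name_references(definition, entity_names):
    """Return the model entity names (e.g. INVOICE) quoted verbatim in a definition."""
    return [
        name for name in entity_names
        if re.search(rf"\b{re.escape(name)}\b", definition)
    ]


invoice_definition = entity_definition(
    noun="document",
    verb_phrase="records a demand for payment for goods or services delivered",
    includes="credit and debit invoices",
    excludes="quotes and purchase orders",
)
print(invoice_definition)

model_entities = ["INVOICE", "INVOICE_LINE", "CUSTOMER"]
print(entity_name_references(invoice_definition, model_entities))
print(entity_name_references("A line item that belongs to INVOICE.", model_entities))
```

The match is case-sensitive on purpose, so a business concept such as "invoices" passes while the entity name INVOICE is flagged for rewording.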
Developing meaningful and extended definitions that appear in the models, portals, data catalogs, and
even database objects adds tremendous value in supporting data governance throughout the
development process. Idera’s ER/Studio Data Architect supports hover-based tips that show definitions.
An example of this presentation is shown in Figure 3 - Entity Definition Display.
Figure 3 - Entity Definition Display
Include Data Steward Information
Since data stewards are responsible for managing data governance processes and policies, ensure that
modeling objects contain stewardship information. In some organizations, there is just one stewardship
role, but in more mature data governance programs, there may be many. These might be divided into
strategic, tactical, and operational positions or into business and technical roles. No matter which formula
your data governance areas follow, including the metadata ensures that everyone with a question or
concern knows whom to contact. See Figure 4 - Stewardship Metadata for an example of Business Data
Steward and Technical Data Steward metadata in a Logical Data Model.
Including this metadata in models also means it can carry forward to data portals, data catalogs, and
data itself.
Figure 4 - Stewardship Metadata
Include Data Privacy and Sensitivity Classifications
Recording all the work done in Tip 2, Assign Data Privacy and Sensitivity Classifications, in the data model is an absolute must-do. Data
privacy and sensitivity classification metadata is an essential aspect of any review or discussion about
data. From C-level reviews to developer implementation of applications, data classification should be at
the forefront of each stakeholder’s thinking.
In Figure 5 - Data Privacy Classification, see how data privacy metadata is shown in a data model.
Figure 5 - Data Privacy Classification
Use Other Visual Artifacts
One of the best ways to leverage data modeling tools is to make use of text and shapes to draw
attention to specific concepts or to enhance the review process. Sometimes data professionals do all
this annotation in other tools using screen captures of data models. However, leaving the annotations in
other tools means they are lost to the data model and usually have to be recreated for future
discussions. Annotations kept in the model are independent objects, so they can be hidden or included as needed.
Including them in a model also means they are subject to version control and backups. Figure 6 - Entity
Note shows an example of a text note added to an entity to clarify why it is used in the model.
Figure 6 - Entity Note
Use Business Data Objects to Group Entities
Data stewards and data modelers often think about data differently, and data normalization is one
source of that mismatch. To a data steward, an Invoice is a document that contains all the data on a paper or
electronic record of a demand for payment. To a data professional, Invoice is just one entity in a
collection of entities that make up that document. These points of view, while different, are easier to
manage with the use of Business Data Objects. These are non-technical groupings of entities in a logical
data model. Figure 7 - Person Business Data Object Expanded shows how Person entities are contained
in a business object named Person Object. This object can be collapsed as in Figure 8 - Person Business
Object Data Collapsed to hide complexity when needed and expanded as in Figure 7 to include the
details of the entities.
Figure 7 - Person Business Data Object Expanded
Figure 8 - Person Business Object Data Collapsed
Data Model Security Requirements
Gone are the days when we data modelers put together a logical data model, generated a physical model
based on it, then generated a database script and threw it over to a database administrator (DBA) to
worry about how to protect that data.
Data protection requires data classifications, compliance, and business reviews to identify which data
needs extra security. Then we need to model how to secure that data. All these things must be
documented across conceptual, logical, and physical data models. This attention to security ensures that
the models can adequately support model-driven development that is not dependent upon the memory
of a developer or DBA to apply the security after the fact.
Add Business Security Requirements to Logical Data Models
Security is a data steward responsibility, which means it’s a data modeler’s responsibility. Data modelers
may not need to specify how data is secured, but we do need to model security requirements:
- Data encryption requirements
- Data masking requirements
- Data access requirements, including attribute-level and instance-level (row-level) access
Data sensitivity classifications are where we should start modeling requirements. The business sets
these requirements, often through the data governance program. With the assistance of data security
teams, we might find that we must encrypt or mask specific columns. We might also need to specify
which business roles should have the right to see unencrypted or unmasked data.
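One way to picture a business security requirement attached to an attribute is sketched below. The `SecurityRequirement` fields, the role names, and the masking rule (show only the last four characters) are assumptions for illustration; row-level access is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityRequirement:
    """A business-level requirement for one attribute (hypothetical shape)."""
    attribute: str
    classification: str
    encrypt_at_rest: bool
    mask: bool
    unmasked_roles: frozenset = frozenset()


def mask_value(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def value_for_role(requirement, value, role):
    """Return the value as a given business role is allowed to see it."""
    if requirement.mask and role not in requirement.unmasked_roles:
        return mask_value(value)
    return value


card_number = SecurityRequirement(
    attribute="Payment Card Number",
    classification="Financial",
    encrypt_at_rest=True,
    mask=True,
    unmasked_roles=frozenset({"Billing Analyst"}),
)

print(value_for_role(card_number, "4111111111111111", "Marketing Analyst"))
print(value_for_role(card_number, "4111111111111111", "Billing Analyst"))
```

The logical model records what must be protected and for whom; how encryption and masking are physically implemented remains a decision for the DBA and the security team.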
Add Technical Security Requirements to Physical Data Models
When working with DBAs, a data modeler needs to ensure that the security requirements are
implemented thoroughly. The data modeler represents the data steward’s need for good governance,
and the DBA represents the operational need to implement those requirements so that they perform well.
Modeling data security designs can also help ensure that ad hoc uses of data, such as self-serve
Business Intelligence (BI) tools, carry the security requirements to business users of the data. A data
scientist using customer data should be able to tell how the data she wants to use may be used, how it
may have been masked, and how it should be protected.
Implement Easy Data Model Collaboration
The value of a data model comes from its regular use, not just by data modelers. Properly used data
models can be easily leveraged by data stewards, business users, developers, DBAs, data scientists…the
list could go on to almost everyone in an organization.
One single data model display does not meet all those needs. We have seen in earlier tips how data
models can be tailored to meet the needs of target audiences. But there are other ways effective
modeling can meet different needs.
Include Comments and Questions
In the past, model comments and questions were often managed on paper notepads, e-mails, and
spreadsheets. These comments were difficult to tie back to a specific object. Collecting and managing
them required Herculean efforts, often by both the data steward and the data professional. We can now
manage these in our data models. That means they can be versioned, shared, secured, and backed up.
They are attached to an object and can be seen by others who often have the same questions or
comments. Follow-up is also more efficient as comments are tracked back to a specific account.
Figure 9 - Data Model Comments
Leverage Data Model Repository
The ER/Studio Repository provides version control at the property level of the data model object. It
offers versioning, snapshots (named releases), and check-in/check-out of data models. The check-in and check-out
functions support sharing and collaborating on data modeling efforts at the lowest level of granularity
required by data professionals. Data stewards, data modelers, DBAs, developers, and security analysts
can work together on the same modeling objects, even at the same time without fear of overwriting
each other’s changes. As you can see in Figure 10 - Collaboration with Repository, data model work can
happen at the same time, across many groups, and even remotely.
Figure 10 - Collaboration with Repository
Manually sharing and coordinating data modeling work at the file level is not possible on
real-world-sized projects; there is just too much complexity and reuse of objects. File-level version control is not
fine-grained enough to support responsive data modeling efforts.
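The difference between file-level and property-level versioning can be illustrated with a generic three-way merge keyed by object and property. This is a conceptual sketch only, with a hypothetical `merge_properties` function; it is not a description of how the ER/Studio Repository is implemented.

```python
def merge_properties(base, mine, theirs):
    """Three-way merge of {(object, property): value} maps.

    A property changed on only one side takes that side's value. A property
    changed differently on both sides is reported as a conflict and keeps
    the base value until someone resolves it.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:
            value = m
        elif m == b:
            value = t
        elif t == b:
            value = m
        else:
            conflicts.append(key)
            value = b
        if value is not None:  # None means the property was removed
            merged[key] = value
    return merged, conflicts


base = {
    ("Customer", "definition"): "A person.",
    ("Customer", "steward"): "TBD",
}
# The modeler rewrites the definition while the steward assigns an owner.
modeler_copy = {**base, ("Customer", "definition"): "A person or organization that buys goods."}
steward_copy = {**base, ("Customer", "steward"): "Sales Operations"}

merged, conflicts = merge_properties(base, modeler_copy, steward_copy)
print(merged)
print(conflicts)
```

Both people edited the same entity, so a file-level comparison would see two competing versions of one file. At the property level the two changes do not touch the same value and merge without a conflict.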
The Repository also includes security features to protect the integrity and security of data models. Data
governance requires data model quality as well as data quality; having the right collaboration and
protection tool is imperative.
Promote Models as the Go-To Record for Data Knowledge
Anyone working with data assets should use data models as the authority for business requirements
about data. A data model provides knowledge on not only the structure of data assets, but also the
meaning, rules, and guidance for that information. In support of these authoritative records, Idera Team
Server also notifies users of changes to ‘followed’ data assets. If developers are concerned with a table
in a database, they can follow it and be notified when any changes to that table are made.
Leverage Data Portals and Catalogs
Data Stewards and Data Modelers generally use different modeling paradigms to manage this
knowledge. Data Stewards are more focused on Glossaries of Business Terms. Data Modelers
concentrate on logical and physical models. Idera Team Server contains all these paradigms in the same
repository and joins them together. Both roles need to be able to review and comment on all of those.
One can define business glossaries with terms and relationships between them, then link those glossary
terms to entities and tables in data models. This ability to drill down or up from glossary to
implementation objects is critical to managing the effectiveness of data modeling initiatives. In Figure 11
- Business Glossary Terms, the list of terms and their metadata, including definitions, can be seen.
Figure 11 - Business Glossary Terms
One can drill down from a term to see the objects that are classified or characterized by that term. This
traceability is the other type of data classification mentioned in Tip 2. In Figure 12 - Term Found In,
the business term and its grouping of tables can be seen.
Figure 12 - Term Found In
This classification of data is a crucial part of data governance as it can be used for impact analyses and
measuring data and data model quality. Furthermore, the roles involved in data now have a single
continuous ecosystem to manage their data. The knowledge and requirements of Data Stewards can be
connected and compared with that of Data Modelers and DBAs to maintain consistency throughout the
process of managing data.
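The term-to-object traceability described above can be sketched as a small two-way index. The `Glossary` class, its method names, and the sample object names are hypothetical and are not the Team Server API; the point is only that linking terms to model objects makes both drill-down and impact analysis a simple lookup.

```python
from collections import defaultdict


class Glossary:
    """Business terms linked to the model objects they classify (illustrative)."""

    def __init__(self):
        self.definitions = {}
        self._objects_by_term = defaultdict(set)

    def define(self, term, definition):
        self.definitions[term] = definition

    def link(self, term, model_object):
        if term not in self.definitions:
            raise KeyError(f"Undefined glossary term: {term}")
        self._objects_by_term[term].add(model_object)

    def found_in(self, term):
        """Drill down: the model objects classified by a term."""
        return sorted(self._objects_by_term.get(term, ()))

    def terms_for(self, model_object):
        """Drill up: the terms that classify a model object."""
        return sorted(
            term for term, objects in self._objects_by_term.items()
            if model_object in objects
        )


glossary = Glossary()
glossary.define("Customer", "A person or organization that buys goods or services.")
glossary.define("Contact Information", "Details used to reach a person or organization.")

glossary.link("Customer", "Logical.Customer")
glossary.link("Customer", "dbo.CUSTOMER")
glossary.link("Customer", "dbo.CUSTOMER_ADDRESS")
glossary.link("Contact Information", "dbo.CUSTOMER_ADDRESS")

# Impact analysis: which objects need review if the Customer term changes?
print(glossary.found_in("Customer"))
print(glossary.terms_for("dbo.CUSTOMER_ADDRESS"))
```

Rejecting links to undefined terms keeps the glossary as the starting point, so every classified table or entity can be traced back to an agreed business definition.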
Data Modeling for Data Governance
While the tips here cover a range of topics from collaboration to technical implementations, they all
share a common trait: the maturity of data governance in an organization impacts their ability to
succeed. Good data modeling also affects how and where data governance work is completed.
Collaboration, usability, and completeness all form vital components of data governance efficiency. Within
this harmonious ecosystem, as data stewards define standards around data, data modelers can ensure
that data assets deliver against those standards.
New to Data Governance?
An organization starting its path towards formal data governance might be enhancing its logical data
models with necessary metadata around data stewards and sensitivity levels. This organization might
also be starting a business glossary and implementing a data model and standards portal. New data
governance initiatives can leverage such knowledge from mature data modeling assets.
More Mature Data Governance?
A more mature organization with a formal, enterprise-wide data governance program might be
managing these items in other systems and publishing the results of that work to their data models and
portals. They might also be documenting new data governance items in their models, then publishing
those out to their formal data governance tools. Where the work is done is less important than the fact
that the activities are happening, being recorded, and then shared.
Being part of TeamData® means leveraging your data models to support data governance. No matter
where your organization fits in these maturity models, your data models play a crucial role in ensuring
your data governance is timely, engaging, and successful.
About IDERA
IDERA understands that IT doesn’t run on the network – it runs on the data and databases that power
your business. That’s why we design our products with the database as the nucleus of your IT universe.
Our database lifecycle management solutions allow database and IT professionals to design, monitor,
and manage data systems with complete confidence, whether in the cloud or on-premises.
ER/Studio is the collaborative data modeling solution for data professionals to map and manage data
and metadata for multiple platforms in a business-driven enterprise data architecture.
Whatever your need, IDERA has a solution.