Video : ER/Studio Data Architect
Hierarchical Data Models for MongoDB and JSON
MongoDB is a cross-platform, open-source, NoSQL document-oriented database program. Its design enables it to store unstructured data as JSON-like documents and provide high performance, high availability, and easy scalability. MongoDB uses Binary JSON (BSON), a binary-encoded serialization of JSON-like documents, to store data, which makes it easy to work with data in a variety of programming languages. MongoDB supports features such as indexing, ad hoc queries, aggregation, and replication to provide efficient data management. It is widely used in modern web development, big data, and real-time analytics because of its scalability and flexibility.
Discover how ER/Studio can help you create hierarchical models for the NoSQL document-oriented database MongoDB and JSON.
Hierarchical Data Models
In this recording, we’re going to look at using ER/Studio for modeling hierarchical structures. We’re going to do this for MongoDB and JSON. Before we start, let’s summarize what ER/Studio is. It’s an enterprise tool that allows data architects to design and document data assets, using models of information to do that. We support a wide range of products. Our history is in standard relational databases, and we’re doing a lot more with newer structures like MongoDB and JSON. Our secondary mission is to connect those models and those data architects with data governance and data analytics initiatives. Overall, we want to form a companywide data ecosystem: wherever you go, you’re going to get the same answers, and all of these initiatives are going to pull their knowledge together. ER/Studio itself has two main components: Data Architect and Team Server.
ER/Studio Data Architect
Data Architect is a thick-client tool that runs under Windows and allows you to create conceptual, logical, and physical models, generate code, reverse engineer, et cetera.
ER/Studio Team Server
We’ve also got the Team Server component, which allows our data architects to pull their models together into a central repository. We’ve got Team Server Core, a web-based component that allows you to publish those models to a wider audience and also to build and connect models into a business glossary.
Enterprise Data Architecture
Many of our customers have an overall objective to create an enterprise data architecture structure. In this diagram here, we can see we’ve got three layers of the model.
Conceptual Data Model
We’ve got a single enterprise conceptual data model. At the top we might have a small number, ten, 50, 100, whatever it is, of the important concepts of the organization.
Logical Data Model
The second level is the enterprise logical model, a more detailed model. The entities from the conceptual will be expanded into more detailed entities. Those entities will have attributes, those attributes may have keys assigned to them, and there will be relationships between the entities. The enterprise logical model will usually be in third normal form. We’ve got one single model of our information across the organization.
Project Data Model
The third layer is then the project models. For each data asset we’re going to produce a project model. Now, each project model may or may not have both a logical element to it and a physical element to it. We may then also include a business glossary.
We’re going to expand this model out to connect to our data governance world. The Business Glossary will have lists of business terms that may or may not be arranged into an ontology of relationships between them. We’re then going to connect those business terms across to our data models on the right-hand side, and that allows us then to govern our data.
We can assign rules to the business terms. Take the credit rating score we have on the screen here: we can say that it is sensitive data, personally identifiable information, et cetera, and here are the rules that we apply to it. As we connect that business term to all of our models, it allows us to say, well, where are we storing credit rating score? Are we following those rules? Likewise, we can go to any piece of information in any asset and say, what does this information mean and what are the rules applied to it? If we expand the model again, then down at the bottom we’ve got project models that cover three groups of data assets. The first is our application databases over on the right. The third one is our data warehouses. In the middle we’ve got our application messages, pieces of information that flow between our applications or are used within our applications.
Logical Data Modeling
All of these we want to be able to model. We want to be able to model information in all of its forms. Let’s look at the models in more detail. When we’re modeling information, we’re generally going to use a logical data model.
Entity Relationship Diagram
In this we’re going to use an entity relationship diagram. We can see that we’ve got four entities here: publisher, book, chapter, and section. For each of those entities we can define its attributes. For book we have a title and an author, a publisher name, language, et cetera. We can work through each of the entities and then define how we identify them. A book is identified by its title and its author. If we’re looking at a publisher, it’ll be identified by its publisher name. We can then work through and create relationships between these entities. We can see there’s a relationship from a publisher to a book.
A publisher publishes a book, and if we’re looking at the book, then we’ve got the primary key of the publisher propagated to the book. We can see the identity of the publisher that publishes that book, which allows us to join those entities together. Now, from book to chapter, we’ve got a different kind of relationship. This is an identifying relationship. You can see that a book identifies the chapter within it. The chapter then identifies the section within that chapter. In our model here, we’re propagating the primary keys from the identifying objects. If we look at a section in isolation, then we can see that to identify every section, we need to know the title of the book, the author of the book, the chapter number, and the section number. We can then use these models to create physical models of databases. We’ve been doing this for many years.
If I want to create a standard relational database that contains information about books, then again I’m going to have four tables. Each entity will have a table. We’ll replicate those relationships and the primary keys, and create foreign key constraints between them.
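As a sketch of the relational mapping just described, the following creates the four tables with the propagated composite keys and foreign key constraints. It uses SQLite purely for illustration; column names are assumptions based on the attributes mentioned above, not ER/Studio's generated DDL.

```python
import sqlite3

# In-memory database standing in for any standard relational target.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (
    publisher_name TEXT PRIMARY KEY,
    location TEXT
);
CREATE TABLE book (
    title TEXT,
    author TEXT,
    publisher_name TEXT REFERENCES publisher(publisher_name),
    language TEXT,
    PRIMARY KEY (title, author)          -- book identified by title + author
);
CREATE TABLE chapter (
    title TEXT,
    author TEXT,
    chapter_number INTEGER,
    -- identifying relationship: book's key is part of chapter's key
    PRIMARY KEY (title, author, chapter_number),
    FOREIGN KEY (title, author) REFERENCES book(title, author)
);
CREATE TABLE section (
    title TEXT,
    author TEXT,
    chapter_number INTEGER,
    section_number INTEGER,
    -- a section needs the full chain of propagated keys to be identified
    PRIMARY KEY (title, author, chapter_number, section_number),
    FOREIGN KEY (title, author, chapter_number)
        REFERENCES chapter(title, author, chapter_number)
);
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['book', 'chapter', 'publisher', 'section']
```

Note how the identifying relationships force the composite keys to grow at each level, exactly as the section-in-isolation discussion above describes.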
Hierarchical Data Model
Now, for hierarchical models, things are different; we have different rules. Our models before were very much normalized. Within hierarchical structures, we might have a lot of denormalization, where we’re repeating information.
Here in our JSON structure, we’re going to contain a list of books, and we’re going to create arrays of book objects. We can see that each book will have fields containing information about the book. Here we can see our book contains title and author, and then the chapters and sections are hierarchical substructures within each book. We can model this in the same way. Again, over on the left we’ve got our traditional logical data model.
Physical Data Model
On the right we’ve got a physical model representing that book. Now, in our JSON structure, we’re going to have one single top-level entity that represents the array of objects, in this case book. We’re then going to take the other concepts or objects in the model and create those as substructures within our book. For each book, we want to know the publisher, so we’ll create a sub-item for the publisher with the publisher name, when it was founded, and its location; likewise for the chapter and the section. In an alternative visualization of this, we’ve got a single object in our JSON, which is a book, which contains fields. Those fields can have subfields. The same goes for MongoDB, which is fairly similar: it’s built around JSON, so we have collections of JSON objects. Here we’ve got book, library, patron, publisher, and then we can link those collections together.
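A concrete (illustrative) instance of the hierarchical book structure described above might look like this; field names and values are assumptions for the sketch, not taken from the tool.

```python
import json

# One top-level book object; publisher, chapters, and sections are
# nested substructures rather than separate normalized tables.
book = {
    "title": "Data Modeling Basics",
    "author": "A. Writer",
    "publisher": {                      # publisher embedded, not referenced
        "publisher_name": "Example Press",
        "location": "London",
    },
    "chapters": [                       # array of nested chapter objects
        {
            "chapter_number": 1,
            "sections": [               # sections nested inside the chapter
                {"section_number": 1, "heading": "Introduction"},
            ],
        },
    ],
}

# The whole hierarchy serializes as a single JSON document.
print(json.dumps(book, indent=2).splitlines()[0])  # {
```

Because the publisher is embedded, repeating it inside every book is exactly the kind of denormalization the hierarchical model allows.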
JSON and MongoDB
Each collection is just a collection of JSON documents as normal, each one having a surrogate key of underscore ID. Again, to look at the model of that, we’ve got two collections, publisher and book. Each of them has a primary key, a surrogate primary key of underscore ID, and then its fields. And those fields can then contain substructures. Again, we can see a different visualization of this. We’ve got two top-level collections, publisher and book. Each has got a surrogate ID and then some fields. The big difference between plain JSON and MongoDB is that we have both these IDs and we can also create foreign key relationships between them. Our book is referencing its publisher, as a foreign key relationship based on that surrogate key. Let’s see how we can build these models using ER/Studio. I’m going to take you through a demo with a number of stages here.
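Before the demo, the referencing pattern just described can be sketched as follows. The collections are plain Python lists standing in for MongoDB collections, and the `publisher_id` field name is an assumption; in MongoDB the join would typically be done with an aggregation `$lookup` rather than this hand-rolled lookup.

```python
# Two "collections", each document carrying a surrogate "_id" key.
publishers = [
    {"_id": "pub-001", "publisher_name": "Example Press"},
]
books = [
    {"_id": "book-001", "title": "Data Modeling Basics",
     "author": "A. Writer", "publisher_id": "pub-001"},
]

def lookup_publisher(book):
    """Resolve the foreign-key-style reference via the surrogate key."""
    return next(p for p in publishers if p["_id"] == book["publisher_id"])

print(lookup_publisher(books[0])["publisher_name"])  # Example Press
```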
First of all, we’re going to take the standard logical data model and use that to create a physical model of a MongoDB structure. Here’s ER/Studio Data Architect. We’ve got a very standard logical data model of our library system here: I’ve got books that contain chapters and sections, published by a publisher. They can then be checked out to patrons within a library, and our publisher, our library, and our patrons can have addresses. So, a very standard logical data model; everything is provided with primary keys, and those primary keys have been propagated as foreign keys. When we create our hierarchical structure, and in this case we’re going to use Mongo, we need to make some decisions around containment: which entities are going to be contained within other entities. Now, some of them are very obvious. Here we’ve got these identifying relationships from book to chapter and chapter to section.
It’s pretty obvious that a chapter will be contained in a book and a section contained in a chapter. For other entities, things get more complicated. For our checkout we might want to make a decision: do we want to put our checkout inside just our book, or do we want it to exist as a substructure within our patron, or even our library, or all three? Very commonly within MongoDB and JSON, denormalization is important to improve performance, so we might repeat objects. The address, for instance, is a good example of that. We might want to place the same address within our publisher, our library, and our patron. So the first decision is on the containment. Now, what we’ve done is we’ve added in a new property against our relationships: you can specify whether a relationship is a containing relationship and, if it is, what the direction of containment is.
So in this case, it’s pretty straightforward. The parent will be the container; a book will contain our child entity of chapter. Now, for some other relationships we want to make them containment, but we might want to change the direction. Our standard relationship here in the logical is that the address is the parent entity in the relationship. We’re storing the primary key of the address against our publisher as a foreign key. In this case we want to reverse the containment: we want to store the address within the publisher. We’re going to reverse the direction by unchecking the parent-is-container option. We’re going to go through our model like this. We can also display these on the diagram to make it easier to understand.
Physical Data Modeling
The next step is to generate our physical model. It’s the usual practice of generating the physical from the logical, so I can go and select MongoDB, and then I’ve got some options.
Here we’ve got some options that allow us to define, for all these entities, which ones are going to become top-level collections and which ones are going to be nested objects, substructures in the hierarchies of those collections. Now, we’ve got options to convert everything to a collection or everything to a nested object; we’re going to use this new rules-based approach. I’ll explain how this works in a moment. We can also decide how we handle any foreign key properties that have been propagated in the logical model: do we want to delete them or retain them? Sometimes it’s useful to retain those foreign key properties. If I hit finish, then it creates this pseudo-hierarchical structure. Again, we’re using these little lines here to show that there is a containment relationship, and there are these dashed lines here showing that there’s a MongoDB referencing relationship. The process it goes through is, first of all, to create an object for each entity, then look at all the relationships for the objects and create the lines according to the rules.
If it’s a containment line, then it creates a containing relationship; you can see that here between chapter and section and book. Once it’s created all the lines, it then looks at each object and its lines and says, am I contained in any other object? If I am, then I must be a nested object. So section and chapter are both contained. Checkout is contained within book, but book is not contained within anything, and likewise with publisher, library, and patron. It makes those top-level collection objects based on that initial automatic behavior. We can then change that. Where it sets an object to be a collection, it puts in the surrogate primary key, the underscore ID field. I can change this model. Now, if I’m not happy with the rules that the tool has applied, then I can go through, select an object, switch the class, and change it from nested object to collection.
If I do that, then it will take away or apply the surrogate key, and it’ll look at the lines and try to change those as best it can based on the rules. Or I could go back to the logical model, change the logical model, and use compare merge to propagate those changes, and it should update the model accordingly.
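As I understand the rules-based classification described above, it can be sketched like this: any entity reached by a containment relationship becomes a nested object; everything else becomes a top-level collection (and would receive the surrogate `_id`). The entity list and containment pairs here are illustrative, taken from the library example; the checkout-in-book choice is one of the options the narration discusses, not the only valid one.

```python
entities = ["book", "chapter", "section", "checkout",
            "publisher", "library", "patron"]

# (container, contained) pairs marked as containment relationships.
# Assumption: checkout was placed inside book for this sketch.
containment = [("book", "chapter"), ("chapter", "section"),
               ("book", "checkout")]

# Rule: contained in anything -> nested object; otherwise -> collection.
contained = {child for _, child in containment}
collections = [e for e in entities if e not in contained]
nested = [e for e in entities if e in contained]

print(collections)  # ['book', 'publisher', 'library', 'patron']
print(nested)       # ['chapter', 'section', 'checkout']
```

This matches the narration: section, chapter, and checkout are contained somewhere, so they nest; book, publisher, library, and patron are not, so they become collections.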
Visual Data Model
The next thing we’ve introduced is a different way of visualizing the model. Here we’re looking at the pseudo-hierarchical model, and if I select the diagram objects and set them to roll up their contained objects, it now gives us a view that is much more like the MongoDB structure. We only see the collections, and all of the nested structures are shown as a hierarchy within them, and we can expand or contract these structures. Now, I’ve just generated the model again, but this time I’ve introduced an error into the model.
Now ER/Studio is telling us that this line here is invalid. In the logical, I didn’t set that containment relationship, and if I open up the relationship, what we’re trying to do here is create a MongoDB referencing relationship from the address to the library. So the address is the parent entity. What this would try to do, if I actually drew this in the model, would be to propagate the primary key of address to library. Now, address being a nested object, it doesn’t have that surrogate primary key, that underscore ID property. So it’s not valid; I can’t do this in Mongo. The little ghostbuster is telling me that the line is invalid. I’ve got two options here: I can either delete the line and recreate it, maybe as a containing relationship, or I can go back to the logical. This is the standard way of working with ER/Studio.
We make all changes in the logical, and then we use compare merge to propagate those changes back to the physical. That’s what we’re going to do here. If I run compare merge against my physical model, it’s telling me that the only difference between these two models is the relationship here: in the logical it’s containing, and in the physical it’s not containing. As usual, I can merge all those changes to the target. Presto, it’s now changed that line to be a containment line, which is valid; the ghostbuster symbol is gone. Now, we’ve done a lot of work with our in-place edits of models. A lot of our customers are saying, well, as well as being able to generate physical models from the logical, we also want to be able to create them directly in the physical, just rough something out, and then maybe use compare merge to generate a logical from it later on.
So, going back to our rolled-up view here, we’ve added in some nice new features for editing the physical model. If I shift-click on a field, I can add in new fields. If we add in country, I can immediately specify the data type of it, so I can do everything through the keyboard. Now, notice that as I’ve edited the address object here within library, it’s also edited the reused address in my other structure. Again, I might be using a denormalized approach, but I’m trying to encourage standardization by reusing the object. We’ve also added in some nice features to better promote or demote fields between objects. Using the tab key, I can promote car park size to be a field of library or move it to be a field of the address.
I can also do the same thing for a JSON model. Again, starting from my standard logical model, with my additional markup for containment, I can generate a JSON structure in my model.
We’re retaining the referencing relationships; we can leave these as documentation lines. Here it’s created four top-level objects. Now, I might want to make some decisions and say, okay, I really only want one top-level object. If I were to generate JSON for this right now, it would put a wrapper around the four objects. I’ve got options here: I can either go back to the logical, as we did with the MongoDB model, and change our containment relationships, or I can modify the model directly. So, using my containment line, I can specify that the library has patrons, I can specify that the book has a publisher and that a library contains books, and there’s my JSON structure. At any point I can open up the objects and see the fields within them. We’ve got some new capabilities to switch between those objects nice and easily.
I can also see the JSON structure within my sample field here. Not only can we see the JSON instance version, we can switch and say show me the JSON schema for this particular object. We’re supporting draft 7 and draft 2020-12. Here’s the draft 7 schema version of our library object. As usual, I can go from the JSON and generate code; again, same mechanism: say give me the JSON schema as a file.
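For flavor, here is an illustrative draft-07 schema fragment for the book object, together with a minimal required-fields check. This is a hand-written sketch of the general shape, not the schema ER/Studio actually emits, and the property names are assumptions.

```python
import json

# Illustrative draft-07 schema for a book with nested chapters.
book_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["title", "author"],
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "chapters": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"chapter_number": {"type": "integer"}},
            },
        },
    },
}

# Toy validation of just the "required" keyword against an instance;
# a real validator (e.g. the jsonschema package) checks far more.
instance = json.loads('{"title": "Data Modeling Basics", "author": "A. Writer"}')
missing = [f for f in book_schema["required"] if f not in instance]
print(missing)  # []
```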
Now, as we can for SQL models, we can also import code and reverse engineer the model. Here I’ve got a JSON structure representing my book with some chapters and sections. Now I can go to ER/Studio and import this model. Once I’ve selected that this is JSON, it will determine automatically whether it’s an instance file or a schema file, and then it will reverse engineer the file and create a model for me.
Here’s the JSON structure we were just looking at: we’ve got a book object that contains chapters and sections, and there are all the fields. Now, as usual, when we reverse engineer a physical model, it will also automatically create a logical model from it for us. And here is that logical model.
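A toy sketch of what reverse engineering an instance file involves: walk the JSON document and record each object's fields and their types. ER/Studio's real algorithm is of course far richer (it handles schema files, arrays of varying shapes, and so on); the object and field names here just mirror the book example.

```python
import json

sample = json.loads("""
{"title": "Data Modeling Basics", "author": "A. Writer",
 "chapters": [{"chapter_number": 1,
               "sections": [{"section_number": 1}]}]}
""")

def infer(obj, name, model):
    """Record field -> type name for each (nested) object found."""
    model.setdefault(name, {})
    for field, value in obj.items():
        if isinstance(value, dict):
            model[name][field] = "object"
            infer(value, field, model)
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            model[name][field] = "array<object>"
            infer(value[0], field, model)   # sample the first element
        else:
            model[name][field] = type(value).__name__
    return model

model = infer(sample, "book", {})
print(sorted(model))           # ['book', 'chapters', 'sections']
print(model["book"]["title"])  # str
```

From a per-object field/type map like this, producing entities with attributes for a logical model is a straightforward next step.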
Load JSON Model into SQL Server
In the scenario here, we want to take data from a JSON model, a hierarchical model, and load it into a SQL Server database. We need to be able to design that SQL Server database. As usual with ER/Studio, we can create the logical from a physical model and then create a new physical model from the logical. So let’s go ahead and do that. Now, the first thing we need to do if we want to create a database is to get the model to third normal form. The first job is to provide some keys for each of our entities.
I can open up each of the entities and go through and decide on the keys. For a book, how do we identify a book? It’s going to be the author and the name or title of the book. Now, the tool has automatically propagated the foreign keys through the model. I can then go to chapter and say, well, how do I identify a chapter? We’ve got our foreign keys of author and name, that’s great, but we also identify it by the chapter number, and likewise the section. Once we’re happy that the model is normalized, I can then generate a physical model for any database, here Microsoft SQL Server 2019, and generate the model. From this physical model, I can then go on to generate my SQL file or create the database directly, and here is my SQL code that creates the SQL Server database from our book model.
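The load half of this scenario, shredding a hierarchical document into the normalized tables, can be sketched as follows. SQLite stands in for SQL Server, sections are omitted for brevity, and the column names are assumptions based on the keys chosen above.

```python
import json
import sqlite3

# Hierarchical source document (one book with nested chapters).
doc = json.loads("""
{"name": "Data Modeling Basics", "author": "A. Writer",
 "chapters": [{"chapter_number": 1}, {"chapter_number": 2}]}
""")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (name TEXT, author TEXT, PRIMARY KEY (name, author));
CREATE TABLE chapter (
    name TEXT, author TEXT, chapter_number INTEGER,
    PRIMARY KEY (name, author, chapter_number),
    FOREIGN KEY (name, author) REFERENCES book(name, author));
""")

# Shred: the parent's key values are repeated on every child row.
conn.execute("INSERT INTO book VALUES (?, ?)", (doc["name"], doc["author"]))
conn.executemany(
    "INSERT INTO chapter VALUES (?, ?, ?)",
    [(doc["name"], doc["author"], c["chapter_number"])
     for c in doc["chapters"]])

count = conn.execute("SELECT COUNT(*) FROM chapter").fetchone()[0]
print(count)  # 2
```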
The last thing I wanted to talk about was how we connect our MongoDB and JSON models into a data governance program. Going back to this slide here, we can see that we’ve got our project models representing our application databases, our JSON messages, our MongoDB structures, and our data warehouse. Now, we can link any of these either into the enterprise logical model and/or across to our business terms in our Business Glossary.
We could potentially relate every physical field in every object in our models of our data assets directly to business terms. We can link them via the project level logical model or via the Enterprise Logical data model and link all the way up. I’m going to show you in the tool how we can connect directly to business terms. I think we’ve already seen how we can take a fragment of the Enterprise logical model and then use that to generate our JSON messages or our MongoDB structures.
So we’ll link to the business terms. Let’s look at that. Okay, so here we are in Data Architect with our book example. Again, very simple: I can open up an object. With this classification process, we really want the people that totally understand the data asset to be able to make that mapping to the business terms; we don’t want to rely on our data stewards to have to go and find out about the objects. Against each object in my model, I’ve got a Glossary tab, the same for each field. I hit Add Term, and it gives me a list of all of the business terms in the Business Glossary that have the name of the object or the field in their name. Obviously we can change this search term. Here we can see in our Business Glossary I’ve got four terms containing Book: Book in Library, and Net Book Value and Open Book Management in Accounting.
MongoDB and JSON
Yes, it’s going to be Book in Library. I attach it to the term, hit Apply, and now it’s mapped. I have to do this on objects that are checked in to the repository. So, to summarize: with ER/Studio 19.2, we can rapidly document and design MongoDB and JSON structures, those hierarchical structures. We can start from a standard logical data model faithfully representing information, convert that to a hierarchical structure, and then generate code from it. Likewise, we can reverse engineer a MongoDB structure, a plain JSON structure, or even a JSON schema and create a physical model of it. From that physical model we automatically generate a logical relational model, which we can then use to document it. Standardization is really important in data management. We want to be reusing standard concepts. When we’re talking about a client, then we’ve got a standard set of fields for that client.
Enterprise Logical Model
We understand how to identify that client. We understand what all of those pieces of information mean, so reusing either an Enterprise Logical Data Model or smaller project-level logical data models gives us that standardization and that reuse. We can connect our models into our data governance program either via the Enterprise Logical Model or directly to business terms. Okay, I hope that was useful. Thanks for watching. If you want to speak to us more about this, then contact us at sales.idera.com.
Topics: Data Governance, Data Modeling, Metadata
Products: ER/Studio Data Architect, ER/Studio Data Architect Professional, ER/Studio Enterprise Team Edition, ER/Studio Team Server Core