Data Modeling for Hadoop Hive

With ER/Studio Data Architect

ER/Studio Data Architect Support for Hadoop Hive

Hi, thanks for joining. We're going to take a look at ER/Studio Data Architect and its support for Hadoop Hive. I will show you how to reverse engineer a set of Hive tables, how to create and edit Hive tables, and then how to forward engineer those back to a Hive instance.

Reverse Engineering with ER/Studio Data Architect

I'm going to start with reverse engineering. I'll bring up Data Architect, go to File > New, choose the option to reverse engineer an existing database, and then click Login. I'm going to choose the native direct connection and choose Hive from my list, then enter the data source name. You have the option here to use Kerberos authentication. I'm going to hit Next. From here, you can select the database that you want to reverse engineer. I only have the default database, so I'll select that and click Next.
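For reference, what the native direct connection ultimately points at is a reachable HiveServer2 endpoint; outside of ER/Studio, a Kerberos-secured connection to the same server is typically described by a JDBC URL along these lines (host, port, database, and realm are placeholders, not values from this demo):

    jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM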

Okay, I'm going to hit Next again. Here is a list of tables that we can reverse engineer. If there are certain tables that you don't want to reverse engineer, just highlight those and bring them over to the left. In this case, I'm reverse engineering everything, so I'll move everything over to the right and hit Next. Since Hive doesn't support relationships, we won't be able to infer any referential integrity on this page. We do have the option to infer domains, which creates a domain for each of the columns in this database; a domain in ER/Studio Data Architect is a reusable column definition. So if you have a column that appears in multiple tables and is always the same, you'll want to use a domain so that when you make a change, you only need to make it once instead of multiple times.
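As an aside, the metadata that reverse engineering pulls in is the same information Hive exposes through its own metadata commands; a quick sketch of how you might inspect it from a Hive session is below (the table name is only an example):

    -- Standard HiveQL metadata commands; the table name is illustrative.
    SHOW DATABASES;                             -- the databases offered on the wizard's database page
    USE default;
    SHOW TABLES;                                -- the candidate tables listed for reverse engineering
    DESCRIBE FORMATTED publisher_information;   -- column types plus storage, bucketing, and skew details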

Applying Naming Standards

I'm going to hit Next. If you want to keep this as a relational database, you can apply a naming standards template here. A naming standards template allows you to expand any abbreviations on the physical side into a friendly name on the logical side. Now I'll hit Finish.

Physical Data Modeling in Hadoop Hive

You'll see that we now have a logical model and a physical model. The logical model is database independent, and the physical model is geared toward Hive. I'm just going to concentrate on the physical model today. Up at the top, you have the table button, which allows you to create new tables, so I can create a couple of tables here. If you want to add some relationship lines for viewing purposes, you can do that here as well. This will not generate any relationships on the database itself.

Hive-Specific Data Types

I'm going to double-click on this publisher information table and show you what we can do here. You have a column called logo with a data type of map, and the pub info column with an array data type. These data types are specific to Hive. There are also a few Hive-specific tabs in the table editor. We have the Storage tab, which contains the storage information used by Hive, the Bucketing tab, and the Skewed By tab. These tabs are all specific to Hive.
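To give a sense of what those editor tabs correspond to in Hive itself, here is a minimal HiveQL sketch of a table that uses complex types, bucketing, and skew; the names, bucket count, and skewed value are assumptions for illustration, not the demo model's exact definition:

    -- Illustrative HiveQL only; names and values are assumed, not taken from the demo.
    CREATE TABLE publisher_information (
      pub_id   STRING,
      logo     MAP<STRING, STRING>,             -- Hive map type
      pub_info ARRAY<STRING>                    -- Hive array type
    )
    CLUSTERED BY (pub_id) INTO 4 BUCKETS        -- corresponds to the Bucketing tab
    SKEWED BY (pub_id) ON ('UNKNOWN')           -- corresponds to the Skewed By tab
    STORED AS ORC;                              -- corresponds to the Storage tab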

DDLs

You can also take a look at the DDL tab to see the DDL that will be generated for this specific table.

I'm going to click OK here. Once you're ready to forward engineer this, you can go to Database > Generate Database, and this will generate a DDL script for you that you can use to create the objects in your Hive instance. On this page, you can choose whether to generate a single DDL script file or multiple files. I'm going to hit Next. Here you can choose what objects you want to add to your DDL script. You have the option to add drop statements, and if you added any pre or post SQL statements, you can select those options here as well. We also have our table storage information.

So if you added any information in the storage parameters, you'll want to check these boxes here. On the General tab, we can enclose names in backticks if you want to do that, and you can choose the delimiter as well. We can click on the SQL preview, which will give you an idea of what the SQL will look like. So here we have the create table statements, and if you want to make any changes, you can close this, come back, and make those changes here.
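As a rough illustration of the kind of script those options produce, with drop statements, backtick-quoted names, and a chosen delimiter, a generated file might contain something like the following; ER/Studio's actual output will differ in detail:

    -- Sketch of a generated Hive DDL script; not ER/Studio's literal output.
    DROP TABLE IF EXISTS `publisher_information`;   -- included when drop statements are selected

    CREATE TABLE `publisher_information` (          -- identifiers enclosed in backticks
      `pub_id`   STRING,
      `logo`     MAP<STRING, STRING>,
      `pub_info` ARRAY<STRING>
    )
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','                      -- the delimiter chosen in the wizard
    STORED AS TEXTFILE;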

Generate SQL Code

Once you're satisfied with what you have, you hit Next, and this will give you a summary of what we're doing. If you hit Finish, we'll generate the SQL script. By default, we generate this in our universal SQL tool, which is packaged with ER/Studio. You can change this under Tools > Options to choose a different application to launch and run your DDL script.

Conclusion

So I've quickly walked through how to use ER/Studio Data Architect to reverse engineer, modify, and then forward engineer to a Hadoop Hive instance. Thank you for joining. For more information, here's a link to our website.