Transcript for:
Data Modeling Tutorial by Ajay Kumar

hello welcome to this data modeling tutorial my name is Ajay Kumar and I have more than 11 years of work experience in the data field if you would like to know more about me either you can check the description or also you can visit my LinkedIn profile please connect with me and feel free to ask any questions if you have anything related to your career or related to this tutorial in this tutorial we are going to discuss everything about data modeling what is data modeling why do you need to use it what are the different kinds of schemas relationships and not only that at the end of this tutorial you would find a workshop this Workshop is specifically designed if you are working on power bi and you need to do the data modeling this data modeling tutorial would help you to get the basics of data modeling as well as the advanced concepts in the data modeling and whether you are a new in the data field or you are an experienced one this tutorial is going to definitely help you out so with the help of this tutorial you can definitely kick off your career and if you are already in the data field and you have experienced then this tutorial is also going to help you to clear some of the concepts probably that you don't know yet so let's see what do we have in this tutorial in this tutorial in the very first section we are going to discuss about the data modeling overview that means introduction to data modeling what is data modeling why do we need to use it then we will go through the data model and what are the different techniques in data modeling then we are also going to discuss why should you go for the data model can you work without the data model or node and why to use the data model we will also discuss the different types of data models and their characteristics and then we will analyze the advantages and disadvantages of a data model in the second section you would get to know about the factors Dimension what is affectable what is the dimension table and what is the primary key what is a foreign key why do you need them into your data model and also we are going to discuss about the different types of facts and the dimensions we should move forward after that and we will lead into the section three where you would get to know about the relationships in a data model what is the relationship in a data model or in a database why do you need it what are the different kinds of relationships and also we are going to have a small lab over there where you where you will get to know about the relationships in a data model then we are going to jump into the section 4 where you will get to know about the different schema in data modeling what is the schema what are the different kinds of schema what are the advantages disadvantages how to use them everything you would get to know in this section and in the last section of this tutorial you would get a workshop that means your hands-on experience on data modeling where we are going to do the data modeling in power bi we are going to manage the different kinds of relationships also you would get to know about the facts and the dimension table how you can connect into your power bi and how you design your data model please remember that this tutorial is not just about getting the theoretical knowledge about the data modeling it's about the Practical experience as well not only that you would get some quizzes as well so that you can check your knowledge what you have understood so far one more thing guys if you have any question concern or any feedback please don't forget to let us know you can provide your feedback over here also if you are not satisfied with this course then you can get your money back with that all said let's move to our very first section where we are going to get to know more about data modeling the very first question comes into our mind is what is data modeling data modeling is the process of creating data model for the data to be stored in a database this data model is a conceptual representation of a data objects the association between different data objects and the rules now you know what is data modeling so what would be our next step now we are gonna have a look into another aspect of that that why do we need to use the data modeling why is it so important I'm sure you are working on many of the different reporting or bi system or even in the database modeling but there also you need to do your data modeling and if you don't know then you should know this answer that why you need data modeling well data modeling helps in the visual representation of data and enforces business rules Regulatory compliances and government policies on the data data models and show consistency in naming convention default values semantics security while ensuring quality of the data these are the main reasons that it's so important also while searching data from the different data sources you also design your data model if you are working in power bi and it's not only probably any other data visualization tool or any database you need to do your data modeling so that you can get the data from the different entities so let's discuss the data model well the data model is defined as an abstract model that organizes data description data semantics and consistency constraint of data the data model emphasizes on what data is needed and how it should be organized instead of what operations will be performed on the data data model is like an architect's building plan which helps to build conceptual models and set a relationship between data items there are mainly two data modeling techniques one is known as entity relationship model and another is known as uml or unified modeling language now let's get into deeper and let's try to understand what are these the very first we are going to discuss about the entity relationship data model so what is it well entity relationship model or ER modeling is a graphical approach to database design it is high level data model that defines data elements and their relationship for specified Software System you should also remember that an ER model is used to represent Real World objects an entity has a set of properties now an entity is a thing or object in real world that is distinguishable from surrounding environments for example your table in your database now let's talk about the uml diagrams or another data modeling techniques which I was saying unified modeling language uml diagram stands for unified modeling language it is a standard which is mainly used for creating object oriented meaningful documentation models from any software system present in the real world it provides us a way to develop Rich models that describe the working of any software Hardware system however you should note overhear uml is an essential part of creating an object-oriented design of systems it provides you means for creating powerful models and designed for rational systems which can be understood without much difficulties now another question arises why use data model well in the very first part I told you that what are the benefits of a data model or why actually you should use it so now let's discuss some of the primary goal of using data model the very first comes over here it ensures that all data objects required by the database are accurately represented that means you have your different tables you have different objects into your database and you have to get the data from them using your SQL language so in that case if you are using a relational database how you can connect all those different objects how you can get the data that is gonna handle by the data modeling part you should also remember that omission of data will lead to creation of faulty reports and produce in current results that means if your relationships are not correct if your data modeling is not correct then definitely you won't get the accurate or correct results a data model helps design that database at the conceptual physical and logical levels so I am going to discuss each of them one by one later on that what is the conceptual data modeling what is physical and what is logical so these are the basically three different kinds of data modeling techniques that we use while doing the data modeling over here you should also note that data model structure helps to define the relational tables primary and foreign key and stored procedures so whenever you are going to connect between two different tables then there should be a relationship between them in our data modeling we generally do it using the primary and foreign key relationship another primary goal of data modeling is it provides a clear picture of the base data and can be used by database developers to create physical database it is also helpful to identify missing and greater than data and lastly though the initial creation of the data model is labor and time consuming in the long run it makes your ID infrastructure upgrade and maintenance cheaper and faster so these were the primary goal of using a data model there can be many more cases but that depends on the case to case why you are using and what you are doing well basically there are three different kinds of data models namely conceptual data model logical data model and physical data model conceptual data model is the data model that defines what the system contains this model is physically created by business stakeholders and data architects the purpose is to organize scope and Define business concept and rules so this is a very high level data model next to this comes the logical data model which is going to Define how the system should be implemented regardless of the dbms that is database management system this model is typically created by data Architects and business analysts the purpose of this is to develop technical map of rules and data structures and lastly the physical data model is the data model which describes how the system will be implemented using a specific dbms system this model is to be correlated by DB and developers and the purpose is actual implementation of the database now we are going to dig deeper into the different kinds of data models one by one and the very first comes the conceptual data model a conceptual data model is an organized view of database Concepts and their relationships the purpose of creating a conceptual data model is to establish entities their attributes and relationships in this data modeling level there's hardly any detail available on the actual database structure business stakeholders and data Architects typically create a conceptual data model now let's consider an example in the example you can see you have two different tables customer and product these are your entities The Columns inside them are gonna known as your attributes and the relationship between them is going to be your relationship they have certain characteristics of a conceptual model the very first is it offers organization wide coverage of the business concept secondly this type of data models are designed and developed for a business audience and lastly the conceptual model is developed independently of Hardware specifications like data storage capacity location or software specifications like dbms vendor and Technology the focus is to represent data as the user will see it in the real world here you should note a very important point that conceptual data models known as domain models create a common vocabulary for all stakeholders by establishing basic concepts and score now we are going to discuss about logical data model The Logical data model is used to define the structure of data elements and to set the relationship between them The Logical data model adds further information to the conceptual data model elements the advantage of using a logical data model is to provide a foundation to form the base for the physical model however the modeling structure remains generic here you will see the table customer and product and there is also certain characteristics for this logical data model at its data modeling level no primary or secondary key is defined at this data modeling level you need to verify the adjuster connector details that was set earlier for relationships and if we talk about the characteristics of a logical data model then you should remember it describes data needs for a single project but could integrate with other logic data models based on the scope of the project secondly it designed and developed independently from the dbms thirdly data attributes will have data types with exact precisions and length and lastly normalization processes to the model is applied typically till third normal form now we are going to move to the last part which is the physical data model a physical data model describes a database specific implementation of the data model it offers database abstraction and helps to generate the schema this is because of the richness of metadata offered by a physical data model and if you don't know what is metadata metadata is a data about a data that means suppose you have your different columns in a table now what is the type of the data inside that what is the length so those are known as the metadata that means data about data the physical data model also helps in visualizing database structure by replicating database column Keys constraints indexes triggers and other rdpms features here you define your primary and secondary key as well so if we talk about the characteristics of a data model the very first is the physical data model describes data need for a single project or application though it may be integrated with other physical data models based on Project scope secondly physical data model contains relationship between tables that which addresses cardinality and valuability of the relationships thirdly it developed for a specific version of a dbms location data storage or technology to be used in a project number four physical data model columns should have exact data types lengths assigned and default values and lastly physical data model has primary and foreign Keys views indexes access profile and authorizations Etc which are well defined inside it okay now we are going to discuss first the advantages of a data model the main goal of a designing data model is to make certain that data objects offered by the functional teams are represented accurately so we are doing the data modeling so that we can get the accurate data while querying the data or while querying the database or while getting a information in a report the second one is the data model should be detailed enough to be used for building the physical database thirdly the information in the data model can be used for defining the relationships between tables primary and foreign keys and stored procedures also data motive helps business to communicate within and across the organizations and data model helps to document data mappings in ETL processes lastly another Advantage is data model helps to recognize correct sources of data to populate the model now we are going to discuss about some of the disadvantages as well and the very first is to develop a data model one should know physical data stored characteristics secondly this is a navigational system produces complex application development management thus it requires a knowledge of the biographical truth thirdly even smaller change made in structure requires modification in the entire application and lastly there is no set data manipulation language in DPMS so these were some of the disadvantages of the data model now we are going to summarize it so let's have a look at the summary what we have done so far the very first we learned data modeling is a process of developing data model for the data to be stored in a database then we learn data models and shows consistency in naming conventions default values semantic security while ensuring quality of the data then we also learn that data model structure helps to define the relational tables primary key foreign key and stored procedures then we learned that there are three different kinds of data models that is conceptual logical and physical then you also got to know that the main aim of conceptual model is to establish the entities and their attributes and their relationships in case of logical data model which defines the structure of the data elements and set the relationship between them and finally we also learned about the physical data model which describes the database specific implementation of the data model here you should remember that the main goal of a designing data model is to make certain that data objects offered by the functional team are represented accurately the biggest drawback is that even smaller chain made in structure requires modification in the entire application which is a disadvantage of data modeling so this is it for this first module before going into the fact and dimension table we would get to know what is the primary key and what is a foreign key it's just a brush up I am sure you already know the key differences between a primary key and a foreign key it is for someone who is very new into Data field our primary key is used to ensure data in the specific column is unique it is a column cannot have null values it is either an existing table column or a column that is specifically generated by the database according to a defined sequence for example you can see a snapshot at the bottom left corner of your screen where I have a table name student and then I have another table named student course so in the student table student number is going to be used as a primary key because it doesn't have any null value and also all the records are unique in fact if we talk about the student name that cannot be used as a primary key because it has duplicate data and if I talked about the student form that has enough values of this column also cannot be used as a primary key now if you talk about the foreign key so primary key in one table can be a foreign key for another table but that means a forum key is a column or group of columns in a relational database table that provides a link between data in two tables if you need to join two tables that means there should be some link for example over here you can see that student table has student number as a one column and the same column is in the student course table however in student course table the same column has duplicate values so definitely that cannot be used as a primary key however it can be used as a foreign key over here so that is the main difference between a primary key and foreign key over here you should remember one more thing that a primary key cannot contain null values and why it cannot contain null values now moving forward we are going to use this model or the diagram or a data model representation to depict what is the fact and what is the dimension table over here the middle table its effect marks table this is your effect table and all other surrounding tables are known as your dimensional tables over here so this is just for the representation only we will talk about this later on so let's move forward so the very first question arise what is a fact table well affectable is a primary table in a dimensional model Y is primary because all the measures or all the calculations are inside this table and this table basically contains numeric data only however it contains some non-numeric data also like for example transaction IDs invoices IDs Etc but that's a later part to discuss over here you would notice two things one is measurements and second is foreign key to the dimension table now perhaps you are wondering what is the measure in a data warehouse a measure is a property on which calculations for example some count average minimum maximum can be made so generally we need to use a lot of calculations for example total YTV total MTD total amount Etc that all we do using calculations for the measures now we are going to move forward and this time we are gonna see where is affectable into our data model and over here this is a representation of a data model where you can see clearly your fact table is there all the foreign keys are in the fact table and those foreign keys are connected to all the dimensional tables where you will find your primary keys in the data model there can be different types of schemas for example this schema that you are looking over here is a representation of a star schema however there are other kinds of schema as well which we are going to discuss in our subsequent videos now what is the dimension table a dimension is the structure that categorizes facts and measures in order to enable user to answer business questions commonly used dimensions are people products please and chime well anything which is numeric you can consider it as effect and anything which is descriptive you can consider it as a dimension people and time sometimes are not modeled as Dimensions as well so please always be mindful about that let's come back to the same data model that I was just discussing couple of minutes back here you will see that fact table has been surrounded by many other tables for example team student parent dim manager Etc all these are starting with dim that means these are dimensional tables and all of these tables has their own unique property for example team student table has your student name surname Etc so these are the descriptive or you can in another words if you're gonna work in power where you can also say that Dimension tables are going to help you to slice and dice the data so that you can see according to different dimensions for example if you would like to view the data for different countries then you can have a slicer as a country names and this is going to come from the country table which is another dimensional table now let's discuss some of the key differences between effect and a dimension table flag table contains measurements metrics and facts about the business process while the dimension table is the companion to the fact table which contains descriptive attributes to be used as query constraining fact table is located at the center of a star or snowflake schema whereas the dimension table is located at the edges of the star or snowflake schema so what is a star schema or what is the snowflake schema or what is a Galaxy schema everything we are going to discuss in our upcoming videos track table is defined by the green or its most Atomic level whereas Dimension table should be wordy descriptive complete and quality assured track table helps to store report labels whereas Dimension table contains detailed data and lastly fact table does not contain a hierarchy whereas the dimension table contains hierarchicals so these were the key differences between effect and dimension table over here we are going to have a look about the differences between Dimension table versus fact table so guys over here you would find these all artifacts are more or less like the same as I just explained you previously if you would like to read it please pause your screen and have a look if you have any question or concern you can let us know now question comes what are the different types of facts there are basically three types of facts additive semi-additive and non-atitude we cannot apply all the functions such as some Max mean average percentage Etc that we use for the calculations to all the facts So based on that we have the three different kinds of sites so the very first comes additive measures should be added to all Dimensions that means you can apply all the functions over there you can apply Max mean percentage average awesome Etc everything on that semi-redited measures in this type of facts measures may be added to some dimensions and not with others for example consider the price or currency rate sum is meaningless on rate however average function might be useful and lastly the non-editive one it stores some basic unit of measurement of a business process some real world example includes sales phone calls and orders for example five percent of profit margin Revenue to asset ratio Etc all these are non-editive facts here these are based on the database Concepts because in data warehouse we use the facts and dimensions while doing the data modeling the very first is confirm dimensions confirm Dimension is the very effect to which it relates this Dimension is used in more than one star schema or data marked generally whenever we are building a Galaxy schema so in that Galaxy schema we have more than one fact table and then we are joining those two fact tables using dimensional tables and those dimensional tables we have Dimensions inside them and those dimensions are going to be known as confirm Dimensions next is Outrigger dimensions a dimension may have a reference to another dimension table these secondary Dimension is called out trigger Dimensions this kind of Dimension should be used carefully for example you have country data the next two country you have your provinces then cities and all these are connected with each other so those are known as Outrigger dimensions then it comes with a shrunken roll of Dimension Shrunk the role of Dimension are subdivision of rows and Columns of a base dimension these kind of dimensions are useful for developing aggregated fact tables Dimension to Dimension table joins well it's very important Dimensions may have reference to other dimensions as I just discussed it can be product to subcategory as well however these relationships can be molded without trigger Dimensions as I have just given you example of country states Etc next from the role playing dimensions that means a dimension can play more than one role for example you did it can be any date can be date of birth or they can be data folder a single physical Dimension helps to reference multiple times in fact tables as each reference linking to a logically distinct role of the dimension so as I just said date can be many like date can be date of birth date can be data folder or there can be anything else now it comes to the junk dimensions it is a collection of random transaction cohorts flags for text attributes it may not logically belong to any specific dimension so generally we keep them in Separate Tables for example Flags indicators Etc then comes the degenerated dimensions degenerative dimensions are without corresponding Dimension it is used in the transaction and collection snapshot of fact tables this kind of Dimension does not have its Dimension as it is derived from the fact table next is swappable dimensions they are used when the same fact table is paired with the different version of the same Dimension and lastly is the step dimensions which are sequential processes like web page events mostly have a separate row ineffect table for every step in process it tells where the specific steps should be used in the overall session so all these were the different kinds of Dimensions if you would like to know more you can read about more data warehousing Concepts and there you will get to know so what's next in our next video we are going to discuss about the relationships in database it is very important for you to get to know what are the different kinds of relationships there can be one to one one too many or many too many and it is also very important to get to know which kind of relationship use when or which kind of relationship you should avoid everything you will get to know in our next video so please stay tuned what is a relationship in database a relationship in the context of databases is a situation that exists between two relational database tables when one table has a foreign key that references the primary key of the other table relationships allow relational databases to split and store data in different tables while linking disparated items what is a foreign key what is a primary key I have already discussed in the last video if you don't know please go and watch that video once more whenever we are working on a relational database we need to get the information we need to query the information using SQL language and in order to get the information we should relate tables from one and another that we do using primary and foreign key relationship but how these relationships are going to actually work we will get to know in this video as relational database is a type of database that stores and provides access to data points that are related to one another relational databases consist of tables columns rows keys and relationships between them and there are many more data objects over there in a relational database so you should brush up your Concepts regarding the database or database fundamentals if you don't know and if you have any specific query you can ask me so this is a kind of a data model that we're preparing for RBI but similarly you can do as a tableau model as well if you are working with SQL Server analysis services data model are going to be more like the same it just depends how you are going to relate your different tables and in our very first video we have already discussed what is a data model what are the different types of data models Etc let me show you something over here right now in my power bi I'm designing a data model I always suggest everyone that you should spend 70 to 80 percent of your time on data modeling whether you are working in power bi or database or anywhere else reason being this is the most critical part in your job once it's gonna be all right then everything is going to be all right so over here you can see that it's quite messy because there are lots of tables and you can zoom it though over here but it's not gonna give you the complete view over here but for you I have divided this into the different parts where we are going to discuss the different types of relationships so let's first go through the different kinds of relationships then we will see in a demo how you are going to manage these relationships and what are these relationships they are basically three different types of relationships one to one relationship one to many and many too many relationship so what are these let's discuss one by one the very first terms one-to-one relationship that means in one table item you would have correspondingly only one item in another table for example you have two tables sales and sales order and these are connected using sales order line ID so there would be only one big or corresponding to sales in sales order table and also if you would reverse it that means in sales order table if you have one record then also you would have correspondingly only one decode not the multiple records so this is known as one-to-one relationship second terms one too many relationships that means in one table you have one record but corresponding to that in another table you would have many records for example let's talk about date so date is our dimensional table and in date table we have the unique values only we don't have any duplicate values but corresponding to that in sales table there would be multiple values of date why because in sales table we have many orders or we have many records for one date for example suppose I am working in a manufacturing company and I have my order dates so my sales table has my order date and when I'm gonna connect this with my date table so corresponding to one date I can have multiple orders that means there would be multiple records over there similarly you can consider it with the target table or sales to product table and all by this means simply that you would have multiple records corresponding to one record and whenever we are designing the data model it's a preference generally to have one-to-many relationship one would be towards your dimensional table and many would be towards your sales table in this diagram you can see that star Mark is always corresponding to the many relationship and one is for your one relationship that means where you have the unique record so that's how it's gonna work now before going to the money to many relationship we will talk about the cardinality but what is cardinality so guys cardinality describes the relationship between two data tables by expressing the minimum and maximum number of entity occurrences associated with one occurrence of a related entity either there is a relationship or not connectivity is the relationship between two tables it can be one two one or one too many so with this diagram you would understand what is the cardinality that means how many relationships are there and connectivity is simply a relationship which is between the primary and foreign key now let's discuss about many to many relationship so here you can see your order table is there then you have your fulfillment table and these are connected using the order line and in order table also you would have the multiple records of order line and corresponding to that in the Fulfillment table you have multiple records of that order line S1 so that's how these are in many to many relationship you should know that many to many relationship causes duplication in database duplication caused false results from queries that means it's highly likely that you would get ambiguity when you are using many to many relationships and also there are adverse effects on the performance of your report or your data model when you are using many to many relationships so you should be very much aware about when you are going to use many to many relationship one good key is to choose the many-to-many relationship can be between your two dimensional tables or maybe when you are applying the row level security into your data model then also you may use the many to many bi-directional relationship over there now let's see how to prevent duplication the solution is generally to use a bridge table or joint tables in this diagram you will see product component is a bridge table which is between the product and component table so it's taking one side the component ID another from the product ID and then you are joining these tables between product and product component you will see these are one to many relationship over here double bars means one relationship and where you see this triangle kind of many lines over there this is many relationship over there and similarly between product component to your component so ideally between your product and component table you have many to many relationship entities relationship diagrams helps to make and understand entity relationship data model that's why I always try to First draw your entity relationship diagram so that you can understand it better so let's discuss somebody over here before jumping into our demo relationship database has the ability to create meaningful information by joining tables joining tables helps to understand the relationship between the data and tables besides relationship database eliminate data redundancy relational databases has three relationship table relational database has three relationship between tables that is one to one one to many and many too many many to many relationship is harder to understand than other relationships that's why I always suggest be careful before using this relationship the bridge table makes the database and relationship easy to understand and it prevents data redundancy now let's get back to our power bi file where we would try to explore these relationships and over here once again so the very first we are going to see one too many relationships so as I mentioned that you can see in this data model there is a fact table and around fact table there are many dimensional tables such as team tape dim cells territory dim currency dim customer team promotion and team product all have a relationship with a fact table and these dimensional table and fact table has a one-to-many relationship star corresponding to many and these one corresponding to one side of the relationship always remember that arrow is gonna go from one side to many side now if you would like to edit this relationship you can do that just double click on any arrow and here you can edit it if you would see over here you would see one two one one too many and many too many relationships then there is a cross filter directions which you can set single or both depends on your situation and then you can choose your columns as well while creating those relationships over here now I'm going to cancel this there is one another way to manage your relationships in power bi you can directly come over here manage relationships and here you can even select photo detect relationships or you can edit or you can delete so all four options are available over here so depends on you what you want to use now let me show you one to one relationship so over here I have two tables usernames and test over here I have one username so here then there is a major and there is a task name and user between that one user corresponding to the another user in the test table and they have one recourse corresponding to another one call only that's why this is my one-to-one relationship and over here you will see the arrow is going in both direction that means it's a bi-directional that means you can filter one table using another or another table using the very first table so that's how bi-directional relationships are going to work lastly many to many relationships it's just for the depiction purpose only as I mentioned when you have multiple records corresponding to one record in one table and vice versa then you would have this kind of relationship you can consider with an example too for example in one school or in one class there are many students so one teacher is teaching many students and also one student is being taught by many teachers for example a math lecturer is coming into the class he is teaching many students but if we take one student out of those many students then he is going to be taught by many different lectures not only math it can be English it can be social science it can be a drawing or any other subjects so that's why there is a many-to-many relationships over here so Guys these were the different kinds of relationships in a data model now what we are going to do next in our next video we are going to talk about star schema versus snowflake schema so please stay tuned for the next video see you in the next video let's discuss very first what is DB schema over here DB refers to the database the database schema is its structure described in a formal language spotted by the database management system the term schema refers to the organization of data as a blueprint of how the database is constructed over here you should note that a database schema is like a skeleton structure representing our logical view of a whole database it devises all the constraint applied to the data in a particular database whenever organization engage in data modeling it leads to a schema now another question arises what is multi-dimensional schema multi-dimensional or Ms schema is specially designed to model Data Warehouse Systems the schemas are designed to address the unique needs of very large database design for analytical purpose we generally have two kinds of system oltp and OLED olep are designed to do analysis on historical data when you have a huge amount of data however oltp databases are transactional databases oltp database example can be your banking database or it can be your sales database however olap olap are the collection of your multiple databases or in an organization you can see Enterprise dataway house system is your or lab data warehouse system or database now there are mainly three kinds of schemas available that is star schema snowflake schema and lastly a Galaxy schema but as I discussed in my previous video mainly we are gonna walk on the start schema only but in complex scenarios in big organization there can be a need of snowflake schema or Galaxy schema let's discuss all of them one by one the very first comes the star schema star schema in data warehouse in which the center of the star can be one fact table and a number of associated Dimension tables we have already discussed in our previous video what is affectable and what is the dimension table if you haven't watched that video please go back and check that video once more it is known as star schema as its structure resembles a star the star schema data model is the simplest type of data warehouse schema it is also known as star join schema and is optimized for querying large data sets in power bi whenever we are doing the data modeling in 99 scenarios we will always stick with the star scheme only you will get to know everything in our last video of this session which is going to be the next video where I have explained with the demo how to connect with your table how to design a data model in power bi and what are all the points that you should always remember in order to design your data model this is a representation of a star schema where you can see in the middle you have a factory reset a sales table which is effectable and that middle table or the fact table is surrounded by many Dimension tables which are like dim sales Trader edem product indeed in reseller or dim employee table also you should note that all the foreign Keys you would find in the fact table while all the primary Keys you would find in your dimension tables Dimension tables are going to represent the descriptions of different aspects of your data objects however factable would generally have your measures or your numeric items likewise also over here you should note that ineffective but you can find some other items too apart from the foreign key there can be values there can be some transaction IDs or there can be some invoices IDs as well now let's move on the characteristics of star schema in Star schema every Dimension is represented with the only one dimension table dimension tables generally be used to slice and dice the data whether we are using multi-dimensional modeling or tabular modeling or data modeling in power bi the dimension table should contains the set of attributes also the dimension table is joined to the fact table using foreign key as I just explained in Star schema the dimension table are not joined to each error because that's going to happen in the snowflake schema fact table would contain key and measures only the star schema is easy to understand and provide optimal disk usage that's why it's being used mainly that's why it's being used majorly in all the data modeling Concepts the dimension tables are not normalized for instance in the previous image product ID does not have the product lookup table as an oltp design would have this is a design that I am just referring to over here product table is their dim product and there is no lookup table for product the schema that is star schema is widely supported by all the bi tools so this is another reason that majority of the bi systems are going to use the star scheme only now let's discuss about the snowflake schema snowflake schema in the data warehouse is a logical arrangement of tables in a multi-dimensional database such as the ER diagram resembles a snowflake shape a snowflake schema is an extension of a star schema and it adds additional dimensions the dimension tables are normalized which splits data into additional tables I briefly described it in my previous video where I have told you that for example you have country table then you can have other dimensional tables related to the country that means country to your Province to your city all these tables are related to each other and we can join these Dimension tables to one another that would be your snowflake schema over here you can see in this image there is a fact table there are the dimension tables around that but as I just mentioned snowflake schema is just the extension of the star schema and over here if you will see the dealer table on the top it has two other dimension tables related to this that is dimension table would be location and the country so those are the extension of the dealer table and that's why we are calling it The snowflake schema not only that if you would look at the right side of the fact table there is a product table and related to product there is a variant table and these are also connected that means that when the dimension tables are connected to other dimension tables it becomes your snowflake schema there are certain characteristics of the snowflake schema 2. the very first is the main benefit of the snowflake schema it uses smaller disk space while the star schema uses considerably more disk space in Snowflake schema it is easier to implement a dimension is added to the schema due to multiple table Square performance is reduced the primary challenge that you will face while using the snowflake schema is that you need to perform more maintenance efforts because of the more lookup tables so that's another reason that we generally don't use the snowflake schema now let's discuss the differences between the star schema and synovx schema a star schema hierarchies of the dimensions are stored in the dimensional table however in case of the snowflake schema hierarchies are divided into Separate Tables in Star schema fact table surrounded by dimensional tables however in case of snowflake schema one fact table surrounded by the initial table which are in turn surrounded by another dimension tables install schema only a single joint clears the relationship between the fact table and any Dimension tables however in contrast to the star schema in Snowflake schema requires many joints to fetch the data install schema database design is simple while in case of snowflake it's quite complex star schema denormalized data structure and queries also run faster however in case of snowflake schema you would find that database structure or the data structure is quite normalized and it would take more time to fetch the data because it has to go through the multiple tables or lookup tables to file data in Star schema high level of data redundancies there however in case of snowflake it's very low in case of star schema the single Dimension table contains aggregated data however in case of snowflake the data splits into different dimension tables install schema Cube processing is faster however in case of snowflake schema Q processing might be slow because of the complex join lastly in case of star schema it offers higher performing queries using star join query optimization also tables may be connected with multiple dimensions in contrast to this in case of snowflake schema the schema is represented by centralized fact table which unlikely connected with multiple dimensions so these were the major differences between the star schema and the snowflake schema now we come to the point what is a Galaxy schema whenever we work on the databases or database structure database modeling we generally hear only about the star schema and snowflake but it really comes to the picture about the Galaxy schema so now let's dig into this one what is it a Galaxy schema contains two fact tables that Shear Dimension tables between them it is also called fact constellation schema the schema is viewed as a collection of stars hence the name Galaxy schema but generally in some of the interviews I have experienced that people call it constellation scheme only and they generally ask what is a constellation schema they never mention it's a Galaxy schema so as you can see on your screen we have Dimension tables and there are two fact tables if someone is going to ask you how you are going to connect two fact tables so your answer should be using Dimension table so over here as a bridge we use the dimension table and in overall if you will see this picture you will find that there are two star schema so if you will see on the right hand side you would find that there is the fact table which is a product table and this fact table is surrounded by the dimension tables and same on the left hand side factable revenue is surrounded by the dimension tables so basically using the dimension table as a bridge you connect these two fact tables and this schema is known as your Galaxy or constellation schema now the very important point over here is in Galaxy schema shears dimensions are called conform Dimensions because these dimensions are being shared with other tables as well or Dimensions as well now let's discuss the characteristics of Galaxy schema the very first is the dimensions in the schema or the Galaxy schema are separated into separate Dimensions based on the various levels of hierarchy for example if geography has four levels of hierarchy like region country state and city the Galaxy schema should have four dimensions moreover it is possible to build this type of schema by splitting the one star schema into more Star schemas as I just mentioned the dimensions are large in schema which is needed to build based on the levels of hierarchy this schema is helpful for aggregating fact tables for better understanding so these were the most important characteristics of the Galaxy or constellation schema in the next video I am going to design a power bi data model and there again I am going to revise all the concepts that we have learned so far in data modeling see you in the next video this is very important and critical concept that you should be aware about if you want to make your career as a part of visual about oh you want to work in data analytics with power bi in this video you would learn a lot of things regarding the data modeling and in our previous videos we have already covered the concepts of what are the relationships what are different types of schemas what is data modeling and how it's going to help you to make your career into data analytics you are going to learn a lot of new Concepts such as how to create relationships between your data sources or the different data tables that you have then you will learn how to create a new phase using the calculated column after that we are going to discuss how to optimize your data model by hiding settings that means the unused data fields that you're not going to use initial report from your data fast then you are also going to learn how to create different visuals in order to perform a calculation secure data model that you are going to use into your power grade report after that we are also going to create a table using the Dax that means your calculator table and we are going to define the relationship between the newly created temporary cable with the other tables in your data model and At Last I am going to show you how you can format the data can also create the recent activity call up inside the same table so what I'm waiting for let's get started before going further I would like to highlight one thing over here you have to go first options and settings and under options there's an option to detect relationships automatically so in options and settings you would come under data load and there you will find the different settings one of them is detect column types and headers for unstructured sources then there's also one more setting which is the time intelligence so guys I recommend always switch it off unless you are sure that you would like to use this what it's going to do it's gonna create an automatic hierarchy from your dateable and because of that there would be lot of extra data is going to generate into your data model which can impact the performance so this is your first lesson for this video how to reduce your data set size another one you can go to your current file this data load for your current file and here you will find the type detection then there is a relationship this is the option which is very critical to know if you would like to Auto detect new relationship after data is loaded then you can check in otherwise you can uncheck this one but for this video I am going to keep it as it is so that you will get to know how it's going to detected relationships but I am going to show you how you can even change the relationship if you would like to or even you can delete those relationships from your data model over here you will get data from Excel book or you can go directly over here and get data then there is one more option over here import data from Excel so just click over here and I'll go down and there I have an introverse DW 2012. my all the data is inside this introduction tabs I can get it like the tables from the database you know as you can see there are the different tables listed over here you can click any of these table and you will get the keyboard so let's see for example fact internet sales table it's going to go through the video you can check over here what is inside this table what are the columns and all and once you decide okay this is the keyboard just click on this one then you can also select the corresponding tables once you decided about all other tables that you need you can click transform data if you would like to make any of the changes in terms of data format data types or you won't remove any of the columns or you won't perform anything now let's click on transform data and here you can see your editor is available over here now I have distribution tables it's saying distress so you can ignore it or if you would like to research you can refresh all of these tables now I can see that all the data has been loaded in the game day table I want to change this one column name which is my full date key so you can just double click over there and rename it that's our name then you will see there are lots of unnecessary fields in this table which I don't want as I mentioned in the agent.tech I'm gonna show you how to reduce the data size so this is another way that you can reduce the data size from your power editor model so what you can do you can just remove the unnecessary columns from this table so there are a couple of options to remove them either you can just use this remove column option or you can also click on this choose column over here and select choose columns so after that you can uncheck whichever you don't need so like this you can choose the column so here whichever you need whichever you don't need I just click ok otherwise there is one more option you can just click on this remove columns and there is remove column so remove other products that means if I select this column over here and I say remove other columns all other columns are going to remove so for example let's click so if you want to only select one column other columns have been removed now if you want to undo this step just come over here on your right hand side and remove this step from here so now you are going to get back over here so this is the way you can reduce your data model size after removing unnecessary data from your data model or the tables that you have loaded into your audio now once you make the changes you can also look for the other tables whether you need time or not so I'm not even going to do it quickly for all the tables so guys I have now removed all the NSC columns for my data model also I have checked other settings over here and it looks fine to me now what you have to do you have to come over here and just close and apply now our data has been loaded into Power B and you can see all the tables under this sales pane days in product inside strategory and second one access level if you would like to see how your data model is appearing or what is your data model diagram you can come over here and over here you can even zoom in and zoom out from this bottom slider you can see your all the tables are being over here if you would like to Auto arrange this layout you can do that just click on this plus button at the bottom line and over here just click on this Auto layout it's going to Auto arrange your device over here but right now I don't like it so I can arrange myself to you can switch on these properties over here if you would like to we will discuss this in a bit now you can see all the tables are clearly appearing over here however whenever you are working on power then make sure your table name is very much user friendly that means for business people also can understand these table names for that I really don't think so game product indeed Etc makes sense rather than that you should keep them very neat and clean what you can do you can change these table name for example you just like this table come to this property over here rather than deep product I'll just say this is my product table now we are coming into the next spot where I told you you can format your each field like for example data field how to format them and how to use them for that you have to select your table over here you will see the properties pane this is your table name you can even put the description or you can give the synonyms as well if you would like to here you can even apply the whole level to which row you want to apply the neighbor and then there is a key column so now you can also see whether you want to hide certain column or not or you can go to the advanced settings where you can see what is the storage mode of this table whether it's important is it can be involved and trying to respond now this column I selected over here and as soon as you select this forum it's gonna show you the date time format so you can select over here the D Prime format which I generally expect this one month day and year or you can even select any other format that you would like to use so there are many other formats available so let's select this one so now whenever you would work on this Reformation but also if you will come into the state of preview pane over here also you can select performance so just like this one right now since I already did it it's gonna show you in this format only but you can change from here as well so this is a way that you can change the format and not only for that every other column you can change it like over here it's whole number you can change to other if you would like to so this is a part of data modeling when you have to Define your data types for each and every field yeah here you will notice that there is no relationship between the state table and internet access table there's a reason for that and the reason is very simple power bi can detect those relationships automatically once the name of the fields are same if it's not same you have to do it manually that means yourself and how you can do that you have to connect this date feed over here to the date field over here so what I'm going to do I'm gonna connect this gate field to the order date though there is a ship date as well so it depends whether you want to do it or not so let's do that so now it's going to detect this relationship always make sure that you are working on a star schema model as I mentioned in my previous videos and over here one is one side of the relationship and star Mark is the many side of relationships now this is the active relationship but what if you also would like to make the inactive relationship there are many that functions in power wheel that can use your inaccurate relationship for more videos over text functions please don't forget to check that excellent days on our YouTube channel now if I would like to make one more relationship then how to do that for that you have to again get this date field from here just drag and drop to the ship data and now you can see that once you are going to select this relationship which is with the total lines that is your inactive relationship now one very important point over here for the point of view of the certification or your knowledge or your interview then in a public editor model anytime there can be only one active relationship there cannot be more than one active relationship however in active relationship you can use it in order to Define some of the text calculations like use relationship tags okay now we have talked about the relationships we have these relationships over here into our data model now what we are going to do we are not going to connect this table over here and leaving it as it is reason being this research table you can select the key which you would like to connect with the for example geographically or any other table if you would like to but I'm not gonna do any of these because right now I don't have and I'm drawing is a disconnected table so this is going to be my disconnected table so whenever a table has no relationship with any other keyword which you did another this is going to be a disconnected relationship and these connective tables are very important to use for the guys who are new to our Channel or new over here then I would like to tell you that I have already created certain videos on this connected tables and how you can use them please don't forget to check them out here you can create a new column I'm also going to create that and for that I'm going to show you one very quick trick how you can sort your monthly learning into your date table so let's first do this new column then we will consider calculated tables so for that what I'm going to do uh first I need to select my table where I'm going to create it and then you can either click on these three rows and you can see in the column all on the top you can just click the new button from this property one and I'm going to call it month year so so this is the column I'm creating this is going to help me to fold my month here column and for that I'm also going to use the formatting options as well and for that first we have to use a ear text function this is going to give you the ear from the date column so let's add the date column once you have this you have to multiply it by 100 and then you have to add the month number into this one let's get the month number result for my date column let's like this one and here you can see that I have created this column just click over here or you can hit the enter button so now you can see there's a number and with this number I can sort my month here but right now I don't have one here column so we are going to create one for column and where we are going to use a format text function so here I'm going to say month here and I'm going to use again the date column to format it so format now you have to select first your column so my column is date column select it over here then you have to give comma and here you have to Define your format I need mmm then I need y so this is going to be my new column so this is the way you can create your calculative columns ensure data model for but generally I don't advise to create the technology column over here rather than you should do all the Transformations inside your power query as much as possible otherwise there are other performance consequences now we have to sort this column because I'll show you one thing over here if you will go over here and you get this month here column foreign and now it's not folded actually you can see this it will activate deleted and it doesn't make any sense so let me just create into our chart as well so as you can see it's very easy to create and it visualization in powder you just need to turn it off however these are not enough ascending order but we would like to see month number or month here number in ascending order for that you need to sort inside the keyboard how you can do that so that you have to select this column and you have to sort by this column which we created recently that is monthly account so come to this and here we are going to sort by month years old so let's find where is our column and this is over here now it's working and it's on sort it out so let's check now and here you can see July August September October November December and January February and so on so that means we have created a calculator column with the help of that we sorted out our data into audit keyboard now what I'm going to show you I'm going to show you one very important Concept in the power play we have so many time intelligence text functions for example to the MTD total YTD qtd and if you would like to use them first of all you have to mark your field that is your date field as your cable so what you can do over there you have to first select your field over here for example click on date and here you have to go to the table tools and there you have to mark As date table Source like this one once you will select this Mark as date table because now it's going to consider that this label is your date table but there should be one column which is going to be your D column so select your date column over here is going to validate and it's saying validated successfully and okay if you won't do that then your time intelligence functions are not going to work properly now we have done this and it's going to work perfectly fine so what I'm going to do I'm going to create one more visualization over here which is going to be my month here and my total YTV so how to do that for that I am going to create first new page and under my internet accessible I'm going to create one new measure so now I'm going to create my visualization over here let's select this visualization you can expand it drag and drop very easy to handle you have to mark a table as your date table otherwise now let's check what is inside these relationships so once you will check this relationship over here first you have to select your table be sure what you are expecting then you have to select from another table that means the two labels we are going to Define these relationships now we have cardinality one two one one two many and many two many always remember these then also we have this cross filtering simple or both always try to work on single unless until it's really necessary to work on both both simply means you can filter with any direction for example if I make it both and I'll apply ok then I can set a date table with internet access table or and I can also do some internet sales table I can filter this date table however I don't need it so don't use it over here otherwise it's gonna cost you a lot of performance but you should also remember another thing over here both is going to be used whenever you have to apply security filter in both directions as you can see I select both if I get highlighted over here if you will select single it's not gonna highlight so click ok so this is a one way another way suppose you have so many tables into your data model and you would like manage the relationships for that you can come on this top level there's a manage relationship button and here you can use either Auto detect new relationship if you want to Define selectivity rules and then you can select another table for example you have to first select YouTuber then you have to select your quora where you are going to define a relationship that's like another table and here you have to also select the corner for example you are going to select over here the shifted key then you have to select the cardinality which you want to use and also you can make this relationship active or inactive based on that for example if I select this it's gonna give you one error message over here because there's already a relationship defined over here so that's what you can do over here that's how you can manage your relationships and if you would like to read you can just select this and click delete it's going to delete that relationship as well and after that you can close it so this is another way to manage your relationships into your data model there are many direct functions but we really don't have much of the time to Define all the direct functions for that we should go to our text message tutorial and you should learn all the right functions from there so this is the data modeling in power bi now I'm going to show you one last trick that you can use it for that you have to just click on this bottom plus button and over here you have to go to this field screen Suppose there are hundreds of tables but you would like to see one table how that table is connected with other tables how that table is connected with other tables for example in our case product table I can simply drag and drop over here and what I can do I can just right click over here and I'll say add related table that means all the tables that is related to this product table only so you can just do this and it's only gonna show you which table is connected to this but that doesn't mean the other tables are not existing they are still in the play and this is your old tables which is your main data model right now I'm going to just use running very simple example over here so for example you don't have this data over here I'm going to delete this date even from this data model and then I'm going to create a new detail so what you can do just come over here and delete this from this model it's going to delete this data now I need a data so what I can do you can come here under this data preview pane go to this cable tools over here and create a new table foreign and then you can use calendar functions though so there are two calendar functions one is calendar another is calendar calendar function you have to Define your minimum and maximum date but calendar function would automatically detect protein from your other tables and based on that it's going to generate a dictable for you so as you can see you have created one readable now you can format it once more which is this format that you are going to use and over here you can again start creating the new columns that you would like to use so one last comment is over here that you have to again mark it as a teachable so that you can come under again this two table tools and Market capacity table go there select this date okay so it's going to again validate and next your datable but over here you will see the product internet accessible credit has their own relationship but still their table has no relationship so you have to again Define that relationship over here for that just drag control over here and you can mark it over here wherever you would like to so in this case we have to find there is our date column which is our order date this is our so select it from date to this order again you can rearrange these tables the way you would like to and always try to keep table which are your description so this is your star schema here we have also created table different columns and I have shown you everything I'm sure guys now you know how to work on your data model how to get the data how to define different relationships and this is very easy it's not so difficult and you can do it in a precise manner what is next now guys you can practice yourself and please let us know if you have any video concerned [Music]