Transcript for:
DynamoDB Data Modeling Overview

Hi everyone, welcome to this course on DynamoDB data modeling. This is the introduction. You and I both know DynamoDB is the NoSQL offering from AWS, and when it comes to DynamoDB, data modeling is very different from the traditional relational data modeling we are familiar with. Because of that, most newcomers to DynamoDB try to simulate relational data modeling and end up with huge AWS bills at the end of the month. So we should learn about the best practices and recommendations for DynamoDB before we even start designing our databases. If you do it correctly, DynamoDB will be quite cheap and it will give you millisecond latency at any scale, whether that is 1 GB of data or 10 terabytes of data. Amazing, right? Today I will walk you through a five-step process that will help you with data modeling for your application.

But before that, what really is data modeling? Data modeling is all about how an application stores its data in a given database in relation to real-world entities. There are two types of databases: relational databases and NoSQL databases. I assume you are already familiar with those, but let's take a quick refresher.

Relational databases have been in use for the longest time; MySQL, Oracle, and Microsoft SQL Server are some examples. Relational databases use data normalization to split the data across multiple tables for efficient storage. Storage was one of the main concerns: by splitting the data across multiple tables you eliminate redundant copies of your data and thereby reduce storage. Why did we need to reduce storage? Because storage was quite costly back then, although that is not the case today. That's why relational data modeling is mainly focused on optimizing storage.

When we split the data across multiple tables, we have to join those tables in order to retrieve the data, and that needs more compute power when there are a lot of table joins to be done. One of the limitations of relational databases is that performance degrades as the database scales. One reason is that the complex queries generated by ORM tools can create a significant performance strain; this isn't really true for small databases and small applications, but for applications like social media platforms that deal with big data, it very much is. Relational databases also typically have a strict schema, so you cannot have items with different numbers of attributes or different types in a single table.

In contrast, NoSQL databases are optimized for compute rather than storage, because nowadays storage is quite cheap but compute is not. So you can already guess what NoSQL databases do: they keep duplicate copies of data and minimize table joins, which reduces the compute power required to retrieve the data. NoSQL databases also provide flexible schemas, so items with different numbers of attributes can live in the same table, and the main advantage of NoSQL databases is that they scale very well horizontally. I am not going to go deep into horizontal scaling at this moment.

Now that we know the differences between SQL and NoSQL, let's talk about DynamoDB. You already know DynamoDB is the NoSQL offering from AWS. The main advantage of DynamoDB is that it provides consistent performance at any scale.
DynamoDB guarantees millisecond latencies for its operations: if you design it well you can get latencies in the 10-to-20-millisecond range, and you can reduce that further to the microsecond level by putting caching on top of DynamoDB. For caching you can use DAX, the DynamoDB Accelerator, and you can do it with minimal changes to your DynamoDB queries.

Now, there are some main points to remember, or best practices, when data modeling with DynamoDB. As I've already mentioned, you should forget about traditional relational data modeling; you should not fake relational data modeling at all. If there is one thing I really want you to take away from this presentation, it is this: you must identify almost all the access patterns for your application before designing your data model. This is very important. You should spend as much time as you need to identify them. Of course, in reality you might not be able to identify each and every access pattern, but try to identify most of the access patterns that will be used in your application. Once you have identified these access patterns, you probably need just one table for your application; Amazon also says in the documentation that a single table is enough for most applications. We don't want to provision read capacity units and write capacity units for many tables and increase our DynamoDB bill, right?

Once you have identified the access patterns for your application, the next major thing is to decide on the primary keys and indexes that satisfy those access patterns. You identified the access patterns; now we have to satisfy them. You should aim to retrieve multiple related items in a single request; for example, if you are querying for a blog post, try to get its comments as well if they are required.

I want to highlight this point again: most applications require just one table, and this single table contains records for all your application entities. For example, say you are modeling a software company with three entities: the company, projects, and employees. The company data, project data, and employee data all have to be stored in the same table for efficient queries. So how do you ensure that the data can be retrieved efficiently? That is done by identifying the most suitable primary key, which consists of hash and range keys. If the primary key is insufficient to satisfy all the access patterns, then you should consider using secondary indexes.

I think you'll agree with me if I say there is a lot to consider before data modeling with DynamoDB. Because of that, I will show you five steps that will help you model data with DynamoDB for your applications. Number one: draw an entity diagram. This really helps you identify the main entities of your application, so you can easily map between real-world entities and your application entities. Number two: identify the relationships between these entities. That will give you a lot of clues for data modeling; for example, if you see a many-to-many relationship, you can start thinking about inverted indexes. Number three: list down all the access patterns for each entity. This is the most important step of all; you should spend enough time here and identify almost all the access patterns required by your application. These could relate to CRUD operations, filters, and so on. Number four: identify the most suitable primary key, the hash key and the sort key, for each of your entities.
With proper hash and sort keys you can satisfy most of the access patterns even without having any indexes. And number five: identify the secondary indexes for any additional access patterns, if required. Sometimes your primary key is just not enough to cater for all the access patterns, and then you should start thinking about local secondary indexes, or LSIs, and global secondary indexes, or GSIs.

Now let's put that into practice and come up with a data model for an example use case. Imagine that you are given the task of building a multi-tenant project management tool. This is a SaaS product, so any organization can sign up for an account online; multiple organizations can create accounts in the tool, which means you need multi-tenant support. The two main objectives of this tool are to manage an organization's projects and to assign employees to those projects.

In order to start data modeling for this application, let's begin with step number one: draw an entity relationship diagram. There are three main entities in our application: organizations, projects, and employees. Let's look at the attributes of each entity. An organization has a unique organization ID, a name, and a tier that the organization belongs to; this could be the free tier, professional, or premium, something like that, and you choose the tier when signing up for the application. Then we have projects: there is a project ID to uniquely identify a particular project, a name, and a type. There are two types of projects, agile projects and fixed-bid projects. There is also a project status, which we mainly update when the project is put on hold. Then we have employees: an employee has a unique employee ID, an employee name, a date of birth, and an email.

In the diagram you can also see the entity relationships. Between organizations and projects there is a one-to-many relationship: one organization can have multiple projects. Similarly, one organization can have multiple employees, so there is another one-to-many relationship between organizations and employees. But between employees and projects there is a many-to-many relationship, because one employee can be part of many projects and one project can have many employees. In the next step we will break these relationships down further so we can understand how to do data modeling for this kind of project.

Step number two: identify the relationships. We have these three entities: organizations, projects, and employees. I have broken down that many-to-many relationship into two one-to-many relationships by introducing another entity called ProjectEmployee. Between employees and ProjectEmployee there is a one-to-many relationship, and between projects and ProjectEmployee there is again a one-to-many relationship. Even at this stage we can see that we will need an inverted index: one side of the many-to-many relationship can be queried with the table's primary key, but to query the other side we will have to use an inverted index. More on that later.

OK, now let's move on to the third step: identifying the access patterns. There are three main entities plus one sub-entity, so let's consider each of them and find out what the access patterns are. If we start with organizations, first we should be able to do all the CRUD operations on an organization.
We should be able to create an organization, read an organization, update an organization, and delete an organization; that is one access pattern. Then we should be able to find all the projects of an organization — remember there is a one-to-many relationship from organizations to projects, so we should be able to find all the projects belonging to an organization. Similarly, we need all the employees belonging to an organization, and on top of that we should be able to find all the projects and employees together for any given organization. There is one more access pattern: finding an organization by name, not by its unique organization ID but by its name, just like a filter we would use in our front end. Let's say these are the main access patterns we have for an organization.

Moving on to projects, there are again CRUD operations: create, read, update, and delete for one particular project. Then we have another front-end filter to satisfy, which is finding a project by name. We also need to find the employees assigned to a particular project, and remember there are two types of projects, agile and fixed-bid, so we should be able to find all the agile projects as well as all the fixed-bid projects. The last one is finding the on-hold projects.

Now let's move on to employees. Similarly, we have CRUD operations for an employee: create, read, update, and delete. Then we need to find all the projects an employee is part of, because there is a many-to-many relationship between projects and employees and we broke it down into two one-to-many relationships. Finally, we should be able to find an employee by name, the front-end filter. That covers the three main entities, but there is the other entity, ProjectEmployee, and its main access pattern is assigning an employee to a particular project.

Step number four is about identifying the primary key for our table. Now that we have a good understanding of the main access patterns, let's decide what primary keys we should use to identify these entities. We are going to have a single table for the application, and the primary key for that table is a composite of a partition key and a sort key. The partition key is also known as the hash key, and the sort key is also known as the range key. I assume you are familiar with DynamoDB's hash (partition) key and range (sort) key, but just in case: the hash or partition key determines which partition the data is stored in behind the scenes. To identify the partition for a particular item, DynamoDB takes the value of the partition key — for example, say the partition key of an organization item is ORG#123 — passes it through a hash function, and the result of that hash function determines which partition the data should be stored in or retrieved from. Since it uses a hash function on this value, we call this key the hash key, or the partition key, because it determines the partition of that particular data item. Then we have the sort key, which is about arranging the items within a partition: DynamoDB uses this key to order items within a partition, and we can also use the sort key for richer queries, as we will see in a bit. Since we are using both a partition key and a sort key as our primary key, the combination of these two keys has to be unique.
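To make that composite-key idea concrete, here is a rough sketch of how items for the different entities might sit side by side in a single table. The key formats mirror the ones used later in this course, but treat the exact attribute names and values as illustrative.

```js
// Illustrative only: sample items that could coexist in one DynamoDB table.
// PK (partition key) groups items by organization; SK (sort key) identifies
// the entity type and ID, so every PK + SK combination is unique.
const sampleItems = [
  { PK: 'ORG#1234', SK: '#METADATA#1234',           name: 'Happy Inc', tier: 'premium' },  // organization
  { PK: 'ORG#1234', SK: 'PRO#AGILE#100',            name: 'Project X', status: 'active' }, // project
  { PK: 'ORG#1234', SK: 'EMP#300',                  name: 'Manoj Fernando' },              // employee
  { PK: 'ORG#1234#PRO#100', SK: 'ORG#1234#EMP#300', dateOfJoin: '2020-01-01' },            // project-employee assignment
];
```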
We have two main objectives when deciding on this primary key, the hash and sort keys in this case. Number one, the primary key should be able to uniquely identify each entity within our single table, because otherwise we cannot use a single table to store all these entities. Number two, we should choose the primary keys for each entity so that we can retrieve multiple related items with a single query.

I hope you have noticed that we have different prefixes, for example ORG# and PRO#; I've used a different pattern for each entity. What is the main reason for this? Unlike a relational database, where we query multiple tables, gather the data, shape it to the query received from the client, and then send it back, in a NoSQL database like DynamoDB we store the data in the shape of the query, so that we can use these string patterns to retrieve multiple related items in a single request and thereby increase query performance. We will see examples in a bit.

Let's focus on the organization entity. Here the partition key is ORG# followed by the organization ID, and the sort key starts with #METADATA#, which is basically a placeholder, followed by the organization ID. Now let's look at some example queries for organizations; these are the access patterns we identified earlier. In order to do the CRUD operations for an organization — create, read, update, and delete — we should be able to uniquely identify it, and for that we can use the partition key and the sort key. Look at this example: say there is an organization with organization ID 1234. We can use the partition key and the sort key to uniquely identify it: PK = ORG#1234 — the prefix pattern of the partition key followed by the organization ID — and SK = #METADATA#1234, the placeholder plus the organization ID. Since we have given exact values for both the partition key and the sort key, there is only one matching record in the table; remember that the partition key plus sort key, our primary key, is unique. That way we can identify this organization and perform all the CRUD operations.

Now let's look at the next one: find all the projects of an organization. Here we are expecting multiple projects, not a single item, so we are going to use the query operation. As the partition key for that query we use PK = ORG#1234, and on the sort key we use the begins_with operator: give me all the items whose sort key begins with PRO# within this ORG#1234 partition. Similarly, we can use the next query to find the employees of an organization — both organization-to-projects and organization-to-employees are one-to-many relationships — so again the partition key is ORG#1234 for this organization, and we ask for items whose sort key begins with EMP#, which returns only the employee records. If I want to get both employees and projects, then I simply don't provide any condition on the sort key: I give only the partition key, ORG#1234, and it returns both the project items and the employee items. How about this access pattern, finding an organization by its name? With this partition key and sort key, that is still not possible. Now let's move on to the example queries for projects.
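As a reference, a query like "find all projects of an organization" might look roughly like this with the Node.js DocumentClient; the table and key names follow this course's example, so treat them as assumptions.

```js
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

// Find all projects of organization 1234: exact match on the partition key,
// begins_with on the sort key. Swap 'PRO#' for 'EMP#' to get employees instead,
// or drop the begins_with clause entirely to fetch the whole partition.
docClient.query({
  TableName: 'HappyProjects',
  KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
  ExpressionAttributeValues: { ':pk': 'ORG#1234', ':sk': 'PRO#' },
}, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Items);
});
```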
Here is the project entity. The partition key for a project is ORG# plus the organization ID, and for the sort key we use the prefix PRO# for project, then the type of the project — because, if you remember, we have queries to find the different types of projects, and for that we need the type in the key — and then the project ID.

I want you to pay attention to the partition key selection for our entities; let's go back a couple of slides. For organizations, projects, and employees we have the same partition key, ORG# plus the organization ID. This is because, without an organization, these entities would not exist: you can't have a project without an organization, and the employees all belong to an organization. So there is a hierarchy among organizations, projects, and employees; the organization is the parent, and the projects and the employees are its children. All of these queries run within an organization — if you are querying for a project or an employee, it is always within an organization — and that's why we use the organization ID as the partition key for projects and employees. Otherwise your query would return results belonging to other organizations as well. OK, let's go back to where we were.

In order to do the CRUD operations for a project, the partition key and sort key selection can be as follows. Let's say the organization ID is 1234 and the project ID is 100. To uniquely identify a project we have to give exact values for the partition key and the sort key: the partition key value is ORG#1234, matching the pattern, and the sort key value is PRO# plus the type — let's say this is an agile project — plus the project ID, 100. That returns a single item, and then you can perform CRUD operations on it. We still can't find a project by name, and we still can't find the on-hold projects.

Let's take a couple of other examples. Look at this one: find the agile projects. Here we are expecting a set of projects to be returned, so the partition key is ORG#1234, and on the sort key we again use the begins_with operator: sort key begins with PRO#AGILE. We are giving values only for the first part of the pattern, PRO# plus the type AGILE, and we are not specifying the project ID, which is why we use begins_with; it returns all the items in the 1234 partition whose sort key begins with PRO#AGILE. Similarly, to find all the fixed-bid projects we use begins_with with PRO# plus the type FIXEDBID. I hope that makes sense.

Let's quickly cover employees as well. We can do the CRUD operations in a similar manner, but we still cannot find all the projects that an employee is part of, and we also cannot find projects or employees by name. So now let's move on to step number five in order to satisfy these remaining access patterns.

Step number five: identifying the secondary indexes. Yes, we are going to use secondary indexes to satisfy those remaining access patterns. This access pattern, finding all the projects an employee is part of, is one of the unsatisfied ones. If you remember, I told you we have to use an inverted index to query the other side of a many-to-many relationship: one side can be queried using the table's primary key, but to query the other side we have to use an inverted index. There are two tables shown here; the top one shows the ProjectEmployee entity from our table.
I've isolated only that entity here. Its partition key looks like this: ORG# plus the organization ID — because we always need to scope by the organization ID — then another prefix, PRO#, plus the project ID. That is the partition key, and the sort key is ORG# plus the organization ID, then EMP# plus the employee ID.

Now, if I want to get all the employees belonging to a project, I can simply issue a query where the partition key equals ORG# plus the org ID, say 1234, then #PRO# plus the project ID, say 100. I don't have to specify any sort key condition, so it returns all the employee items related to that particular project. With this primary key we can query one side of the many-to-many relationship: finding all the employees belonging to a project. But how about finding all the projects an employee is part of? To make that work, what we need is to switch the ID in the sort key over to the partition key, and the partition key value over to the sort key; in other words, we need to invert it. For that we are going to create a GSI, a global secondary index, which we'll call the project-employee index, and as the partition key for that GSI we use the sort key of our table — the organization ID plus the EMP value, as you can see — and for its sort key we use the table's partition key. Now we can easily query this GSI to find the projects an employee is part of. Here's an example: the partition key is ORG#1234, and let's say there is an employee with employee ID 300, so #EMP#300. We don't give any sort key condition, and it returns all the items that this employee is part of.

OK, now let's look at another example: finding organizations, projects, and employees by name. This is another access pattern we couldn't satisfy so far, and as I told you earlier, our front end has a filter to look up projects, employees, and organizations by name. For that we are going to use GSI overloading. When we define a GSI on a table, we can use attributes as the partition key and sort key of that GSI that are different from the partition key and sort key of the table. So look at this example: as the partition key of our GSI I am using the partition key of the table itself, but for the sort key I am not going to use the SK attribute; instead I am going to use another attribute called filterName. This is a separate attribute, and when I insert organization, employee, and project records into the table I give this attribute a specific prefix. Say I am adding an organization: the primary key is ORG# plus the organization ID, and for filterName I use the prefix ORG# plus the organization name. If I am adding an employee record, the partition key is ORG# plus the organization ID, and for filterName I add EMP# plus the employee name; similarly, for a project it is PRO# plus the project name. Now we can issue queries against this GSI like so: if I want to find an organization by its name, I query the GSI where the partition key equals ORG#1234 — it matches here — and the filterName attribute equals ORG#HappyInc, say the organization name is Happy Inc. So I don't have to use the organization ID for the lookup, just the organization name.
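If you were declaring the project-employee inverted index described above in code rather than in the console, the GSI definition might look roughly like this with the low-level SDK client. The index and table names are the ones used in this course; the capacity numbers and everything else are illustrative.

```js
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB({ region: 'us-east-1' });

dynamodb.createTable({
  TableName: 'HappyProjects',
  AttributeDefinitions: [
    { AttributeName: 'PK', AttributeType: 'S' },
    { AttributeName: 'SK', AttributeType: 'S' },
  ],
  KeySchema: [
    { AttributeName: 'PK', KeyType: 'HASH' },   // table partition key
    { AttributeName: 'SK', KeyType: 'RANGE' },  // table sort key
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 },
  GlobalSecondaryIndexes: [
    {
      IndexName: 'project-employee-index',
      KeySchema: [
        { AttributeName: 'SK', KeyType: 'HASH' },   // inverted: the table's SK becomes the GSI partition key
        { AttributeName: 'PK', KeyType: 'RANGE' },  // and the table's PK becomes the GSI sort key
      ],
      Projection: { ProjectionType: 'ALL' },
      ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 },
    },
  ],
}, (err, data) => console.log(err || data));
```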
Similarly, if I want to find an employee by name, the partition key is ORG#1234 and the filterName attribute equals EMP# plus the name; let's say I am searching for the employee called Manoj, so EMP#Manoj. I don't have to use the employee ID here, and now we can satisfy this access pattern as well.

That said, if you have a lot of text queries like this — finding by name, by age, lots of text searches — there is a better approach. Here we basically use an Elasticsearch cluster. What we do is enable DynamoDB Streams so that any time we insert, delete, or update a record, DynamoDB emits a stream event, and we use a Lambda function to intercept that stream data, identify the operation — was it a create, delete, or update — and then index that data item in the Elasticsearch cluster, or the Elasticsearch service. Then, when I want to do a lot of text searches, I query the Elasticsearch service instead of DynamoDB, because Elasticsearch is made for advanced text search; it uses Lucene indexing to do lightning-fast queries. So I do all my CRUD operations — creating records, updating, and so on — directly against DynamoDB, but when I want to run filter-style, text-based queries, I query Elasticsearch directly.

OK, let's move on. We have another access pattern we couldn't satisfy yet: finding on-hold projects. Imagine that an organization has 100 projects and three of them go on hold, maybe due to some budget restrictions or whatever. How do we query for those on-hold projects? One option, obviously, is to define an LSI, a local secondary index, on this particular attribute, isOnHold; that's one way. But here I want to introduce another way of doing it, a pattern you see in DynamoDB called sparse indexing. Sparse indexing is nothing but another global secondary index where we select the isOnHold attribute as its partition key. Remember, only three projects out of those hundred are on hold, so what we do is set the isOnHold attribute to true — or any other value, for that matter — on those three on-hold projects only. Behind the scenes, a global secondary index maintains a separate table with the indexed items, and since we set the isOnHold attribute on only those three projects, only three items get written to that index table; it does not write the other 97 items. That is what we call sparse indexing — GSIs are, in general, sparse. What we can do next is use either query or scan on this GSI, and yes, we can use scan, not just query, because that index contains only the records for on-hold projects and nothing else.
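Going back to the streams-plus-Lambda idea mentioned a moment ago, a minimal handler sketch might look something like this. It assumes the stream is configured to include new images, and `indexInElasticsearch` / `removeFromElasticsearch` are hypothetical helpers standing in for whatever Elasticsearch client calls you would actually use.

```js
const AWS = require('aws-sdk');
const { indexInElasticsearch, removeFromElasticsearch } = require('./es-helpers'); // hypothetical helpers

// Lambda handler attached to the table's DynamoDB stream.
exports.handler = async (event) => {
  for (const record of event.Records) {
    // Stream images arrive in DynamoDB's attribute-value format; unmarshall to plain JS objects.
    const keys = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.Keys);

    if (record.eventName === 'REMOVE') {
      await removeFromElasticsearch(keys);
    } else {
      // INSERT and MODIFY both carry the new state of the item (assuming NEW_IMAGE is streamed).
      const item = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);
      await indexInElasticsearch(item);
    }
  }
};
```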
OK, now we have gone through all five of our steps, but if you have some experience with DynamoDB you might remember there is another thing called a filter expression. Filter expressions are used when you want to further filter results by non-key attributes, and the important thing to understand about a filter expression is that it is applied after the query finishes. Take the same example we discussed for sparse indexing, but instead of defining a GSI on the isOnHold attribute you use a filter expression: in the query you say filter by isOnHold = true and expect those three results. Of course you will get those three items, but before it does that final filtering it reads all 100 records in the table, so you pay the read capacity units to read all 100 records. That's why the documentation says a query consumes the same amount of read capacity regardless of whether a filter expression is present. So we have to be careful about when to use a filter expression. One common pattern is to do all the filtering we can with the partition key and the sort key and then apply a filter expression to the results that come back; that is fine. There is another use case for a filter expression: when a secondary index would cost more than the filtering, which can happen when the query velocity or frequency is low. If you define a GSI you have to allocate capacity, RCUs and WCUs, for it, and if that GSI is not used frequently you are spending money on it. So check the velocity and frequency of the queries against that GSI; if they are very low, do the cost comparison, and maybe the filter expression works out cheaper. OK, that is all the content I wanted to cover here; in the next part, let's implement this example using the DynamoDB Node.js SDK.

Now let's implement our example use case, the online project management tool. We have identified the main entities — organizations, projects, employees, and project employees — and we have also identified the partition key and the sort key for each entity. So let's go to the AWS console, select the region us-east-1, and create a DynamoDB table. Let's name the table HappyProjects, given that it is a project management tool. I select the partition key as PK, of string type, and I also add a sort key SK, of string type. I am going to use provisioned capacity here, without auto scaling, so I will just provision one RCU and one WCU, and then I will create the table.

Now open up Visual Studio Code in an empty directory, create a file called index.js, and initialize a Node.js project with npm init. We are going to use the JavaScript SDK to communicate with DynamoDB, so we need the aws-sdk package installed; I am installing aws-sdk together with dayjs for date manipulation, and I will also add uuid for unique ID generation. Once that is done, I create a constant AWS by requiring aws-sdk, then set the AWS region to us-east-1, or North Virginia — you can find the region code in the AWS console, in the address bar. I also require uuid up here and assign it to another constant, and then let's create a client instance to communicate with DynamoDB: we are going to use the DynamoDB DocumentClient, which is a much simplified client for interacting with DynamoDB.

OK, so now let's go about creating an organization. What are the partition key and the sort key that uniquely identify an organization? Let's go back to our slides and find out: for an organization, the partition key is ORG# plus the org ID and the sort key is #METADATA# plus the org ID; we discussed this in the previous lesson. Now let's look at the DynamoDB SDK documentation — you can simply google it and click the first link that appears. This is the documentation for AWS DynamoDB, but we are using the much simplified wrapper library called the DocumentClient, which you can find right here; click on that, and these are the methods supported by the DynamoDB DocumentClient.
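For reference, the initial setup just described might look roughly like this, assuming the v2 aws-sdk and the uuid package; the exact variable names are my own.

```js
// index.js — initial setup, roughly as described above
const AWS = require('aws-sdk');
const { v4: uuidv4 } = require('uuid'); // unique ID generation

// Point the SDK at the region where the HappyProjects table was created
AWS.config.update({ region: 'us-east-1' });

// The DocumentClient is a simplified client that works with plain JS objects
const docClient = new AWS.DynamoDB.DocumentClient();
```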
We are going to use the put method; it allows us to create or replace an item in a DynamoDB table. I'll click the link and copy the example code, then go back to the editor and paste it in. Now we need to edit the attributes: for example, the table name should be HappyProjects, and the hash key, or partition key, is PK, whose value — we can find it from the slides — is ORG# plus the org ID. Before that we need an org ID, so let's use our uuid library to generate one; that gives us a unique ID, and I will type 'ORG#' and append that org ID. Then I'll remove the rest of the example attributes. What I need next is the sort key, or range key, which has the format #METADATA# plus the org ID, so let me add that as well. I will also add the name as another attribute — let's call this company Happy Inc — and we can add other attributes if we want. Then I pass this params object to the DynamoDB put operation and console.log the output.

Now, there is an important step which is not shown here: in order to communicate with AWS services from your local machine, you have to install the AWS CLI and configure it with your IAM user credentials; I'll post a link below. Once that is done, let's run our application with node index.js, and if it is successful it should return an empty object. Let's check it in the DynamoDB console — and there we go, we have created our very first record, of organization type. If I open up the item, you can see the name, PK, SK, and the tier information we added. Now let's create another organization with different information; for example, let's call it ABC Incorporation, with the professional tier. I execute it with node index.js again, it returns an empty object, I refresh, and there you go: now I have two organizations.

Next, let's add projects to these organizations. This is the key pattern I am going to use, with the partition key and the sort key, and let's create a project in Happy Inc first. I'll copy the code from the earlier put call. I have to use ORG# plus the org ID, and for that I need to find the org ID of Happy Inc, so let's go to that record; here is the org ID after the hash, let me copy that and close this. I'll keep it in a constant, which I'll call happyOrgId, and just paste in that value. I also need a project ID, and we'll get to that in a bit. Here I change the ORG# plus org ID to Happy Inc's, so the partition key is fine. Now the sort key: it has to start with PRO#, then the type of the project — let's say this is an agile project, so AGILE — and then the project ID. I'll generate another unique ID for the project ID using our uuid library, grab that project ID, and put it there. Let's set the project name to Project X, and I will also keep the project ID in a separate attribute. Afterwards I execute this with node index.js again, and if I refresh I should now see three items — there we have our Project X record.
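Putting the pieces described above together, the put call for creating an organization item might look roughly like this; attribute names beyond PK and SK are my own choices.

```js
const AWS = require('aws-sdk');
const { v4: uuidv4 } = require('uuid');
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

const orgId = uuidv4(); // unique organization ID

const params = {
  TableName: 'HappyProjects',
  Item: {
    PK: `ORG#${orgId}`,        // partition key: ORG# + org ID
    SK: `#METADATA#${orgId}`,  // sort key: placeholder + org ID
    name: 'Happy Inc',
    tier: 'premium',           // illustrative extra attribute
  },
};

docClient.put(params, (err, data) => {
  if (err) console.error(err);
  else console.log(data); // an empty object on success
});
```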
Now, I don't always want to open my organization record and copy the organization ID out of the key, so instead I want a separate attribute called orgId on the record. Let's edit that organization record, and for that I will use the update method. Here I have the Happy Inc organization ID, the table name is again HappyProjects, and the hash key is PK, whose value has the format ORG# plus the org ID; I replace the org ID with my Happy Inc org ID. Now we have to specify what the sort key is — remember, we need both of these keys to identify a record uniquely — so here it is also built from the Happy Inc org ID. Then, as our update expression, we need to add a new attribute, so we write SET #orgId = :orgId, using placeholders. Now I give the names for these placeholders: the attribute name placeholder #orgId maps to orgId, and for the attribute value placeholder :orgId I use the Happy Inc org ID. With that, I will have a separate attribute called orgId that I can easily copy from my table. Let's execute it with node index.js, and if I refresh I should see a new attribute on my organization — there we go, orgId, with the organization ID. So now I don't have to go into the record and copy the part after the hash. I will copy the org ID for ABC as well and create an orgId attribute for the ABC organization; it's the same thing, the only difference being the ABC org ID, so let me replace it with the ABC org ID here and execute it, and now ABC has an orgId attribute as well — there it is. Let me also create a couple more projects to query against: now I have two agile projects, so let's create some fixed-bid projects as well, and I will assign some projects to the ABC company too. So now I have a set of projects assigned to both of these companies.

Now it's time to run the queries, our access patterns. The first query is finding an organization, and the method I'm going to use is the get method. You can easily copy the example code and change the parameters right here: the table name is HappyProjects, and the partition key is in the format ORG# plus the org ID, so I'll copy the org ID from the attribute we updated earlier. We also have to provide the sort key in order to uniquely identify this particular organization, Happy Inc, and the sort key is in the format #METADATA# plus the organization ID. I think we are good to run this query, so let's run it and see if we get the details of Happy Inc — there you go, it returned the Happy Inc organization data.

Now let's run another query: finding all the projects assigned to an organization. We'll use the same Happy Inc and try to find all the projects assigned to it. This time I am expecting more than one result, so I'm using the query method — earlier we used get because we were expecting a single result, but this time it's a query. In the query we define the table name as usual, and we are not querying any index, so I take that parameter out. Then we have to specify the key condition expression. This is about finding all the projects of an organization, and since projects belong to a particular organization, the partition key is ORG# plus the organization ID, in this case Happy Inc's. On the sort key we use the begins_with operator, because inside this partition we have project, employee, and project-employee records and we want to filter down to just the projects. There are multiple operators like this — begins_with, between, and so on — and for this query we use begins_with with the project prefix, which is PRO#. I'm going to use expression attribute names to give names to these expression attributes: #pk maps to PK and #sk maps to SK, since those are our actual partition and sort key attribute names. Then it's time to provide the values for these attributes: :pk is in the format ORG# plus the org ID, so ORG# plus the Happy Inc ID, and :sk is simply 'PRO#', the prefix for projects.
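Stepping back for a moment, the update call that adds the orgId attribute, as described above, might look roughly like this; the placeholder names and the orgId value are mine.

```js
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

const happyOrgId = 'HAPPY-ORG-ID'; // placeholder: the Happy Inc organization ID copied from the table

const params = {
  TableName: 'HappyProjects',
  Key: {
    PK: `ORG#${happyOrgId}`,         // both keys are needed to identify the record uniquely
    SK: `#METADATA#${happyOrgId}`,
  },
  UpdateExpression: 'SET #orgId = :orgId',          // add a plain orgId attribute
  ExpressionAttributeNames: { '#orgId': 'orgId' },
  ExpressionAttributeValues: { ':orgId': happyOrgId },
};

docClient.update(params, (err, data) => {
  if (err) console.error(err);
  else console.log(data);
});
```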
Let's run this query and check whether it works — beautiful, it returned Project X, Project Y, and Project F1 of Happy Inc; that is, all of the agile and fixed-bid projects of Happy Inc.

In this video we are going to cover additional access patterns for our DynamoDB data modeling application; we'll cover the inverted index here, and in the upcoming videos we will cover GSI overloading and sparse indexing. So, what is an inverted index? Inverted indexes are used to model a many-to-many relationship. One side of the many-to-many relationship can be queried with the table's primary key, using the partition key and the sort key, but when it comes to the other side of the many-to-many relationship, we can use an inverted index to query it. Let me give you an example, using the same use case we have been discussing. In this project management tool we have the project and employee entities, and between these two entities there is a many-to-many relationship: one project can have multiple employees, and one employee can be part of many projects. In a previous lesson I broke this many-to-many relationship down into two one-to-many relationships using a third, additional entity, which I called ProjectEmployee: there is a one-to-many relationship from employees to ProjectEmployee, and another one-to-many relationship from projects to ProjectEmployee. After that we identified the partition key and the sort key for the ProjectEmployee entity.

Now look at this top table. I've isolated only the ProjectEmployee entity here — the full table also has project, employee, and organization entries, but this view shows only the ProjectEmployee entity. It has a partition key of ORG# plus the org ID, then PRO# plus the project ID. We use the ORG# plus org ID part to express our hierarchy: all projects and employees are part of an organization, so everything has to be scoped within an organization, and that's why it is there. The sort key has a similar format: ORG# plus the org ID again, but instead of PRO it has EMP# plus the employee ID. Using this table's primary key — the partition key and the sort key — I can easily query for the employee list of a particular project, so I can query one side of the many-to-many relationship. All I have to provide is the PK value, ORG# plus the org ID, then #PRO# plus the project ID, and it returns all the records inside that partition; I don't have to enter a value for the sort key, which can be left blank or empty. But when it comes to querying the other way around, to find the projects an employee belongs to, I would want a similar pattern, but unfortunately I cannot leave the partition key empty. That's why I need to define a new GSI, which in this case I call the project-employee index: its partition key is the sort key of the table, and the partition key of the table becomes the sort key of the GSI. Because of this inverse selection, we call it an inverted index.

Now let's look at an example. Right now I am in the DynamoDB table, and I have the two organizations, Happy Inc and ABC Incorporation, that we created in the previous lessons.
Under Happy Inc we have Project X, Project Y, and Project F1; Project X and Project Y are agile projects and F1 is a fixed-bid project. Now let's create some employees and assign them to these projects. Here I am back in the code, and let me uncomment the first block. Here I create an employee called Manoj in Happy Inc: I have the Happy Inc organization ID and I am generating a unique ID for Manoj. As parameters we pass the table name, HappyProjects, and we need to specify the partition key and the sort key for creating an employee entity in our database. In an earlier lesson we discussed that the partition key for an employee is in the format ORG# plus the org ID, just like this, and the sort key of an employee is EMP# plus the employee ID. We can provide other information like the employee name and email, so I add the employee name and email, and then we run the DynamoDB put operation to create the record in DynamoDB. Let's save the changes, open a terminal, and run node index.js — it executed successfully, and if I refresh the table I should see Manoj Fernando created inside our table. Let's create a couple of other people as well: I will add John, and I will create Jane too.

Now let's assign Manoj to both Project X and Project Y, because Manoj can be part of both projects. I'll uncomment this block of code, which is about assigning an employee to a project. Here are the constants: the Happy Inc org ID, the Project X ID, the Project Y ID, and the Project F1 ID. We are focusing on Project X for Manoj, so let's grab Manoj's employee ID: if I go to Manoj's record, the employee ID is right here after EMP#, so let me copy it. Here we use the partition key and the sort key of the ProjectEmployee entity — if you remember, the partition key is ORG# plus the org ID, then #PRO# plus the project ID, and the sort key is pretty much the same except that instead of PRO it has EMP for employee, followed by the employee ID. I can add some additional information as well, like the name of the employee, Manoj, and the project name, Project X, as metadata, plus another metadata attribute for the date of joining. Let's execute that, and if I refresh the table I should find a ProjectEmployee record — here we have it: date of join, name, the partition key and the sort key, and the project name. Now let's assign Manoj to Project Y as well: instead of Project X I specify Project Y and execute the same command, so now he is assigned to both of these projects. Next, let's get John's ID and add him to Project X only — I grab John's ID right here after EMP#, and at the same time I'll grab Jane's ID as well. So now let's assign John to Project X, and then I add Jane to Project Y: instead of John it's Jane, and instead of Project X it's Project Y.

OK, now that we have assigned employees to projects, let's see how to query the employees assigned to a particular project. Let me uncomment this code; this is about finding the employees assigned to, in this case, Project X, so I have the Happy Inc ID and the Project X ID. These are the parameters I am passing down to the DynamoDB query: the table is HappyProjects, and the key condition expression is partition key #pk = :pk. So what is the PK value? It is nothing but the partition key of our ProjectEmployee entity, which is ORG# plus the org ID, then #PRO# plus the project ID. We don't have to specify a sort key, and this will query one side of the many-to-many relationship between projects and employees.
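A rough sketch of the assignment put call described above might look like this; the ID constants are placeholders and the metadata attribute names are my own.

```js
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

const happyOrgId = 'HAPPY-ORG-ID';   // placeholder values copied from the table
const projectXId = 'PROJECT-X-ID';
const manojId = 'MANOJ-EMP-ID';

// Assign Manoj to Project X by writing a ProjectEmployee item
const params = {
  TableName: 'HappyProjects',
  Item: {
    PK: `ORG#${happyOrgId}#PRO#${projectXId}`,  // organization + project
    SK: `ORG#${happyOrgId}#EMP#${manojId}`,     // organization + employee
    employeeName: 'Manoj Fernando',
    projectName: 'Project X',
    dateOfJoin: '2020-05-01',                   // illustrative value
  },
};

docClient.put(params, (err, data) => console.log(err || data));
```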
So let's run this and check whether we get all the employees assigned to Project X — great, I got both John and Manoj, who are assigned to Project X. But hang on, how do we find the projects that Manoj is part of? For that we have to use the inverted index, so let's create one. I go to the Indexes tab — I have already created this index; for its partition key I selected the sort key of the table, and for its sort key I selected the partition key of the table. How can you do it from scratch? Click Create index, and for the partition key type the sort key of the table, SK; then tick Add sort key and type PK for the sort key, so it is the inverse. You can give the index any name, which is the index name you will refer to when querying. I am going to project all the attributes — you could also pick keys only, or include specific attributes — but for this demo let's pick All. You also have to specify the RCU and WCU for this index, because it is a GSI and GSIs have their own capacity units, and then click Create index. I will cancel this out since I already have it.

Now let's query this index in order to get the projects that Manoj is part of. I comment out the previous code block and uncomment the last code block here. We need the ID of Manoj, which I can copy from here, and this is the organization ID. These are the parameters we pass down to our DynamoDB query: as usual we have to specify the table, which is HappyProjects, and in addition to the table name we have to specify the index name as well — our index name is project-employee-index, this name here. As the key condition we use the SK attribute, because the SK attribute is the partition key of this index; I can only pass a value for SK, and I don't need to specify any value for PK. So what is the value of SK? I can simply say ORG# plus the org ID, then EMP# plus the employee ID, which is Manoj's. This should return all the projects that Manoj is part of — this is the other side of the many-to-many relationship between projects and employees. Let's run it: Manoj is part of Project X, and Manoj is also part of Project… this should be Y. I think I didn't set the project name when I was assigning Manoj to Project Y; I copy-pasted the code and forgot to change the name. But you can see the difference: this project ID starts with eb and this one starts with 25, so these are two separate projects. And that's how you model many-to-many relationships using an inverted index.

Today we are going to talk about GSI overloading. A DynamoDB table doesn't need to have a strict schema. Unlike relational databases, where the schema has to be uniform and strict, a DynamoDB table can hold different kinds of data items — for example projects, employees, and organizations in the same table — and the number of attributes in these items can also differ. In fact, the same attribute in two different data items can hold entirely different kinds of values; they don't have to be the same at all. In GSI overloading we exploit this ability to store different types of data in a single attribute in order to minimize the number of GSIs required for a table while still covering different access patterns. Let me explain it a little more. In general, we can have common attribute names, like data, across the items in a DynamoDB table.
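As a reference, the query against the project-employee-index just demonstrated might look roughly like this in code; the index and key names follow the demo, and the ID constants are placeholders.

```js
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

const happyOrgId = 'HAPPY-ORG-ID'; // placeholders
const manojId = 'MANOJ-EMP-ID';

// Find all projects that Manoj is part of, using the inverted index
const params = {
  TableName: 'HappyProjects',
  IndexName: 'project-employee-index',              // query the GSI, not the table
  KeyConditionExpression: '#sk = :sk',               // SK is the partition key of this index
  ExpressionAttributeNames: { '#sk': 'SK' },
  ExpressionAttributeValues: { ':sk': `ORG#${happyOrgId}#EMP#${manojId}` },
};

docClient.query(params, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Items); // one ProjectEmployee item per project Manoj belongs to
});
```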
Even though the attribute name is the same — data — the value of that data attribute can be different from item to item. Have a look at this example from the DynamoDB documentation. The first data item is about an employee, so its data attribute holds the name of the employee, Murphy, John. In the second data item, the value of the data attribute represents owed totals in USD, which is a completely different kind of value, and in the third data item it represents a date. You get the idea. In GSI overloading, what we do is create a GSI, a global secondary index, and designate this data attribute as the sort key of the GSI, and in fact we add prefixes in front of these data values as well: for example, Murphy, John is an employee name, so the prefix could be EMP# — it could be anything — and for the date the prefix could be DATE#. Once we have prefixed these data values, we can query the index with DynamoDB operators like begins_with, between, and so on. We can say begins_with EMP# to return all the data items whose entity type is an employee name. This is what we call GSI overloading, and it reduces cost by reducing the number of GSIs required for the table. Here again, the point is that the data attribute represents different types of data, as we already discussed.

Looking back at the project management tool we have been building, it has these three types of entities: organizations, projects, and employees, and there was a requirement for front-end filters, which we already discussed. A quick recap: we need to get organizations by name, we should be able to get projects by their project name from the front end, and similarly we should be able to get employees by their name, because users of a web front end can easily type names rather than remembering the IDs of organizations, projects, or employees. So how do we go about implementing these filters? As you have probably guessed, we are going to use GSI overloading. The first step is to create a common attribute; in this case let's call it data, though it could be anything. In this data attribute we prefix a constant value to identify each entity: for an organization entity we prefix it with ORG# and the rest of the value is the organization name; for a project the prefix is PRO# followed by the project name; and similarly for an employee it is EMP# followed by the employee name. So when we insert records into this table — say we are inserting a new organization — we simply set the data attribute to ORG# plus the organization name, and we do the same for project and employee records.

Once we have data stored like that, we can simply create a new GSI, named, for example, filter-by-name. For this GSI we define the partition key as the partition key of the table itself — it can be the same — but for the sort key we point to the new data attribute. Now you might have a question: since we are using the same partition key as the table, does it have to be a GSI, or could it be an LSI, a local secondary index? Of course it could be a local secondary index, but an LSI comes with a storage limitation (10 GB per partition key value), so it is safer to use a GSI. So now, how can we satisfy our filter queries?
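To visualize the overloaded attribute, here is roughly what the items might look like once the data attribute is populated; the values are illustrative.

```js
// The same data attribute holds a different kind of value on each entity type,
// always prefixed so the filter-by-name GSI can be queried with begins_with.
const overloadedItems = [
  { PK: 'ORG#1234', SK: '#METADATA#1234', data: 'ORG#Happy Inc' },       // organization name
  { PK: 'ORG#1234', SK: 'PRO#AGILE#100',  data: 'PRO#Project X' },       // project name
  { PK: 'ORG#1234', SK: 'EMP#300',        data: 'EMP#Manoj Fernando' },  // employee name
];
```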
If I want to find an organization by its name, I can simply query the filter-by-name GSI where its partition key is ORG# plus the organization ID, and its sort key — in this case the data attribute — is the constant prefix ORG# plus whatever the user types into the front end; in this case, let's say Happy Inc.

Hey guys, I'm on my AWS console. I go to DynamoDB, select Tables on the left side, and this HappyProjects is the table we created earlier. In HappyProjects we have the partition key PK and the sort key SK at the moment. Let's go into the table and view the existing items: at the moment I have a set of employees assigned to a particular organization, and I have three projects as well. Step one, we will add a new attribute called data, along with its value. This is an employee entity, John Doe, so the prefix should be EMP# followed by the employee name, John Doe. Now let's edit another employee — of course, when we create a new employee we could add this data attribute at creation time as well. We save this employee, and we edit the final employee too. Then I will edit a couple of projects as well. This is one of the projects, let me click into it: it's Project X, belonging to the same organization. What I do is click Append, choose String, and the attribute name is still data, but the value starts with PRO# because it's a project, followed by the project name, Project X; save it, and let's edit the second project the same way.

Now, how can we query this using a GSI? Before querying we need to have a GSI, so I click Indexes and then Create index. The partition key, as you remember, is the table's partition key itself, which is PK. I am going to add a sort key, and this time it is not the table's sort key SK but the data attribute we created, so I type data, of string type. I set the index name to filter-by-name, and I set the RCU and WCU to the same values I set on the table, which is 1; that should be enough. I click Create index. While it is creating, I go to the text editor — the editor I'm using, as before, is Visual Studio Code — and here I am requiring the AWS JavaScript SDK, configuring the SDK with the region I am in right now, us-east-1, and using the DocumentClient. I now need to query the GSI, not the table, so the parameters I need are, of course, the table name, HappyProjects, then the index name, which is filter-by-name, the one we just created, and then the key condition expression.

The condition expression looks like this. As the first query I am going to find an employee by name. For the condition expression I am using expression attribute names — I'll tell you why in a bit — so it is #pk = :pk, and then begins_with on #data and :data. #pk is the partition key, for which we use the table's partition key itself, PK, and #data, the range or sort key of this index, is not SK but the new data attribute. The reason we had to use expression attribute names is that data is a reserved keyword, so you cannot simply write data in the expression; it would give you an error. So what are the values? The partition key value uses the table partition key format, ORG# plus the org ID, and the value for the data attribute has to start with EMP#, the employee prefix.
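Reconstructing that query as a sketch — index, table, and attribute names are the ones from the demo, and the org ID is a placeholder:

```js
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

const happyOrgId = 'HAPPY-ORG-ID'; // placeholder

// Find employees whose name starts with "Manoj" using the overloaded filter-by-name GSI.
// "data" is a DynamoDB reserved word, hence the ExpressionAttributeNames placeholder.
const params = {
  TableName: 'HappyProjects',
  IndexName: 'filter-by-name',
  KeyConditionExpression: '#pk = :pk AND begins_with(#data, :data)',
  ExpressionAttributeNames: { '#pk': 'PK', '#data': 'data' },
  ExpressionAttributeValues: {
    ':pk': `ORG#${happyOrgId}`,
    ':data': 'EMP#Manoj',   // switch the prefix to 'PRO#' to search projects instead
  },
};

docClient.query(params, (err, data) => console.log(err || data.Items));
```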
I will just search for "Manoj" — you may remember the full name is Manoj Fernando, but let's just search for "Manoj" and see whether we get Manoj Fernando back. I execute the query here; the index has been created properly by now, so I open a terminal and run node index.js, hit enter — there we go, I get the record Manoj Fernando. Now, if I search for "Joh" and run the query again, I get John Doe. How about if I put only a "J" here? Let's see — I run it again, and there we go, I get both John Doe and Jane Doe. Now let's query for projects as well: the only thing we have to change is the prefix, so instead of EMP it has to be PRO. Let's type "Project", clear the screen, and run the query; I get both Project X and Project Y. If I search for "Project X", then it returns only Project X. So that is the end of the demo, and I think it demonstrates well how to use GSI overloading. Thank you very much for watching, guys; I'll see you in the next video.