hey guys my name is vardan and on behalf of Eda I welcome you all to this webinar on data warehousing and business intelligence I'll be your instructor for today and I will be teaching you why any company needs to do business intelligence and thus thereby do data warehousing so without wasting much time let me show you the agenda for today so these will be the topics that I will be covering in today's session and this will be the order in which I'll be covering them okay the first topic that I'll be talking about is what is the need for business intelligence and after that I will talk talk about the need for data warehousing so business intelligence is one of the most important aspects for any company to grow well and to do good right and data browsing is among the most important activities of business intelligence so that's why these two things are interl and that's the connection these two have so you can think of data overing to be a kind of a subset of business intelligence so I will talk about these two things and after that I will talk about the key terminologies that are related to data warehousing architecture right and some of the key terminologies are those of olp and olap the differences between the two okay the olp is somewhat very similar to the databases and olap is what represents data wehousing so when you understand the difference between the two you will also understand the difference between a database and a data warehouse you will also understand why a data warehouse suits business intelligence more than datab base so that's about these two topics and then we'll talk about ETL so ETL stands for extract transform and load eil is a strategy to convert your data from your database onto your data warehouse right so moving the data from from one place to another that's all done by ETL so we'll talk about ETL and detail all right and after that I'll talk about what a data Mar is and then what's at metadata now these two things are two topics which I can only explain once I have given you an introduction to the other topics all right so any of your any doubts that you have during the session you can ask me at that time and I'll clear them right away and once I'm done teaching about all these four different topics that related to data warehousing architecture I will show you the complete architecture and the complete life cycle of data and what kind of insights your company can get and what kind of advantages you can get out of data warehousing right so data warehousing architecture will be the last Topic in my presentation and after my presentation I will show you a demonstration of creating a data warehouse where I will import data from a database and store it in a data warehouse right so this will be the topics of today's session I hope the agenda is clear to everyone if you all agree with me please acknowledge that and also do acknowledge the fact if you can hear my voice and see my screen so if everything is fine I can get started with the session you can acknowledge and put any of your doubts or queries during the session inside the chat box that you see on your right side so I'm getting a couple of acknowledgements from everyone all right so Rodney says yes rajes says yes all right guys and a couple more people are also able to hear my voice and see my screen great so since the agenda is clear everyone let me get started with the first topic that is what is the need for business intelligence okay so we'll understand why business intelligence and data rousing are among the fundamental and the foundation for any company's success so why do we have to to go for business intelligence right business intelligence is the activity which contributes to the growth of any company and there are also so many mnc's which have been established over the past few decades now how did that happen they just didn't happen by luck right so they were all small ideas they were small companies that started with a small idea and then they grow bigger so that's what any company that wants to do good that's what they do the first thing is they plan what they want to be and depending on their plan then they start Gathering data okay now once they gather data they know they are in the right Direction now so they know what to do and how to do it and then they do further data analysis on that they make up their plans and they come up with strategies they come to know what is the important thing that needs to be done and all these things so when they finally have a concrete plan then they execute it into business action and once those actions are taken then they're all good right that's when the business starts to grow that's when the company gets back all their investment and that's how actually any company grows so any company that has done well over the past few decades be it Microsoft or Google or Facebook or Amazon Salesforce all of these companies that have all grown from small ideas and they've become something big right and any startup that's also trying to do great nowadays even they have to adopt the same strategy and the same plan this is a very common thing and this is something that everyone knows okay but this is not what I've uh you know come to teach all in this section what I've come to teach all is something about data warehousing and that is one of the most important strategies or activities which is part of business intelligence right so before I talk about uh data browsing let me go into details of a business intelligence so what exactly is business intelligence bi is the act of transforming raw or operational data into useful information for business analysis right so bi here stands for business intelligence that's the short form and yeah it is the act of transforming any raw or operational data so when we say raw operational data it's basically the data that you've collected the data that you have about your business so it can be even if your company is starting from scratch then whatever data you've gathered so you've kind of got to take that data and convert that into useful information right so that you can plan and make strategies and if you're a company that's well established then you have to look at your past results how your company has done over the past six months or the past the last quarter or the last year or two and then make come up with proper plans for your future so when you do this then this entire Act is called as business intelligence and how does it work and this working of business intelligence is with respect to the it technology okay so bi which is based on data warehouse technology okay this is the key term that you got to remember the bi is based on data warehouse technology extracts information from a company's operational systems and the data that is extracted is first transformed and when we say it's transformed it's cleaned and integrated and then it's loaded into Data warehouses now the thing here is there'll be data in many forms it can be in the form of flat files it can be in the form of data bases if it's a Running Company and if you're trying to do good okay if you've been the working for a number of years then you'll have data about your past success about your sales data your marketing data your expenditure all these things you might have stor it in any form maybe in the form of database maybe in the form of excels flat files so all these things so these make up your data source right so you have this data over here and this data is first transformed okay it's of course it's first extracted then inside the data warehouse It's transformed which is cleaned and integrated okay so once it's transformed then it's ready for you to do your data visualization or data analysis it's in a form in which you can get insights and this is the data which the end users will be using so you will have your data analyst in your company right your data scientist your data analyst and all the other people your managers your and the other guys who call the big shots in your company so all these people they'll be getting they'll be using this data to make your analysis and that's the role of data warehouse it is sandwiched between these two end points and this serves as the basis or the springboard for success and finally since the data is credible it is used for business insights yeah this is again something that I just spoke about right so now you'll have a better understanding of how business intelligence works right so you all know that business intelligence is something important and uh how important and how good it is how does it work is what I've explained in this slide so you all guys all agree with me here anybody has any doubts Rodney rajes Jacob okay pretty good all right nice okay so I have a question here from Jacob Jacob is asking is Data Warehouse the only thing that's needed okay Jacob this is just an overview of uh business intelligence okay this a rough diagram and it's actually not just data warehouse so data warehouse is something that I'm concentrating on today's session okay so that's why we have data warehouse there's another important step called as data visualization so that visualization is done by end users right since it's done over here it's not mentioned this diagram but what you got to remember is data warehousing is probably the springboard without data warehousing the visualization cannot be done and the data from the sources right from here it cannot be directly used for any other purpose so that's the role of data warehousing so that's what a data warehouse is and uh yeah are you clear Jacob now okay great that's fine so U let me go to the next slide then now that you know what exactly is business intelligence let's let's go into details of data warehousing let's understand the challenges in achieving business intelligence so first of all why should we use a data warehouse because data collected from various sources and stored in various databases cannot be directly visualized okay now look at this image here in this diagram you have different databases like Oracle you have the sap base you have the Microsoft SQL server and then you have other databases like uh SQL database and all these things okay you can also have flat files in this list so all these make up your data sources you as a company can s data everywhere so if you're a small company you might just deal with Microsoft Excel and you might just use small analysis tools okay but if you're a big company that has a lot of data coming in so if you're a retail company then you'll have to store details about your sales your marketing and what's been your growth so for all these purposes you need big databases right so all the data will be stored in all these databases okay now the problem is some teams in your company might be using one database and the other teams may be using another database now the biggest problem that people would find while they are doing visualization or doing analysis is data is there in different databases and they'll have a tough time integrating them right now that's where data warehouse comes in and that's where data warehouse CES data warehouse it integrates data from all these databases and then processes that data and brings it in a form that is very easy to do visualization okay that's what the second Point says the data first needs to be integrated and then processed before visualization takes place now this is the problem that you have with the regular databases okay the data from here they cannot be directly used for visualization and since data warehouse can do that since it can integrate data from multiple data warehouses and since that data can be processed easily since it brings the data in a form which can be easily visualized that's where dataow has the advantage that's where it scores so that's the problem with database and that's the advantage that data warehouse has in fact data warehouse is more like an act it's a discipline which is followed by people okay these are actions which are adopted and strategies which are taken okay that's what data warehous is and that is the role it plays in us doing very good visualization all right so now this should be a little more clear for you people as to why data warehouse plays the key role in the whole bi aspect right okay great let's go to the next slide then now let's understand in detail what a data warehouse is now a data warehouse is a central location where Consolidated data from multiple locations okay or databases that's what locations mean from multiple locations are stored now this means this is exactly what I explained earlier right so you have a data that's coming in from multiple data sources you have all the data you consolidate all the data into one single place and the data warehouse is maintained separately from an organization operational database now DW here stands for data warehouse all right so and here it says that data warehousing is maintained separately from an organization's operational database okay and uh yeah the DW stands for data warehousing and the reason a data warehouse is stored separately from the operational database is because the data should not get affected so you will have your operational data on one end okay where all your legacy data will be stored where all your even probably your real data will be stored so all your transactions all your sales all your marketing all your operations data all these will be stored in one place and in during the act of data warehousing where you're doing that when you're making analysis when when you're using that data you don't want that to get corrupted right so it's more like a backup so for backup for purpose your operational data is separated so you have your operational data you keep it in one area and then you create a new database okay in fact it's called a data warehouse okay so you get all the uh data from multiple sources or maybe from a single Source get it into a data w and from here you do your analytics so the process of getting the operational data into your data warehouse that's called extraction transform and loading okay now when you've done these three things you form your data warehouse and from your data warehouse you use the olap strategy okay so olap stands for online analytical processing so you use this olap strategy or well this analytics processing for the business users to do analysis so it's there in the name right it stands for analytical processing so the business users whatever analysis they want to do they do it because there is uh the option of olap and then along with the analysis they can also do visualization for visualization you have various other tools like tab blue and click view right there are some amazing tools so you can get this data you can get it into the data warehouse and the data warehouse also it can be stored in some kind of database it can store this uh data back into some kind of Oracle or SQL server or maybe even an Excel and when you've stored there then you can do your olap activities there and also you can import that data into your various visualization tools like or click View and thus you can you know get insights you can get insights into your data you can deliver presentations uh during your board meetings you can show your findings to your superiors or your managers you can do all these things so that's what a data warehouse is okay and then the next point we have here is end users access it whenever any information is needed yeah so this is again the same thing right so once the data comes in from the operational system it's stored in the data warehouse it stays there so this data is not going to change so whatever change you R your operational data that can be done okay you can modify this data you can update it you can delete data here you can do all these things but once your data comes into your data warehouse it cannot be deleted okay you can maybe modify things the worst case scenario you can modify data here okay but it's highly advisable not to but of course you can so that's the thing but yeah the key point you got to notice the end users can use it anytime they can probably access a data that is 10 years old or 20 years old and all these things and uh how can they do it by using olap so they can do the analysis and they can run it over different times right so these are a series of snapshots So based on you can find data analysis like what happened at this particular day on this particular Year all these things you can see what kind of product was sold how many customers bought which product all these details can be easily gathered and accessed from here so that's what we say when you know uh the data can be accessed at any time by the end users or the business users so the business users here are typically those of managers it can be managers or people who are uh leading board meetings who are making a plan for the next quarter or the next uh half year or the next one year and all these things so guys yeah that's that was a question which Jacob asked guys so Jacob asked who will be the end users who will be the business users so that's what they are okay people who will be using this data it can be even data analysts or data scientists and all those people right guys are you clear Jacob okay great and there's a note okay the data warehouse here it is not loaded every time when new data is added to the database so what this means is you have data coming into your operational data okay this operational data will get updated every minute probably every second if you have a 247 working sales team then they'll be making sales around the clock right so as in when any sale happens the data will be added to your data base to your operational data but that not necessarily needs to be added to your data warehouse also so what you find in your data warehouse is Legacy data right it's historical data which you can use to perform analysis or all those find insights the operation data if you have new data coming in here this has to be imported and this has to be moved to your data warehouse first and then once it's moved to your data warehouse from here it can be used for analysis and all these things by our end users so that's what this uh diagram here means and that's what the last point also means okay data warehouse is not loaded every time new data is added to this database okay so I hope it's all clear Rodney Rodney and Rajesh okay great so fine then if you guys have understood what data warehous is I can go to my next slide and talk about the next topic so that was about data warehouse now let's look at the advantages of a data warehouse when we compare it to any database or just regular flat files and all these things the first Advantage is that strategic questions can be answered by starting Trends so this is the biggest benefit that you can get right your data analysts and data scientists they can answer strategic questions they can read the past data they can predict your future also by coming up with uh by having their strategic questions be answered because strengths can be analyzed using the data warehouse it's basically the data that is stored in your operational data also but it's just that it's easier to study Trends on your data warehouse rather than datab base okay because rajes here has a question guys rajes is asking why not a database what is it that a data warehouse can do that a database cannot do so uh guys that's what I was answering correct rajes remember the first thing is let me go back to my previous slide for this okay so you have your operational data here so you have all your data here which is probably Legacy which is even real time all these things will be present here but in your data where house you'll only have your legacy data you'll only have historical data you won't have realtime data here but that doesn't mean it's not you know it's any lesser than operation data since you have your data completely here you have the freedom to do your analysis and you also have your freedom to do your data visualization okay so that's one advantage and the other Advantage with the data warehouse is that your data will be coming in from m multiple sources right we'll have data coming in from multiple sources your tables will not be related to each other even if it's from the same database you'll have multiple tables for multiple teams right and you can't easily integrate all these tables because they'll be separated okay that is one big problem that you will face when you're are doing the analysis or visualization but uh in a data warehouse it will be stored such that all the data will be interlined right it'll be related by using schemas or all these things so you have a different schemas like Star schema uh n schema Galaxy schema and all these things so you have all these dimensions and facts all these Concepts using which you can relate your tables you can relate your data so all your rows and all your columns here which are unrelated which are stored in separate databases or in separate flat files Separate Tables they can all be integrated they can all be stored together so there'll be a relation so every single row or every single table will be linked with each other and when you do an analysis like that then you can probably pull data across the uh database right so whatever data stored across the various databases you can put all that data and Link them all together by running one query and you can get all those details in just uh by just running one single query using your data warehouse so that is the advantage uh rajes right so did you get it rajes this is the second Advantage okay that's why business users they they prefer to use data coming from data warehouse so this is a more structured and this is a more related data when you compare it with the operational data and the basic data source all right fine fine fine very good very good I hope you've you've cleared your doubts I hope that's the same thing with even others right okay even more clear great great Rodney is telling now I've got it even more clear fine fine very nice that that was a very good question good you asked me there and uhu yeah I'm glad you stopped me because I could explain it in a better way so anyways moving on that was about data warehouse and talking about the advantages of data warehouse I spoke about the fact that you know you can IC questions by studying all the trends by studying your past data you you will have all the graphs you can have the uh pictorial representations right you can see what was whether the trend is growing or not which product is getting sold uh how better is getting sold all these things you can easily read by using a data warehouse because data warehouse makes your data more readable information so that's the thing so you guys must know the difference between data and information right so information is something that is processed processed data is called information so information is easier to understand easier to relate to and easier to use now that's what data warehouse does it takes you one step closer to information right so that's the advantage and uh yeah the other thing is data bring is faster and it's more accurate yes this is something that's completely true because in your database you'll have loads of data you of course you'll have historical data and realtime data but the thing is it's not going to be as fast as data warehouse data warehouse you will have links there you will have links you'll have tables you'll have relations between the various tables and because of all these things you can easily gather and you can easily access data here and the data that is gathered from here it's also more accurate because there won't be much change because there's not going to be any question of real time data coming in and changing things around right so you made so whatever uh you know processing or analysis it's done based on the past data that is stored in the data warehouse and that makes this data more accurate it makes it more stable so stability is the key word here and and stability is not something that you can have all the time in database but you will have it with the data warehouse so that's the second big advantage and in fact there are many more advantages right so data overh housing is something that you guys will understand when you start implementing so in the demo session that I'll show you later today that time you'll understand you know you'll understand the other advantages with data warehousing okay and one important point that you need to notice that data warehouse is not a product that a company can go and purchase okay it needs to be designed and uh it depends entirely on the company's requirement so like I said your database is something uh it's a necessity right your database or where you're storing your data your data source is something that you have to have and then your data warehouse is something that is designed and uh which completely depends on your company's requirement based on your data source based on the requirements that you want to get out of your data source out of the data that you have in your data source you can come up with a way to design your data warehouse right so data warehouse is more of a concept and it's a strategy and it's not an end product but it's not a tool or something that you can use you have multiple tools to implement data warehousing and uh the thing you got to notice data warehousing is not a product okay so it's a strategy that you adopt to make your data more readable and uh make your data in a better fashion okay so that's the uh biggest Advantage with data warehouse so look at this guy here okay he'll just run one query on the data warehouse okay now what the data warehouse will do the data is taken from the operational systems all right and in fact if there are multiple operational systems then all those uh multiple data from multiple operational systems will be integrated together and then that will be standardized and any inconsistencies there in that data will be removed okay now these are the three important things the data is taken from the operational systems and uh that data if there are multiple operational systems those will be integrated okay and then the data will be standardized and any inconsistencies will be removed and uh once all these three things are done then it can be stored in an easy format which can be uh you know which is very suitable for analysis and access and that is what a data wouse is so whenever he runs a query on this kind of a data wouse which is a process and which is ready in such a fashion you can get the result quickly and uh this result will also be more accurate all right so this is the big Advantage with the data warehouse so I hope this is clear it's a pretty simple concept here and uh it's just an overview of what I explained in the previous slides right right Jacob rajes Rodney all right okay so moving on there are four important important properties that a data warehouse has okay and uh these four properties are based on what Bill inmon said bill inmon is the father of data warehousing and uh initially he defined data warehouse as a subject oriented integrated time variant and nonvolatile collection of data in support of the Management's decision- making process okay so when we say subject oriented it means that the data will be categorized and stored by the business subject rather than by the application now let me get back to this point after I finish these three okay now this is the most uh complicated Point okay now talking about integration right he said that data is integrated so the meaning here is data on a given subject is collected from disparate sources and stored in a single place so this is something you people know data is collected from multiple sources and they all stored in one single place so you don't have to uh you know go about searching for data in different tables or different uh sources and all those things and then your data it is time variant it is stored as a series of snapshots each representing a period of of time so when you do your analysis you can do it based on a series of snapshots of time okay you can see what was your company status on this month uh that year or on this month this year what is the progress that's been made or if it's not a progress if it's the same if your growth has been stagnated then you can find out what are the metrics what are the reasons why that has happened you can find all these things and you can look at all those details from a Time approach right from a Time variant approach so that's what a data warehouse the advantage here is okay that's one of the properties and the advantages that you have and then data is nonvolatile the data in a data warehouse is not updated or deleted so this is what is the other property that I mentioned earlier once a data comes into a data warehouse it cannot be deleted or neither can it be changed in fact it can be updated but the process of having to update it is a little complicated okay but of course it can be updated and deleted so that's the thing but it's highly recommended not to update okay so that's the advantage with the data warehouse and since it will not be changed there's no question of it getting corrupted and that's why uh doing analysis and all these things are you know a better option now getting back to the first point we said that it is subject oriented right data is categorized and stored by business subject rather than by the application now what this means is the data here will be stored or uh the data that you will that you retrieve from a data warehouse right you will get it in the form that you wanted to now if you want to give me an example example of that let's say that we are dealing with a retail company and in my retail company I have a marketing team I have a sales team and I have a operations team and my sales team kind of keeps keeps track of all the sales that happens over a period of time okay let's say the last one month whatever sales they have done they have stor all those details and then you have your operations team which will make sure there is a smooth running of all the process once the sale is done there are various other activities involved right like shipping the product and uh you know all these things shipping and coordinating the uh transferring activities and all these things and then your marketing team is someone is uh probably that team which would take care of your sales which would ensure that uh the right leads come in to ensure that the right uh people get the right kind of service and it's all about acquiring more such sales right so your marketing team is on top of the funnel and they do all these things now if you want to integrate all these details if you want one single view of them and you want to find Details such that in this particular month what was your sales and uh what was the kind of operations that was done right what kind of service was given to those customers and from how they became our customers so when we the question of how they became is something related to marketing so if you have a question like this where three factors are involved then at that point of time it's your data aers which comes to the rescue because when your question is related to this particular time and uh these three different metrics sales marketing and operations then all these things can be integrated and you can get one single view from your data warehouse this is what a datab base lacks correct so uh you know get integrating all your different data sources and you know storing them together and making them ready for Anytime access is the biggest advantage and the most important property of a data warehouse so if you guys had any problem in the previous slides then I'm pretty sure this is something you have really understood after this slide because um this is probably the epor of data warehouse right these three properties if you understand these four properties then you're pretty much ready to understand the next part the next uh you're ready to go to the next step in data warehousing right so by now you should have understood how important all this is why you know how important business intelligence is and what kind of role this data warehouse plays you can just think of how humongous deal data warehouses correct yeah so couple of people are satisfied with that so uh now that we've spoken about the properties let's go to the next slide and okay now we have to talk about key terminologies okay so right now we've understood data warehousing from a higher level okay now let's uh dig deep let's go to more Basics here okay let's understand the key terminologies that are related and that are involved in a data warehousing so first of all we have olp and olap okay now there are four things which I will talk about the differences between olp and olap okay then I'll talk about edl I'll talk about data mat and then finally about metadata okay so let me go to the first topic that is olp versus olab okay so in uh this part which is olp olp stands for online transaction processing okay now this is something that uh is a representation of that of a database if you're running any kind of uh queries on your database then that's called online transaction processing okay and then olap stands for online analytical processing and this is the property of a data warehouse so any kind of query or any kind of analysis that you run on your data warehouse that called as an olap activity correct so let's go into the differences between the two so first of all any data that is stored in a relational database right in an which involves OLV that contains the current data as well as past data okay current data as well as past data but with respect to data warehouse and uh while performing an overlap you will be dealing with only historical data here okay it contains only historical data and the data that will be stored in your database okay when you use the OTP then those queries will be used ful in running your business okay when you have to run your business like if you want to store the data of the number of sales that has happened today like every time a sales happens then your records in your database has to be updated right so that's what we say so when you update your record with the latest details of your customer then that is what is the meaning of useful in running the business okay but an olap is something that is useful in analyzing the business so here the kind of activity that you would do is that of finding out details like at what time how many customer customers bought the products or at what time which customer bought which all products all these kind of questions will be answered with the help of olab okay and then the whole oldp model of accessing data of accessing or quering data on a database it's based on the entity relationship model okay but whereas with the data warehouse it's based on the star schema or the snowflake schema and the fact constellation schema okay so it's called also called as the Galaxy schema so all these three things will come into picture okay and then your relational database it provides A Primitive and a highly detailed data so since you'll have one database if you run one kind of a query like a select star from this particular table then it would give you all the details that is stored right so you cannot filter too many details with respect to the data that is stored in your database okay of course you can but the level of uh you know the filtering and Analysis that you can do is uh not that much so the data that you will get back from your query right the result that you'll get back from your query will be highly detailed okay and it will not be exactly what you want so it will not be that accurate but whereas with the olap when you do an oap on a data warehouse it provides a summarize and the Consolidated data it will point to you exactly what you want to look for correct so it will uh it's a very processed data and it points to one particular aspect which matters the most so that's what this is and then the oldp you use this for writing data into the database okay so like I said the same sales example whenever a new sales happens your database has to be updated right with the new records of the product sold of the customer who bought the product and all these things so you basically use it for writing data into the database but your data warehouse is primarily used for reading data from the data warehouse so writing to the data warehouse is something that is done so that you can do the reading from the data warehouse okay the primary concept here is to read the data from the data warehouse and to do the analysis and all the visualization activities but with the database it's more of writing the data into the database all right and uh the size So speaking of the size a databases size would range anywhere between 100 MB to 1 GB okay and this is also a very big number 100 MB is a very less number I would say and 1GB is also very less so it would typically be much more than this but compared to this uh range if you look at a data warehouse a data warehouse size ranges from 100 GB to 1 TB correct so your data wouse will have all the historic data and it will have all the relation ships between the different datas right such that you can do your analysis straight away so since it makes all the data more efficient and stuff the data here the size ranges from 100 GB to almost 1 TB so that is what a data warehouse is and that's the part of a data warehouse all right and um I can actually show you the difference between the two in today's demo session okay later during the session I'll show you that the size of the source file that I have I will show you the different sizes of the two files that I'll be using as a source and then after they are processed the data that I'll store in my data warehouse I'll show you what is the size of that data so there'll be a big difference between the two okay and the data warehouse will be more than the database I will show you that aspect later all right so and that's about this point and then yeah of course this will be fast okay database is fast and it provides high performance all right so your data V of course it's not as fast as your database but it's highly flexible it's highly flexible because it gives you different views so you have something called as the olap cube right so using the olap Cube you you can uh get the you can look at insights from different angles different perspectives and different views of data you will get so that is the uh big Advantage here okay and the number of records that is accessed it is in tens but whereas with the data warehouse the number of Records accessed is in millions all right an example of this can be all the bank transactions made by a particular customer so if there is one customer and whatever transactions he's made he will get all those uh you know all those details right so supposing take the example of any email statement that you request for any email statement or Banks account statement all these things so they are an example of OLV so whatever details are present in the database those will be given to you but whereas the bank transactions made by a customer at a particular time okay this is a more filtered query and the answer also will be very accurate and point you to exactly that particular question so that's what an noap is so it's not going to give you an overview here it's going to exactly point you to what you want it's not very much in detail but it's more accurate correct so that's what an olap is and that's the difference between olp and olap okay so these are the two strategies that are used for you know accessing data olp is used for accessing data in a database and olap is used for accessing that data in a data warehouse all right so I hope even this SL is clear to everyone all right guys okay if you have any doubts then that will be cleared by this uh slide okay because the examples are there for both these strategies so if you want more examples of an old TP then one would be that of Supermarket server which records every single product purchased at that market okay so every single product in their history or probably in the last one month all these things can be accessed using your olp okay from your database you don't have options to do much of filtering here and then another example is that of a bank server which records every time a transaction is made for a particular account okay every time a transaction is made in a particular account the data will be updated in that table and uh you can query that kind of data and you'll get that result okay another example is that of a railway reservation server which records a transaction of a passenger so whatever activities at passenger does so all these things will be recorded by the reservation server right so this is one example but when you look at olap they will be much more detailed the queries here will be much more detailed and the answers will be much more accurate and very crisp okay so an example is that of a bank manager wants to know how many customers are utilizing the ATM of his Branch okay because maybe based on that he may take a call whether to continue that ATM or relocate that ATM to a different place right so this guy this bank manager would want to you know first of all understand if there's any use in having the ATM in that place are people using it correct if people are using it then how many people are using it at what time are they using it or would it be better to have the ATM in a different location where it's much more easier for people to access it so all these kind of questions will be answered only if you have olap in place okay only when you have a data warehouse in place not with the database so another example is that of an insurance company that wants to know the number of policies each agent has sold okay this will help an better Performance Management of Agents so you'll have multiple agents in your company and you would want to know their performance right you know you You' want to know who is the best performing agent you would want to know how they're performing why is this person performing better you can you know optimize each of their performances so all these things can be done with the help of a data warehouse and they can be done with the help of olab right the olap strategy the olap activity that is done on a data warehouse so that is the difference between the two right so I hope you're getting the differences here guys right oldp and olap okay since it's all clear let me go to the next slide now the uh second important terminology that we have with data warehouse is that of ETL extract transform and load so by going by the definition ETL is a process of extracting the data from various sources transforming this data to meet your requirements and then loading the data into a Target data warehouse okay you're extracting the data from here right you're extracting it and then you're transforming it into the way you want in a more readable fashion in a more relatable fashion then once that is done you load that data into a data warehouse and the whole process of getting the data from your data source to your data warehouse this is done by the ETL the activities of extraction extraction transformation and loading so we have popular tools for for this very process so you have tools like Talent uh Informatica you have uh irin all these things and Informatica and talent are probably the most popular tools for this process for extraction transformation and loading data into a data warehouse right so uh this is something that uh you should have understood by now any doubts guys because I don't want to waste much time I want to go to the next slide and teach the next concept okay great so the next one is data Mt okay now if you've understood so far ETL then half your job is done because data Mt is something that's very close to a data warehouse and you don't have that much of difference when it compared to a data warehouse but the basic difference between the two is the data Mart is uh just the same data warehouse itself but a smaller version okay so let's look at the line the definition here the data Mart is a smaller version of the data warehouse which deals with a single subject okay the data marks are focused on one area hence they draw the data from limited number of sources and the time taken to build data marks is very less compared to the time taken to build a data warehouse now to give you an example or an explanation of this in your data warehouse you will have all your details right all your details that you have so this itself is more refined okay but your data Mars are smaller versions of that data warehouse which is uh used to satisfy only certain users supposing you have one particular user base okay uh that is your sales team or your sales manager who wants to use only your sales data then that can be done with the help of a data M and similarly you have a marketing team who wants to access all the marketing data then that can be done by using a data M okay and then again you have other operations team that wants to do who wants to access the operations data then you can give them only access to that data by using a data Mt separately okay now the need for data Mart is first of all your data warehouse will store all the data okay your sales data marketing data operations data so all the data coming from different data sources so you'll have multiple data sources right all that will be stored in one single place and from here this will again act like separate data sources right so for the sales team this will act as a data source for a marketing team this will act as a data source and then for the operations team this m will act as a data source now the now answering the question as to why we need them it's to you know probably give more security to enable more security and uh Integrity because since all your data will be there in your data warehouse and if you let all your uh your entire company access the data warehouse they'll get access to all the details and all the uh all the work that is done by other teams okay so there might be multiple teams and there'll be multiple strategies which you might not want to reveal to other teams so at that kind of a time you can divide your data warehouse such that only this particular uh users okay your sales users or a particular user base gets access to only certain data from your data warehouse so at that kind of a time a data Mt is useful or you can have a second data M for only another set of user base for example your marketing users so they will get access to only the marketing data from your data warehouse right and then your operations data similarly which can be accessed only by your operations folks so that's the uh different advantages that you have since the data is all divided it's all stored separately different people have access to different uh different parts of your data warehouse so this probably brings more advantages right so that's the data Mart and that's why I told you that uh you know if you've understood so far till ETL then half your job is done but data m is something that that extends the functionality of a data warehouse right so that is the thing and uh speaking of the differences that is there in this table the data warehouse will store the Enterprise wide data right the Enterprise of the Enterprise wi data whereas data Ms will store only departments wise data there'll be multiple departments in the whole company in the Enterprise and they will store Department wide data and then the data warehouse it will have multiple subject areas okay but a data Mt will have a single subject area there'll be multiple data sources in case of a data warehouse okay but in case of data M there'll be limited data sources in fact there can be just one data warehouse right this will act as a source to your data marks but however we have said limited here because not always do you need a data warehouse there are also instances where your data source itself act as an input to data M okay now that is something I'll that you will understand in the next slide okay so just don't get confused when we see that limited data sources it doesn't mean this is the only source they can also come from a data a proper data source like a flat file or from a a database and all these things all right and then a data Warehouse it occupies a large memory this of course because there's a lot of data Enterprise Way data will be stored here multiple subject areas will be there because of that there's larger data here at stake and that's why there's greater memory that's occupied okay but in case of your data Mar it occupies limited memory because it's very crisp and limited to only a particular Department okay and then the other thing is data warehouse is long takes longer time to implement but a data Mart is a takes very short time to implement because once you have all your data where stuff in place you can easily divide them by just creating different data mods okay so the tough part is your data W so once you've got your data warehouse sorted you can easily form your data marks in fact there's even the other way there are also practices where you first build your data marks and once you've done that after that you create one single repository and that's when you create a data warehouse so there's also two approaches here one is the top bottom approach and the other one is bottom up approach so those are the two things and uh I'll go into details about uh these two approaches in my next session okay of course I can't do it today because we have very limited time all right so moving on to the next slide okay now speaking of the different types of data marks okay this is what I was talking about in the previous slide so you have three different types one is a dependent data Mark the other one is independent data Mark the third one is hybrid data Mark your dependent data Mar is uh the data is first extracted from the oldp systems and then populated in the central data bearhouse and then from this data warehouse the data travels to the data Mar so look at this example okay this is the standard practice or the uh the default approach where you have an oldp Source then you get that data into a data warehouse and then from the data warehouse you form a data Mar okay you'll have multiple data marks where each different Mark will have particular data from the entire data warehouse okay this is the first regular approach and then you have the independent data Mar which is a slight variation compared to dependent so here the data is directly received from The Source system okay you don't have a data warehouse in place over here that is what this line means and this is suitable for small organizations or smaller groups within an organization so basically an organization which is very small it might not need to go through the T of creating a data warehouse and stuff so they can just have an oldp source and from there they can just get the data in onto a data M and they can use it for various purposes okay that's what we have an independent data Mar for so that's the difference it just does not involve a major data Wearhouse so directly the data goes to a data m in fact you'll have multiple data Mars over here you'll have data m one data M two and stuff okay which will be coming directly from the olp source and then you have a third type which is the hybrid data M so by definition you might know what this is right by the name of by the the name itself is pretty obvious it's combination of these two the data here is fed from both the oldp systems as well as the data warehouse okay so look at this example for that instance you the data here it's coming from the oldp source as well as from a data warehouse so this is what the hybrid data Mar and depending on your company depending on the size of your company the requirements of your company or your organization you can choose one of these you can model your entire datab base and data warehouse in any one of these models either the dependent or the independent data March or hybrid data Mar all right so that's about the different types of data Mars so moving on to the next slide we have something called as metadata now people here from programming background or from the uh technology background you might all be aware of what a metadata is metadata basically is defined as data about data okay it contains data about where your actual data is stored supposing you have your raw data right where is that data stored what is the size of that data so these are the answers to these kind of questions is what will be present in your metadata your metadata will contain the location of your actual data it will contain the size of your actual data it will uh contain details like which was the source it came from and when was it created all these details will be stored in your metad data right so that's what metadata is uh so that's how different metad data is from regular data and metadata specifically in a data warehouse it defines the source data that is a flat file relational database and other objects so the reason that we give so much importance to metadata in a data warehouse is because take the example of any company that's having a 24/7 business okay they have a rolling sales team that uh Works throughout the clock uh the 24/7 they have sales coming in data will be going into your database okay now in this case you cannot always uh you know keep adding data into your data warehouse because you know that data warehouse is not uh real time correct so you'll have to manually update your data into your data warehouse uh probably at every day at a particular time maybe at 6:00 every day in the morning or maybe once in the night at 10:00 or at basically at regular intervals you've got to store data into your uh data warehouse and this whole strategy becomes difficult because you have to do this process every day so every day when a guy comes in in the morning he has to look at the new data that has come into your data uh data base correct and then from that uh database which is data source he has to add that data to your data warehouse now this process becomes difficult so that is where metadata comes in uh handy with meta data you define the address where your data source is okay you define the address of your flat file from where your data is coming in or your relational data from where your data is coming in and then you can also in fact store the metadata of your data V where you want the data to go so it basically you know saves a lot of your time okay and this is the most common and the most uh you know unspoken fact about a data warehouse so any professional that's uh dealing with data warehouse you would always be using metadata because it saves a lot of time because every time you cannot be importing data from a database you always have to get it from your uh metad data by defining rules and defining your source and your targets correct so metadata saves a lot of your time and this is something I'll show you a demonstration on okay and when I show a demonstration you will understand how easy it is so you'll just write the rule such that you have your Source over here and then you have your destination over here your target over here and once you've written this thing you don't have to go back here so every day uh it might pick up the data from here and it might move to your data warehouse and all these things it will update all the new details which are present in your database and it will add it into your data warehouse so that's what a metadata does it's a very big advantage and it's one of the best strategies that we can go for okay and then the final Point here is metadata is used to define which table is source and Target and which concept is used to build business logic called transformation to the actual output yeah this is what I told you right so it's used to define which is your source and Target and how to build your business logic called transformation so transformation is basically your act of converting your Source data into the form that you want to and what is the logic that you use correct so all the different uh the filter criterias all the transformation criterias all these things can be also done using the m data correct so even the process of getting the data from your Source your destination it can involve extra additional steps which can save your time and uh default process basically so every time you have data in a particular format you might want to store it in a different format and for all that purpose you can use a metadata right so since things will all already be defined here the work of metadata is just to get the data from The Source defined and uh do the set of activities that is required and which is already defined it will perform all those activities and it will store it in the place where you want it to so that is a role of for metadata correct so metadata is very very important it's very very uh very highly used and it's actually the most important or the let's say it's the epor of data warehousing okay this is probably the best thing that can happen to data warehousing right so that's the thing about metad data guys now going to the next slide we have the architecture so now that you know everything okay these four terminologies are are more than enough for you to understand the architecture okay so let's understand the architecture is so this is the entire architecture diagram okay so this is what you know data comes in from various sources it can come from either a database or it can come from a flat file and then that data an action of ETL will be performed on that data and it will go to the staging area okay this is called the staging area and this is the staging database and the data that is stored over here it this is temporary Data before the data completely moves to the data warehouse it will be present in this area okay and that is done by using the act of ATL and also between moving to the data warehouse the uh process of ETL continues so ETL process starts over here and it ends over here okay and between the conversion it is temporarily stored in a staging area and this is most often present inside the ETL tool itself okay like your talent or Informatica and all these things and then this data will be stored in your data warehouse so whatever is xaed transformed and loaded it'll be loaded into your data warehouse and in your data warehouse you'll have metadata okay and of course you'll have your raw data and then you'll have your aggregate data okay and this is the reason why a data warehouse is generally you know it's larger in size because it has uh not just raw data your database which uh from where the data is coming in it'll only have your raw data okay but your data wehous will have additional stuff here it'll have also your metadata and your aggregate data and together all these three things together is what help and you being you you doing your analysis sooner okay this is what powers your ability to do app online analytical processing okay so that's what a data warehouse is and that's what it stores and then this data warehouse you can either your entire company can use a data warehouse or if you want more of a Security based uh access then you can divide it into different data marks where your sales team and your uh uh different teams like here we have a purchase table and then you have your stock table so these are like three different Mars data Mars for three tables like sales purchase and stock so you'll have different tables here and different teams can access different set of tables okay your purchase can be something that's uh used by your operations people right and then this may be something that's used by your sales people and your stock maybe again used by your operations or your sales uh group so that's what each of these here uh you know defines you have your user group one user group two and user group three and each group will be getting access to different parts of your uh data warehouse because your data Val will be divided into different data mods and your data your different groups will get access to only that data which they want to or which they can get access to correct so this way no group can get access to every data that is present in a data warehouse and there's Advanced there's a little more data security in this case okay same thing over here so this group gets access only to this data and this uh data in this table or this uh Mt right and then this user group gets access to only the data that is present in this Mart and in this m so that's what the entire architecture looks like and this is how the data flow is right so if you guys have understood this much then you're ready to get a demonstration correct guys and this is also my last Topic in this presentation so right now I can go to my demonstration and what I'll be doing in my demonstration is I'll be using Talent okay so I'll be ex importing data from my database I'll be using I'll be getting it from my Oracle database and I'll be storing that into a data warehouse which can be ready for any kind of analysis or visualization on any other visualization tool okay so this act over here which I'm going to show you this is what powers your business intelligence right so are you guys ready for the demonstration any doubts here guys okay great Ronney says he's all ready he's all pumped up all right Ronney that's very good so Jacob's also ready and so is rajes are you rajes can I get a yes from you yes very good very good Al boy rajes so let's go so let's go to the next part of my uh session and this is going to be demonstration where I'll populate a data warehouse okay I'll be using talent for uh the data warehouse activities and let's see how we can import data from various data sources and create a data warehouse so for our Hands-On session I'll be importing data into my talent bi okay like I said earlier I'll be using my talent bi for my data warehousing and ETL processes and I will create a data warehouse out of using Talent bi okay now the data set that I'll be using is that of a 10,000 row table and a 50,000 row table okay so the uh there'll be one table having 10,000 customer details okay and then there'll be another table having 50,000 rows of transactions which each of those customers make okay now based on this data we have to find those customers who have the lowest number of purchases okay so right now my data set is present in my uh Oracle database okay so I have uh two tables and both my tables customers table and my transactions table is present in my Oracle database I also have them in my uh Excel so first let me show you how they are present in my Excel file okay okay so this is my uh customer table as you can see we have all these details right we have all these fields here so we have customer ID we have customer name we have uh the address of that customer City the country he's from the contact number and his email address right and uh if I do this then you can see that there are almost 10,000 Fields here so 10,000 rows are there here so this means that you know I have it's a pretty big data set and I can use this in combination with my other table okay so here we have customer ID as the primary key okay and if I go back to my other data set that's my transactions table you can see that I have further more Fields here I have those of invoice number I have stock code I have description I have quantity I have invoice date unit price customer ID and product ID okay so there's a customer ID over here also so the customer ID is the primary key over there and this is the foreign key over here so basically the customer ID is the same and I will have to do a lookup to that table with the uh help of my customer ID okay uh since these are two different tables I can create my data warehouse with the help of this particular primary and foreign key concept I can create a link and I can create I can probably just uh link these two tables with the primary key foreign key concept okay so um any doubts guys I'm sure that you all know what a primary key and a foreign key is and if all anybody has a problem please let me know okay so Rodney says I don't know what's a foreign key okay uh so see Rodney the thing is we have something called as a primary key and a foreign key okay and we use these two columns when we want uh when you want to use a combination of two different tables right so if you consider the example of uh this one then you can see that there are a number of uh columns here and not everything is matching with the details in that put in in the other table okay so only customer ID is the only common field okay so there's a customer ID here and even in this table we have a customer ID now the thing is the customer ID is a primary key because it's uh it's constant and it's Unique for each and every single record over here so each and every customer here will have a separate customer ID okay so no two customers will have the same customer ID okay and uh over here the transactions here which has the invoice number the description of the product that was purchased the stock code the quantity all these uh things are sorted by the customer ID okay so if you find a particular customer ID more than once then well here you can find the customer ID more than once that's because a customer would have made more than one transactions he might have you know come on to their uh you know he might have bought from the same person more than twice or Thrice or four times or something like that right so supposing I go back to this person and I buy his products since it's a retail store I buy products say 50 50 times then I'll have 50 different times my customer ID will appear here okay but however here I'll it will not appear again because uh it's a primary key right here so that's the difference so here the uniqueness is there that's why it's called the primary key in this case and here there's no uniqueness for the customer ID okay it can appear more than once and that's why it's a foreign key over here and what we're going to do is for every customer ID over here right so for every customer ID over here we're going to link it to uh this particular table with the help of uh this customer ID since the customer ID is a common field in both the cases I can make a table which would show which customer which would probably display all these details along with the other uh columns okay now that's because I have customer ID which is common in both the tables and using that I can link the fields that are there in this table and which are there in the customers table right so that's what I'm going to do now the similar thing is also there in my uh Oracle database so let me show you that so this is my SQL Developer GUI okay so I have two tables here I have customers table and I have transactions table right and then I have a final table but uh this is not something which is relevant so this is not something we'll be using but uh we have the same two tables customer table and transactions table as I showed you we have these two here and we have almost the same Fields over here right because uh if you see the customer table we have the customer ID we have the customer name we have the customer number and the email address okay and uh here we have additionally we have the address City and Country Now what I did is when I imported uh this Excel file into my database I did not I ignored these three columns right I didn't want these details because uh I was not going to use them in my data warehouse so if you see a problem statement here it it only says you have to find out the customers who have the lower number of purchases it doesn't mention that you need their country details or city details or anything for that matter of fact so so because so for that reason using the country City and address Fields would be a waste so that's why I'm not imported these into my data warehouse I have only the customer ID the customer name my contact number and my email address Fields okay this would be uh sufficient to find out the customer details okay and to find the purchases right and to see who has all the lowest purchases we can use the other table so in the other table that is the transactions table we have uh this quantity field right we have this column this quantity table basically represents how many products this particular person purchased so this uh person with this customer right he purchased 71 products right and then you have another person who is purchase 67 projects uh products and so on so you also have people who purchased only seven or nine okay now our problem statement is such that we have to find those people who have made less a very less number of purchases okay now if we Define uh uh less to be 10 then we can extract the data such that uh the data warehouse gets the uh details of only those people who have made purchases of less than 10 items per uh order right so per transaction Whoever has made less than 10 purchases so this person has made seven and this one this person has made nine right so we can get a list of for those customers right using the customer ID we can go to that customer table and get his name we can get his uh email address his phone number and other details okay and we can uh publish all those things and we can probably export them to another uh CSV file like this or we can put that into our database uh basically into anything which would support data visualization right so that's what we want and uh this function is what our talent is going to do okay and what I'm going to do for that purpose is I'm going to do a lookup from uh this table onto my my customer table right so this is my transactions table and from here I'm going to use the customer ID field to to uh look up to my customer table to optain their name email address and phone number and all those things so first of all let me introduce you to our talent interface okay so this was the Oracle SQL database which I showed you earlier and this one is the talent right so Talent Open studio this is basically the data integration version of the project okay now what I'm going to do here is uh I'm going to show you how Talent works I'm going to show you the layout over here first of all so on the left here you can see that you have something called as job design we have a standard you have demo and then you have various other projects here right so all these are the different projects that I have created okay and when we say job design so job design is what you use to create your jobs on the UI here okay so in our case in Talent whatever actions that we performed right we got to do it with with respect to a UI so we don't have a Cod this is not a coding interface okay so of course this Talent runs on Java but it we don't but Talent is known for its GUI okay so we can uh drag and drop the items on the job task so this is the uh uh job design taskbar and we can first of all create a new task okay supposing I want to create a new data ARS by importing details from another database then we can start by creating a new job design okay okay and then create a new project in that job design so right now we are in my local project so I'm going to create a new job design okay I want to say create standard job let's give a name to this particular job so let's say data warehousing bi session okay and uh demonstration is the purpose so I'm just going to copy paste the same details into my description field also and I'm going to say finish okay now uh what you can see this is the job designer so this is where I can get all the data from my different uh databases or Fields files and uh get them to my data warehouse so I can run my I can write my projects over here and then on the right hand side you have your pallet okay so from your pallet you can choose the files or the database from which from where you want to import your data okay supposing you want to import from a flat file then you can just go to this heading file here so under file you have input right so under input you have all these options so if you choose any of these options here and if you just uh drag and drop them over here then you can use them as a means to import data you just need to set the path where the Excel file is stored okay and uh you can uh choose that and when you just enter further Fields you can get the data in from these Excel files to your data warehouse okay to your talent okay but uh Excel files is a very simple thing and I don't want to show you that let me show you how to import this large data from our database right so we have also other options here like big data Big Data int business intelligence we have business we have Cloud right so we have Integrations with a numerous Technologies here and uh what we would be conting now on is that of database right so I have my Oracle database so I'm going to go to Oracle under databases and under Oracle I'm going to say t Oracle input okay let's see where is it t Oracle input is right here so I'm going to just click on T Oracle input and paste it here okay I'm going to drag and drop it here and now that I've done this I can use this as a means to get my data in from from my database okay so this is my uh Oracle database right so let me configure my input first so first first of all I need to configure all these details with respect to my host the port number on which it's hosted the database name the username the password and all these things okay so uh let me first start off from the host uh it's hosted on Local Host right it's not on any server it's not on any remote server so I'm just going to say Local Host and the port number is 1521 so these are details that I got when I was uh when I installed this Oracle SQL Server right so you'll have to probably enter the same details when you create one and 1521 is actually the default port on which oracles Oracle servers usually host it so you shouldn't have any problem there so database here is xe in my case so okay and then my uh username is Ban and the password is something we have to set over here okay all right now I've done this my connection type is Oracle SSID my database version is 11.6 all right so all these details here are correct now let's check my schema the schema basically is what is going to uh uh you know map your fields from your database to your talent right so this is your data wouse so we have to define a schema here to add details here okay so uh now that we have all these fields here now let me go back to my uh database right so we are in my customer table and in my customer table I have four Fields customer ID customer name customer number and email address so let me add uh these four fields in my talent okay First Column is going to be customer ID this is going to be customer name this is going to be customer phone number this will be customer email address all right so that's all fine basically the data type that it's coming from is all Vare okay so if you see here the data type that is uh it currently is is in is Vare okay they have different sizes though but email address be email address or contact number or name or ID all are in Ware to type so I'm going to just uh say the input also to be as uh Ware right so it's all Ware 2 everything is technically Ware 2 it's it's picking up uh the same data type by default all right so let's define the length over here though let's say the customer ID is uh going to be five characters in length I'm going to say the length for the name is going to be around uh 30 the phone number to be around uh technically it's 10 so let's do a buffer of another 10 and say 20 email address is going to be another 20 characters okay uh the customer ID let me just increase the length to 10 so these uh these are the length of the fields uh which are going to be stored Okay so I'm going to set the length over here and I'm going to say okay now what I going to do is I need to set the default query okay so I'm going to say select star from customers select star from customers okay because customers is the name of the table in my Oracle database right so I'm going to choose the table name here I'm going to say the table name here also okay there's something I forgot earlier you got to mention the table name that you want to import from and you got to specify the query you want to run okay now let me run the same query in my Oracle and let's see what details I get so I'm going to feed the same query here okay so it's going to be select star from customers so when I see this I have this table option here okay now let me just uh run this query okay so I've got the same Details In Here Also right so my uh table name is customers so let me go to my talent and uh set that it's set as customers already and this is the query same query I'll be running here by default all right I'm going to save this so uh I have set my first data in right so I've got my customer data in here so let me uh say another T Oracle input and I'm going to configure another input for my transactions table right so I have another table for transactions so let me set that also and as you can notice every time you add a new uh input or configure a new output then you'll have to add these details right you have to add the host name database name uh username password and all these things so instead of this you have a shortcut here so you have a shortcut with respect to metadata so metadata is something that I explained earlier right so it basically contains data about your data so under metadata tab if you see you have uh DB connections you have file all these things right so so if you want to create a standard input for data coming in from a particular database then you can just set it once over here okay so under DB connections you can set uh one connection and it can be used in any number of jobs in this particular project so instead of uh uh so this time probably I'll have to anyways enter the uh details of this particular table okay but uh if I set the same thing through my metadata here then I can use the same metadata every time I want to import data to my uh Talent Warehouse so what I'm going to do is uh this time let me show you how to do it through metadata I'm going to say uh create connection you want to say the name of the connection is DB input all right let's say database input purpose to get data from Oracle to Talent let's have the description also and say finish okay we got to choose a database type over here okay now the database type that I want to import is that of Oracle right and I'm want to do it with the help of Oracle with the service name so when you choose that you have your uh login you have your password and all these details so in place of your login give your username here so in my case it's well done okay password is uh this one the server is Local Host and the port is 1521 service name is xe okay now to verify if the connection is correct you can click on check over here so as you can see the DB input connection successful uh message has come okay okay this means that your talent is able to successfully communicate with your Oracle which is hosted on port number 1521 okay so now that I have done this I can straight away say uh finish and add this metadata over here okay now I'm just going to click on the drag and drop this DB input over here when I do that I can choose this to be for either input or output or anything here because under Oracle you have all these options right so I have chosen this metadata for any of these options here so if I want to use it as input I can do that also if I want to use it as output I can do it even at that time so I'm going to say t Oracle input and I'm going to say okay now as you can see my uh DB input here this is a metad data which has these properties already inbuilt okay so in case of this one I had to give it manually but here my metadata I specified it once and it's already replicated so the advantage here is that even if there are 10 different tables from which you want to get data in then you don't need to add 10 different database Connections in your Talent right when you're creating your metadata if you create the database connection once and then uh using that one time you can establish 10 different connections you can uh get it from 10 different tables or replicate the same thing again and again so that is the advantage here right so that's the thing so next Let's uh give the table name here okay so the table is transactions table right so that's what I'm going to enter here and the query that we have to run is Select start are from transactions i ns transactions okay now let me also verify if this uh query is running in my my SQL Developer okay and let me run this query and yes it's able to fetch data from my transactions table also now going back to my talent now that I've said this let me go and edit my schema now okay so where do I have my schema option okay it's right here so I'm going to say edit schema and I'm going to call config all the different columns and Fields over here if I'm not wrong there are five different columns here invoice number description quantity customer ID and product right so I'm going to rename everything here I'm going to say invoice number description quantity what else do we have we have uh customer ID and product ID right all right so even the uh details here have been set to Ware okay now this basically means that I'm getting the data from my database and the data type of that is Ware right so as you can see from here when I click on this all my data all the fields have are of Vare 2 data type so that's why I'm setting them as Vare 2 over here and I'm going to set the default length for everything to be as uh well not for everything so for invoice number let me give a length of 10 for description I'm going to give a length of uh 100 for quantity let's say five for customer ID it's going to be 10 and for product ID again let me give a length of 10 so I want to say okay so my schema is basically set for both of these files now what I've got to do is now I have configured two inputs now I have to get these inputs into my talent now to check if my data is coming in from my data sources appropriately let me uh just check the t log row right so the component name is t log row but since I can't find it here I can also search for it here so t log row when I just put uh enter I have these options right so just capture this t log row and drop it here now you can create connections from here you can say Main and you can establish a connection over here okay and similarly I can use another tlog row here okay drag it I can paste it here and check actually no I won't do that so I can check if my data is coming to my tlog Row from here right so my component is added over here so let's say so let's just check the schema once so as you can see these are the details that I entered in my uh source and uh it's already been mapped since I gave a connection between to uh between my tlog row and my input one these fields have been mapped here also so I'll be getting my customer ID customer name phone and my email details here okay now let me just click on okay go down and uh yeah the field separator would be this pipeline I'm going to say print header print content with log 4G all right so now let me just uh save this and let me run this now you can see the button here right when I say run it would build my job basically and it should be able to import my data from there as you can see all my data is being imported to this uh let me just minimize my screen so as you can see all my uh details are here right so customer ID customer name customer phone and customer email were the four different fields that I had configured to enter into my T grou from my input and they have come they have the headers are these and the details are here so for customer ID it's th000 and the customer name is uh this one here and then his phone number is right here right so this is his name here this entire and things like that so each and every Fielder is separated by a pipeline so that's because I had uh in the component editor over here because in the component editor over here I had set the field separator to be pipeline okay so that is the thing so since my component editor has a pipeline so every field in my data that has comeing through my tlog row it has been separated by a field separator that is pipeline now if I just go back here as you can see by just clicking on run over here or over here you can get your output okay this I've just done this to show you how the data comes in okay so now that our data is coming in successfully I'm just going to uh delete this dlog row okay because we don't really need it for our uh demonstration so you can just click on this clear here to clear your uh lock screen so now that we've established our connection what we got to do is we've got to merge these two tables right so we've got to create a lookup field and I've got to make sure that my tables are joined so for that we have another uh component here called t t join and when you hit enter you have this right so under processing you have this option of T join so you can just drag and drop here and what this would do is it would basically join these two different inputs so we have the source input of this one and this one right so it would basically merge the two of them so let's first uh create one link over here to my D join and similarly let me create one over here from my input two to my tlog now if you can see here since this was the first connection that was established this took up the name of uh main okay and my second connection took up the name of lookup now that's because uh this table here this will look up to the customer ID field in this table and match the subsequent rows or entities with this particular columns right so what this uh lookup would do is it would use the customer ID field in my uh in this table and it would do a look up to this particular table and match the remaining rows and columns right so let's configure the schema for that first of all since we're joining these two we are using this T Joiner okay so T join Let's uh go to the component and edit all the other details here so we have to check this include lookup columns and output uh this means that at the output that comes out of this joint we want to also include the fields which are present in my uh lookup field right so that's why I've checked this column so besides that here it says column mapping right so we have to always map uh these two different tables using one lookup field right so it's either one lookup field or the primary key foreign key relationship so both are the same uh so that's what we got to Define here we got to Define which is the primary key which is the foreign key or which is the lookup fure okay so first of all let's go to uh let's go edit the schema here all right so we have the two different um field here we have the input field one and the input field two over here so let's add all of these uh I'm going to say control a and I'm going to move it here all right so I'm going to join all these things and from the second table I'm going to move my invoice number my uh description well I don't want to move description so I'm going to move my quantity customer ID is already there so I would then want to move my product ID okay now the reason I'm not moving description is because I don't uh need description as for my problem statement there's no need for description okay uh similarly even product ID is not really needed for me so I can actually remove this also right so I have these so these will basically be my output Fields my customer ID customer name customer phone customer email address invoice number and quantity right so these will be my output fields and now that I've set my schema even the length has been Auto picked right so this is the nice thing about Talent things are pretty simple and it's all straightforward so you can just click on okay so guys now that we've set our schema here let's add the columns that we want in our output from our lookup table okay so this is our lookup table and from here what all columns we do we want in our output so that we've got to include over here right so let me add the uh different entries so I ideally want my customer ID to be there so I'm going to say row to. customer ID uh row to here basically means this table right so and then I want my quantity to come and that I'm going to set it as row to. quantity I'm going to add another uh entry here and that's going to be for invoice number okay so I'm going to say this is row to. invoice number all right so now that we've uh done our column mapping here this is pretty uh good enough now let's go down and set our key definition so here we set our primary key and secondary key uh foreign key definition okay so let me add an entry here so the primary key is going to be customer ID and the lookup table is going to be ru. customer ID okay so with the help of these two Fields we're going to find our we're going to make our do our lookup and another Advantage here is that I can also go back to schema and edit the data type of my quantity okay uh since I'll be using quantity as my uh differentiator for my problem statement I can set the data type to be something else I can set it to be integer okay so I can do this and set the length to be either five or 10 right I can give okay so now this basically means that even though all these are in string format my quantity can be an integer and the best part about Talent is I don't need to change anything over here okay so I can get the data in from any data type the data that comes in can be of any data type okay but the data that I store or the data that I Finly export that I can change the data type of that particular data so that that's the advantage so quantity comes in the form of Vare 2 data type and I'm converting it into integer from inside Talent right so that's the thing about Advantage with Talent so I'm just going to say okay now all right so now that I've done this uh I think I'm done everything with respect to T join okay so now let's go back to my problem statement for a minute now what they say is they wish to find out only those customers who have a low number of purchases okay so it is quantity which defines how many purchases that they have right so I'm going to go back to my talent here now that I've added my T login over here I'll will be getting all my details and all my data into this component right so my uh input one would be giving me uh those rows from customers and this would be giving me those rows and Fields and records from my transactions table so I would be getting approximately 50,000 rows from here and 10,000 rows from here now now that I have all these uh rows over here what I have to do is I have to filter them based on the quantity and our problem statement here says that we have to find out those customers who have a low number of purchases and if you remember we defined that uh when if you want to find the low number of purchases it's not exactly defined so let's say we want to find those people who have less than 10 transactions or whose quantity is less than 10 right who have purchased less than 10 items so for that we can just go back to our data warehouse and create another filter component here which would filter out those people who have less than who have purchased less than 10 items per transaction right so for that let's again go back to my uh pet here and search for this component filter so as you can see there is T filter row and T filter columns okay but I want to filter rows so I'm just going to uh drag this and drop it here so t log filter okay and I want to uh connect these two right so now I'm going to give a connection here and what I'm going to do here is I'm going to define the the filter criteria okay so I'm going to add an entry here but before that let me go to my schema and as you can see I have all these entries here and the same fields are here also right and even the data type has been picked up by default that's the talent functionality so this will be the input that will come to my uh filter and this will be the output that will go right so let's just uh click on okay and now uh coming to my criteria for filtering I want to filter my quantity right so the input column that I want to uh check is that of quantity okay and the function that I would want to do is uh empty now for now let let's just leave it empty and the operator is if it's uh lower than or equal to right so if it's lower than or equal to 10 that is the value that is present in this column if it's less than or equal to 10 then I want to filter only those uh rows and I want to send only them okay so basically my uh command is also set over here all right so even my schema and my function is set over here and now what I have to do is I have to set the output for my data warehouse okay now where do I want to extract these uh results and store these this data warehouse right so let's say I want to store it in an Excel then I have to uh just go back to files over here let me just type file and hit enter and uh I have these options right I have t file input Excel T file output Excel so I'm going to choose T file output Excel and uh drop it here uh this would essentially mean that the uh results that are filtered over here in this component this would uh come and get stored in this Excel file which I'm going to create now okay so let me uh connect these two also I'm going to save filter and put it here and supposing if I want to also get a list of all those people who have more than 10 transactions then I can do that also by using this option of reject so whichever rows get rejected right whichever get rejected uh from moving to this particular file those can go to another file right so we have that functionality also that I'll I'll show you even that but before that so let me just show you how this looks like so I've said row filter T file output Excel and uh let me edit the component over here and uh let's say the file name is uh this one P whbi demo all right I'm going to say include the header so header basically is the part which would say customer ID phone number and all those things so I'm going to say include header all right and uh I'm going to say Define all columns to Auto siiz that's is just uh to Auto siiz those columns right so it's all fine now everything set and my uh file would be saved in this particular path okay and this would be the name of my file which should be stored when I run this so I've copied this path now what I've got to do is uh I need to save this job and execute this okay when I executed that it's uh showing me an error over here okay so the error it shows is that the error and the components properties type mismatch cannot convert from string to integer okay not a big deal I can fix this error okay we we can fix that by first of all checking our schema right so here if you see the schema here it says uh quantity is type integer right but over here quantity is still type string so that is what we need to change we got to change the uh type over here to integer so I'm going to say number and I'm going to click on okay fine okay so uh now let's go back to this file right and let me edit the schema and set the properties correctly okay so it's not able to convert this right so the type is string and the DB type is number so this is the problem so so I can change this also to integer right so uh integer is right here so I'm going to choose this as integer database type I'm going to click on okay and uh now let me go back to the tlog join and check the schema over here okay so this is my table and quantity this is integer this is also integer it's coming here and now it will get converted over here into this so now it shouldn't directly throw me any error now let me just save this and run this job okay building job okay it's showing an error again cannot convert integer to string okay so let's see what the error is now okay so we have a problem here invoice number right so this is the problem so invoice number it should uh be row to. invoice number so this was the type mismatch error that we were getting right so no problem okay so again it's changing right so all right I'm going to save this now and uh we w't have the type mismatch ER now okay great so it says connected and disconnected as you can see the output over here we've got around these many rows from this uh table and about 50,000 rows from the uh table here and we've got it here and then we would have probably joined these two with the help of the customer ID uh field and then to my uh t log row it would have filtered all the fields so we've got the input has basically been 8342 number of rows 8,342 rows out of which 8 41 rows got filtered and stored in this Excel file awesome right so um let's just go to this path and check if our field contents are correct so this is the path where the file is stored right so I'm going to copy this path I'm going to go back here and paste this path so this was the file that we created right and yes as you can see we've got customer ID customer name customer phone the email address invoice number and the quantity okay and the best part is the quantity here is sorted and it's only that Which is less than 10 right so that was the criteria that we had set we had filtered for those fields to be only less than 10 and uh to show you further proof what I can do is I can uh do this I'll add a filter okay and if you can see here the only options here are 1 to 10 okay that means nobody whose quantity was 90 or 100 plus has been added over here right so so this is how simple Talent is and this is what uh data warehousing is in Basics right so this is a very simple example that I've showed you where uh you can get data in your data warehouse by filtering some data from the database and probably you can use this data to make your analysis or visualization on tblue or click view or any other data visualization tool okay but there was another thing that I promised you right I uh told you that you will have a filtered row and you will have another uh reject Ed row so if this was the filtered column then I can probably have another Excel here I can say t file output Excel so okay so T file output Excel I'm going to paste it here and what I'm going to do is I'm going to start one I'm going to say reject and here so the rejected uh fields or rows and columns are going to go into this particular field okay so I'm going to name this also as rejected data all right so this is also going to be present in the same path and uh let me edit the uh I'm going to say Define all columns size I'm going to say include header and yep I'm going to save this and run it again when I run it again I'll have the same uh list of fields over here right I have the same rows over here right so as you can see the same 841 rows have been added to this file okay the filtered results are here and the remaining ones which were rejected they have gone to my new file and this file is called the rejected one so if I go back to that uh same path this was the original one and reject data is here so when I open here you can see that the only options here are those of above 10 okay so if I just uh show you with a filter column right so you can see that there is nothing less than 10 so that's the thing so that's how you filter data that's how you get uh data in and uh all these things so that's how you work with Talent guys right so you get data so the same thing here you can also store it to some database right so instead of uh having a file to Excel for this I can just delete this and uh I can just delete this and uh let me go to databases first of all all right so here I'm going to say Oracle and uh T Oracle output okay I'm going to paste it here and now if I I'm going to say reject or yeah reject right so all these rejected entries would go into this particular file of mine so let me add the components and edit the components here the host details here it's Local Host okay Port is 1521 database is uh XE right yes uh it is xe and my username is Ban my password is edura all right so and then you have the action on table right so the action is going to be create table okay so when we say create table we've got to give the table name also and the table name let's say rejected output okay well I'm guting there's nothing by that name yep there is nothing here by uh that name so so we have customers final table and transactions right so we will have a new table that will be formed over there so with the action is uh going to be insert supposing you want to insert or update an existing file then you can choose the various options but in our case it would be create a new table and the action would be insert okay so let me just uh save this okay and run this now so it seems like my new table has been created here let me just go go back to my SQL here Oracle table and we need to refresh this yes so as you can see there's a new uh table that's been created and when you see here these are the different fields error message was another field that gets autogenerated and if you look at the contents of the field you have all these right so guys that's how you get your data back into your datab base right so earlier I showed you how to do it uh and Stor it in an Excel file now I've showed you how to create new table and uh how to get your data warehouse uh data back into your uh database so that's the end of my demonstration all my uh queries have been passed successfully now let me just quickly go back to my slides for a minute and yeah this Hands-On is what I uh completed showed you how to get data into your data warehouse so yeah so I think that brings us to the end of the session and let me just summarize whatever I did in a minute right so first of all I spoke about uh the need for business intelligence and then I told you what is business intelligence why you need it for your business and what is the role of data warehouse and then I spoke about data warehousing right I G went into the depth of data warehousing and spoke about the key terminologies in data warehouse uh we have the ETL we have data Mars we have the meta data and all these things and then finally we spoke about the architecture of the data warehouse and we finish off the session with the Hands-On demonstration of how to populate your data warehouse using Talent so guys that's it and uh thank you for being in the session that brings us to the end of the session today probably we'll have another session on data arousing and I'll talk about more advanced concepts like schemas right I spoke about the three different types of schemas right like Star schema snowflake schema Galaxy schema all these things so I will talk about all these things in my uh next session and probably also I will talk about dimensions and fact tables and all these things right so doesn't seem like you guys have any doubts so if you do have any doubts please put that in the chat box right okay so thank you guys thank you for being in the session probably I'll see you all until next time okay see you I hope you enjoyed listening to this video please be kind enough to like it and you can comment any of your doubts and queries and we will reply to them at the earliest do look out for more videos in our playlist And subscribe to our Eda channel to learn more happy learning