Transcript for:
MySQL Database Tutorial for Data Analysis

Last video we talked about AIMS Grid and data discovery. In this video, we will look at our MySQL database, which is owned by falcon team and the data master team will be doing analysis on that. So now both the teams are having a discussion and falcons have given access to MySQL database to data masters and data masters are now accessing that database and they will run some SQL queries. If you want to build a career as a data analyst having SQL knowledge is must. So in this video you don't need any prerequisite for SQL knowledge. You can just start from you know very basic and we will look at different tables, their relationships, will run simple query select where inner join etc., and we'll do simple data analysis. So I will show you that even if you don't use tools such as tableau or Power BI, data analyst often they do a simple analysis using SQL itself, okay. So SQL itself is a data analysis tool and we'll look into that analysis and later in the in the next video we'll see how using tableau can provide some extra benefits or simple SQL queries. So let's get started and I'm going to now link a video that I did for the other sales inside Power BI series, the SQL analysis part is the same. So I'm just going to attach that particular video here and if I ever mention Power BI in that video just assume that it's actually tableau okay. So we'll we'll get started and I have a complete database down so we'll download the database dump, run SQL queries and it's gonna be a lot of fun, so let's begin! Alright the first thing is let's install MySQL on your local computer. I'm going to provide a link of this great video which goes through the installation instructions, so just follow the video. It is linked in the video description below and install MySQL. So when you install MySQL you are installing MySQL server as well as MySQL workbench. So MySQL workbench will allow you to see the tables and run the SQL queries. After you have installed it click on MySQL wash workbench and launch it. So when you launch it it looks something like this. Now you can create a new connection and I will call it dev let's say sales insights, okay. Keep other values default as it is. My username password is root root, and test connection and it looks good, I just press ok. So it creates this new connection and now you click here. So now you are going into that connection. If you want to check the server status you can click on administration and check server status. Here it says server status running. If you are facing issues in running SQL servers just google it figure it out. I don't want to make this like a full-fledged SQL tutorials, there are so many tutorials out there. So just figure it out and I'm assuming that now you are at a stage, where your SQL server is running and you have in the schemas, there is a default schema called SYS. Now I have given the entire dump of our sales insights database on my github page. So follow uh I have the link in the video description below. So use that link and download this file called db underscore dump dot SQL. So I downloaded that file here db underscore dump dot SQL okay. Many times people don't know how to clone things from github you have to just go to root repository and click on the green clone button and that way you can download the file. So assuming you have downloaded this file, now what I'm going to do from workbench is click on tools or rather server and say data import. In the data import, click this option and here go to the location. So my location of the dump file is this, so select that and what this will do is this will create the entire database along with the records in your system. Okay, so this is a self-contained SQL file I created database and exported the entire database to SQL file for you. So now all you have to do is just go here and import it. Then you can click on this button start import, and it is importing so here it says import is running. Looks like it finished without an error so now you can click on this refresh icon and you will see this sales database created. You'll see a couple of tables here now. Let's look at some records from customers table. So you can right click and select this and you will see this records. It might take some time to get used to this user interface but don't worry just play with it. This is actually a very simple interface. If you want to export this data to excel file you can click here on excel import, you know so it you can export the whole customer table as a csv file then you can do that using this option. Also you can run our different queries here. So here when I uh what it is doing is, it is printing all the records from customer table. Now if you don't know much about SQL and relational database, I will suggest that you do this course on Khan Academy it is for free and that SQL course will teach you the fundamentals of relational database, what are tables, columns, foreign key, primary key and so on. In this table I have three columns customer code, name and customer type. These are all the clients of AtliQ Hardware, okay. Let's look at transactions table, because that's the most important one. Transaction table contains all my transactions, so you can see columns such as product code, customer code, market code, order date, sales quantity, sales amount, and currency. Now let's look at products and markets table really quickly. So here you can right click and click on products table. So these are you know like there are no fancy products name, so we have just rename name this product one two three etc, and if you look at the markets table, the markets table has the market code, the name of the city. So AtliQ Hardware is doing business in India, so market's name would be one of the cities in India. The zone would be either south, central, or north. Now you can see some records here; New York and Paris and there is no zone. So looks like this company did some business with New York and Paris maybe one time business and that's why the software engineers might have added these records here, but these records are not useful right now. Because right now the company is doing business only in India. When you're dealing with relational databases you will find so many garbage, you know because these databases have been in use for so many years. So there will be so much garbage and that's where the main challenge comes in. You have to do data cleaning as a data analyst and deal with cases like this. Okay! You might have noticed in transactions table I had this, which is sales amount is negative. Now that cannot be negative. So, I had some uh messy data here as well. Also few transactions are in USD if you want to do let's say, some analysis you have to uh convert this USD into INR. Otherwise if you are trying to find out the total sum of revenue it becomes really difficult, you cannot add this 500 with this one. So you will be facing a lot of these challenges. So let's do some primary analysis uh of our database. In transactions we see some records here. I want to see how many total records are there. For that you can use this query select count star from sales transaction okay, and when you click here it will give you the total count. You can see there is 150000 records. So this is not some dummy database it's some serious stuff going on 150000 sales transaction. Similarly if you want to look at number of records in customer table you can say select count star from sales, sales is the name of the schema by the way it's here and transaction is the name of the table. So similarly if I do customers you know there are 38 customers. Okay so this looks good so far. Now I want to show the transactions only from Chennai. So if you look at Chennai, the market code is mark 001. Okay so let's first again print some transaction record, so you can say select star from sales transactions limit five. When you do limit five it will only print the first five records. Okay and there is a market code here. So if you want to print only market code 001, there is a where clause. So you can say where market code, basically give me all the records where market code is Mark001, and when you execute this you know you get all these records. Again if you want to do count you can you can see how many transactions were performed in Chennai. So we are already doing now data analysis. We are generating insights from our data by using SQL. That's why SQL is one of the tools that you need to know as a data analyst. If your business manager asks you how many transactions we did in Chennai in total, you can go to SQL workbench run this query and get the answer immediately. If he wants to know all the transactions, you know he'll be like okay give me the dump of all the transactions in Chennai. Then you can run this query click on export here and export the result to CSV and give it to your business manager. So you realize how SQL can help you with your data analysis. Now I saw this USD currency, I want to know how many transactions have USD currency. So it is simple, again it's a very close you can say currency equal to USD and you will find there are luckily only two transactions. Somehow two transactions have USD. So when we'll build our Power BI dashboard we will convert this USD value into INR. Okay the next thing is I want to show transactions in 2020 joined by date table. So what does that mean? Well, when I do this okay see I'm printing sales transaction here. Okay fine I get these records but I want to know how many transactions were performed in 2020. Let's say I'm interested in particular year. One thing you can do is uh get the year out of this column, or if you notice we have this date table. So this date table is very important actually. So here so date table has a date and it will tell you what is the given year for that particular date. So we can do now and inner join with this table and we can figure it out. There is a outer join as well, there is inner join, and then there is a left and you know right join. So you can read about all those joins online, it's not a rocket science you can refer to some materials and you can get an understanding easily. So let's perform our join now. So what you will do is you will say select sales dot transactions dot star. dot star means uh print all the columns, then you want to print all the columns from date table as well. Right so just printing columns from both the tables. Now you want to join those two tables together, so how do you join it? So you will say from sales dot transactions you know from sales to transactions inner join sales dot date on say okay inner join sales dot date dot date, because sales is the name of the schema date is the name of the table and date is the name of the column here. Date on sales dot transactions dot order date is equal to sales dot date dot date. Oh so in you are doing inner join with sales.date table, so that was correct. So now you print uh more columns so these two tables are joined using the date columns. So now when I'm looking at this transaction you know the good thing is I also know the year, here see and what I can do now is I can print all the transaction in 2020 by saying sales dot date year equal to 2020. So when i do this now it will show me only the transaction from year 2020. You see there is only 2020 here. If you 2019 it will show you 2019 transactions. see 2019 and in 2020 now I want to, let's say I want to know the total revenue in year 2020 or total sales. So the way I can do that is uh I can just print sum, so I can say sum sales dot transactions dot what is the amount, well sales amount right, sales amount. Okay now I know we have USD thing going on but that is not in year 2020. So you don't have to worry about that currency but if there was a USD currency then you have to chnage the query a little bit, but when you execute this now I got this number that there was 152 Million rupees revenue in 2020. Similarly if you do 2019, it will tell me 2019 had this much revenue. So you see the revenues are I think declining, because previous year it was 336 Million now it is I think in 2020 the revenue is 142 Million. So this way you can get uh the aggregation data, the aggregation insights. You know you can do sums, you can do average, anytime you are facing issues with SQL just just google it. Google is your friend in SQL, you can say SQL average column and you know there is so much help available in Google. See they will show you simple table and give you all the syntax. So do not worry about this SQL syntax that much. It is actually very very simple. Alright, so now we have sum, what we want is now we want the sum in Chennai. So I want to know how much business I did in Chennai. Okay, so what we can do is now we can join uh we can join use we can do market code basically. Okay so we can say where sales.date is equal to this, okay and so in the where clause you can put And and sales dot transactions dot market code is equal to you know for Chennai the market code is 001 so if you look at market code here so it's in a market 001, that's why. Okay, so where is my thing and this is now showing me the revenue in Chennai which is one two three one two three two point four Million. Uh you can also show the distinct products. For example if you want to know the distinct products that you sold in Chennai, then you can print this queries select distinct product from that and you know this will show you the uh distinct products, list of products that you sold in Chennai so far. So that's all I have for this video you can run more queries, and perform more in-depth data analysis using SQL. I have given the dump file on my github page, so if you look at the video description below there is a readme file link. If you go to that link I will have instructions on how you can download this SQL dump so that you can initialize your database. also I have given all the queries that I ran in this video on that page. So try those queries out and try to perform some analysis and try to generate some insights just using SQL. In the next video, we will use Power BI and Power BI will be connected to the same SQL database and we will do some data cleaning, because you saw some USD columns, there were some negative values in sales and all of that. So in Power BI we will initialize the model in the next video and we'll do data cleaning. I hope you're liking this series so far, if you are then please give it a thumbs up!