You are in the right place to learn to become a data analyst. In this massive boot camp, Alex the Analyst will cover all the core topics that data analysts need to know, and along the way you'll build plenty of projects to gain hands-on experience.

Hello everybody, my name is Alex Freberg, better known as Alex the Analyst on YouTube, and in this video you're going to be taking my entire data analyst boot camp. This boot camp is composed of videos that I've made over the past 3 years, and they cover a lot of different topics like SQL, Excel, Power BI, Tableau, and Python. Throughout the boot camp there are a lot of hands-on guided projects that will really help you learn these skills well. And speaking of projects, there's an entire part near the end where you can build a free portfolio website to put all of your projects on, so that hiring managers and recruiters can go and look at all the projects that you've built.

If you want to go even more in depth into the skills that we learn in this boot camp, I have a data analytics learning platform called Analyst Builder. Analyst Builder was designed specifically for data analysts, so all of the courses and all the content are just for you, and it has a coding section where you can learn and practice for technical interviews. And lastly, before we jump into the boot camp, I want to give a huge shout-out to freeCodeCamp for putting this all together. I personally learned a ton from freeCodeCamp, so I'm really honored that my boot camp is going to be here for you guys to learn from, and I really hope you enjoy it.

What's going on everybody? It is 2023, and in this video I'm going to help you become a data analyst. We're going to start at the very beginning, assuming you haven't started this process of becoming a data analyst at all. If you already have, you can identify where you are in this process and then go from there. Now before we dive into everything, I want to warn you: I will be mentioning my own channel a lot in
this video. I have videos and playlists on just about every single topic that we're going to be talking about today. I'll have all the links to those videos in the description so you can dive into those topics more in depth. So I hope that's okay, and it's all completely free. I've been building this out for the past 3 years, and honestly you can probably get 90% of the way to learning everything you need for data analytics just on my channel.

So now that I've warned you, let's jump into step number one, and that is: learn the data analyst skills. Now there are literally a hundred different things that you can learn for data analytics. You can learn things like Alteryx, or a cloud platform, or different programming languages, but there are some core skills that I recommend you start out with before branching into some of those other skills.

The number one skill that I always recommend people start with is SQL. SQL is just one of those fundamental skills I think everybody should learn. Even if you don't use SQL, you'll use some variation of SQL if your company has a large enough data set. SQL is used to actually query and retrieve data from a database. So if your company collects data, which every company does, they're going to put it somewhere to store it. It's usually stored in a database, and SQL is how you get that data from the database. I think SQL is also fairly easy to learn, which makes it really good when you're just starting out. I have several playlists dedicated to SQL, starting from beginner all the way to advanced, and you can learn all of that for free.

One other reason why I think you should learn SQL first is that a lot of companies have a technical interview on SQL during the interview process. That's something that really caught me off guard when I was first starting out, because I thought it was going to be more behavioral. I didn't even know what a technical interview was. So knowing SQL actually became a really important part of interviewing and getting a job as
a data analyst.

The second skill that I would learn is a business intelligence tool like Tableau or Power BI. Now there are a ton of different BI tools; I can literally name 10 off the top of my head that I've used throughout my career. But what I will say is that learning something like Tableau or Power BI is pretty transferable to almost all those other BI tools. They're all fairly similar in how they do things and how they display the data. You most likely won't have a technical interview asking you about Tableau or Power BI, like to build something for them; that usually does not happen. But the combination of SQL, where you can query your data, and then taking that data to build something, is a really, really great combination to learn right away. I have entire series on both Tableau and Power BI, with projects, on my channel.

The third skill that I would learn is Excel. Now most people have used Excel; they know what Excel is and how it's used, but it can be used a little bit differently by a data analyst. For example, a lot of people haven't cleaned data in Excel or built charts and graphs using Excel, and those are things that data analysts would probably do. Excel is also just a fundamental skill that every company is going to expect you to know, so I have an entire playlist dedicated to Excel to actually walk you through how to use it for data analysis.

The fourth skill that I recommend you learn is Python. Now a lot of people will have Python higher up on their list. They only use Python; they don't use SQL or a BI tool, they just do everything in Python. Now Python is a fantastic tool. You can use it to manipulate your data, to create data visualizations, and a ton more, like web scraping and regular expressions and a hundred other things. But it can be kind of hard to learn; it took me a long time to really learn the basics very well. That's really the only reason why it is farther back. I feel like SQL and a BI tool are really easy to learn and really pack
a big punch, whereas Python can be quite tough to learn in my experience, and you may not use it as often as you would something like SQL or a BI tool. If you're interested in learning Python, I have an entire series dedicated to Python as well as projects that you can build. Again, I warned you there's going to be a lot of self-promotion in this video; I have videos on just about every single one of these topics.

The fifth and last skill that I recommend you learn, and this is the only one that I don't have a series on yet (I will make those), is a cloud platform like AWS, Google Cloud Platform, or Azure. There's no denying that these platforms have had a huge impact on how we use data as a whole in the data analyst industry. They can be kind of tough to learn, though, if you aren't using them hands-on in an actual job. I think that learning a cloud platform is already something that most people should start working towards, because in the future it's only going to become more prevalent.

Now where can you go and actually learn all of these skills that you need to become a data analyst? Well, the number one place I'd recommend, of course, is my channel. I have free tutorials on all these skills and a lot of other topics, and I think it's just a really great place to start.

The next place that I recommend you look at is Udemy. I recommend Udemy especially if you're just starting out, because it's pretty cheap. You can buy an entire SQL course for $10 or $15, and they have courses on every single one of these skills. I just recently made a video called "DIY Data Analyst Curriculum Using Udemy for Under $75", so you can create an entire curriculum to learn all of these skills for under $75, which is just amazing.

The next place I'm going to recommend you look is Coursera. Now Udemy is fantastic, they have really good instructors and good courses, but as a whole I find that sometimes Coursera just has more professional or better content. Coursera is a bit more
expensive, though. You're looking at $59 per month for all of their courses, or you can pay an annual fee of $399 up front. So again, it's just a lot more expensive. I moved to Coursera once I started having a data analyst job and had a bit more money, but when I was first starting out I just couldn't afford it, so I went to Udemy, and it was a really great place to start.

There are also places like DataCamp and Dataquest that kind of gamify learning, and they're more text based. All these other platforms, Udemy, Coursera, and me, they're all video based. But if you like reading, DataCamp and Dataquest are a lot more text, where you can learn by reading it and doing it.

After you learn all of these skills, the next thing that I recommend you do is actually build projects with those skills. Now what does building a project actually mean? It means taking a skill and then building something out of it that you can then show a potential employer. For example, if you went through and learned Tableau, you could take a data set and build a visualization and a dashboard in Tableau, and that would be a project. With these projects you can build something called a portfolio, and I usually call it a portfolio website. A portfolio website is a website that you create where you store all of your projects, and then you can share that with recruiters and hiring managers so that they can see all of your work.

Now do you absolutely need a portfolio to show employers? No, you don't, but it does help in two different ways. The first thing that it may do is actually help you land the interview. If you have a link on your resume and they click on it, they may see your skills and see your projects and be like, "Man, this person really knows what they're doing. This is exactly what we need." The second reason that I recommend building projects is because, most likely, during your interview you're going to get asked questions like "How have you used SQL? How have you used Tableau?" and if you don't have any experience in
that, you're just going to say, "Well, you know, I've taken courses to learn it." But with a project you can be a lot more specific. You'll be able to say, "Well, I actually just built out this project in Tableau. I took the data and cleaned it in Excel, and then I put it in Tableau and built out this dashboard, and here are the insights that I found from this data set." It's just a much better answer, and as a hiring manager myself, I can tell you that it is definitely beneficial to build out these projects.

The next step that I recommend you take in becoming a data analyst is building a data analyst resume. The resume, to say the least, is extremely important. It's what's going to actually allow you to land an interview to potentially get a job. Now if you are like me when I was first starting out, I had a resume, it just had nothing to do with data analytics. So how do you make a data analyst resume if you don't have any experience as a data analyst? Well, you are asking the perfect question, because the very first things that we talked about are what is going to go on your resume: those skills and those projects.

If you have no experience or relevant degree, like myself, who has a recreational therapy degree, if you have no background in this, it can be really daunting to display that you know what you're doing and that a company should hire you. So what I usually recommend is that right beneath your contact information at the top, you put your skills and the projects that you've built. Things like work experience and education should go on your resume as well, but just a little bit lower. You want them to see those things before they see that your last work experience was at Domino's and you have a degree in marine biology. It's just not relevant to data analysis, and if you put those things at the top, they're probably going to rule you out right away.

The fourth step to become a data analyst is actually applying. You have the skills, you have the projects, you have the resume; now you're ready to start
applying for those data analyst jobs. Now there are a lot of different opinions on how you need to go about applying for data analyst jobs, but I'll give you my take on it, and this has been the most successful for me in my career.

The first thing I want to mention is actually what I would not do, which is just blindly apply on Glassdoor, Monster, ZipRecruiter, and all these other platforms to any data analyst job that you can find. Now I'm not against this; I think you should do that. But I don't think that's the only thing that you should do, because the chances of you getting a callback or actually hearing something back are extremely low.

To really increase your chance of becoming a data analyst, I highly, highly, highly recommend working with a recruiter. A recruiter is literally someone who is there to help you find a job. Now when I first started out, I didn't understand what a technical recruiter was at all. I was kind of nervous or scared to work with one, but it's actually pretty simple. A company has a position that they want to fill, and they don't want to spend hours and hours and hours to find someone to fill that position, so they hire a recruiter. A recruiter is going to go out and try to find someone to fill that position, a.k.a. you. So if you go in to talk to that recruiter and they have a position that opens up, they will help you get that interview. And then if you get a job, let's say for $50,000, the company is going to pay that recruiter, let's say, 10% of your salary, so they'll give them $5,000. So you don't actually have anything to lose using a recruiter.

You can reach out to recruiters in several ways, and I've done every variation, but I'll tell you my most successful way, which was using LinkedIn. There are tens of thousands of recruiters on LinkedIn. I made an entire video on how you can reach out to recruiters and what to say to recruiters on LinkedIn to help you land a job, so be sure to check out that video when you actually get to that point. But you can
also just cold email and cold call these recruiting companies. To me, it's just not as effective as reaching out directly on LinkedIn.

And this is just a bonus one: the last thing that you need to do is accept a job offer. So on step number four, after you apply to those jobs, you do actually have to go in, interview, and then get a job offer, which you will accept. I just thought I'd mention that, just in case that was not super clear.

Now that was a lot of stuff. Let's talk about time frames to actually complete all of these things. Doing all of these things from scratch is going to take a while, but let's break it down by each step and see how long I generally think it's going to take.

Let's start with step number one, which is actually learning the skills. Now just to be up front, this one is probably going to take the longest for most people. For most people, learning all of these skills is going to take around 3 to 4 months. Now if you don't learn a cloud platform and Python, which are the last ones that I recommend, and you just focus on SQL, a BI tool, and Excel, I think you can do that in under 3 months. That is very dependent, though, on how much time you have to study. That time frame is more for someone who has several hours per day, maybe 3 hours at the end of the night after you go to work. That is someone who has quite a bit of time to dedicate to learning during their week. Of course, it's going to take longer if you don't have as much time to dedicate to learning.

Now let's look at number two, which was creating projects and a portfolio of projects. From my experience, when you're first starting out, it takes a lot longer to actually create these projects. It can take one or two weeks per project. I usually recommend people do three to five projects in their portfolio before they start applying, and since they can take anywhere from 1 to 2 weeks, you're looking at anywhere from 3 to 6 weeks.

The next step was to create a data analyst resume. Now in my opinion this
one should take the shortest out of every single step here, because you're really just reformatting a resume or creating a resume. You're just adding skills, adding your projects, and then reformatting it to make it look nice. This should hopefully take under a week, but if you use something like a professional service to help you build a resume, it could take one to two weeks.

The last two steps, which kind of go hand in hand, are steps four and five, which is actually applying for jobs and then landing a job. Now this process can take as little as a month, or it can take as long as 6 months or a year. It really depends on how you're applying, where you're applying, and just the kind of luck that you're having with actually landing interviews. I've seen people who have never had any experience land a job within a month of starting to apply, and it's incredible, it's amazing, but it doesn't happen too often. You're usually looking at around 2 to 4 months on average to land your first data analyst job.

If you put all of those together and kind of average everything out, you're looking at around 6 months total for the entire process. Now I don't want that to discourage you, okay? 2023 is a long year, you have a lot of time, and it doesn't have to take 6 months. You could do it faster; you could do it in three months and just prove me wrong. But if you are really focused and you are really driven to become a data analyst this year, I know that you can do it.

Now to maybe boost your spirits and make you feel a little bit better: I didn't know any of these things when I first started out. I didn't have anyone giving me a plan on what to do. I had to go out and figure all these things out by myself, and it took me almost a year to land my first real data analyst job. So with all that being said, I hope that this video is helpful. I hope you now have a path on how to become a data analyst this year, and that my channel can be a big part of that. So thank you guys so much for
watching, I really appreciate it. If you like this video, be sure to like and subscribe below, and I'll see you in the next video.

What's going on everybody? My name is Alex Freberg, and in today's video we're going to be starting our basics of SQL series. Now in this series we're going to be going over everything you need just to get started, and then in future videos we're going to be going over some intermediate concepts and some more advanced concepts, and then in the final series we're going to be going over some portfolio projects.

In this video in particular, we're going to be downloading SQL Server Management Studio, we're going to be creating our tables and inserting data into our tables, and in future videos we're going to actually learn how to query those tables. If you already have SQL Server Management Studio downloaded, you can skip ahead to where we actually create the tables and insert the data into the tables. If you don't care about that at all and you're just looking to query, I would skip to the next video, where we actually start querying the data that we inserted into those tables.

So to download SQL Server Management Studio, we actually have to download two things, and I have both links right here. I'm going to leave those in the description so that you guys have those. This one is to actually download SQL Server Management Studio, so let's go down here. I actually deleted it off my computer so I can walk through this with you guys. So we're going to download that. Let's also go over here. This is actually a server, so we have to download a SQL Server, and if you go down right here there's a free version. Now I don't need the Developer version; I'm just going to download the Express version. It's actually smaller, so let's download that as well. Now once this is done running, we're going to open it up and I'll show you what to do next.

So it just finished running. Let's click on it. All right, so we need to install it. We're going to click Yes, and this is going to take a little while. So
this popped up, I clicked Install, and it's been running for the past couple of minutes. Apparently I was not recording, so I apologize for that, but that's all I did. So now it's been installed. I'm actually going to pull it up right here, and let's open it up. Now when it pulls up, it's going to ask you to connect to a server, and that's why we downloaded the SQL Express server. So let's connect to that, and there you go, it's as easy as that. So now we have SQL Server Management Studio set up and we are good to go.

So the first thing that we need to do is actually create a database. Let's go over here to Databases, and let's click New Database, and let's just do SQL Tutorial, keep it simple. If we click that, it's going to create our database for us. Now when you open up the database, there's going to be a lot of stuff. You really do not need to know all this; really what we're going to be sticking to is this Tables folder right here. As of right now we do not have any tables, so we need to create tables.

Now there are two ways that you can do that. You can click right here and go to New and create a table. We're not actually going to do that; we're going to create it using a script, or T-SQL. So we're going to go over here and do New Query, and we will get started on actually creating the two tables that we're going to be using for all the stuff going forward. All right, so let's get rid of me, because you really don't need to be seeing me anymore.

Let's get started by doing our very first table, which is going to be our employee demographics table. So let's start off by saying CREATE TABLE, and we have to name it, so let's do EmployeeDemographics, and enter down. We want to do an open parenthesis. Now we need to specify what our column names are going to be and what the data type is for each column. So let's start off with EmployeeID, and we want that to be an integer, so that'll be like 1, 2, 3, 4, anything numeric. Now we want to do FirstName, and let's make that VARCHAR(50). If you don't know what these
data types are, that's okay. That will probably be covered in a different video; it's not really necessary for this video. Let's do LastName; we'll also make that VARCHAR(50). Let's do Age, make that an integer, and very last, let's do Gender, and we will make that VARCHAR(50) as well. So now we have our very first table. Let's run that and we'll see if it works. We'll go over here, we'll refresh our tables, and there you go. So we have our very first table.

Let's go up here, let's get rid of this one, and now let's create our second table. We're going to do basically the exact same thing, but we're going to have slightly different information in it. This is going to be our employee salary table. So let's do CREATE TABLE, and again we need to name it, and enter, and open parenthesis. So now we're going to do the same thing. We're going to do EmployeeID; let's make that an integer. Now we want the JobTitle, because we want to know what they do, and this one is going to be VARCHAR(50), because we keep it pretty simple. Whoops. And then for our very last one, we're going to do Salary, and that will be an integer as well, and I'll just close the parenthesis here. So let's create this table, let's see if it is there, and there we go.

So let's open up one of these tables really quick, see what's in there, see what it looks like. As you can see, we do not have any information in there. When you create a new table, sometimes when you open it up you're going to see this. If you want to get rid of that, you just need to do a, I think it's called a hard refresh or something like that, but you can do Ctrl+Shift+R. Let's see if it works for me. I just did it. All right, it goes away, so now it recognizes it as a table. So we're good there.

Let's go back here and let's get rid of all this. We've already created our tables; now we want to insert the data into our tables, so let's see what that looks like. Let's do INSERT INTO, and now we need to specify what table we're inserting our data into, so let's start off with EmployeeDemographics. Let's
do VALUES. So now we have to select what values we're going to put into this table. We're going to have to do the EmployeeID, so let's do 1001. Then we'll do the first name, so let's do Jim, last name Halpert, and then his age; let's say he's 30, and he is male. Now just for fun, let's execute that. Let's go back to this table right here and execute, and as you can see, all of our information actually went in there. So now we have his employee ID, his first name, his last name, age, and gender.

Now we need a lot more information in this table in order to actually learn a lot of the concepts of querying the table, so I'm actually going to go through and add a ton more information. I'm not going to bore you with that, but I will show you the final product before I actually hit execute. So stick with me. I'm actually just going to cut to the end where I insert all my stuff down here, and then if you want that, I'll probably leave it in the description or maybe put it on my GitHub or something, so you can easily just go copy and paste it if that's what you want to do. So I'll see you in a few seconds.

All right, so I have all my values right here. I'm actually going to take this one out, because I already did that one, but this is our additional information. Let's insert that into our table real quick, and go back here and take a look at it, and there you go. This is going to be our core information that we are querying off of in future videos. So that table is completely finished.

Let's go back here. We're going to get rid of this, because now we want to insert our information into our other table. So let's do INSERT INTO, and let's do EmployeeSalary, and let's do VALUES to specify that we're inserting values into there. In this one we have EmployeeID, so again let's do 1001; that's Jim. His job title is Salesman, and let's say his salary is $45,000. Let's execute that, and you can't see it, but down here it says it's done. Let's go to that table, and as you can see, that
is inserted. I'm going to do the exact same thing as I did before: I am going to fill out all of these, and in a second it will be done on your side. And then again, I will leave it in the description or I'm going to put it on my GitHub, and you guys can just copy and paste that if that's what you want to do, or you can write it out, whatever you want to do.

All right, just like before, I'm going to get rid of this first one. That is Jim; he is already done. Now let's insert this information, and it is finished. Let's go back here, and there we go. Now we have both of our tables and we are good to go for future videos. So thank you so much for sticking all the way through this one. In the next video we're going to actually begin querying the table and learning the SELECT, the FROM, the WHERE, the GROUP BY, and the ORDER BY statements. Everything is in these upcoming videos, so stick around and we will learn all of that together. Thank you so much for joining me. If you like this type of content, be sure to subscribe below, and I'll see you in the next video.

What is going on everybody? My name is Alex Freberg, and in today's video we're going to be going over the SELECT and the FROM statements. If you joined us for our last video, we went over creating our tables and inserting data into those tables, so we have this employee demographics table and we also have this employee salary table. Today we're going to be walking through the SELECT statement and the FROM statement on these tables. So here are some of the concepts that we're going to be going over today.

Let's just get started by doing SELECT everything, and let's do this from the EmployeeDemographics table. So let's execute this. If we wanted to only show the first names, we can just do FirstName and run that, and if we want first name and last name, we can just separate them using a comma and it will return those. Well, if we want to return all columns and all rows, then all we have to do is use this star. So that's what the star does. Now we
have nine rows of data here, and if we only wanted to return, let's say, the top five, we can easily do that: we can just say TOP 5 of everything. Now the reason this could be useful is, say you have a table that has millions of rows in it and you only want a small sample; you can say SELECT TOP 1000. When you do that here with TOP 5, it will only select the top five rows.

Now let's get everything back in here really quick, because we're going to move on to this DISTINCT feature. When we use DISTINCT, we're actually saying that we want the unique values in a specific column. So if we say DISTINCT and then let's do EmployeeID, everything should be returned. All nine rows should be returned, and that's because every single one of these is unique. Now let's try Gender. There are only going to be two results, male and female, and that's because there are only two distinct values in that column.

Now let's look at all of our data again. So now we want to look at COUNT. Now COUNT is very simple: all it's going to do is show us the number of non-null values in a column. So let's look at LastName, for example. If we do COUNT of LastName, all that's going to give us is a count of nine, because we have nine last names. If for whatever reason somebody's last name was left out and that was null, then it would have returned maybe eight or seven, depending on how many were actually in there. So if an entire column was null, it would return zero.

And if you notice, we are not given a column name. That's because this is derived information based off the last name. So if we want to actually give this a name, so that the column does not say "No column name", we can use this AS right here. Once you put AS, you can actually name it. Since this is the count of the last name, we'll write LastNameCount, keep it simple, and if we execute that, as you can see, we have LastNameCount right there. So that's how you use AS.

Let's look at all of our data again. We want to look at some max, mins, and averages
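As a rough, runnable sketch of the table setup and the SELECT queries walked through so far: the snippet below uses Python's built-in sqlite3 module with an in-memory SQLite database standing in for SQL Server, which is an assumption of this example and not what the video uses. The helper `run`, the exact column spellings, and the eight rows other than Jim Halpert's are also assumed for illustration. SQLite has no TOP keyword, so LIMIT 5 plays the role of the video's SELECT TOP 5.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the SQL Server Express instance.
conn = sqlite3.connect(":memory:")

# Same columns and data types as described in the video.
conn.execute("""
    CREATE TABLE EmployeeDemographics (
        EmployeeID INT,
        FirstName  VARCHAR(50),
        LastName   VARCHAR(50),
        Age        INT,
        Gender     VARCHAR(50)
    )
""")

# Jim's row is the one typed out in the video; the other eight rows are
# assumed stand-ins chosen to agree with the names and ages mentioned later.
rows = [
    (1001, "Jim", "Halpert", 30, "Male"),
    (1002, "Pam", "Beasley", 30, "Female"),
    (1003, "Dwight", "Schrute", 29, "Male"),
    (1004, "Angela", "Martin", 31, "Female"),
    (1005, "Toby", "Flenderson", 32, "Male"),
    (1006, "Michael", "Scott", 35, "Male"),
    (1007, "Meredith", "Palmer", 32, "Female"),
    (1008, "Stanley", "Hudson", 38, "Male"),
    (1009, "Kevin", "Malone", 31, "Male"),
]
conn.executemany("INSERT INTO EmployeeDemographics VALUES (?, ?, ?, ?, ?)", rows)


def run(sql):
    """Hypothetical helper: execute one query and return all rows."""
    return conn.execute(sql).fetchall()


# SELECT * returns every column and every row.
all_rows = run("SELECT * FROM EmployeeDemographics")

# Naming columns, separated by commas, returns only those columns.
names = run("SELECT FirstName, LastName FROM EmployeeDemographics")

# T-SQL: SELECT TOP 5 * FROM EmployeeDemographics (SQLite uses LIMIT instead).
first_five = run("SELECT * FROM EmployeeDemographics LIMIT 5")

# DISTINCT keeps only the unique values in a column.
genders = run("SELECT DISTINCT Gender FROM EmployeeDemographics")

# COUNT counts non-null values; AS gives the derived column a name.
count_cursor = conn.execute(
    "SELECT COUNT(LastName) AS LastNameCount FROM EmployeeDemographics"
)
count_column = count_cursor.description[0][0]
last_name_count = count_cursor.fetchone()[0]

print(len(all_rows), len(first_five), sorted(genders), count_column, last_name_count)
```

The point of the AS alias shows up in `count_column`: without it, the result column has no meaningful name to refer to.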
right now, and the only column here where it would be useful to do that is Age. But let's actually go over and look at our salary table. In our salary table we have some really interesting salaries that I think would be a little bit more useful for this. So let's go over to EmployeeSalary. All right, let's look at this table really quick. So we have our Salary column. Now we want to look at the maximum salary that is in that column, and that is going to be $65,000. Now let's say we wanted to know what the minimum salary was. Let's execute this, and the person who makes the least money is making $36,000. Now what's the average? What is the average salary for all employees? That's going to be about $48,500. So these are super easy to use, and they're extremely useful. I use them every single day, so I know that each of these is very, very useful and definitely among the basics that you have to know.

Let's look at everything really quick. So we just learned the SELECT statement, but learning this FROM statement really quick is also important. Up here, this actually shows us that we're already hitting off the SQL Tutorial database. But let's say we change it to master. When we try to run this, it's going to give us an error, and that's because now we're hitting off this database, and this database does not have this table in it. So in order to still hit off that table while up here we're actually hitting off a different database, we can change this information.

In the FROM statement you have to specify three separate things. The first thing that you need to specify is the database, so let's say we want to hit off the SQL Tutorial database. Now we want to select what table we're going to use. This is actually under dbo, so let's put dbo. There's a lot that can go into that; it's not worth getting into now. But dbo, dot, and let's do EmployeeSalary. When we execute this, our information comes up even though up here we're still hitting off the
master database. When we specify it right here, then we actually are choosing what database and what table to hit off of, so it does not matter what it is up here. So that's how you use the FROM statement.

In the next video we're going to be going over the WHERE statement, and then after that the GROUP BY and ORDER BY statements, and that will be the complete basics of SQL tutorial. Then we'll start getting into a little bit more fun stuff, some more advanced concepts, which I think will be really, really exciting for everybody to learn. Thank you guys so much for joining me, I really appreciate it. I hope this has been helpful. If you like this type of content, subscribe below, and I'll see you in the next video. Thanks and goodbye.

What's going on everybody? My name is Alex Freberg, and in this video we're going to be going over the WHERE statement in SQL. In the very first video we created our tables and inserted data into our tables. In the second video we went over the SELECT and the FROM statements, and now we are on to the WHERE statement. Now what does the WHERE statement do? It helps limit the amount of data and specify what data you want returned. We have quite a few concepts that we're going to be covering today.

Let's just start out with something really easy. Let's do WHERE FirstName equals Jim. Really simple: we're selecting everything where our first name equals Jim, and this is our output. So really, really simple. Now let's try where it does not equal. This right here says does not equal Jim, and let's execute that, and as you can see we have everybody except Jim Halpert in there.

So now let's look at greater than and less than. In this table I think the one that we're going to look at is Age. So let's look at Age, and let's do where it's greater than 30, and when we execute that we're going to get everyone who is over the age of 30. Now as you can see, we're not including people who are 30 years old. If we want to include people who actually are 30 years old, we're going to add the equal sign
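The salary aggregates, the qualified table name, and these first WHERE comparisons can be sketched as follows. This is a minimal sketch under stated assumptions: Python's built-in sqlite3 with an in-memory SQLite database stands in for SQL Server; the helper `run` is hypothetical; and every row except Jim's two rows is assumed, with salaries chosen only so that the maximum and minimum match the $65,000 and $36,000 quoted in the video. SQLite has no database.dbo.table three-part names, so `main.EmployeeSalary` (schema plus table) is shown as the closest analog.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeDemographics (
        EmployeeID INT, FirstName VARCHAR(50), LastName VARCHAR(50),
        Age INT, Gender VARCHAR(50)
    );
    CREATE TABLE EmployeeSalary (
        EmployeeID INT, JobTitle VARCHAR(50), Salary INT
    );
    -- Jim's two rows come from the video; every other row is an assumption.
    INSERT INTO EmployeeDemographics VALUES
        (1001, 'Jim', 'Halpert', 30, 'Male'),
        (1002, 'Pam', 'Beasley', 30, 'Female'),
        (1003, 'Dwight', 'Schrute', 29, 'Male'),
        (1004, 'Angela', 'Martin', 31, 'Female'),
        (1005, 'Toby', 'Flenderson', 32, 'Male'),
        (1006, 'Michael', 'Scott', 35, 'Male'),
        (1007, 'Meredith', 'Palmer', 32, 'Female'),
        (1008, 'Stanley', 'Hudson', 38, 'Male'),
        (1009, 'Kevin', 'Malone', 31, 'Male');
    INSERT INTO EmployeeSalary VALUES
        (1001, 'Salesman', 45000),
        (1002, 'Receptionist', 36000),
        (1003, 'Salesman', 63000),
        (1004, 'Accountant', 47000),
        (1005, 'HR', 50000),
        (1006, 'Regional Manager', 65000),
        (1007, 'Supplier Relations', 41000),
        (1008, 'Salesman', 48000),
        (1009, 'Accountant', 42000);
""")


def run(sql):
    """Hypothetical helper: execute one query and return all rows."""
    return conn.execute(sql).fetchall()


# MAX, MIN and AVG each collapse the Salary column into a single value.
max_salary = run("SELECT MAX(Salary) FROM EmployeeSalary")[0][0]
min_salary = run("SELECT MIN(Salary) FROM EmployeeSalary")[0][0]
# SQLite returns a float here; SQL Server's AVG over an INT column returns an INT.
avg_salary = run("SELECT AVG(Salary) FROM EmployeeSalary")[0][0]

# T-SQL: SELECT * FROM SQLTutorial.dbo.EmployeeSalary
# SQLite analog: qualify the table with its schema name (main is the default).
qualified = run("SELECT * FROM main.EmployeeSalary")

# WHERE limits which rows come back.
only_jim = run("SELECT * FROM EmployeeDemographics WHERE FirstName = 'Jim'")
not_jim = run("SELECT * FROM EmployeeDemographics WHERE FirstName <> 'Jim'")
over_30 = run("SELECT FirstName FROM EmployeeDemographics WHERE Age > 30")
at_least_30 = run("SELECT FirstName FROM EmployeeDemographics WHERE Age >= 30")

print(max_salary, min_salary, round(avg_salary))
print(len(not_jim), over_30, at_least_30)
```

Comparing `over_30` with `at_least_30` shows the effect of adding the equal sign: the 30-year-olds appear only in the second list.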
right there, so we should now be seeing people who are 30. Before, Pam and Jim were not in there, and now they are. If we do the exact same thing with less than 32, here's everyone that's going to be included, but if we want to include the people who are 32 years old, then we just add that equal sign, and now the people who are 32 years old, like Toby and Meredith, are included.

If we want to go even further, say we want people who are less than or equal to 32 and who are male, we can say AND gender equals male. So now we have two things that we are specifying: we need somebody whose age is less than or equal to 32, and we need their gender to be male. Let's execute that, and we have four people who meet that criteria. So that's what the AND statement does. If we write OR, then only one of these criteria has to be true for a row to be returned. So if we hit execute now, we're saying anybody whose age is less than or equal to 32, or whose gender equals male. If we look down here, Michael Scott is actually 35 years old, so he's over 32, but since he is male he is now included.

Let's get rid of everything really quick. I want to look at LIKE really quick, so let's execute just that. If you highlight just that and hit execute, it will only run what you have highlighted. So now let's look at this whole table. When you're using LIKE, you're sometimes doing it for numerical data, but most of the time you're using it for text information. If I'm looking at last names and let's say I want everybody whose last name starts with S, you can't really do that with anything else. So I'm going to say where it's LIKE, and then I'm going to say S, and after that I'm going to put a percent sign. That's actually called a wildcard. If I close that off, what this is saying is I want every last name that starts with an S. So let's run this really quick. Now we have two people
whose last names start with S. Now if I put a wildcard at the beginning, we are saying where there's an S anywhere in anybody's last name. Let's execute this and see what we get. So now even if the S is towards the end, like in Flenderson, it still counts. You can specify multiple things in here as well. Let's say I want it to start with S, which would return Schrute and Scott, but now I want something that also has an O in it, so it has an S at the beginning and then somewhere after that there's an O. Let's execute that, and there's only one person that meets that criteria. You can do that for multiple things. You can even say O and then TT; let's execute that, and he's still going to be returned. If we put C at the back, it's not going to be returned, because the pattern is matched in order, and the name isn't S, O, T, T, C. The C would actually need to go over here, so now we have S, C, O, T, T, and although there's a bunch of wildcards in here, it is going to return Scott. So that is a little hint at how you can use LIKE. There is a little bit more that goes into it, and you can use it with numerical data as well, but for demonstration purposes I wanted to use text data. In a nutshell, that is how you get started using LIKE.

Let's get rid of this really quick and look at our entire table. I wanted to show you how to use NULL and NOT NULL. I can't really show you how to use NULL because I do not have any null fields. I could easily update this table and make some null, but that's for a future video where it's a little bit more advanced and you can start altering your data. Just for purposes of showing you what NULL and NOT NULL are, let's do where first name IS NULL. As we see, that is not going to return anything, but if we say IS NOT NULL it's going to return everything, because nothing in here is null; nothing in this first
name column is null. So that's how you use it. There are a lot of use cases where you actually will use NULL and NOT NULL; those will be in future videos, probably in the project section or the portfolio section. We weren't able to show how to use this super well, but as a demonstration that's really all it does: it looks at the whole column and checks whether each value is null or not null. This is actually super useful, and you can use it in a ton of situations.

So let's get rid of this and look at IN really quick. IN is kind of like the equal statement, but it's multiple equal statements. Let's say we want to say WHERE first name equals Jim, and then we were like, wait, we also want to include Michael Scott. Then we would have to write OR first name equals Michael, and so on for anybody else that we wanted to include. But if we said IN, we could do an open parenthesis and then say Jim, say Michael, and say as many people as we want going down the road, just separating them by commas. If we hit execute, all of them would be returned. So it really is just a condensed way to say equal for multiple things.

So that is the WHERE statement. I think the WHERE statement can get extremely complex, but this really is highlighting the basics. If you can learn all of these concepts, you will absolutely have the basics down and will be set to go over some more intermediate and more advanced things with the WHERE statement later on. In the next video we're going to be going over the GROUP BY and the ORDER BY, and then we are done with the SQL basics. Then you can practice and work your way up into my intermediate level videos, which are going to be coming out very shortly after these videos. Thank you guys so much for joining me. If you like this tutorial series, be sure to subscribe below, and I'll see you in the next video.

What's going on everybody, my name is Alex
Freberg, and in today's video we're going to be going over the GROUP BY and the ORDER BY statements. In previous videos we created tables and went over the SELECT, the FROM, and the WHERE, and now we are at the very end of our SQL basics series. If you stayed with us the whole time, hopefully you have learned a lot and learned the basics of SQL. In future videos we're going to be going over intermediate and even more advanced concepts, and even going through portfolio projects that you can put on your resume. If you like this type of content be sure to subscribe below, but let's get into it.

For today, the GROUP BY statement is similar to DISTINCT in the SELECT statement in that it's going to show the unique values in a column. The difference is, if we say DISTINCT gender, what's going to be returned is the very first unique value of female and the very first unique value of male. But if we select gender and say GROUP BY gender, it's still only going to return two values, but in these two values we actually have all the males rolled up into one row and all the females rolled up into the other row. Let me further show you what that means. If I say COUNT of gender, now you can see that this whole time there were six males in this one row and three females in this one row. So DISTINCT really is only showing us which unique values are in there, but GROUP BY is showing us what the unique values are and also rolling them all up into one row that we can use for other things.

Now real quick, I want to be able to see both of these at the same time, so let's just put this up here and run it so we can see both. Now let's add age to this query down here and only run this one, and I want to show you what happens and why. We're now looking at gender, age, and then the count of gender. If we look down here, we only have one male who is 29, we have one female who is 30, and so on and so
forth. So none of these people are both the same gender and the same age. If, for example, we had two or three people who were male and 30 years old, then we would have a two or a three over here. So this count is calculated for each row that's being returned. For the data that we have today this isn't a fantastic example, because it really split everything out and there weren't any that were the same, but as you can see, you can select multiple columns as long as you also put them down here in the GROUP BY.

Now why did we not have to put this COUNT of gender down here in the GROUP BY? That's because this COUNT of gender is actually a derived field, or derived column. It's derived from the gender column, so it's technically not a real column in the table; it's one that we're creating. The age and the gender are actual columns in our table, so they have to be down here. And like I said before, compare it to DISTINCT in the SELECT statement: we're looking at the distinct combinations of gender and age, so it's distinct across multiple columns. When we had it before and were only looking at gender, it rolled all of those up into just male and female, but if we want to add more we can easily add more in this GROUP BY statement.

We can still do things like WHERE age is greater than 31. So let's execute this, and our numbers are going to change. Now we're grouping by gender and looking at the count of people whose age is greater than 31, which is smaller than before.

Now let's look at ORDER BY. I'll do it down here really quick for demonstration, but I am eventually going to come up here and use it, because I think it'll be a little bit better. To completely round out this query down here, let me give this a name; let's do count gender. Then let's come down here and order by count gender. When we run that it's going to do 1, 3, and that's because
by default SQL sorts ascending, which is smallest to largest going down. If we want to change that, we can change it to descending, which is largest to smallest, so now we have 3, 1. If we order by gender instead and do it descending, now we have Z to A, so that's going to be male, female. If we get rid of that, it's going to use the default ascending, and let's see what that brings: female, male.

Now for what we're trying to do, let's look at the larger table, because I think it's going to be a little bit more descriptive, or a little bit better visually. Let's do ORDER BY age. Let's run this, and it's going to order smallest to largest. If we do descending, it's going to do largest to smallest. Now you don't have to order by just one thing; you can use multiple columns. So if I wanted to do age and then gender, I can do that as well. Let's add gender and run that. So now we have it ordered by age, but within each age we also have it ordered by gender, and that's in ascending order, so A, B, C, D, F, and female comes first and then male, and again female and male. Now we don't have to let it be ascending for each one. If I wanted to reverse this column, I can do descending. Let's run that, and where we have 30, now male is first and female is second. And if I wanted to do that over here as well, I can do descending, and now we have them both descending. So it's going to go top to bottom, and where we have 32 it's going to be male, 32, female.

So you can specify lots of different things in here, and we don't actually have to use column names; we can just use numbers. The columns are 1, 2, 3, 4, 5, so to replicate the exact same thing as before, let's do 4 descending and then 5 descending, and if we execute that, it's going to give us the exact same result as if we'd actually put in the column names. I do use this a lot. Oftentimes I don't use
the column name; if it's a small table I'll just use the number instead, so in my actual queries I do this a lot.

So that is the GROUP BY and the ORDER BY statement, and if you have walked through my previous videos you should be completely done with the basics of SQL, so congratulations. The next thing to do is really just practice the basics, because the basics are what you're going to be using day in, day out. What I would recommend is to create a few more tables, query those tables, and try to think of use cases and what you would actually want to know from that information. After that I would move on to my intermediate videos, if those are already out, and then to my advanced videos. Those are going to go over some more challenging topics, but things that would be very useful for anybody to know. In my next video I'm going to be going over intermediate SQL topics, things like joins and subqueries and a ton more, so if I have already posted those, be sure to go check them out on my page, and if I haven't, I hope to have them up soon. Thank you guys so much for watching, I really appreciate it. If you learned anything in this basics of SQL series, be sure to subscribe below, and I'll see you in the next video.

What's going on everybody, my name is Alex Freberg, and today we're going to be starting our intermediate SQL series. If you joined us for our last series, we walked through the basics of SQL, which is everything you needed just to get started, and in this series we're going to be walking through some intermediate concepts to really take your skills up to the next level. Today we're going to be walking through joins, but let me show you what you can expect from the entire series for this intermediate course. So we're walking through joins today, and then in future videos we're walking through unions, CASE statements, updating and deleting data, PARTITION BY, data types, aliasing, creating views, HAVING
versus the GROUP BY statement, the GETDATE function, and primary key versus foreign key. Then we're going to have an advanced course. This is not set in stone yet, but these are some of the things that I think I will be walking through: CTEs, sys tables or system tables, subqueries, temp tables, string functions, regular expressions, stored procedures, and then importing and exporting data. So with all that being said, let's get into it.

All right, now let's get rid of me, because we do not need to be seeing me for the rest of the series. At the very top here are some of the things that we're going to be going through today, which are inner joins and then outer joins, and within the outer joins we have a few different types. Now a join is a way to combine multiple tables into a single output. For now we're going to be using the employee demographics and the employee salary tables, so let's get a look at both of these tables and see what's in them. In our employee demographics table we have employee ID, first name, last name, age, and gender, and then down here in our employee salary table we have employee ID, job title, and salary. If you notice, they have a similar column, and that's going to be the employee ID. When you're doing a join, you have to do it based off a similar column, and typically you want it to be a unique field. So we're going to be using the employee ID from both tables to join these tables together to create one output.

So let's get rid of this real quick and start building our query to join these two tables together. The first thing we're going to do is an inner join. Let's do SELECT everything FROM SQLTutorial.dbo.EmployeeDemographics, and let's do JOIN. We can also say INNER JOIN, but JOIN by default is an inner join. And we're going to do SQLTutorial.dbo.
EmployeeSalary. Now we have to join them together, which is what we talked about earlier, and we're going to be doing that based off the employee ID. For that we have to say ON, and then we're going to say EmployeeDemographics.EmployeeID is equal to EmployeeSalary.EmployeeID. So let's run this real quick and take a look at the output, and let me pull this up. What we are looking at is actually both tables combined: we have the employee ID, first name, last name, age, and gender, and then here's the salary table's employee ID, job title, and salary. Now an inner join is really only going to show everything that is the same in both tables, so in both tables there are employee IDs of 1001 all the way down to 1009, but if you notice, there is data that is missing. Real quick, let's go down to this graphic and look at the inner join. An inner join is going to show everything that is common, or overlapping, between table A and table B. So what we are looking at here is exactly that: we're only looking at the rows that match on employee ID in both tables.

Now let's change this join to a full outer join, run it, and see what we get. If you notice, the output is very different, so let's take a look at it and see why. Everything down to here is exactly the same, so employees 1001 down to 1009 are unchanged, but once we get down to row 10 it starts to get very different. We are joining these tables based off the employee ID. So for example, right here Ryan Howard has an employee ID of 1011, but as you can see, in the salary table there is no 1011 employee ID, so it has nothing to link it to. Because of that, it fills in everything as null, because it has nothing to match on in this table. And vice versa: in the employee salary table there's a person in here that's a salesman with no employee ID at all, which means all this information is going to be null. We can see that in this diagram right here. So this is the full outer
join right here, and what it is saying is we are going to show everything from table A and table B, regardless of whether it has a match based on what we are joining them on. So even if table A has an employee ID and there's no matching employee ID in table B, we're still going to show it, and vice versa.

So now let's look at a left outer join. A left outer join is going to take the left table and say we want everything from the left table and everything that's overlapping, but if it's only in the right table, we do not want it. Now what are the left and the right tables? The left table is the first table that we use, and the right table is the second table that we use. So we're going to look at everything in the employee demographics table, regardless of whether or not it has a match on the employee ID in the employee salary table. This is what that looks like. As you can see, this is our entire employee demographics table, and down here we have three rows that have information in the employee demographics table but absolutely no information from the employee salary table, because there's nothing to match on. This 1011 is not in that table, this 1013 is not in that table, and this one does not even have an employee ID, so we're not going to have a match at all.

If we change that to a right join, you'll see the exact opposite. It's going to show us everything in the employee salary table. So now we have all of our information right here from the employee salary table, and if it doesn't match in the other table it's just going to give nulls. Down here we have 1010, and obviously there's not going to be anything associated with that, because there's no 1010 in the employee demographics table. And for this one we have a salesman with no employee ID, and since there's no employee ID to tie it to the demographics table, we're going to have nothing. We can see that in the diagram right here: for the left outer join we're looking at everything in table A, which is our
demographics table, and in our right outer join we're looking at everything in table B, which is our salary table.

Now let's pull this down a little bit. So far we've only been using SELECT star, so we've been selecting everything, and I only did that for demonstration purposes. You most likely would not be doing this when you actually use these joins. What you're probably going to want to do is select exactly which columns you want in your output. So for example, let's do employee ID, first name, last name, job title, and salary, and let's try to run that really quick. As you can see, it is not going to work. Now why is that not working? It's not working because we have two employee ID fields, one in each of these tables, and we have to specify which employee ID we want, because that can drastically change what our output is. So for this demonstration let's use EmployeeDemographics.EmployeeID, and let's actually just do an inner join because it's easier for the output. Now let's run this and see what we get. As you can see, we now have the employee ID, first name, last name, job title, and salary. We're doing this with an inner join using the employee ID from the employee demographics table, but if we use the one from the employee salary table it should give us the exact same output, and that's because an inner join is only going to show us what overlaps between both tables.

But now let's try a right outer join and run this. Now we're using the employee ID from our employee salary table, and since we're doing a right outer join, we're going to get all the information from our employee salary table, and it does not have to be in our left table, which is our employee demographics table. If you look at the information down here, this 1010 is from the employee salary table, and it's in this position because that's what we're looking at in our select
statement. Then over here we have our salary, and since we have information right here that is in our employee salary table but has no employee ID, our employee ID is null. Now let's change this to look at the employee demographics employee ID and execute it. As you can see, that 1010 is gone. Now we just have this information right down here, and we don't have a demographics employee ID for either of these rows, but it's going to show them regardless. That's again because we have a right outer join, and that's why we have no employee ID down here.

Now let's do a left join, and it's basically going to do the opposite of what we just looked at. Now we're looking at everything from our left table regardless of whether it's in our right table. Our left table is our employee demographics table, and we are looking at the employee demographics ID. So with the employee demographics ID it's going to show us the first name and the last name, which is everything in our left table, and for these IDs, or lack of IDs, it's just going to give us nulls in all of these places. If I change it right up here to the employee salary employee ID and execute it, because we're showing everything from our left table, which is our employee demographics table, we are still going to see our names, but since we're using the employee ID from our right table, now we're just going to have blanks in this information and this information.

Now let's look at a use case for these joins. Let's say Robert California is pressuring Michael Scott to meet his quarterly quota, and Michael Scott is almost there; he needs like a thousand more dollars. He comes up with the genius idea to deduct pay from the highest paid employee at his branch besides himself. So how does he go about identifying the person that makes the most money? Well, of course he's going to come to SQL first. So we actually want to look at a full outer join real quick, and let's just look at everything. So here's
what we have: we have the employee ID, first name, last name, age, gender, employee ID, job title, and salary. Now what do we need to know to get the information that Michael Scott needs? Well, we need the employee ID, and we want the first name and last name, so let's write all that real quick: employee ID, first name, last name. Then we're also going to need the salary, because we need to know who is the highest paid employee. Now let's do an inner join, because we really only want to look at the employee IDs where we know both their name and their salary, and let's do this based off the employee demographics table. It really doesn't matter for an inner join, but let's do that real quick. So let's look at this: we have our employee ID, our first name, our last name, and our salary. We want to do it where it's not Michael Scott, and that's because Michael doesn't want to take away his own money; he wants to take away his employee's money. So let's do WHERE first name does not equal Michael, and he knows that he's the only one who is named Michael. So now we have our list. Let's do ORDER BY salary, and let's do descending so that the highest is at the very top, and execute this. This is tough news for Dwight Schrute, because it looks like he is the highest paid employee besides Michael, and so it looks like he is going to get a cut in his pay this quarter so that Michael can meet his quota. So that's just one use case.

Let's look at one more use case. Let's start out by getting rid of this and looking at everything again. For our next use case, Kevin Malone, who is an accountant, thinks that he may have made a mistake when looking at the average salary for our salesmen. Now Angela Martin is very good at SQL, so she wants to go in and calculate the average salary for our salesmen. Let's try to get that information. All we're going to need is the job title and the salary, so let's come up here and let's
get job title and salary, and let's look at this. Now we only want to look at where the job title is equal to salesman. The very last thing we want to do is say we want the average of salary. Since we're going to need a GROUP BY, we have to get rid of this plain salary column and just take job title, write it down here, and do GROUP BY job title. So we're going to have job title and then the average salary, and there you go: we have the salesman, and the average salary is 52,000. So Angela now knows to go back and fix what Kevin made a mistake on.

So that's how you use joins. I will include this image in the description so you can go and look at it yourself if you are curious. That really helped me out when I was first getting started, to conceptualize and understand what kind of data I was pulling based on what join I was using, so I hope it is useful to you as well. In the very next video we're going to be looking at unions, so if that is posted, be sure to check that out next. Thank you guys so much for joining me, I really appreciate it. If you like this type of content or got anything out of it today, be sure to smash the like button, smash the subscribe button, and I'll see you in the next video.

What's going on everybody, my name is Alex Freberg, and in today's video we're going to be looking at unions. In the very last video we walked through joins, and I thought it was appropriate to look at unions next, because unions and joins are somewhat similar, or closely related, and that's because in both instances they're combining two tables to create one output. Now what's the difference? The difference is that a join combines both tables based off a common column, and in the last video that was the employee ID. So in both tables we had an employee ID, and when you're selecting your data you have to choose either to select only one employee ID or to select both employee IDs, but they're in separate columns. And with a
union you're actually able to select all the data from both tables and put it into one output, where all the data is in each column and not separated out, and you don't have to choose which table you're taking it from. Now that may not have made 100% sense, but let's look at it real quick in stages.

So let's go down here and actually join these tables together and see what we get. The two tables that we're looking at are employee demographics and warehouse employee demographics. Over here we have our employee demographics information, and then over here, or actually down here, we have our warehouse employee demographics. Right now I'm doing a full outer join, so we're looking at all the data. If we were to pull this into an Excel spreadsheet, we could just copy this and paste it over here and we would be good to go, and that's because we have all the same columns: first name, last name, age, gender, and first name, last name, age, gender. But if we tried to combine this in a query with a join, it wouldn't work; we cannot get it in the same column. And that's where a union comes into play.

So let's go back up here and actually run both of these. As you can see, they have the exact same columns, and that makes it super easy for what we're about to do. Between these two queries, which are completely separate right now, all we're going to do is write UNION. So let's run just this. Now because of the UNION, you can look down here and the information from the other table, which used to be in separate columns, is now added down below in the exact same column order. Darryl Philbin was actually in both tables, and the reason he isn't showing up multiple times is because this UNION is actually removing the duplicates, kind of like a DISTINCT statement. Now there's actually another thing called UNION ALL, and if we do UNION ALL it is going to show us all of the information regardless of whether it is a duplicate or not. So
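As a minimal sketch of the UNION versus UNION ALL behavior described here, the snippet below runs the two queries through Python's built-in `sqlite3` module rather than the SQL Server setup used in the video. The table names follow the video, but the rows (including the ages) are illustrative placeholders, not the tutorial's actual data.

```python
import sqlite3

# In-memory database with two tables that share the same column layout.
# NOTE: these rows are made-up sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (
    EmployeeID INT, FirstName TEXT, LastName TEXT, Age INT, Gender TEXT);
CREATE TABLE WareHouseEmployeeDemographics (
    EmployeeID INT, FirstName TEXT, LastName TEXT, Age INT, Gender TEXT);

INSERT INTO EmployeeDemographics VALUES
    (1001, 'Jim', 'Halpert', 30, 'Male'),
    (1013, 'Darryl', 'Philbin', 33, 'Male');
INSERT INTO WareHouseEmployeeDemographics VALUES
    (1013, 'Darryl', 'Philbin', 33, 'Male'),
    (1050, 'Roy', 'Anderson', 31, 'Male');
""")

# UNION stacks the rows of both queries and removes exact duplicates.
union_rows = conn.execute("""
    SELECT * FROM EmployeeDemographics
    UNION
    SELECT * FROM WareHouseEmployeeDemographics
    ORDER BY EmployeeID
""").fetchall()

# UNION ALL stacks the rows and keeps every duplicate.
union_all_rows = conn.execute("""
    SELECT * FROM EmployeeDemographics
    UNION ALL
    SELECT * FROM WareHouseEmployeeDemographics
    ORDER BY EmployeeID
""").fetchall()

print("UNION:    ", [row[0] for row in union_rows])
print("UNION ALL:", [row[0] for row in union_all_rows])
```

In this sample, the duplicated employee appears once in the UNION result and twice in the UNION ALL result, which is the same effect the ORDER BY on employee ID makes visible in the video.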
let's run that real quick, and they are both there. But let's ORDER BY employee ID and run it again. As you can see right here, these are exact duplicates, so the UNION got rid of one because they were exactly the same, but the UNION ALL kept it in because it is showing the data just as it is.

Now let's get rid of this UNION ALL. The only reason it works so well is because those two tables were exactly the same: they were employee ID, first name, last name, age, gender. So they're basically the same table just with different information, which made it really easy. But we have another table, employee salary, so let's look at these two tables. These two tables are obviously very different; they hold different information. Now we would still be able to combine them. So let's do employee ID, first name, and age, and down here on the employee salary table we will do employee ID, job title, and salary. Now let's use a UNION really quick and run this, and it is still going to work. Now why does this work? Well, first off, the reason it's working is because these data types are the same, or at least similar: text and text, and then age, which is an integer, and salary, which is an integer. It also has the same number of columns, so three and three. So we have employee ID, first name, and age, and it's taking that from the first SELECT statement, and it's still using a UNION to take the data from the second SELECT statement, so it's still inserting this information. Now this is not what you want to do, because right here under first name we have salesman, salesman, and then under age we have 30 and 45,000, and 45,000 is obviously not an age. So you want to be careful when you're using a UNION to combine two separate tables, and make sure that the data you're selecting is the same kind of data.

In the very next video we're going to be walking through CASE statements. Thank you guys so much for joining me, I really appreciate it. If you like this type of content be sure to subscribe below, and I'll see
you in the next video.

What is going on everybody, my name is Alex Freberg, and today we're going to be walking through CASE statements in SQL. A CASE statement allows you to specify a condition, and then it also allows you to specify what you want returned when that condition is met. We're going to be using this employee demographics table that we're looking at right here. We're going to walk through the syntax of how to create a CASE statement, and then we're going to go into some use cases at the end.

So let's start off by specifying what columns we want. Let's say we want the first name, the last name, and the age. Now let's just get that information. For our CASE statement we're going to be using this age column, so we actually want the age to be in there, so let's specify WHERE age IS NOT NULL and run that. So now we have a pretty good look at it, and let's just ORDER BY age to clean it up a little bit.

Now let's start building our CASE statement. We're going to say CASE and then WHEN, and now we need to specify what condition we want to look for. So let's do WHEN age is greater than 30, and then THEN: what do we want to be returned? We want to return that they are old. Then ELSE, which means for anything that is not over the age of 30, we want to return young. Then you need to specify that you're done with the CASE statement, so you write END at the very bottom. This is our first CASE statement; let's run it and see what we get. As you can see, a new column was created, and if the person is over the age of 30, so 31 and up, they are given old, and if they're not over the age of 30, they are given young. Now we can do as many WHEN and THEN statements as we want. So if we want to, we can also do WHEN the age is between 27 and 30, THEN we want to return young, and anyone else we're going to call a baby. So now we have Ryan Howard as the baby, anyone between 27 and 30 is considered young, and anyone over the age of 30 is old. Now something
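The CASE statement built up so far can be sketched as follows. This is a hedged, minimal version run through Python's built-in `sqlite3` module instead of the SQL Server environment shown in the video; the sample rows are illustrative placeholders, and the `AgeGroup` alias is a hypothetical name added here so the new column has a label.

```python
import sqlite3

# In-memory table with a few made-up rows (illustrative data only),
# including one row with a NULL age that the WHERE clause filters out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (FirstName TEXT, LastName TEXT, Age INT);
INSERT INTO EmployeeDemographics VALUES
    ('Ryan', 'Howard', 26),
    ('Jim', 'Halpert', 30),
    ('Stanley', 'Hudson', 38),
    ('Holly', 'Flax', NULL);
""")

# Each row is checked against the WHEN conditions from top to bottom;
# ELSE catches every row that matched none of them.
rows = conn.execute("""
    SELECT FirstName, LastName, Age,
        CASE
            WHEN Age > 30 THEN 'Old'
            WHEN Age BETWEEN 27 AND 30 THEN 'Young'
            ELSE 'Baby'
        END AS AgeGroup  -- hypothetical alias, not from the video
    FROM EmployeeDemographics
    WHERE Age IS NOT NULL
    ORDER BY Age
""").fetchall()

for row in rows:
    print(row)
```

Filtering with `WHERE Age IS NOT NULL` matters in this sketch: without it, the row with no age would fall through to the ELSE branch and be labeled as a baby.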
to note is that the very first condition that is met is going to be returned so if there are multiple conditions that meet the criteria only the very first one is going to be returned and let's demonstrate that real quick so if the age equals 38 then return Stanley because that is Stanley and let's execute this real quick so right here I'm specifying that if it's 38 it should return Stanley but he is right here and it still says old and that's because this condition was already met now if we were to put this right here it should work correctly and let's try it out so now because this condition is met first it is going to return Stanley down here so now let's get into our first use case let's start off by copying this and then commenting it out I only did that because I don't want to rewrite it because I'm lazy let's get rid of that and let's look at this real quick we are going to join on another table that we have really fast that's going to be SQL tutorial if you watched my other videos then you know this table and we're going to do that on employee demographics.employee ID is equal to employee salary.
employee ID okay so let's just look at everything in these tables really quick now we are going to be focusing on the job title and the salary column but we want their first name and last name as well so let's start building that out let's do first name last name job title and salary and let's look at this really quick so now we have our employees and here is the situation we had a fantastic year this year selling paper and corporate has allowed Michael Scott to give out a yearly raise to every single employee but not every employee is going to get the same raise because our salesmen are genuinely the people who made us our money and they're going to get the biggest raises while other people really aren't going to get that big of a raise so now let's go through and create a case statement to calculate what their salary will be after they get their raise so let's start off by saying case and when and we want it to say when job title is equal to salesman so when they are a salesman what do we want to happen so this is where the calculation occurs so we're going to take their salary and then we're going to add their salary times how much their raise is going to be so the salesmen did really really well and we want to give them a 10% raise this year now when their job title is equal to accountant then we'll take their salary and we will give them let's give them a 5% raise still very generous there we go and when the job title is equal to HR then it's going to be the salary plus the salary times and then we're going to do 01 all right and else we are just going to do salary plus salary oops let's do parentheses times and let's just give everyone else a 3% raise and then we'll write end now let's take a look at our results so here's what we have so far we have our first name our last name our job title and our salary that is our current salary and then we're going to have our salary after we get our raise so I'm going to actually write that up here so let's do as salary
after raise and let's execute that so let's look at these raises really quick so we have 45,000 and since he is a salesman he gets a 10% raise which is a raise of $4,500 so $45,000 plus $4,500 is $49,500 and as you can see down here we have HR who is making $50,000 and now he is making $50,000.05 so everybody got a raise so that is our case statement I hope that was helpful I find myself using the case statement a lot when I'm wanting to categorize things or label things and that's kind of what we did in the first example and you can even do calculations like we did in this use case so I hope that was helpful thank you guys so much for watching I really appreciate it if you learned anything from this video be sure to like and subscribe below and I'll see you in the next video what is going on everybody my name is Alex freeberg and today we're going to be looking at the having clause now the having clause I feel is a little bit unappreciated in the SQL community I feel like it doesn't get a lot of love and so today I want to describe how to use it and what it's used for so before we use the having clause I want to set up our query here we want to use an aggregate function and the group by statement and then I will show you how to use this having clause so let's look at the job title and let's look at the count of job titles and then down here we need to do group by job title and let's execute this and here are our job titles and here's the count of how many people have those job titles so now let's say we want to look at all the jobs that have more than one person in that specific job so let's do where the count of job title is greater oops is greater than one and let's run that and as you can see we're going to get this message right here now let's read it an aggregate may not appear in the where clause unless it is in a subquery contained in a having clause or a select list and the column being aggregated is an outer reference what that is basically saying is
we cannot use this aggregate function in the where statement we need to use a having clause so let's get rid of this and let's say having the count of job title greater than one I did the same thing again and let's execute this and we're still going to get an error now why are we getting that error the reason is because this having statement is completely dependent on the group by statement because we are performing this after it has been aggregated so this having statement actually needs to go after the group by statement because we can't look at the aggregated information before it's actually aggregated in that group by statement so now let's run this and it worked perfectly so now we only have the jobs that have more than one employee for that job title so now let's look at one more example let's do the average let's say salary and let's get rid of this having clause real quick and just to look at this information let's do order by and we'll do average salary so let's look at this and we have 36,000 to 65,000 so in the middle we got 44,500 so let's use this having statement and let's say the average of salary where it is greater than 45,000 and we actually need to put this right here right after the group by and before the order by so let's run this and see what we get and it worked perfectly so now we're looking at the job titles that have an average salary of over $45,000 so there you go that is the having clause definitely one that is good to know and is very useful in specific situations thank you guys so much for watching I really appreciate it if you like this video or learned anything today be sure to subscribe below and I'll see you in the next video what is going on everybody my name is Alex freeberg and today we're going to be looking at updating and deleting data in a table now what's the difference between inserting data into a table and updating data insert into is going to create a new row in your table while updating is going to alter a
pre-existing row while deleting is going to specify what rows you want to remove from your table so let's get going with the updating so down here Holly Flax does not have an employee ID age or gender now we want to update this table to give her that information so let's do update now we need to specify what table we are going to be hitting off of so let's do SQL tutorial.dbo.employee demographics so now we're going to use something called set and set is going to specify what column and what value you actually want to insert into that cell so let's set her employee ID equal to and it's going to be 1012 and we have to specify which one to do this to because if we ran just this it is going to set every single employee ID to 1012 because we haven't specified that we only want Holly Flax's row to be updated so now we have to specify where first name is equal to Holly and last name is equal to Flax so now let's run this and see what we get so one row has been affected let's see what we got and there we go as you can see the employee ID was updated exactly how we specified it right here so we also want to update age and gender and let's do that in the same query so let's set the age equal to 31 and instead of using and we actually need to use a comma so let's say age equal to 31 comma gender is going to be equal to female and let's run this and see what we get there you go now let's look at our table and as you can see it was updated to 31 and female so very easy very easy to specify what you want oftentimes tables like this will have a unique key like employee ID is our unique key in this table so I could easily just say where the employee ID is equal to and then you know 1012 so it's an easy way to specify what employee you're trying to update so now let's look at the delete statement the delete statement is going to remove an entire row from our table so let's do delete and we actually need to say from and we have to specify what table we want to be removing
this information from so let's do SQL tutorial.dbo.employee demographics and now we need to specify what row we want to remove so let's do where employee ID is equal to and let's choose a completely random employee ID 1005 so let's run this and see what happens so one row is affected let's look at our table and as you can see 1005 is now gone now you have to be very careful when you use the delete statement because once you run it you cannot get that data back there's no way to reverse a delete statement so if I had gotten rid of this where statement and I ran this it would delete everything from the entire table and you could not get that data back so a little trick that I use before I actually run a delete statement is I make it a select statement because you're going to select everything where the employee ID is equal to let's just do 1004 and now when you run this you are going to see exactly what you will be deleting and now we know that Angela Martin that entire row is going to be gone if I hadn't done that and I just went like this and I wrote delete and I only had this running I would not know that this information is going to be the only one that's gone maybe I made a mistake down here maybe I accidentally put something in there that wasn't supposed to be in there and now I'm deleting much more than I thought I was actually going to delete so using the select statement can be a very good safeguard against accidentally deleting data that you do not want to delete so that is update and delete thank you guys so much for watching I really appreciate it if you like this video be sure to subscribe below and I'll see you in the next video what's going on everybody my name is Alex freeberg and today we're going to be talking about aliasing now all aliasing really is is temporarily changing the column name or the table name in your script and it's not really going to impact your output at all aliasing is really used for the readability of your script so that if you
hand this off to somebody or somebody comes behind you and starts working on this they can more easily understand it and it may not sound super useful especially for small scripts like what we have on the screen but when you start getting to larger scripts where you have six seven or eight joins and you're selecting 10 different column names it actually is very useful and very important so let's get into how that actually works and then I'll have an example later of how we can use aliasing with a little bit of a larger query so in this table let's select first name and execute what we want to do is just write as and let's do Fname and all that's going to do is it's going to rename this column from first name which it was originally named to Fname now you can use as but you can also just get rid of that and do it exactly how I have it and it's still going to work perfectly you can either use the as or you can not use it I typically don't I just put a space in between the actual column and the alias now let's look at an example of how this might actually be useful so we have a first name and a last name in this table so what we're going to do is actually combine those so let's do plus and let's add a space in there and let's do a plus and let's do last name so this is going to take the first name add a space and then do the last name and we're going to do that as and let's do full name and let's execute this so now we have a column called full name which is our alias so we've combined the first name and the last name column into one single column and we've renamed it full name if we had not used this alias at all it would have just said this which is no column name at all we don't typically want that when we have an output we want to give this column a name so that somebody who's actually looking at the script or who's looking at the output of the script actually understands what is contained within this column so for that we're just going to keep it as full
name now another time that you're often going to use aliasing in the select statement is when you're using aggregate functions so in this table we have age so let's pull that up really quick so we have age right here and let's actually just do the average age and when we execute this we're going to get no column name and 31 so what we want to do is give it average age and when we do that we now have a column name and again you want to have a column name in case someone comes up behind you and is reading the script so that they understand what this column is being used for now that we've looked at aliasing column names let's look at aliasing table names it basically is the exact same thing we're just going to write as and let's do demo for demographics and let's do demo dot and it's going to give us all of our options and we'll do employee ID so when you alias a table name when you are selecting in the select statement you actually need to preface your column name with the table name or the table alias dot and then employee ID and this is extremely important to do especially when you have a lot of joins that you're doing or you're selecting a lot of columns when you have several joins because it can get very very messy quick so let's actually join this to employee salary and let's do that on demo.employee ID is equal to s.employee ID so now let's do demo.
employee ID comma s dot and let's do salary so looking at the script now it is very clean it is very easy to understand and that is what's so important with aliasing if for example we took this off every time we wanted to reference this table we would have to put the entire table name and putting the entire table name is correct it just is very cumbersome and does not look clean at all and so using something like demo as an alias makes it a lot more easily readable and a lot more manageable when you're looking at it when you have a very long script let's look at this query where we're joining together three separate tables and after each table we have an alias for employee demographics we have a for employee salary we have b and for warehouse employee demographics we have c now unfortunately I have seen a lot of scripts that look exactly like this and this is what you do not want to do you do not want to use your aliasing to just write an a or a b or a c that is very frowned upon when writing queries because it really doesn't give any context to what the table that you're referencing is and it gets really confusing as this query continues to grow and as you add more columns to your select statement it makes it more difficult to understand where those columns are coming from and so when I'm reading that I say select a.
employee ID okay what's a a is employee demographics so you really do not want to do that now let's look at an example of what it should look like so for employee demographics instead of having an alias of a I used demo for demographics for employee salary I used s and for warehouse employee demographics I used ware now this is not perfect by any means but in the select statement if you're just glancing at it you can easily understand which columns are coming from which tables so when I look at employee ID I know that's coming from employee demographics because I have demo as the alias so it's a lot easier to understand and when you hand this query off to somebody it is going to be a lot easier for them to read through it and understand where those columns and those table names are coming from and so they will appreciate that in the long run so that is all I got that is aliasing again not a super tough subject but a really important one to understand especially as you start working in teams and as you start creating more and more complex queries you want to have it more organized and more easily readable and so it may not come into play with those really simple queries but again as you build out those more complex queries this becomes very useful I really hope you enjoyed this video if you did be sure to comment and subscribe below thank you so much for watching and I'll see you in the next video what's going on everybody welcome back to another intermediate SQL tutorial today we're going to be covering partition by now partition by is often compared to the group by statement the group by statement is a little bit different the group by statement is going to reduce the number of rows in our output by actually rolling them up and then calculating the sums or averages for each group whereas partition by actually divides the result set into partitions and changes how the window function is calculated and so the partition by doesn't actually reduce the number of rows
returned in our output let's get started to look at the actual syntax of how to use partition by and then we'll compare it to the group by statement later just to see the differences between the two we're going to be using these two tables on our left over here so I'm going to pull those up really quick so let's run this and let's look at these two tables side by side well one underneath the other really quick so what we're going to be using to demonstrate this partition by is this gender column as well as this salary column and so we just need to join these two tables together on the employee ID and then we'll go from there now I'm not going to bore you with that I'm going to skip ahead and we'll actually look at how to use this partition by so I've joined these two tables together and this is our output but we don't want every single column I'm going to start selecting some of these columns and then we'll start using this partition by and see what the output looks like after that all right so let's go right up here let's choose the first name let's do the last name we'll do gender and let's do salary and now we want to identify how many male and female employees we actually have and so we're going to say count of gender and this is going to be over and now we're going to do our partition by and we're also going to partition that by the gender as total gender now I'm going to come back to why we did each part but I want to see the output first and then we'll come back to why we wrote it this way so let's just do this really quick so it's going to be a little bit different than what you typically would expect in a group by statement the group by is going to roll everything up and you typically wouldn't have like a first name last name in a group by statement because it would be very hard to roll all those things up into those individual columns and to reduce the number of rows that are in your output and so in our output we can see Pam Beasley she's a female
she makes $36,000 as a salary and there are three total women that work alongside her in this employee demographics table and so in our total gender column over here this is where we use the partition by and if we used a group by statement to get this kind of information all we would be able to do to get this information in a group by statement is say select gender count of gender and then group by the gender down below underneath the join so because we're using the partition by we're able to isolate just one column that we want to perform our aggregate function on and so we're able to add things like the first name and last name columns even though we aren't trying to include that in any partition or group by statement yet we're still able to add the aggregate function to each individual row while still maintaining those other columns let's take this entire query and let's basically just transform it into a group by statement and we'll see kind of what that looks like and what the difference is so all I'm going to do is get rid of all this I'm going to copy all of this and I'm going to say group by and I'm going to do that because we have to use all these columns in our group by statement so let's execute this and as you can tell we are not able to see the output for the aggregate function that we were hoping for if we wanted to get the same output that we had before where we're showing three for females and six for males what we'd have to do is get rid of this first and last name and the salary and do the same thing in the group by statement and so let me get rid of these really quick and run this and so what the partition by is doing is basically taking this query right here and sticking it on one line in the select statement and so I hope now you can see how valuable the partition by can be if used correctly thank you guys so much for watching I really appreciate it if you like this video be sure to like and subscribe below and I'll see you in the next video
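To make the comparison from the video concrete, here is a minimal sketch of the two queries side by side. The table and column identifiers (EmployeeDemographics, EmployeeSalary, FirstName, Gender and so on), the aliases demo and sal, and the four sample rows are assumptions made up for illustration; the tables used in the video have more rows and their exact definitions are not shown in the transcript.

```sql
-- Hypothetical sample tables: names, types and rows are assumptions for illustration only.
CREATE TABLE EmployeeDemographics (
    EmployeeID INT,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50),
    Gender     VARCHAR(50)
);

CREATE TABLE EmployeeSalary (
    EmployeeID INT,
    JobTitle   VARCHAR(50),
    Salary     INT
);

INSERT INTO EmployeeDemographics VALUES
    (1001, 'Jim',    'Halpert', 'Male'),
    (1002, 'Pam',    'Beasley', 'Female'),
    (1003, 'Dwight', 'Schrute', 'Male'),
    (1004, 'Angela', 'Martin',  'Female');

INSERT INTO EmployeeSalary VALUES
    (1001, 'Salesman',     45000),
    (1002, 'Receptionist', 36000),
    (1003, 'Salesman',     63000),
    (1004, 'Accountant',   47000);

-- PARTITION BY: every employee row is kept, and the count for that
-- employee's gender is attached to each row as TotalGender.
SELECT demo.FirstName,
       demo.LastName,
       demo.Gender,
       sal.Salary,
       COUNT(demo.Gender) OVER (PARTITION BY demo.Gender) AS TotalGender
FROM EmployeeDemographics AS demo
JOIN EmployeeSalary AS sal
    ON demo.EmployeeID = sal.EmployeeID;

-- GROUP BY: rows are rolled up, so the output has one row per gender
-- and the per-employee columns can no longer be selected.
SELECT demo.Gender,
       COUNT(demo.Gender) AS TotalGender
FROM EmployeeDemographics AS demo
JOIN EmployeeSalary AS sal
    ON demo.EmployeeID = sal.EmployeeID
GROUP BY demo.Gender;
```

With this sample data the first query returns one row per employee and the second returns one row per gender, which is the difference the video is describing.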
what's going on everybody welcome back to another SQL tutorial today we're going to be talking about CTEs a CTE is a common table expression and it's a named temporary result set which is used to manipulate the complex subqueries data now this only exists within the scope of the statement that we are about to write once we cancel out of this query it's like it never existed a CTE is also only created in memory rather than a tempdb file like a temp table would be but in general a CTE acts very much like a subquery and so if you know how to do subqueries you should be able to pick up on CTEs fairly easily so let's get started writing our very first CTE and we're going to come down here and we're going to say with and we're going to write CTE employee and we're going to say as and this is where everything's going to start now CTEs are sometimes called with queries I've never personally used that but I've seen it called that online but that's because it uses this with statement right at the very beginning so now we have with CTE employee as then we have an open parenthesis and now we have to construct our select statement and this is kind of where we build out our quote unquote subquery and so I'm going to take in a select statement that I actually used in a previous video where we were using the partition by and so I'm going to put that in there and I'm going to kind of walk us through what that does and how we're going to use this so I'm going to paste this down right here and I'm actually going to go like this just to make it look a little nicer and then I'm going to close the parentheses at the end so now we have our CTE in place and as you can see it is basically just a select statement within the with CTE employee as and what this is going to do is it's going to take the first name last name gender and salary and then it's going to take this aggregate function with the partition by and this aggregate function with the partition by and it's going to place it to where we can now query
off of this data so it's putting it basically in a temporary place where we can then go and grab that data so all we're going to do at the very bottom is we're going to say select everything and we can do that from CTE employee so let's run this entire thing and see what we get so as you can see with this select everything from CTE employee we are selecting everything from this select statement and so this feels a lot like a temp table like we're actually querying off of a temp table but it actually acts a lot more like a subquery now we don't have to select everything we can just do first name and let's do average salary and when we run this we'll just get those two columns and we don't have to go through and actually write this out each time it's just in this CTE for us so it does all the heavy lifting within the CTE and then we can just query off of what we want now something to note is that the CTE is not stored anywhere and so it's not stored in some temp database somewhere if I try to run just this by itself it is not going to work so let's try that out really quick and we should get an error and that's because each time we run this query it is actually creating the CTE again and so it's not being saved anywhere and so each time we run it we have to run it with the entire CTE another thing to note is you actually have to put the select statement right after the CTE if I try to go down here and say select everything from let's do CTE employee it doesn't actually work it's not going to come up at all and that's because it only is going to work with the select statement directly after the actual CTE that you've created I hope this was helpful and I hope that you understand how to use a CTE a little bit better again you don't have to go super complicated with the select statement within your CTE it can be very very simple I just wanted to demonstrate that you can use aggregate functions within your CTE and then just query off of those without having to do the
aggregate function again which I find is very very useful again thank you for watching if you like this video be sure to like and subscribe below and I'll see you in the next video what's going on everybody welcome back to another SQL tutorial today we are looking at temp tables and if you can guess it based off of the name they're kind of like temporary tables and we create them very much the same way we're going to do create table it's just a little bit different and you can hit off of this temp table multiple times which you cannot do with something like a CTE or a subquery where you can only use it one time or with a subquery you need to write it multiple times within a query and so these temp tables are extremely useful I'm going to kind of talk about how you can use them as we're going throughout this video but let's get started right away with actually creating one looking at it inserting some data and kind of showing you how temp tables work and what we can do with them so we're going to start off with create table much like a regular table is created the only difference is we're going to do this pound sign and then we're going to do temp underscore employee so literally the only difference between a regular table and a temp table is this right here at the very beginning this pound sign so let's just start by doing employee ID we'll make that an integer we'll do job title and we'll make that a varchar 100 and then we'll do salary and let's make that an integer and so now we have our temp table let's go ahead and create it so now we have our temp table created and so we can look at it really quick so let's select everything from and we'll do temp employee so let's take a look it's completely empty and we can insert data very much the same way we'd insert data into a regular table so let's start doing that let's do insert into and we'll do temp employee and we'll do values and let's just do something really quick because I'm going
to get to a little bit more interesting stuff in a second oops so we'll make this person HR that's their job title then for salary we'll give them 45,000 and close it off so let's run this and let's select everything again and see what's in there perfect so we were able to insert data into this temp table and again we don't have to create this every single time or we don't have to run this every single time we need to hit off of it like we did with a CTE if you watched my previous video with this one we can just run it and it sits there and so again it feels very much like a real table and I'm going to get to a little bit of the nuances and the differences between a regular table and a temp table in a second but let's really quickly we want more data in there you don't have to just do it value by value we can also just select all of the data from a specific table and insert that into a temp table and that is really you know how I do it most of the time most of the time I'm not inserting values I am you know taking a large table and taking a subset of that and then sticking it into a temp table so let's look at this really quick and run that so now we took all of the data from employee salary and then we just stuck it into this table and really quickly this is one of the big uses of a temp table let's say for example that this employee salary table had a billion rows or just an extremely large number and we were trying to you know hit a somewhat complex query off of it where we're using joins and we're using maybe some window functions or different things you know it would take a very long time to hit off of this but what we can do is we could insert that data into this temp table and then we can hit off the temp table and it already has that subsection of data that we're wanting to use for all of our later queries so really quickly that's kind of a use case for
that so let's go down here we're going to kind of create another one and this one's going to be a little bit more advanced a little bit of how I would actually use a temp table above was just kind of showing the basic syntax how you kind of put data into it you know kind of how it's used now I'm going to show you kind of how I would actually use it so let's do create table let's do temp oops create table let's do temp employee 2 and then let's do open parentheses and we'll do job title and we'll make that a varchar 50 and then we can do employees per job we'll make that an integer now we need average age make that an integer and the very last one will be average salary I'll make that an integer as well and let's run this oops so we have our second table now we want to insert data into this one so we're just going to do insert into and we'll do temp employee 2 and for this one I'm going to take a query that we used in a previous video and so I'm just going to copy and paste that to save time and then we'll keep on moving from there all right so I'm just going to paste that in we will run this and really all it's doing is from these tables it's taking the job title we're getting a count on the job title average age average salary and that is it so let's see if that worked which it looks like it did but you know let's actually take a look at the data and so now we have this subsection of data from this join above and what this is going to do is whenever we want to run this we don't have to run it on these two tables and create the join and then do the calculations which takes time what it's going to do is it's going to take these exact values and place them into this temporary table and if we want to run further calculations on these values we can easily do that in a fraction of the time instead of having to run this every single time which will take up so much processing power and it will reduce your runtime dramatically when
you're placing this data in this temp table and hitting off of that instead of all these joins and everything above a lot of times these temp tables are used in stored procedures now if you haven't learned about stored procedures or used stored procedures at all you know that's okay I still want to show you something that might be useful although this is used a ton in stored procedures so for example let's say we have a stored procedure set up we run the stored procedure and we get an output and you know we for whatever reason want to run it again and when we run it again we get this error and you know this temp table lives somewhere it doesn't live in the actual database but it lives somewhere and so when we run it again we get an error because there's already a temp table created one trick or one little tip that I would give is doing something like this saying drop table oops I don't know why I did so many spaces drop table if exists and we'll do temp employee 2 just like that now what this is going to do is when you're running that stored procedure over and over and over again you're getting an error or for whatever reason you need to run it multiple times every time that you run it it's going to encounter this and so if that already exists it is going to delete that table and then allow you to create it again and this is just a really good thing to do so now if you see down below I can run this time and time and time again and it is going to work every single time because it is checking to see if that exists and if it does it deletes it and then I can create it again and so that is just a helpful tip if you're going to try to use this I highly recommend adding that to your query just to make sure things run smoothly I know there is a lot more that can go into temp tables a lot more of the technical aspects or the DBA stuff obviously I just want to teach you how to use it and what you might use it for and how to actually write it out
but there is a lot more you can research about processing speed and storage. Unless you are something like a DBA, you probably don't need to worry about those things. If you are a DBA, I do recommend looking into them and making sure you understand how this data is stored, so that when people use temp tables, or you are using them, you know what's going on in the background. But for getting up and running with temp tables, I hope this was helpful. Thank you guys so much for watching, I really appreciate it. If you liked this video, be sure to like and subscribe below, and I'll see you in the next video.

What's going on everybody, welcome back to another SQL tutorial. Today we're going to be looking at string functions, things like TRIM, REPLACE, SUBSTRING, and UPPER and LOWER. We're going to create a new table, insert a little bit of bad data into it, and then use that to work on our string functions. I already have this set up right here, and I'm going to put it on GitHub so you can download it and don't have to type it out manually; look in the description if you want to copy and paste it and save a little time. Let's go ahead and run this really quick. In this EmployeeErrors table, the first row has some blank spaces on the right side of the employee ID, and the second one has some blank spaces on the left side. We also have Jimbo, which is an error because his name is Jim, and Halbert, because his name is actually Halpert. Then for Toby, for whatever reason that O is capitalized, and Michael got in here and added this extra part, so we're going to
have to figure out a way to take that out in our query; that'll come a little bit later, when we use REPLACE. So let's get into it right away and start with trim, left trim, and right trim. We'll go through each one pretty quickly, because I'm not trying to make this a super long video and we have a lot to get through. Let's look at the employee ID, because that's the column where we have blank spaces on the right and the left side; the ones on the left side you'll be able to see much more easily. So let's do SELECT employee ID, and before we get any further, let me put the EmployeeErrors table in the FROM so we can see everything as it comes up. Then we're just going to do TRIM and type in the column we want to take the blank spaces out of. That's what TRIM does: it gets rid of blank spaces on both the left and the right side. We'll say AS IDTrim. Let's run this really quick. As you can see, this is our regular employee ID. You can't see it as easily on the first one, but there were blank spaces after this 101 and we got rid of those, and there were blank spaces before the 102 and we got rid of those too. Now I'm going to copy this two more times, because it's basically the exact same thing except with LTRIM and RTRIM, and show you all of them at the same time. Let me see if I can get these all on screen. Okay: with TRIM it got rid of the spaces on both the left and the right side, so all of the employee IDs were fixed. With LTRIM we're only getting rid of the spaces on the left, so this one was fixed but this one still has
blank spaces on it. When we do RTRIM, we only get rid of the spaces on the right side, so this one doesn't change, because its blank spaces are on the left-hand side, and this one was fixed. Again, it's not super visual, so you can't really see it, but that one is fixed. Let's move on to the next part, which is REPLACE. For this one we're going to be looking at the last name, so let's go back up to EmployeeErrors. As you can tell, the biggest thing we want to take out of the last name is that "- Fired", so we're going to replace that. Let me copy this and get rid of the top part. We'll start off with the last name on its own, just as a baseline so we can see what it looks like before, and then we'll do REPLACE. All we specify is the column we want to do the replacing in, the value we want to replace, which here is "- Fired", and what we want to replace it with. I'm just going to replace it with a blank, and we can say AS LastNameFixed. Let's see what this looks like, and it looks like it worked. The last name originally was "Flenderson - Fired", and when we took that "- Fired" and replaced it with basically nothing, it fixed it, so now it looks correct. All right, let's move on to the next one. I think this might be the longest one to write, and that is SUBSTRING. SUBSTRING is pretty unique: you give it a number or a string, you specify the position where you want to start, and you can also specify how many characters you want to go out, and it
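The trim and replace calls walked through above can be sketched like this. It is a hedged example: the `EmployeeErrors` table and its `EmployeeID` and `LastName` columns follow the narration, the aliases and the exact `'- Fired'` literal are assumptions, and `TRIM` requires SQL Server 2017 or later (older versions can nest `LTRIM(RTRIM(...))`).

```sql
-- Sketch only: aliases and the exact '- Fired' text are assumed.
-- Compare the raw ID with each trimming function side by side.
SELECT EmployeeID,
       TRIM(EmployeeID)  AS IDTrim,   -- strips spaces on both sides
       LTRIM(EmployeeID) AS IDLTrim,  -- strips leading (left) spaces only
       RTRIM(EmployeeID) AS IDRTrim   -- strips trailing (right) spaces only
FROM EmployeeErrors;

-- Replace the unwanted suffix with an empty string.
SELECT LastName,
       REPLACE(LastName, '- Fired', '') AS LastNameFixed
FROM EmployeeErrors;
```

If the stored value has a space before the dash, the replaced result keeps that trailing space; wrapping the expression in `RTRIM` is one way to clean it up.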
pulls that in. Just as a really quick example, and then I'll show you a use case for it that I think is pretty cool: I'm going to do first name and then 1, 3. So it's going to take the first name, start at the very first letter or number, and go forward three spots. Let's take a look at what that looks like. For our table it's going to take Jim, Pam, and Tob for Toby, so it's only taking the first three characters because we're starting at position one. Now what if we started at three? If we do 3, 3, it's going to go to the third letter and then go forward three, so you get a sense of how this works. Now I'm going to show you something that I think you'll find interesting (let me fix that, because I just messed it up): fuzzy matching. If you don't know what fuzzy matching is, I'll give you an example. Let's say in one table my name is Alex and in another table my name is Alexander. If we tried to join those two tables based on my name, they will not join, because one is Alex and one is Alexander; they're not an exact match. But if I take the substring, start at position one, and move forward four characters, it's going to take Alex from both, match them together, and say that they are the same. It may not be perfect, and that's why it's called a fuzzy match: it can work a large majority of the time, but it's not going to work every single time. So I want to show you how we can use this here. I need to join this to the demographics table, so bear with me for just one second while I do that, and let's try to make this at least look somewhat
good. What I'm going to do is start off by tying it to the first name: err.FirstName is equal to dem.FirstName, the demographics table's first name. And I'm just going to select the first name from err and the first name from dem, so let's see what comes up when we do it like this. The only one that matches is Toby, and that's because even though it has a capital O, the comparison still takes it (string comparisons are case-insensitive under the usual default SQL Server collation). We want to get all of them to match, and we can do that, but it's going to be a little less than perfect, and that's why they call it fuzzy matching. So we're going to use SUBSTRING on this: I'm going to say SUBSTRING of the first name, 1, 3, so starting at the first position and going forward three, and we're going to do the exact same thing on the other side, so 1 and 3 again. Then we take this up into the SELECT as well. Okay, let's run this really quickly, and as you can see it now matches all of them. You can do this on a lot of different things. Typically when I'm doing a fuzzy match like this, I'm not just going to do it on a first name, because there can be a ton of people with the same first name. And real quick, let me show you what the originals looked like, just to make sure I get the point across. So it originally was Jimbo, Pamela, and Toby in this table, and Jim, Pam, and Toby in this one. When we just took the first three characters, Jimbo becomes Jim and Pamela becomes Pam, and now they match. So that's kind
of the example that we're going for. Like I was saying, I typically will not just match on a first name, because there are going to be a ton of people named Alex or Jim or Henry or whatever. You're going to do this on many different things. If I'm trying to do a fuzzy match on a person, I would do it on their gender to make sure the gender is the same (I probably wouldn't need a substring for that, but it gives you a little more information), I would do it on the last name, so I'd use that substring again, and I would probably do it on the age and the date of birth. If you fuzzy match on the first name and the last name, and the gender, the age, and the date of birth are all the same, then you can typically get very high accuracy in matching people across tables. This is for when you don't have something like an employee ID, which we do have here, but say for example we were not given it; this is a way to match people using substrings. Let's move on to UPPER and LOWER. All UPPER and LOWER do is take all the characters in the text and make them either uppercase or lowercase, so it's very self-explanatory. Let me copy this up here and we'll get going. Let's just look at the first name, and specifically we're going to be looking at Toby right here. So let's do first name, and then LOWER, and all we have to do is put in the column we want. This is our original first name, and LOWER takes every single character in the string and makes it lowercase; that's all it does. It's the exact opposite when we do UPPER, so now if we take a look at this one, everything is capitalized. There is a lot that you can do with these string functions, and this is not all the
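The substring-based fuzzy match and the case functions described above might look like the sketch below. The `err` and `dem` aliases and the `FirstName` column follow the narration; the output aliases are assumptions added for readability.

```sql
-- Sketch only: output aliases are assumed, not taken from the video.
-- Fuzzy match: join on the first three characters of the first name,
-- so 'Jimbo' pairs with 'Jim' and 'Pamela' pairs with 'Pam'.
SELECT err.FirstName,
       SUBSTRING(err.FirstName, 1, 3) AS ErrPrefix,
       dem.FirstName,
       SUBSTRING(dem.FirstName, 1, 3) AS DemPrefix
FROM EmployeeErrors AS err
JOIN EmployeeDemographics AS dem
    ON SUBSTRING(err.FirstName, 1, 3) = SUBSTRING(dem.FirstName, 1, 3);

-- Case functions: every character becomes lowercase or uppercase.
SELECT FirstName,
       LOWER(FirstName) AS FirstNameLower,
       UPPER(FirstName) AS FirstNameUpper
FROM EmployeeErrors;
```

A three-character prefix on its own will match unrelated people who share those letters, which is why the narration suggests also comparing last name, gender, age, and date of birth.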
string functions that there are; there are a lot more, but I would say these are the more popular and useful ones that I use on a regular basis. I hope this has been helpful and that you learned something from it. If you did, be sure to like and subscribe below. I have a lot more videos coming out with tutorials on everything from SQL to Python, Tableau, and Excel. Thank you so much for joining me, I appreciate it, and I will see you in the next video.

What's going on everybody, welcome back to another SQL tutorial. Today we are talking about stored procedures. Now, what is a stored procedure? A stored procedure is a group of SQL statements that has been created and then stored in the database. A stored procedure can accept input parameters, and we will be looking at that today. That means a single stored procedure can be used over the network by several different users, and we can all be using different input data. A stored procedure can also reduce network traffic and improve performance, since the client sends a single call rather than every statement, and lastly, if we modify that stored procedure, everyone who uses it in the future will also get that update. Let's start writing out a stored procedure so we can look at the syntax. We'll start off very simple, and in the next one we'll get a little more complicated. The very first thing you need to write is CREATE and then PROCEDURE, and after that you name it, so let's just call this one Test. Then all you say is AS, and then you write your query, so let's just do SELECT everything FROM EmployeeDemographics. And that is it: we have created our very first stored procedure. Of course this is super simple, but let's execute this really quick and take a look. It says the commands completed successfully. Let's go over to our SQL Tutorial database, then to Programmability, then Stored Procedures, and it is not showing up there. What we need to do is refresh our stored
procedures: we're just going to go right here and click Refresh, and there is our stored procedure. Now how do you actually use the stored procedure we just created? Let's go right down here and say EXEC, which means execute, and then all we say is Test, and we run this. And there we go. All we put in this stored procedure was a SELECT statement, so when we ran the stored procedure it returned the results of that SELECT statement. Now let's go down here and make it a little more complicated. We're going to do the exact same thing, CREATE PROCEDURE, and let's call this one Temp_Employee. If you remember from a previous video, we worked on temp tables: we created our temp table and then inserted data into it. We are going to add that to this stored procedure so we can see the difference between a simple query and a slightly more complicated one. So I'm going to say AS and then paste that in here. What this is doing is creating a table, and then right down here I'm inserting into that table. Now if I create this stored procedure and then execute it, nothing is actually going to be returned. It will insert the data into that temp table, but since I don't have a SELECT statement in this procedure, nothing will come back. So let's write SELECT everything FROM the temp employee table, right here, and now let's create our stored procedure. That created successfully, so let's refresh over here and execute it: let's go down here, say EXEC Temp_Employee, and run it, and there is our output. Now really quick, let's go to Temp_Employee, because we actually want to change this stored procedure, so we're going to go over to Modify. When we modify it, a few things are going to show up on your screen. The first thing you're going to see is that it says USE SQL Tutorial, so it's just
specifying the database. The next two things you may not be as familiar with: SET ANSI_NULLS and SET QUOTED_IDENTIFIER. If you don't know what these are, it's not super important. The first one controls how comparisons with NULLs are handled, for example in a WHERE clause, and QUOTED_IDENTIFIER controls how quotation marks are treated in the query itself. Again, not super important, but those are turned on automatically. Let's go down a little further and look at the ALTER PROCEDURE. We created our stored procedure, but now we want to alter it, so this is ALTER PROCEDURE, and we are going to add a parameter to it. What the parameter allows us to do is, when we're executing the stored procedure, specify an input so that we get a specific result back. I'll show you what I mean in just a second, but let's add our input. We're going to say @JobTitle, and we need to specify the data type, so let's just say nvarchar(100). I know below it says varchar(100), but that's not extremely important. So this is going to be our input, and we need to go down here and say WHERE JobTitle is equal to @JobTitle. When we execute this and say the job title is equal to, let's say, accountant, this becomes accountant, and it gives us our results based on that. So let's go over here and run this EXEC Temp_Employee, which we just modified. When we run it we get an error, because it is now expecting us to include our job title parameter. So what we need to do is say @JobTitle, and let's say it's equal to 'Salesman'. Now let's try running this and see what we get, and there is our output. If we go back here, I just wanted to show you really quick that we do not have to put this job title right here. You can put it anywhere in the
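Putting the pieces from the stored procedure walkthrough together, a parameterized version could be sketched as below. This is an illustration under assumptions, not the script from the video: the `#temp_employee` table, its columns, and the join are reconstructed from the narration, and it is written as a fresh `CREATE PROCEDURE` (to change an existing procedure, the video uses `ALTER PROCEDURE` with the same body).

```sql
-- Sketch only: temp table name, columns, and join are assumed from the narration.
CREATE PROCEDURE Temp_Employee
    @JobTitle nvarchar(100)            -- input supplied by the caller
AS
BEGIN
    -- Allow repeated runs in the same session.
    DROP TABLE IF EXISTS #temp_employee;

    CREATE TABLE #temp_employee (
        JobTitle        varchar(100),
        EmployeesPerJob int,
        AvgAge          int,
        AvgSalary       int
    );

    INSERT INTO #temp_employee
    SELECT sal.JobTitle,
           COUNT(sal.JobTitle),
           AVG(dem.Age),
           AVG(sal.Salary)
    FROM EmployeeDemographics AS dem
    JOIN EmployeeSalary AS sal
        ON dem.EmployeeID = sal.EmployeeID
    WHERE sal.JobTitle = @JobTitle     -- the parameter filters the result
    GROUP BY sal.JobTitle;

    -- Without this SELECT the procedure would return nothing.
    SELECT *
    FROM #temp_employee;
END;
GO

-- Calling it without the parameter raises an error; pass it by name.
EXEC Temp_Employee @JobTitle = 'Salesman';
```

`GO` is a batch separator understood by SQL Server Management Studio and sqlcmd rather than a T-SQL statement; it is there because `CREATE PROCEDURE` has to be the first statement in its batch.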
query and use it however you want. That's how parameters work and why they are so useful, and you can use multiple parameters in one stored procedure, so you don't have to limit yourself to one or none; you can add as many as you like. I hope this video was helpful and that you understand stored procedures just a little bit better. Thank you guys so much for watching, I really appreciate it. If you liked this video, be sure to like and subscribe below, and I'll see you in the next video.

What's going on everybody, welcome back to another SQL tutorial. Today we are going to be talking about subqueries. Subqueries are often called inner queries or nested queries, and they're basically a query within a query. A subquery is used to return data that will be used in the main query, or outer query, as a condition to specify the data we want retrieved. You can use subqueries almost anywhere: in the SELECT, the FROM, or the WHERE part of a query, and also in INSERT, UPDATE, and DELETE statements. In today's tutorial we're only going to be looking at the SELECT, the FROM, and the WHERE, and you should get a pretty good idea of how to use it in those other statements. All right, I'm going to paste on screen what we're going to be walking through today, but really quick let's take a look at the table we'll be working in, the EmployeeSalary table, so you can see the data before we get into it. We have an employee ID, a job title, and a salary. First I'm going to show you what it looks like to have a subquery in the SELECT statement. What we're going to try to do is something like a window function, but without actually using the window function, and we're going to do this with a subquery. So I'm going to select
and really quick, actually, let me copy this. So we're going to do employee ID and salary, and now we can start building our subquery. We need an open parenthesis, and since we're doing it off of that same table, we say SELECT, paste the FROM in, and close the parenthesis. What we want to select is the average of salary. What this is going to do is literally run this inner query, and if we run it on its own, it shows that the average salary for all the employees is in the $47,000 range. So we are looking at one average salary across every employee. When we run the whole thing, it gives us the employee ID, the salary, and then in the very last column the average salary for all employees. It doesn't have a column name, so let's say AS AllAvgSalary and run it one more time just to make it look a little prettier. You can also do this with PARTITION BY. I'm going to write that out super quickly, and then I'm going to show you why you aren't able to do this with a GROUP BY. Let me copy this and put it right down here; we say average salary, get rid of all this, and say OVER, and we're not going to partition it by anything. Let's run both of these at the same time, and you'll see that they're the exact same output. So it's just a different way of doing it in this example, but it really is just to show a comparison of how you might use a subquery in the SELECT statement. Now you might be wondering why GROUP BY does not work for this. Really quickly, I'm going to write this out: let's get rid of that, and we'll say GROUP BY, and we'll do employee ID
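The three variants compared in this part of the walkthrough can be sketched side by side. The `EmployeeSalary` table and its columns follow the narration; the `AllAvgSalary` alias is an assumed spelling of the spoken column name.

```sql
-- Sketch only: the AllAvgSalary alias is an assumed spelling.
-- 1. Scalar subquery in SELECT: the same overall average on every row.
SELECT EmployeeID,
       Salary,
       (SELECT AVG(Salary) FROM EmployeeSalary) AS AllAvgSalary
FROM EmployeeSalary;

-- 2. Window function with an empty OVER (): same output as the subquery.
SELECT EmployeeID,
       Salary,
       AVG(Salary) OVER () AS AllAvgSalary
FROM EmployeeSalary;

-- 3. GROUP BY: the average is computed per (EmployeeID, Salary) group,
--    so each row just shows its own salary, not the overall average.
SELECT EmployeeID,
       Salary,
       AVG(Salary) AS AllAvgSalary
FROM EmployeeSalary
GROUP BY EmployeeID, Salary
ORDER BY 1, 2;
```

The first two queries keep every row and attach one table-wide figure to each; the third cannot, because every selected non-aggregated column has to appear in the `GROUP BY`.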
and we also have to do salary, and then we'll say ORDER BY 1, 2. Let's run this. As you can see, since we have to use the GROUP BY, it groups by both the employee ID and the salary, so we're not going to get that all-average salary that we're looking for and that we can get with the PARTITION BY and with the subquery in the SELECT statement. Now I'm going to show you the subquery in the FROM statement. Let's get rid of that and say SELECT everything FROM, then an open parenthesis, and here is where we're going to write our subquery. If you have watched the previous videos where I've done tutorials on CTEs or temp tables, this one is very much like those, except I think a little less efficient. When I'm doing something where I'm creating a table and then querying off of it, which is what we're about to do, I much prefer a CTE or a temp table. Subqueries tend to be a little slow compared to a temp table or a CTE, and I tend to use temp tables a lot more because you can reuse them over and over, whereas with a subquery you have to write it out each time. So I'm going to show you how it's done, although I don't really recommend this method. Let's go up here and steal this PARTITION BY query; this will be our subquery, so let's paste it in here. I'm going to make this look a little nicer so you can visualize it more easily. What this is going to do is first run this inner query and create this table, again much like a temp table or a CTE, and then allow us to query off of it. So I can actually say (and let me give an alias to this) a.
EmployeeID and then AllAvgSalary. So now I can take columns from this inner query and select just those, or I can select everything and return the entire table. Again, I much prefer a temp table or a CTE for this type of situation, but I wanted to show you how it works as an example. Now let's go down to the subquery in the WHERE statement. I'll steal this query so I don't have to rewrite everything, get rid of this, and add back the job title. All right, so we have the table that we've been using, with employee ID, job title, and salary. For this example we only want to return employees if they're over the age of 30, and as you can see, there is no age column in this table; that is in the EmployeeDemographics table. If we wanted, we could join to that table and get that information, or we could use a subquery, and for this example we are going to use a subquery. So let's go right down here and say WHERE EmployeeID is IN, then an open parenthesis, and this is where we build out the subquery. Just for visual purposes, I'm going to say SELECT everything FROM EmployeeDemographics and close the parenthesis. We're going to select something in this subquery that identifies the employee IDs that are over the age of 30. Let's take a quick look at this table: right now we have the entire table selected, so we have the employee ID, first name, last name, age, and gender. In this subquery the only thing that should be returned is the employee ID, and in fact a subquery used with IN can only have one column selected, so I can't select everything; I have to specify one column. That's a little different from how we did it in the FROM statement, where we were able to select the entire table and then specify in the outer SELECT which columns we wanted. In the WHERE statement
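As a hedged illustration of the FROM and WHERE forms discussed here, the two queries might be written as follows. Table and column names follow the narration; the alias `a` is the one mentioned in the video, and `AllAvgSalary` is an assumed spelling.

```sql
-- Sketch only: the AllAvgSalary alias is an assumed spelling.
-- Subquery in FROM (a derived table): it must be given an alias.
SELECT a.EmployeeID,
       a.AllAvgSalary
FROM (
    SELECT EmployeeID,
           Salary,
           AVG(Salary) OVER () AS AllAvgSalary
    FROM EmployeeSalary
) AS a;

-- Subquery in WHERE: the inner query returns exactly one column,
-- the IDs of employees older than 30.
SELECT EmployeeID,
       JobTitle,
       Salary
FROM EmployeeSalary
WHERE EmployeeID IN (
    SELECT EmployeeID
    FROM EmployeeDemographics
    WHERE Age > 30
);
```

The second query filters on age without exposing it; showing `Age` in the output would need a join to `EmployeeDemographics` instead.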
we can't do that. So we want to return the employee ID, and we also want to say WHERE the age is greater than 30. Let's run this really quick and see if it works. As you can see in the results, these are the employees who are over the age of 30. If you wanted to display the age as a column in this output, you would have to join to that table and put that column in the SELECT statement, but in a lot of situations you won't actually want or need to do that, so a subquery can be a really good option in these scenarios. With that being said, this is the last video in the advanced SQL tutorials. I hope this series has been helpful and that you learned something along the way. Thank you so much for joining me, I really appreciate it. If you liked this video, be sure to like and subscribe below, and I'll see you in the next video.

What is going on everybody, welcome back to another video. Today we are starting our data analyst portfolio project series. Before we jump into our first project, I wanted to talk with you for just a second so that we're all on the same page. First, there are going to be four projects. The first one is going to be SQL: we'll be doing a lot of data exploration and setting up a lot of our data to visualize in Tableau. Tableau is going to be our second project. In our third project we're going back to SQL, but we're going to be doing a lot more of the ETL process, so a lot more data cleaning. I made that one the third project because I think it's going to be a little more advanced than this first project. I tried to make it as beginner friendly as possible, so even if you are a complete beginner, as long as you've walked through the tutorials I have made on my channel, you should be pretty good. The fourth and final project will be with Python; we'll be using a lot of pandas, doing a little bit of data cleaning, and then doing visualizations as well. As I said
just a second ago, I'm trying to make this as beginner friendly as I possibly can. The whole point of the series is that if you are trying to apply for a data analyst job, by the end of it you should have an entire portfolio, or at least a really good start on one, to show a potential employer. I give you full permission to copy every script and every query line for line and create your own portfolio, if that is what you want to do; I am totally fine with that. But I will encourage you, and I'm sure I'll say this throughout the video, to try to think of your own queries, your own insights, and your own things you can do to make this portfolio project unique. With that being said, I'm super excited to get started on this with you guys, so let's jump over to my screen and get started on our very first project. All right, now that we are on my screen: we're going to download the data set, format it just a little bit in Excel, and then get into SQL, where we will start querying it. I will say that I think this is going to be a very long video. I'm hoping to keep it under an hour and a half, and I may separate it into two videos depending on how long it runs. I will do my best to keep it short, but we have a lot to get through. My goal is to do basically no cuts, because I want to walk you through each step of the process so that you understand everything that's going on and don't get lost at some point. I think this is probably the best way to do it; we'll see. The very first thing we're going to do is download our data set. As we're looking at this page, there's an option right here to download the data set. I don't recommend that one; you can use it, but it won't give you all the information that I personally want, which goes back to the very beginning. If you go down right here
to the very first graph, you can actually push this slider back and then download it, and what this will do is go back to, I think, January 1st of 2020. Let's open this one up. When we get in here we're going to reformat it just a little bit; it's nothing too complicated, I hope. Let me go up here and turn on the filter, just in case we want to filter anything. What we have here is a ton of information on COVID, I mean just a ton, and I believe it does go back to the first of 2020. As a really brief introduction to what kind of data is in here: we have total cases, new cases, total deaths, and new deaths, and we use those quite a bit in the queries that are coming up. If we go way over here we have total vaccinations and people vaccinated, and a little farther over we have population. That's the main stuff we're going to be working with today. As you can see, there are so many other things in here, and if you want to go back and do more with this I highly recommend it. There's such unique data in here about smokers and diabetes and all this stuff that I did not do a deep dive on. I could spend a month just looking at this data set and getting really interesting stuff from it, but I'm not going to do that; I wanted to finish this in less than two months. What we're going to do is go back over here, take this population column, click on its header, AS, and press Ctrl+X, which cuts it. Then we go back to the very beginning, place it right here, right-click, and choose Insert Cut Cells. Now why are we doing this? Because I've already done this entire project, and if you don't do this you're going to have to do a join in every single query you write. If you want to do that, keep
it there and just change your queries accordingly. I did it like this because I wanted to show joins later on and keep it simple at the beginning, and then work my way up to slightly more advanced things, which you will see. It gets semi-advanced, but not too much, I promise; just stick with me. Let's go back over here. We're going to go to column AA and press Ctrl+Shift+Right Arrow, which selects everything to the right, and we're going to delete it. What's left is going to be our first table, so let's save that: Save As, I'm just going to keep it in my Downloads, and let's call it CovidDeaths, since it has our death information. The next one is going to include our vaccination information, which is what we're going to join on later. So let's hit Ctrl+Z, which brings everything back. Now let's select column Z and go all the way back to E, and we're going to do the same thing and delete this. It looks like there's no data, but I promise there is later on: if we go down, you can see that total vaccinations starts at the very end of February 2021, because vaccinations didn't come out until recently. Now let's save this file, and instead of CovidDeaths we'll call it CovidVaccinations. All right, so now we have the two Excel files we want, and we need to get them into SQL. We're going to go over to SQL Server and create a portfolio project database. I've already done this, but all you have to do is right-click, click New Database, type in PortfolioProject, and click OK, and it will create your database for you. If you open up the Tables folder it should be empty, and that's where we're going to put these two Excel files. Now, I had a ton of trouble actually importing these
Excel files. I mean, I tried everything, and I eventually just went down a rabbit hole of how to get these in. I don't know if it's me or what, but I could not figure out how to do it. If you go to PortfolioProject, hit Tasks, and hit Import Data, that may do it for you and it may work. It did not work for me; it just kept giving me errors. So here is what I would recommend you do right off the bat, just to make sure that we're doing the same thing (and you can do it that other way if you want). I went over here to Start (again, I'm on Windows), went down to Microsoft SQL Server 2019, and clicked Import and Export. It looks the same, but from all the research I did, it has to do with 32-bit versus 64-bit. When you do it this way it goes to the 64-bit version and it is able to import the data; if you do it the other way, it uses the 32-bit version and gives you an error. I don't understand it, don't ask me. I went down a huge rabbit hole, but this one works.

So let's go over here. This is going to be our data source: where is the data coming from? It's an Excel file, so let's do that. Let's browse and go over to my Downloads... I thought I saved it in Downloads. Maybe because it's an Excel workbook... what was I saving before? Oh, that's a CSV. Okay, something important to note is that we're importing an Excel file and not a CSV; otherwise you're going to get the same error. I'm just doing it live and making myself look stupid. So we're going to save it, but instead of a CSV we're going to save it as an Excel workbook. So let's save that. Now we have to go back to how it was right here, the same way, and we're going to do File, Save As, and this is now covid deaths, and save it as a workbook. Now we have them.

Now let's go back. We have our covid deaths and our covid vaccinations; let's do our deaths first. Let me get back right here so it looks kind of more normal. So we have our Excel file, we have our covid deaths; let's click Next,
and now we have to say where we're going to place it: where's our destination? So we're going to click over here and go down to SQL Server Native Client 11.0. I want to say, this is something that I messed up, and it took me like 45 minutes to figure out; it was the stupidest mistake. It's going to auto-populate a server name, and I never checked to confirm that this was my server name, so I couldn't figure out why I wasn't able to insert this into my PortfolioProject database. That's because mine is 01; I created two different servers intentionally, and for whatever reason I forgot that. So all I have to do is add 01 over here. Just make sure yours matches your server name. Click PortfolioProject, click Next. Yes, we want to copy the data. It should auto-populate; if it doesn't, if it gives you multiple, you can always check-mark the one that you think is the right one; it should be the first one. We'll click Next, we'll just click Finish (I'm sure it says Run immediately), and we'll click Finish and Finish.

Now, while this is running: there should be around 89,000 rows; that's how it was about a week ago when I started, maybe a little more now because there are extra days. With that being said, there's going to be a good-sized amount of data. We're about to do a lot of different things. We're going to start at the very basics of just querying the table, super simple, and then we're going to go into things like joins, CTEs, temp tables, and creating views. The whole purpose of what we're about to do is not to keep it too simple; I want to showcase to a potential employer that you can do more advanced things. I'm looking at it, because I have already done this entire project individually, and we've probably got 15 to 20 queries here. You don't have to do all of them. I'm going to walk through all of them and you can choose which ones you want, but you don't have to do all of them; it is
quite a few, so just know that. So there's 85,000 right here; that's fantastic. It won't show up immediately; you need to refresh it, and there we go. So that's our covid vaccinations... let's get rid of this so we just have covid vaccinations. I thought that was our covid deaths one, but maybe I'm wrong. Let's do the exact same thing down here: we will import and say Next, go down to Excel and browse, and now we want to do the covid deaths. Apparently last time we did the vaccinations, which... actually, you know what, I bet what it did was, yeah, it took this right here as covid vaccinations, but that was the deaths one as it was saved. So forget that; let's go right here and do the covid vaccinations. It just has the same sheet name, so sorry for the confusion. The destination is going to be the exact same place: it's going to be SQL Server Native Client; let's add that 01, click Refresh, PortfolioProject, Next, Next. Like I said before, if it does this, just click the first one. It's going to be covid vaccinations; it did that for the covid deaths because I made the mistake earlier. I hope when you're watching this you aren't super confused. The whole point: make two tables, or make two Excel files, one should be covid deaths and one should be covid vaccinations, upload them, and then rename them. That's it in a nutshell.

So we have the same amount. Let's refresh this. This one is actually the covid vaccinations; this one is covid deaths. I'm telling you, this stuff confuses me sometimes, to be honest. But we're going to query this really quick to make sure we're actually doing what we're supposed to be doing. So let's do select everything from PortfolioProject, and you can do .dbo. or you can do dot dot (..); I tend to just do that because it's easier. Let's look at this one and make sure it's the right table. So we have total cases, new cases, perfect. And let's order on, let's do 3, 4, just to make sure. Or ORDER BY, of course.
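The sanity-check query described here sorts by column position rather than by column name. Below is a minimal sketch of that idea in Python, using the built-in sqlite3 module as a stand-in for SQL Server (an assumption made so the example runs anywhere; the video itself uses SQL Server). The trimmed-down column layout and every row are made up for illustration, and because SQLite has no PortfolioProject..CovidDeaths style of naming, the table is referenced by its bare name.

```python
import sqlite3

# Hypothetical, trimmed-down version of the imported table. The column order
# (iso_code, continent, location, date, ...) is an assumption for illustration,
# and all values are made up.
SAMPLE_ROWS = [
    ("USA", "North America", "United States", "2020-01-23", 331000000.0, 1.0),
    ("AFG", "Asia", "Afghanistan", "2020-02-25", 38000000.0, 1.0),
    ("USA", "North America", "United States", "2020-01-22", 331000000.0, 1.0),
    ("AFG", "Asia", "Afghanistan", "2020-02-24", 38000000.0, 1.0),
]


def select_all_ordered(rows):
    """Load rows into an in-memory table and return SELECT * ORDER BY 3, 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CovidDeaths ("
        "iso_code TEXT, continent TEXT, location TEXT, "
        "date TEXT, population REAL, total_cases REAL)"
    )
    conn.executemany("INSERT INTO CovidDeaths VALUES (?, ?, ?, ?, ?, ?)", rows)
    # ORDER BY 3, 4 means "sort by the 3rd, then the 4th selected column",
    # which in this layout is location and then date.
    result = conn.execute("SELECT * FROM CovidDeaths ORDER BY 3, 4").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in select_all_ordered(SAMPLE_ROWS):
        print(row)
```

Positional ordering is handy for a quick look, but it silently changes meaning if the select list is reordered, so naming the columns is the safer choice in a query you intend to keep.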
Just to make sure that we have everything that we're looking for. So this looks right; this looks like our Excel file. Let's copy this, go down here, change it to covid vaccinations, and run this one to make sure the second one came in correctly as well. Perfect. So we have our two tables; this is fantastic news, and now we can get going. We can keep this one; I'm going to comment it out in case we want to come back to it. Really quick, again: right here I have another laptop. I have already done this whole project, so I'm just using it as a guideline to know what I'm doing next, so that I don't waste everyone's time.

So really quickly, let's select the data that we are going to be using. You don't have to use these comments. I will specify; I'm going to say, hey, this comment is something I would keep in your portfolio project. I'm going to add a bunch of extra stuff that is not needed, just for your benefit, but when you are creating your portfolio project you shouldn't be adding some of the things that I'm going to be commenting on. So let's copy this so that it knows what we're doing, and let's select the location, the date, the total cases, the new cases, the total deaths, and then population.

Now, where we're at, I'm going to turn off my camera, because it's going to start getting in the way, to be honest; I don't want it to interfere with your ability to see what we're doing on screen. So it's been great seeing you guys; I'm going to turn this off and we will continue from here. All right, that should be turned off, so let's keep running. So this is what we're doing. Let's keep this going, because I don't like things not being organized. So we have our location... oh no, we want to do 1, 2; we want to order it based off the location and the date. It makes everything easier, I promise you. So we're going to
be... the first one's obviously Afghanistan. Here's our date, and we have our total cases, new cases, total deaths, and population. I'm just going to scroll down for a second. They started having total deaths about a month after they got their first case, it looks like, and then it just ramps up a lot. We're going to be diving into all these numbers, what they mean, and how you can do some really simple calculations on them. But really quickly, we're just going to do, again, a super simple calculation, and one that we do multiple times for different things.

So let's go right down here, and let's say we're going to be looking at the total cases versus total deaths. So how many cases are there in this country, and then how many deaths do they have for their entire cases? Let's say they have a thousand people who've been diagnosed and 10 people who died: what's the percentage of people who died who had it? So let's go right down here, and I'm just going to copy this really quick; this is just going to make our life easier, and I think you should do the same. So we have location, date, total cases, and we're going to get rid of our new cases; we don't need that one in this query, nor do you need this population. So let's work on our calculation really quick; it should be super easy. Let me make sure I'm still recording... perfect. Oh man, we're almost 25 minutes in, or more, because I have the intro.

So now we want to know the percentage of people who are dying who actually get infected, or who report being infected. So we're going to do total_deaths, go right down here, and divide that by total_cases. And if we do this really quick... well, let's go down to where there are actually numbers. So we have 34, we have one,
and it's showing 0.029. If you ever try to get a percentage of something, you have to multiply it by 100, so let's do that really quick. All we have to add is the, what's that, the asterisk sign, times 100. And while we're here, let's add the, what's it called, alias; let's call this DeathPercentage. That works for me. Let's take a look at this; it'll be a little bit more accurate. So when there were 34 cases there was one death, and that gives us a 2.94% death rate. We can go down even further, and this is still all Afghanistan. Let's go down to the very bottom. So as of yesterday there were 59,745 total cases in Afghanistan and 2,625 deaths, which is about 4%. So you have a 4% chance, basically, right now of dying, if you want to look at it like that: a 4% chance of dying if you get it and you live in Afghanistan.

You don't have to, but really quick, just to look at it further, let's add a WHERE on the location. Let's say LIKE, real quick, because I'm not 100% sure if it's just "States"; I think it's "United States", but yeah. I live in the United States; if you don't, you can look at your country. This is genuine, real, reported data, so it's really interesting. Right at the beginning, I don't know if it was the way we were reporting or what, but we had really high percentage rates. As we go down we're looking at 5%, 6%; this was the peak of it. This got really bad in the US. I hope it gets better. Where are we at? I'm going to go to the end of the year: we're sitting at around 2 to 3%... yeah, it goes down to under 2%. So at the end of the year we were looking at over 2 million, no wait, wait, wait, 20 million people who have been infected. That's a lot; that's 20 million people who have had it.
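The death-percentage calculation walked through above (divide total deaths by total cases, multiply by 100, alias the result, and filter the location with LIKE) can be sketched as follows. This is a minimal illustration in Python using the built-in sqlite3 module as a stand-in for SQL Server, which is an assumption made for runnability. The 34 cases and 1 death come from the walkthrough; the dates and the United States rows are made up.

```python
import sqlite3

# (location, date, total_cases, total_deaths)
SAMPLE_ROWS = [
    ("Afghanistan", "2020-03-22", 34.0, 1.0),     # 34 cases / 1 death as in the walkthrough; date is made up
    ("United States", "2020-03-01", 100.0, 5.0),  # made-up values
    ("United States", "2020-03-02", 200.0, 8.0),  # made-up values
]


def death_percentage(rows, location_pattern):
    """Return rows with a DeathPercentage column for locations matching a LIKE pattern."""
    conn = sqlite3.connect(":memory:")
    # REAL columns so the division below is not truncated to a whole number.
    conn.execute(
        "CREATE TABLE CovidDeaths ("
        "location TEXT, date TEXT, total_cases REAL, total_deaths REAL)"
    )
    conn.executemany("INSERT INTO CovidDeaths VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT location, date, total_cases, total_deaths,
               (total_deaths / total_cases) * 100 AS DeathPercentage
        FROM CovidDeaths
        WHERE location LIKE ?
        ORDER BY 1, 2
    """
    result = conn.execute(query, (location_pattern,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in death_percentage(SAMPLE_ROWS, "%states%"):
        print(row)
```

Without the multiplication by 100 the query returns a fraction (1 / 34 is roughly 0.029), which is the value seen on screen before the fix; multiplying turns it into the 2.94 shown afterwards.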
35,000... or 352,000 deaths by the end of the year. That's a lot. Let's keep going. At the very end we had over 32.3 million cases. There are a lot of deaths: 576,000. I verified this number; I Googled it (Google knows all), and it's pretty accurate. And it's really sad; that's a lot of lives. And that's 1.78%. So as of right now, if you were to get it today, an estimate is around a one-and-three-fourths to 2% chance that you could die from it. So, really interesting numbers. This is the kind of exploratory stuff that we're going to be doing. We're going to get a lot more advanced as we go on, but this shows the likelihood, and I'm going to write that: shows the likelihood (I hope I'm spelling this right; if not, I apologize) of dying if you contract COVID in your country. Again, rough estimates, but just glancing at the data, that's what we're looking at.

Now let's go down here, and let's look at the total cases versus the population. Again, we're going to do a lot of this percentage stuff; it's pretty simple. That will only last for so long, I promise you. I'm going to keep it on the States, because I'm going to be looking at that one the most, since obviously it's pretty relevant to me. If you're in another country, filter by your country; you'll be really interested in the stats. I know I was really shocked by a lot of the things that we're going to find today. So we're going to keep the location, keep the date, keep the total cases, but let's change this to population; and then instead of the total cases being here, we're going to put the total cases there and then change this to population. So what is this going to do for us? This is going to show us what percentage of the population has
gotten COVID. So: shows what percentage of population, oops, got COVID. Some of these things, again, are good to know. The one that I upload to GitHub will have the notes that I recommend keeping. Again, not everything in here is what you need to have in there; this is mostly just what I think you guys need to see while we're actually typing this out.

All right, so let's take a look at this. Actually, I want to change this; I want to put this right here, just because it's easier for me visually, because the total cases is right here. So our population in the US is around 331 million, so at the beginning, when we had one case, it's like nothing. Let's keep scrolling and see where we get to 1%. So 1%, that's roughly 3.3 million people, and that happened in, what is that, August of last year. So 1% of the population. Let's keep going all the way down; again, we're just glancing at this. We're at about 10%; again, we're at that 32 million. So 10% of the population has gotten it, gotten a test, and it's been confirmed. So, really interesting. We'll come back to that one, I'm sure, in the future; we might use this one as a visualization. Again, I'm only looking at the United States right now, but think about it in terms of how we're going to visualize this in the future, because a lot of what we're doing we're going to visualize in Tableau. I have Tableau open right here; you can see I have a map. I threw this together in like two seconds. We have the location, and so this is our future; this is what you need to be envisioning when you're looking at this data. So we have Afghanistan, and let's just scroll through: Belarus and Bolivia and Bulgaria and Cambodia, every single country that is reporting. So we're just looking at the States, but remember, all of these are
going to be used, so that's just something to remember. I'm really curious as to what countries have the highest infection rates compared to the population. So we're just looking at our population up here... how are we going to do this? Let me write it out really quick: looking at countries with highest infection rate compared to population. So that's what this query is going to do. I'm going to copy this. We're going to keep the location; we are not going to keep the date. This is not going to be date-specific; it's just going to be overall. And then we're going to look at the max of the total cases, so we only want to look at the highest. When we were looking at the US we had 32 million; we don't want to look at every single value of the total cases, we only want to look at the very highest one. So we'll look at MAX(total_cases), and right here we'll give it an alias, something to recognize it by: I guess we can say HighestInfectionCount. That's the highest infection count per country, so per location. And then, since we don't have MAX(total_cases) here, if we just kept total_cases here it would give us the same thing that we were looking at in the query above. What we need to do is look at the max of this, so we're going to put MAX and just add parentheses there. And this isn't the death percentage anymore; I forgot to change it in this last one. What is this? It's percent of population infected. So let's change that for both of these, because I don't want you to get confused when you're looking at the column headers later. So we'll look at PercentPopulationInfected. Let's run this and see what we get... "...select list because it is not contained in either an aggregate function..." Oh, I need to add a GROUP BY, of course. So let's add
GROUP BY, and we need to group by both the population and the location. So let's try that really quick and see if this works. Awesome. Well, we ordered on location and population, but I really want to look at the highest. So let's just really quick look at some of these numbers: we've got 1%, 4%, 10%... okay, so yeah, what we want to do is order on this PercentPopulationInfected. So let's go ahead and do that, and let's do that descending, so the descending puts the highest number first. My goodness, about 17%. So what percentage of your population has gotten COVID and been reported: we can see that now. The very first one has a small population, so it doesn't surprise me. But if you look right down here, that's that 32 million that we were talking about; that's that max of total cases, which is the highest number for our infection count. So we have 33... so we're, I mean, we're right up there on the list. Let's look for other large countries: there's us, there's Israel, there's Belgium, Portugal, France. So we're up to almost about 10% in a lot of these countries, and some of us, including the United States, are in there as well. Some of us have really high percentage rates; we just did not keep it under control, and a large amount of the population has gotten it. That's what this one shows.

Now let's look at kind of the sad side of things. We were just looking at how many people were infected; let's look at how many people actually died. So let's comment, and we'll say this is showing the countries with the highest (am I spelling that right? yeah, highest) death count per population. Now, how are we going to do this? Let's copy this off the bat, but I don't know if we're going to do it the exact same way, because we just need location and not much else, honestly. So let's get rid of all this stuff. But we're looking at the highest death
count, so like we did up here with MAX(total_cases), we're going to do MAX and then total_deaths (I hope it's spelled like this, total_deaths), and then we'll do AS TotalDeathCount, and we'll order that by TotalDeathCount. Let's see... I don't need this, I think. Yeah, I need the GROUP BY because there's an aggregate function. Let's try this really quick. Okay, so if you're getting this, there's a simple-slash-confusing explanation. Let's go into our CovidDeaths columns and look at total_deaths, which is right here: it's an nvarchar(255). It's an issue with the data type. It just has to do with how the data type is read when you use this aggregate function. We need to convert it, or cast it, is what we're actually going to do: we need to cast this as an integer so that it's read as a numeric. Why? I cannot 100% give you a perfect explanation for it, but this happens all the time; you just need to look at the data and realize, oh, it's probably because of this data type, let's try something else, and then it'll work. So let's cast this (casting it, I find, is just easier): just AS int, boom, there you go. So now we're taking this nvarchar(255) over here and converting it to an integer. Now let's run this, and let's get rid of this just for visual purposes. Now we are much more accurate, but we're now seeing a slight issue with our data. In the location column we have a few that really shouldn't be there, ones like World or Africa or South America; these are grouping entire continents. So let's go back up here and pull it up really quick, because this is just part of exploring the data and figuring it out. If we scroll down, we're going to see one like... where is it... right here. This location
is all of Asia, whereas in other rows the continent is Asia. If I can pull one up real quick... so right here the continent is Asia, whereas before the location was Asia. But if you also notice, the continent is null here. So what we need to do is say WHERE continent IS NOT NULL, because when it is null, that means this location is actually an entire continent, and we don't want that. That may be helpful for us later on, but it is not helpful now. So this right here will get rid of that, and knowing that, having figured that out, now we can add that to every script. You don't have to do this; I'm just doing this for visual purposes, and I'm not going to do that for every one. So let's say WHERE continent IS NOT NULL, and now let's look at this. Now you can see that the United States is number one, and this is not the best thing to be number one in, but we have a death count of 576,000. And again, I Googled this earlier; these numbers are pretty accurate. Some of them are a day or two behind (give me a second, I'm going to take a sip of water), a couple of days behind. This number is actually higher, and as we continue to have more people die, unfortunately, that number just continues to go up, so the data that you download may be a lot higher.

As of right now we've been breaking everything out by location, right? Really quickly, let's do this by something we saw earlier. I'm just going to do this for breaking-it-up purposes; I'm going to turn on Caps Lock: LET'S BREAK THINGS DOWN BY CONTINENT. How do you spell continent... jeez, is that even how you spell it? I don't even know. Let's keep going. Now we can put continent right here, and we'll just copy and paste that. Let's get that back up here, and now we can see, where continent is not null... let's see if that makes... yeah, okay. So now it's breaking it out by continents, with North America, South America,
Asia, Europe, Africa, Oceania. Is this perfect? No, it's not perfect. North America looks like it's only including the numbers from the United States and not Canada, so we have some small issues in here. But for the purposes of what we're trying to do (I don't think anyone's going to come in here and fact-check us or check the data; they may, and then, I don't know, you might be screwed), for the purposes of hierarchy and that drill-down effect in Tableau, which is something we are going to do, we want to start including this continent in our queries so that we can drill down further into these things.

We can also do... just wait, I'm going to do WHERE continent IS NULL. Actually, let me see. So before we were doing WHERE continent IS NOT NULL, but let's select location. I'm doing this on the fly; I haven't done this before. These actually are the correct numbers, and I don't know why I didn't do this before, when I was actually creating this project, but now this is a wonderful, beautiful thing. I believe these are the correct numbers. I could verify, but I don't want to do that live because I might look stupid, but I think this is accurate. Remember, before, we were looking at the location, and it was actually the countries themselves, and we did WHERE continent IS NOT NULL to get rid of all the ones that were like World and all those other things. Well, now I'm filtering on those instead of removing them: before, we were looking at everything but these; now we're only looking at these, and these numbers look a lot more accurate. So with that being said, I'm going to use this going forward in my script, so I'm going to change things up from what I originally had. Let me see, though, because if that is the case it may screw up our drill-down effect, which is highly unfortunate. I honestly might just revert back to it, for the pure fact that we
want the visualizations to look correct. Just know that this is the right way, and if you want to go back and do that, I highly encourage it. I didn't figure that out my first time around, but I'm willing to admit when I'm wrong. Let me do a time check: we're at like 50 minutes or so. I think we're just going to keep going all the way through; I don't think we're going to stop in this project.

So the above queries were kind of what we were going for, nothing crazy difficult, right? And now we want to start breaking this out by continent as well. I'm going to go back and... is this correct? Let me look. No, so: IS NOT NULL. So we want to start doing some of the above queries but adding that continent in there. You can even go back and add that as well if you want to; that's totally fine. I'm going to do some more queries down here, or at least one or two more, and then we're going to start getting into some slightly more advanced things. We're going to start getting into some temp tables, stuff like that, because we're going to eventually set these up in views, so that we have these views to use for Tableau later. And again, it shows that you know how to create a view, so that's important.

So we've done this first one. This next one (let me go down one more) is showing the continents with the highest death count, so almost the exact same as we did before, but now we're looking at the continents. We can even go up and look at... just wait, we literally just did that. So that's what this one is; I was actually looking at my notes wrong. Idiot. Okay, perfect. Now we want to start looking at this from a viewpoint of, I'm going to visualize this, so how do we do that? What do we want to look at? Let's look at some global numbers. You can do as many of these as you want: anything up here, just add
continent to it. Anything with, like, a GROUP BY, just replace it with continent and you've got it. I don't want to go through and do every single one of those, but that is the gist of what you might want to do, especially if you want that drill-down effect. If you don't know what that is, it's like clicking on North America, and then when you bring up North America it shows all the countries in North America, so Canada and the United States. So it's a drill-down: you click on Africa and then there are all the African countries. That's what drilling down does, and that's what you can do when you have those layers: you have the continent, then you have the location. We'll look at that when we actually get to Tableau, but I don't want to spend all the time writing that out.

What we now want to do is calculate everything across the entire world. So let's say... let's just say GLOBAL NUMBERS; easier than nothing. All right, let me really quick find... I think it's probably the first one, the death percentage. Let me see if this is the one that we want. Okay. All right, so let's take this one. I'm sorry that took me a while to find; again, I'm not cutting any of this stuff out. You've just got to stick with me. If you're sticking with me this long, I know you care; I know you're not cutting away because I'm trying to figure things out on my side. So let me get rid of this. So this is the exact same... well, let's add the WHERE, just so we can get the right numbers. So we are now going to look at the global numbers, so we're not going to include any location or continent or anything like that, but we do want to make sure that we're only looking at all of the countries, and we're not looking at the world numbers plus all the countries, because then the numbers
would get astronomical. So instead of... let's try running this really quick. So now we really can't do this, because now it's breaking everything out by the date, because the total case numbers are different, right? So really quick, let's GROUP BY date, and now let's see what it looks like. It's going to give us an error, obviously. That's because when we're looking at this we're looking at multiple things, and we can't group by just the date. If we want to group by something, which we need to do, we then need to start using aggregate functions on everything else. So really quickly, let's do some aggregate functions. I'm looking at my notes for just a second to see what I did. Basically, what we want to do, and I think what will make things easier... I mean, I could try to do the SUM of MAX(total_cases); I don't think that's possible. Let me comment this out really quick. Yeah, it's because there's an aggregate function within an aggregate function, and we can't really do that. If we go back to the data (we kind of looked at this earlier), there's a column called new_cases. Let's use this, because instead of doing MAX we can just do a SUM on it, and that's going to give us the sum of all the new cases, which adds up to the total cases. So if we do this, it will give us, for each day, the total across the world, because we're filtering out things like World and the actual continents, and we're not filtering by location or continent or anything; it's just by date. So we're looking at the sum of the new cases. So now let's do the SUM of new_deaths, and we can run that one... "Operand data type nvarchar is invalid for sum operator." So, going back (and this is something I encountered a lot when I was doing this), these new
cases are a float, which is why it's working in the SUM, but new_deaths is an nvarchar. So what we need to do again is cast that as an integer; it's just the easiest thing to do. And now that one should work. So let's get rid of... well, let's get rid of down to here. We're about to do another one, and that's going to be our death percentage globally, across, I guess, the world. So we need to do the SUM of new_deaths divided by the SUM of new_cases, times 100. Let's see what this gets us. Okay, of course we're getting the same thing. Let me put this right here and see if this works... invalid data... oh, that's because this was new_cases; the new_deaths one is right here. Let's run this, and now we are looking good. As you can see, the death percentage is right here; we have 91... and let me go back real quick and just say AS total_cases, AS total_deaths, and run that again. Okay, so across the world these are our numbers. On that very first day that cases were starting to be reported, there were 98 total cases and one total death; that gives us a death percentage of 1% across the world. As we scroll down it gets lower and lower, and that's because we have a lot of people who have gotten infected in the total cases. And again, that's per day, right? So if we remove that date altogether, which we can do right now, this will give us the total cases, which is... oh gosh, let me read this through... one, two... 150 million, versus about 3.18 million deaths. So overall, across the world, we are looking at a death percentage of a little over 2%. So, interesting numbers. You can keep both of those queries separate if you'd like; they might come in handy later. But let's do this. Give me one second to check my notes again, because I just want to make sure I'm not doing
something stupid. All right, so again, we have a whole other table that we haven't used yet: this CovidVaccinations table. Just to refresh your memory, let's look at the table, from PortfolioProject..CovidVaccinations, and jog our memory on what we've got here. We have these tests, and we have vaccinations over here, which is what we're actually going to be using. So let's join these two tables together. Let's do this whole thing from CovidDeaths, and here's how we're going to join it. We're going to say JOIN, oops, wait, that is wrong, JOIN, and then ON. What are we going to join them on? Two things. We're going to join them on location, because that's much more specific than the continent, and we're going to join them on date. Let's call this one dea and this one vac, a little alias for each so that we don't have to type out the entire table name each time. So dea.location is equal to vac.location, and dea.date is equal to vac.date. Let's just see what we get really quick. We'll have all of these columns. Let's look at Grenada, 07/17; go all the way over here, and it should also have Grenada, 07/17. So I'm just making sure that they were joined correctly. For this query, what we're going to do is look at total population versus vaccinations: what is the total amount of people in the world that have been vaccinated? That is what we're going to do in this query right here. So let's do dea.continent, dea.location, dea.
date. Again, these are going to be the same in either table, but we have to specify. For example, if we do population... oh, actually that's a terrible example, because population is only in one of them. Let me go back real quick. Say I only write date: that's going to give me an error, because there's a date column in both of them. In fact, we joined on it, so we know there's a date in both. We just have to specify which table we want to pull it from, so we're going to use dea, and dea.population just to keep it consistent. Now we're going to add the next one, vac.new_vaccinations. Really quick, let's just look at this, and let me get my ORDER BY, because I want it to be organized. Let's do 1, 2, 3. I don't like it when it's not organized; it bothers me. Oh no, I also need to add WHERE continent is not null. There we go, dea.continent. Perfect. Now let's run this; it should look much better. There we go. In fact, if we want to look at Afghanistan like we have normally been doing in previous ones, we do 2, 3. So there's our population, and here's our new vaccinations. Now let's go down and see: they have vaccinations starting on 2/18. If we go even further down, let's just go to, who's this, Canada. Oh yeah, Canada would be a good one to look at. They started doing vaccinations right here, on 12/15. I mean, they started very early and their numbers only increased, and this is per day, right? So this is 288,000 in one day. Those are really high numbers, but this is the number of new vaccinations. There is a column called total_vaccinations in this table, but we're going to do something else, just to display it. Again, this whole portfolio project is to show potential employers that you know how to do certain things, so I want to set up opportunities to do that. We're not going to use the total vaccinations.
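The join described above can be sketched roughly as follows. This is only an illustration, not the exact query from the video: it runs through Python's built-in sqlite3 module as a stand-in for SQL Server, the table names are shortened to CovidDeaths and CovidVaccinations, and every row (including the population figures) is made up.

```python
import sqlite3

# Hypothetical miniature versions of the two tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CovidDeaths (continent TEXT, location TEXT, date TEXT, population INTEGER);
CREATE TABLE CovidVaccinations (location TEXT, date TEXT, new_vaccinations TEXT);
INSERT INTO CovidDeaths VALUES
  ('Europe', 'Albania', '2021-01-13', 1000),
  ('Europe', 'Albania', '2021-01-14', 1000),
  ('North America', 'Canada', '2021-01-13', 2000),
  (NULL, 'World', '2021-01-13', 9000);
INSERT INTO CovidVaccinations VALUES
  ('Albania', '2021-01-13', '60'),
  ('Albania', '2021-01-14', '78'),
  ('Canada', '2021-01-13', '100'),
  ('World', '2021-01-13', '160');
""")

# Join on BOTH location and date so each day in one table lines up with the
# same day for the same country in the other table.
join_sql = """
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
FROM CovidDeaths AS dea
JOIN CovidVaccinations AS vac
  ON dea.location = vac.location
 AND dea.date = vac.date
WHERE dea.continent IS NOT NULL   -- drops aggregate rows such as 'World'
ORDER BY 2, 3                     -- second and third columns: location, date
"""
joined_rows = conn.execute(join_sql).fetchall()
for row in joined_rows:
    print(row)
```

Writing just `date` in the SELECT list instead of `dea.date` would fail with an ambiguous-column error, since both tables have a date column; that is the reason for prefixing each column with a table alias.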
We're going to use this new_vaccinations column, which is new vaccinations per day. We want to do kind of a rolling count out here. Let me go back to the beginning: as this number increases, 718, 2,300, 4,179, we want it to add up over here. It's a pretty cool thing. Once you see it you'll be like, oh, that's pretty easy, but we're going to be using PARTITION BY, which is a window function, so I think it's really good to showcase. We need to do the SUM, because we're going to be adding these together, so: SUM of new_vaccinations, oops, SUM of new_vaccinations, then OVER, and we're going to say, oh gosh, PARTITION BY. We need to partition by the location first and foremost. If we do it by continent, the numbers are going to be completely off. We need to do it by location, and also partly the date, but you'll see that in just a second. Why location? Because every time it gets to a new location we want the count to start over. We don't want this aggregate function to just keep running and running; it'll ruin all of our numbers. We only want to partition on the location, so that it runs only through Canada, and then when it gets to the next country it doesn't keep going. And if we only did that, by the way, let's look at what this looks like. Okay, real quick, I need to cast this as an integer, like we've been doing in the past. Real quick, I also want to show you another one: CONVERT. I think it's the column, comma, integer... or is it integer, comma? Let me try integer, comma. I think it's that way, actually. You can do it this way as well; that is up to you. Either one is totally fine. If you want to use both, that's even better, because then it kind of shows you can
do both, but they basically do the exact same thing. So let's go down and see what's happening here. It goes down to Albania, and since we're partitioning on location, Albania's total amount of vaccinations is 347,000. I know that going into it because it has it on every single stinking row. We wanted it to start adding up row by row, right? But we didn't do that. We only partitioned on location, so it did the sum of all the new vaccinations for that location. So what we need to do is go over here and say ORDER BY, and we need to order it by both the location, oops, dea.location, and the date. That is very important. The date is what's going to separate it out, and you'll see in just a second what I mean. So now let's run this and go back down to Albania. Here's Albania; let's go to our first one. So here's what we have: we have 60, and it gives us 60. Then we add 78, so 60 + 78 = 138. Then 60 + 78 + 42 = 180. Then 60 + 78 + 42 + 61 = 241. So you get the point: it adds up every single consecutive row, and when there are nulls or zeros it doesn't add anything; it just keeps the total going. So you can see it's a rolling count. Let's name this: let's do AS RollingPeopleVaccinated. I think that's good. Now what we want to do is actually look at the total population versus the vaccinations, and really what we want to do is use this RollingPeopleVaccinated. We want to use the max number, because at the very bottom is our max number; this is how many people in Albania have been vaccinated. We want to use that number and divide it by the population to know how many people in that country are vaccinated. So we'll do RollingPeopleVaccinated divided by population, times 100. And as you can see, we're getting an error: you can't take a column alias that you just created and use it in the next column of the same SELECT. So what we need to do is we need to
create either a CTE or a temp table. This is the time in the tutorial where I'm going to give you some options. You can do one, you can do both; there's no preference to me. At least for this first one, we're going to use a CTE. So we're going to say WITH, and let's call it PopvsVac, population versus vaccination, and then all we need to do is specify the columns that we're going to put in. Let's put AS, and let's insert that down here. So we do continent (oh gosh, I'm so bad at spelling continent), location, date, population, and then we'll have this RollingPeopleVaccinated. That should be it. We just need to close this parenthesis. So this is our CTE; it should be working. Actually, that's not true, I need an open parenthesis here; that's why it's giving me that error. I'm still getting an error, so let me see if I'm doing something wrong. Do I have this in parentheses, there and there? WITH PopvsVac, continent, location, date, population... ah, I believe that is the issue: we just need to add that last column, new_vaccinations. If the number of columns in the CTE is different from the number of columns in the SELECT, it's going to give you an error, so you've got to make sure they match. Then, just for right now, let's say SELECT everything FROM PopvsVac; it'll come up right away. So really quickly, let's run this and see what happens. The ORDER BY clause can't be in there. I knew that, but whoops. Let's comment that out, and let's run this. So now that query that we were looking at before is in here, but now we can actually use it to perform further calculations. So we'll just do everything, comma, and then we'll do rolling people
vaccinated divided by population, times 100. I'm pretty sure this is incorrect; give me a second. Invalid object... oh, it's because I have to run it together with the CTE, my bad. So let's look at this percentage really quick. It's not wrong, and it's actually going to give us a rolling number, and this may actually be what we want. Basically, it's taking this column and dividing it by this column, so this percentage should only increase: as the rolling count increases, the percentage increases, because the population stays constant. Again, I'm kind of looking at this as we go. So right now 12% of the population in Albania is vaccinated. That's all we know. I don't think we need to go any further than that. If you want to, you can look at the max one, but you'll have to get rid of date and just keep the location, population, et cetera, because the date is going to throw everything off. If that's something you want to do, absolutely do that. You can also use a temp table here. We can look at how to do that really quickly so that you guys know how. Again, I recommend throwing in one or two of these; even up here you could do different counts and then do one for each. So let's do a temp table. All right, it's going to be a lot of the same stuff. We're going to keep this, and this is going to be what we insert. So let's say INSERT INTO, and we need to write where we're inserting it into. Again, it's going to have basically the same effect, but with a temp table. So we're going to create the temp table, and let's call it PercentPopulationVaccinated. We need to specify our columns, so let's go down here and do basically the exact same thing. So continent. I think I spelled that right. No, I didn't spell that right. I almost did; I got really confident.
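The rolling count and the CTE from the last few minutes can be put together in one small sketch. Again this is an approximation under stated assumptions: it uses Python's built-in sqlite3 module instead of SQL Server (window functions need SQLite 3.25 or newer), the Albania vaccination numbers are the ones read out above, and the populations and the Canada rows are invented so the arithmetic is easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CovidDeaths (continent TEXT, location TEXT, date TEXT, population INTEGER);
CREATE TABLE CovidVaccinations (location TEXT, date TEXT, new_vaccinations TEXT);
-- Populations are made up; only the 60/78/42/61 sequence comes from the walkthrough.
INSERT INTO CovidDeaths VALUES
  ('Europe', 'Albania', '2021-01-13', 1000),
  ('Europe', 'Albania', '2021-01-14', 1000),
  ('Europe', 'Albania', '2021-01-15', 1000),
  ('Europe', 'Albania', '2021-01-16', 1000),
  ('North America', 'Canada', '2021-01-13', 2000),
  ('North America', 'Canada', '2021-01-14', 2000);
INSERT INTO CovidVaccinations VALUES
  ('Albania', '2021-01-13', '60'),
  ('Albania', '2021-01-14', '78'),
  ('Albania', '2021-01-15', '42'),
  ('Albania', '2021-01-16', '61'),
  ('Canada', '2021-01-13', '100'),
  ('Canada', '2021-01-14', NULL);
""")

# The CTE names six columns, so the inner SELECT must also return six.
# PARTITION BY restarts the sum for each location; ORDER BY inside OVER
# turns the per-location total into a running total by date.
cte_sql = """
WITH PopvsVac (continent, location, date, population,
               new_vaccinations, RollingPeopleVaccinated) AS (
    SELECT dea.continent, dea.location, dea.date, dea.population,
           vac.new_vaccinations,
           SUM(CAST(vac.new_vaccinations AS INTEGER))
               OVER (PARTITION BY dea.location
                     ORDER BY dea.location, dea.date)
    FROM CovidDeaths AS dea
    JOIN CovidVaccinations AS vac
      ON dea.location = vac.location AND dea.date = vac.date
    WHERE dea.continent IS NOT NULL
)
-- 100.0 (not 100) keeps SQLite from doing integer division.
SELECT *, RollingPeopleVaccinated * 100.0 / population AS PercentVaccinated
FROM PopvsVac
ORDER BY location, date
"""
rows = conn.execute(cte_sql).fetchall()
for row in rows:
    print(row)
```

The outer SELECT can reference RollingPeopleVaccinated only because the CTE has already given that expression a name. If the `ORDER BY` inside `OVER` is removed, every row for a location shows that location's grand total instead of a running total.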
And just so you know, for these we have to specify the data type as well, because we're basically creating a genuine table; it's just a temporary one. So let's do nvarchar(255). We'll do location, and the same thing, nvarchar, oops, 255. We need to do date, and we'll do that as datetime. We'll do population, and there are lots of different types we could use, but we'll do numeric for this example. There's new_vaccinations, and let's do that one as numeric. Again, you can use different things. And then we'll do RollingPeopleVaccinated; this can be numeric as well. And then we need to insert that into here. Okay, so we're inserting the data, and then down here we can actually select from it. Let's take this and put it right here, except we're going to be selecting from this table right here. It hasn't been created yet, but it will be created in just a second. Okay, so these were the rows that were affected, and then we got our actual output from this right here. Now let's say you wanted to change something in here. Let me comment that out, and then let me run this and create that table again. Oh no, we got an error. How can we get around this? Very simple: you can do DROP TABLE IF EXISTS and then the table name right here, and when we run this it should give us our output. I highly recommend just adding this, especially if you plan on making any alterations, so that when you run it multiple times you don't have to go and delete the view or drop the temp table by hand. It's just built in, it's at the top, it's easy to maintain, and it looks good. It's something that a lot of people do, and so if you have that at the top of your query and somebody who wants to hire you looks at this, they're like, oh okay, that makes sense, I'm glad they included that.
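Here is a minimal sketch of the temp-table version with the DROP TABLE IF EXISTS guard. It is an illustration only: it runs in SQLite through Python's built-in sqlite3 module, so the syntax is `CREATE TEMP TABLE` rather than SQL Server's convention of a temp table name starting with `#`, and the `Joined` source table with its two rows is a made-up stand-in for the joined deaths and vaccinations data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, already-joined source data (two invented rows).
CREATE TABLE Joined (continent TEXT, location TEXT, date TEXT,
                     population NUMERIC, new_vaccinations NUMERIC);
INSERT INTO Joined VALUES
  ('Europe', 'Albania', '2021-01-13', 1000, 60),
  ('Europe', 'Albania', '2021-01-14', 1000, 78);
""")

# Drop, create (with explicit column types), then fill the temp table.
rebuild_sql = """
DROP TABLE IF EXISTS PercentPopulationVaccinated;
CREATE TEMP TABLE PercentPopulationVaccinated (
    continent TEXT,
    location TEXT,
    date TEXT,
    population NUMERIC,
    new_vaccinations NUMERIC,
    RollingPeopleVaccinated NUMERIC
);
INSERT INTO PercentPopulationVaccinated
SELECT continent, location, date, population, new_vaccinations,
       SUM(new_vaccinations) OVER (PARTITION BY location ORDER BY location, date)
FROM Joined;
"""

# Running the script twice succeeds only because of the DROP TABLE IF EXISTS
# line; without it the second CREATE would fail with "table already exists".
conn.executescript(rebuild_sql)
conn.executescript(rebuild_sql)

temp_rows = conn.execute("""
    SELECT location, date, RollingPeopleVaccinated,
           RollingPeopleVaccinated * 100.0 / population
    FROM PercentPopulationVaccinated
    ORDER BY date
""").fetchall()
for row in temp_rows:
    print(row)
```

Unlike the CTE, the temp table persists for the rest of the connection, so it can be queried by any later statement, not just the one that immediately follows it.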
They know what they're doing; this guy's smart; I should hire them. Now, I feel like I've shown you as much as I can with the limited data that we've looked at. Again, I could have done this for six hours straight if I had used all the data; there's just so much of it. But let's create a view. I'm only going to show you how to create one view, but I want you to go back and create multiple views. If you want to look at these global numbers, let's look at this one really quick, toss it in a view. I mean, that one doesn't make sense to toss in a view, but this one, toss these numbers in a view. We're going to look at it in Tableau later, but for right now let's just create our view. So let's add a comment: creating view to store data for later visualizations. All right, so let's say CREATE VIEW, and I'm just going to keep the same name. For views it's so easy. I believe I can even keep the ORDER BY; we'll see if I'm correct. Actually, let's get rid of both of these things. So it says CREATE VIEW PercentPopulationVaccinated, oops, and then AS. Let's see, am I doing anything wrong here? Let me see... the ORDER BY clause. I was completely wrong; I was wondering why I was getting that error. Now let's try running it. Okay, so it ran successfully. Let's look at our views. It's not going to be in there yet; let's refresh. Hey, look, we got our very first view. We can open that up like a table if we want to. I mean, it's gorgeous. If you want to get rid of that, Ctrl+Shift+R is a refresh, and now it basically recognizes it. But let's go back here for a second. We can now query off of that. It's a view now, so it's permanent. You have to go in and actually
delete it. It's not like a temp table; this is now permanent, and it could be something that we use for a visualization later. So do some of these: look at some of the queries that we've written and create a few of these views, and we will use them later. In a normal setting, if I was actually working, I would put some of these in what I'd call a work view or a work table, something set aside so that I can use them consistently, but I would also set them aside so that I could connect Tableau to that view. Now, we're going to be using something called Tableau Public; that'll be in the very next tutorial. Unfortunately, Tableau Public does not connect to SQL databases. That's because it's free, and I totally get it; you have to pay for the upgraded version. But I am not a billionaire, okay? I cannot afford the real version of Tableau, and I'm also not a student or something where I can get it cheap, so I'm not paying for that. So we're going to use Tableau Public, and I recommend this anyway because it's free for anybody to access. We're going to be using Tableau in the next one to actually visualize a lot of these things. I want to get at least five visualizations, and we're going to create a dashboard. It's going to be a beautiful, beautiful thing. All right, so the very last thing that we are going to do is save this and then put it into GitHub, and I just want to show you how to do that. That's where we're going to be storing our code, at least for now. So let's go up here, click File, click Save As. I already have multiple versions of this, so let's just put B2. We're going to save that, so we have this saved. Now I'm going to go over to my GitHub. If you don't have an account, I highly recommend getting one so you can start putting your portfolio projects in here. Of course we're not going to
put our Tableau one in here, but our SQL ones and our Python ones you can put in here. Again, I'll talk a lot more about how we actually want to display this in GitHub or other places, but for now we're going to create a new repository. Let's call this one PortfolioProjects, make it public, and create the repository. We'll do all that extra stuff later. What we now want to do is upload an existing file. We'll click right there, go to choose files, and click this latest one that we saved, and we'll open it. We can always change the name of it later on, and you can add notes if you'd like, but we'll commit that change, so we'll actually upload this file. But let's look at it really quick. I'm going to go back and use the real one, which has the formatting and the notes that I wanted to add in there. As you can see, you can see all of the queries that we wrote, and this is fantastic. So if somebody comes in here, we'll have more notes and better comments on what the queries do and what the takeaway is for a hiring manager when they actually look at this. So this is a really, really good place to start. Again, this may not be your optimal place to put this; I'll give you a few different options in a later video about how we can potentially improve upon this. I'm really looking forward to getting more portfolio projects done so we can actually start building a complete portfolio. If you've stuck around all this way, I just want to say congratulations. I know this was a long video, I know that it took a long time, but you stuck with me, you put in the hard work, and that is fantastic. I really hope that it pays off, and I hope that this has been helpful. Thank you for watching. We'll have a lot more videos in the future on these portfolio projects, and I'm just really, really looking forward to doing them, to be honest. So thank you
for sticking with me, and thank you for watching. I really appreciate it. If you like this video, be sure to like and subscribe below, and I will see you in the next video. What's going on everybody, welcome back to another video. Today we will be heading back into SQL for our third portfolio project. Now, I am extremely excited for this project in particular for a few reasons. One, we're getting back into SQL, and I really like SQL, and two, we are finally focusing on data cleaning. I have talked so much about why data cleaning is important, that you really need to learn how to clean data, and that it's a big part of what a data analyst does, but I haven't actually shown you how to do it yet. So that is what this whole project is going to be, and then at the end you'll get to add it to your portfolio, so it's really a win-win. Now, before we start, I just want to say that I think it's going to be a little bit more advanced than our very first video in SQL, where we walked through data exploration. If you see something that you have never seen before, I will do my best to explain it while we're walking through it, but if you get confused or it seems a little complicated, please pause, Google it, do a little bit of research, and then come back. I think that will be very helpful. With that being said, let's jump over to my screen and we'll get started on the project. We're going to start over here on GitHub, and this is where I've put the data set that we are going to be using. I will put this link in the description. We're going to go right over here to the Nashville Housing Data for Data Cleaning. All you have to do is click Download, and it's going to download it, and you can open it up if you want to. We're not going to do anything to this data at all, but really quick I'm just going to show you what it looks like, and we'll of course look at this in SQL in just a little bit. We have a unique ID, a parcel ID, the property address, a sale date,
the price of the home (so this is housing data, if you didn't pick up on that already), who actually owns the home, the owner address, and then some information about land value, bedrooms, bathrooms, things like that. Again, not super important, because we're going to be doing all of this in SQL. So let's actually get this data into SQL. We're going to import it the exact same way that we did in the very first video. We're going to come right over here, go all the way down to Microsoft SQL Server 2019 Import and Export, and click Next. Our data source is, like last time, Microsoft Excel. Let's take a look, and we'll take that first file; this is the most recent one I've downloaded (I downloaded it a few times, so I just wanted to make sure). For the destination we're going to click SQL Server Native Client 11.0. This is my server right here, and I want to put it in this PortfolioProject database, so just configure this to whatever your server is. Again, if you haven't done this before and you've never set up SQL Server, I will leave a link, hopefully right here and also in the description, like I did for the first project, so be sure to go through that video so that you know how to download this and have everything ready. We're going to copy the data, and we're going to take Sheet1. We could have renamed Sheet1 to something else, but we didn't. Then we're going to click Finish, and Finish again, and it should run successfully, hopefully. It's looking good. Perfect. So we have 56,477 rows. Let's head over to SQL. All right, let's go to our database, PortfolioProject, and here is our Sheet1. Now, I'm going to rename this. Let's just do NashvilleHousing; that's what I'm going to rename it as, at least so that when I post these queries to GitHub and you see them, this is what the table will be called. So if you want to have them be exactly the same, or be able to copy
and paste them, you should do that as well. So let's take a look really quick. Let's select the top 1,000, but there are about 56,000 rows; there's a lot of data in here. I'm about to open up a saved file, and we'll walk through the exact things that we're going to be working on in just a little bit, but yeah, this is what the data looks like. There are lots of columns and lots of data, so I'm really excited about this. Let me pull this open really fast; it's going to be this project walkthrough. Here are the things that we're going to be walking through, and I'm going to show you this really quickly. We're going to standardize the date format. We're going to populate the property address data; that's referring to this right here. If you notice, there's the address and there's also the city that it's in, so we want to be able to separate that out. We're going to be doing the same thing to the owner address, except that one has an address, a city, and the state, which makes it a little bit more complicated, so that one should be really cool to show you. Oh whoops, I messed up: that's what this next item is, breaking the address out into individual columns; that's what we're going to do for that. Populating the property address is a different step: if you notice, and we'll go into this a little bit, there are actually some values in the property address that are blank, but I'm going to show you how you can actually populate them. It's just a cool trick that I've used a few times, and it does work; I think you'll find that one interesting. In the SoldAsVacant field we're going to be doing some CASE statements, if-then. Then we're going to be removing duplicates and then deleting unused columns. So we have a lot to get through. This could potentially be the longest video, and I'm okay with that, because I love SQL. Down here, and I
will say that in the very first video I said this was going to be an ETL video, and I fully intended on doing that, but I ran into issues: not issues on my side, but the fact that the vast majority of people who are going to be watching this are not going to be able to do what I did to configure my server. But I left it in here anyway. When I think ETL, it's an automated process: we extract the data from somewhere, transform it, and then put it somewhere. This was going to be the extraction method, and I was going to put it in a stored procedure so that you could run the stored procedure, run the job, and import the data. It was going to be really cool, but I know that if I was having trouble with it, then me trying to explain it to you and you being able to figure it out on your side was going to be very tough. I left this in anyway because I was able to get it to work on my computer, but it is tough and it took a lot of research. I did this for a previous server a year or two ago and I remember it being crazy hard, but I was able to figure it out. So if you want to try it out, try it out and look into this stuff. I'm going to leave this here just in case you want to try it; it's a little more advanced. You don't have to; you can just import the data, and this will be a data cleaning project instead of an ETL project, but data cleaning is what 90% of it was going to be anyway. Anyway, let's go back up to the very top really quickly. I have a whole other laptop right here, as I did in the first video (I didn't show it to you last time), and I have all of my queries written out over there. I'm going to try to do this as quickly as possible; we have a lot to get through. Now, before we start writing our queries, I am going to turn off my camera so I do not get in the way. All right, you should still be hearing my voice. Let's get started. Let's just start with SELECT everything, and we'll do FROM,
and it is PortfolioProject.dbo.NashvilleHousing. So let's just get this pulled up on screen. Awesome. So this is exactly what we were looking at before, and the very first thing that we're going to be looking at is this sale date. Now, I wrote "standardize sale date", but I'm really just going to change the sale date format. So let's copy this really quick and look at just SaleDate. It has this time on the end, and it serves absolutely no purpose. It just annoys me; I want to take that off. Right now it's a datetime format, so we're going to use CONVERT: we do date, then SaleDate, like that. Let's run this really quick, and this is what we want it to look like. All right, so let's say UPDATE, and we have PortfolioProject specified up here, so we can just say NashvilleHousing, and we are going to SET SaleDate equal to, and we're just going to copy this CONVERT. Now, I will say, before we do this, I had some issues when I was initially doing it with whether or not it actually made the update, and I'm not sure why. So yeah, it's not doing it right now. Try it out on yours; it may or may not work. I'm not exactly sure why that is, because I would say like 80% of the time it works and 10 to 20% of the time it doesn't, with no logical explanation that I know of. Something else we can do, I just thought of: we can do ALTER TABLE (I can't even say that word), and I think it's ADD. Give me one second. Yeah, so ADD, and we'll just do SaleDateConverted, and let's make that a date type, just like this. And then in the update we can set SaleDateConverted instead. Let's try this and see what happens. So I'm going to add this column and then I'm going to run this update, and it says rows affected. So let's select SaleDateConverted.
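The add-a-column-then-update approach can be sketched like this. It is only an approximation of what is typed on screen: the sketch runs in SQLite via Python's built-in sqlite3 module, where `date(SaleDate)` plays the role of the `CONVERT(date, SaleDate)` call described above, and the two sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tiny made-up stand-in for the NashvilleHousing table.
CREATE TABLE NashvilleHousing (UniqueID INTEGER, SaleDate TEXT);
INSERT INTO NashvilleHousing VALUES
  (1, '2013-04-09 00:00:00'),
  (2, '2014-06-10 00:00:00');

-- Step 1: add a new column to hold the date-only value.
ALTER TABLE NashvilleHousing ADD COLUMN SaleDateConverted TEXT;

-- Step 2: fill it from the original column, dropping the time part.
UPDATE NashvilleHousing SET SaleDateConverted = date(SaleDate);
""")

converted = conn.execute(
    "SELECT SaleDate, SaleDateConverted FROM NashvilleHousing ORDER BY UniqueID"
).fetchall()
for row in converted:
    print(row)
```

Writing into a separate column keeps the original SaleDate values untouched, so the conversion can be checked side by side before the old column is dropped.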
Let's see if it actually worked. And it worked. Okay, so we now have a new column, and maybe at the end we'll remove the original SaleDate column so that we just have SaleDateConverted. You don't have to name it that; you can name it SaleDate2 or something like that. Cool. Well, let's go down to the property address and get a really quick look at it. Let's copy this up here; I hate rewriting this stuff, so I'm always copying and pasting. We're going to be working with PropertyAddress. There we go. So let's take a look at this really quick. Sorry, I was looking at my notes: we need to look at where the property address is null. What you'll see really quick when we run this is that there are null values. Why there are null values, I really don't know. But let's look at everything where it's null. So we have this property address, a sale date, a price, a legal reference, and there's this parcel ID and this unique ID, so we have a lot of information. And when you have something like an address: the property's address isn't going to change. The owner's address might change, but the address of the property itself, 99.9% of the time, is not going to change. So you can say with almost certainty that this property address could be populated if we had a reference point to base that off of. So really quickly, let's look at everything, and we'll order by, not property address, let's do ParcelID, and take a look at this. We have to do a little bit of research on this, but I'm going to show you something really quick. Let's see if I can find an example before too long. Okay, so here's an example. Here's the same parcel ID, so 015, boom, and that's the exact same address. We'll find this a lot of times, and I look
through the data and it is pretty much accurate: when a parcel ID does have an address, it is the exact same address. So the same parcel ID is going to have the same property address. Something that we can do, then, is basically say: if this parcel ID has an address and this same parcel ID does not have an address, let's populate it with the address that's already populated, because we know these are going to be the same. That is basically what we are about to do. It's not super complicated, but let's get started writing it. Let's copy that down there. One thing we are going to have to do with this is a self join: we have to join the table to itself to say, if this is equal to this, then this needs to be equal to this, that kind of thing. So real quick, let's just write that join part out and we'll go from there. (I don't know why I sounded Canadian right there.) So we'll JOIN on this, and we'll say ON... oh wait, let's label them. I'm going to do this in a really lazy way: I'm just going to do a and b. a.ParcelID is equal to b.ParcelID. And, let's see, we need to find a way to distinguish these. The sale date could be the same. But this UniqueID is unique, and we need the rows to be different, so let's use this and say AND a.UniqueID is not equal to b.UniqueID. So all we have done here is join the same exact table to itself where the parcel ID is the same but it's not the same row, right? Because this is a unique ID, these will never repeat themselves, so we'll never match a row to itself. So if the parcel IDs are equal but the unique IDs are different, we then want to populate the other one. So let's do a.
parcel ID and we'll say a do property address B do parcel ID comma bproperty address um and let's take a look at this really quick and let's do let me see if this works where aproperty address is null and let's see if see what comes up here okay so this is perfect this is exactly what I wanted to see so we have this parcel ID we have this parcel ID and here is our address and it's blank in all 35 of these so we have an address for all of these but we're not populating it so what we want to do is we want to say use this thing called isnull so isnull is basically saying it's the first thing is what do we want to check to see if it's null so we want to check aproperty address this whole thing now if it is null what do we want to populate um we want to put in there this B do bproperty um address because we want to take that property address and stick it in there so um let's run this really quick so this row is what is eventually going to be stuck into this row so this is perfect um it's literally saying when it's null take take this and put it there and so that's what this um this part of is doing so let's go in here and write our update uh so we want to update and let's take this whole thing from here up and we this will be the set oops um so we're going to set um property okay we need to specify um and just so you know when you're doing joins in an update statement you're not going to say Nashville housing okay that's going to give you an error you need to use it by by its Alias so let's put a so now we're going to say property address is going to be equal to and now we're just going to copy this is null and put it right here and we only want to update let's see if it it does take this so I think this should be correct let's let's test it out really quick and we're going to run this above query and see if it made that update all right so there you go um as you can see there are now none that have null in there otherwise it'd be giving us an output right now so that 
one is fixed we can go back and check it if you want to please go back and and double check that um but that is what we did and it worked perfectly so that's what that is null does it checks to see if this is null if it is null it it it can populate with a value you can also do like a string and what we I mean you can write you know no address if you wanted to do something like that we don't want to do that we're going to keep it how it is let's keep moving on we do not have unlimited time here trying to keep this I'm going to try to keep this on one under two hours stretching the rules because for my love of SQL and that is the only reason um and this I think is going to take a little longer so let's take a look and let's copy this real quick and let's take a look at uh what are we doing the property address the property address um and we can get rid of this as well so if you notice we have two things here we have both the address and then there's this comma after all of them and there is the city now you know you don't know that or you maybe you haven't looked into this but I have and there are no other commas anywhere except for in between these things as a separator as a delimiter um a delimiter is lit if you don't know what if you've never heard that term delimiter a delimiter um is something that separates different columns or different values so for us the delimer is a comma and for this first one because we're going to be separating this one out and then we're going to be doing the owner address um for this one we're going to be using something called a substring and we're also going to be using something called a character index or a charart index so let's start writing that out and let's do select and let's say substring now the substring that we want to take we of course want to be looking at oops let me um put this down here so it helps us out a little bit and I'll get do like that so substring and of course we're looking at property address and we want 
to look at position one so we're going to start at position one one now this next part is something that you may have never seen before um and if that if you haven't that's totally okay um we're going to be the character index is going to be searching for the um it's going to basically be searching for a specific value okay that's all it's doing and you and you can look into this a little bit more if you want um so it's going to be Char index that's how it's spelled and then like an open parentheses and we want to specify what we're looking for so it can be anything you can even do you know if you wanted to things like um Tom or you can do Val well you do it um like this you can look for Tom or if you're looking for a specific word like John you can search that that's what this is for um but we're going to do a comma where are we looking that's what this next one is so we're looking in property address uh and then we're going to close the parenthesis and and we'd also close it again to complete off that substring and we'll say as address um and let's just take a look really quick at this so right now it's taking the it is basically going it's looking at property address it's going to the very first value or starting at the first value and then it's going until the comma Now the unfortunate thing is is we actually getting this comma in this output and we don't want that uh you don't want a comma at the end of every address we can change that um so we can say because this is specifying a position if we just look at this chart index which we can do really quick it is going to give us a a number it is saying at position 19 that is where the comma is right so it's not like it's taking it's not a value or it's not a um it's not a string it's a it's a number so we can say minus one one and if we do that and now we run it now that comma is gone because we're looking back we're going to the comma and then going back one from uh one behind the comma so that's how you get rid 
of that comma right there um the next one's a little bit more tricky because we're not starting well it's not super tricky but we're not starting at that first position anymore so let's put a comma then we have our substring now where we want to start is at this as at where the comma is so instead of position one we want it to be where that character index um I don't want it to look like this this whole time is it like this what am I doing it doesn't matter let's just get rid of this and see if that fixes it what am I doing here oh it's just because this is wrong um and we'll just do comma parentheses that might fix it ah doesn't matter okay I'm wasting time I'm going to keep going we want to start in this in this position okay um but we actually don't want to start at minus one we need to start at plus one because we want to go to the actual comma itself then once we get to the comma we want to add one so if we didn't if we just left it the same again it would include the comma at the beginning um then we need to specify where it needs to go to where does it need to finish now every single thing is going to be different every single address has a different length but we can use that to our advantage in this one and we can literally say the length of property address you guessed it right and then we can close this off let's see if that works okay what's messing up so we have property substring property address comma character index and then we have specifying it in the comma um we have the property address plus one okay we can't have that right there I don't know why I had that F finally figured it out at the end um so let's see what we're doing here let's see if it worked it works perfect um and again this was one that I'm guessing a lot of people haven't used before so I was trying to explain it a little bit more than other ones um but if we take that out we take out that plus one you're going to see the comma at the beginning right here so that's what that is um 
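The SUBSTRING and CHARINDEX split described above can be sketched roughly as follows. This is a minimal illustration, not the exact query from the video: the temp table `#Housing`, the column name `PropertyAddress`, and the sample rows are made up so the script runs on its own in SQL Server.

```sql
-- Hypothetical stand-in for the housing table; the rows are invented for illustration.
CREATE TABLE #Housing (PropertyAddress nvarchar(255));

INSERT INTO #Housing (PropertyAddress)
VALUES (N'123 EXAMPLE ST, NASHVILLE'),
       (N'4567 SAMPLE AVE, GOODLETTSVILLE');

SELECT
    PropertyAddress,
    -- CHARINDEX returns a number: the 1-based position of the first comma.
    CHARINDEX(',', PropertyAddress) AS CommaPosition,
    -- Start at position 1 and take (comma position - 1) characters,
    -- so the comma itself is left out.
    SUBSTRING(PropertyAddress, 1, CHARINDEX(',', PropertyAddress) - 1) AS Address,
    -- Start one character past the comma. LEN(PropertyAddress) is simply a
    -- length that is always long enough to reach the end of the string.
    SUBSTRING(PropertyAddress, CHARINDEX(',', PropertyAddress) + 1, LEN(PropertyAddress)) AS City
FROM #Housing;

DROP TABLE #Housing;
```

Two things to watch for if you adapt this: the city keeps the space that follows the comma (wrapping it in LTRIM would remove it), and a row with no comma at all makes CHARINDEX return 0, so the first SUBSTRING would receive a length of -1 and fail.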
So plus 1, and that's what we're going to keep. Now, we can't separate two values out of one column without creating two other columns. So just like we added that column up here, I'm going to copy this down; we're going to create two new columns and add the values in. Because it's property address, let's call the first one property split address and the next one property split city. This isn't going to be a date, of course; let's do nvarchar and make it 255, just in case it's a large string. Then we update, and we put in what we did for it. The first one is the address, so we take that whole first SUBSTRING, copy it, and set the column equal to it, and we do the same with the city.

I'm going to do this one at a time so you can see it. It adds the column, then it sets the values, and then again it adds the city column and sets the city to that second SUBSTRING. Now let's do select everything from the table, and at the very end, because when you add a column it goes to the end, we should have two new columns. And here we are: property split address and property split city. It's much more usable than the original. Not a nightmare, but it would just be annoying to use that combined column. Now that it's separated into the address and the city, it's so much more usable data. It really is.

The next thing we're going to look at is this owner address. It was tough enough to do the last one, but I want to show you maybe an even simpler way to do it, even though this column is more complicated. Let's take a look at owner address. What we have in here is the address, the city, and the state, so we need to split all of those out. And I don't want to use substrings again; that was a pain. I want to use something a little different, something again that you may have never seen: it's called PARSENAME. PARSENAME is super useful, especially for stuff that's delimited by a specific value.

Let me show you what it is and then we'll go from there. We say PARSENAME on the owner address, then let's do 1, and see what happens. Nothing changed, of course, because PARSENAME is only useful with periods; that's what it looks for, and these are commas. So we can just replace those commas with periods. Super easy: REPLACE, owner address, then the comma we're looking for, then what we change it to, a period. Close that, run it, and it's pulling out Tennessee.

So something odd, at least to me, about PARSENAME is that it does things backwards from what you would expect. Let's add the other parts really quick; you won't get a kick out of this as much as I do. Here's 1, 2, 3. Execute this and it separates everything for us, but it's backwards. You would imagine 1, 2, 3 would give you the address, the city, and the state, but no, it comes out state, city, address.
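As a rough sketch of the PARSENAME and REPLACE combination described here, the snippet below runs on a single made-up address held in a variable. The variable name and the address are assumptions for illustration only; nothing here touches the real table.

```sql
-- Hypothetical owner address in the "address, city, state" shape described above.
DECLARE @OwnerAddress nvarchar(255) = N'123 EXAMPLE ST, NASHVILLE, TN';

SELECT
    -- PARSENAME splits on periods only, so the commas are swapped for periods first.
    -- Parts are numbered from the right: 1 is the last part, 3 is the first.
    PARSENAME(REPLACE(@OwnerAddress, ',', '.'), 1) AS Part1_State,
    PARSENAME(REPLACE(@OwnerAddress, ',', '.'), 2) AS Part2_City,
    PARSENAME(REPLACE(@OwnerAddress, ',', '.'), 3) AS Part3_Address;
```

PARSENAME was built for splitting object names, so it handles at most four parts and returns NULL beyond that. An address that already contains a period (for example an abbreviation written with a dot) would shift the parts, which is worth checking before relying on this shortcut.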
So all we need to do is go 3, 2, 1 and run this, and there we go. Now we have it broken out: this is our address, this is our city, and this is our state. What I would consider super easy, a lot easier than the substring, but I didn't want to show you the easy one first and then give you the hard one.

Now we just need to add those columns and then add the values. Let's make some room. I have my alter table, update, alter table, update, and I need three of each. On Nashville housing, we're going to add owner split address, owner split city, and owner split state. I'm setting each one equal to the value we're about to add. The first one, the 3, is the address, so we paste it there. The second one, the middle one, is the city. And the third one, the 1, is the state. Let's run it. Oops, owner split address, what's wrong with that? Oh, I just have to run the alter table first; I tried to get good too quick. You can do this in a much more efficient way; I'm just doing it this way for visual purposes. Normally I would add all the columns first and then do all the updating at the end.

Let's bring this select down here. Don't keep this in your final queries; it's a lot of extra selecting everything, and you don't need to do that. So here we go: owner split address, owner split city, owner split state. Again, so much more usable than when it's all in one column; it is 10, 100 times more useful data now. That one gets used a lot. Let's keep it going. I feel like we're making fantastic time. I'm not even keeping track of time; it could be three hours and I wouldn't care.

Let's take a look at this column right here, sold as vacant. Let's do a select distinct on sold as vacant. Right now we have Yes, No, N, and Y, and I'm guessing N and Y mean No and Yes. Just because I'm curious, let's do a count of sold as vacant, group by sold as vacant, and order by 2. Here's what I wanted: No has about 51,000, Yes has 4,000, almost 5,000, and N and Y have just a few. So let's change them all to Yes and No, because those are obviously the vastly more populated ones.

We're going to do this through a case statement. Select sold as vacant, then start the case statement: CASE WHEN sold as vacant is equal to 'Y' THEN 'Yes' (I almost made it 'No'; I'm losing it), WHEN sold as vacant is equal to 'N' THEN 'No', and then ELSE. If it's not one of those values, it's already a Yes or No, so we just keep it as sold as vacant. Then END. Let's take a look and scroll through to see if we can spot any. Okay, here we go: here's an N, and it's now a No. So the case statement is changing it. This will be its own update statement, and I hope it works, unlike the first update statement that we did; that was a travesty. Let's do UPDATE Nashville housing SET sold as vacant equal to, and we can literally put in this case statement. Now let's go look at the distinct values again and see if it made the update. There we go, the update statement worked. Fantastic, it's a beautiful thing. I was worried for a second that my update had broken in SQL Server.

For these next two things, we're going to remove the duplicates, and then we're going to get rid of unused columns. Removing duplicates, I've got to be honest, I don't do it a ton in SQL, but I have done it. Usually when I'm looking at full tables I'll write some sort of temp table and put the duplicate removal in there; I normally don't delete actual data. We are going to do that here, but it's not standard practice to delete data that's in your database. So don't blame me if you delete all the duplicates by accident in your table at work.

You can do this a few different ways, but the way I'm going to show you is a CTE with some window functions to find where there are duplicate values. We can write the query first and then put it into a CTE; that might be a little better. So select everything (and oh my gosh, I was about to make that mistake again; somebody's out there just waiting for it). When you're removing duplicates, you need a way to identify the duplicate rows. You can use things like RANK, DENSE_RANK, or ROW_NUMBER; there are a few options. We're going to use ROW_NUMBER because I think it's the simplest and it's going to do exactly what we need. If you want to look into how RANK and DENSE_RANK work, please do that so you know why we're doing it this way.

So we select everything, then add ROW_NUMBER with its parentheses, then OVER and an open parenthesis, and now we write our PARTITION BY. We need to partition on things that should be unique to each row. For the sake of what we're doing, we're going to pretend this unique ID isn't here; you could say I'm cheating, but it doesn't matter. I'm going to say that if the parcel ID is the same, the property address is the same, the sale price is the same, the sale date is the same, and the legal reference, which I'm guessing is some type of legal document for the property, is the same, then to me that is the same data and it's unusable. This data is just some random data set I found online, so that's what we're going to run with; pretend that lie I just told you is completely true.

So we partition by parcel ID, property address (stick with me, we're getting somewhere), sale price, sale date (they didn't sell twice on the same day, come on), and legal reference. Oh, and I know why my autocomplete isn't working, which I love: it's because we're creating our own column. It's late, as you can see down here it's 11:15, but hey, this is an adrenaline rush for me. Now we need to order it on something that should be unique, so we'll ORDER BY unique ID, close that off, and call this row_num; that just makes sense.

Let's run this and see what happens. Let's also order the whole thing by parcel ID. Let's scroll down and see if we get any hits. This is all ones... okay, there's a 2 in there. Let's look at this really quick because I want to see it.
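The ROW_NUMBER query built up above might look roughly like this. It is a sketch under assumptions: the temp table, the column names (`UniqueID`, `ParcelID`, `PropertyAddress`, `SalePrice`, `SaleDate`, `LegalReference`), and the three sample rows are invented so the script is self-contained in SQL Server.

```sql
-- Hypothetical sample data: rows 1 and 2 are identical apart from UniqueID.
CREATE TABLE #Housing (
    UniqueID        int,
    ParcelID        nvarchar(50),
    PropertyAddress nvarchar(255),
    SalePrice       int,
    SaleDate        date,
    LegalReference  nvarchar(50)
);

INSERT INTO #Housing VALUES
    (1, N'P-001', N'123 EXAMPLE ST, NASHVILLE', 100000, '2016-01-05', N'REF-1'),
    (2, N'P-001', N'123 EXAMPLE ST, NASHVILLE', 100000, '2016-01-05', N'REF-1'),
    (3, N'P-002', N'456 SAMPLE AVE, NASHVILLE', 250000, '2016-02-10', N'REF-2');

SELECT *,
       -- Rows that share every PARTITION BY value are numbered 1, 2, 3, ...
       -- in UniqueID order, so any row numbered 2 or higher repeats an earlier row.
       ROW_NUMBER() OVER (
           PARTITION BY ParcelID, PropertyAddress, SalePrice, SaleDate, LegalReference
           ORDER BY UniqueID
       ) AS row_num
FROM #Housing
ORDER BY ParcelID;

DROP TABLE #Housing;
```

With these sample rows, UniqueID 2 gets row_num 2 because it matches UniqueID 1 on every partition column, while the other two rows get row_num 1.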
Maybe I did something wrong; it is absolutely possible. Somebody play some Jeopardy music for me real quick. (Oops, don't need to pull that up; I was doing some research when that CONVERT wasn't working.) Okay, so this row and this row are getting different row numbers. Let's look at the actual data and ignore the unique ID: the sale date is the same, the sale price is the same, the legal reference is the same, the owner is the same; literally every single thing in here is the same. So this is a good example. In the query we're about to write, that second one will be deleted because we don't need it, and then there's only one. It looks like this is working as intended.

Let's try WHERE row_num is greater than 1. Actually, I don't think it will work... yeah, that's because row_num comes from a window function, so of course we can't filter on it there. That's why we need to put it into a CTE. It all comes back to the CTE; those things are amazing. Let's call this row num CTE, say AS, open parenthesis, and I don't think we can have an ORDER BY in here, so let's take that out. Then select everything from row num CTE. If you haven't watched my CTE video or you've never used a CTE before, this is basically almost like a temp table: the query down here is querying off of the table that we, quote unquote, created. So we select everything from that where row_num, because that's now a column, is greater than 1, and order by property address.

Okay, so all of these are duplicates, and it looks like we have 104 of them. There's twos; any threes? No threes. So there are multiple rows that are basically duplicates, and we want to delete them. Instead of select everything we're just going to say DELETE, and I've got to get rid of that ORDER BY; that doesn't work with a delete. There's 104; let's see if it worked. Let's go back to select everything and see if there are any more duplicates in there. There are none. That is fantastic. I'm biting my nails now to see if each one of these works, because that first one didn't.

Now it's smooth sailing from here, because we're just going to delete some unused columns. This doesn't happen often. I'd say it actually happens more with views: I have a view and I'm like, oh, I didn't mean to add that column, let me just remove it. You usually don't do this to the raw data that you import. Best practices: please don't do this to the raw data that comes into your database, and talk to somebody before you do. That's my legal advice for the day; I'm not legally responsible for any mistakes you make.

So we're literally just going to delete some columns. It could be any columns we want. For example, we have these property split address and owner split address columns, with city and state, and those are much more useful than the original owner address, which is really unusable to be honest. So we're going to delete those originals. Land use might be useful, but tax district, who cares about that. It's going to be super easy: ALTER TABLE, then DROP COLUMN, and you can do as many as you want. We'll say owner address, tax district, and property address. Let's try this; I'm nervous. All right, as you can see, the property address is gone, the owner address is gone, and tax district is gone.

And now that we're here, we still have this sale date, and we have sale date converted over here. I forgot, let's get rid of it. (Oh, that was my dog Max, excuse him.) Let's get rid of that sale date that made me look like an idiot. This is sweet, sweet revenge. And it is gone; it's as easy as that.

Now remember, the whole point of this project is to clean the data and make it more usable. It may not have felt like that as we were going through, because I may not have highlighted the purpose too much. All these columns that we created are much more usable, much more friendly; this is standardized now, and we did that through quite a few methods. Let's go back up to the top and recap what we did. Using CONVERT, we tried to standardize the date format; it may or may not have worked for you, it didn't work for me. We populated the property address, and we did that before we broke it out for a reason: if we had broken the addresses out into individual columns first and then populated the property address, the split columns wouldn't have picked up that data, and we later deleted the property address column. So there was a reason it was in that order; don't mess that up. Then we broke the addresses out using SUBSTRING and CHARINDEX, as well as PARSENAME and REPLACE. We changed the Ys and Ns to Yes and No using case statements. We removed duplicates using ROW_NUMBER, a CTE, and a window function with PARTITION BY. And at the end we deleted a few useless columns that we no longer want to see.

That is the entire project, and you did it. I'm honestly super proud of you for sticking around this long. This was not necessarily an easy project; we used quite a few new things that I may not have shown you before. To me this is just the beginning, just a glimpse into all the things you need to look for in order to clean data. I really do think this is a good portfolio project because it shows that you know how to clean data, although it's not an end-to-end project; that would take a long time and a lot more exploratory analysis to figure out what needs to change. But for all intents and purposes this is a pretty good project for cleaning data, and I hope that you learned something. I also hope that you worked hard on this. If you want to make any improvements, please do; this is not perfect by any means. I'm not even going to try to guess what, but you could do other things to this data.
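Three of the steps recapped above (populating null addresses with a self-join, normalizing Y and N with a case statement, and deleting duplicates through a CTE) can be sketched together on a tiny made-up table. Treat this as an illustration under assumptions rather than the project's actual script: the temp table, column names, and rows are hypothetical, and the partition uses fewer columns than the walkthrough did.

```sql
-- Hypothetical sample data for SQL Server.
CREATE TABLE #Housing (
    UniqueID        int,
    ParcelID        nvarchar(50),
    PropertyAddress nvarchar(255),
    SoldAsVacant    nvarchar(10)
);

INSERT INTO #Housing VALUES
    (1, N'P-001', N'123 EXAMPLE ST, NASHVILLE', N'N'),
    (2, N'P-001', NULL,                          N'Yes'),
    (3, N'P-002', N'456 SAMPLE AVE, NASHVILLE', N'Y'),
    (4, N'P-002', N'456 SAMPLE AVE, NASHVILLE', N'Y');

-- 1. Self-join on ParcelID (same parcel, different row) and copy the
--    address across. The UPDATE names the alias "a", not the table.
UPDATE a
SET PropertyAddress = ISNULL(a.PropertyAddress, b.PropertyAddress)
FROM #Housing a
JOIN #Housing b
    ON a.ParcelID = b.ParcelID
   AND a.UniqueID <> b.UniqueID
WHERE a.PropertyAddress IS NULL;

-- 2. Turn Y/N into Yes/No and leave every other value as it is.
UPDATE #Housing
SET SoldAsVacant = CASE WHEN SoldAsVacant = 'Y' THEN 'Yes'
                        WHEN SoldAsVacant = 'N' THEN 'No'
                        ELSE SoldAsVacant
                   END;

-- 3. Number the rows inside each group of identical values, then delete
--    everything after the first. The filter on row_num has to sit outside
--    the window function, which is why the CTE is needed.
WITH RowNumCTE AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ParcelID, PropertyAddress, SoldAsVacant
               ORDER BY UniqueID
           ) AS row_num
    FROM #Housing
)
DELETE FROM RowNumCTE
WHERE row_num > 1;

-- UniqueID 4 is removed as a repeat of 3; UniqueID 2 now has an address.
SELECT * FROM #Housing ORDER BY UniqueID;

DROP TABLE #Housing;
```

The order matters here in the same way the recap describes: the address is populated before anything that depends on it. Because the delete is permanent, running step 3 as a SELECT first (as the walkthrough did) is a sensible check before switching it to DELETE.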
and and create your own queries create your own um data cleaning uh part of this and so um you do that if you were able to get this um the ETL part of it done do that I think it'd be really really cool um again I was able to get it to work but I don't think 90% of people out there would be able to get it to work um it's just every computer is different every server is configured differently um and so it would just be a huge pain so I decided to cut that out and I'm sorry um but hopefully this will suffice um with that being said this is it you made it all the way to the end again I'm super proud you guys are doing fantastic you guys are the ones putting in the hard work to build the portfolio for your future job I mean it's not easy but you're putting in the work and so and so kudos to you um in our next video we're going to be going into python for the very first time really excited about that one because um I think the only python video that I have up right now is on one where I was scraping data from Twitter so um you know this will be a nice change a pace or a little bit different content than I normally put out and so I'm really excited about it and I hope you are as well with that being said I am done with the video I'm going to be stopping it soon thank you for joining me if you like this video be sure to subscribe be sure to like this video leave a comment below um telling me how it changed your life uh and I will see you in the next video [Music] goodbye [Music] what's going on everybody today we are starting our Excel tutorial [Music] series now there are so many things that you can do in Excel so I don't know how long this series is going to be it could be 15 or even 20 videos but what I do know is that I'm going to be covering just about every single thing that I've used since I became a data analyst and I want to show you how to do it uh so won't just be the more concrete things um you know like pivot tables charts V lookups things like that it'll also 
be some of the more nuanced things like how to deal with missing data or how to deal with dirty data and how to clean that up within Excel and so those are things that you may not be able to do you know if somebody wasn't showing you how to do it and so that's what I'm going to try to help you because I know that that is something that you will need to do or learn how to do in Excel now before we get into it I want to give a huge shout out to the sponsor of this Excel series and that is udem me I took so many Excel courses on you to me when I was first starting out as a data analyst and there was this one course that I kept going back to over and over again because as I got into it in my job I realized that there were so many things that were in that course that I really needed to know but I didn't realize I needed to know it and so I'm going to put the links to those courses in the description in case you want to take those again huge shout out to you to me without further Ado let's jump on my screen and get started with our very first Excel tutorial all right so I'm going to go ahead and get rid of myself we are going to be looking at something absolutely pivotal in your data analytics career and that is Pivot tables uh and I think that's really appropriate it is probably one of the most commonly used things I think that data analysts use to convey information in Excel it's super easy to group things together to display information in a very easily understandable way especially for people who are not data analysts right I use this a lot for other managers or for higher-ups um who don't want to get into SQL or or you know aren't super text savy in like python or Tableau they just want it in an except sell and so I use it all the time for that reason and so we're going to be using this data set right here bike store sales in Europe I will include this link in the description um we're not going to look at the columns just yet we're going to download it um I've 
already downloaded it a few times, but we are going to go to our downloads, open it up, and open up this sales right here. Give it a second. All right, perfect. Here's what it looks like, at least on my screen. I'm going to spread it out just a little bit, and let's take a very quick glance at this. We have a date, a day, a month, and a year, so some date information. Then we have some customer age information, so how old was the customer. Again, this is bike sales, so what did they buy? And we have some demographic information: this is their age group, and we have the gender, country, state, the product category, the subcategory, and the actual product that was purchased. Then we have things like how much these things cost and the quantity that was ordered, so order quantity, unit cost, and unit price. Then we have the profit, cost, and revenue. Almost everything in here we can in some way put into a pivot table. I'm not going to go through every single variation of that, but we are going to be looking a lot at this revenue over here, because I think it's pretty easy to show the value of a pivot table with currency or money. So to get started, we're going to go up to Insert, click on it, and then click on pivot table. Really quick, there is a recommended pivot tables option, and if you click on that, what will come up is some recommendations that Excel gives based on the data that you have. It can give you some ideas of what you can do with pivot tables, and it's going to generate it for you. We're not going to do that. We're going to build our own. So let's click on pivot table, and it's going to auto-select basically everything, and that's fantastic. But what if it doesn't come like that? Say I just erase that. If it doesn't come like that, you can
click right here, hold Control and Shift, and then press the right arrow and then the down arrow, and it's going to select all of our data. You have right here a new worksheet or an existing worksheet. We're going to create a new worksheet; it just tends to get too clogged up if we put it on the same worksheet that already has a lot of data in it. Right over here are the pivot table fields. These are all of the columns that we just looked at, and we're going to be able to select those and drag and drop them. If you just took the Tableau tutorial series that I finished last week, then this is going to be pretty familiar. You're hopefully going to start seeing some patterns in how the data is displayed. We have our filters down here, and we have columns, rows, and values. I'll show you how to use all of these today, as well as some additional things. For this demonstration we're going to be looking at these bottom ones right here, profit, cost, and revenue, and we're going to be doing that per country and state. We'll do some drill-downs and I'll show you how those work. To start out, we're going to take the country right here, and you'll see it populate right over here. In fact, let me zoom in once. Yeah, that should be fine. I might zoom in again in just a little bit. So we have our country, and it's just like this, very simple. Now I'm going to include the state. I'm going to drag this all the way down. You can put it above or you can put it below. I'm going to put it below; it definitely makes the most sense there. When you do that, it populates in an expanded way, but you can collapse this very easily. We're going to go right here, right click, and go down
to Expand/Collapse, and we're going to collapse the entire field. Now here are all of our countries as they were before, and each of them has this plus sign to the left. If you click on it, we see the state that we added to these rows. What this does is act like a rollup or a grouping. If you have taken the SQL tutorial series and done things with GROUP BY, this is very similar to that, and if you've done the Tableau tutorial series, it's like a drill-down, so you can drill into the information. We can put some values in here, and that's going to give some context to what we're grouping by. Just for visual purposes, let's add this revenue. This is the bike sales revenue, so this is the sum of the revenue for these bike sales per country. If we drop down right here, we can see that in Australia, New South Wales had about 9.2 million, Queensland had 5 million, etc. So now we can break it down. We don't just have to look at Australia; we can drill down even further to the actual state, is what they're calling it, within Australia. It's super useful, and you can do that for every single one. We can look at Canada, we can look at France, and we can really drill down into the revenue for each of these countries as well as the states within them. Now, over here, this is not the prettiest. It just says sum of revenue and then it has some numbers; not the prettiest thing I've ever seen. Really quick, we can highlight these and format them. You can do it in a couple of different ways; we can go to Home and pick Currency. Now it has these two decimal places, .00, at the end. You can get rid of those really easily by going like that. Already this looks quite a bit better visually, especially if you're looking at it in dollars. You can change to different currencies if you want to do that. Now, we don't just have to do the sum of revenue; we can do a lot of different things. Let's go to the value field settings. We can customize this name, so we can do revenue per country. That's fine; it's just a placeholder. But we don't have to just do that. We could do the count, the average, the max, the min; we can do just about anything we want. Let's keep it as the sum right now. If we want to, we can show this value as different things: percentage of column total, percentage of row total. Let's do, just for demonstration purposes, the percentage of grand total. When we do that, we can see that for revenue per country the United States has 32% among these countries, and Australia has the next highest. It might be hard to glance at this really quickly and know who has the highest, but what we can do is go right here, go to sort, and do largest to smallest, and there we have the United States on top. Now, when you open it up right here, it's not sorted largest to smallest. You'd have to go in again, click sort, and do largest to smallest. Now we can see that California has the biggest percentage; they're pulling in 20% of that 32% of revenue. I'm just going to press Control+Z a few times and get us back to where we just were. What I want to do is show you a few different things pretty quickly. We want to pull in this profit and this cost, so I'm going to pull in this cost next, and then I'm going to pull in this profit. Again, I'm going to change the currency on
this, and I'm not going to change the names right now, but you absolutely can do that. Now, the revenue is how much is actually being sold, so for the United States it was 27 million. The cost is how much it cost to manufacture, store, or distribute all of these products, so that was 16 million. And the profit is how much money is actually being made at the end of the day, after all their costs, after all their employee costs, after everything; the United States is still making $11 million. You might look at this and say, well, I can glance at it and know that this profit is correct based off these two numbers. But we can do a calculated field. If you remember calculated fields from Tableau, it's basically the exact same thing. We can create an additional column right here that is a calculated field that can add and subtract these things to make sure that our numbers are adding up correctly. Let's do that really quickly. Let's go to PivotTable Analyze, go over to Fields, Items & Sets, and go to Calculated Field. We can name this anything, and just for demo purposes I'm going to say calculated field demo. I'm sure yours will be different. Then in here is the formula. We haven't looked at formulas yet, since this is our first tutorial, but when we do look at formulas, it's basically the same thing as writing it inside of a cell; here it just gives us this open text box to do what we want with it. What we're going to do is take revenue. I'm going to get rid of this and insert revenue, so that's the very large number, and then we're going to subtract our cost. We're going to insert that, and let's click OK. So this is our
calculated field demo column that we just created, and as you can see it matches our sum of profit column exactly. That's exactly what we want to see. We want to check that these revenue and cost fields are generating the correct profit; sometimes those are off, so it's really good to check those and have that additional column. You probably wouldn't keep this if you were going to submit this to somebody. Just so you know, now that this is an actual column, you can't go here and do something like cut and paste it over here; it won't let you do that. What it is now is an actual column, so we can go and remove it, and we can add it back at any moment. If we want to go back and add that down here, we can do that, because we've created that column. It's now permanently there unless we go and delete all of that data. We can just click this check mark and it will get rid of it for us. All right, now the last thing that we have not used down here is the filters. The filters area is exactly what it sounds like: it's going to allow you to filter on certain things, but probably not things that you already have included in your pivot table. If you add something like the country down here, it expands everything, and if you then go and filter on it, it breaks it down. That's really not what the filter is meant for. For example, right up here we have customer gender. Let's take the customer gender and put it in the filters. Now we can see all of the revenue, all of the cost, and all of the profit, and we can do that based off of the gender. So we can filter by gender without really having to change anything about our pivot table. At a super quick glance we can see that the profit from the males is 16.48 million, so at a really basic level we can see that the men or the males
are spending a little bit more than the females, by about $700,000. Now let's go ahead and create one more pivot table. We are going to create a pivot table right over here. Let's go back to the sales sheet right here; again, Control+Shift, right, down is going to select all of our data, and we'll click OK. One thing that we're going to look at is some of this date information right here. Let's select our country just like we did before. What we want to see is what year we were performing our best with our sales. I'm going to select the year and put that in our columns, and so now we have 2011 through 2016. We want to look at our revenue, so let's put our revenue right down here, and now we have all of our revenue. Let's again make this into a currency, just like that. Now we can get a really quick glance at how Australia was doing each year, and we can see that there was a huge uptick in 2013 and a huge uptick in 2015. It didn't happen for every single country. It did go up for most countries, very slightly for some, but we can see on a large scale what that's like from year to year. So within just a few minutes we're able to create some really useful pivot tables that anybody could look at and understand. That's really the biggest use of these pivot tables: you can group these things together, show some information and data at a broad, larger scale, and make it so that anybody who's looking at it can understand it. That is why pivot tables are so useful. I hope that this video was helpful, and I hope that I was able to walk through it and help you better understand how pivot tables work and how you can use them when you are working within Excel. Thank you guys so much for watching. I really appreciate it. If you like this video, be sure to like and subscribe below, and I'll see you in the next
video.

What's going on everybody? Today we're going to be looking at formulas in Excel. Now, I know what you're thinking: there's absolutely no way that you're going to be able to show us every single formula in Excel. And you're absolutely right, but I am going to show you some of my favorites and the ones that I found the most useful, and then you can go ahead and practice those and try them out. If there are ones that you really want me to do and you think that I missed, put them in the comments below. I will see those, and I'll try to make a list and make another video on formulas that includes all of those as well. Before we jump into the actual tutorial, I want to give a huge shout out to the sponsor of the series, and that is Udemy. You guys already know, if you have watched any of my videos, that I absolutely love Udemy. Honestly, they were the ones who got me started and were able to give me affordable courses to get started as a data analyst. I learned SQL and Excel and Python all through Udemy courses, so if you are looking for a platform to take a course, I absolutely recommend you look at Udemy. They have fantastic sales going on right now, especially during the holiday season and this new year, so if you're looking to take a full-fledged Excel course, I have some of my favorites in the description below. Now, without further ado, let's jump onto my screen and get started with the tutorial.

All right, before we start I want to say that this is not like every other tutorial that I have created. This one is very streamlined. I already know exactly what I'm going to do, and there's not going to be much messing around. I left little notes here and there, and I'm going to try to get through it, because there are a lot of them to get through: all these ones at the bottom. These are ones that I use a lot and that I think are useful. Again, if you know other ones that you use a lot that you think I should be
using, and I know there are ones that I left out of here, put them in the comments. I'll see the ones that people are liking and I will create more videos on them, because I know there are so many. I will also save this Excel file on the GitHub so you can go and download it. It will be exactly what you're looking at right now. I highly recommend trying these formulas out for yourself so you can get a feel for how they work and how they're actually used, and you can mess around with it yourself. As you can see at the bottom, we're going to start with max and min, and then we're going to go on to some more difficult things. All these things are super useful, and I'll try to talk about how you can actually use each one as we go through it. Some are super self-explanatory, but some may not be. This one I think is super self-explanatory, but again it's one that you're going to use all the time. What we can do is say equals, and that's how you start off saying this is going to be a formula in this cell. Equals means I am now creating a formula. We're going to say MAX and I'll hit Tab, so it'll populate it. Right here, if you've never seen a formula before, it'll give you what the inputs need to be, so it's going to say MAX of number one, number two, etc. What we're going to do is give a range, so we're going to go from here down to here. You don't have to close the parenthesis, but you can; I'm going to. Then you hit Enter, and for this date column it's going to give us the max date. These are the start dates for these people right here, and if we glance through here we can see that 2013 was the last year, and this one is actually the latest in that year, so it gave us the correct one. MIN is going to do the exact opposite: it's going to give us the smallest. We'll give it the same range, close the parenthesis, and it's going to say December 7th of 1995, and
we can see that that is correct, so Michael Scott started in 1995, the earliest of all the employees. You can do the exact same thing for really any of these columns. We can see who's making the most money, or at least what the highest salary is, so we'll do MAX and then the salary range. Whoops, what did I do? Did I do the wrong range? No, I didn't do the wrong range; this column was formatted as a date column for whatever reason. Let me get rid of that. Then we can do equals MIN, and again we'll do the salary. At a quick glance we can see that Pam Beasley is making the least, and the max, 65,000, is what Michael Scott is making. So, super simple: it shows the max, it shows the min, and you can select a range. There you go. Let's move on to IF and IFS. IF is, I think, pretty straightforward: all you're going to do is say if this, then that. IFS is a little bit different: you can put in multiple conditions, and as we're writing it I'll show you the conditions that need to be met. All right, we're going to click right here, say equals, type IF, and hit Tab. We need a logical test, so we're going to give it a range and say if it's equal to or greater than something, and then we're going to say what the output is if the value is true and what the output is if the value is false. Let's do this right here with this age range. If they are greater than, let's say, 30, we're going to do a comma, and then if the value is true, what should be the output? If they're greater than 30, we're going to call them old. And then if it is false, so if they're 30 or younger, what should it say? We're going to say young, and we'll close the parenthesis, and there you go. So if they're over 30 then they are going to
have old, or if they're 30 or younger they're going to have young. This is something where you need to specify whether you want 30 and over or over 30. We chose over 30, so 30 is not included in that, and they're going to be young. We don't actually need two of these; that's pretty self-explanatory. IFS is a little bit different, because you can have multiple conditions. Let's open that up real quick. With IFS we have a logical test and a value if that's true, and then you can do logical test two and a value if that's true, so you can have multiple conditions. In the IF one you had a value if true and a value if false. IFS does not have that. IFS is going to give you different specific conditions, and you can't say what happens if this one's false; you're just going to have multiple conditions. So let's do equals IFS, Tab, and our first logical test. If that equals Salesman, we're going to respond with Sales, so that's what we want the output to be if the value is true. Now we're going to go on to our logical test two, and you're going to see the pattern: this is our conditional or logical test, and if it is true, this is what's going to be returned. It's a pretty simple pattern. We can just do random things, so if that is equal to, say, HR, we can say fire immediately, and if it's equal to Regional Manager we say give Christmas bonus. We'll close the parenthesis and see what we get. As you can see, there's no default value for false. In the IF one there was a logical test, and if it was true there was a value and if it was false there was a value, so for every single row you get a value. For this one that's not going to happen. As you can see, there are these
#N/As. When that happens, it just means nothing met a condition. We never said anything about Supplier Relations and we never said anything about accountants, but if a title was part of that IFS statement then it got something. That is how IFS works. Now let's move on to length. It does exactly what it sounds like, but there are a lot of different use cases for it, just like you can use MAX and IFS for almost anything. One thing that I've used it for in the past: I used to work with a lot of customer data or patient data that had things like Social Security numbers, and if there were bad Social Security numbers we didn't want to include them. So we'd take the length of that, and if a Social Security number was, let's say, 10 or 11 numbers where it should only be nine, I think it's nine, then we know that Social Security number is incorrect, and we can get rid of it or discard it from our results. That's just an example. For this one, oops, I did Control+Z to undo that, if you didn't know how to do that. We're going to do equals LEN, which is length, and if you didn't see that, it returns the number of characters in a text string. Let's go right here to their last name and give it a range, and it's going to tell us how many characters are in that string. For Halpert it's seven characters, for Flenderson it's 10 characters, and we're able to see a length. Again, there are a lot of different use cases for this. The Social Security number was one; another one is phone numbers. If you look at the length of the phone numbers and there are ones that are like 12 numbers long, those might not be accurate, and you need to go look at them and see if you want to include them in your results or your output.
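
The length check described above can also be sketched outside of Excel. The following is a minimal Python illustration and not part of the workbook from the video: the function name `flag_bad_lengths`, the sample values, and the expected length of 9 are all assumptions made for the example. In Excel the same idea would be a helper column such as `=LEN(A2)` that you then filter for anything other than 9.

```python
def flag_bad_lengths(values, expected_length):
    """Return the values whose character count differs from expected_length.

    This mirrors filtering on an Excel helper column such as =LEN(A2).
    """
    return [v for v in values if len(str(v)) != expected_length]


# Hypothetical sample data: nine-character IDs, two of them malformed.
sample_ids = ["123456789", "1234567890", "987654321", "12345678901"]
bad_ids = flag_bad_lengths(sample_ids, expected_length=9)
print(bad_ids)
```

Flagging the odd-length rows for review, rather than deleting them outright, keeps the decision about whether to exclude them with the analyst, which is how the transcript describes the phone number case.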
So that is how length is done. Let's move right over to LEFT and RIGHT. I might be going a little fast, but I'm keeping it live and keeping us on our feet, so let's keep going. LEFT and RIGHT are kind of like substrings. If you've taken the SQL tutorial series that I've done, substrings are where you choose a certain part of the text string and extract data from it, and you usually have to reference a certain number of characters. This is the same idea, except Excel doesn't have a function named substring (there's SUBSTITUTE, but that's something different), so LEFT and RIGHT are what we're going to use here. Let's take a look and see what we can do. We're going to do LEFT, and it says it returns the specified number of characters from the start of a text string. So we're starting from the very far left, and we need to choose our text and then choose the number of characters that we're going to be looking over. Let's go over here and start simple; we'll get a little bit more advanced later. This is our text range, so these are the ones that we want to look at, and then how many characters do we want to look forward? We'll just choose three as an example, and you can see that it takes the first three characters from every single one. You can also do this with numbers; it doesn't just have to be names with actual words or letters. You can do the exact same thing with RIGHT. We're going to choose our string, and let's do this one, where all of them start with 100, and we'll just say we want to take the last one. This one is going to start from the very far right and go over one character. Right here you can see this is our range and I just chose one, so starting from the very far right we go over one character and that's what we take. That can definitely be useful.
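
For readers who think in code, here is a rough Python sketch of what LEFT and RIGHT do, using string slicing. The helper names `left` and `right` and the sample values are assumptions for illustration only; they are not taken from the video's workbook. For completeness, Excel also has a MID function for pulling characters out of the middle of a string, which this sketch does not cover.

```python
def left(text, num_chars):
    """First num_chars characters, in the spirit of Excel's LEFT(text, num_chars)."""
    if num_chars < 0:
        raise ValueError("num_chars must be zero or greater")
    return str(text)[:num_chars]


def right(text, num_chars):
    """Last num_chars characters, in the spirit of Excel's RIGHT(text, num_chars)."""
    if num_chars < 0:
        raise ValueError("num_chars must be zero or greater")
    # A slice of [-0:] would return the whole string, so zero is handled explicitly.
    return str(text)[-num_chars:] if num_chars else ""


# Hypothetical sample values resembling the columns described in the video.
print(left("Halpert", 3))       # first three characters of a name
print(right(1001, 1))           # last character of a numeric ID
print(right("11/2/2001", 4))    # year from a date stored as text
```

Both helpers convert their input with `str()` first, which is why the numeric ID works; a true date value would need to be converted to text in a known format before slicing, which is the problem the date-to-text step in the transcript addresses.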
Another one that you can do, and this is one that I have used countless times in my actual job: we're going to go from the right and look at a date. Sometimes you have these date structures, month month, day day, year year year year, or day, month, year, all these different ones, and sometimes you just want to extract the month or the year or the day. So we're going to come in here and extract the year of the start dates. Oops, I wanted to make that a range. We're going to select that and then go over four, so we take the last four characters, counting from the right, to give us the entire year. Let's do that, and now we can see exactly the year. This can be super useful. Again, it's one that I've used a lot, so it's one that you might want to remember in case you're ever doing analysis on start and end dates or anything with date data. Let's go over to date to text. I probably should have included that before, because I actually used it in this one. If you notice, right here this is text; in the one we just did, that was text. You can't use RIGHT directly on start and end dates when they're in a date format, and let me show you. This is a date. If I do equals RIGHT, like we just did, on the end date, and I'll do the whole range, give me a second, and we'll do four, it's giving us what look like completely random numbers. Why is that? Because underneath the dates there are numbers. If I go right here and make this General, it's going to show the numbers, and look, these are the last four characters from the right. So it's doing what it's supposed to do, but it's not doing what we actually want, and that's the issue. So how can we convert this? There are a ton of different ways, but the
quickest, probably the easiest, besides actually writing it out by hand, is what I'll show you in a second. If you write it out like this, like 11-2-2001, it converts it to a date format, but just so you know, you can create it as text: you can do 11-2-2001 and now it will stay a text string. As you can tell, these are a little bit different, because this one is situated on the right and this one's on the left. That's how you can tell the difference. Now, if you don't want to do it by hand, completely manually, and waste hours of your time, you can do it in a very simple way. We're going to use TEXT, so this is the exact formula that we're going to use. Let's get rid of that one. There we go. We're going to do equals TEXT, and it says it converts a value to text in a specific number format. So for a date we can choose a date format, and then it'll convert it to text for us, which saves so much time, I promise you. Let's select all of these just like we did, and then we need to tell it what the format is. If we tell it something incorrect, it's going to give us a completely terrible output or just give us an error altogether. This is a day day, month month, year year year year format, and that is what we're going to use, so we're going to do dd/mm/yyyy and close that up, and there you go. Now, because it's in a formula, what we need to do is copy this and paste it right over here, and now you can see that it is General. This is something that we can use as a string. Let's check it just to make sure: we're going to do RIGHT, select all of them, and do four, and there you go. Now it works, and that is what we are looking for. Imagine doing that with millions of rows, or let's say 10,000 rows. It's going to be a breeze. It's going to take you a minute or two to do everything that you want to do, instead of having to do a bunch of mess to convert it to
a string, which I promise you I've done; it takes forever and it's terrible. So that is date to text, a super helpful formula. Let's go over to TRIM. I purposefully messed up this column. Why did I mess it up like this? Because when you're working with real data, you're going to get data like this. It's messy, it's dirty, and it has random spaces at the end for no reason, because sometimes you're going to be working with data that is entered by a user rather than chosen from a drop-down. Imagine somebody's typing this in and they accidentally put in a space, or hit Enter or something, and then they submit it, and this is how it's going to look in the database. If the data engineer or whoever is working with the raw data doesn't clean that up, then you're going to be working with that dirty data. I guarantee you, if you're working as a data analyst you're going to see stuff like this, maybe not with a last name, but with all sorts of data. So we're going to go right here and say equals TRIM, open parenthesis. This says it removes all spaces from a text string except for single spaces between words. So if it said Jim space Halpert, it won't take out the space in between, because that space is supposed to be there. We'll give it this range, close that up, and there you go. Now it is nice and clean and much more usable. Now let's look at CONCATENATE, one that I have used way too many times. You'll see this use for CONCATENATE in a lot of demonstrations, for good reason, because a lot of people use it for this. What you can do is say equals, and, well, let me tell you what CONCATENATE does real quick. It joins two or more text strings into one string; it basically
joins things together and adds them together. So let's do CONCATENATE, and we're going to join this first and last name. Again, it's one that gets used all the time, but that's because it really is useful. You can select this and then say, now I want to include this, so we're concatenating this and this. Let's take a look. It says JimHalpert, but it's all connected, and that's typically not how people write their names. So what we can do is go back in here and do what my demonstration up here already tells us to do, which is add another argument in between. If we add two quotation marks, we can include anything inside them: a dash, an exclamation point, or just a space. Let's include a space really quick, and just like that it works perfectly, and now we have the full name. Something else that you could use it for is generating an email address. This is something that you absolutely could do, and it's pretty simple. I'm going to put a dot in the middle, and then at the end I'm going to add a comma and, in quotation marks, @gmail.com, and now I've created emails for all of these people. It's just something that you can do with this, and something that it absolutely is used for. You'll see that demonstration almost everywhere, because honestly it gets used a lot by data analysts, so it's a good one to know, understanding how that concatenation works. Let's go over to the next one: we are going to do SUBSTITUTE. SUBSTITUTE is really interesting, and there are different ways you can use it. I'm going to show it to you on these dates real quick. Changing a date format, changing what it's supposed to look like, is absolutely something that happens all the time, and sometimes you'll even get it like this, where it's messy and it's in a different, I
guess, format: all the other ones have slashes, where these ones have dashes. Let me go with the no-instance version first, because that one makes the most sense. We'll type =SUBSTITUTE. The tooltip says SUBSTITUTE replaces existing text with new text in a text string. If we do an open parenthesis, it says we have the text, the old text, the new text, and then the instance number, meaning which occurrence we're looking at; I'll explain that in a little bit. The text we're looking at is this one right here, so let's take this range. The old text is this dash, so let's take the dash. And what do we want to replace it with? This slash right here. I think it's called a forward slash, isn't it? Am I crazy? We're not going to put an instance. Notice that it's in brackets, which means it's optional, so we'll leave it out. And what it does is fix this, so this one is now in the format we want. That's fantastic; that's what we were trying to accomplish given what we had.

Now, if we want to do the exact same thing in the other direction, we can do SUBSTITUTE, open parenthesis, give the range, and let's say we want to change all of them to a different format. So instead of the forward slash (I'm going to keep calling it that, if that's correct) we want a dash. Then we close that, and now all of them are in this new format. So it's able to substitute a specific value for a new value, and if you don't include an instance, it does it to every single occurrence in there.

So let's go over here and actually use the instance_num, and I'll show you what
that does. Really quick, we'll do the exact same thing we just did: we'll take the forward slash and replace it with the dash again, but only on the first instance of that forward slash. As you can see, the ones that were replaced are all the very first instance, whereas the second instance, the second time it appears in the string, doesn't get touched. If we take this, put it right over here, and change it to two, it's the opposite: the first one wasn't touched and the second one was. So we're choosing which instance, which time it shows up in that string, gets replaced. If you don't choose an instance, it replaces all of them. This can be super useful if you want to do a bulk replace, but only on a specific column, and you just want to use a quick formula. So that's how you do it with the first instance, the second instance, and no instance at all.

Let's go over to SUM. This is one I think everyone knows how to use, but I want to show you two other ones as well. So we're going to do =SUM. If you don't know what this is, it just adds up all the numbers in a range; sum means add. We're going to take this range, and it gives us all these salaries added together. Super simple. SUM is probably one of the most basic formulas you can use.

SUMIF is a little bit different. You can add an if statement, which we learned right back here, so that it only adds a value if it meets a certain criteria. We're going to do =SUMIF, and you need to give a range and criteria, and you can include a sum_range if you'd like. So we're going to do the salary again, we're going
to do a comma, and now here's our criteria: let's do greater than 50,000 for their salary, and close our parenthesis. Now it only adds a salary if it's greater than 50,000. His is 50,000 exactly, so that won't count, but we have 63,000 and 65,000, which does equal 128,000. So it applies a specific criteria, an if statement, and then does the addition. Super useful. That is how you do a SUMIF.

SUMIFS is kind of the same idea as what we did back here with IF and IFS: SUMIFS is for when it has to meet multiple conditions. Let's take a look at that one, so =SUMIFS. The syntax for this one is a little bit different, as you'll see in just a second. The tooltip says it adds the cells specified by a given set of conditions or criteria. So let's do an open parenthesis. First we give the sum range, so let's use the same one as before. This is the area that's going to be added up after all the if statements are done, so we have to set that first. Then we have our criteria range: what criteria are we basing this off of? Let's put a comma and base it off of the gender, then a comma, and the criteria is Female. Then we'll give another one: we can say they're female and, let's say, they are greater than 30. We'll close that up, and it gives us 88,000. So, female, female: there are two right here, this one and this one, and together they equal 88,000. That's how that works; you're able to incorporate several different conditions into the sum formula. Again, I know SUM is super simple, but you can use it in a much more complex way with SUMIF and SUMIFS.

It's almost the exact same thing for COUNT. I'm not going to go super in depth into this one. I'll just kind of show you, because
count and sum are kind of on the same level of difficulty; they're both pretty beginner. COUNT just gives you a count of how many cells are there. So let's give it this range. It's not going to add anything, it's just going to give us a count. If we highlight these cells, the count down here is nine, and that's the count it gives us.

But we can do a count with conditions, exactly how we did it with the sum. With COUNTIF, we give a range and a criteria, the exact same as before. You can do this on basically any of these columns; for this demonstration it doesn't really matter. We'll say their salary is greater than 45,000, so this gives us how many people have a salary over 45,000, and that's five. Before, with the SUMIF at 50,000, it added everything together; the count just counts the number of cells that meet the criteria.

And again with COUNTIFS, we have a criteria range and then we specify the conditions that have to be met in order to count those cells. It can be any range; we'll do the ID this time. For criteria one, we want their ID to be greater than 1005, and let's say we also want them to be male. So they have an ID over a certain value and they are male. There are only three people who meet that criteria: Michael, Stanley, and Kevin. So it gives us a count. It's very useful for getting quick numbers like this, something I genuinely use a lot. I know I've said that a lot during this tutorial, but that's because everything I'm showing you is something I've used a lot, so I don't feel like
I'm speaking out of turn here.

Let's look at this one, DAYS. This one has some specific use cases. Notice that this column is text right now. If you do it when it's in a date format, it actually will not work. You can test it out yourself, but you've just got to trust me, it's not going to work. What this does is give you the number of days from this date to this date. So let's do =DAYS. First we choose our end date. It's kind of backward from what you'd think: it's end date and then start date, where you'd expect start date and then end date. So you start with this one, then choose the start date, and it tells us how many days it was from here to here. For this one it's 5,56.

NETWORKDAYS is extremely similar, except it takes out weekends, and holidays too if you list them. So you can see how many working days this person has actually worked between their start date and their end date, not including weekends and holidays. Let's do =NETWORKDAYS. We need our start date and our end date, and you can specify holidays if you'd like. Weekends are taken out automatically, but holidays are only taken out if you give it a list of them in that last argument. We're going to do the start date first (this one's different: it says start date, then end date), and then we give the end date. If you notice, the numbers are different. This one is dramatically lower because it's taking out the weekends. So this is how many calendar days they've worked, and this is how many days they've actually been in the office and worked.

And that is it. There are so many formulas, literally hundreds, that are out there for you to try out yourself. If there are specific ones that I did not cover in this video, please, please put it in
the comments below so that I can show you how to do these things. I will say I've probably used a majority of the ones you're going to put in the comments already, and if I haven't used one, I'll take a look at it, see if it's really useful, and show you that. So thank you guys so much for watching. I hope this has been helpful. A lot of these are not things I learned before I started; almost all of them are ones I learned while I was on the job. So I'm hoping you can get ahead of the curve and learn these things before you actually start, so that when you get in there you're just killing it with the formulas and people are like, "Whoa, this guy knows what he's doing in Excel. Give him all the Excel work." And then you become the Excel guy, and everyone loves you for it. With that being said, thank you so much for watching. I really do hope this helped. If you liked this video, be sure to like and subscribe below. I'll see you in the next video.

What's going on, everybody? Welcome back to another video. In this Excel tutorial we'll be looking at XLOOKUP. If you don't already know what XLOOKUP is, it's a new feature in Excel meant to replace VLOOKUP, or at least, in my mind, to be a much better option than VLOOKUP. So if you've used VLOOKUP a lot and you're trying to learn this new option, or if you've never used it before, this video will be super helpful, because I'll walk you through the options and what XLOOKUP can do, as well as the difference between XLOOKUP and VLOOKUP. But before we get into the tutorial, I want to give a huge shout-out to today's sponsor, and that is Udemy. Udemy is the go-to place if you want a full-fledged course in Excel. I have three courses that I have taken on Udemy, so I'd highly recommend checking those out. They are having a huge
sale on all their courses during this time, so if you're in the market for a course, I highly recommend checking out Udemy and getting one there. Now, without further ado, let's jump onto my screen and start the tutorial.

All right, let's get me off the screen, because we all know why we're here. I didn't include this in the formulas video last week because I knew this was going to be a large one, and a lot of people are going to want to know how to do this and what the difference between VLOOKUP and XLOOKUP is. So it has its own dedicated video. Let's get started.

It is a formula, so in this cell we're going to hit equals and start typing XLOOKUP. I'm going to hit Tab in just a second, but let's read what this says: it searches a range or an array for a match and returns the corresponding item from a second range or array, and by default an exact match is used. That's really useful to know, and we'll talk more about it in just a second. Let's hit Tab, and it completes the name and tells us what our inputs need to be. We have our lookup value, our lookup array, and our return array, and then some optional ones: if not found, which is the output it gives us if your value isn't found, a match mode, and a search mode. I'm going to show you how to use every single one of these. As you can see at the very bottom, I've already set up all of the instructional content for this video, so we'll get through all these different scenarios.

Let's start really quickly with how to use it very simply, with the lookup value, lookup array, and return array. We're going to come in here and give it our lookup value. Toby Flenderson, right over here in A3, is going to be our lookup value, so that's who we're searching for. Now we're
going to hit comma, and now we need to input our lookup array. An array is basically just a range. This is where it searches for that value, where it searches for A3. Here's Toby Flenderson, and here's Toby Flenderson, so it will find him in this array right here. Then we hit comma, and now we need to give it the return array: what it's going to return from that row when it finds the match. We're going to return his email to keep it really simple. Let's close the parenthesis. What it should do is take Toby Flenderson, search in this column, this array, and return the email when it finds him. Toby Flenderson is on row six, so it should find him, come over here, and return his dundermifflincorporate.com email. Let's see what it actually does. I hit Enter, and it returns it. If we drag it down like this, it applies it to all of these names, and it works exactly how it's supposed to.

Again, if you have never used VLOOKUP, you don't know how good you have it. VLOOKUP was extremely useful, just a bit complicated, and I'll talk about that near the end of the video when we compare VLOOKUP to XLOOKUP. But if you're using XLOOKUP for the first time and you're just getting into Excel, you guys have it good.

Now let's go over here to XLOOKUP multiple rows, because you can return more than one output with XLOOKUP. Let's go right in here and write basically the exact same thing as before. So let's write XLOOKUP, use Toby Flenderson as our value, and search here. Then we're going to do something a little bit different this time: we want to include the end date and the email. So what we're going to do is we're going to
start here, go all the way down to the bottom of End Date, and also include the email. When we do that, the output gives us a column for end date and a column for email, so an output for both. Let's hit Enter, and now we can see the end date here and the email here.

First off, I love that you can do this; that's fantastic. The one thing I'm not a huge fan of is that the columns have to be right next to each other, so you only get the output exactly how it's laid out in the sheet. For example, let's take this and put it right here. If instead of O2 to P10 I included Age through Email, this whole range, and hit Enter, everything in between is included. That's one of the small downsides of returning multiple columns: it uses them exactly as they are. You can't really customize it within the formula; you'd have to move the columns around in the sheet to get them how you want. That's something to note. And again, you can pull this down and it'll be applied to all of those names.

Let's go over to XLOOKUP exact match. We're going to do =XLOOKUP as we've been doing, and on this tab we're going to look at both the if-not-found and the match mode. So let's do what we've been doing: we take the value we're looking up, we take the array we're looking in, and we return the email. As you can see, this says Toby Flender and not Toby Flenderson. So we hit comma, and if the value isn't found, you can return whatever value or string you want. For simple instructional purposes we're going to type "Not found" and then close that off. So let's do this
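As a rough illustration of what the exact-match lookup with an if-not-found value is doing, here is a minimal Python sketch. The function name `xlookup_exact`, the three-row table, and the example.com addresses are made up for the illustration and are not the workbook from the video. Excel's XLOOKUP ignores letter case when matching text, which the sketch imitates with `casefold()`.

```python
def xlookup_exact(lookup_value, lookup_array, return_array, if_not_found="#N/A"):
    """Return the item from return_array at the first position where
    lookup_array matches lookup_value; otherwise return if_not_found."""
    target = str(lookup_value).casefold()
    for key, result in zip(lookup_array, return_array):
        if str(key).casefold() == target:
            return result
    return if_not_found


# Hypothetical stand-ins for the Full Name and Email columns.
full_names = ["Jim Halpert", "Pam Beasley", "Toby Flenderson"]
emails = [
    "jim.halpert@example.com",
    "pam.beasley@example.com",
    "toby.flenderson@example.com",
]

found = xlookup_exact("Toby Flenderson", full_names, emails, "Not found")
missing = xlookup_exact("Toby Flender", full_names, emails, "Not found")

print(found)    # the address on Toby Flenderson's row
print(missing)  # falls through to the if-not-found text
```

An exact match compares the whole cell, so a partial name such as "Toby Flender" ends up at the fallback text.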
and Toby Flender was not found, so it returned "Not found". If Toby Flender were actually in this Full Name column, it would have returned the email, and if one of the other names weren't in there, we would have gotten "Not found" for that one.

All right, let's go right up here. We're just going to copy this, because I want to reuse it, then go right here and hit a comma. Now this is our match mode option, and we have four options to choose from. Zero is an exact match, and that's what we get by default. Then there's minus one, which is an exact match or the next smaller item. Then there's one, which is an exact match or the next larger item. And then there's two, which is a wildcard character match. We're going to pick that one and try it out, and it's not going to work, and not just because I forgot to put A4. It's searching for Beasley, but if there's no wildcard already in the lookup value, it doesn't treat it as one. So we need to indicate where that wildcard goes. We're going to open quotation marks, put an asterisk, close the quotation marks, and then put an ampersand. What that says is that anything that comes before A4, anything that comes before Beasley, is okay. It doesn't matter what it is, as long as it has Beasley at the end. So we have Pam coming before Beasley, and when we hit Enter it now returns the output we're looking for.

We can do that on these as well. This one is Meredith, and Meredith is at the beginning: we have Meredith Palmer. So we can take this, and we're going to put this at
the end, put the ampersand right here, and now it'll work. And it's the exact same thing for Kevin Malo right here, for Kevin Malone. It just doesn't include the "ne" at the end, so it still works if we put that asterisk at the end.

Now, I know I said we were looking at search order, but I'm going to give you one more exact match example first, and then search order, because it's easier to show over here. So I'm going to do XLOOKUP, look up this value, comma, and here's the range, the start dates it's going to look in, and I want to return the full name. No value in here is 1/1/2000, but what we can do is a comma, then another comma to get to the match mode, and choose exact match or next larger. I know this belongs in the exact match part, but it relates to search order a little bit, because it searches for the next largest value; that's what the number one represents. So we have 1/1/2000, and if we look right here, the next value above 1/1/2000 is 1/5/2000, so it should return Angela Martin. Let's see if that works, and there it is.

Now let's look at the actual search order. Let's do =XLOOKUP, this is the value we want to search for, we're looking in Start Date, comma, and we want to return the name. Now let's get over to search mode. By default it performs a search starting at the first item, so from the very top going down, first to last. But you can reverse that and search from last to first, or you can do a binary search, which relies on the data being sorted in ascending or descending order. We won't be able to show the binary search, ascending or descending, because our values are the same, but if we had different values and were using that next larger match, we would be able to show it. I'm going to show you the search from
first to last and last to first. Let's put in the default first, which is search first to last. It starts at the very top, goes down, finds the first 5/6/2001, and returns Toby Flenderson. Now if we go in here and put minus one, it searches from last to first. It starts at the bottom and goes to the top, and the first one it finds is Michael Scott; that's the first match starting from the bottom. The exact match and the search order can be combined, too. In this one right here we're using exact match or next larger, and you can use that with the binary search as well.

All right, now let's head over to XLOOKUP horizontal. We only have a few left: XLOOKUP horizontal, then XLOOKUP with SUM, and then I'm going to show you VLOOKUP at the end. So let's go right here and say =XLOOKUP. The value we're searching for is February, then comma. Where do we want to search to find February? In these calendar months. Then we hit another comma, and now we want to return the Paper row. So let's select Paper and hit Enter, and it found February and returned paper right here. We can do that for paper, printer, and manila folders, and it gives us the 310, the 40, and the 118 from February.

Now let's go right over here to XLOOKUP with SUM. It's basically a carbon copy of this, so let's take this over here real quick and place it right there. It's the exact same thing, except I'm going to show you how to use SUM with XLOOKUP at the same time. We're going to use the SUM formula, and within the SUM our first number is going to be an XLOOKUP, and then our next value is also going to be an XLOOKUP.
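The horizontal lookup, and the SUM of two lookups that this part sets up, can be sketched in Python like this. It is only an approximation of the idea: `xlookup` is a hypothetical helper, the January figure is a placeholder, and only the February (310) and March (150) paper values come from the video.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found=None):
    """Exact-match lookup: find lookup_value in lookup_array and return the
    item at the same position in return_array."""
    for key, result in zip(lookup_array, return_array):
        if key == lookup_value:
            return result
    return if_not_found


# One header row of months and one row of values, laid out left to right.
months = ["January", "February", "March"]
paper = [200, 310, 150]  # 200 is a made-up placeholder for January

feb_paper = xlookup("February", months, paper)

# Roughly what =SUM(XLOOKUP(...), XLOOKUP(...)) does: two lookups, then add.
feb_plus_mar = sum([
    xlookup("February", months, paper),
    xlookup("March", months, paper),
])

print(feb_paper)     # 310
print(feb_plus_mar)  # 460
```

Because the lookup and return ranges are just two sequences of the same length, the same function covers rows and columns, which is the point of the horizontal example.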
So let's do XLOOKUP, and now we're going to search for our very first lookup value, so we go to I1. Then we search this range again, and we want whatever value goes with that. Let's close that parenthesis. Now we put a comma and another XLOOKUP, and this time let's do March. So we're searching for March, we give the range where we're searching for March, and we want the paper as well. Let's close that, and then we also need to close the SUM's parenthesis. So now we're basically adding this February and this March. It's 310 plus 150, so it should be 460. Let's see if that's our output, and it is. You can do this with a lot of things, not just SUM. You can use XLOOKUP inside other formulas: if you're looking up one specific value and another specific value, you can add those together using XLOOKUP, which honestly is pretty great.

So let's go over to VLOOKUP. I wanted to show you where XLOOKUP came from and what we used to do, unless you're still using VLOOKUP, versus what we can do now. I just showed you pretty much everything in XLOOKUP, so super quickly I'm going to show you how VLOOKUP works, so you can understand how it used to be done and how XLOOKUP is used now. Let's go in here and type =VLOOKUP. We have a lookup value, so we click this and hit comma, just like before. Now we need a table array, and the table array is a little different in that you're searching an entire area. So let's do H2 all the way through O10; that's our table array. Then we put a comma, and now we need a column index number: which number are we going to be
searching for, which column are we returning from in here? We want eight, because this is one, two, three, four, five, six, seven, eight. We want to return that email, and we're searching for the name right here in this very first column. So we have that comma and we put eight. Then for the range lookup you can do TRUE, which is an approximate match, or FALSE, which is an exact match, and we'll do FALSE. I don't know why it's not autocompleting, but there we go. Now we run it, and it returns the email just as we had it.

What some people didn't like, and the reason they created XLOOKUP, is that you had to use those ranges. Say we added another column, which happens to data all the time; now it gives completely different data. Let's say for whatever reason we added Address, so now we have these people's addresses. Well, now it gives us a different value: it returns the end dates, because now the eighth column is End Date and the ninth is Email. So if you have a VLOOKUP that you use for a calculation or a table you've created, you then have to go through and manually change it. A lot of people didn't like that, because if you needed to change something or add a column, you'd have to go back and fix all of your VLOOKUPs. They wouldn't automatically move with it, which is what happens with XLOOKUP.

Just to prove this, let's go back to the very first one, the XLOOKUP, where right now the email is looking at O2 through O10. We're going to insert right here, and that will be our new column; we'll call it Address. Notice that the result hasn't changed. Why is that? Because it automatically changed for us to P2 through P10, understanding that when something was inserted here
it wanted to stick with the original data, the original array that was selected. So XLOOKUP does that work for you, and it makes it a little bit easier to automate things and create these processes in Excel without having to go fix them later, which you had to do with VLOOKUP. So that is it for today. I hope you know how to use XLOOKUP a little bit better now that you've watched this. If you enjoyed this video, be sure to like and subscribe below, and I will see you in the next video.

What's going on, everybody? Welcome back to another Excel tutorial. Today we'll be looking at conditional formatting. If you've never heard of conditional formatting before, that's okay; I had never heard of it before I became a data analyst. Now that I've been using Excel a lot, I use it quite a bit, and I want to show you how to use it. Conditional formatting is basically just a way to see patterns and trends in data. That's a super simple way of putting it, but it's very easy to use, so hopefully I can show you the things I use the most and what I use them for, so that you can also know how to use conditional formatting.

Before we jump into the tutorial, I want to give a huge shout-out to the sponsor of this Excel series, and that is Udemy. You guys know by now that I absolutely love Udemy. I've been using them for years, I've taken literally hundreds of courses on Udemy, and I've learned so much, especially when I was first starting out as a data analyst and learned a lot through their Excel courses. I've put the ones that I've taken, enjoyed, and think you would enjoy as well in the description, so be sure to check those out. Again, huge shout-out to Udemy for sponsoring the series. Now, without further ado, let's jump onto my screen and get started with the tutorial.

All right, so let's jump right into it. On this
Home tab right here, if we go all the way over to the right, there is Conditional Formatting. The description it gives us is: easily spot trends and patterns in your data using bars, colors, and icons to visually highlight important values. That is exactly how I would have defined it. Really good job, Microsoft. What you'll see right away is that there's nothing too complex. We have Highlight Cells Rules, Top/Bottom Rules, Data Bars, Color Scales, and Icon Sets, and then at the bottom we can create a rule, clear rules, and manage rules, so if you create a rule you can then manage it. We're going to start with the icon sets and work our way to the top, and then I'll show you how to create some rules yourself and how that all works.

So let's start off with the icon sets. I'm going to go over here to Sales. In this data there's a trend or pattern you can see over time, over the months. So let's go right here, go to Conditional Formatting, then Icon Sets, and use these directional ones. We have this kind of time series, where each month shows how much paper they're selling, and if we apply this, it shows us whether each value is about average, below average, or above average. At a really quick glance you can see the pattern in this data set: it's mostly yellow and red, and there are only two months where it's up significantly.

We don't have to do that for only one row or one column; you can apply it to all of them. But as you can see, all of these are red. Why are they all red? Because it's using all of the numbers together, so it's comparing these 24s, 50s, and 65s against these 450s and 750s, and they all come out red. But if we do it individually, each row on its own, if we take it just
like this and then go to Icon Sets and apply it, it's going to be much more representative of the actual printers, not of all the numbers as a whole. And you can do other things. The arrows are the ones you'll probably see most often, and they're what I've used if I ever do use icon sets. But you can use ones like this, with a trend upward or a trend downward, so there are several more arrows. This one only gives you three, and as you can see, this one gives you five. You can also do colors or shapes or other indicators, and honestly it's whatever makes sense for your data. I've really only ever seen these colors being used; I've never really seen the flags or anything like that. But again, it depends on what industry you work in; you might see them.

Let's go right over here to Demographics and look at our color scales. Color scales and data bars are probably the most obvious things in here. If you look at this color scale, the values among the highest are green and the lowest are red, and you can change that to any of the colors they offer. It does exactly what it says: it's a color scale, a gradient of colors from high to low or low to high. So with any color you pick, you can see what's good and what's not good. That really is color scales in a nutshell.

Data bars are, again, super straightforward. It's going to be either a gradient fill or a solid fill, so let's look at the gradient fill. Actually, let's get rid of our color scale first. Let's go over here to Clear Rules from Selected Cells. We haven't looked at that yet, but that's how you clear it. Now let's go to Data Bars, and we'll use this blue gradient. So
with this blue gradient this one is the highest one so it's going to be completely filled and this one is 36,000 which is pretty close to half of that and so its bar is almost half filled this one again is not used very often you don't see these a lot to be honest you just don't but if you do see it that's how you use it that's how it can be done again pretty easy as I just showed a second ago if you want to clear the rules you can clear from the selected cells that's what we're doing so I have column G selected and I'm going to clear that if you want to clear the rules for the entire sheet you can do that as well so it would affect every single column and row we'll just do this for now so now let's go look at the top bottom rules so this is the top 10 items top 10% bottom 10 items bottom 10% above average and below average and they're going to do exactly what you think they are going to do if you select above average it is going to highlight the cells that are above the average in column G so let's look at the salaries that are above average all right and so the ones that are at the very top are Michael Scott Toby Flenderson and Dwight Schrute no shock there I believe the average is somewhere around like 48,500 or something so I think this one is just below it and so all these other ones are below average and that's just because Michael Scott and Dwight Schrute and Toby are bringing up that average quite a bit so everyone else is going to fall beneath that so at a super quick glance you're able to just highlight the cells and you're able to see who is above average and you can do this in a lot of different ways in Excel but this is just a really simple fast way to do that let's get rid of that real quick and let's go back up here to top bottom rules and now we can see the below average and it's going to highlight all the other ones
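To make the above average and below average rules concrete, here is a minimal Python sketch of the same comparison. It is only an illustration: the employee labels and salaries are hypothetical (they are not the workbook's values), and it assumes the rule uses a strict greater-than or less-than comparison against the mean of the selected cells.

```python
from statistics import mean

# Hypothetical salaries for illustration only; not the values from the video's workbook.
salaries = {
    "emp_a": 65000,
    "emp_b": 36000,
    "emp_c": 63000,
    "emp_d": 47000,
    "emp_e": 42000,
}


def split_by_average(values):
    """Return (average, above, below): keys strictly above / strictly below the mean."""
    avg = mean(values.values())
    above = [name for name, v in values.items() if v > avg]
    below = [name for name, v in values.items() if v < avg]
    return avg, above, below


avg, above, below = split_by_average(salaries)
print("average:", avg)
print("above average:", above)
print("below average:", below)
```

A couple of large values pull the mean up, so most entries land in the below-average group, which is the same effect described for the salary column.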
and so it works exactly how you think it is going to work and this is the default way that it highlights these cells so it highlights them this kind of see-through red and then it makes the actual text or the characters in there red as well now I'm not going to go through and show you every single one of these top bottom rules I think they're pretty self-explanatory I just wanted to show you what happens when you do use one of them it's going to highlight that cell so let's go up here to the highlight cells rules and honestly these are the ones that I use by far the most all these other ones combined I do not use more than these highlight cells rules and the one in here that I use more than any other conditional formatting rule is this duplicate values so I'll start with that really quick and I'll show you a few of these other ones but this duplicate values to me is one of the most useful ones and so let me show you how that works if we go to the start date you can see that we have a duplicate value right here and if we go over here to conditional formatting highlight cells rules and duplicate values it is going to highlight the duplicate and that says duplicate right here now we can go through here and click on unique and then it would highlight all the ones that are not duplicates so you can use it in the inverse way but I use the duplicate almost always another thing that you can do is go over here and you can change the color or you can even do a custom format which I never do it's not something I spend a lot of time doing I typically just stick with this one so you can do that and it's going to highlight something that has a duplicate value in there now why do I use this so much well I work with a lot of different types of data sets but one thing that you'll find in almost all of them is they have some type of ID and
they're going to have some type of personal information whether that's a social security number or an address or a cell phone number or something like that there is going to be data that is going to identify that person now I work a lot with pharmaceutical data a lot with pharmacy data as well as healthcare data so like names social security numbers addresses phone numbers all those things all that customer or client information and oftentimes when I get a new data set and I have it in Excel or I convert it to Excel I will start using these duplicates to try to find issues with the data and I find them all the time either there's an employee ID or some type of customer ID or client ID that has a duplicate in there that should not be in there or there are multiple social security numbers or there's an issue in some other way and I'm able to find those things and spot those patterns using this duplicates rule and I promise you I use this one almost every single time I open a new data set or I work with a new client's data and so I wanted to show you this one I wanted to really impress upon you that this one is a really really good one to know and learn how to use it's not complicated it's not hard it just shows you if there's a duplicate value but I wanted you to know how I use it and how often I use it so that you can pick that up and put that in your tool kit in your back pocket so that you can use that later on if you have a similar need or if you're trying to do something similar to what I was just talking about so that is how duplicates work again super great it's obviously not super useful when you're only using 10 rows but when you have 50,000 or 100,000 rows and there should be zero duplicates in there and you highlight it and then you come right here use the filter and we're going to filter and we're going to sort by the color and it allows you to sort by the color
and you have duplicates in there then that's a problem and you identified a problem super quickly and some of those things slip by because nobody checks it and so that's something that I often check and if you go here and you sort by color and there isn't an option to do this pink red color then that means there aren't any duplicates and that's a really good thing most of the time that's a really good thing so let's go ahead and we're going to clear that as well as get rid of our conditional formatting rules now another one that I use a lot is this one right here which is the text that contains honestly this one comes in handy a lot especially when you're looking for a specific keyword in my case a lot of times I was using this when I was going through drug names I am not a doctor I do not pretend to be a doctor and so when I was looking for lorazepam or something like that I would just search for something like loraz not Lorax but loraz I would just search for it and then all the ones that contain that would pop up I can bring them to the top and I can see them and to me that's super useful and I would do that all the time and so in this case we're looking at emails and let's say we only wanted to pull the ones that are Gmail and so now we can go through and we can click okay and that's going to pop up or we want all the ones that have Dunder Mifflin and if we click on that all the ones that have Dunder Mifflin in them come up and again we can sort by color right here and we can bring all those to the top and so super useful and another use for it that you may not think of is when there's some incorrect data in there this happens often with phone numbers addresses start dates or dates in general date formats where you can go in here and you can say text that contains and if
you put in a dash and a cell has it in there then you know that that is wrong now that is really all I wanted to show you in the highlight cells rules the duplicate values and the text that contains are by far the ones that I use the most all the other ones I have used but not so much in these highlight cells rules I use these two all the time sometimes I use this between I don't really use these other ones as much although I have used them and so if you got nothing else from this video I just wanted you to know that these two are super useful and if you haven't used them before to maybe try them out and see if you can apply them to your own data sets now we've looked at all of these preset ones in conditional formatting but you can also do a new rule and so if we click on new rule right here and we go down to use a formula to determine which cells to format we can add our own formula in here that will then highlight exactly what we want and so if there isn't a preset rule that you like and it doesn't have the option that you want you can use almost any formula that you want from our formulas video that we did a few weeks ago and you can put it in here and then you can format what you want the cell to look like if it meets that criteria so let's take this right over here and before we start this formula I just want you to note that I have H11 highlighted that's going to come into play in just a little bit but I want you to be aware that H11 is the cell that we have highlighted so what we're going to do is we are going to create our formula now if you've never created a formula I highly recommend watching my formulas tutorial because that is going to show you how to do this but all we're going to do is we're going to do equals that's how you start a formula and we're going to give it this range right here and so it's going to take everything from G2 to G10 now these dollar
signs are super important if you don't know how to use them or you don't know what they do you're going to mess up this formula a lot and so what this dollar sign basically does is it's hardcoding it in there it is only going to look at G2 through G10 because of that colon and this can come into play because if you have something selected like the H11 it's going to mess it up because now if you have H11 selected like we do you'll see this in a second it's not going to be applied to this and again I'll show you that in just a minute but we don't want this hardcoded in there okay but we do have to select the proper range in a second so we're going to get rid of the dollar signs because we want it to be pretty fluid and able to be applied basically anywhere we want let's go into this formula if it meets our criteria let's give it a border and we'll give it some color we're going to say if this is greater than 50,000 so let's hit okay and nothing happened so let's go back and see why so if we go to our manage rules you can see that the formula is G2 to G10 is greater than 50,000 but it is only being applied to this H11 cell which really makes no sense so if we had wanted to get it done the first time we needed to have selected that G2 to G10 right away but we can do that now so let's get rid of this and we're going to say G2 to G10 and that is hardcoded in there that should be fine still but let's see what it does and so now every single thing is highlighted and why is that that's because when we changed it it also changed the format of it because we changed the cell that we were looking at so we need to come back here and that's why again you want to do this the right way the first time we're going to come back here we're going to give it this range and we're going to get rid of these dollar signs and now
we're going to hit okay and so now it's being applied G2 to G10 and G2 to G10 and we'll keep it like that and we'll apply it and now it works properly so now everything that's above 50,000 is being highlighted again if that was confusing it is confusing it genuinely is and so if you wanted to do this right the first time without having to make a bunch of changes you'd want to highlight these cells before you start and then you want to go in and create the rule we'll do this really quick just to show you what I'm talking about we'll say equals we'll give it this range get rid of these dollar signs real quick because again I don't want this hardcoded in there it will ruin our formula and then we'll say greater than 30 and we'll give this a nice green and so now if they're over the age of 30 it will be highlighted and we didn't have to go back and change anything we didn't have to go back and fix anything like we did in the first one that was all for demonstration purposes but again you need to really be aware of that that is something that I think almost everybody's going to mess up at some point if you don't already know about it then you definitely are going to make that mistake now if we come over here in this area we go to our manage rules and not just the current selection but this whole worksheet then you can see that we have these two formulas now you can go in and edit any of these by double clicking or clicking on it and then hitting edit rule you can also delete these rules or duplicate these rules I just wanted to show you what you are able to do with them but if we go ahead and we get rid of this so let's say we delete that rule and we hit apply the rule is going to go away it's as simple as that so that is how you can create your own rule I want to be again very specific in the fact that that is a confusing piece and if you mess that up you're going to be fixing a bunch of different stuff and
not understanding why your rule is not working properly it's just because it's confusing those dollar signs are really important to watch out for and that is all there is to it with conditional formatting again conditional formatting is not anything super confusing we've looked at more complicated things but it's a really useful tool to use to look at these patterns and trends super quickly and to find these outliers or these specific values that you're looking for very quickly and if you're looking at thousands or tens of thousands or hundreds of thousands of rows this is one of the fastest ways to find these things without having to wait and filter and use these filters right here because again this can just take forever and so if you've never worked with a ton of data and tried to use this before it can take honestly like 10 minutes for something simple that you could do with conditional formatting in like 10 seconds so definitely something to mess with and use when you are working with your own data sets I hope this was helpful I mean honestly I use this all the time so I hope that somebody out there can use this for their own work thank you guys so much for watching I really appreciate it again huge shout out to Udemy for sponsoring this Excel series if you like this video be sure to like and subscribe below I'll see you in the next video what's going on everybody welcome back to another Excel tutorial today we will be looking at charts now if you have data in Excel and you want to visually show that with bars or graphs or anything like that you can do that really simply and I'm going to show you how to do that today and a lot of people are a little bit intimidated because they think it's a little bit complicated but I promise you by the end of this video you will know how to do it like a pro it's not that
difficult it's just you need to know where to look where to click and how to actually filter through things to make sure that you're visually showing the things that you want to show but before we actually jump into the tutorial I want to give a huge shout out to the sponsor of this Excel series and that is Udemy you may not know this but I probably get at least 15 to 50 companies every single month reaching out to me wanting to sponsor the channel and promote their product and I turn down almost every single one because I either don't know their product or I don't believe in their product and so I'm not going to go and promote that on my channel but Udemy is one that I have consistently promoted over the past year and that's because I truly believe in their product I've been taking courses off their platform for years and I've honestly learned so much and I cannot recommend them enough so if you want to take a full-fledged Excel course I have my recommendations in the description if you want to check those out thank you again to Udemy for sponsoring this Excel series so without further ado let's jump onto my screen and get started with the tutorial all right so let's jump right into it right here we have the Dunder Mifflin sales report and over here we have all the products that they were selling along with the months that they were sold in and so in January they sold 450 reams of paper down here we have the total items per month and so in January they sold 898 units of products at the very end we have the year-end total so this is the total amount of paper that they sold throughout the year now we're going to use this data right here for all of our charts now you may not have data exactly like this it can come in lots of different flavors but you're going to get the basic gist of how to use charts how to edit them how to customize them to fit what you need and then we're going to put it right over here and
kind of create its own sheet where we can visualize all the things that we want to show so let's jump right back over here into sales and the first thing we need to do is highlight the data that we're going to be working with now I'm going to start with everything but I'll show you along the way we don't actually want everything but we can filter that stuff out as we go so let's go right here and we're going to insert and we're going to go over to charts now this is the chart section there are lots of different types of charts but the first thing that we're going to be looking at is right here this is a 2D column or kind of like a bar chart and we're just going to click right here and we're going to pull this down so now that we have this down here there are a few things that I want to show you before we actually really get into it I want to show you the options that you have so if you go up here we have different chart styles and so if I hover over them you can see that each one looks a little bit different and it really doesn't matter it doesn't change the data in any way just how you visualize it and so if that is important if you want to stick with a certain theme or a certain look then go for that the other thing that's really nice to have over here is this switch row and column so right down here you can see this purple and you can see this red those are our rows and columns and we can switch that right here so if we go like this now instead of the months being right here the months are the colors and the actual product is right here let's click it again and it'll go back and so now we have this kind of time series now we have January through the year-end total now this one is one that I think is super helpful you can do it down here as well if you go to this filter but both of these are super helpful because you sometimes just want to select all the data
and then kind of get in there and mess with it something that we want to get rid of is this total items per month so we want to remove that and then we also want to remove this year-end total because both of those are the end result they're not the actual data per month or per product so we're going to get rid of those and we're going to apply that and as you can see just right off the bat our chart has changed dramatically and that's because we aren't including these large numbers that were throwing off the visualization for us so this one right here as is is already pretty good what we can do right here is we can change this title and we're just going to say products sold per month now what we can do if we want to move it to another sheet is we can actually move the chart and we can select where we want to move it we can move it to a chart sheet and we can do that or something that I do almost 99% of the time I just copy and I come over here and I'm going to paste it and so now we have this chart right over here as well as back here and so I typically tend to do that because now we can still go over here and change this one as much as we want so if we want to go in here we can alter this one and it won't affect the other one so we just have basically two copies so we're going to keep this one right here this is going to be our first visualization and as I said it's fairly straightforward if you've ever done any types of charts or graphs before right here it's January February March April May and if you hover over these you can see that that's the paper and if we just glance the paper is their biggest product by far and so that blue which is their paper is going to be the biggest every single month so that makes perfect sense now what if we want to change up the kind of visualization that it offers us well we have a lot of
different options let's go right over here to change chart type now this is going to offer you just about everything you could possibly imagine or want and even things that you absolutely would never ever want and so I'm going to show you some of the good ones and I'm going to show you some just absolutely insane ones that Excel came up with I could not imagine a scenario where these are ever used but within these columns these are called clustered columns and these stacked columns would look just like this those are often used as well and then we have ones that are just not used often let's take a look at this one right here I mean it's tough to look at but let's put it right here this is basically the same thing that we just had except visualized in a different we'll call it more unique way and for the sake of it let's put it over here these two things show the same information they show the same data just one is shown well and one is not shown well I'm not a fan of these 3D types of visualizations I just don't like them but maybe you do and you want to use that that's fantastic let's go back something else that you'll probably use a lot are things like these line graphs okay so these are line graphs and there are different types so there are these stacked 100% stacked lines with markers different flavors for this type of line graph and so you can go in here and take a look again not my favorite but they have it as an option if you so choose to do this but I'm kind of a simple guy so I'm going to go in here and it's pretty cluttered I want to take the ones that have the highest sales or the highest total amount sold so that would be paper manila folders and three ring binders so let's go in here we want to keep paper we want to keep manila folders and we want to keep three ring binders and let's apply that and so
now it's a lot cleaner and we're just going to copy this and we're going to put it over here and I'm just putting these all over here for you because we'll look at this at the end and see different options and ways to do things as we have gone through this tutorial so let's go back here now something else that we haven't looked at is the actual colors and color schemes that you can do so let's go right here to these chart styles and we can go to color now color is something that probably is quite overlooked in actual charts and graphs there are some terrible colors like this or this where they're really close together especially when you have a lot of them for example let's just pretend we put all of them back really quickly it is near impossible to distinguish these colors we wouldn't want that let's go back to this color when you have it in some of these colors it at least distinguishes them so you can see what you're working with but when you have it in these monochromatic options sometimes they're just impossible to distinguish so be sure to choose the right colors so that if somebody who's never seen this data before looks at it they can easily distinguish the product and the month that you are looking at but let's go just back up here we'll choose this default option well let's choose this one right here this one's nice although there's lots of yellows and oranges let's see this one this one's not bad greens blues and like yellows so that's nice other things that we want to look at are these chart elements right here other things that we can add are things like data labels and right here it's super messy but if we went back and we got rid of some of these things like the printer staples highlighters pens and total and we apply that it's a little bit easier to distinguish and that's something that you may be interested in doing
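The idea of charting only the best-selling products, rather than every series, can be sketched in a few lines of Python. This is just an illustration under stated assumptions: the product names echo the ones mentioned in the video, but the monthly numbers and the `top_products` helper are hypothetical.

```python
# Hypothetical monthly unit sales per product; illustration only, not the workbook's data.
sales = {
    "Paper": [450, 500, 480],
    "Manila Folders": [120, 110, 130],
    "Three-Ring Binders": [90, 95, 100],
    "Pens": [24, 30, 28],
    "Staples": [50, 45, 55],
}


def top_products(data, n):
    """Return the n product names with the largest totals, largest first."""
    return sorted(data, key=lambda name: sum(data[name]), reverse=True)[:n]


# Keep only the top three series, mirroring unticking the other products in the chart filter.
keep = top_products(sales, 3)
chart_data = {name: sales[name] for name in keep}
print(keep)
```

Fewer series means fewer colors and labels competing for attention, which is why the filtered chart is easier to read.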
you can also add this data table at the bottom which is the actual columns and rows that you have for this visualization right here now let's expand this quite a bit I'm going to make this extremely large if you have something like this it actually can be pretty nice maybe we get rid of these data labels but it can be easy because you're putting it all in one place you can also make this two separate visualizations so you can have one visualization just like this and right underneath it you can have the actual rows and columns but this option allows you to put it all in one so let's put this back down because that is way too big and wait let's expand it a little bit now if you notice right here we have our legend up top it is possible to actually change that you can go right here and you can move this wherever you want but it's not exactly easy to place based off how we have it right here if we go into this chart elements we go down to legend and we hit this little arrow right here we can put it on the right the top the left or the bottom or we can just go to more options which allows us to push it anywhere but let's say I want to do it just like this I'm going to put it on the right and I actually want to bring it down right here and that's just an option if you want to customize it a little further it makes it a little cleaner you can do that with almost any of these things so if you click on this you can move this anywhere as well so if you want to move this over here on top of it you can and make it look terrible or you can move it right back over here this is something that you can move around you just want to make sure you're doing it the right way so let's get this back there we go now before we go any further let's copy that and put it right over here with our other charts and graphs and if you see over here on this side we have this
format chart area notice I haven't shown you this at all yet that is because I genuinely just don't use this almost at all there is some good stuff in here and I'm sure that if you were someone who really wants to go in there and super customize it you can do that but I honestly just never get in here and I never change the glow or the shadows it's just not something I use and some of these are only for the 3D formatting which I never use and so I'm not going to walk through these things again I really don't use it and so if you want to go in there and mess with it by all means go for it it's just not something that I want to take the time to show you and with that being said let's go back over to this chart sheet that we have and it was super easy to get these charts and graphs there are lots of different options again if we go back here and we go up here to chart design and go to the change chart type again there are a ton of different options like a pie chart like this you can try to figure this out and use these but I wanted to show you the ones that you'll probably use the most which are these column and line charts and they all are similar in their own way this bar chart is basically this column chart just on its side and so they all have their different flavor they all have their different way of visualizing the data but in essence they're using the data in a similar way to visualize it and represent the data itself other things like these box and whisker plots or these waterfall charts are things that usually require specific data to use and so I'm just using data that you'll probably see the most of like this sales data so I hope that this has given you a pretty good quick understanding of how to use these how to customize them how to copy and paste
them over to a different sheet to create some type of little chart and visualization sheet that you can use to show your employers and visualize the data that you are working with thank you guys so much for watching I really appreciate it again huge shout out to Udemy for sponsoring this Excel series if you like this video be sure to like and subscribe below and I'll see you in the next video what's going on everybody welcome back to the Excel tutorial series today we'll be looking at how to clean data in Excel now knowing how to clean data in Excel is actually extremely useful and there are a ton of techniques to do this I'm going to be showing you the ones that I probably use the most the ones I feel are the most helpful to do the bulk or the majority of the data cleaning that you're going to do in Excel like I said there are so many different ways and very specific things that you can do but I'm going to highlight some of the bigger ones that I find the most useful and some of you may be thinking well I'll just do my data cleaning in SQL or Python or when I get it ready to put it in Tableau but honestly a lot of the data cleaning at least a lot of the big stuff I tend to do in Excel if the data set is small enough to fit in Excel and so I think it's actually really useful to know how to do this because you'll most likely be doing it more than you think now before we jump into the tutorial I want to give a shout out to the sponsor of this video and it is a brand new sponsor it is Unlocked by Z by HP Unlocked is a movie that's actually broken up into four parts and each of them has a unique data science challenge associated with it now I'm going to read this next part because it's extremely interesting each challenge represents a different topic so there's data visualization text analysis audio signal processing and computer vision and you can submit your answers and your work on their website for a chance to win one of 10
ZBook Studio laptops or a free trip to the Kaggle World Championships so I'll leave a link in the description where you can go watch the movie and then do the challenges and then submit your answers for a chance to win you should also go check out their hackathon where you can do these projects with other people just like you who are trying to figure out these answers and submit them to win as well so go check that out thank you again to the sponsor of this video Unlocked by Z by HP now without further ado let's jump onto my screen and get started with the tutorial all right so let's jump right into it I have this US president data set I got the base data set from Kaggle but I added some of my own data and then I messed some stuff up as well just to demonstrate some of these things that we're going to be looking at today this is not a full project so we're not actually going to be using this to create any visualizations or anything like that so all this is just for demonstration purposes but we will be doing a full project in about two or three videos in this Excel series where we're going to be going from start to finish with a real data set so if that's something that you're wanting then we will absolutely be doing that now something that you may be wondering is how do you actually identify what you need to clean in the data how do you know what to look for well some of the obvious things are things like formatting and standardization so things like this James Monroe is in all caps that happens all the time within real data and so you want to standardize that or this all lowercase you want to standardize that you want that all to be the same there are also things like right here where we have this Whig and this Whig with a bunch of random stuff after it this happens all the time where it's not completely standardized and you may even notice there are some spelling errors in here and
I'll we'll kind of look through that in a little bit and then you know there are things like additional spaces where there shouldn't be spaces there are things like currencies that you need to be aware of if you were importing this into or going to be importing this into a SQL database um things like currencies can be just a problem or be really um unnecessary it may actually cause more issues in the long run so you may just want to you know take that to the base value and then dates are always an issue always always always um so always look at your dates make sure they're they're formatted correctly make sure they're all the same these are the types of things that right when I glance at this data set these are things that I'm looking for um one other thing that is actually the first thing that we're going to start out with is you want to make sure that your data is not duplicated because if your data has duplicate data in it and you don't want that it's not supposed to be there there are some specific use cases where duplicated data is okay um you know you want to get rid of that and it's very easy to do in Excel uh the first thing we're going to do we're going to go uh to this data tab we're going to go right over here and we're going to get see if there's any uh duplicates in our data so we're just going to go up to remove duplicates it's going to automatically choose all of your columns to to check against so it's going to for from a all the way through I it's going to see is the exact same data in all these rows and if it is it's going to get rid of it um and so we're going to click okay and it did find one duplicate and I'll show you that one real quick um because you know it was right here so Barack Obama was here twice and then I'm going to hit control I hit control Z to go back I'm going hit control y to go forward and it removed that uh that row completely now in this example you may be able to spot that with your eye but in a real data set where you have 
10,000 or 100,000 rows there's absolutely no way or at least it's very unlikely that you're going to see that there's duplicated data in there so just running a quick remove duplicates is really important to make sure that you've gotten rid of those things so that's one of the first things that I do we're going to go into a lot of these different columns and I'm going to show you different techniques or things that I do when I look at actual data so I'm going to come right over here I'm going to insert and this is what I actually do I usually create a separate column especially when I'm working with this because I don't want to change this one I don't want to go in here and type equals UPPER equals PROPER etc. there aren't a lot of ways to change names but those are the main ones and all of them are completely okay so for example I'm going to type equals UPPER and I'm going to select this cell close my parenthesis and hit enter and then in the bottom right I'm going to double click this and it's going to apply to all of them it is completely okay to have your data like this if you want it to be like that if you want it to be all lower you can do that if you want it to be in proper case you can do that there are different uses for all of them and honestly as long as it's all the same typically it's okay but for example if you're selling this to a third party company or something like that they may have requirements for their ingestion process when they take your file in if you send a weekly file or a monthly file they may want it exactly how they want it and you can change that to what they want but as long as it's standardized for you and it's all the same for you that is a good thing so now we have all of these
in the proper case that's typically what I do or I use upper those are the ones I use the most I don't usually use lower if you go in here and you type in LOWER it changes it to all lowercase but I don't typically do that and I'm going to name this column president dash fixed and so now all of these names with their different uppercase and lowercase are all fixed and it just makes it so much easier to read and you don't have different uppercase and lowercase issues it's all the same so I'm going to keep that right there if we move a little bit to the right and look at this prior column now this prior is a mess it has stuff all over and to be honest this is not really something that I would probably be using in a real data set I would look at this column and I would say this is pretty useless if I had a very specific use case for the data in this column I might try to parse it out and do something but I don't this is a completely useless column to me so I'm actually going to skip this one I'm going to go to this party one and this party one to me looks pretty important because this is something that I know I can group by and I can create visualizations with and break that out so we're going to add a filter and now let's open up party and take a look so if we look right here we have Democratic Democratic-Republican Federalist Nonpartisan Republican Republicans Whig and Whig with a date and some information on the back of it and then some blanks and it's really important when we're looking at these columns that we think we might group by that we have them properly grouped so Republican and Republicans to me right off the bat looks like a spelling error and so I'm just going to deselect all I'm going to go to Republican and Republicans and it's literally Republican all the way down except for this last one
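For anyone following along outside Excel, here is a minimal sketch of the steps so far: removing exact duplicate rows, writing a proper-case name to a separate column, and listing the distinct party values the way the filter does. It assumes plain Python with a handful of made-up rows; the column names, the sample values, and the `remove_duplicates` helper are all hypothetical stand-ins, not the actual data set or an Excel API.

```python
from collections import Counter

# Hypothetical sample rows standing in for the presidents sheet.
rows = [
    {"president": "JAMES MONROE", "party": "Democratic-Republican"},
    {"president": "abraham lincoln", "party": "Republican"},
    {"president": "Barack Obama", "party": "Democratic"},
    {"president": "Barack Obama", "party": "Democratic"},   # exact duplicate row
    {"president": "Donald Trump", "party": "Republicans"},  # stray plural
]

def remove_duplicates(records):
    """Keep the first copy of each row, comparing every column."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))  # copy so the raw rows stay untouched
    return unique

deduped = remove_duplicates(rows)

# Write the cleaned name to a separate column instead of overwriting the original.
# str.title() behaves roughly like Excel's PROPER for simple names.
for record in deduped:
    record["president_fixed"] = record["president"].title()

# Counting distinct values plays the role of opening the filter dropdown:
# a value that appears only once next to a near-identical one is worth a look.
party_counts = Counter(record["party"] for record in deduped)
for party, count in sorted(party_counts.items()):
    print(f"{party}: {count}")
```

In this sample the count for Republicans shows up as its own group next to Republican, which is the same thing the filter list reveals in the sheet.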
and to me that's just something that I would update so I would just go right here and fix it if I didn't do that and then I tried to create let's say a pivot table on here Republicans would get its own group and it wouldn't be added to Republican and maybe that's on purpose but let's just presume that we know this data extremely well and that's not supposed to be like that again that just comes back to knowing your data really well and understanding what it should look like and we know that it should not be like that so we're going to fix that and as you can see it got rid of it the next thing we're going to fix is this Whig that's just an error that's some issue on the data side and we're just going to fix that by updating it and that's it I would always be keeping a copy of the raw data somewhere else because this is presumably a working document you aren't saving over your original file let's just say that and then let's take a look at these blanks real quick okay so there are these rows right here that have nothing I think we're okay but let's see if there's anything different 47 48 okay so yeah it's just these ones right here that have no data in them anyway it's just seeing them in the filter so not an issue at all so okay we're looking good we've gone all the way over we fixed this president column we skipped this one we cleaned up this party column and I kept this one in here because I'm not exactly sure if that's Democratic or Republican so I'm going to keep it as its own thing I'm not a huge history buff in that aspect the next one right here is really easy this is something that happens all the time actually most often it happens on numerical data so there'll be a number of 1,1 and then there'll be a space after it for absolutely no reason and it happens all the
time it does happen like this as well um where you'll see this and all you got to do is do trim and select the the cell we're going to close that parenthesis and we're going to apply that all the way down what is so fantastic about the trim is that it's really intuitive and it knows basically everything it needs to do for example um it gets gets rid of the um spaces before it gets rid of extra spaces in the middle and um it'll get rid of extra spaces at the end um which you wouldn't be able to see but they are there and they they absolutely can cause issues if you have spaces at the end that you cannot see um let's take this one for example like if I had spaces at the end that can cause issues when you insert or or or put that into a database um that happens a lot with numbers um you know when you're putting that into SQL that can cause issues and so you really it is important to actually do that trim um and you can do that on all of your columns or just ones that you know you're having issues with but once you import that data into SQL you will know if there's an issue or not when you actually try to start using it so we're going to say Vice and we're going to say fixed oops there we go uh this next one is one that you'll run into a lot when you're working with numerical data you will encounter so many different issues um one that I run into a lot is I I've worked with a lot of cost data or pricing data and when it's in an Excel it h it sometimes comes in with um these currencies like a dollar sign a pound sign things like that and when you put that into SQL it just is a nuisance right you're not going to be able to run um it's going to go in as a text or it's going to be like a string right because it has that special character and you don't want that you don't want to have to then go in and then change things around you just want to be able to start um you know doing calculations on those numbers so what you can do is sometimes it'll come in as a text sometimes 
it'll come in as um currency which I think this one's a currency we are just going to change that to be a number and then we're going to get rid of these oops and get rid of those that it doesn't look as pretty but that is much more useful than actually having the currency on there with the decimals this actually is so much easier when you when you want to use it for almost anything because you're able to add and uh do things properly in other systems in Excel I think it does understand it um but you know that can cause issues so there is how you do that the next thing that we're going to look at is these dates and just notoriously whenever I see a date field I know there's going to be an issue with it it's very rare that I get a date field that is perfect uh it just it is genuinely is um is a novelty when that happens and most of the time it has to do with um let's say a date comes into Excel and it's in a text format or date comes into Excel and they're not the same in this example they are not the same um and we just want them to all be similar they say date on if you look right here it says date it says date it looks like it should be the the same um but if we go like this it all looks the same right there's no issues at all if we were to um try to use that it may or may not be an issue but we don't want to leave that to chance later on if you're using this with python or something like that it can cause issues U maybe not in SQL because it may um see the underlying um what's in the underlying cell not just what we see but some systems won't and so you want to make sure that they're all the same and so you know what we were doing back here with um oops with the party and we were looking at this uh this filter and identifying the issues I usually do that on date fields as well and and oftentimes um I know just for just for demonstration purposes ofttimes I will get something like that and then I'll come up here and I'll notice that there's this one random number 
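The trim, currency, and date fixes described here can be sketched in Python as well. This is only an illustration under a few assumptions: the function names are made up for this example, the list of date formats is a guess at what a messy column might contain, and the stray number in a date column is assumed to be an Excel date serial (days counted from 1899-12-30, which holds for dates from March 1900 onward).

```python
import re
from datetime import datetime, timedelta

EXCEL_EPOCH = datetime(1899, 12, 30)  # serial-date origin assumed for modern Excel dates

def trim_like_excel(text):
    """Drop leading/trailing spaces and collapse runs of inner whitespace."""
    return " ".join(text.split())

def currency_to_number(text):
    """Strip currency symbols and thousands separators, then parse as a float."""
    return float(re.sub(r"[^\d.\-]", "", text))

def to_short_date(value, formats=("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y")):
    """Return the value as MM/DD/YYYY text, whatever form it arrived in."""
    if isinstance(value, (int, float)):
        # A bare number in a date column is treated as an Excel serial (assumption).
        parsed = EXCEL_EPOCH + timedelta(days=value)
        return parsed.strftime("%m/%d/%Y")
    cleaned = trim_like_excel(value)
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date: {value!r}")

print(trim_like_excel("  John   Adams "))
print(currency_to_number("$1,234.50"))
print(to_short_date("January 1, 2023"))
print(to_short_date(44927))
```

Raising an error on an unrecognized date is a deliberate choice in this sketch: a value that does not parse gets surfaced for a manual look rather than silently passed through in a different format.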
that happens all the time all the time um and so you know you want to make sure that you um that you look at these things and just just do at least a quick glance if not kind of doing a kind of a deep dive into it but all we're going to do is we're going to do both of these and we're going to do a short date and let's take a look and see if that fixed it and so now they are all the same format and that is fantastic that is exactly what we want we're going to go back through here we're going to get rid of these um again this is a working um this is a working document oops uh we need to we're I'm going to do um control shift down oops let me go back up do control shift down and copy and what I'm going to do right now is I'm actually going to copy let me do it right here I'll show you sometimes I do this does just depends I'm going to go right here I'm going to hit rightclick and I'm going to paste as a value which means it's not going to take the calculation or the formula that I just did uh it's going to actually paste it as that value so we just replaced it um right here you can see up here it says equals trim of G2 this now now that I copied and pasted it over as a value um it got rid of that um calculation and now it is actually a string so we don't need this anymore and I'll do the same thing over here as well I'm going to control shift down copy and I just hit the right key uh or the left key sorry now I'm going to right click and I'm going to do paste as a value and again it has this proper and now it doesn't have the proper it's actually the value that was here so that's really important to note uh and we're going to get rid of that one and so now what we have is is already looking much better now one of the last things I we're going to look at is deleting columns that we are not going to use and this is why it's so important to keep a backup or or or the raw data not in this file because if you start saving over this file and this is your raw file uh that 
can mess up a lot of things and that happens to me before and it's terrible and then you have to request another file or you have to go back and find it or something like that it's terrible um so so this is our working document so we can mess with this and do whatever we want for our purposes now for us um I can already tell you that this prior is a bunch of nonsense and we do not need it we're not going to use it for anything and it and if we have um this is a small very small data set this only has like um let's say you know one two three four five six seven eight we have like eight columns that we're you know kind of using that has data eight or nine now that's a small data I've had ones with literally like hundreds um and and it has so many columns uh so much data and sometimes it's good to just trim it back to the things you know you're going to use this to me is absolutely useless um we're going to delete that and then right over here it's pretty redundant um it's just one number off but if we scroll down just a little bit um it goes it's basically just counts it's a you could even call it a unique um identifier if you want sure why not but we don't need both um so we're going to get rid of this first one and now we have more of the useful and relevant data rather than the stuff that we absolutely know that we are not going to use um these date updated and date created we may never use them but we might um so it doesn't hurt to keep it on hand those other ones are ones that we are almost certain we will never use again keep a backup just in case you need it you can always go back and get it so you know if you go back to what we started with and you look at what we have now it is much cleaner it's much more usable and these are small subtle changes um especially with this very small data set of only like 50 rows or or 46 rows but you're going to be working with data sets that are thousands tens of thousands hundreds of thousands of rows and you need to know 
how to look at this data standardize it and format it properly for what you're going to be using it for if you're keeping it in Excel there are different things that you may do than if you're putting it into a database or going to be using Python to access it so you need to know your use case but these are some things that I do all the time to clean up the data before I use it for something whether I'm creating pivot tables or I'm putting it into SQL these are things I do all the time and so hopefully that helps give you an idea of some of the things that you should be looking for when you're actually cleaning data and it's really important to understand why you're actually making these changes because some of the things that I did today may not be things you want to do on a different data set that has different uses and different purposes so take everything that I've said and apply it with a grain of salt to your data set because your specific needs may be different than what I wanted when I was cleaning my data set so I hope this gave you a small glimpse of some of the things that I'm looking for when I get a new data set in and I'm analyzing it and figuring out what I need to fix in it I hope this has been helpful with that being said thank you so much for watching I really appreciate it if you like this video be sure to like and subscribe below and I'll see you in the next video [Music] what's going on everybody welcome back to the Excel tutorial series today we're going to create an entire project in Excel [Music] now if you've never done a complete project in Excel where you take the data you clean it then you create an actual dashboard where people can click on things and filter things this is going to be a really great
learning opportunity as well as potentially you know a simple project that you can use for your portfolio or you can spice things up and go a little farther than what we're going to be doing in today's video I will walk you through every single step of the way and hopefully we learn something together and without further Ado let's jump right into it let's jump onto my screen and get started with the project all right so this is the data set that we're going to be working with I will leave a link in the description to my GitHub where you can go and download it so you can be working with the exact same data set that I am using now before we actually get into this data and start looking at it I'm going to show you what the final dashboard is going to look like um we're going to create a few different types of visualizations nothing too crazy um and then we'll create some filters as well so we can kind of you know create some interactive filters with our data so let's go right on over to our data set now I'm going to hide this because we are not going to use that but what I am going to do before we do anything is I'm going to create a dashboard and I'm going to create a pivot table oops and I'm going to create a working sheet so um all these things have different uses and I'll explain that as we go along so this is our data set um I'm going to copy this over to our working sheet when I go into you know an Excel and I'm working on something I don't like to you know use just the one that I was using in case I mess something up and it saves over or's some issue I like to create a working sheet and keep the raw data right over here it just makes my life easier I don't have to save it and then you know open up a different Excel to compare them so we have our bike buyers this is our working sheets this is our raw data this is the one we're actually be working on today so let's um let's start looking at it really quick and just kind of glance and see what data we're working 
with and then we'll start cleaning it up making it more useful for what we are going to be using it for and then we'll start building out the dashboard so right here we have an ID that should be be a unique ID to each person uh this is their marital status so married or single this is their gender male female we have their income children their education their occupation do they own a home how many cars they own how long their commute is the region where they live their age and if they purchased a bike and this column right here is extremely important this is going to tell us whether they did or did not buy a bike so we got their information they're looking for a bike but they either decided not to buy a bike or they did buy a bike and we're going to be using that one a lot in in this video and so um you know this is basically the data set that we're working with um some of the demographics and and information behind the person so what we want to do when we are cleaning the data before we do anything uh I like to see if there are any duplicates in here um what we're going to do is come right up here we can go to uh where is it right here we got remove duplicates so we're going to click on that it selects every single one we just want to see if there's any useless duplicated data that we do not need uh and the data is a header so we're going to click okay all right so we had a ton of duplicates in there uh for whatever reason so yeah we do have duplicates in there so I'm glad we did that otherwise we would have uh you know not good data and we don't want that let's start right over here um the ID of course we're not going to change the marital status and gender are M's s's fs and M's um this isn't inherently a bad thing to have it like this but you know we have to think about it from the perspective of someone who's going to be using this dashboard do they know what M ands is do they know what M uh and F is and if they don't it's better to just spell it out for the 
most part so let's just do that so we're going to click on column B and hit Ctrl H that's going to bring up our find and replace now there's an M in both of these columns and it means different things in one it means married and in the other it means male so what we're going to do is search by columns and we'll have match case checked I don't think that's going to change anything but that just makes the search case sensitive and we're going to find M and replace it with Married and we'll replace all awesome and then we do S is Single this one is super easy we're going to do the exact same thing right here so on column C we hit Ctrl H it still has by columns so we'll do M is Male and replace all of those and F is Female and replace all of those that's great the next column right here is income and in a previous video I talked about how I don't typically like it in this format and that's true if you're doing calculations on it or anything else having the dollar sign or it being a currency can mess it up sometimes we're not really going to mess with it too much right now what we can do is just make sure all of it is currency we'll just go like that to make it a little simpler but we're not going to change it to a numeric we will use this in the visualization we'll see how it looks and if we need to we'll come back and change it if not we'll keep it how it is so that's all we're going to do to that one the children column looks good we have education partial college partial high school this looks fine to me if there are any spelling errors or anything like that of course we need to clean that up but it doesn't look like there are occupation skilled manual and manual okay those should be separate home owner should just be yes or no all right we have cars 1 2 3 4 good night who owns four cars and then we have the commute distance and there's nothing terrible about this it's
giving you ranges um which can be a good thing I say let's keep it for now but I have a feeling when we get further and we start using in the visualization we may want to change this so let's just hold off for now um but if needed we will come back to this and we'll change this um and then we have our region and that looks totally fine and we have our age now when you're using ages typically you have some type of like age bracket or or age range and you do that because there are so many ages in here right it's 25 all the way down to 89 and if you're using that in some type of visualization it could just get really messy and so you'll create kind of you know just brackets around these so that you can kind of condense it and make it a little bit easier to understand so let's do that and just create a new column and then then we can use that for our dashboard so let's go right up here we're just going to create a new column uh we'll call this age brackets and what we can do is we can use an if statement to kind of say if it's older than or less than and and and kind of give them these ranges um that's one way to do it and that's the way we're going to do it right now so let's go up here and what we want to do is we want to say is going to we're going to say equals and we're going to do if and we're going to close that parenthesis now what we're going to say is if this we'll go right back up here if this is less than so we're going do this 31 and we're going to say comma so if they are less than 31 what do we want to call them what do we want their their you know name to be we'll call them adolescent oops that's not how you spell adolescent adolescent um and then if they're not what we're going to do is we're going to say it's invalid okay and let's just see if this one works first all right it's not working at all um okay so basically what we did was um incorrect we did it backward uh we want to do I said uh L2 is greater than 31 no we want to do like this so let's do 
that now all right and it should pull up where if they're under the age of 31 so 30 or below is basically what it's saying so if they're 31 they'll be invalid but if they're 30 or below it's adolescent so it is working properly and let's see what it says perfect so this one is working and now what we want to do is build on this and make it a nested if statement if you've ever heard of that or done that before so this is our first if statement and this invalid right here is our value if false and this whole statement is going to become the value if false for a different if statement so let me write it out and hopefully that'll make sense we're going to say if open parenthesis and we're going to do it like this and let's just get rid of this for a second all right what did I do give me a second okay let me just write that out again we have our if there we go so now what we're going to do is write the next part of it so we're going to say if L2 is and this time we're going to do greater than or equal to 31 so now it's going to include that 31 so right here we did anything less than 31 so it's 30 and below and this one is going to be 31 and above so we're going to say these people are middle age and if not then it's going to go to this other if statement and then we need to close it I believe so now let's try this all right fantastic now everybody should be in one of these areas right everyone should either be adolescent or middle age because basically all we're saying is if they're 31 or older or 30 or below that's all these two statements do so we have our next group now we can go even further into this and use this entire thing as the what was it called the value if false section so that's what we're going to do we're going to do one more
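To make the nesting easier to see, here is a small Python sketch of the two-level formula as it stands at this point. The Excel formula in the comment is a reconstruction from the narration (the exact text typed on screen is an assumption, as are the label spellings), and `age_bracket` is a hypothetical name used only for this example.

```python
def age_bracket(age):
    """Mirror a two-level nested IF on the age column.

    Assumed Excel equivalent (reconstructed from the narration, not copied):
        =IF(L2>=31, "Middle Age", IF(L2<31, "Adolescent", "Invalid"))
    """
    if age >= 31:
        # Outer IF: value if true.
        return "Middle Age"
    # Everything below is the outer IF's value-if-false, which is itself an IF.
    if age < 31:
        return "Adolescent"
    # Only reached when the value is neither >= 31 nor < 31 (for example NaN).
    return "Invalid"

for sample in (25, 30, 31, 89):
    print(sample, age_bracket(sample))
```

Each additional bracket follows the same pattern: the new condition goes on the outside, and the whole existing expression becomes its value if false.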
so we'll have three different categories so we're going to say if and do an open parenthesis oh actually let's not do it to this one let's do it to this top one it's just easier so we're going to say if open parenthesis L2 and this time we're going to say anybody over the age of 50 or we can do 55 let's do 55 so we'll do greater than 55 and we're going to call them old and we'll do a comma and then the rest is the value if false and we need to close our parenthesis so let's try this anybody over the age of 55 should have old you know maybe we'll do 54 so anybody who is 55 is considered old I think that's fair guys oops I should have done that to this one let me get out of this and we'll do 54 my dad is 55 that's why I'm doing it like this this is for my dad because he should be in this old category to be fair so now we have adolescent middle age and old these are three categories so we can now have these buckets these different groups of ages and it's much more usable than these individual ages and so we will be using this in our dashboard for sure now our next one is the purchased bike column and we're not going to do anything with that and there wasn't a ton to clean up here we removed some duplicates I don't know why it says that what did I do married married what does this even mean did I write that did I mess this up guys oh when I did the M and the S replacement in there it replaced the letters in the header with married and single it's supposed to say marital status oops thanks for catching that guys I hope that's how you spell marital we'll see so we are going to keep it just like this now what we are going to do is build pivot tables with this data so we had our raw data we have our working sheet and now we want to create pivot tables and pivot tables are how you
actually help build your dashboards or help build your visualizations so we're going to go right here we're going to hit whoops get rid of that we're going to go right here we're going to insert and we're going to say pivot table and it's going to ask us what range so we're going to go back to the working sheet and we'll just click here and hit control a this is going to select all of our data for us so it's really easy and we're going to hit okay and so now we have all of our pivot I don't need I don't need to pull it out that far that was way too far and now we have all of our pivot table information over here and so that should make it really easy to you know actually build out so what we're going to do is start selecting what columns and what data we actually want to work with so the first one that we're going to build out is a dashboard that is basically looking at the average income of somebody who either bought or did not buy a bike so we need in this one we're going to need their income that's definitely going to be a value right here um but we want to break it out by male and female so let's look at their gender we going to pull that down into the rows so um this is basically a sum and no let's look at let's make this an average so I just went to the um I clicked right here I went to the value field settings and we're just going to do an average all right and then we are going to make these um and as you can see there's four decimal points um we'll keep it as is right now but we may need to go back and change something then we're going to look at if they purchased a bik or not and we're going to put that right here so so we can see that uh right here for the people who did not buy a bike the females their their average salary was 53,000 the average salary for the average salary for males was 56,000 for yes the ones who did buy a bike the average salary was 55 for female and 60 for male so the people who had a little bit more money are buying bikes and you 
can also see that the men are making more money in this data set overall so let's make the visualization really quick but I don't know I'm not a huge fan of these decimal points and maybe we can just change that in the visualization we'll see oops that's not what I meant to do so what we are going to do is click into here click insert and go to these recommended charts and it's going to bring up basically every single type that we would want and we can just click in here and see which one looks good oh yeah I love those 3D ones those are my favorite you guys know that let's use this one right here pretty simple whoops let's pull this right over here and as is it looks pretty good it shows male and female we have the average incomes right here and whether they did or did not purchase a bike and so at a glance it's pretty easy to see if you want to change things up style-wise go for it I'm just going to keep it as is but let's see if there's anything we need to add do we want to add these axis titles for the most part I tend to do that it makes it pretty easy to see so we can go in here and we can just click it like this and we'll say income and over here we'll do gender so that's what that is and let's go back in here do we want to add a chart title we definitely want to add a chart title for most of these we'll add a chart title for sure so we'll say average income per purchase I don't know if that's 100% right but we'll use it if we need to change it to be by gender or something we can but for now let's see do we want to add data labels definitely not a data table we can do this it may make it a little easier to read I will say that again these decimal points are really throwing me off let's go see if we can
change it in here. Let's see if we can just make these numbers. Okay, and we can keep it like that, or we can even do something like this: add commas. Yeah, I'm going to keep it just like this, I think this just looks the best. Again, I'm adding commas here and changing the decimal places right here; it just makes it look a little nicer, a little cleaner. So let's keep this exactly how it is. We can always change things if we want to come back to it. So we created our pivot table and then we created our visualization, which is basically exactly what we're going to do for all of these, because again, all of these need pivot tables in order to create the visualization. So let's get out of here; we're going to scroll down and create our next pivot table, and once we get done with all of the pivot tables and all the visualizations that we need, then we will start on the dashboard. So we're going to do Ctrl+A, hit OK, and basically do the exact same thing that we did. This time we're going to look at the distance. I've already done this entire project through, but I haven't really talked about why, or what we're going to look at. For the first one, what we're looking at is their income: does it change whether they bought or didn't buy one? If they said yes, is there a reason? Are they making more money? Are our price points right, do the customers make more money so we cater to them, or not? That's a good question. Another thing is, we sell bikes, or this person sells bikes, so commuting distance definitely makes a difference. Does the person who is buying a bike live one mile away from where they work, or 20 miles away? This next visualization will help us determine who's buying it. So what we are
going to do is we are going to look at that one that we were looking at earlier, the commute distance. So we're going to bring that right over here, so we have these 0 to 1 miles, 10+ miles, 1 to 2 miles, etc. Now, again, we're going to look at whether they purchased a bike, that's really important, and let's make that the column as well. So now what we have is a count of these nos and yeses, whether they did or did not buy a bike. One of the issues I already see, and I'm going to visualize it and then I'll show you, is that this 10+ miles is right next to the 0 to 1, so it's not in order, and that could be an issue. So we may have to revise that somehow to put it at the very bottom, because we can do either ascending or descending, and I don't think either one is going to work. We may have to work through that in just a second. I don't know if I planned for that. Yeah, so it has this big dip. That's okay, we're going to figure this one out together, because I honestly didn't plan for this one. Okay, we have 0 to 1 miles, that's exactly where it needs to be; the one, the two, the five, that's exactly where they need to be; this 10+ miles is not. Let's see, if I change that 10+ miles to 10 miles plus, whether that'll put it down here, because I don't know if it's reading it weird. Let's go into this working sheet, and let's go right here, and we're going to do Ctrl+H, and, oops, not this one. 10+ miles, let's get that in there, and we're going to do 10 miles plus. I don't know if that's actually going to work, we will see. So let's go back to the pivot table, go to Data, and refresh. No, it didn't change it. Okay, so let's think about this. Maybe if we change it to start with a letter, it might move down here. So start it with "miles", that could work. Let's try it. Okay, it's already selected, let's do the 10
plus miles. Okay, so let's do "More than 10 miles", and we'll replace all. Let's get rid of this, go to the pivot table, and refresh. All right, okay, so it's not perfect, but it works, and for what we're doing I think we'll keep it how it is. So we have our second one. There are different ways you can change this one; on the last one we did a ton of different stuff. We can just do Commute Distance, and what do we want to say on this one? What is this? Oh, this is the count. Do we have to keep this one? No, there we go. I'm just going to do just one and say Commute Distance. And let's add a chart title. Let's say Distance Per Customer. That's not 100% true, because it's no or yes, and that's the important part of this. Average distance? Let's see, we'll just say Customer Commute. All right, and we'll keep it just like that. Perfect. I don't think there's anything else we need to add on that one. All right, now let's go right down here; we're going to create our very last one. We only had three. Sometimes you'll have a ton, sometimes you'll have one on each sheet and you'll create multiple sheets. Ctrl+A, and now we have our pivot table. For this one we're going to be looking at these age brackets that we created. Something that I do honestly a lot is bracket things into groups like this. For this I just kind of made them up, but it's good to know how to do this, because I promise you I use this one a ton. Then we just want to look at who purchased a bike, the same thing as we did before: Purchased Bike, count of the purchases. Pretty easy, so we just have a count of either no or yes for these age ranges. Let's go to Insert, and we'll go to Recommended Charts. I personally like a good line chart for
this one. This is already interesting. We could do something like this, that's nice. See this one versus this: it just adds a dot. It looks nice, we'll keep that one. So just really quick, at a glance, really interesting: people under the age of 30 are not buying that many bikes, and ages 31 to 54 are buying a ton of bikes. They buy more bikes than anybody. Really interesting. But yeah, we'll make the dashboard in a little bit. Let's add these axis titles; oops, the horizontal one, we'll just call this Age Bracket. And then we'll add a chart title. Again, you can add some extra stuff if you want to, but you don't need to. None of this other stuff we really need; I'm just looking at the stuff we do need or do want. So what do we want to call this one? Let's call it Customer Age Brackets. It's not perfect, but we'll keep it as is. For comparison, let me see if I can copy or use this real quick. Instead of the age brackets, I'm going to get rid of this and use the age, and then let's insert a recommended chart, use a line, and we'll use this. So compare this to this. Just think of it like this: if a customer, or not a customer, somebody you're working with, is trying to use this dashboard to understand the data, this might melt their brain. It just makes no sense. Well, it makes sense, it's just all over the place; it's really hard to make sense of this. I mean, you can kind of see a pattern going up around the mid-30s and then it trends downward, but it's hard to see. So doing these brackets really helps, and you can even add, you know, "adolescent 0 to 30" underneath it. And in fact we may want to do that. Why not? Let's do that. Oh, whoops. I'm doing this on the fly. Why don't we go back? What am I doing? Whoops.
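The age brackets above can be sketched outside of Excel as well. What follows is a minimal Python sketch, assuming the cutoffs mentioned in the walkthrough (30 and under, 31 to 54, 55 and over). The function names and the sample rows are hypothetical and are not taken from the bike sales workbook. It counts Yes/No purchases per bracket, which is roughly what the pivot table's count is doing, and it lists the brackets in an explicit order rather than relying on how the labels happen to sort.

```python
from collections import Counter

# Explicit bracket order, so output never depends on how the labels sort.
BRACKETS = ["Adolescent", "Middle Age", "Old"]


def age_bracket(age: int) -> str:
    """Map an age to one of the three brackets (cutoffs assumed from the walkthrough)."""
    if age <= 30:
        return "Adolescent"
    if age <= 54:
        return "Middle Age"
    return "Old"


def purchases_by_bracket(rows):
    """Count Yes/No purchases per bracket, similar to a pivot table count."""
    counts = Counter((age_bracket(age), bought) for age, bought in rows)
    # A Counter returns 0 for combinations that never occurred.
    return {b: {"No": counts[(b, "No")], "Yes": counts[(b, "Yes")]} for b in BRACKETS}


# Hypothetical sample rows: (age, purchased bike)
sample = [(25, "No"), (28, "No"), (35, "Yes"), (42, "Yes"),
          (50, "No"), (61, "No"), (58, "Yes")]

for bracket, tally in purchases_by_bracket(sample).items():
    print(bracket, tally)
```

Grouping by raw age instead would produce one bucket per distinct age, which is why the unbracketed line chart is so hard to read.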
And this is all calculated, but let's do "Adolescent 0 to 30", "Middle Aged 31 through 54", and then "Old 55 plus". Let's see if this breaks anything; I hope it doesn't. We'll go back to our pivot table and refresh the data. Okay, it did mess with stuff. Never mind, guys, that was a terrible idea, don't do that. Let's get rid of that. I'm glad we tested it out, though; I like to see if it was going to work. No, it messed with the order of things. I intentionally named them Adolescent, Middle Age, and Old because it makes sense for the visualization, but if I change something and it messes with it, I'm not going to mess with it. It was just an idea on the fly, guys, come on. All right, so let's start building out our dashboard. When we're building our dashboard, what I personally like to do is have this pivot table sheet, and then I will copy the charts over, and later we'll hide these other sheets; I'll explain that in a little bit. So we're going to copy this: I just click on it, hit Ctrl+C, and we're going to paste it right over here. Let's just make them small for now. Oh gosh, no, let's not do that. Oh, these look terrible. Okay, anyways, let's copy this one over. Oops, okay, what did I just do? Oh, I didn't copy this one. Whoops, it's not copying. Okay, we're going to go copy, hit paste. Fantastic. Oops. Guys, look away, this is tough to watch. This is tough for me to watch, and I'm the one doing it. All right, let's go to this last one; I'm going to try it again. All right, it worked this time. So now we have our three visualizations. This is perfect, but now we actually want to create a dashboard. How do you do that? How do you make it look nice? And then we're going to add some filters and stuff like that. What happened here? What changed? What did we do? Oh my goodness
gracious. All right, let's copy this, let's paste this, let's get rid of this. I don't even know how that happened; I've never seen that before. That was wild. Excel is trying to destroy my whole video. I mean, I'm doing this for you, Excel. Good night. Okay, no problem at all. Here's how you make this at least look nice. First off, we can get rid of these gridlines pretty easily, and I recommend you do that when you make a dashboard; it just makes it look cleaner, like an actual dashboard. Let's go to View and Gridlines, so we can get rid of these gridlines. We can choose any color here; I'm just going to choose a color, I like this one. We're basically creating a header, like if you're using Tableau or something. We're going to Merge & Center, so it takes every single cell that we have highlighted and turns it into one. Let's call this, I think I called it Bike Sales Dashboard, let's just call it that. Let's make it white and make it much larger than it is. Okay, sure, let's do that. Doesn't look bad. What is it doing? There we go. Let's bring that to the center. It's not perfect, but we're going to use it. All right, so now we want to organize these, and everybody has their own way of doing it. I'm just going to start building it out myself and see how it looks, and then we'll go from there. I like this one there. This one's kind of a longer one, so I'll probably put it at the bottom; let's see how it looks. But we'll put this one right here and try to line it up. Geez, let's zoom in a little bit and try to line this up, see what it looks like. Let's extend it to the end. That doesn't look too bad. It needs to move up just a hair, and I'll show you how to align these in a second, but that looks not bad, and
we'll try to align these as well. Let me zoom out and extend this the length of this, just to make it look nice. Now, what you can do, and this is something that's pretty simple, is select both of these, go to Shape Format, and align them. It's really nice to align, especially to the top or maybe left to right. We're going to align these to the top, and they just align themselves on the very top. Now these look much better. This one is a larger visualization, so I'm going to keep it how it is, and I'm going to keep this one how it is, so it is going to be a little bit smaller, as you can tell. And then we'll have this one. This is going to bother me if I don't align these, so let me do this: Shape Format, align to the right. And it's not exactly what I wanted to happen, because, oh jeez, what am I doing? I actually wanted this one to align with this one; it did the opposite. So let me just scoot this back. All right, visually it looks fine, but that's how you do it if you want to do it. If you have multiple of them like this, you can make it look bad. So we have our dashboard, and this is already looking really good. I like how this looks: colors are coordinated, we have kind of a theme throughout, and it looks nice. I actually kind of want to change this one. Let's see, maybe if I did it like that it would look nicer than all of them. Yeah, this does look nicer, and it doesn't change much either. Guys, should I do it? All right, we're going for it, we're changing the design on the fly. Should I do it for all of them? Let's see. It doesn't fit. All right, guys, just ignore what I'm doing, don't do any of this, I'm just messing around at this point. So this is really great to have, it really is. And what we want to do is, there are other elements, there are
other things that people would like to be able to filter by and look at, but they're not in this visualization. To be more specific, one field that could be really interesting is married versus single: are single people buying more, or are married people buying more? It'd be nice to filter on it. So we're going to click on any of these, actually, go up to PivotChart Analyze, and click Insert Slicer. Now we can choose which ones we want to be able to filter on, all at the same time or one at a time. I'm just going to do the first one by itself and then I'll show you how to do the other ones. This one is Marital Status, so this is the married/single one we were just looking at, and we can drag this right over here and bring it in a little bit. We don't need all that space, so we're going to boop, boop, boop, boop, all the way up. Now, because we selected this visualization, the slicer is only working on that one right now. We of course want it to apply to all of them, and that's not hard to do. We're going to make sure we're clicking on the slicer, go up to Slicer, and hit Report Connections. If you remember, we have this pivot table sheet that we're working with, and this is where all of our pivot tables are coming from, so we're going to apply it to all of them. This is our sheet, and this is the name of the pivot table. Now again, we created that fourth one; we're not using it, but we're going to apply it to all of them. So now when we click on it, it's going to apply to all of them. At a quick glance, let's see what single people are doing. Interesting. When I'm looking at just these numbers right here, married people are making a lot more, sometimes eight to ten thousand more on average than their single counterparts. Again, that's a rough estimate, but
it's interesting. So now we're going to create more of these. We're going to go to PivotChart Analyze and Insert Slicer. We already did Marital Status, but what if we want to look at things like Region, and maybe something like Education? So let's bring up both of those, and look, now two of them come up. Let's add the region right here; we'll bring that in just a little bit and see if we can match it. Nailed it. All right, now we're going to put that up, and we'll bring this one down just like this, bring it over, and see if I can match it again. Come on, almost nailed it. I don't know if I nailed it, but it's close. All right, bring this up a little bit, bring this up, and we have to do the exact same thing that we did with this one, because right now it only applies to that one chart. So we want to go to Slicer, Report Connections, and add it to all of them. Okay, do the same thing with Education: Report Connections, bada bing, bada boom. We are looking good. Now let's clear all of them, so it's just going to be everybody. Now we can slice and dice and choose what we want. Say we want to look at people who have a bachelor's degree, who live in Europe, and who are single: this is the information that we have on those people. So now we can narrow it down by certain demographics even further and look at this key information. We may not look at counts and averages of these fields, but we're able to filter on them, and that's really great to know. People with bachelor's degrees on average are making in the 60s, 70,000. Let's look at graduate degrees. Okay, a little more. Again, I'm just looking at random stuff, but you can mess around with this and take a look at some stuff. I want to make this color darker; I feel like it'd look nicer darker. There we go. Oh yeah, that's way better. This to me is a good dashboard, right? You have key information that you're looking at, nice
visualizations, it's color coordinated, and you have these slicers on the side. To me this is a fantastic, simple dashboard, and there are so many other things that you can do with this data. You can make it unique and add your own spin on it, and I highly recommend that you do that. Push yourself, go past what we just did today, and add your own stuff. Then you can add this to your portfolio website and show people that you know how to use Excel, which is a fantastic thing to know how to use and show off. With that being said, I hope that this project was helpful and that you learned something along the way. I know I did; I was learning things as we were going, and I hope that you didn't mind that I took some detours along the way, for your amusement as well as my learning. Thank you so much for joining me, I really appreciate it. I hope you have a good day, and goodbye.

What's going on everybody, welcome back to another video. Today we are starting our Tableau tutorial series. Now, this series is for absolute beginners, so if you have never used Tableau before, you are in the perfect place. I'm going to take you all the way from the very beginning, installing it and understanding what Tableau is and how you can use it, all the way to creating dashboards and sharing them. Personally, I hate those videos that are like 3 hours long and just expect you to go through it. I like to break my videos up into chunks, so if you have ever done my SQL tutorials, you'll know that I like to break things up so it gives you time to try them out and do them yourself, and then you can move on to the next video. So I'm going to be breaking this up into five separate videos. In this video I'm going to show you how to install Tableau for free, I'm going to show you the user interface, we're going to download a data set that you can find on Kaggle, and then we will build our first
visualization together. With that being said, let's jump over to my screen and we'll get started. All right, so the very first thing that you need to do is actually download Tableau. We're not going to be using regular Tableau; we're going to be using a free version called Tableau Public. It has a lot of the same features, though of course not every single feature that regular Tableau has, but it is absolutely perfect for learning and using it, and you can even build dashboards and share those for your portfolio. I'm going to put this link in the description so you can just go and click on that. All you have to do is input your email right here and click Download the App, and then it should start to download. Then you can save that and open it up. Now, I'm going to open it up; I don't know what it's going to do, because I already have it downloaded, but it should open up and look hopefully like what you're seeing on my screen in just a second. Let's see what it does. I hope you can see this, but it says Tableau Public. It says I already have it set up, but you're going to click Install and go through all that setup stuff. So I'm going to exit out of here, go over here, and type in Tableau Public. It's 2021.3; that's the current version that they have out. If you're doing this in the future, they may have different versions. So you should be able to pull this up right here. Now I'm going to go and get the data set that we're going to be using, and I'm going to show you how to get that as well, and then we will actually jump into Tableau and start using it. So let's go over here. I'm going to get a data set from Kaggle. I wanted something pretty generic. In future videos I'm going to show you some different visualizations that you might use, and we'll get different data sets for those, because of course not one data set
covers all these other types of visualizations. So we're starting off pretty simple right here. We're going to be getting one called Video Game Sales, and we can take a really quick look at it. Here are some of the fields that you're going to have, like rank, name, platform, year, genre, and then some sales data. And this is what it actually looks like: it's called VG sales, so video game sales, and it's a CSV. Here are the fields, and we have our data. All we are going to do is download that, and I will save it. Now, when you download it, it's going to be saved as a zip file, so we need to go to our downloads. Let's refresh this; here's our archive. We need to go in here, and you can just copy the file and paste it right back into here. Just so you know, that is a CSV, so be aware of that. What we want to do is come in here, and since it is a CSV, we're not going to be using the Microsoft Excel option, we're going to be using the Text File option. So we'll come in here and take VG sales. One thing I want to do before that is rename mine to VG sales_1. I've already prepared for this, so I already have that file in there, and I want to make a distinct one for myself; you do not have to do that. So we'll come back here, and then we're going to do Text File and VG sales, and open that up. When it pulls up right here, you can bring in other tables, and then you can start to join them together and create those relationships. We are not going to be doing that in this video; we'll do that in a separate one. But you can see some of these fields, and if you notice, they're either Abc or they're a number, so it starts to categorize what each field type is. Is it a string, is it numeric? It starts to do that automatically, and that's all done within Tableau, so it just kind
of reads it, and that's what it does. What I'm going to do is click right down here; it's called Go to Worksheet. The worksheets are where you're going to actually start building your visualizations, your charts, your graphs, all these things. So we're just going to click right here on Go to Worksheet. As you can see, here is VG sales_1; you will not have the underscore one if you did not add that like I did. Right down here you can see all the fields that we just imported from that data set, and Tableau even created one right here for us. It just generated that field based on the file, so it's a count of all the rows, really. What I'm going to do is walk you through what we're looking at and some of the things that we're going to be using today. There will be things that I don't talk about, but I'm going to highlight those in future videos when we start using them. So let's start with the most obvious one. It's way over here; I'm sure you saw it when this first came up on the screen, because it has all these different charts and visualizations and graphs, and these will become available as you start dragging and dropping your data into this sheet. If I go right here, it says for scatter plots, try zero or more dimensions and two to four measures. Our dimensions are right here, and our measures are right down here. Typically, things like genre or names, strings like that, are going to be dimensions, and a lot of times the numerical fields are going to be measures. Next, what I want to show you is right here: you can take something like Global Sales and drag it right here into your rows, and it automatically creates a sum of Global Sales. Now, if we take that away and, let's say, drag it right here, it's going to give us a
column. You can also do it right up here; you don't have to drag it on screen, you can also just add it to the columns or the rows. That's typically what I do; it's just more intuitive to me. Or you can drop it in this section right here, and it does its best to assign it some type of visualization. That's what it is always trying to do: it is trying to say, okay, this is what you're trying to do, let me try to get the best visualization for the data that you're giving me. Now, while we are here, it went down here into Marks, and Marks is a very important area. It's where you can add Color, Size, Text, Detail, and Tooltip. I'm not going to go into what all those are, because I'm just going to show you. So let's start pulling some fields in here and creating a visualization, and then I'm going to show you how all of that works, including filters as well. The first thing that we are going to look at is Global Sales, so let's put that in the rows, and then I'm going to take Year and make that the column, and this is basically exactly what I wanted to do. As of right now it has only the year, and it's looking at Global Sales for everything, but we want to break that out a little bit better. I want to break it out by, let's do Genre, so different genres of games. Now, if I add that right here to the columns, it is going to break it up by year and genre. If I add it right here, it's going to break it out by the year, of course, but then each individual row has a different genre. That's not what we want; we want to keep this type of line graph. So what we're going to do is add it to Marks. You can't really see it based off of these colors, but they're all different: we have the Action genre, we have the Sports genre, Racing, Role-Playing, all these different genres within it. Now we can get rid of that, because we don't need it anymore. And this is where these marks really come in handy, because you can start basically doing
what you want with them. For the genre, I want to be able to see all these different genres with different colors; to me that just makes the most sense. So I'm going to put it on Color right here, and automatically it assigns every single genre its own color and gives us this legend right over here. So it's really easy to see. Well, when you have smaller numbers it's much easier, but I know that red is Sports, and I can go right here and find red, and that is Sports. So it makes it a lot easier than when it is all the same color blue. What you can do after that is also add things like a label. So if we take Genre and put it on Label, you can click right here and get rid of the labels that you have, and you can see them right down here, or you can also change the font, so if you want to make it orange or whatever color, you can do all those same things. You can also change where you see these labels. For Action, you're going to see it a ton, because for each year Action is on the higher end, so you're seeing those in the mins and maxes. You can also do it for a selected area, so if I come in here and select it, it's then going to show me what those are. So Label is really useful, really helpful. Let me get rid of that really quick. You can also do it where the lines end: Line Ends is at the beginning and the end, and you can take that away or put that back on. So labels are really important. At least, I don't find labels super helpful when you're doing things like genre, so when you're doing your dimensions. So I'm going to get rid of that, and I'm actually going to bring our Global Sales over here and label that. Right now I think it's labeling the line ends; we want to do the min and max. Now, if we do min and max on the table, it's just going to give us the min and the max, which are zero and 139.0. It's a little bit more useful
if we do it for each line. This at least gives us some context. I probably wouldn't do this in an actual visualization, but it gives you some understanding of how it works. So now I know that right over here the max for Action and for Sports is right around 138, 139, so it's pretty easy to see. You can again go in here and remove the max or remove the mins, whichever one you feel is best. You'll probably keep the maximums in there for each category. So this is really quickly becoming a pretty usable visualization. And that's not the only label that you can add. We still are using Year over here, so we can always drop Year in there as well and create a label. So now, let's see, this one is the Puzzle genre, and we also have the year that it had the maximum sales. Just some things that you can do; you don't have to add that. Now let's go up here and take a look at filters, because filters are really important. If you are making this for a client or for somebody else, you want them to be able to filter down to the very specific information that they want to see. So let's take Platform. Lots of different platforms, as you can see, PS4, Xbox, if you're familiar with these. We'll select all of these and click OK. So now this is an option as a filter, and all we're going to do is click on this arrow right here and say Show Filter. Right now all of them are selected, so every single one is being taken into account for this visualization. But let's say we come down here and say, okay, I don't want to see sales for any of these PlayStations: the original PlayStation, 2, 3, or 4. So I'm going to get rid of this one, this one, this one, and this one, and you could immediately see the changes that were happening. So now none of those sales are being accounted for and being
added to the sum of Global Sales right here at all. So that is just how a filter can work. You can also get rid of all of them and go in and pick very specific platforms, so if you only want to see the PlayStation sales, you can go in there and do that as well. So really, really handy. Filters are things that you at least want to have as an option for most of your visualizations; at least that's what I've found, especially when you're doing client-facing work. They like to get in there and mess around and look at it in different ways, so that's one that I think is really useful to have. The very last thing that we want to do is actually add this to a dashboard. Let's say we come right down here and add a new worksheet. Actually, we might change one more thing on that last one, but we'll just make a really simple one. We'll give it Genre, and we'll give it Global Sales as the rows, and there's this nifty button right up here, which is a sorting button, so I'm going to sort like that. I'm going to add Genre in just as we did, to give it different colors. Perfect. Now we have two really quick, different visualizations, right? What I want to do is just show you how to combine those. You're going to come in here and do New Dashboard; that's what this button is right here. Now, when we come in here, the size is extremely small. It's very easy to fix that: all we're going to do is click right here, go to this range, or this dropdown, and click Automatic. So now it is a much larger size for us to drop our visualizations into. Let's put Sheet 1 in, and let's put it up top. So now it looks a little bit like this. Not perfect, but again, if I wanted to make this look a lot better, I definitely would. Then you can go over here and rename these things. You can also do
that back when we were in our actual worksheets but you can also do it here as well and then start you know customizing it and building it out that's not what this video is for that is the last video we're going to build an entire dashboard it'll be kind of like a small project you put that in your portfolio if you have gotten this far and you want to jump straight into it and you don't want to wait for these other videos to come out or you just want to jump straight into creating an entire portfolio project I have an entire portfolio project series that covers SQL Python and Tableau and so go check out that series I have one video dedicated to Tableau it's like 45 minutes or an hour long and it covers a lot of the things that we're going to cover in here as well as a few other things but I appreciate you checking out this video in future videos we're going to be going over things like creating bins calculated Fields doing joins and then creating a final project and putting it all together so thank you so much for joining me I really appreciate it if you like this video be sure to like And subscribe below and I will see you in the next [Music] video [Music] what's going on everybody welcome back to the Tableau tutorial Series in this video we're going to be going over bins and calculated [Music] Fields all right so let's jump right into it the first thing that we're going to look at are bins and bins are basically just groupings or ranges of numerical values so we cannot create bins for genre name platform or anything like that we have to do something with this sign right here which means that it is numeric so year or all this sales data or this ranking data and we're going to use what we worked on in our very first tutorial and so what we're going to be using to kind of demonstrate how bins work is this year right down here so right now we have a range of 1993 all the way up to 2018 and we're going to create some bins to group and create ranges
for these years and it's pretty simple all we're going to do is I'm going to come right over here to year and this little drop down on the side and we're going to go down to create and go down to bins now it's going to say the size of Bin and it's going to give you a recommendation based off of the information that is already provided the Min and the max the ranges of these values you know you don't have to do this but usually it does give some good estimation on what you might be considering if you were thinking hey maybe do a bin of like 20 and they're recommending two think about why they might be doing that we're going to change ours to five and you can always change what this field is going to be I'm just going to give it an old exclamation point just to really spice things up here so we're going to click okay and as you can see it adds it right up here it is no longer a numeric now it is a categorical so now this is no longer just 1 2 3 4 five it's ranges it's groups and we're going to get rid of this year really quick actually let's keep it up there for a second see what happens but we're going to bring this up and we'll get rid of this year and this is kind of what it spits out for us now I did look at the data when I was prepping for this there are some nulls in the Years and so all we're going to do for this is we're just going to go like this and we're going to exclude the nulls probably not something you should be doing if you're doing this for work but this is for demonstration purposes so we can do it however we want but as you can see we now have these ranges so this range starts at 1990 and it includes 1990 all the way up to 1994 and then it's 1995 to 1999 and so just really quickly we can tell that the years 2000 to 2004 were a huge huge huge season or group of years for game sales so these are the global sales for these video games and so it is really helpful it's very useful you can do
this on a lot of different information we could do this on the sales data you can do this on age you can do it on years like we did and it can be very very useful and so really quickly that is how bins work I would say it's pretty straightforward now this is a perfect time to segue into the next part of the video which is calculated Fields right over here on this left hand side we see that the global sales which are in millions goes all the way up to 900 million and we created these beautiful bins right down here but let's look at Within These from 1999 to 2015 let's see which of these has the highest percentage of course it's going to be this one but we can do something called a quick table calculation we'll create our own calculation later I'll show you how to do that but we're going to do a quick table calculation and we're going to do the percent of total and so now we have these bins and instead of just seeing the total amount of sales that they had we see the actual percentages based off these year ranges which is really useful something that you could absolutely put in some real work that you do for a client now really quick just to show you something that you can do if you hold control and you drag this over here you can actually save that calculation so we can say percentage of global sales and that actually saves it as you know a measure for us so that was a quick calculation but let's look at how to actually create a calculated field so if we do this right here what is going to come up is just the global sales and you can do a lot of what you would basically do in Excel multiplication division subtraction a few other things but we're going to keep it super super simple today all I'm going to do is I'm going to take Global sales and I'm going to subtract I'm going to do an open bracket and I'm going to say EU sales and it auto completes for me I'm going to click okay and it created calculation 2 I'm going to come in here and I'm just going to say
Global sales minus EU sales and let's drag this over these are different one's percentage one is in terms of sum and so I'm just going to bring this in right here and so now we are comparing against the same thing and if we look at the global sales we have probably right around 950 million-ish in this 2000 to 2004 bin and for Global sales minus the EU sales we're looking at you know 650 million so there is a noticeable difference and this is just one of the ways that you can use calculated fields to actually just show the difference between two numbers or you can do more advanced calculations depending on the data that you actually have so that's it for this video I hope you learned a little bit more about bins and calculated fields in the next video we're going to be looking at a ton of different visualizations and graphs and charts and just exploring what options really are out there for visualizing our data thank you guys so much for joining me I really appreciate it if you like this video be sure to like And subscribe below and I will see you in the next [Music] video what's going on everybody welcome back to the Tableau tutorial Series in this video we're going to be looking at lots of different visualizations including the scatter plot and density [Music] Maps now before we jump into the tutorial I have some very exciting news in just two days on October 7th I'm going to be partnering with Alteryx to host a webinar this webinar is completely for people who are wanting to change careers to become a data analyst now you did hear that right I will be the host of the event but we will be bringing on guests as well who are industry experts who actually changed careers to become data analysts much like myself they'll be sharing their stories of how they actually transitioned careers along with the tools that they found extremely useful and helpful to make that switch and they'll be giving lots of advice along the way so if you are somebody who is wanting to
change careers to become a data analyst or just wanting to learn about data analytics this is an absolutely fantastic place to learn a lot more about that I will leave a link in the description so be sure to go and sign up for that again I'm going to be there so it should be really fun without further Ado let's jump onto my screen and start the tutorial now we are about to look at a ton of different visualizations over here you can see just an array of them but not all of them are ones that I actually think are useful or ones that I would actually recommend using and so I'm going to take you through some of the ones that I absolutely think are worth learning and using and trying out and I'm just going to kind of show you how I might use them how they might look how you can navigate them a little bit now before we do that we do need to go download one data set it's this Starbucks location worldwide yes we're going to do a little bit of longitude latitude here and all we have to do is click this download button and it will download we're going to do that into downloads we'll save that yeah I've already done that but you know I'm doing this with you guys I'm doing it for you so let's go to our downloads now we have it here we want to come in here we're going to copy it or you can cut it and then we're going to paste it here yeah replace it perfect and now we have it ready to go we'll come in here let's do a new sheet and I already have it in there but I'm just going to show you what I would do do new data source we'll do text file we'll do directory and we will open it and let's see what data we have in here before we actually begin just super quickly we have the brand so whatever company has it and then a bunch of location information street address City the state this is all in the United States so that's basically it and what we are going to do is we're going to go over to this sheet three and we have this directory 2 that's the one
I just pulled in exact same thing as directory so the first visualization that we are going to look at is a bar and line graph so what we're going to take is the year right here take these Global sales and these NA sales and we're going to be doing this one right here so this has a combination of two separate types of visualizations so sometimes you just have lines sometimes you just have these bar graphs or the bar charts and we're combining the two and it's very nice I like how this looks now if you notice if I put this NA sales behind it now it kind of cuts off so now this Global sales is in front we're going to you know put that back I just wanted to show you that right here there's all sum of global sales sum of NA sales so if we go into this all we do is click this drop down we can change it to a line we can change it to basically whatever we want I just hit control Z to reverse that but what we can do is we can go in here and we can change this color and let's see if we can just make it red is that [Music] possible see what I did I made it orange that works for me just something to stick out a little bit more choose whatever color you want and this is a really nice visualization this is one that I have used in the past we're looking at Global sales versus the NA sales and so it's very easy to see the distinction between the two and how one was doing in a specific year versus how the other one was doing in that same year so I really like this if you want to do something like keeping it consistent you can do two bars I don't really like this one as much and you can again you can really change it up there's lots of different ones that you can do again I prefer the line but you know do whatever you think is best I'm going to change it back because this is not how I want to keep it but there you go so that is the first one that we are going to look at let's move on to the second one and we actually will be using our Starbucks data
here now when you bring in data that has any type of map or address or postal code or things like that or country it's typically going to create this latitude and longitude it's going to generate that now what we want to do is bring this longitude right up here and this latitude right there and if you do the show me right now it's giving us this but what we want to do is add what we're looking for so what will we actually be trying to search for on this map you can do anything from like a postal code and it will drag us right here let's come over to this this allows us to kind of scroll around a little bit we're going to mess around with this one for just a little bit and let me see if I can that's nice that might be too big let me back up one so at least in the continental US a little bit down here these are the postal codes so right now we're looking at postal codes and there is a lot that you can do with this really color will make almost no difference it just becomes this mess so you don't typically want to do something like that at least not for this let's go to size and if we make it really small you can kind of see these groupings these pairings typically of like larger cities or major metropolitan areas and so you can do this and it's really really easy I don't recommend labeling this I don't even know if it'll do it it would be an absolute mess to try to label all these postal codes well let's bring this out and let's bring these State and provinces in now right now we have these little tiny tiny dots on here and I think what we want to do is not increase the size but over here we want to actually do this and make it a map so now it's going to fill in all the states we can you know why not we'll add some color here but we can it hasn't numbered I didn't think they were numbered oh that's interesting I haven't seen that I didn't look at that before I just found that interesting but now we can
see what states Starbucks is in and as you can see they're in all 50 states but it's something interesting to look at to think about now if we go right up here we can again choose a different type and we're going to go to the density now right now it's just doing a density on the state we're going to get rid of that we're going to bring back postal code I'm just switching it up on you a little bit and you can do it as small or as big as you'd like you know I like to do somewhere in the middle probably right about there is fine I don't think it's going to make sense to really add any color here again all these postal codes are different so it's just going to be a complete mishmash but this is kind of how you can use a density map and you can do this with countries you can do this with postal codes you can do this with any type of address or location based data so that is how you can use a map again there's lots of different ways to use a map and so I'm not going to show you every single way but in a really brief way this is how you can use a map to actually visualize your data that does have location based information in it so let's go over to sheet three and this data that we have over here it just allows for a lot of different types of visualizations so we're going to use this one and there are lots of other ones that you might see out there like this one right here we obviously wouldn't be using this we might do something like this change the label and maybe add why have both of these in here let's get rid of this oops that's not what I meant let's actually add that let's do the sum of global sales and we'll just make that into a label as well so what you can do with these and how you're able to use them and visualize them again you'll see these often but these are not often ones that I would recommend you use that's very similar to these packed bubbles you can add these Global
sales in here again add the label it just sometimes is not as straightforward the information that it's trying to tell you right you kind of have to search for it a little bit you kind of have to look around but you can find some good visualizations in here for very specific types of data and so these are just ones to consider one that you'll see all the time is this guy right here and let me see if I can expand this a little bit because this is very small let's see I just want Global sales and let's label that the size how do I expand this haven't done this in a while let me just expand this I don't use pie charts what is happening this is an incredibly large pie chart oh my gosh I am making this this is becoming a problem there we go and what I actually wanted to do was label the genre as well as I've been doing in all the other ones and we'll label this now look whether you are a fan of pie charts or not you have to understand that people use them some people just like how they look and for certain data it can do well for things that have a lot of different groupings or categories it usually isn't super great but it does give you some type of order of things give you a quick glance and people use them right so let's not pretend like it's like the Hideous stepchild all right people use it people have it in their dashboards and their visualizations all over so it's best to just know what they look like know how to do them know how to use them best again I'm not a super huge fan of it myself I've used it once or twice another one to look out for which again you can come over here and use is called a box and whisker plot it's good for these large distributions you know this is like the median upper lower I don't use these a lot but I know a lot of people who love them something to just look at or mess around with a little bit it's pretty I think straightforward and it does give
you some good insight into your data if you know how to use it now there is one last one that I want to show you I'm just going to create it on a new sheet make it easy we'll do year here we'll do sum of let's do NA sales why not and we are going to make this like this now it's very similar to a line chart but when we break it out by the genre and we add some color you know it's just a different way to visualize this information you can you know potentially add some stuff in here like some labels if you want to depending on how it looks for you but this is just another way to visualize the data so wanting to give you guys some options wanting to give you some things that you might want to look at if you haven't already used these before every single one that I've showed you are ones that I've at least used once this one I maybe have literally only used once but the first ones that I showed you the ones I pointed out as the ones that I really wanted you to know are great visualizations to learn how to use and learn how to make useful for the data that you have with that being said that is all that we are looking at in this video again I tried to keep it super easy just wanted to show you some different visualizations the data that you can use to get those visualizations and just some other options in case you wanted to get a little bit spontaneous a little bit out there a little bit funky to show your boss or something like that thank you guys so much for watching I really appreciate it if you like this video be sure to like And subscribe below and I will see you in the next [Music] video [Music] what's going on everybody welcome back to another video today we're looking at joins in [Music] Tableau now before we get into the tutorial I want to give a huge shout out to today's sponsor and that is Udemy they are having a massive Black Friday sale and so everything is about 85% off so if you've been looking at a course now
is the time to buy it if you are looking at learning and taking an actual full Tableau course there are fantastic ones on Udemy that I have taken myself so be sure to go and check out Udemy while they're having this huge sale I will include a link in the description if you want to check them out now let's get into the tutorial all right let's get started and first we're going to start off in Excel I'm going to kind of walk you through the data that we're working with and then we're going to put it into Tableau and I'm going to show you how to do all those joins in Tableau so the first table that we have is this demographics table we have employee ID name of employee employee age and employee gender now look right here because this will be important going forward in the demographics table we have 10 individuals and they each have an employee ID now when we go to the job title we have our employee ID employee name and the job title but this one is missing Ryan Howard is missing his employee ID and then the very last one there are only seven employee IDs and no names and so we're going to use all of that and I'm going to show you how to actually do the joins in Tableau Tableau does a really fantastic job of visualizing for you so it takes a lot of the guesswork out I am going to include a link to my joins video in SQL because these two are very closely connected and if you understand how the joins work in SQL you'll understand how the joins work in Tableau it's almost the exact same thing so with that being said let's jump over to Tableau so I'm going to pull this up going to go right over here and now we have where we can connect to our data and so we're going to click Microsoft Excel I'm going to scroll down here to Tableau joins file I'm going to open this up and I have it open so I can't use it so let me get rid of that and let's open it again perfect so now what we're going to do and I'm going to show you how to actually open up the
joins in a second but what you need to understand is when you first come here Tableau doesn't automatically allow you to use the joins they use something called relationships and there are joins on the back end but they call it relationships because they are inferring all of these things they're trying to go in and make that inference for you so it takes a lot of the work off of you and most of the time that works and you know you just plug these two things in here like a demographics and the job title and it is going to you know help you build those what they call relationships and you can click on this and learn how the relationships differ from joins again there's not a huge difference but it's not as customizable and you can't as easily do left joins or full joins or all these things that we're about to look at so I'm going to take this one off and what we're going to do to actually be able to look at the joins and choose what joins we want to use is we're going to do this dropdown we're going to click open and so now we are in a place where we can actually create the joins and again it's just much more customizable and so back when I was using it regularly I would use the relationships when it was pretty simple and straightforward cuz they almost always got it right but you know the joins it just makes more sense in the way it visualizes it for me so most of the time I'd be using the joins so let's pull over this job title right here and it's going to make this connection now before if you remember just about you know 30 seconds ago when it connected them it was just a line and so it gave us this option down here to kind of edit the relationship but now it's giving us this visualization and so let's click on it really quick and what is going to come up is the different types of joins that you can do you can do an inner join a left join a right join and a full outer join and then you can actually choose the
different data sources and how you're connecting them so again I'm going to walk through a little bit of this but I think the SQL video that I did on this shows it so well I would highly recommend using that and I recommend learning SQL too so you know two birds one stone so I'm going to get into each of the joins how they work what data is going to be displayed and these visualizations are really going to be helpful and I think that it's just nice that they have it because it's a little reminder okay you know this is what this join is or this is what that join is so super super simple so right now we have the demographics table and we have the job title table and so what it's doing right now and let's get rid of this what it's doing right now is it's doing an inner join and so it's pulling everything that overlaps if it matches on the employee ID and the employee ID and so right now you only see 1001 through 1009 but if you remember in the demographics table we had 1001 all the way through 1010 so where's that 10th one well the 10th one is not there and that is because in this job title employee ID it only went up to 1009 and then Ryan Howard just didn't have an employee ID in there for whatever reason so that data is going to be missing now when you are using actual data sets very large data sets which we will use in the next video when we walk through an entire project when you use large data sets this can be the difference between clean data and very wrong data and visualizing it correctly and showing completely wrong numbers and so you really need to be sure you understand how your data works together when you're doing these joins so how can we fix this how can we make it to where we can see all of the data well right now we're only making it to where if the employee ID is equal to the employee ID so we only are going to see through 1009 and through 1009 we're never going to see Ryan so there are two different types of joins that
we could do to make it see it and then there's something else that we can join on to where we can see that data the first that we can look at is the right join and what this does is it's going to take everything that is the same but also everything from this job title table regardless of if it has a match in the demographics table so it's pretty you know this visualization does it all it's going to show everything in the right table regardless and it's only going to show things from this table if there's a match so let's try this one and we should see Ryan Howard in the job title table so let's click on it and if we scroll down there's going to be null null null null null until we get to over here where we now have the data that we had in that actual table but again this wasn't a match and so we weren't able to see that data so this gives us a way to where we can see all of it everything from that right table this job title table and now we're going to click on the full outer now the full outer is going to take everything from both regardless of if there is a match at all and so right here you're going to see Ryan Howard and Ryan Howard now why are there two different rows for it well because in the demographics table there was an employee ID so we're seeing the employee ID Ryan Howard his age and his gender and over here there was no match right but in the job title table again this one didn't have an employee ID and so we are going to be able to see this data but over here it has no match and so that's why it's showing us two different rows because there was no connection there was no match there that's what a full outer join is going to do now just for the purposes of seeing what this one does as well we have the left table and now we are able to see the 1010 that we didn't see before and it's putting in nulls over here because there's no match so that is what we have so far now like I said just a second ago there is a way
that we can do this without using the employee IDs we're allowed to use a different join Clause now there is the name of the employee in both of them this one is called name of employee and in the job title it's called employee name they don't have to have the same column name in order to join it you can do whatever you want so I'm going to get rid of this one and now we are only tying it on the employee name and let's do an inner join and it should be basically everything except the only piece of data that wasn't filled in which is that 1010 over on the job title table and so this way was a slightly different maybe less thought of way because normally if there's an ID you go on the IDs but because we had a lack of data in one of the tables in the job title table we decided to use a different column to join on and now we're able to look at all the data together so super quickly that is an inner join a left join a right join and a full outer join and it's pretty easily visualized here and you're able to change what you're joining on right here but you can also do multiple so if we want to do the employee ID and the employee ID you can do that as well and you can keep going as many as you'd like and right here you can change some of these things there aren't a lot of use cases for this but you know you can absolutely do this and mess around with this as seen I'm not going to go through it in the tutorial because again 95 plus percent of the joins you're doing you're going to want to do it to where this equals this and if you want to get into where it doesn't equal or all these other things which is more complicated I think it's much better to learn that in SQL that's my personal preference and so again all in the SQL tutorial if you want to check that one out so you're able to join on multiple things now let's get rid of that one because we can actually bring in this salary one as well and what
you'll see right down here is that we have our employee ID and this is all coming from the demographics so employee ID name of employee employee age employee gender then right over here we have the job title table so employee ID job title employee name job title and then right over here is our salary table and so we have employee ID salary and employee salary so again this is a way that you can put all of this data into one place and in just a second we'll go into the worksheet right down here I'm going to show you kind of how it looks because it looks a little bit different than previous tutorials and so I want to show you how that actually all works together but again you can create these joins as well and do the exact same thing that we just looked at and customize the joins customize what you're joining on and then you have your finished product and so right now we have our demographics plus Tableau joins file and we can rename that if we want I'm going to call this demographics plus joins demo and click enter and so now that is saved so now let's go down to the go to worksheet we're going to click on that and so up here on our left side this may look a little bit different than it normally does because it's broken out on the measure names and the measure values it's broken out by the tables that they were joined on so we can pull in the employee gender now and we can pull in the employee name now and we can pull in the employee ID again if we want to from the job title table and we can pull in the employee ID from the salary table we could do that if we wanted to it makes no sense for actually creating any visualizations but you know you can do that and so you wouldn't be able to do that if you hadn't joined these together and so down here in the measure values the values that we have are from the demographics table and the salary table all of the stuff from the employee title
none of those things were values and so there are going to be no values down here and so really quick let's take the name of the employee let's take their salary sure why not let's order that let's take the employee salary we'll do color and expand this out a little bit maybe one more time oops just like that and there you go so that is how you do joins in Tableau and I think Tableau does a really fantastic job of making it pretty simple they have the different types of joins when you click on that join button and it shows you the inner and the left and the right and the full outer and they make it pretty simple and it's just really useful to be able to see that while you're creating it and see the output below like we just did a second ago it just makes it so simple to create those joins and then just keep going because you already know what your output is going to be and you can kind of mess around with it and make sure you're getting the data that you need in the very next video we're going to be doing an entire project in Tableau we're going to be using a lot more data and it's going to be a complete project that you can add to your portfolio and it's going to be a really good time so I hope that you join me for that one I appreciate your time I hope that this was helpful thank you guys so much for watching I really appreciate it if you like this video be sure to like And subscribe below and I'll see you in the next [Music] video what's going on everybody welcome back to the Tableau tutorial Series this is our very last video in the series and today we'll be doing an entire [Music] project now if you're watching this video I hope that you watched the other four videos in this series just so you can get the basics down you kind of know what you're doing this won't be a crazy hard project this is a beginner tutorial Series so I'm trying to make this super easy so you can follow along nothing super complicated I
promise. And if you want to go above and beyond and make a lot of different dashboards or try a lot of different things, there's a ton of data in here, so as we go through it I'll show you some of the things that I would be looking at and some of the different visualizations that I might do as well. But again, in this video we're going to be sticking to a lot of the basics. I'll switch over to my screen in just a second and show you the final product, and then we will walk through step by step how to build the entire dashboard. At the end you should have a completed project that you can add to your portfolio or just share on LinkedIn if you want to do that as well. With that being said, let's jump over to my screen and get started.

All right, so let's get me off screen and show you what we're going to be working on today. This is the final dashboard that we're going to be building, and it's nothing crazy, right? I'm sure you have seen all of these things before. I'm just going to help you build it out and show you what to do and the buttons to click, and it's really going to be a simple walkthrough. By the end of this you should be able to do all these things very easily. I highly encourage looking at the data and at these visualizations and seeing what else you can do with it. There are a lot of different colors and a lot of different visualizations that you can do with this data; I'm just showing you this one today. The more you go out there and do this on your own, mess around with stuff, choose different things, and see how it all works, the better you're going to get, so I highly, highly encourage doing that. What we are going to be working with today is an Airbnb data set. I'm going to show you the data in just a second, and then we're going to jump right into it. All right, so this is the data set
that we are going to be using. This is the Seattle Airbnb open data set. Let's scroll down really quick. There are three different CSVs in here, and this is some of the data that we're going to be working with: some dates on listings and some pricing, then the actual listing, which shows the street address, the location, the price, the bedrooms, all of that good stuff, and then there's reviews, which has some comments and talks about some of the reviews. So this is what we're going to be working with, but you don't have to go in here and download it. I have already combined all these CSVs into one and put it on GitHub, and I'll have a link below, so you can just click on that and you don't have to do all the stuff that I did to get this set up. Just so you know, this is from 2016, so this data set is a little bit old. If you want to, you can come right here, and I will leave this link as well, and you can get the data set from, what is this, a couple of weeks ago. They are continuing to update this, so it's always up to date, and you can go ahead and download these. But some of these are .csv.gz files, so you may need to convert them. I don't want to go through that process in the video, so I am just going to go with what is literally in Kaggle and use that. But if you want to have an updated one for your project, I'd advise you to go in here and grab it yourself, and that should be perfectly good. So go ahead and download the data set from GitHub and we should be good to go.

So this is the Excel file that I was just talking about. This has all of our CSVs in one place; this is an Excel workbook. Actually, let's start with the listings, because that's where it all stems from. We have our listing, and the data in here is really extensive; there's a lot of data in here, so let's go over it really quick. The listing refers to
the actual home that they're renting out, the Airbnb. So it shows the location, and there's a lot more location information over here; I'm getting into it in just a second. There's the neighborhood, the city, state, and zip code, all stuff that may be useful. There's a latitude and longitude. It shows what type of property it is, so that's really good. Right over here it has how many bathrooms, bedrooms, and beds. Sometimes a five-bedroom house has seven beds, so that's why there are those two different fields. I don't know if you're familiar with Airbnb and what they have on there, but that's just something to note. They have the price: this is the price per day, this is a weekly price, a monthly price, whether there's a deposit needed, and then a cleaning fee as well. So there's a bunch of financial data that's super useful. We'll go into it a little bit, but there's so much you can do with that if you want to dig into it. And that's kind of it; the rest of it is pretty useless, and there's a lot. There's so much data in here, and more than half of it by far is nothing you would put in any type of visualization. This is pretty common. You're not going to get data where you can use every column; a lot of times it's just a lot of useless junk, so you have to know what you're looking for and what's actually useful.

So that's the listing. Then we have reviews. Now, there's something a little bit confusing in here that you need to understand about the data. If you get a data analyst job, you need to understand your data, because it's very easy to come in here and say, okay, there's an ID field here and an ID field there, so that means those are the same. Well, not in this case. This ID field is actually the review's ID. Not the reviewer ID, which refers to the person; this is the review's ID.
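To make that distinction concrete, here is a minimal Python sketch of the two candidate joins between listings and reviews. The tiny tables, the column names (id, listing_id, zipcode, comments), and the inner_join helper are all made up for illustration; they only mimic the shape of the two sheets, and this is not how Tableau performs the join internally.

```python
# Hypothetical miniature versions of the listings and reviews sheets.
listings = [
    {"id": 101, "zipcode": "98134"},
    {"id": 102, "zipcode": "98101"},
]
reviews = [
    # "id" is the review's own ID; "listing_id" points at a listing's "id".
    {"id": 102, "listing_id": 101, "comments": "Great stay"},
    {"id": 900, "listing_id": 101, "comments": "Very clean"},
    {"id": 901, "listing_id": 102, "comments": "Nice view"},
]


def inner_join(left, right, left_key, right_key):
    """Return one merged row for every left/right pair whose keys are equal."""
    joined = []
    for left_row in left:
        for right_row in right:
            if left_row[left_key] == right_row[right_key]:
                # Prefix the columns so the two "id" fields stay distinguishable.
                merged = {f"listing_{k}": v for k, v in left_row.items()}
                merged.update({f"review_{k}": v for k, v in right_row.items()})
                joined.append(merged)
    return joined


# Joining id to id only matches where a review ID happens to equal a listing ID.
wrong = inner_join(listings, reviews, "id", "id")
# Joining the listing's id to the review's listing_id is the intended relationship.
right = inner_join(listings, reviews, "id", "listing_id")

print(len(wrong), len(right))  # 1 3
```

In this toy data the id-to-id join still returns a row, but it pairs listing 102 with a review that was written about listing 101, which is why a coincidental match is worse than no match at all.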
This listing ID is the actual listing's ID right there, so that's really important to note. And then they just have their comment there, what they left as a review. Then on the calendar (I don't know why I'm scrolled down) we have this listing ID again, and again that listing ID is equal to the ID in the listing table. We have a date and a price, so this refers to a specific location, and on this day they got $85 for it; somebody rented it out. Then there are these T's and F's. Let's try to find a blank one really quick. Here's a blank one. The T means that it was taken and the F means that it's vacant. I don't know exactly what T and F stand for, but we can deduce that much from this, and so you can see when and how much this person or this home made in that time. So there's really good data in here and a lot to work with. I'll give you a little bit of a use case for it in a second, and then we're going to start building out some of the visualizations for that use case. Again, you could have 20 different use cases for this data, or more, honestly, where you build out different dashboards and different reports with just this data, but we're doing a pretty general, broad project, and it's hard to answer all of them. So let's jump over to Tableau and get started, and we are going to build out everything.

All right, so let's come right here. This is a Microsoft Excel file, so we'll open that up, pick this one, open it, and give it just a second. It says it's executing the query; it's pulling the data in. All right, so we have our calendar, our listing, and our reviews. Those are the different tabs at the bottom. We're going to start with the listing; this is the main one. I didn't show you, but there are about 3,600 locations that they had in
there. Let's just have it update automatically; I don't know why we need to click on that. So we have listings, our calendar, and our reviews. What we're going to do is come in here and open it, as we did in our very last video for the joins. Now that we've opened it, we can go in here and do the joins as needed. Let's start with calendar and put it right there. That was super slow, I apologize. All right, let's wait for it to get the data and set everything up. I did not think it would take this long, I apologize. No, take your time.

So let's click on here. Right now it has the join based on the price, which obviously is not going to work. If you remember, there is no ID in this calendar, just the listing ID; we can actually look right here, there's just the listing ID. So we're going to set listing ID equal to ID, and right down here, well, you can't see it, but it shows that there is a lot of data, so we know that is now pulling in data correctly because it's showing up down here. That's a good thing. Now, in listings there are about 3,600 listings, and all the data that's in listings is going to be in there. But on the calendar, because we converted from a CSV to an Excel workbook, it isn't able to store as much information, so some of the rows in calendar may have gotten cut off. We can just keep this inner join, because we know that if it's in listings it is going to be in calendar. There may be some in calendar that aren't in listings, so if we really, really wanted to, we could do a full outer join or something like that. I haven't really thought through this as I'm talking through it, but we know that everything that's in listings is going to be in calendar, and
so we don't really need to do anything other than an inner join. We can also pull in these reviews, and it's going to do the same thing as before, where it's pulling in the data and it defaults to ID equals ID. Now, we know that is not correct, because the ID in here refers to the review ID. We need to go to the listing ID, so we need the listings ID to be tied to that listing ID. If we join on the ID, it goes down to 2,555 rows, and that's just random luck: there happen to be some numbers that are in both fields that tie together. If we do the correct one, where we pick the listing ID, it bumps it up to, I think, 2,373,000, oh, maybe more than that, 23 million rows, right? A lot, lot more. So it's super important to get these joins right and tie them together on the right fields. If you just go with what Tableau tells you, because it has that automated step where it goes into these fields and says, okay, these have the exact same column name, so they're most likely what you're looking for, well, it was incorrect at this point. So it's really important to check those things and make sure you're pulling in the right data. Again, we're going to keep that inner join. If you wanted to, you could try to see if there's any other data that correlates. We're keeping it simple today, but sometimes you need to join on multiple fields, so that's just a tip.

So let's get out of here, and we are good to go. This is our listings plus Tableau full project; that's what we'll be working with. We were able to tie all three of these tables, or sheets, or whatever you want to call them, together. So let's go over here to our first worksheet. All right, so this says Tableau Public only works with less than 15 million rows of data. We have 23 million rows of data, so that's a problem.
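One likely reason the row count gets that large (this is an assumption, not something the video confirms) is that joining both calendar and reviews to listings repeats each listing once for every combination of its calendar rows and its review rows. The sketch below just counts the rows such a join would produce for made-up data; the function name and all of the numbers are hypothetical, and it assumes each listing ID appears only once in listings.

```python
from collections import Counter


def inner_join_row_count(listing_ids, *child_key_columns):
    """Count rows produced by inner-joining listings to each child table on listing ID.

    Each child_key_columns entry is the listing-ID column of one child table
    (for example calendar or reviews). Listing IDs are assumed to be unique.
    """
    counts = [Counter(column) for column in child_key_columns]
    total = 0
    for listing_id in listing_ids:
        rows = 1
        for child_counts in counts:
            # A listing with no match in a child table contributes zero rows.
            rows *= child_counts[listing_id]
        total += rows
    return total


listing_ids = [101, 102]
calendar_keys = [101] * 365 + [102] * 365   # one calendar row per day
review_keys = [101] * 10 + [102] * 40       # a different number of reviews each

with_reviews = inner_join_row_count(listing_ids, calendar_keys, review_keys)
without_reviews = inner_join_row_count(listing_ids, calendar_keys)

print(with_reviews, without_reviews)  # 18250 730
```

Two listings with a year of calendar rows come to 730 rows on their own, but 18,250 once every review is paired with every calendar day, so a few thousand listings can plausibly reach tens of millions of rows.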
And when I did this before it didn't do that, so we're going to work through this together. This is date from reviews, I believe, and this is date for the calendar, which is going to be a lot of rows of data, so I'm sure that's part of it. Let's see. Let's do years; we only want 2016. Okay, let's see what that does and whether it gets us under what we need. We only want 2016 data anyway, so if it's in 2017 we were going to take it out. If this ends up taking like 20 minutes I will just cut it, and you won't have to wait as long as I'm waiting. Let's see how long it takes.

All right, so it took about 20 minutes and it did absolutely nothing. One thing I do know is that we don't actually use the reviews table at all; it was just for demonstration purposes. So we're going to remove that and see if that helps us in any way. If it does, we're just going to keep it as is. The reviews table is really just for demonstrating how to do the join; we weren't actually using any of its data for the visualizations, although you could. Again, I'm going to see how long this takes, and I'll cut ahead.

All right, so that worked perfectly. It apparently took out all the rows that we needed to get under that limit. Again, I was just using it to show you how you needed to change the columns in that join to make sure it joined properly. We don't actually use it for any of the visualizations, so the end product is going to be totally fine. I don't know why this didn't happen to me when I created this whole thing already, so we're just going to move forward, because I make mistakes. Let's keep moving. The first one that we are going to make is that colorful one. I'll probably pop it up on screen so you can see it, well, if I remember, I'm going to pop it up on screen. It's the
colorful one; it's the price by zip code. So we're going to be looking at these zip codes to see how expensive each zip code is. And before we actually start, I just remembered I want to talk to you about the use case for this data. I want you to imagine that you're working for somebody, and they're like, hey, I want to start an Airbnb business. I want to know where I should go. Where should I buy a home, put it up on Airbnb, and start renting it out? Where's the best place? What are some of the factors that I should be looking at? So that's our use case. Some of the things that he cares about are bedrooms, location, which is really important, and price: how much money can he charge? He's trying to optimize that to make sure that whatever rental he gets, he can make the most profit from it, instead of choosing something that he thinks would work but in the end doesn't make that much money. So those things are important. That's our use case: we're trying to help this guy find a really good Airbnb.

So let's take a look at these zip codes real quick. We have quite a few of them, and there's one that's null. If a listing doesn't have a zip code, we'll just exclude it, because it's not going to show up on these visualizations anyway. We want to look at the price, which should be down here, and not the sum; we want the average price. Let's order that. This is great. So this is the most expensive one, zip code 98134, at $206 for the average price. Let's give that some color really quick. Where's the zip code? It's up here. So let's take that zip code, put it right over here on color, and it's going to give it some assorted colors. Now these
colors, when we do the map in just a little bit, will match what we're doing there. I like to try to color coordinate things, but we're not going too crazy with the colors today. So this is our very first visualization. Congratulations, it is complete. We can label this one, and we can just do price by zip code, and I'll make that bold; I usually like it bold. We'll apply, and boom, the first one is done. This is our starting place to say, hey, person who's looking to buy this Airbnb, here are the zip codes where they are able to charge the most for their Airbnb.

So let's go over to the second sheet, and we are going to be doing the map. A map is pretty easy once you actually get the data that you need, although there's a lot of different data that you could use for the map. You need something that shows the location, and there are a lot of things that show location in here. In fact, they already provide a latitude and longitude, and then at the bottom there's a generated latitude and longitude from some different fields, and then there are states, zip codes, and I think another one, yeah, country. There's a lot of location data in here, so which one do we want to use? We want to stay consistent. We don't want to deviate and start using different longitude and latitude coordinates, because that could throw off our results completely. We want to stay consistent with what we're using, so we actually want to use this zip code. When we pull it up here, it's going to show these zip codes, but right over here we're going to click on this one, and now it's going to separate them out. So now we have all of these separated out. What you might get when
you first do this is that it might look like this, and you may have to zoom in. That's what happened to me when I first did it, so know that that may happen. We want to set the colors the exact same way that we did before, so we're going over here, we're doing color, and these colors should match up with the other ones. Let me exclude this. Let me see if it does: 98134, that's the blue, and right over here, 98134, that's a blue. I believe they are going to be the same. Yep. And just scrolling back, if you look at the zip code on the far right, they are the same. I just wanted to make sure I'm not going crazy before I get into this and realize I'm not correct at all.

Now, this doesn't really give us any information. If I were just to glance at this map, I would have no idea what you're trying to show me, so we want to show some actual information. The first thing we're going to do is add the label to this so that you can see it. In the dashboard, when we create it, you can click on this, but if you just want to do it visually without having to click anywhere, you'll be able to see, okay, 98134, that's right here. So this location right here is able to charge a lot of money; it's probably a really nice neighborhood. And we can back that up by putting in the average price, so these two visualizations really go hand in hand. We're going to add, oops, not the sum; this one needs to be the average. So you go to this measure, change the sum to average, and there you go. These should match, so this should be 206.125, 206.000, so this all matches. We can actually change that size a little bit if you want to
get it within each of these things; adjust it as you see fit. I think that's fine right there; no need to mess with it anymore. All right, let me see. I think that is everything for this one. I don't know if I want to add anything else. No, I'm going to keep it how it is. So that is our second visualization. Again, these two are directly related; they're just different ways to visualize the same thing. On this one you can see where it actually is on the map along with the average price; on this one you can see it from highest to lowest. Sometimes when you're doing these visualizations you're going to have these accompanying visualizations in your dashboard. That's very normal.

So let's move over to the third one. For this third one, something that our guy was looking at is, he's like, okay, I'm thinking about listing it on Airbnb, but I also want to live in it, so I want to know the best times to put it on the market for people to use. And I was like, okay man, no problem, let's take a look at when people are spending the most money on Airbnbs. We actually had that calendar, if you remember. Let's see this calendar: we have available, the date, the listing ID, all of that stuff. Let's look at the date in here. We obviously don't want it like this; we want it to be more of a time series, and we're going to be doing that based off of the price from the calendar. So let's see if we can find that really quick. Okay, here's the price. Where is that calendar one? Let me see. Okay, there's the calendar. Oh, here. I totally forgot where that was supposed to be. Ooh, that looks terrible. Okay, let's start working on this, because this needs some work, obviously. This is the worst visualization I have ever seen, so we need to work on this a little bit. What we need to do is we need to change, oh
whoops, we need to change the way that these dates are shown. Right here, these are two separate things. If I go right here and do it by quarter, it's just going to change to the quarters here, right? That isn't really helpful. We actually want to keep the year here. We want to separate it by year, but within that, let's just try week and see what it looks like. Okay, this is great; this is what we're looking at. Again, if we went back and changed this to quarter and then changed it to week, it would show the quarters, but it wouldn't show everything, right? This isn't all the data that we need, so you really need to make sure that you're doing this correctly. By default it's almost always year, but let's say somebody comes in and says, hey, I want to break these out by quarters and not year over year; that's how you would do this. But within the year we want to break it out by week.

You see this huge drop-off at the end. That is actually because the data doesn't go past that. There's just one week of data in here with actual January 2017 data, so it drops off because this is the sum, and it only adds up to about 591,000 compared to about 2 million. So we want to get rid of that, and how do we do that? Let's see, I think it's filter. Is it format? No, it's not format, what am I thinking. Bear with me. It's a filter. I was looking for it, I just couldn't find it. Let's bring it back to the 31st and see if that fixes what we need. Perfect, that's all you had to do. Oftentimes you'd have several years' worth of data in here, and then you could even do something like this one, where it has multiple lines. The reason that this is helpful is
because if I'm telling my friend (I'm going to say it's a friend or business partner, whatever you want to use this use case for), I'm going to tell him, hey, from the beginning of January all the way until even February it's really low; it's half. There are not a lot of people traveling then, because everyone travels at the end of the year, in November and December, for the holidays to visit family, and then in the summer for vacations. Just based off this one thing I would say, hey, over the summer and then at the end of the year during the holidays, that's when I would be renting out your Airbnb. So just this one very simple visualization can help him understand the best times to do that. That may be intuitive, and you may have already known it, but you can prove it with the data, which is always really helpful. Let's see, is there anything else that we need to do with this? I'm just going to label it, and I'm going to say revenue for year. Let's do bold, apply, there we go. Did I label this last one? I didn't. Let's label that last one, and we'll do price per zip code. We'll just keep it at that, keep it simple.

All right, I believe we have two more. We've done three of them: we've got the zip codes and we've got the time of the year. Now, something else that he was wanting to know is how things affect the price, and something that's going to affect the price of the Airbnb is the number of bedrooms. The larger the house and the more bedrooms, the more it's going to cost, typically, so we can take a look at that. Let's pull in these bedrooms, and that will be our columns. No, it won't. I knew this was going to happen; I just forgot it until right now. This right now is actually a value, right? It's a number, and that's totally reasonable
because if we go right here and do count distinct, there are only seven values, right? It goes zero bedrooms, 1, 2, 3, 4, 5, 6, all the way up to seven bedrooms. Right now it has it as a numerical value, and we want to change that so it's treated like these measure names, not a value. So we're going to remove this, go right down here, click this drop-down, and say convert to dimension, and now we're going to add it as a dimension. There, that looks much more normal. I'm going to keep these in here for a second, but we're going to get rid of these nulls and zeros, because if a home has zero bedrooms, that's a problem. We want to look at the price again, so let's go down here in the listings; it should be the price. Now, this is the price for the location per day. If you want to look at monthly or stuff like that, they have that data, but we're just going to do the price, the average price, not the sum.

Although the sum is helpful, so just really quick before we change it: this is going to show you which ones are bringing in the most money. It also may show you which ones are the most common. Those are all different visualizations that we can do. The one that brings in the most money has $63 million worth of listings when they all add up, so those one-bedrooms are doing phenomenally. Half of that is two-bedrooms at 30 million, then three-bedrooms at 18 million, and so on and so forth. So there's a ton of one-bedroom ones. We could even keep that in there if we wanted to, and we do something similar later, but you can keep something like this in there. What we will do really quick, though, is the same thing that we've been doing, keeping average, and we are going to get rid of this, because if it doesn't have the bedrooms, that's not helpful to us, and if
it has zero bedrooms, that's genuinely a problem. I will not be renting an Airbnb with my family that has zero bedrooms in it. So now we have this, and it would be really helpful to be able to see the number in the visualization; it's just kind of hard to read as is, and it does not hurt to add that. Right here, do a label. Why is it angled like that? Maybe I just need to move it out more. That looks much better. That's the average price. That cannot be right. That's the sum, that's why. So let's go over here and make that average as well. Much better, because if the price was $3 million for a three-bedroom, I would not be going there.

So this is really useful information for our friend, right? If he wants to get into that one-bedroom area, he's not going to be making a lot of money. It may be low cost up front, but he's not going to be making a lot of money. It goes up significantly when you reach these five- and six-bedroom homes, which makes sense: if it has five or six bedrooms in it, it's probably a really large, really nice home, and you can charge a lot more money. Our friend is extremely wealthy and can buy whatever he wants, so he may be looking at these larger ones, seeing that there's a much higher return on his investment the more bedrooms he goes. So we're going to keep it just as it is. Let me see if there's anything else that we want to do with this. No, we're going to keep it just like this.

The last one is by far the easiest, and we actually just discussed it a little bit. We want to know what his competition looks like for the bedrooms specifically. So let's go back up to the bedrooms. We want that one to be right here in our rows, so we show these, and then we just want a count of how many listings there are. We can do that via the listings ID. So here's our listings; each ID represents one location or one home, so we're going to do that
right here. That looks absolutely terrible. What am I doing wrong here? Let me see. One thing we need to do is get rid of these nulls and zeros, so let's do that really quick. And then we don't want to do just the ID, because, I'm realizing now what I'm doing, I need to convert this to a numeric so we can do a count on it. So let's, oops, let me see what is happening. This is terrible. All right, let's put this back. Let me see if I can just do an attribute. Let's do the count, and let's do text. No, it needs to be a distinct count, because that's basically a count of the distinct numbers themselves, not each individual occurrence of an ID. Okay, it took some figuring out. I'm going to keep that in there, because a lot of you guys like seeing when I make mistakes; it makes it feel like when you make mistakes it's okay, and I'm all about that. So I'm leaving that in there so you guys can see me fail a little bit. I just forgot how to do that for a second.

And this is exactly what we're looking for, right? It showed us this in that visualization that we were looking at earlier, before we switched it to the average price. This is showing us that there are 1,800 one-bedrooms, 483 two-bedrooms, 206 three-bedrooms, 55 four-bedrooms, only 20 five-bedrooms, and 5 six-bedrooms. So the more you go up, the fewer there are and the less competition there's going to be. Now, is there a lot of demand for four-bedroom, five-bedroom, six-bedroom homes? That's for our friend to figure out. Well, maybe we'll help him out with that later with the data. We could look at the reviews that we had; there's so much data in here, and we could absolutely figure that out. But for what it's worth, we're giving him this initial stuff, and he'll have follow-up questions for us later. That's how it always works, I promise. So now we're good with this one. Let's label this one. Did I label the last
one? I will go back and look. I'm going to butcher this one: I'm going to do distinct count of bedroom listings. That may not make sense at all, but we're keeping it. So we're going to do bedroom, apply. Okay, let me see if I added the label on this one. I didn't, so let me do that real quick. We'll do average price per bedroom. Again, I'm just going with whatever is coming to my head; this probably wouldn't be what I would keep for an actual project, but it works for now.

So we have our five visualizations, one, two, three, four, and five, and let's create our dashboard. That's going to be this button right here, so we're going to click that. We are going to go right here and say automatic, because we want to use this entire area. Now we're just going to start pulling them over, and I'm going to start from the very first one and go to the very last one to keep it really simple. So this very first one, we'll pull it over. It's going to take up the entire space until you start adding all the other ones. We'll include this one right here, and let's leave it as it is; we'll adjust it once it gets to its final place. Now we have number three. We'll add this one on this side. It looks terrible right now, but give it a second. Then we have number four; we're going to add that across the top. Okay, it's already starting to look a little better. You don't have to keep this in here, but you definitely can. Let's start to adjust things a little bit. Oops. Okay, let's see if I can zoom in one more. Nope, I'm going to do it just like that. Actually, let me see if I can make it even just a little bit closer. Perfect, that's the best you're going to get. If you didn't see, I used this magnifying glass, and then I could click on the area that I wanted to see. So we're going to keep that just like that. We're going to move this over, because that is definitely not as important,
um and then we're going to move this way over as well so keep it just like that again this is something where if you want to you can click on this um it didn't I don't know why uh I can't remember how to get those connected but it's you definitely can um but okay I was just clicking on the wrong one that's why that is why but you can click over here and you you know it'll filter um based on so if I go to this one oops [Music] dang oh jeez what am I doing oh this is a travesty okay let's try to get this back all right I'm not touching it guys you get the gist you can mess around with it yourself I'm not messing this up okay so the next thing we need to add is the very last one that's going to go right up here and then we're just going to kind of move it off to the side and let's see going add yeah have this caption um if you've never seen something like this before um and I actually want to make this bigger as well oh jeez give me a second it's it's kind of lagging a little [Music] bit and make this a little bit tall maybe I don't want it as wide but I definitely want a little [Music] taller give it a second yeah let me scooch this [Music] back just like that that's fine uh we can keep it like that in my original one I didn't have this um um you can get rid of this if you want you know you can um you know just exit out right here if you want to do that but there you have it uh this is the entire thing so we started from the very start um we started with this one then this one uh did some um and this is you know all the zip all of our ZIP code work then we took a look at the calendar where we looked at the price and did some time series visualization and then we're looking at the bedrooms and and the count of bedrooms and so this should be really helpful for a friend it should be an initial dashboard to get him going and once he sees us he's going to have a million other questions and he's going to want another dashboard for different data that's in there he's going 
to ask about okay well what if I want to do it weekly or you know I want to rent it out for the month or you know how many um five star reviews are people giving on you know one bedroom two bedroom three bedroom these are all things that you know he may ask and then we'd have to build out in the real world this is what happens all the time you know they make a request and then they're like oh this is great but I also want this so um you know your friend is going to be right in line with just about everyone else um that has ever gotten a dashboard uh for work or for personal use with that being said this is it um we have done the entire thing now if you want to share this it is super super easy to share um and I'm going to try to remember how to share it uh so we're going to do save to Tableau Public as and we're going to do this and we're going to make it um let's do Air BnB is it like is it a capital B is it like that no that doesn't look right Airbnb uh we'll do full project and we'll save and that is being created right now um and I will save this so if you guys want to go look at this you can um and I'll provide a link in the description as well for that and see if yours looks um similar to mine or better than mine give it a second cause it's thinking all right so here it is so here's our final project um and if you followed step by step then you should get this exact or very very similar to this one again I encourage you to if you want to have the up-to-date data to go to that um link in the description that has um the most recent data and they update that I believe monthly so you can go there get the most recent data and then you can do stuff and you can create a beautiful project just like this um but with the you know the most recent data again I use the Kaggle data just so you guys can remember and I encourage you to look at the different data points that are in the Excel there is so much in there and you can use uh honestly like
there's probably 30 or 40 other fields that you could be using in there that we never even touched um but for this project we're keeping it pretty simple and so go do that make completely unique dashboards and visualizations and create projects and add it to your portfolios so that you can create uh a fantastic portfolio website and get a job and that's what this is all about um it's about upskilling and getting these skills so that you can you know get a job or do better in your job so I hope this has been helpful I really appreciate you guys joining me and doing this entire project with me I have no idea how long this is this could be like an hour for all I know um so thank you so much for sticking with me this entire time if you like this video be sure to like and subscribe below and I will see you in the next [Music] video what's going on everybody welcome back to another video today we're going to be starting our powerbi tutorial series now I am super excited to start this series with you guys we are going to be breaking this up in about six or seven videos I don't really like those super long videos where it's like four hours long I like breaking mine up into chunks so that's what we're going to do this is the beginner series and so we're going to start with the very basics and we're just going to work our way up and I'm going to walk you through every single step of the way it'll be very easy to follow everything will be provided for you so that all you have to do is really follow along and by the end of it you should know powerbi a lot better you should be a lot more comfortable using it now before we actually jump onto my screen I want to give a huge shout out to the sponsor of this video and that is udemy you guys know that I absolutely love udemy I've been using them for years and that is no exception when it comes to powerbi I have taken some of the best powerbi courses ever on udemy so I highly recommend you checking out the ones
that I have in the description these are ones that I actually took and I loved the most so if you're looking for a full powerbi course I highly recommend checking out udemy thank you so much again to our sponsor and now without further ado let's jump onto my screen and get started with a tutorial all right so the first thing I'm going to do is download powerbi desktop I will leave this link in the description so you can just click on it go to it and download it we're going to click this download free button and once we click it you can go to the Microsoft store and I already have it downloaded so when you see it uh it'll already say downloaded but um for you you can go in here you can click download and it will download it for you I'm on Microsoft uh but it may look a little bit different for you if you're on a different system but once that is done we are going to open up powerbi so let's go right down here to our search let's go to powerbi and it is going to open up for us all right so right away this is what it's going to look like when you open it and we're going to go right over here to get data and let's click on that it's going to open up this window and it's going to give us a lot of different options for where we can get data from now some of these are free and some you need to upgrade for but just taking a quick glance through here you have a ton of options there's databases there's um you know blob storages there's PostgreSQL or different SQL databases um there's Google analytics there's a lot of places and you can go through the process to connect to that data and you can pull that data in from those data sources now for what we are doing we're just going to be using an Excel I'm going to leave the Excel that I'm going to be using in the description you can go and download it and walk through this with me so what we're going to do is click on Excel workbook and we're going to click connect so we're going to go right here in our powerbi
tutorials folder and we're going to click on apocalypse food prep so let's click on that and it is going to connect and pull that data in now right here we have our Navigator and so if you had a lot of different sheets you can click on that and choose which ones to pull in I just clicked on it right over here and we're able to preview the data but I can't load or transform it yet I need to select which sheets I'm bringing in so we only have one that's the only one we're going to bring in so you can go ahead and load the data or you can click on transform data it's going to take us to powerbi power query which is going to allow us to transform our data so I'm going to have an entire video on how to transform the data but I'm going to give you a really quick glance at it to kind of show you what it is so right up here it says our power query editor this is the window to basically transform your data and get it ready for your visualizations now you can do this in Excel if you want to and do that beforehand or you can do it here and there are lots of things that we can do in here as you can see at the top again I'll have an entire video dedicated to just power query but let's take a quick look at the data and see if there's anything we want to transform quickly before we actually go and start building our visualizations so over here we have the store where we purchased it we have the product that we purchased the price that we paid and the date that we bought it now the first thing that jumps out to me is that this just says date on it um we might want to say date_purchased and we're going to hit enter and if you noticed right over here on these applied steps it says renamed columns everything that you do every single step that you apply to transform this data is going to be right over here and if I go back and I say you know I really didn't want to rename that column I can just click X and it is going to get rid of that and take it back to
its original state so again I'm just going to say purchase and we're going to enter that now this is our apocalypse food prep so this is food that we are buying for the apocalypse um for this example and if we look at our products we have bottled water canned vegetables dried beans milk and rice and all of that stuff makes sense except for the milk milk will not stay or last long in the apocalypse so I think what we're going to do is we're going to filter that out really quickly and we're going to click okay and right over here again it says filtered rows and so now if we scroll down there's no milk so what we are going to do is we are going to go over here to close and apply and it is going to actually load the data into powerbi desktop so on this left-hand side it immediately takes us to the report tab and what we want to do is go right here to the data tab and take a look at our data so again there's our date purchased and as you can see the milk is not in there another tab that we're going to take a look at um and again in this report tab this is where we actually build our visualizations the data is where we can see the data and change it up a little bit and change some small things about it like sorting the columns or even creating a new column and over here we have this other tab and it is called model and this is especially useful when you have multiple tables or multiple excels and you need to join them to kind of connect them together we don't have that but in a future video I'm going to walk through how to use this entire tab so now let's go back to the data tab and I want to just look at the data really quickly before we go over to the report tab and we start building our first visualization as you can see I've been buying these different products in different months so this rice I've been purchasing in January February March and April and I've been buying it from three different locations because I wanted to see if I was spending less money at one
location on all of the products so then I would just shop there in the future and save a lot of money or if there were specific products that were really cheap at one location but others they were cheaper at a different location so I should just buy like the dried beans at Costco but everything else I should be buying at Walmart and so that's what we're going to look at in just a little bit so let's go over to the report tab right up here at the top there's this data section so you can kind of choose if you want to add any more data now that we are here we can also write queries or transform the data like we were looking at in the power query editor window over here in the insert we can add a new visualization or a text box and then in the calculation section we we can create a new measure or a quick measure and then over here we have share where you can actually publish your report or your dashboard online now over on the visualization section on this far right this is a very important area this is where a lot of the actual creating of the dashboards happen so let's take a look really quick and we'll get into a lot of these things as we're actually building our dashboard so we're not just sitting here looking and talking we're going to be actually building and doing all right so we're going to click right here on this drop down on sheet one it's going to show us all of our columns now two of the things that we wanted to look at were where are we spending the least amount of money buying the exact same product that'll help us determine where we want to shop and the second thing was should I be buying all my products at the same place or are there certain products that they're going to be cheaper at a specific store and I should buy it there so let's start out with the first one which we're just going to see uh with the store and the price uh where we're spending the least amount of money and just at a quick glance we can see we're spending the least amount of money 
at Costco at $210 versus Target 219 and Walmart at 225 and that really answers our question but we want to visualize it better be able to see it in an easier way so we're going to go right over here and we can click on a lot of these but the one that probably makes the most sense is the stacked column chart and it's going to show Walmart Target and Costco now they're all the same color let's add a legend so we're just going to drag store over here down to this Legend and let's make this larger while we're working on it so now we can see we're spending the most amount of money at Walmart right in between at Target and then at Costco is the lowest and so right there we know that Costco is the place to go for our apocalypse food prep but is it going to be that way for every product I don't know let's take a look let's put this up in this corner and let's start a new one we're going to need to select the product for sure and the price and probably additionally the store as well and let's click on let's not do this one we need a clustered column chart that's what we need let's bring this over here let's expand this quite a bit and so really at a glance this is giving us everything that we need we can see each product right here and we can see how much we're paying per store and so for rice we're paying it looks like a lot more for our rice at Walmart while at Target is actually where we are paying the least now if we look at all of these it looks like for Costco the only one that we're really paying a lot more on is on our rice but for our dried beans our bottled water we're paying quite a bit less and really it's pretty negligible for these canned vegetables we're paying maybe what 60 cents 50 60 cents more per can so that's pretty negligible but for the big ticket items um we're really spending a lot less at Costco if we wanted to save just a little bit more money we could go to Target for our rice now if I want to make this more like a dashboard and we're only
keeping these two things I'm going to kind of size them kind of like this whoops going to show you that in a little bit I'm going to size them a little bit like this so now that we have that looking good we want to change the title of both of these so what we're going to do is go over here in our visualizations and format your visual uh and we are going to go to this General go to title and now we can name it anything we really want for this we're going to say best store for product and while we're in here one other thing that I wanted to do is I want to go to this visual go right down here to these data labels now we haven't added any data labels so I'm going to click on it and you'll see exactly what it does uh it just puts the labels and the numbers above it so you don't have to actually like hover over it and see what it is now it is actually rounding these numbers so what we're going to do is go down here we're going to go down to values and we'll go down to display units and it's on auto so it's auto rounding those numbers and we're just going to say none so we can see the actual value of these numbers and we can do the exact same thing over here it probably is a good thing to do um and it just is going to visualize it a little bit differently in here but you can always change that if you want to go over here to title and we're going to say total by store and now we're going to take a look and so in a matter of minutes we were able to take our data from an Excel put it into powerbi transform it a little bit then we're able to create these visualizations that gave us concrete answers to some very important topics we now know that Costco is the place to go for basically every single product except if we're buying rice and if we want to save just a few dollars we're going to head over to Target and that's genuinely going to change my shopping habits for the next several years until the apocalypse happens so in future videos we're going to dive into a lot of the
things that we looked at today but just in more detail and then at the very end of the series we're going to have an entire project where we really use every single part of powerbi and create a beautiful dashboard and so that's all we have for our very first video in our powerbi series I hope it was helpful if you like this video be sure to like And subscribe below and I'll see you in the next video [Music] what's going on everybody today we're continuing our powerbi tutorial series and in this video we're going to be looking at Power [Music] query Now power query is really great because it allows you to actually transform the data before you actually get it into powerbi so if you want to make any changes like adding or deleting a column or changing the data type or a ton of other things you can do all of that in power query now without further Ado let's jump on my screen and get started with the tutorial all right so before we jump over to powerbi and start using power query I wanted to take a look at the data and this is the Excel from our last video called apocalypse food prep and in that video we went through and we bought some rice some beans water vegetables and milk all for the apocalypse getting prepared for that now we decided to buy some additional things like rope some flashlights duct tape and a water filter several water filters and after we purchased those uh our boss or whoever we're working with or somebody decided to go and make a pivot table now in this pivot table they kind of broke it out by Costco Target and Walmart and had all the items had some subtotals as well as some Grand totals right here and then they decided to kind of copy and paste that into this and you'll see this a lot when you're working with uh people who use Excel they like to kind of make things like this maybe make it into like a table or or format a little bit differently but you'll see stuff like this a lot so this is what we're going to actually pull into Power query and 
work with now we're going to imagine that this is all we have this is the only thing we were working with and I'll kind of reference this pivot table a little bit but we're going to pretend this is all we have and we want to transform it to make it a lot more usable to where we can make visualizations with it so let's hop over to powerbi and pull this excel in so what we're going to do is click import data from Excel we're going to click apocalypse food prep and click open and then it's going to bring up this window right here now this is where we can choose what data to bring in so we can take a preview and just click on it real quick and this is the pivot table that we were looking at so it does have that pivot table so we are able to pull in just a pivot table and then we have the purchase overview where it's kind of that formatted um thing that we're just looking at with all the colors we're going to pull both of those in so we're going to pull in the pivot table and the purchase overview now we could just load it or we could transform it and we're going to click transform and that's going to bring us to power query so let's click on transform data so now really quick before we actually jump into working through this and transforming it I want to show you what the power query editor looks like so if we go right over here we have our queries and these are the tables that we actually pulled in and we can click on those and kind of go back and forth between them now up top we have our ribbon and the ribbon offers a lot of functionality we have things like remove columns keep rows remove rows split columns these are all things that we're likely to use when using this power query editor there's also another tab called transform where there's a lot of functionality here as well things like unpivoting a column or transposing columns and rows and using a first row as a header some of the things that we'll be looking at today there's also another tab called add a column 
and this one's pretty self-explanatory where you can add additional columns like deleting a column creating an index column or a conditional column those are the three main ones there's also view tools and help but we're not going to really be looking at those today and then on the far right side we have our query settings you can do things like change the name so we call it pivot table 2022 and it'll update right over here on our query side and we have our applied steps now our applied steps are extremely important and very very useful anytime we make any change to transform this data it's going to be documented right here and then we can go back and look at it or we could even delete that change in the future if we want to and go back to a previous version of what we just did so when we loaded the data into powerbi it did a few things for us it shows the source the navigation and it promoted the headers and then it also changed the data type so if we want to check we can actually see those things or change those things like this Source right here we can click on this little icon and it's going to bring up the actual path where we got this file so if we wanted to change that or or it changes in the future future we can come here and we can change this file path but we're not going to do that right now so let's click on cancel and let's go back down to change type so it promoted these headers and obviously these headers are not correct we're looking at this pivot table and not the purchase overview but it changed these column headers and so in the future if we wanted to we could easily change those but it did that for us and it changed the type as well so if you look right here it says abc123 all the way over here it's where it just says ABC ABC means it's only going to be text where abc123 means it could be basically anything uh text or it could be numeric so now let's go over to purchase overview and this is the one that we're actually going to be working on the 
most but we might be looking at pivot table just a little bit to kind of reference it and see some of the differences so before we do anything let's just take a look at how powerbi decided to take this data in so it chose this apocalypse food prep overview as kind of the first column and that was kind of our header or the title of what we were looking at before and then all these other columns are basically column 1 2 3 4 5 so that's something that we're going to want to change in just a little bit there's also all these blank uh columns right here at the top and kind of these null values as we go along and we'll take a look at those and we're kind of going to want to get rid of some of this and just clean this up to make it more usable for our powerbi visualizations this may be perfectly fine and acceptable in an Excel but when you're pulling it into powerbi the real reason you're pulling it in is to create visualizations not just for it to look good in an Excel so we're going to need to clean this up quite a bit so let's go right up top the first thing that I want to do is I want to get rid of these top rows so we're going to go to this top ribbon and we're going to click remove rows and we're going to select remove top rows and we're going to select two because we have one two rows of all nulls and those are completely useless we just want to get rid of them right away so let's click okay and it removed those the next thing that we want to do is this location product and all these dates these are actually the column headers that we wanted so what we need to do now is we want to go over to transform and we want to say use first row as headers and just like that we have location products and these dates as our headers exactly how we wanted them now let's say for whatever reason you know we made a mistake and we needed to go back we would just select remove top rows and that would be perfectly fine now you can see over here it promoted the headers but it's
also changed the data type so before if we went to before we promoted the headers these were all abc123 abc123 because it had a lot of different data types in there so it just kind of made a generic data type but when we promoted these headers the first thing that it decided to do was also change this data type for us giving us its best guess as to what this data type is and it decided to do this decimal so this 1.2 is a decimal but we're actually going to change that and all you have to do is click on this 1.2 uh or the data type that it has right here for you and we're going to click on fixed decimal number and let's do replace current and now it's just a little bit better so now it's 2.70 2.5 and that's normally how we would read uh values like this because this is money so we would normally read it to the second decimal just like that and if we have it on the second decimal for some we should probably have it on the second decimal for all of them so really quickly I'm going to go through and I'm just going to change that and it should be pretty quick so hang with me for just a second all right that is perfect now for the purposes of what we're about to do we don't actually need these subtotals or this Costco total Target total and Walmart total as well as the grand total really we want to get rid of those and so what we're going to do is we're going to go right over here we're going to click on this drop down and we're going to try to filter this data before we actually load it into powerbi so we're going to filter and we're going to say remove empty and let's remove those and it's going to take out all of those nulls if we wanted to try to filter this out by saying something like Costco total or Target total we could do that by going right here clicking this drop down on products going to text filters and saying does not contain and let's do insert and we're going to say does not contain and we want to say total and let's click okay and again
it filtered out all of those things so there's a few different options that you can do if you want to filter out rows that contain either null values or specific values now the next thing that we're going to do is actually get rid of a column this grand total column and so what we're going to do is we're going to click on the very top part where it says grand total we're going to go back over here to home and we're going to click on remove columns and it says insert that's because we're on this filtered rows one right here um but what we're going to do is just insert that and it'll insert right there that's totally fine we can just move it to the bottom now we got rid of this column entirely now this looks really good visually I like how this looks I like how everything is set up the biggest thing about this is that when you're actually wanting to use this for visualizations these columns as dates doesn't really work too well and so what we're going to want to do is we're going to want to transpose this or pivot this to where these dates are actually rows so what we're going to do is select the first date which is January 1st all the way through April 1st and we're going to hit shift and click on that April 1st right there to select all of them at the same time and then we're going to go over here to the transform Tab and we're going to click unpivot columns and let's see what this does and so now what we've done is we've basically recreated our original Excel that we had so let's go back and take a look really quickly at that so this looks almost identical to what we have in powerbi right now and this is extremely usable and very good for visualization and is much much better than this but again we were pretending that this is what we were given at the beginning so you have to imagine you know somebody just handing you this and you need to make it much more usable for visualizations in the future which happens a lot and we actually wanted to create this we just 
weren't given this now a few last things that we might want to do is we want to clean this up just a little bit we're going to select the data type and change this to date and then we're going to select the value and I double clicked on the value and I actually want to call this cost uh or product cost product_cost and then for the location I actually want this to be called store so now this looks really good but I want to show you one thing really quickly on this pivot table 2022 so let's go back here this looks very similar to how we had it when it first started one thing I wanted to show you uh really quickly and I want to click on this first one we're going to make this our column header and then we're going to try to pivot or unpivot this January February March April so really quickly let's do that so we're going to transform use first row as headers so now we have this January February March April now if you notice these are not dates these are actually text it says January February March and April so if we go to do this and we click unpivot and here's the columns that are created when we unpivot it it is January February March and April these are not dates so we cannot go and change this to a date because that would error out because it's actually text so it's something that you want to look out for it's something that you need to be aware of and you can change that in the pivot table so you want to be aware of how it actually sits and looks in the Excel or whatever data source you're pulling from before you actually pull it into power query to transform and now the very last thing that we need to do to finalize all of this is go over here to close and apply and once we click that everything that we've worked on is going to be applied to the actual data and it's going to load into powerbi to create our visualizations so let's go ahead and click on that and so now the data has been pulled into powerbi let's go right down here to data and we can see
the data right here if we need to transform this data again we can bring it back into the power query editor window by just clicking the transform data button and it's going to bring us right back so I hope that this was helpful thank you so much for watching if you like this video like And subscribe below and check out all my other videos and everything data analyst related I'll see you in the next [Music] video what's going on everybody welcome back to the powerbi tutorial Series today we're going to be taking a look at building [Music] relationships now when you import multiple tables from either the same data source or multiple data sources you want to tie them together so that when you're creating your visualizations everything is connected so in this tutorial we'll be walking through how to create those relationships to make sure that all of your tables are connected properly and without further Ado let's jump onto my screen and get started with the tutorial all right so before we jump over to powerbi and start creating our relationships and our model I want to take a look at the data in Excel we realized we were buying so many products for the apocalypse that we decided to start our own store and we have several customers and some client information down here and so I wanted to take a look at some of the columns and these tables that we're going to be looking at first thing we have is the apocalypse store these are the things that we are selling I know it's a very limited inventory but these are the really high sellers these are the ones that I wanted to sell so we have this product ID our product name price and production cost then we have this apocalypse sales this is how many sales we've actually made to our customers so we have this customer ID our customer name product ID order ID unit sold and the date it was purchased and then we have our customer information right here here are all of our clients so we have this customer ID customer address city 
state, and zip code.

So now that we've taken a look at our data, let's go and load it into Power BI. We're going to say Import data from Excel, choose this model right here, and click Open. We want all three of these, so I'm going to click on all of them, and we're just going to load it; we're not going to transform the data at all. Now that the data has been loaded, let's go over here on the left-hand side to our Model tab. Let's scoot this over just a little bit, move back, and move these tables up to where it's a little bit easier to see. Right off the bat you can already see that there are lines between these tables, so there are already relationships that Power BI has automatically detected and created. From my experience, Power BI actually does a really good job at creating these relationships automatically, but we're going to go in and take a look at these and see what everything means, and then we're going to go back and create these relationships from scratch, just to make sure that we know how to do every single part. To get started, let's double-click on this line connecting the Customer Information table to the Apocalypse Sales table, and it's going to bring up this Edit relationship page right here. This line connecting the two tables actually gives us quite a bit of information without having to click into the Edit relationship page. What it's showing is that we have a one-to-many relationship and that there's only a single cross-filter direction, and you can find both of those things right down here. I'm going to walk through what those mean in just a little bit. On this page you can also see the columns that Power BI decided to choose in order to tie these two tables together. For our example, it decided to use Customer and Customer, from the Customer Information table as well as the Apocalypse Sales table, but I don't really want to use those
specifically, because on this Apocalypse Sales table I might remove this customer information and just keep the customer ID. It may have chosen these Customer columns because they have the exact same name and really the same information, but I want to use the customer ID anyway. So I'm going to click on that column and click on this column, and then I'm going to click OK. If we go back into it by double-clicking again, we're going to see that it saved that, and if we do what we did before, which is hover over the line, it's going to show us what those two tables are joined on. Opening this back up, let's go down here to Cardinality and Cross filter direction. Cardinality has several different options that you can choose from: many to one, one to one, one to many, and many to many. For this example we're looking at Apocalypse Sales, and we're going from Apocalypse Sales down to Customer Information. There are a lot of rows in Apocalypse Sales but very few in Customer Information, and there's only one customer per row, whereas in Apocalypse Sales up here a customer can have several rows for several different orders. That's why the cardinality is many to one. If we flip this, and we say we want Customer Information here and Apocalypse Sales down here, and we tie that together, it's going to flip and say one to many. Now let's look at the cross-filter direction. There are only two options here: it's either Single or Both. If we choose Both and click OK, this goes from a single arrow pointing in one direction to two arrows pointing in both directions. But what does this really mean? In order to demonstrate this, I'm going to put this back to a single direction, and what we're going to try to do is connect the columns over here to the columns in this Apocalypse Store table. So let's go over here to build a visualization, and what we're going to do is
we're going to take this Customer Information table, and let's just say we want to look at state. So I'm going to click on State right here and make this into a table. The Customer Information table is only tied right now to the sales table, so we're actually going to go over to the Apocalypse Store table, because we want to see how many product IDs are being bought in these different states. Really quickly, we're going to come up here and create a new measure, and all we're going to say is that this measure is the count of the Apocalypse Store product ID. We're going to create that, and now we're going to select it so it's added to that table. What this is showing is that there are 10 products for each of these states, but that's not technically correct, because not every state purchased all 10 items. Let's go back to our model and change both of these relationships to the Both direction, and then go back and see what changed in our numbers. Back in our visualization, we can now see that Minnesota actually only ordered seven different product IDs, Miss[inaudible] eight, New York nine, and Texas ten. This is much more accurate than before. When you use the Both option, it takes these tables and treats them as if they are a single table, but the Single option is not going to do that, so for our example, where we're trying to connect this table to this table, Both is the option we want. One of the last things that I want to show you is this option right down here, which says Make this relationship active. If we don't click this, and there are other relationships in here that connect these tables, like Customer to Customer, then that one may be the active relationship. But if I select this as the active relationship, that means this is going to become the default relationship between these two tables. So now let's come out of here. We're going to click Cancel, zoom in just a little bit, and bring these tables a little bit closer so
we can zoom in just a little bit more. Now we're going to go ahead and delete these relationships, so we're going to say Delete, Yes, and Delete, Yes. Just for demonstration purposes, we're going to build these relationships from scratch. We're going to come over to the Customer Information table, take the customer ID, drag it all the way over here, and put it on top of the customer ID in Apocalypse Sales, and it's going to automatically create that relationship. We can open this up, and as you can see, it created the relationship between the customer ID in Apocalypse Sales and the customer ID in Customer Information. It also defaulted the cardinality to many to one and the cross-filter direction to Single, so we're going to go ahead and change that to Both and click OK. Then we're going to come over here to the product ID in Apocalypse Store and drag it over the product ID in Apocalypse Sales. Again, if we open it up, it created that relationship for us and set the cardinality automatically, and we're going to change this cross-filter direction to Both and click OK. On a really small scale, that is how it works. Of course it becomes a little bit more complex the more tables you add and the more relationships that are created, but this is how you actually create relationships in the Model tab within Power BI. I hope that this tutorial has helped you understand this concept a little bit better. Thank you guys so much for watching, I really appreciate it. If you like this video, be sure to like and subscribe below, and I'll see you in the next video.

What's going on everybody, welcome back to the Power BI tutorial series. Today we're going to be taking a look at DAX. DAX stands for Data Analysis Expressions, and it's basically a library of functions and operators that help you build formulas. You can use DAX to create measures and calculated columns within Power BI, which can really give you a lot of insight into your
data. Honestly, it is not super complicated, and hopefully by the end of this video you'll have a lot more confidence actually using DAX in Power BI. So without further ado, let's jump onto my screen and get started with the tutorial.

All right, let's take a look at our tables and data before we get started. We have two tables: Apocalypse Sales and Apocalypse Store. For the Apocalypse Sales table we have the customer, product ID, order ID, units sold, and the date it was purchased. For the Apocalypse Store table we have product ID, product name, price, and production cost. These are joined together, or they have a relationship, via the product ID. What we're going to be using are new measures and new columns to create our DAX functions. So really quickly, let's go over to the Report tab and drop down our fields over here so we can see everything. To get us started, we're going to go right up here to Apocalypse Sales, right-click, and click New measure. It's going to open up this bar right here, which is where we can create our functions. It's automatically given us the name Measure, but we can change that, and we're going to say Count of Sales. That's just going to be the name of it, and it's what's going to show up right over here once we press Enter. Now we can start writing our DAX function. Let's go over here and type COUNT, and as we're typing it's automatically giving us options. It has something called IntelliSense. If you've ever used other Microsoft products, IntelliSense is their autocompletion, which helps you look at other options very quickly. We're just going to click on COUNT, and it's prompting us to put in a column name. We can come down here and select one, or we can type it out and it'll try to predict and help us choose which column to select. For us, we're going to use the order ID, but let's just start typing it out. We'll say order ID and
then we can click on it and we're going to close this parenthesis and click enter or you can go over here and click this check mark but we're just going to click enter and so over on this right side it finalized that and save that and we can actually look at that by clicking on this box next to it and we want to look at the this in a table so now we can see that there are 74 sales now for this we want to see who's buying our products we want to see what our what our client name is so we're going to go over here we're going to choose customer and we're going to put customer on top of sales and we're just going to take a look at it like this so now we can see that our number one customer is Uncle Joe's Prep shop he has 22 orders now they have the most orders with us but it doesn't necessarily mean that they're spending the most money with us but we can take a look at that later the next thing that I want to take a look at is how many products we're actually selling what are our big products that we're selling we have 10 different items but I don't know exactly which one is selling the best if if one is doing really poorly and getting no orders this is something that I want to look into so all we're going to do is go right back up here to apocalypse sales again right click and select new measure and for this one we're going to call it the sum of products sold and all we're going to start out with is by doing sum and if this seems familiar to something like Excel you're 100% correct it is very similar and remember these are both Microsoft products so there's going to be similar functionality in both of them and so this Dax is going to have a lot of similarities to exactly how it has it in Excel so we're going to do an open bracket and now what we're going to choose is this units sold we want to sum up all of these units sold and see how many we actually selling so we're going to say units sold I'm going to hit tab it's going to autocomplete that I'm going to close my 
parenthesis, and I'm going to come over here and click this check mark. So now it's created that measure, and since we already have this table selected, all we have to do is check the box next to the measure, and it's going to show us that we have 3,000 total products sold. We can go through here and see what the big sellers are, and probably the biggest one that I see right off the bat is this Multi-Tool Survival Knife. So these DAX functions that you write can be very simple and still lead to really good insights that you can use for visualizations later on. Now I want to take a look at the difference between something like SUM, which is an aggregator function, and something like SUMX, which is an iterator function. If you add X to some of these aggregator functions, you can make them into an iterator function, so you can have SUM and SUMX, or AVERAGE and AVERAGEX. Let's take a look and see how that actually works. I'm going to show you the difference, and then I'm going to talk through the difference at the end. Really quickly, let's go back to our data and go to the Apocalypse Store table. What we have right here is the price and the production cost, and we want to see how much profit we're getting from each of these. We can also take a look at the units sold and see how much money we are actually making. So we're going to come back over here, go to Apocalypse Store, right-click, and create a measure. In just a little bit we're going to be creating a new column as well, and that'll show the difference really well. We're going to create this new measure and name it Profit. We're going to start with our sums: we're going to take the SUM of the price, close that parenthesis, and we're
going to subtract the SUM of the production cost. All that says is that if we sold something for $20 and it only cost us $10, that's $10 in profit for that item. Then we want to wrap that in parentheses, because we're about to multiply, and we're going to take the SUM of the units sold, so how many units were actually sold at that profit. Let's see if that works, and click the check mark right here. So we have the Profit measure. Let's click on Profit. Oops, that's not what I wanted to do. Let's create a new table: we're going to click Profit, make it a table, and I'm going to pull this right over here. Now we have our profit, but what I really want to know is which customer is spending the most money at my store. So we're going to come right over here, click on Customer, and put Customer at the top. Just at a glance, we can see that Uncle Joe's Prep Shop is spending the most money at the store. Now what I want to show you is the difference between SUM and SUMX. I'm going to go back to this Profit measure and copy this entire thing, and we're going to go back here to this table. We just created a measure, and we were able to break it down by each customer. Now let's go up here to Home and create a new column. We're going to call this Profit_Column, and we're going to paste the exact same thing in here and press Enter. Each row is the exact same value. What it's doing is going through the price, adding all of it up, and calculating the total at the bottom; it's adding up the production cost, going all the way down, and calculating that at the bottom; then it's going over and looking at how many units were sold; and then it's performing this calculation up here, and then it
gives us the total, and it's doing that for every single row. But that's not really what we wanted to show. What we wanted to show is the profit for each row. What we wanted to say is: here's the price for the rope, the production cost for the rope, and how many units we actually sold, and then it'll calculate that and give us the actual profit for just that row. We cannot do that by just using SUM. What we need to do is use something called SUMX. So let's add another column. Let's go back to Home, say New column, and now we're going to name it Profit_Column_SumX. We're going to use SUMX and hit Tab, and we need to choose the table that we want to evaluate this over, so we're going to say Apocalypse Sales, because that's the table that we're looking at right here. We're going to type a comma, and now we need to input an expression. The tooltip says it returns the sum of an expression evaluated for each row in a table. Before, when we were just using SUM, it was looking at all of these rows combined; now it's taking it row by row. So what we're going to do is basically input the same thing as we did before. I'm going to copy and paste that. It's not going to be correct as is; I need to get rid of these SUMs, but it's basically the exact same equation. Give me just a second. Let's get rid of this SUM and see if this works. Let's click the check mark, and now this looks a lot better. What this is now showing us is that, at a row level, this nylon rope made us 51,000, almost $52,000, and the waterproof matches made us $115,000, and we can go down and look at each item and see how much it actually made us, versus this Profit_Column. That is the biggest difference between SUM and SUMX. Hopefully that made sense. I know that the difference between an aggregator function and an iterator function can be a little bit confusing, especially if you've never done it before, but hopefully that was a good example for you to understand the concept.
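The contrast between SUM and SUMX described above can be sketched outside Power BI. This is only an illustration in plain Python, not DAX, and it assumes price, production cost, and units sold all sit on the same row of one table. The rows, the numbers, and the function names `sum_style_profit` and `sumx_style_profit` are made up for the example and are not from the tutorial's workbook.

```python
# Hypothetical rows: (product, price, production_cost, units_sold).
# The numbers are invented for illustration only.
rows = [
    ("Nylon Rope", 10.0, 4.0, 100),
    ("Waterproof Matches", 5.0, 2.0, 200),
    ("Multi-Tool Survival Knife", 20.0, 12.0, 50),
]


def sum_style_profit(rows):
    """Aggregate each column first, then combine the three totals.

    This mirrors writing (SUM(price) - SUM(cost)) * SUM(units).
    """
    total_price = sum(price for _, price, _, _ in rows)
    total_cost = sum(cost for _, _, cost, _ in rows)
    total_units = sum(units for _, _, _, units in rows)
    return (total_price - total_cost) * total_units


def sumx_style_profit(rows):
    """Evaluate (price - cost) * units for each row, then add up the results.

    This mirrors the row-by-row evaluation that an iterator such as SUMX does.
    """
    return sum((price - cost) * units for _, price, cost, units in rows)


print(sum_style_profit(rows))   # column totals combined: overstates profit
print(sumx_style_profit(rows))  # per-row profit added up
```

With these made-up rows the two functions return different totals, because the first multiplies the combined margin by the combined units while the second keeps each product's margin paired with its own units. That pairing is the reason the row-level column needed the iterator version.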
let's go back over here to apocalypse sales right here we have a date purchase now in the Dax function we have some ways that we can interact with dates and so I want to take a look at those really quickly so we're going to go right up here and click on new column and we're just going to leave that as column but what we're going to say is day so there's a few different ones we have Day dates YTD next day previous day and weekday and they all are pretty self-explanatory if you click on it let's click on weekday it says it's going to return a number from 1 to 7 identifying the day of the week of a date so let's use this really quickly and so we're going to say date purchased and and click tab hit comma and it's going to give us a three different options basically it's a one a two and a three um right here if you hit this button read more you can read more on it this is going to say Sunday is equal to one Saturday is equal to seven I like this one personally which is Monday equals one in my brain it just makes more sense so I'm going to click on two I'm going to close that parentheses and we're going to I guess I'll say uh let's say day of week for the column let's click that checkbox and now Saturdays are equal to sixes Mondays are equal to one this allows us to see which day of the week people are buying the most products on or or which day of the week is somebody submitting their orders on and so let's go over to our report let's get rid of this we just going to move this oh jeez I hate moving stuff sometimes all right really quickly I want to show you the difference between what we just did and what we already have so we have this um date purchased and let's make that into a bar graph and what we're going to be taking a look at is actually the units sold so right here we have this and obviously for we don't want 2022 we're going to get rid of the year we only have one quarter right here we can see January February March so we can tell that January has the most 
sales, or the most units sold, in that month. If we get rid of that and go down to Day, we do have some information, but we don't know what day of the week it is. It could change from month to month, and it's really hard to tell whether there's any pattern there at all. That's where what we just created comes in handy. So let's recreate this exact same thing, but instead we're going to use Day of Week. We're going to select Day of Week and Units Sold, drag that down, and move this over right here. Day of Week should be on the x-axis, and it's really easy now to see if there's a pattern here. There's really not, at least not for this fake data that we have, but I want these data labels on really quickly. Again, Monday has the most. It goes down a little bit and then picks back up, so maybe the middle of the week is our lowest sales period: our Wednesdays and Thursdays are a little bit lower than the rest, and the beginning and the end of the week tend to be the highest. Again, it's not a huge pattern, but it's much easier to see if there is a pattern by day of the week now that we've used this WEEKDAY function, so this can be really useful. Let's go back here to our data. Now we're going to look at our last DAX function for this video. Let's go up here and create a new column, and we're going to be looking at something called the IF statement. If you've ever used Excel, I'm sure you have heard of this, and you can do the exact same thing here in Power BI. We're going to name this one Order_Size. All we're going to say is IF, and we're going to click on this one right here. We need to perform our logical test, and then we want to say what our value is if it's true and what our value is if it's false. What we're going to be looking at is units sold, since we're looking at order size. So we're going to say: if units sold is greater than 25,
what's going to happen if it is true? If the order is larger than 25, we want to say it's a big order, and if it's not, we want to say it's a small order. Super simple. We'll close that parenthesis and click the check mark, and now really quickly we're able to see whether this is a big order or a small order. That is all I have for you today. There are a lot of other DAX functions, but the ones that we looked at today are very common and are the ones you'll see the most. There can be a lot of really complex and intricate DAX functions that you can create, and in our project at the end of this series I will be sure to include some more complex DAX functions, but hopefully this gave you a good introduction to DAX so you know how to use it a little bit better. Thank you guys so much for watching, I really appreciate it. If you like this video, be sure to like and subscribe, and check out all of my other videos on everything data analyst related. I will see you in the next video.

What's going on everybody, welcome back to the Power BI tutorial series. Today we're going to be looking at how to drill down in visualizations. When I say drill down, I mean you're basically adding another layer beneath the top layer of the visualization, and when somebody clicks or drills down in that data, they can see more insights and more information on the top level of data. When you drill down you can also drill up, and I will show you how to do that in this tutorial. So without further ado, let's jump onto my screen and get started with the tutorial.

All right, before we get started I wanted to remind you that you can find the data that we're going to be working with in this tutorial in the description; you can go and download it from my GitHub. The two tables I'm going to be looking at are Apocalypse Sales and Purchase Tracker. If you've ever created any visualizations, you've probably seen something like this, where you'll have the store and the price, and these are the things that
we actually bought so this is the total amount of Apocalypse prepping uh equipment that we bought and we'll put the store in this Legend right here and you've probably seen something like this and if you're anything like me you're going to be in a meeting and you're going to be presenting this and some higher up is going to be like hey Alex that looks great but I want to you know see what things we actually bought in Target and how much this cost can you create a visualization for that and you're going to be like well I could or I could use drill down so you could have done this in the first place uh which you should have so what we're going to do is all we're going to do is we're going to say we're going to say the product right here and these are going to be the actual things and we're going to put it right under store now you can't see these things right but there is a a hierarchy here so once we added this these options became available let's take it out and all those just disappeared and then if we add it back right here they came back and so you can do right here which is is click to turn on drill down you can go to the next level in the hierarchy or you can even expand all down one level in the hierarchy so let's look at each of those really quickly so let's click on this one it's just going to turn on drill down mode so now if I go and I click on target it's going to drill down into these and if we want to I can then put product under this Legend and we can see all of those things but of course if we go back up it's going to be all broken up into this clustered column chart which is more like um this which isn't exactly what we were going for but it works now uh let me get rid of this I actually want store in the legend now if we turn that off and we click it doesn't do that anymore so what it does now is it just highlights Walmart it highlights Costco it highlights Target so we're going to keep that on uh but we can also do something called going down in 
the next level of hierarchy so let's click on that and so now this is going to go down to the next level down to this product level because that is the next level and now it's going to show us each of those things but it's going to have it broken out by the store and so it's a completely different visualization but all within the same Realm of the data that we're looking at and what we actually care about so let's go back up in the hierarchy and then let's use this one right here which is expand all down one level in the hierarchy and so this one is again extremely similar except it just visualizes it differently and now what it's doing is Walmart rice Target dried beans Costco rice so instead of having an all uh like this one where it's stacked on top of each other it's breaking it down individually so this one column would become three separate columns now I'm going to minimize this right here uh I'm actually going to go back up in the hierarchy just for visual purposes now I'm going to show you one more example we're going to use this apocalypse sales up here and this is one that I actually use all the time so the one you've seen you know you'll get stuff like that especially if you're working with like sales and stuff but I work in operations right so I have a lot of order IDs product IDs stuff like that now this one this one genuinely I use quite often I'll have a customer U let's make it we'll just go like this we have a customer and we have unit sold and let's use the customer as the legend so let's make this one quite a bit larger and I'll have something like this and they'll say okay well we want to see the order ID s that go with it because we want to know what orders are actually happening for each of these people obviously I'm not using this exact data but very very very similar and all you have to do is take these order IDs and slide it right under here under customer and this visualization right here is something I've done a thousand times because 
what happens is is someone some stakeholder in our company is saying hey Alex we want this and we want to know we want to drill down on this IP address we want to drill down on this certain database we want to drill down on something and we want to see the order IDs within them so then all you do is you turn on drill mode or drill down mode you'll click on it and you can see every single order ID that's in there and then they can go and look those up in their system and resolve them or whatever they're trying to do with it and it helps a ton and it's very very useful this one is extremely applicable and that's really all drill down is again you have these different hierarchies as well um but for different things it's not as useful as you can see we also have this hierarchy which again is not as useful so it just depends on the data that you're using and how you want to use this drill down effect but I promise you that drill down is used all the time especially when you're giving presentations where people want to know more information than just the the visualization that you're presenting so I hope that this has been helpful I hope that you understand drill down a little bit better if you like this video be sure to like And subscribe and check out all my other videos on powerbi thank you and I'll see you in the next video [Music] what's going on everybody welcome back to the powerbi tutorial Series today we're going to be taking a look at conditional [Music] formatting now conditional formatting may sound familiar because we looked at it in the Excel series and it's very similar how you use it in Excel versus how you use it in powerbi conditional formatting allows you to take a table or a matrix within powerbi and use those cells to color code them and create gradients and different visualizations within the actual table or Matrix I'm excited to start this one so let's jump over my screen and get started with the tutorial all right so before we get started if you 
want to use the data that we're using in this video you can find it in the description on my GitHub now conditional formatting is super simple and you've most likely used it in Excel before but you can also use it in powerbi and let me show you how to do that so the first thing we're going to do is come over over to our apocalypse store and we're going to pull up our product name as well as the price and what we can do is come over here and we're going to go to price and it has to be under the columns so you can't come over here and do this we're going to come right over here to price and we're going to right click and let's go to conditional formatting and we have background color font color icons and web URL let's take a look at background color first this is most likely the one that we'll look at the most so we're going to get this pop up and I'm going to slide this over now there's a lot of different things we can customize in here and the first thing I want to take a look at is format style we have the gradient and what it's going to say is the lowest value will be this color highest value will be this color it'll give us this gradient color scale and so we'll use that in just a little bit but we can also create rules kind of like an if statement and if it is between this range and this range we give it a color and if it's between a different range and a different range we'll give it a different color so we'll also try that one and then we have this field value uh and this one is one that uh honestly I don't use that much I've used it maybe once and what you can do is select a text field like customer and you can do some summarizations on the first and last and that is it so what we're going to do is we're going to look at gradient specifically for not the customer but we're going to go back to the apocalypse store and we're going to do it on the price now what I'm going to do is keep it as the count because this is what the default is and we're going to go 
back and fix it later. What we want our lowest value to be is this bright green, showing that it's a cheap product that's easy to purchase, and the high-value ones are going to be this shade of red for more expensive, and we'll do it on the count. Now remember, the count is on each of these rows. We're not doing a count of how many are sold, we're doing a count of each product, so it's just one per row, and it should all be the same color. Let's take a look. It is all the same color, but what we really want to show is the actual price, not just the count of the price. So let's go back to conditional formatting, click Background color again, and this time we're going to change the summarization. You can do sum, average, minimum, or maximum. It really doesn't matter for this example; the number is the same regardless of which one we choose. So we can just choose the minimum, and it's going to take the minimum of each row, which is the price. We'll select Minimum for this example and click OK, and it should correct it accordingly, which means the bright green is the lowest and it goes all the way up to the highest, which is the red. Now let's go over here to Apocalypse Sales and add in the units sold. Let's move that out a little bit, and I'm doing that on purpose because we're about to look at something within the conditional formatting. Let's go to Units Sold and look at the conditional formatting for this one. If you noticed, we now have a new option on here called Data bars. We're able to see data bars on Units Sold and not on Price because Units Sold is something like a sum or an average, something that's aggregated. Let's take a look at data bars, because I want to show you how to use them, and then we'll go back to the background color. For data bars we're going to be taking a look at the lowest to the highest value. Again, we're going to go from bright green all the way to this exact red. It's going to be from left to
right, and what it's going to show you is that if it's a positive number, which all of these are, it's going to be a green bar representing the number that you see in here along this line. So let's click OK, and we're going to be able to see the highest numbers. Let's scooch this over quite a bit so you can get a better understanding, and we'll sort it from highest to lowest. We sold the most multi-tool survival knives at 477, so this row's bar is filled up all the way, or almost all the way. As it gets lower, down to the solar battery flashlights where we sold only 182, the bar shrinks to represent that.

Now I'm about to completely mess up this visualization on purpose. It's about to get very messy, to show you that it is possible to do a little bit too much. We're going to go right over here to the background color for units sold, and instead of gradient, let's look at rules. With the price we just did a gradient scale, but we can basically do groups of these and say that if a number is greater than or equal to this number, then it's going to be a certain color, and if it's in a different range, we can give it a different color. So we're going to say if it's greater than or equal to zero, and we're going to say number, not percent, and if it's less than 266, because we have 265 right here, let's make it a nice gold, a beautiful, lovely mustard gold. Just great. Now we're going to say if it's greater than or equal to 266, because the last rule was less than 266, so this one should be greater than or equal to 266, number, and if it is less than, we'll say, 500, we'll give it, let's do like a peach. We'll click OK, and now we have another conditional formatting on top of that, which can give us more information. Again, you should not do this; it's just too many. Now let's go one step further and make it even more ridiculous, and show you one more thing before I
show you how you may actually want to use this. Let's go back to units sold, right-click, go to conditional formatting, and you can do something called icons. Font color is the exact same thing as background color, except it changes the font, so I'm not really going to look into that one. Icons are very simple and extremely similar to Excel and how you've seen them there, and the rules that you can apply to them are basically the same as if you're doing a gradient; it's these if statements that we saw before. It automatically gives us this right here, which basically says 0 to 33%, 33 to 67%, and 67 to 100%. If it's in the bottom 33%, it gives us this red, the middle is yellow, and the top is green. We could go through and change all of this, but honestly this looks pretty good, so let's click on it. The ones that are our least sellers are these red ones right here, and the top sellers are up here. Now, this is just based on units sold, and this looks absolutely terrible, so let's take this exact information but make it a little bit better.

We're going to create a new visualization, or at least a new table. So let's click on product name, and we'll take the price, units sold, and revenue. What I think makes the most sense for looking at revenue is these data bars, but there's one problem: I can't do that, because it's not summarized like units sold was. What I can do to get those data bars is come right down here, and instead of saying don't summarize, I can summarize it by clicking sum. So it's now summarized, and it's the exact same number, but if I right-click on sum of revenue and go to conditional formatting, I can now use those data bars. So we're going to use those data bars for the lowest value and the highest value, and let's make it a nice, maybe a darker green. Well, that's hideous. Let's make it this color right here, a nice dark green. And there are no negatives, so it
doesn't really matter. We're going to go left to right, and you can show the bar only, but we're going to keep the numbers because I want to see them. We're going to go just like this and order it, and this is pretty telling. Honestly, I did not think the weatherproof jackets were performing so well, but they are by far our number one seller. So our weatherproof jackets, multi-tool survival knives, and nylon rope are outperforming all of our other products, so those might be the ones that I focus on the most, while duct tape, the N95 masks, and waterproof matches, I mean, those are garbage. So I might be looking to replace those in the near future with some other items that might sell a little bit better.

So that's how you use conditional formatting, and it's actually pretty useful. There are a lot of times where I've done something like this in an actual visualization for work, and it looks something like this. It just depends on what you're visualizing, but this is a simple thing that you can do to add a little bit more information and an actual visual to the chart or table that you're going to create. Sometimes it's just better to have these simple visualizations on the table rather than just the numbers themselves; it makes it a little easier to read and understand. So again, I hope that this was helpful. Thank you guys so much for watching, I really appreciate it. If you like this video, be sure to like and subscribe and check out all my other videos on Power BI, and I'll see you in the next video.

What's going on everybody, welcome back to the Power BI tutorial series. Today we're going to be taking a look at bins and lists. Now, bins and lists are really useful because they allow you to group things together to analyze and visualize them more easily. In this tutorial I'll show you how to create your bins and lists, and then we'll create some visualizations to show you how they can be helpful. So without
further ado, let's jump on my screen and start with the tutorial. All right, so before we get started, I wanted to let you know you can download the data that we're going to be using in this tutorial from the description below; it's on my GitHub. We are going to be looking at bins and lists today, and for this we're going to go over here to apocalypse sales. Let's open up our data right over here and look at apocalypse sales really quickly. I feel like more people would know what a bin is, so we'll start with a list and go a little bit backwards from how we normally would. We're going to use this customer column right here for a list, and you can do that in two ways: you can come up here, right-click on the customer column, and go to new group, or you can come over here under the fields section on the far right, go to customer, right-click, and click new group. So let's click on that now. Right now it's only giving us the list type. It's not giving us bins, because bins have to be numeric, so we really can't do that at the moment. We're going to call this customer groups, or we'll actually call it list, just so it's easier to recognize when we create it. All we're going to do is basically group these, but it's going to be called a list. So we're going to select this one, select this one, and click on this group button, and it creates this group with Alex the Analyst Apocalypse Preppers and this Prep for Anything Prepping Store. It kind of named it for us, but if we double-click on it, we can rename it, and we can call this the best prepping stores. Then we have these last two. We can click on one, then hold Ctrl and click on the other one so we get both of them, then click group, and we'll double-click and call this the worst
prepping stores. And that's it, that's all we have to do. If you want to undo this and switch it up, you can click on ungroup, but we're not going to do that. We're going to click OK, and here is the column that it created. It basically tells us what list we put each customer in: if it's Uncle Joe's Prep Shop, that's in the worst prepping stores list, and if it's Alex the Analyst Apocalypse Preppers, that's in the best prepping stores. So it's kind of like an if statement. You could even create a calculated column on this customer column with an if statement; this is just a lot faster and easier than doing that, but it would basically do the exact same thing.

Now, you can use lists on numeric columns as well. Let's say we have order ID, and we'll go to new group. It's going to automatically go to bin, because typically that's what you'll use, but you can do list as well. Let's say we want to group these and call them the first orders, because we're looking at order IDs. Then we'll go back here on the left side, oops, we're going to go back to the top, hit Shift, group all of these, and call them the latest orders. You absolutely can do this. Again, this is kind of like an if statement: you're saying if it falls within this range, then it's called the first orders, and if it's within this other range, it's the latest orders. It's just a much simpler version of an if statement, so you don't have to write it all out; you can have this user interface do it for you, and it's really useful.

So now let's talk about bins. I'll show you one other way, but by far the easiest way to demonstrate this is by using age.
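The list grouping above is described as being like an if statement. As a rough sketch of that idea in Python (this is not something Power BI runs), the grouped column can be thought of as a lookup from customer to list name. The store names are the ones mentioned in this walkthrough; the "Other" fallback and the `cutoff` split point for order IDs are assumptions made purely for illustration.

```python
# Lookup from customer to list name, mirroring the groups built in the dialog.
# Only store names mentioned in the walkthrough are included.
CUSTOMER_GROUPS = {
    "Alex the Analyst Apocalypse Preppers": "Best Prepping Stores",
    "Prep for Anything Prepping Store": "Best Prepping Stores",
    "Uncle Joe's Prep Shop": "Worst Prepping Stores",
}


def customer_group(customer: str) -> str:
    """Return the list a customer falls in; unlisted customers get "Other" (assumed fallback)."""
    return CUSTOMER_GROUPS.get(customer, "Other")


def order_group(order_id: int, cutoff: int) -> str:
    """Label an order ID as first or latest; `cutoff` is a hypothetical split point."""
    if order_id <= cutoff:
        return "First Orders"
    return "Latest Orders"


print(customer_group("Uncle Joe's Prep Shop"))
print(order_group(5, cutoff=10))
```

The new group dialog builds this kind of mapping for you, which is why there is no if statement to write by hand.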
For absolutely no reason whatsoever, these customer IDs, who are right here in this customer information, decided to give us some of their buyer information, meaning who is actually buying their products on their website or in their store. They just decided to give it to us, along with some simple demographic information. I don't know why. What we're going to use bins for is grouping these ages into brackets. You might be interested in knowing whether the core population buying your products is within a certain age range, and you don't want to look at every single age, because then your visualizations aren't going to look right. You want to group them to make them easier to visualize. So we're going to go by tens, so 10, 20, 30, 40, 50, 60, and see what age bracket these people fall in. We're going to go to age, right-click, select new group, and go to bin, and we'll leave the name as the default, age bins. You can do two things: you can set the size of the bins, which splits the ages by this number right here, or you can go by the number of bins. If you only want five bins, it'll calculate the size for you and say, okay, if you only want five bins, you're going to have to do it at 12.2; if you want 10 bins, it can be 6.1. It is completely up to you how you want to do that. You can set the size, and we'll just say every 10, which is what we're going to do, or you can specify how many bins you actually want. So let's go ahead and click OK, and it's going to create those bins for us. If somebody is 78, they're going to be in the 70 bin; if somebody's 41, they'll be in the 40 bin; if somebody is 29, they'll be in the 20 bin, and so on and so forth. So when we go to visualize this, we don't have 71, 72, 73, 74 and a lot more things on our
visualization; it'll just be the 70 or just the 20.

Now, we can also use bins on dates. Let's go back to apocalypse sales. We have this date purchased column, so we can create a bin for this as well. Let's go to date purchased and click new group. You can also create a list, and that's totally fine if you'd like to do that. It would look kind of like this, where you can go through and select all these dates, group them, and say this is going to be January. That's totally okay, but for this one we're going to do bins. I think it's a little easier to do bins, because we can go right here and specify whether we want seconds, minutes, hours, days, months, or years. The data that we have covers January, February, and March, so we're going to do months, and we're going to say the bin size is one month, so each month should have its own bin, for three bins total. We'll select OK, and as you can see on the right side, we have January of 2022, and that corresponds to the January dates over here; then it goes down to February, and then to March. When we visualize this, we don't have to do the hierarchy stuff where we filter it down to months; we can just use this column right here, and that will be our months column.

So now let's go over to our visualizations and see how this looks really quickly. We're not going to look at all of them, but we will take a look at a few of them. The first one we can look at is age, so let's look at the buyer ID and then we'll do age as well. Let's spread this out, and we can see the distribution of our buyers. It looks like we have very few who are in the 10 range, thank goodness. We can even put the age right here under the age bins, and now we have this drill-down. If we go right here and drill down right there,
this will actually give us the breakdown. This is what our visualization would have looked like if we had just kept the raw age, because now we're drilling down into the age. It looks like we have one 18-year-old and maybe a 20-year-old as well. Let's go back up. Yeah, so it looks like we only have one buyer ID, so there's only one 18-year-old, so of legal age to start buying all this prepping equipment, and probably buying online and stuff like that, which makes sense. So this gives you a quick breakdown in the bins rather than doing it the alternative way. Now let's take a look at the customer list as well as the units sold, and it looks like the best prepping stores group is actually performing much worse, surprisingly, than the worst prepping stores group. I hope this gave you a really good idea of how to use bins and lists within Power BI. Thank you so much for watching. If you like this video, be sure to like and subscribe and check out all my other videos on Power BI. I'll see you in the next video.

What's going on everybody, welcome back to the Power BI tutorial series. Today we're going to be taking a look at all types of visualizations. When you're working in Power BI, there are a lot of different options to create visualizations, and you may not always be sure which one to use, so that's what this video is for. I'm going to walk you through a lot of the visualizations that I like and use a lot, as well as point out some of the ones that I don't like as much, so that you get a feel for the ones that I think are really popular and used the most. So without further ado, let's jump into Power BI and start taking a look.

All right, before we jump into it, there is a link in the description where you can get the data that we're going to be using for these visualizations if you want to practice them yourself. Before we actually get into it, we do need to combine this
and if you download that Excel file and see this, you'll have to do the same thing. All we have to say is that this product ID is the same as this product ID purchased, and now we are good to go. Make it one to many, and it's okay if it's one way.

Right over here under this visualizations tab there are lots of different options, and it can be a little overwhelming when you don't know which one to choose. There are some in here that I have almost never used for my job, so I'll point those out as we go through, but the main focus is going to be on the ones that I do use and have used, and on showing you how to actually create that visualization and maybe spice it up just a little bit. We have a lot of them to go through, so let's jump right into it.

The very first one we're going to start with, probably the easiest one and the one you'll recognize the most, is a stacked bar chart. We're going to go right over here to the product name, and we want units sold as well. We'll click product name, and it goes straight into the y-axis for us; then we'll click units sold, and that goes into the x-axis automatically. It just kind of intuitively knows, but sometimes it will make a mistake, and then you can just fix it or flip it. Let me make this much larger. We do want this to be a little more color-coded, and that's what this legend is for down here. So we're going to drag product name down to the legend, and now each product has its own color. In previous videos we went through some of the visual and general options that you have when you're creating these visualizations, but we're going to use some of them here as well. We'll go down here, choose data labels, and shrink them. The higher you go, the less you see, so if you want all of them, all the way down to the green,
we're going to go right about there and make it smaller. Now we can click anywhere outside of that visualization and create a new one. If we had kept it like this, where we were still interacting with this visualization, and clicked on a different one, it would have changed our visualization completely, which we don't want. So let's hit Ctrl+Z, click out of it, and now we can create a new one.

Let's go right over here to this 100% stacked column chart. I'm going to click on it, drag it over here, and make it much larger. We're going to come right over here to customer information and click on customer, and then go up to units sold and click on that. Basically, what this is doing is breaking it out by each of these shops, and we can see the total units sold that they're buying, but we want to see exactly what products make up this 100%. So we're going to go over to product name and drag that down to the legend. As you can see, now we have each of these products, and each product is listed up here. So for this backpack, we can see the backpack right here, right here, and right here, and we can see what percentage of each customer's purchases it makes up. For this Prep for Anything prep store, a very large percentage, 40%, is duct tape, so they're buying a lot of duct tape. So really quickly we're able to see which clients are purchasing which products the most. Just like this Alex the Analyst Apocalypse Preppers: they're buying a lot of water purifiers. We like drinking clean water; that's just what my audience likes. So we can easily get a quick glance at that. Again, we're going to go in here; I tend to like putting data labels on, that's just my preference. Something like this looks nice and clean. We can
always go back and change these names, which we'll do for this one. We're going to go over here to title, go down to the text, and type customer purchase breakdown. Pretend I'm really good at spelling. We'll get out of there, and now we have customer purchase breakdown, and that looks really nice. It's a good visualization, and we're going to bring it right over here. We're going to have a lot on the screen, so I may have to make them smaller or larger to fit everything.

All right, let's go on to our next one. Another really common visualization is this one right here, the line chart. The line chart is great, especially when you're using things like dates. I have found it to be the best for that, and a lot of people use it as well. We're going to go right over here and click on date purchased and then units sold. On the x-axis you can see it's broken up by year, quarter, month, and day. We don't want it that high-level, since we only have three months of data in here, so we're going to get rid of the year and the quarter, and then we at least have this. Let's break it out, because right now we're looking at all of the units sold. We're going to drag product name down to the legend, and now it breaks it out by product, and for each month, January, February, or March, you can follow these products and see how they did. If we wanted to, we could come over here to the filter on product name and filter it to maybe the top three. So let's do the multi-tool survival knife, the nylon rope, and the duct tape, and we can have it just like this. You can do that for any product you want, but we'll just do it for those three as an example. That really doesn't give us a ton of information. We could even go down to the day, and it might give us a little bit more
information, so we'll keep it like that. We can go over here and change the name as well. We're not going to do this for all of them; again, we're just looking at the different types of visualizations that I think are really good to know. But we'll change this one to products purchased by date and keep it just like that. Nothing fancy; we're just trying to look at a bunch of different stuff. So let's put this down here, and let's click out of there.

There are other ones in here that are definitely useful and that you absolutely can use. This one is a stacked bar chart and this one is a stacked column chart; it's basically the same thing, just a different orientation. The same goes for the clustered bar chart and the clustered column chart: the only difference is the orientation, either horizontal or vertical. Then we have things like an area chart and a stacked area chart, which aren't really things that I've used much in previous positions.

One that I have used, though, is the line and clustered column chart. It combines bar charts and line charts into one visualization. Let's look at this one, because I have used it several times in my actual job. For our x-axis we'll use the product name, then we'll look at something like the price, and let's make this a lot larger so you can actually see it. So now we have the price, and we can add something like the production cost, and that can be our line y-axis. Now we're looking at the price, how much someone is actually paying for the product, and then at how much it's costing us to produce it. Really quickly, at a glance, you can see that the production cost is around half to two-thirds of the price on most of these, and that it's always lower than the actual price, because of course we're out here to make a profit on these products. So let's minimize
this one. We're going to put it right down here, make it even smaller, and click out of that.

The next one we're going to take a look at is a scatter chart. Let's click on that and make it much larger. Oops, there we go. Let's use the price and the production cost again, so our x-axis is the price and our y-axis is the production cost, but now we need to fill in this values field right here. Let's go over here, click on product name, and drag that into values. Now we have our values; we just don't know what they are, but we can see them. So let's drag this down to legend as well, and it breaks it out, and we have this scatter plot. For this fake data that we're using, it doesn't really show a lot, but if you're using real data, you can definitely find outliers, trends, and patterns using this type of visualization. Let's make that one small as well and drag it right down into the corner.

Now let's go right over here, and we have the dreaded pie chart and donut chart. Look, I think it's kind of a joke in the data analyst community about pie charts and donut charts, but at the same time people use them and request them, so sometimes you're going to use one whether you like it or not. Let's click on the donut chart and make it a lot larger, and let's click on state and also on total purchased. That's really all you have to do; these ones are pretty straightforward. You can change a few different things, like where the labels are; if you want them inside, you can do that, and that would look totally fine. Again, I'm just not a huge fan, but you will get this one requested. People like it and want to see it. The reason a lot of analysts don't like using it is that when you glance at these, it's really hard to tell the difference between the sizes. If you look at something like this, you can easily see that this is larger, like
if you're looking at this one, the multi-tool survival knife is obviously the longest, and then it gets shorter, shorter, shorter. But when you start getting in here, it's really hard to approximate the size. I would not be able to easily tell the difference between this 5.63, 5.78, and 7.72. That's why a lot of people don't want to use them in general. So again, I want to show you this one because I think it's worth knowing how to use, but I don't really push people toward it, because I don't think it's the best visualization available most of the time.

All right, the next two are super easy but are used all the time, maybe even more than some of these. They're just so easy to use that I saved them for last. This one is the card, and all the card does is display one number, or multiple numbers if you use a multi-row card, but we'll just look at the card for now. All we're going to look at is the total purchased, and it's just going to display it like this. You can make it as large or as small as you'd like, and normally it goes at the top, where you'll put a card here and a card here. Just as an example, I'll show you how this might look. So it looks something like this, and at the top it usually has high-level, overarching information. This is super common to see, and I'm sure if you've looked at other people's visualizations you've seen something like this. It's usually totals or averages or something like that, where it's super easy to look at. So right here, this is total purchased, and we can go in and look at the minimum, and then we can go over here and this one can be a count. So it gives us a lot of information at a really quick glance, and then we have all of our more in-depth, colorful visualizations that have more information than just a single piece like the card does.
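A card shows one aggregated number. As a rough Python sketch (illustrative only, using made-up purchase totals that are not from the tutorial data, and a hypothetical helper named `card_values`), the three cards described above correspond to a sum, a minimum, and a count over the same column:

```python
def card_values(total_purchased):
    """Return the single numbers that three cards (sum, minimum, count) would display."""
    return {
        "sum": sum(total_purchased),
        "minimum": min(total_purchased),
        "count": len(total_purchased),
    }


# Made-up sample values, purely for illustration.
sample = [120.0, 45.5, 300.0, 80.25]
cards = card_values(sample)
print(cards)
```

Each card is just a different summarization of one field, which is why switching a card from total to minimum to count takes a single click.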
Then the very last one that I'm going to show you is this one right here, the table. This one is obviously extremely popular. It's like a little Excel table. We can go in here and get the customer, wherever that is, and then also the units sold, and this is what it looks like. It's super easy, and oftentimes you'll have it on the side, with all the other visualizations over here. So if we were going to take all these visualizations and pretend they were a real dashboard, there's a lot in here, but we'll just quickly do this. We might have something like this, and we'll make this larger and this wider, and we have a lot of information just in here. This is not a project, so don't go put this on your portfolio; I just threw a ton of random visualizations on this dashboard. But you can already see that you've most likely seen a lot of these in other people's work, in other people's visualizations on LinkedIn or on YouTube. These are very common and very popular. Again, we did not go through all of the ones over here. There are maps that you can use, but I haven't used maps in my job. There are things like gauges, decomposition trees, waterfall charts, tree maps, and all these different things, but I really have never used those in my actual job, and I don't see them a lot in other people's work either; otherwise I would be telling you to learn and use them. But try them out and see which ones you like. If you like this video, be sure to like and subscribe below and go check out all the other Power BI tutorial videos that I have on my channel, and I will see you in the next video.

What's going on everybody, welcome back to the Power BI tutorial series. Today we are going to be working on our final project. This is the final project of the Power BI tutorial series, so if you have not watched all of the videos leading up to this, I recommend going
and watching those videos so you can make sure you know all the things we're going to be looking at in today's project. I'm really excited to work on this project with you, because I think it's a really good one, and it uses real data that we collected about a month ago, when I ran a survey of data professionals. This is the raw data that we're going to be looking at, and I think it's really interesting that we collected our own data and are now using it for a project. We're going to transform the data using Power Query, and then we'll create the visualizations and finalize the dashboard, as well as create a theme and a different color scheme to make it a little more unique. Without further ado, let's jump onto my screen and get started with the project.

All right, so before we jump into it, I wanted to let you know that you can get the data below. It's on my GitHub, and you can download the exact file that we're going to be looking at. In the past several projects we have been using this fake apocalypse data set. It was fun, whatever. This data set is real. It was a survey of data professionals that I posted on LinkedIn and Twitter and other places, and we had about 600 to 700 people respond to the questions. So before we actually start cleaning the data and doing all this stuff in Power BI, I just wanted to show you the data.

This is the CSV that I downloaded from the survey website that I used, and it's completely raw data; I haven't done anything to it at all. Let's go through the data really quickly and see what we have. We are not going to make any changes at all in Excel. We're going to do all of our transformations, or at least a few transformations, in Power BI, because this is a Power BI tutorial and project, so I want you to learn how to use that and not Excel, because you can go through my
Excel tutorial if you want to do that. So let's just look at it in Excel, and then we'll move it over to Power BI and actually start transforming the data. We have this unique ID; these are all the people that actually took it. Oops, don't want to do that. We have an email column, but this was completely anonymous; I didn't collect any user data on this. Then we have the date taken. Let's get into the actual good information. Then we have all of these questions. We have question one, which title fits you best, and they can choose from options. Let's add a filter really quickly so we can look at this. You had the pre-selected ones, like data analyst, architect, and engineer, but then there was an option where you could say other and specify what that was. So if you look in here, we have all these different other (please specify) entries with different titles, and there were a lot of them. Typically what you want to do is really clean this up. We're not going to be doing a ton of data cleaning, but we are going to do some in Power BI, and none in here. Typically, with this amount of data and the way it's formatted, we would do so much data cleaning. With this one, I mean, there is a lot of work to be done. Take this current yearly salary column: this is one that I would absolutely be cleaning up, because it's ranges, and it has a dash and a k and all these numbers. But we're not going to be cleaning it up right now.

So anyway, let's just get into it and see what questions we asked. We have the yearly salary, what industry do you work in, and favorite programming language. Then there were a lot of different options; this is like one question where they picked multiple options, so it's how happy are you in your current position with the following. You have your salary and work-life balance, then co-workers, management, upward mobility, and learning new things, and they could
rank it from zero to 10 so some people ranked upward mobility a 10 some ranked it a zero or a one and again they can answer however they want how difficult was it to break into data very difficult very easy if you're looking for a new job we have you know what would you be looking for remote work better salary etc we have male female which country you're from and then this is more like demographics so how old you are and this was in a range so this is like a sliding bar so you could slide it to the exact age you had there's some people who are apparently 92 which if that's true I mean good for you man or woman actually really quickly just while we're here I'm going to see if this is a male or a female oh it's a female from India very cool so we have all this information and it is a lot of information when you have something like this I mean there is so much data cleaning that can be done I mean I already see like 20 plus different things that I would need to do to make this a lot better and we also have date taken and the time taken as well as how long they took on it like the time spent just really interesting data but again this is a beginner tutorial series this is the beginner project so we're not going to do anything too crazy I will be using this exact data set in a future video doing a lot more data cleaning and creating a much more advanced visualization with what we have and what we're looking at right here but for this video we're just going to be doing a pretty simple visualization and dashboard that you can use to practice with or put on your portfolio if you know that's where you're at right now so let's get out of here and let's put this into Power BI so let's exit out and let's come right over here to import data from Excel we'll click on Power BI final project and open give that a second doing this all in real time we only have the one so we won't be
practicing any joins or anything but we're not going to load it we're going to transform this data so let's put it into the Power Query editor and now we have all of our data in here and it should look extremely familiar now when I start looking at this information I kind of need to know beforehand what I want to get out of this do I need to clean every single column do I just need to clean a few of them do I need to get rid of columns that's kind of where my head's at and so right off the bat I can already tell you that there are columns that we can just delete to get out of our way so we're going to do that at the beginning so that we don't have to do that later on and they're not in our way so I'm going to click on Browser and then I'm going to hit shift and I'm going to go over here to Referrer and I'm just going to go up here to remove columns and everything that we do is going to go over here to this applied steps if you've been following this series you know we can remove things add things but anything we do will show up right over here so we can track it and go back if we need to now one column that I know for sure that I'm going to be using quite a bit is this which title fits you best in your current role because I specifically wanted to do a breakdown of different people's roles and how much they make and different stuff like that so I know that I want to use this but as we saw before the issue is it's not very clean right it has data analyst data architect engineer scientist database developer and then like a hundred different options and then a student or none of these right and so for the purpose of this video right here we are not going to take every single one of these options because this involves a lot more data cleaning let me give you an example this says software engineer this also says software engineer and with AI these two would typically be combined or standardized to software engineer but
it's not very easy to do that in Power BI we could do that in Excel but not really in Power BI or even SQL if we pull this from a SQL database and you can find lots of different examples of that we have data manager and data manager if we separated these out these would be different options when we created our visualizations and we don't want that so what we are going to do and this is going to be kind of an easy way out to just make sure that this is pretty clean and we don't have a thousand different options we're going to change all of these to other so we're going to simplify this a lot and then we're going to use this so we'll have maybe six or seven options instead of the you know let's say 50 that we would have if we actually did the harder work which is to break it out standardize it and clean it up that way so what we're going to do is we're going to click on this right here and we're going to go up here to split column in this ribbon up top we'll go to split column and we want to do it by a delimiter and if you notice let me see if I can move this over if you notice we have other and then we have this parenthesis and in no other option is there a parenthesis so what we're going to do is we're going to use a custom delimiter and we'll use this open parenthesis what that's going to do is it's going to separate it by this parenthesis it's going to leave the other and it's going to create just one separate column for the rest and we can do that at each occurrence or we can do the leftmost and we really only need it for the leftmost because there's only one of these left parentheses and then let's go and click okay and it should create another column so it's going to have .1 and .2 and now if we click on this we only have these options we have analyst architect engineer data scientist database developer other and student looking or none that is what we want
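For anyone who wants to see the same idea outside of Power BI, here is a minimal Python sketch of splitting at the leftmost open parenthesis and keeping only the left-hand part. The sample titles and the helper name split_at_leftmost are made up for illustration and are not taken from the survey file, and this is only an approximation of the split column step, not what Power Query runs internally.

```python
from collections import Counter


def split_at_leftmost(value: str, delimiter: str = "(") -> str:
    """Return the text before the first delimiter (hypothetical helper)."""
    head, _sep, _rest = value.partition(delimiter)
    # The text before "(" ends with a space ("Other "), so trim it for a tidy label.
    return head.strip()


# Illustrative values only, loosely modeled on the job title column.
titles = [
    "Data Analyst",
    "Data Scientist",
    "Other (Please Specify):Software Engineer",
    "Other (Please Specify):Data Manager",
    "Student/Looking/None",
]

simplified = [split_at_leftmost(t) for t in titles]
counts = Counter(simplified)

print(simplified)       # every free-text answer collapses into "Other"
print(counts["Other"])  # how many rows ended up in the "Other" bucket
```

The trade-off is the same one described above: the free-text detail is thrown away in exchange for a handful of clean categories.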
it makes it so much simpler and it's not perfect but again I'm trying to show you what we are able to do in Power BI so now we're just going to remove that column and we're going to go and do the exact same thing to this one as well because I know that we want to use this and I really wanted to use this one as well but if we look at this one also there's a lot so I said what is your favorite programming language and there were pre-selected answers like JavaScript Java C++ Python R things like that and then there was an other option and in this other option I mean it was free text so they can fill it in as they want I mean there's four five six different ways that people put SQL that is something I would standardize and you know that would be the way I cleaned it but that's not how we're doing it in here so we're going to do the same thing we're going to keep that other so we're going to split this column again we'll use a delimiter and for this delimiter though we're going to use a colon so we're going to do a colon right there we'll just do the leftmost we'll click okay and then we have our options and it's much simpler now I really would have rather kept all these because SQL's in there quite a bit but you know a lot of people don't think SQL is even a programming language so we're going to delete that column now one that I just skipped and I kind of wanted to go back to is this current yearly salary I really want to use this let's see if we can use it here's what I want to do with it and this is not perfect for this video I want to try it what I want to do is break up these numbers 106 125 and then take the average of those numbers so then we'll use some DAX in there so we'll take 106 125 create that into two separate columns then we'll create a third column that will give us the average of those two numbers so we'll do 106 plus 125 divided by two and then we'll have the average of that now that is not perfect but it's going
to give us at least you know an average a kind of roundabout number because they gave us this range they said my salary is between 106 and 125,000 so if we say that their salary was about 115,000 at least it makes it usable it's a numeric value instead of being this which is text which we really could use and I'll show you how to do that because we're going to keep this column I'll create a copy of this and I'll show you the difference between this and using the average but for this data cleaning portion let's just try it let's see what we can do and see if we can make it work so first let's create a duplicate so we're going to duplicate the column so now we have this copy at the very end and we can use this one instead of having to use the original way back here so we're going to leave that one how it is and we're going to use this one so let's go ahead and split this one up we're going to click on the column header then we're going to click on split column and we'll do it by digit to non-digit and if you look at it right here it's broken it out in the fact that now in this one we just have numeric values and in this one we have K dash numeric or just dash numeric and now this can be easily cleaned whereas this one we can just completely get rid of because it's only K so we'll just remove that column and then in this one we're going to right-click we're going to click on replace values and so we'll just do a K we'll replace it with nothing we'll do okay and then for the last one we'll go to replace values and we'll do the dash or the minus sign and we'll replace that with nothing and so now we have our values as well oh we also have a plus let me get rid of that because that's when some people had 225,000 plus so for that one the average is just going to be 225 we'll have to specify that in our DAX I forgot but actually if somebody has 225 let me find this plus really quick let me filter by it because
that's a lot faster what we actually want to do for the purpose of this one is we want to put 225 here so that when we do 225 plus 225 divided by two it comes out to 225 that's just what we're going to put it as and there's only two people so I'm actually going to replace this I'm going to do replace values I'm going to say replace plus with 225 and we'll click okay awesome we can unfilter these select all so we're going to go right up here to add column we're going to say custom column and we're going to go right over here actually let's name it average salary so we're going to insert this I'm going to say open parenthesis and we're going to say plus this insert and close the parenthesis divided by two and it says no syntax errors have been detected let's click on okay and it's giving us an error so it's saying we cannot apply operator plus to types text and text which makes perfect sense these aren't numbers so let's make this one a whole number and let's make this one a whole number and then let's see if this will actually work no so maybe we just need to try a whole other one so let's try add column custom column let's try this all again see if I can make it work insert do this one plus this one and we'll do divided by two and let's try this one and there we go so now let's get rid of this column and we can actually remove these ones as well because now we have this average salary column which when we look at this or when we use this let me see if I can just move this way over all right I might cut because this is taking forever so if you take the average of these two numbers you'll get 53 if you take the average of 0 and 40 you'll get 20 so now we have this average salary and again when we get to the actual visualization part I'll show you why this isn't as useful as having this average salary and just a reminder this is not perfect I wouldn't typically do this especially if I had it in Excel or if I was
you know creating this survey in a different way I would probably have a very specific value where they could do it on a slider but this is how it is so we've at least made it usable or more usable in my mind and we have a few other things that we can change like what industry do you work in where we can break this one out so I'm going to go ahead and break this one out as well as this one right here which country do you live in I'm going to break both of those out to where it's the country or other I'm not going to have these other values although there are a lot of them because there's a lot of people who live in these different countries but we can't really do that super well in here because again the same issue keeps happening Argentina Argentina Argentine Australia so we can't normalize those values unless we spend just a copious amount of time doing that so I'm going to go ahead and do these I'm going to speed this up so it goes a lot faster so I'm just going to go silent and let this happen really quick and then we'll get to the end and we'll actually start building our visualizations all right so we've split them up and as you can see we have all these options as well as other and let me tell you there is so much more that we could do with this I mean just so many other things but this is like the bare minimum of what we need for this project so let's go ahead and close and apply this and if we need to come back at any point and actually fix anything or change anything we can so it's not like that's permanent so as you can see we have everything over here we have all our data as it is transformed in here as well and now we can start building out our visualizations let's go back to our report and let's start building something out all right so let's add a title to our dashboard we want to put this right at the top we'll call this the data professional survey breakdown and let's make that quite
a bit larger make it bold why not and we'll put that in the center and now let's add some effects let's change that background to something like it's too dark something like this and I do not like that bold let's take that off there we go so something like this just as a quick title to what we are about to build so we're going to start off with the most simple visualizations that we're going to do and we'll kind of work our way towards the harder ones so the first one that we're going to start off with is a card and the cards are obviously just super easy they usually just display one piece of information so we're going to go right over here to the very bottom at the unique ID and we're going to select it and we're going to say a count distinct or a count it doesn't matter it says 630 count of unique ID now we're not going to keep that as is we're actually going to go right over here we're going to say rename for this visual and it says count of unique ID but we're going to say count of survey takers and you can say whatever you want here but in general that is what it is we're counting how many people you know took this survey and that's just kind of a total maybe I should say total amount of survey takers but you can say count of survey takers how many people took this survey so let's click out of there let's click on card let's make it about the same size we're going to drag it up here and try to make them about the same in a little bit we'll make them the same size but for this one we're going to look at age so we're going to look at current age so I'm going to click on that and we'll say we want the average age so our average survey taker is almost 30 years old so let's go right over here we're going to say rename for this visual we'll say average age of survey oop this might be too long average age of survey taker again name it whatever you'd like so again these are meant to be
high-level numbers so when somebody's looking at your dashboard they can just really quickly glance at this and know exactly what it is unlike some of these other visualizations that we're about to create they don't really have to dig into it look at the x-axis the y-axis the different legend colors and whatnot they can just see these high-level numbers and get a really quick glance of the data now let's create our first visualization and what we're going to do for that one is a clustered bar chart so let's go ahead and click on the clustered bar chart we can make it as small or as large as we'd like and for this one we're going to be looking at the job titles now remember we kind of changed the job titles or you know transformed those if you want to say that so we're going to look at job titles and then we're going to look at their average salary and if you remember we transformed that one as well we have an average salary now this one it looks like a text right now so it may not work properly and what we're actually going to do is go over here I want to see the average salary so let's click on average salary and see if we can change this data type from a text to a decimal number let's click yes I forgot to do that when we were transforming it and there we go this is perfect so now we can go back and we can select our average salary and as you can see it has this function symbol and so now we can click on it and it'll look a lot better and although this says average salary as the title it's actually doing a count or the sum so we can click average right here and what we want to do is actually break this down by the job title and so now we can see data scientists are making the most by far they're making an average of 93,000 at least from the survey takers that took it then we have our data engineers making 65,000 data architects are making 63 and then where are the data analysts data analysts are right here making 55 so again we had 630 people
take this survey and so the vast majority of them were data analysts so this one's probably the most accurate out of all of them and I actually don't like how this looks as the clustered bar chart let's try the stacked bar chart and put this as the legend that's more what I was going for I didn't want it as skinny because when you're doing this one typically they have multiple options per x-axis and so I think that's why it was that little skinny line but this one is more what I was looking for but let's make that smaller and let's definitely change that title because good night this is like incredibly long let's go over here to this format visual we'll go to general then title and we're just going to say average salary by job title just like that and this looks a lot better now we're not going to format our whole dashboard yet we're going to create our visualizations and then we're going to kind of organize everything and kind of play Tetris with it to make it look the best so we're just going to minimize this and put it right up here for now but we will go back and kind of make everything look better at the end and actually while we're here I also want to change this as well so rename for this visual we're going to say job title oops why did I do that job title and for this one we're just going to name it average salary there we go looks much better much cleaner took away a lot of the anxiety that I was feeling about two minutes ago when we first put that up there so let's go on to our second visualization the next one that I'm interested in is actually what programming language people were using the most so we have salary there's a thousand different things we can look at in here but I want to know you know what is people's favorite programming language so let's take a look at that so we have favorite programming language let's find that so we have our favorite programming language and we also have how many people actually
took it or the unique people so right now this is columns we don't want that let's do a clustered column chart click on this right here and it looks like here we go that is kind of what we're looking for and instead of count of unique ID we'll say count of let's do count of voters and for favorite programming language we'll say favorite oops favorite programming language and get rid of that as well and then we're going to go into here also and change the title and say favorite programming languages or favorite programming language just like this now let's make this a lot bigger so you can see it but really quickly at a glance you can see Python is by far the most popular then R other C++ JavaScript Java now all we're seeing is the count so it's all the same it's just blue we can see how many people voted for each one but if we wanted to break it out similar to how we did with the job titles we could still do that so all we'd have to do is bring this job title down to the legend and now it breaks out like this and that's not exactly what I was going for I was going more for something like this where we can still see the whole count but now we can see who is actually voting for these things so I'm just not a huge fan of the colors that are pre-selected here and kind of the whole theme of this dashboard at the very end we're going to completely revamp this change a bunch of colors the background and make this look a lot nicer rather than just the white background like we have it and so for now let's just make this a lot smaller and put it into this corner these will not be staying there but we need room to create our next visualizations and just a cleaner space to do things now the next thing that I really want to include is a way to break down where they're from their country because especially something like salary is very dependent on your country whereas the average salary in the United States for a data analyst may
be like 60,000 in another country it could be 20,000 that could bring down the average quite a bit so we need a way to be able to break that down now we can do something like a filled map and there's no problem with that at all but you know for what we're building what we're creating it's probably not going to work out the best I mean this looks okay we could stick it in the corner or something and you can do that and that's perfectly fine I think what I'm going to do is something like a treemap which I don't use a lot but I want something where they can look at the distinct values and just click on one and it'll be right there for them so they don't have to filter it out on their own or know geography and look at this map they can just read Canada other United Kingdom India United States and click on that and so for example let's click over here on United States the numbers change quite a bit now the average salary for a data scientist is 139,000 for data analyst it's 80 and if we look at India you know the average salary for a data scientist is 68 the average salary is 26 for a data analyst that doesn't mean that they make less money in India that just means that the cost of living is probably lower in India therefore they don't need the higher US dollar salary because again this was all done in US dollars so just something to think about let's click out of that so we'll keep that one as well so now let's create our next visualization and this is one that I do not get to use enough in my actual job so we're going to use it in this project and it's going to be this gauge right here so let's add that one put it right over here we're going to add two of those let's just go ahead and add another one while we're at it because we're going to have them kind of like right here right next to each other the first one and these ones are really good for looking at these kinds of surveys and I don't get
to work with surveys enough but we can see you know how happy are they in terms of work life balance so we can add that we're going to add work life balance and right now it's doing a count and we don't have minimum or maximum values in there yet so it's going to look kind of weird but we're going to look at the average rating or the average score of these then we're going to pull this over to the minimum value and we want to set that to the minimum and pull this over and add the maximum value so now it actually has zero to 10 and it shows that the average person which one was this the average person rates their happiness with their work life balance about a 5.74 overall now let's really quickly change the title of this because this is ridiculous I want to say happy with work life balance so this is their rating you know change it to whatever title you want that's what I'm going to do and we'll also do happy with their salary let's click on salary we'll add that to minimum and we'll add the maximum value as well to make sure that we know how to use that and then we'll take the average so not many people are happy with their salary I'm just finding out I mean this is a real survey this is real data so I mean it's pretty interesting let's go to the title let's go to happy with or maybe it's happiness happiness with salary maybe that's what we should make it and I'm going to change that over here as well I think it sounds better some of this I've already planned out some I haven't this is not something I've planned out so we're going to say happiness with work life balance happiness with salary really interesting we may go back and tweak these just a little bit in the future but the very last visualization that we're going to do is male versus female kind of got to have that in there I don't typically like pie charts and donut charts but you know I'm just feeling it so let's try it and we will do let's see let's make
this larger so we have male female and what do we want to look at like what do we want to measure so we have male versus female we can measure anything but maybe what we'll do is the average salary again I mean we've kind of only looked at salary once in this one right here and a little bit of like how happy they are but we'll look at the average salary between males and females and then we'll look at not the current age oops I meant average salary and then we'll look at the average and it looks like the average salary is actually really close for males versus females 55 for female versus 53 for male so actually the females are a little bit higher congratulations so they're just a little bit higher in terms of pay so now we need to start organizing all of this cleaning it up making it look a lot better than it does right now it looks great you know but we can do a lot more with this so we're going to keep all these kind of over on this left hand side I want this up here we also need to change that title I want this up here and again we're going to kind of change the theme as we go I just want to format it right we'll have it just like this let's change the title of this let's go to title and we're going to say country of survey takers I'm not really stuck on that if you think of something better I would go with that but you know it definitely doesn't look bad and where did my other visualization go there it goes I think this one I want to make kind of more tall so I might move it this way jeez I hate having a lot of visualizations on here it just really is annoying to me so what we're going to do I think we're going to set this to the side put this to the side as well I want to make it to where it's just okay I didn't want it to cut off we'll do that might make these a little bigger actually so
I want it to kind of match the size like right there I'll match this perfect this one I kind of want to bring over here and bring it down a little bit maybe something like this maybe I'm not sure I'm not sold on that I added a few different visualizations that I didn't have in my original so now I'm kind of having to do this on the fly so I might fast forward some of the parts where I'm like really thinking about it or taking too much time on it but I'm going to bring this down a little bit actually because I don't like how close that is to the text above it but one thing we do need to do I'm going to put this up kind of like this I think that looks fine I think I'm going to put this at the very bottom so let's make some room for it all right just like that stretch it to the side and we'll lower it and I think we'll keep that as is kind of like this okay there's a lot going on in here and there are some things I'm just noticing as we're walking through this that I kind of missed like I need to change some titles and stuff like that so let me go ahead and change some of those things so we're going to do title do average salary by gender or by sex do like that average salary by sex I also don't like that it's in the middle I don't like that it's on the outside I want them on the inside for this so let's go to the details let's go to inside and see if that looks any better oh that looks terrible let me see if I can change that maybe no I definitely want it I guess we'll do outside you can't even see the information oh the decimal is crazy long let me go and see if I can change that decimal to just like a whole number or like 1.1 because that's a problem so maybe I need to go over here to the value all right so I think I want to change this one it's just not working out exactly how I wanted and you guys know if I make mistakes I'm going to keep it in here so you guys can see it I hoped that this was going to turn out
better but it didn't one that I do want to add because this is kind of a breakdown and a nice visualization I want to add this difficulty piece so I want to add this how difficult was it for you to break into data science let's get rid of these and I want to click on this really quickly see what it gives us values okay so now this shows us percentages of how easy it was again it's neither easy nor difficult difficult easy very difficult very easy these numbers make absolutely no sense we need to kind of order them a little better so I'm going to come over here to slices we have our colors over here we want very difficult to be like the most difficult so we're going to make that red and then we want difficult to be maybe like an orange let's see if we can find an orange there we have an orange this does not look red enough there we go oh no no no very difficult is red difficult is orange we have neither easy nor difficult and that's kind of a neutral let's see if we have something neutral in here kind of like this yellow I don't know let's try it out then we have easy and very easy and these will be like our blues so I'm going to keep that kind of like a dark blueish and then our blue for very easy is just going to be like really blue and that doesn't look bad I mean look I'm not a color person I'm not great with colors and we're going to kind of organize this in just a little bit but this looks better to me but we need to change up some stuff as well like the title need to do difficulty to break into data there we go and we're also going to change this title right here we'll just say difficulty this looks better to me again not perfect and there's a thousand different things you could have done but that's just what we're going to do I need to go through here and see what I need to change so right off the bat I can see I need to change this to let's see right here I'm going to
rename this job title just like we did in this one right here count of voters that's fine programming language breaking into difficulty happiness happiness average count okay so what we have here is very close to a finished product now it's not 100% complete I mean I do want to make it look a little nicer rather than just the typical white so what we're going to do we're going to go up here we'll go to what is it View and we have all these different themes and we're just going to play around with it see if we can find something that we like this doesn't look too bad it's not really my style we can do this one Frontier this is pretty neat I kind of am digging this we might come back to it I like the natural tones I don't know why I said tones like that but I did this one's not bad but I don't like how dark that is and so maybe it's like you know we change the background color of all of these as well as match it with something else whatever you want genuinely you customize this however you want I kind of like this one it's kind of groovy man and it's not perfect by any means but what we can do is we can customize this current theme we can come in here customize this theme however we'd like I personally don't want color five which is the data analyst color I don't want to go and change it but I don't really like that color per se you know I might want to choose a different color but it has to be like this muted like that it has a style to it so you can come in here and you can customize this and make it however you'd like and really mess around with it play around with it for me I'm just going to keep it how it is because I don't really want to mess with it and break it or anything like that so let me just put that up just a tiny bit so this is it this is the project I hope that it was helpful I am not joking when I say that I'm
because I'm gonna do a different project. I'm gonna go really in depth in another project. It's probably gonna be like a two-hour project. It's going to be crazy long, well, for a YouTube video. But I can see doing a thousand different things with this data: creating a really great dashboard, really cleaning the data, which is a large part of actually doing this, and we didn't do much data cleaning at all. There's just so much you can do with this, so really dig into it. See what you like, see what you don't like, see what you want to clean and what you don't want to clean. You could put it in SQL, you could put it in Excel, and just standardize the data to make it a lot more usable. Do whatever you want with it. I mean, I took this survey for you guys so that we could use it, so go out and use it and make the best dashboard that you possibly can. So I hope that this was helpful. I hope that you enjoyed this. Thank you so much for watching. If you like this video, be sure to like and subscribe below, and I'll see you in the next [Music] video.

What's going on, everybody? Welcome back to another video. Today we're going to be starting our Python tutorial [Music] series. Now, I am extremely excited for this series. We're going to be walking through all the things that you need to know to get started in Python. We'll be looking at variables, data types, for loops, while loops, operators, and a ton more. After this beginner series, we're going to be going into another set of series where we look at pandas, Matplotlib, Seaborn, web scraping, and more. Now, in this video we're just going to be setting up our environment so that we can learn Python. In future videos in this series we're going to be using Jupyter notebooks for all of our tutorials, because I feel like it's a really great place to learn the basics, but then in future videos I'll show you different IDEs that you can use for your Python code. I genuinely cannot wait to get started on
this series. I absolutely love Python. So without further ado, let's jump on my screen and I'm going to show you how to install Jupyter notebooks.

All right, so let's get started by downloading Anaconda. Anaconda is an open-source distribution of Python and R products, so within Anaconda is our Jupyter notebooks as well as a lot of other things, but we're going to be using it for our Jupyter notebooks. So let's go right down here, and if I hit Download it's going to download for me because I'm on Windows. But if you want additional installers, if you're running on Mac or Linux, then you can get those all right here. Now, if you are running on Windows, just make sure to check your system to see if it's 32-bit or 64-bit. You can go into About in your system settings to find that information. I'm going to click on this 64-bit. It's going to pop up on my screen right here, and I'm going to click Save. Now it's going to start downloading. It says it could take a little while, but honestly it's going to take probably about two to three minutes, and then we'll get going.

Now that it's done, I'm just going to click on it, and it's going to pull up this window right here. We are just going to click Next because we want to install it. This is our license agreement. You can read through this if you would like; I will not. I'm just going to click I Agree. Now we can select our installation type, and you can either select it for just me, or if you have multiple admins or users on one laptop, you can do that as well. For me, it's just me, so I'm going to use this one as it recommends. Now it's going to show you where it's installing it on your computer. This is the actual file path. It's going to take about 3.5 gigs of space. I have plenty of space, but make sure you have enough, and then once you do, you can come right over here to Next. And now we can do some advanced options. We can add Anaconda3 to my PATH environment variable. When you're using Python, you typically have a default path with whatever Python IDE
or notebook that you're using. I use a lot of Visual Studio Code, so if I do this, I'm worried it might mess something up, so I am not going to do this. It also says it doesn't recommend it. Again, messing with these paths is kind of something that you might want to do once you know more about Python, so I don't really recommend you having this checked. We can also register Anaconda3 as my default Python 3.9. You can do this one, and I'm going to keep it this way just so I have the exact same settings as you do. So let's go ahead and click Install, and now it is going to actually install this on your computer. Now, once that's complete we can hit Next, and now we're going to hit Next again, and finally we're going to hit Finish. But if you want to, you can have this tutorial and this Getting Started with Anaconda. I don't want either of them because I don't need them, but if you would like to have those, keep those checked and you can get those. Let's click Finish.

Now let's go down and search for Anaconda, and it'll say Anaconda Navigator. We're going to click on that and it should open up for us. So this is what you should be seeing on your screen. This is the Anaconda Navigator, and this is where that distribution of Python and R is going to be. We have a lot of different options in here, and some of them may look familiar. We have things like Visual Studio Code, Spyder, RStudio, and then right up here we have our Jupyter notebooks, and this is what we're going to be using throughout our tutorials. So let's go ahead and click on Launch, and this is what should pop up on your screen. Now, I've been using this a lot, so I have a ton of notebooks and files in here, but if you are just now seeing this, it might be completely blank or just have some default folders in here. This is where we're going to open up a new Jupyter notebook where we can write code and all the things that we're going to be learning in future tutorials, and you can use this area to save
things and create folders and organize everything. If you already have some notebooks from previous projects or something, you can upload them here. But what we're going to do is go right to this New. We're going to click on the dropdown and open up a Python 3 kernel, and so we're going to open this up right here. Now, right here is where we're going to be spending 99% of our time in future videos. This is where we're going to write all of our code. So right here is a cell, and this is where we can type things. So I can say print, I can do the famous "Hello World," and then I'll run that by pressing Shift+Enter. This is where all of our code is going to go. These are called cells, so each one of these is a cell. We have a ton of stuff up here, and I'm going to get to that in just a second.

One thing I wanted to show you is that you don't only have to write code here. You can also do something called Markdown. Markdown is its own kind of, you could say, language, but it's just a different way of writing, especially within a notebook. So all we're going to do is do this little hashtag, and actually I think it's a pound sign, but I'm going to call it hashtag. We're going to do that and we're going to say "First Notebook," and then if I run that, we have our "First Notebook," and we can make little comments and little notes like that that don't actually run any code. They just kind of organize things for us, and I'm going to do that in a lot of our future videos, so I just want to show you how to do that.

Now let's look right up here. A lot of these things are pretty important. One of the first things that's really important is actually saving this. So let's say we wanted to change the title. I'm going to do "AAA" because I want it to be at the beginning so I can show you this. I'm going to do "AAA New Notebook," and I'm going to rename it, and then I'm going to save that. So if I go right back over here, you can see "AAA New Notebook." That green means that it's currently running, and when I say
running, I mean right up here. And if we wanted to, we could go ahead and shut that down, which means it wouldn't run the code anymore, and then we'd have to start up a new kernel. So let's go ahead and do that. I didn't plan on doing that, but let's do it. So we have no notebooks running, and right here it says we have a dead kernel. So this was our Python 3 kernel, and now since I stopped it, it's no longer processing anything. So let's go ahead and say Try Restarting Now, and it says kernel is ready, so it's back up and running and we're good to go.

The next thing is this button right here. Now, this is Insert Cell Below, so if I have a lot of code I know I'm going to be writing, I can click that a lot. I often do that because I just don't like having to do that all the time, so I make a bunch of cells just so I can use them. You can delete cells. So say we have some code here, we'll say "here," and we have code here, and then we have this empty cell right here. We can just get rid of that by doing this Cut Selected Cells. We can also copy selected cells, so if I hit Copy Selected Cells, I can go right here and say Paste Selected Cells, and as you can see it pasted that exact same cell. You can also move this up and down. So I can take this one, and say I wanted it in this location, I can take this cell and move it up, or I can move it down. That's just an easy way to organize it. Instead of having to copy this and move it right down here and paste it, you can just take this cell and move it up, which is really nice.

Now, earlier when I ran this code right here, I hit Shift+Enter. You can also use Run, which runs the cell and selects the one below, so you can hit Run and it works properly. If you're running a script and it's taking forever and it's not working properly, or at least you don't think it's working properly, you can stop that by doing this Interrupt the Kernel right here, and anything you're trying to do within this kernel, if it's just not working properly, it'll stop it. You can
restart it, then you can try fixing your code. You can also hit this button if you want to restart your kernel, and this button if you want to restart the kernel and then rerun the entire notebook. As we talked about just a second ago, we have our Code and our Markdown. We're not going to talk about either of these others, because we're not going to use them throughout the entire series.

The next thing I want to show you is right up here. If you open this File, we can create a new notebook, we can open an existing notebook, we can copy it, save it, rename it, all that good stuff. We can also go to Edit, so a lot of these things that we were talking about, you can cut the cells and copy the cells using these shortcuts if you would like to. We can also go to View, and you can toggle a lot of these things if you would like to, which just means it'll show it or not show it depending on what you want. So if we toggle this toolbar, it'll take away the toolbar for us, or if we go back and toggle the toolbar, we can bring it back. We can also insert a few different things, like inserting a cell above or a cell below, so instead of using this plus button, you can just press A or B, adding above or below. We also have Cell, in which we can run our cells, or run all of them, or all above, or all below. And then we have our Kernel right here, which we were talking about earlier, where we can interrupt it and restart it. There are Widgets. We're not going to be looking at any widgets in this series, but if it's something you're interested in, you can definitely do that. And then we have Help, so if you are looking for some help on any of these things, especially some of these references, which are really nice, you can use those, and you can also edit your own keyboard shortcuts.

And now that we've walked through all of that, you now have Anaconda and Jupyter notebooks installed on your computer. In future videos this is where we're going to be writing all of our Python code, so be sure to check those out so we can learn Python
together. Thank you guys so much for watching. I hope you were able to get everything installed correctly. I am super excited for this series ahead of us. If you like this video, be sure to like and subscribe below, and I will see you in the next [Music] video. [Music]

Hello everybody. Today we're going to be learning about variables in Python. A variable is basically just a container for storing data values. So you'll take a value like a number or a string, you can assign it to a variable, and then the variable will carry and contain whatever you put into it. So for example, let's go right over here. We're going to say x, and this is going to be our variable. We're going to say "is equal to," and now we can assign the value to it. So let's say I want to put 22. x is now equal to 22, so we won't have to write out the number 22 in later scripts that we write; we can just say x, because x is equal to 22. It now contains that number. So now we can hit Enter and say print, we do an open parenthesis, and we'll say x. Now I'm going to hit Shift+Enter, and now it prints out that 22, because we are printing x and x is equal to 22. This is our value and this is our variable.

One really great thing about variables is that Python assigns the data type for you; it's going to automatically do this. So we didn't have to go and tell x that it's an integer. It just automatically knew that 22 is a number. We can check that by saying type, then an open parenthesis, and writing x, and we'll do Shift+Enter again, and this says that x is an integer type. Now, we only assigned an integer to x. Let's try assigning a string value, or some text, to a variable. So we'll say y is equal to, let's say, "Mint Chocolate Chip." I'm feeling some ice cream today, so we'll say "Mint Chocolate Chip." Now if we print that again, we'll do print, open parenthesis, y, and do Shift+Enter, it'll print Mint Chocolate Chip, and if we look at the type, we can see that the type is a string this time and not an integer. Now again, we did not tell it that x was an integer and y was a
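As a minimal sketch of the cells narrated so far (the exact capitalization of the ice cream string is an assumption, since the transcript only records the spoken words), the first variable examples could look like this in a notebook cell:

```python
# Assign an integer to x; Python infers the type from the value.
x = 22
print(x)
print(type(x))  # <class 'int'>

# Assign a string to y; no type declaration is needed here either.
y = "Mint Chocolate Chip"
print(y)
print(type(y))  # <class 'str'>
```

Running each statement in its own cell with Shift+Enter, as in the video, gives the same results as running them together.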
string. It just automatically knew this. Let's go up here really quickly. We're going to add several rows in here because we're about to write a lot of different variables and really learn in depth how to use variables.

The next thing to know about variables is that you can overwrite previous variables. Right now we have "Mint Chocolate Chip," and that is assigned to the variable y. So if I go down here and say print y and hit Shift+Enter, it's going to print out Mint Chocolate Chip. But if I go right above it and say y is equal to, let's say, "Chocolate," if I print that out it's now going to say Chocolate, whereas up here, where I'm assigning it to y, it's still going to say Mint Chocolate Chip. So if I come right down here and copy this and paste this right here, initially it is going to assign "Chocolate" to y, but then right here it will automatically overwrite y with "Mint Chocolate Chip," and when we hit Shift+Enter it's going to show Mint Chocolate Chip.

Variables are also case sensitive. So if I come up here and I say a capital Y, this is a lowercase y and this is a capital Y, it is going to print out the correct one instead of Mint Chocolate Chip, and then if I go down here to the print and type the capital Y, it will give us the Mint Chocolate Chip.

Up till now we've only assigned one value to one variable, but we can actually assign multiple values to multiple variables. So let's do x, comma, y, comma, z is equal to, and now we can assign multiple values to all of those. So we can say "Chocolate," and then we'll do a comma, then we can say "Vanilla," and then we'll do another comma and we'll say "Rocky Road." Now this is going to assign "Chocolate" to x, "Vanilla" to y, and "Rocky Road" to z. So what we can do is say print, and we'll go print, print, print, and we'll say x, y, and z. So it prints out Chocolate, Vanilla, and Rocky Road, and these are our three different values. We can also assign multiple variables to one value, and we can do this by saying x is equal to y is equal to z is equal
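The overwriting, case-sensitivity, and multiple-assignment steps described above could be sketched roughly as follows. The order of the cells is a reconstruction from the narration, not a copy of what was on screen.

```python
# Reassigning a name replaces whatever it held before.
y = "Mint Chocolate Chip"
y = "Chocolate"
print(y)  # Chocolate

# Names are case sensitive: capital Y is a separate variable from lowercase y.
Y = "Mint Chocolate Chip"
print(y)  # Chocolate
print(Y)  # Mint Chocolate Chip

# Several values assigned to several variables, matched by position.
x, y, z = "Chocolate", "Vanilla", "Rocky Road"
print(x)
print(y)
print(z)
```

The number of names on the left has to match the number of values on the right, otherwise Python raises a ValueError.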
to, and we can put whatever we would like. Let's do "Root Beer Float." Then we'll come back up here, we'll copy this, and let's print off our x, our y, and z, and they are all the exact same now.

So far we've really only looked at integers and strings, but you can assign things like lists, dictionaries, tuples, and sets all to variables as well. So let's go right down here and create our very first list. I'm going to say ice_cream is equal to, and that is our variable right there; ice_cream is our variable. So now we're going to do an open bracket like this, and we're going to come up here and copy all of these values, and we're going to stick them within our list. So now within ice_cream we have three string values, "Chocolate," "Vanilla," and "Rocky Road," all within this list. So what we can do is say x, comma, y, comma, z is equal to ice_cream. So now these three values, Chocolate, Vanilla, and Rocky Road, will be assigned to these three variables x, y, and z. We can copy this print up here and hit Shift+Enter, and now x, y, and z all were assigned these values of Chocolate, Vanilla, and Rocky Road.

Now, something that we just did which is really important, or something that you really need to consider, is how you name your variables. So right here we have ice_cream. Now, this to me is exactly how I usually write my variables, but there are many different ways that you can write your variables, so let's take a look at that really quickly. And let's add just a few more cells, because I have a feeling we're going to go a little bit longer than what we have. So there are a few best practices for naming variables. First I'm going to show you what a lot of people will do. I'll show you some good practices and I'm going to show you some bad practices as well that you should avoid. The first thing that we're going to look at is something called camel case. Let's say we want to name it "test variable case." Now, if we have a test variable case, the camel case is going to look like
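A short sketch of the two ideas just covered, one value shared by several names and a list unpacked into several names, might look like this (string capitalization is assumed):

```python
# One value assigned to three names at once.
x = y = z = "Root Beer Float"
print(x, y, z)

# A list stored in a variable, then unpacked into three variables by position.
ice_cream = ["Chocolate", "Vanilla", "Rocky Road"]
x, y, z = ice_cream
print(x)
print(y)
print(z)
```

Unpacking works the same way for a list as for the comma-separated values shown earlier, as long as the list has exactly as many items as there are names.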
this: we'll have lowercase test, and then we'll have uppercase Variable and uppercase Case, is equal to. This is what this variable is going to look like, testVariableCase, and we can assign it "Vanilla Swirl." This is what your camel case will look like: it's going to start lowercase, and then all the rest of those compound words, or however you want to say that, are going to be capitalized to separate where the words end and begin. Let's go right down here and copy this. The next one is called Pascal case. Pascal case is going to look just a little bit different. Instead of the lowercase t in test, it's going to be a capital T, so TestVariableCase. Again, this is a very similar way of writing it, very similar to camel case, but with just a capital at the beginning.

Now let's look at the last one, and this one is my personal favorite. This one is going to be snake case. Now, this one is quite a bit different in the fact that you don't use any capital letters and you separate everything using underscores, so we're going to write test_variable_case. Now typically, let me have them all in there, typically these are the best practices. These are what you typically want to do, but probably the best one to use is this snake case right here. What a lot of people say is that it improves readability. If you take a look at either the camel case or the Pascal case, which you will see people do, it's not as easy to distinguish exactly what it says, and the name of a variable is important, because you can gain information from it if people name them appropriately. So when I'm naming variables, I usually write them in snake case, because I just find it a lot easier to read since each word is broken up by this underscore.

So now let's look at some good variable names. These are all ones that you can use or could use. Let's do something like testvar. So testvar is completely appropriate. We can also do something like test_var. We could do underscore test
underscore var, that is, _test_var. You'll see that often as well; people will start it with an underscore. You can do TestVar, capital T, capital V, or you could even do something like testvar2. Now, adding a number to your variable is not inherently a bad thing. Usually it's semi-frowned upon, but there are definitely some use cases where you can use it. But one thing that you cannot do is put the 2 at the front. If you put the 2 at the front, it no longer works; it won't run properly at all. So we're going to take that out. We can't do that, so I'm going to use this as an example of what you should not do. You also can't use a dash, so something like test-var2 doesn't work either. And you also can't use something like a space or a comma or really any kind of symbol, like a period or a backslash or an equal sign. None of those things will work within your variable name.

Now, another thing that you can do with your variable is use the plus sign. So let's assign this: we'll say x is equal to, and we'll do a string, we'll say "Ice cream is my favorite," and then we'll do a plus sign and we'll say a period. Now what this will do is literally add these two strings together. So let's do print, and we'll do x, so now it says "Ice cream is my favorite." with the period. One thing that we cannot do in a variable is add a string and a number, or an integer. So we can't do "Ice cream is my favorite" plus 2. If we try to do that, it will give us this error right here. In this error it's saying you can only concatenate a string, not an integer, to a string. So only a string plus a string for this example. You can also do, we'll say y is equal to 3 + 2, and it should output 5, because you can also do an integer and an integer.

Now, so far we've only been outputting one variable in the print statement, but you can actually add multiple variables within a print statement. So let's go right down here. We're going to say, let's give it some more
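The naming rules and the plus-sign behavior above can be checked in one cell. This is only a sketch: the built-in `str.isidentifier()` method is swapped in here as a safe way to test names without triggering a syntax error, and it is not something used in the video.

```python
# isidentifier() reports whether a string would be a legal variable name.
candidates = ["testvar", "test_var", "_test_var", "TestVar", "testvar2",
              "2testvar", "test-var2", "test var"]
for name in candidates:
    print(name, "->", name.isidentifier())

# A string plus a string is joined together (concatenated).
x = "Ice cream is my favorite" + "."
print(x)

# A string plus an integer raises a TypeError, so x keeps its earlier value.
try:
    x = "Ice cream is my favorite" + 2
except TypeError as err:
    print("TypeError:", err)

# An integer plus an integer is ordinary addition.
y = 3 + 2
print(y)  # 5
```

The last three candidates print False: a name cannot start with a digit and cannot contain a dash or a space.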
right there. So we'll say x is equal to "Ice cream," and we'll say y is equal to "is," and then the last one, z, is equal to "my favorite," and we'll do a period at the end. Now we can go to the bottom and say print x + y + z, and when we run that we get "Ice creamismy favorite." with the words run together. Now we can actually add a space before "is" and a space before "my," and when we hit Shift+Enter it says "Ice cream is my favorite." You can also do this exact same thing with numbers as well. So we'll say x is equal to 1, y is equal to 2, and z is equal to 3, so this should equal 6.

Now, one thing that we tried to do was assign to one variable a string plus an integer, and that did not work. But what you can do is take something like this, and you can say "Ice cream," and we'll get rid of this one, and we'll get rid of the z. Now, saying plus is actually not going to work. Let's try running this. So again, we can't concatenate these, but what we can do in the print statement is separate them by a comma. When we add this comma, it should work properly. Let's hit Enter, and it says Ice cream 2. Again, this makes no sense, but you are able to combine a string and an integer by separating them with a comma.

Now, this is the meat and potatoes of variables. There are some other things as well, but some of those things are a little bit more advanced and not something I wanted to cover in this tutorial, although we may be looking at some of those things in future tutorials. But this is definitely the basics, what you really, really need to know about variables. I hope that this video was helpful. If it was, be sure to like and subscribe below, and I will see you in the next [Music] video.

Hello everybody. Today we're going to be talking about data types in Python. Data types are the classification of the data that you are storing. These classifications tell you what operations can be performed on your data. We're going to be looking at the main data types within Python, including numeric, sequence type, set, Boolean, and
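Here is a rough sketch of the print examples from the variables video. The names `sentence` and `total` are added for illustration only and do not come from the narration.

```python
# Leading spaces inside the strings keep the words apart when joined with +.
x = "Ice cream"
y = " is"
z = " my favorite."
sentence = x + y + z
print(sentence)  # Ice cream is my favorite.

# The same + operator adds numbers.
x, y, z = 1, 2, 3
total = x + y + z
print(total)  # 6

# print() accepts several comma-separated values of different types
# and puts a space between them.
x = "Ice cream"
y = 2
print(x, y)  # Ice cream 2
```

The comma form works because print converts each argument to text on its own, so no concatenation between a string and an integer ever takes place.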
dictionary. So let's get started actually writing some of this out, and first let's look at numeric. There are three different numeric data types: integers, floats, and complex numbers. Let's take a look at integers. An integer is basically just a whole number, whether it's positive or negative. So an integer could be a 12, and we can check that by saying type, then an open parenthesis and a close parenthesis, and if we say the type of 12, it's going to give us an integer. Or if we say -2, that is also an integer. We can also perform basic calculations like -2 + 100, and that'll tell us it is also an integer. So whether it's just a static value or you're performing an operation on it, it's still going to be that data type if those numbers are whole numbers, whether negative or positive.

Now let's take this exact one and say 12 + 10.25. When we run this, it's no longer going to be a whole number; it'll now be a float. So let's check this, and now this is a float type, because it is no longer a whole number; it's now a decimal number.

And the last data type within the numeric data types is called complex. Let's copy this right down here. Now, personally this is not one that I've used almost ever, but it is one just worth noting. So you can do 12 plus, let's say, 3j, and if we do this, it's going to give us a complex. The complex data type is used for imaginary numbers. For me it's not often used, but if you do use it, j is used as that imaginary unit. If you use something like c or any other letter, it's going to give you an error; j is the only one that will work with it.

Now let's take a look at Boolean values. So we'll say Boolean. The Boolean data type only has two built-in values, either True or False. So let's go right down here and say type True, and when we run this, it'll say bool, which stands for Boolean. We can do the exact same thing with False; that is also Boolean. And this can be used with something like a comparison operator, so let's say 1 is greater than 5, and let's
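The numeric and Boolean checks described above could be sketched in a single cell like this. The variable names `mixed`, `z`, and `result` are illustrative additions.

```python
# Whole numbers, positive or negative, are integers.
print(type(12))        # <class 'int'>
print(type(-2))        # <class 'int'>
print(type(-2 + 100))  # <class 'int'>

# Adding a decimal number produces a float.
mixed = 12 + 10.25
print(type(mixed))     # <class 'float'>

# The j suffix marks the imaginary part of a complex number.
z = 12 + 3j
print(type(z))         # <class 'complex'>

# True and False are the two Boolean values; comparisons produce them.
print(type(True))      # <class 'bool'>
result = 1 > 5
print(result)          # False
print(1 == 1)          # True
```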
check this. This is giving us a Boolean because it's telling us whether 1 is greater than 5. Let's bring that right down here. This will give us a False, so it's telling us that 1 is not greater than 5. And just as we got a False, we can say 1 is equal to 1, and this should give us a True.

So now let's take a look at our sequence type data types, and that includes strings, lists, and tuples. Let's start off by looking at strings. In Python, strings are sequences of Unicode characters. When you're using strings, you put them either in a single quote, a double quote, or a triple quote. I call them apostrophes; it's just what I was raised to call them, but most people who use Python call them quotes. So right here we have a single quote, and that works well. We can do a double quote, and that works also, and as you can see they are the exact same output. And then we have a triple quote, just like this, and this is called a multi-line, so we can write on multiple lines here. So let's write a nice little poem. We'll say, "The ice cream vanquished my longing for sweets. Upon this diet I look away. It no longer exists on this day." And then if we run that, it's going to look a little bit weird. It's basically giving us the raw text, which is completely fine, but let's assign this to a variable, multi_line, and we're going to come down here and say print. Before I run this, I have to make sure that this is run. So now let's print out our multi_line, and now we have our nice little poem right down here.

Now, something to know about these single and double quotes is how they're actually used. So if we use a single quote and we say "I've always wanted to eat a gallon of ice cream," and then we do an apostrophe at the end, obviously something went wrong here. What went wrong is that when you use a single quote, and then within your text, within your sentence, you have another apostrophe, it's going to give you an error. So what we want to do is, whenever we have a quote
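A small sketch of the three quoting styles follows. The contents of `single` and `double` and the line breaks in the poem are assumptions, since the transcript does not show what was typed on screen.

```python
# Single and double quotes create the same kind of string.
single = 'Ice cream'
double = "Ice cream"
print(single == double)  # True

# Triple quotes let a string span several lines.
multi_line = """The ice cream vanquished
my longing for sweets.
Upon this diet I look away;
it no longer exists on this day."""
print(multi_line)
print(type(multi_line))  # <class 'str'>

# An apostrophe inside the text would end a single-quoted string early,
# so the sentence is wrapped in double quotes instead.
sentence = "I've always wanted to eat a gallon of ice cream."
print(sentence)
```

Typing the variable name alone in a cell shows the raw text with `\n` markers, while `print` shows the poem on separate lines, which is the difference noted in the video.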
within it, we need to use a double quote. These double quotes will negate any single quotes that you have within your statement. They won't, however, negate another double quote, so you need to make sure you aren't using double quotes within your sentence. If you want to do something like that, you need to use the triple quotes like we did above. So we can do double, double, and then let's paste this within it, and anything you do within these triple quotes will be completely fine, as long as you don't do triple quotes within your triple quotes. We'll say "this is wrong," so even though it's between these two triple quotes, it doesn't work. Again, you just have to understand how that works. You have to use the proper apostrophes or quotes within your string. And just to check this, here's our multi_line; we can always say type of multi_line, and that is still a string.

One really important thing to know about strings is that they can be indexed. Indexing means that you can search within it, and that index starts at zero. So let's go ahead and create a variable. We'll just say a is equal to, and let's do the all-popular "Hello World." Let's run this, and now when we print the string we can say a, and we're going to do a bracket, and now we can search throughout our string using the index. So all you have to do is do a colon, and we can say 5. What this is going to do is go from position zero all the way up to, but not including, position 5, which should give us the whole Hello, I believe. Let's run this, and it's giving us the first five positions of this string. We can also get rid of the colon and just say something like 5, and then when we run this, it's actually going to give us position 5. So this is 0, 1, 2, 3, 4, and then 5 is the space. Let's do 6 so we can see the actual letter, and that is our W. We can also use a negative when we're indexing through our string, so we could say -3 and it'll give us the l, because it's -1, -2, and -3. We can also specify a range if we don't want
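The indexing steps above can be sketched as follows. The trailing exclamation mark in the string is an assumption: the transcript only records "hello world," but an exclamation mark is what makes index -3 land on an l, as narrated.

```python
a = "Hello World!"  # the "!" is assumed; see the note above

print(a[:5])   # Hello  (positions 0 through 4; the stop position is excluded)
print(a[5])    # the space between the two words
print(a[6])    # W
print(a[-3])   # l      (counting back from the end: "!", "d", "l")
print(a[2:5])  # llo    (start at position 2, stop before position 5)
```

Leaving out the start of a slice, as in `a[:5]`, means "begin at position zero."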
to use the default of zero. So before, we did 0 to 5, and it started at zero because that was our default, but we could also do 2 to 5. Let's run this, and now we skip positions 0 and 1 and start at 2: l, l, o. Now, we can also multiply strings. We have this a, "Hello World," so we can do a * 3, and if we run this, it'll give us Hello World three times. And we can also do a + a, and that is Hello World Hello World.

Now let's go down here and take a look at lists. Lists are really fantastic because they store multiple values. The string was stored as one value with multiple characters, but a list can store multiple separate values. So let's create our very first list. We'll say list really quickly, and then we'll put a bracket, and a bracket means this is going to be a list. There are other ones, like a squiggly bracket and a parenthesis; these denote different data types. The bracket is what makes a list a list. So to keep it super simple we'll say 1, 2, 3, and we'll run this, and now we have a list that has three separate values in it. The comma in our list denotes that they are separate values. And a list is indexed just like a string is indexed, so position zero is this 1, position one is the 2, and position two is the 3. Now, when we made this list, we didn't have to use any quotes because these are numbers, but if we wanted to create a list and add string values, we have to do it with our quotes. So we'll say quote, "Cookie Dough," then we'll do a comma to separate the value, and then we'll say "Strawberry," and then we'll do one more and this will just be "Chocolate." And when we run this, we have all three of these values stored in our list.

Now, one of the best things about lists is you can have any data type within them. They don't just have to be numbers or strings; you can basically put anything you want in there. So let's create a new list, and let's say "Vanilla," and then we'll do 3, and then we'll add a list within a list, and we'll say "Scoops," comma,
"Spoon," and then we'll get out of that list, and then we'll add another value of True for Boolean. And now we can hit Shift+Enter, and we just created a list with several different data types within one list.

Now let's take this one list right here with all of our different ice cream flavors. We'll say ice_cream is equal to this list. Now, one thing that's really great about lists is that they are changeable. That means we can change the data in here. We can also add and remove items from the list after we've already created it. So let's go and take ice_cream, and we'll say ice_cream.append, and this is going to append it to the very end of the list. We do an open parenthesis, and let's say "Salted Caramel." Now, when we run this and we call it just like this, it's going to take this list, add "Salted Caramel" to the end, and we'll print it off, and as you can see it was added to the list. And just like I said before, let me go down here, we can also change things in this list. So let's say ice_cream, and then we need to look at the indexed position, so we're going to say 0, and that's going to be this "Cookie Dough" right here. We can say that is equal to, so we can now change that value. Let's call that "Butter Pecan," and now when we call it, we can see that the "Cookie Dough" was changed to "Butter Pecan."

Another thing that you saw just a little bit ago is something called a list within a list, basically a nested list. So we had "Scoops," "Spoon," True. Let's take this and we'll say nested_list is equal to it. Now when we run this, we have this nested list. So if we look at the index and we say 0, we'll get Vanilla. If we say 2, we'll get Scoops and Spoon. Now, since we have a list within a list, we can also look at the index of that nested list, so let's now add a 1, and that should give us just Spoon. And you can go on and on and on with this. You can do lists within lists within lists, and all of them will have indexing that you can call.

Now let's go down here and start taking a look at tuples. So a
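Put together, the list operations narrated above might look like this sketch (string capitalization is assumed):

```python
# Lists are changeable: items can be appended and replaced after creation.
ice_cream = ["Cookie Dough", "Strawberry", "Chocolate"]
ice_cream.append("Salted Caramel")  # adds to the end of the list
ice_cream[0] = "Butter Pecan"       # replaces the item at position 0
print(ice_cream)

# A list can mix data types and can even contain another list.
nested_list = ["Vanilla", 3, ["Scoops", "Spoon"], True]
print(nested_list[0])     # Vanilla
print(nested_list[2])     # ['Scoops', 'Spoon']
print(nested_list[2][1])  # Spoon
```

Chaining two sets of brackets, as in `nested_list[2][1]`, first picks the inner list and then picks an item inside it.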
list and a tuple are actually quite similar, but the biggest difference between a list and a tuple is that a tuple is something called immutable. It means it cannot be modified or changed after it's created. Let's go right up here, we're going to say tuple, and let's write our very first tuple. So we'll say tuple_scoops is equal to, and then we'll do an open parenthesis. Now these parentheses you've seen if you do something like a print statement, but that's different because that's executing a function. This is actually creating a tuple, which is going to store data for us. So we'll say 1, 2, 3, 2, and 1. Let's go ahead and create that tuple, and we can just check the data type really quickly, and it's a tuple. And just like we saw before, a tuple is also indexed, so if we go to the very first position, which is a one, we will get the output of a one. But we can't do something like append and then add a value like three. If we do that it's going to say tuple object has no attribute append. That's just because you cannot change or add anything to a tuple, just like we were talking about before. Typically people will use tuples when data is never going to change. An example of this might be something like a city name, a country, a location, something that won't change. They definitely have their use cases, but I don't think they're as popular as just using a list.

So now let's scroll down and start taking a look at sets. But really quickly let me add a few more cells for us, and let's say sets. Now a set is somewhat similar to a list and a tuple, but it is a little bit different in the fact that it doesn't have any duplicate elements. Another big difference is that the values within a set cannot be accessed using an index, because a set doesn't have an index, because it's actually unordered. We can still loop through the items in a set with something like a for loop, but we can't access them using the bracket and an index position. So let's go ahead and create our very first set. So we're
going to say daily_pints, then we're going to say equal to, and to create a set we're going to use these squiggly brackets. I don't know if there's an actual name for those, if I'm being honest. I call them squiggly brackets and that's what we're going to go with. We're going to put in a one, a two, and a three. So let's go ahead and run this and let's look at the type, and as you can see it is a set. Now when we print this out it's going to show us one, two, and three, and those are all the values within our set. But if we copy this and we'll say daily_pints_log, this is going to be every single day, and maybe I had different values. Now when we run this and do the exact same thing, when we print this it's going to have just the unique values within that set.

Now a use case for sets, and this is something that I've done in the past, is comparing two separate sets. Maybe you have a list or a tuple and you convert that into a set, and that will narrow it down to its unique values. Then you can compare the unique values of one set to the unique values in another set, and we can see what's the same and what's different. So let's go down here and let's say wifes_daily, just copy this right here, we'll say is equal to, let's do our squiggly brackets, let's do one, two, let's do just random numbers. So now this is my daily log and this is my wife's daily log, and now we can compare these values. So let's go right down here, let's say print, we'll do my daily pints log, and then we'll do this bar right here (|), and this is going to show us the combined unique values. It's basically like putting them all in one set and then trimming it down to just the unique values. So we'll take my wife's daily pints log, and when we run this, we actually need to run this cell first, we should see all the unique values between these two sets. And so as you can see: 0, 1, 2, 3, 4, 5, 6, 7, 24, 31. These are all the unique values between these two sets. We can also do another one, and instead of this bar we're going to
do this symbol right here (&), which I believe is called an ampersand, don't quote me on that. When we run this it's going to show what matches, meaning which ones show up in both sets. So the only ones that show up in both sets are 1, 2, 3, and 5. We can also do the opposite of that by doing a minus sign, and this is going to show us what doesn't match, and so we have 4, 6, and 31. Now where is our 24 that was in my wife's daily pints log? It's in this one, but we're subtracting the values of this one. So let's reverse this, and we'll say daily pints log second, and let's run it. Now those are our other values. So we're taking the values of the first set, subtracting all the ones that are the same, and getting the remaining values. And then for our last one we can get rid of this and we'll do this symbol right here (^), and this is going to show if a value is either in one or the other but not in both. So let's run this. These values are unique only to each of those sets.

Now the very last one that we're going to look at in this video is dictionaries. So let's go right down here, let's add a few cells, and let's say dictionaries. I saved dictionaries for last because this one is probably the most different out of all the previous data types that we've looked at. Within a dictionary we have something called a key value pair. That means when we use a dictionary it's not like a list where you just have a value, comma, value, comma, value. We have a key that indicates what that value is attributed to. So let's write out a dictionary to see how this looks. We're going to say dictionary_cream, and just like a set we use a squiggly bracket, but the thing that differentiates it is that in a dictionary we'll have that key value pair, whereas in a set each value is just separated by a comma. So let's write name, and this is our key, and then we do a colon, and this is where we input our value, so we're going to say Alex Freberg. And then we separate that key value pair by a comma, and now
we can do another key value pair. So we'll say weekly intake and a colon, and we'll say five pints of ice cream, do a comma, and then we'll do favorite ice creams. And now what we're going to do is put a list in here, so within this dictionary we can also add a list. We'll do MCC for mint chocolate chip, and then we'll add chocolate, another one of my favorites. So now we have our very first dictionary. Let's copy this and run it, and let's just look at the type, and as you can see it says that this is a dictionary. Let's also print it out. Now if we want to, we can take our dictionary_cream and say dot values with an open parenthesis, and when we execute this we'll see all of the values within this dictionary. So here are our values of Alex Freberg, five, mint chocolate chip, and chocolate. We can also say keys, and when we run this we get all of the keys: name, weekly intake, and favorite ice creams. And we can also say items, so this key value pair is one item and this key value pair is another item.

Now one difference between something like a list and a dictionary is how you call the index. You can't call it by doing something like this, where you just do a bracket, oops, and say zero. This would in theory take this very first one, right, our very first key value pair, but that's going to give us an error. How you call a dictionary is actually by the key. So it doesn't technically have an index, but you can specify what you want to call and take it out. So we're going to say name, and this is going to call that key right here, and when we run this we'll get the value, which is Alex Freberg. One other thing that you can do is update information in a dictionary, which we can't do with some other data types. So for the name it was Alex Freberg; now let's say Christine Freberg. When we update that I'm also going to print the dictionary, get rid of this, so it's going to update the value of name to Christine Freberg. So let's go ahead and run this, and now it changed the
name from Alex Freberg to Christine Freberg. We can also update all of these values at one time. So let's copy this and I'm going to put it right down here. I'm going to say dictionary_cream.update, then we're going to put a bracket, or not a bracket but parentheses, around these. So now what we're going to do is update this entire thing. Let me take this and say print this dictionary. Now we can update this to anything we want, so in here I'll say weight, and because of all that ice cream I now weigh 300 lb. So let's run this, and as you can see it did not delete our key value pairs right here; instead it just added to them. When you're using update we can't actually delete, that's the delete statement, and I'll show you that in just a second. All we did was add this new value. It is also going to check and see if you changed anything in your key value pairs, so we can go in here and change this value, and we'll say 10. Now when we run this, the value of this key value pair was changed. But let's say we do want to delete it. We'll say del, that stands for delete, then dictionary_cream, and now let's specify the key that we want to get rid of, which will also delete the value with it. Let's say weight, and then let's print that again, and as you can see the weight was deleted from that dictionary. So that is all we're going to cover in this data types video. Thank you guys so much for watching, I really appreciate it. If you like this video be sure to like and subscribe below, and I'll see you in the next video.

Hello everybody, today we're going to be taking a look at comparison, logical, and membership operators in Python. Operators are used to perform operations on variables and values. For example, you're often going to want to compare two separate values to see if they are the same or if they're different within Python, and that's where the comparison operator comes in. Right here you can see our operators, and you can also see
what they do. So this equal sign equal sign (==) stands for equal; we have the does not equal, the greater than, less than, greater than or equal to, and less than or equal to. And honestly I use these almost every single time I use Python, so these are very important to know and know how to use. So let's get rid of that really quickly and actually start writing it out and see how these comparison operators work in Python. The very first one that we're going to look at is equal to. Now you can't just say 10 = 10 with a single equal sign. Let's try running that really quickly by clicking shift enter: it's going to say cannot assign to literal. That's because this is like assigning a variable; we're trying to say 10 is equal to 10 and then we can call that 10 later, but that's not how this actually works. What we're trying to do is determine whether 10 is equal to 10, so we're going to say equal sign equal sign, and then if we run that by clicking shift enter again it's going to say True. Now if we put something else like 50 in there and we try to run this, it's going to say False. So really what you're going to get when you use these comparison operators is either a True or a False. If we take this right down here we can also say does not equal, and we're going to use an exclamation point equal sign (!=), and that says 10 is not equal to 50, and that should be True. You can also compare strings and variables. So let's go right down here and check whether vanilla is equal to chocolate, and when we run this it'll say False. Now if they were the same, just like when we did our numbers, it would say True. And we can also compare variables, so we'll say x is equal to vanilla and y is equal to chocolate, and then when we come down here we can say x == y and it'll give us a False, and we say x != y and it'll give us a True. The next one that we're going to take a look at is the less than. So let's copy this one right up here, let's scroll down, and let's say 10 is less than 50. Now this will come out
as True. Now let's say we put a 10 in here. Before, 10 was of course less than 50, but is 10 less than 10? No, that's False, because they are the same. So if we want an output that is True, all we have to add is an equal sign right here, and this says 10 is less than or equal to 10, and now it's True. Of course we can say the exact same thing with greater than, so 10 is greater than or equal to 10, and that'll be True because 10 is equal to 10. We can also say 50 is greater than or equal to 10, because 50 is obviously greater than 10.

Now let's look at logical operators, which are often combined with comparison operators. Our operators are and, or, and not. An and returns True if both statements are true; with or, only one of the statements has to be true; and not basically reverses the result, so if it was going to return True it returns False. I don't use not a lot, but I will show you how it works. So let's actually test that out. Before, we were saying 10 is greater than 50, and of course this returned False. So now let's add parentheses around this 10 is greater than 50, and we're going to say and, we'll do an open parenthesis, 50 is greater than 10. Now this statement right here is true, 50 is greater than 10, so we have a true statement and a false statement, but this and is going to look at both of them and say they both need to be true in order to return True. So let's try running this, and we still have a False. If we want it to return True we're going to have to change this to make it a true statement, so 70 is greater than 50 and 50 is greater than 10. When we run this it should return True. Now let's look at the or. So let's copy this and we'll say 10 is greater than 50 or 50 is greater than 10. Now this is a false statement and this is a true statement, so if even one of them is a true statement the output should be True. And again we can do this even with strings, so we can do vanilla and chocolate, there we go, and vanilla is
actually greater than chocolate, because V comes later in alphabetical order. So V is like 20-something, whereas the C in chocolate is three, right? It actually looks at the spelling for this. So if we say or here it will come out True, and if we say and here it should also be True, because V is greater than C and 50 is greater than 10. Now let's copy this right here and we're going to say not. What we had before is 50 is greater than 10, and that returned True, but now all we're doing is putting not in front of it, so instead of returning True it's going to return False.

So now let's take a look at membership operators. We use these to check if something, whether it's a value or a string or something like that, is within another value or string or sequence. Our operators are in and not in. It's pretty simple: in returns True if the specified value is present in the object, just like we were talking about, and not in is basically the exact same thing but for when it's not in that object. So let's start out by taking a look at a string. We're going to say ice_cream is equal to I love chocolate ice cream, and then we're going to say love in ice_cream, and that will return True. All we're doing is searching to see if the word love, that string, is in this larger string. We could also do that by literally copying this string and putting it where the variable is, so we can check whether this string is part of this string, and it'll say True. We can also make a list, so we'll say scoops is equal to, and then we'll do a bracket and say 1, 2, 3, 4, 5, and then we'll say 2 in scoops. All we're doing is searching to see if two is within this list, and that should return True. Now if we put a six here and say not in, it will also return True, because six is not in scoops, and that is true. And just like we did before, we could also say wanted_scoops and we'll say eight, so I wanted eight scoops. So we can say wanted_scoops not in scoops, and this
should return True, because there's not an eight within the scoops list. And if we said in, asking whether the eight we wanted is within the list that we created, that's going to return False. So that is a quick breakdown of comparison, logical, and membership operators. I hope that this was helpful. Thank you guys so much for watching; if you like this video be sure to like and subscribe, and I will see you in the next video.

Hello everybody, today we're going to be taking a look at the if statement within Python. Now it's actually the if elif else statement, but that's a mouthful, so I'm just going to call it the if else statement. We have this flowchart, and I apologize for it being blurry, but this is the absolute best one that I could find. Right up top we have our if condition. If this if condition is true we're going to run a body of code, but if that condition is false we're going to go over here to the elif condition. The elif condition or statement is basically saying, if the first if statement doesn't work, let's try this one. If this elif statement is true it goes to this body of code; if it's false it'll come over here to the else, and the else is basically, if all these things don't work, then run this body of code. Now you can have as many elif statements as you want, but you can only have one if statement and one else statement. So let's write out some code and see how this actually looks. Let's first start off by writing if, that is our if statement, and now we have to write our condition, which is going to be either met or not met. So we'll say if 25 is greater than 10, which is true, we'll say colon, and then we're going to hit enter and it's going to automatically indent that line of code for us, and this is our body of code. So if 25 is greater than 10 our body of code will execute. For us we're just going to write print and we'll say it worked. Now if we run this it's going to check: is 25 greater than 10? If that is true, print this.
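As a minimal sketch of the flowchart and the if statement described above, the three branches could look something like this in Python. The function name check_number and its limit parameter are made up for illustration and are not from the video; only the 25 > 10 check and the printed message follow what is typed on screen.

```python
# Hypothetical helper (not from the video) that mirrors the if / elif / else flowchart.
def check_number(value, limit=10):
    if value > limit:        # first condition: if it is True, only this body runs
        return "It worked!"
    elif value == limit:     # checked only when the if condition was False
        return "elif worked!"
    else:                    # no condition: runs when nothing above was True
        return "It did not work..."

# The plain if statement from the walkthrough: the indented body runs
# only because 25 > 10 evaluates to True.
if 25 > 10:
    print("It worked!")

print(check_number(25))  # takes the if branch
print(check_number(10))  # takes the elif branch
print(check_number(3))   # falls through to else
```

Wrapping the branches in a function is just a convenience here so the same logic can be tried with several values; the walkthrough itself types the statements directly into notebook cells.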
So let's hit shift enter, and it worked. Now let's take this exact code, we'll paste it right down here, and we'll say is less than. Right now this if statement is not true, so it's not actually going to work. As you can see there's no output, nothing really happened, but it did check to see if 25 was less than 10; it just wasn't true. Now we can use our else statement. So we're going to come right down here and say else, and we'll do a colon and hit enter, again automatically indenting, and we're going to say print, and we're going to say it did not work dot dot dot. So what it's going to do is come up here and check: is 25 less than 10? No, it's not, so this body of code is not going to be executed, and it's going to go right down to this else statement. Now this else statement is going to be printed, because there's no condition on it. The if statement has a condition, 25 is less than 10; this has no condition. So if the if doesn't work, if it is false, it's going to come down here and run this body of code. Let's run this by clicking shift enter, and as you can see our output is it did not work. Now let's go back up here and put greater than. Because this is now true, it's going to say if 25 is greater than 10, print it worked, and then it's going to stop. It's not going to go to this else statement at all. So let's run this, and our output is it worked.

So what if we have a lot of different conditions that we want to try? Let's come right down here; this is where the elif comes in. Really quickly let's change this to a false statement. We're going to go down and say elif, and we're going to say 25 is less than, let's say, 30, and we'll print elif worked. So now it's going to check: is 25 less than 10? No, it's not. Let's look at the next condition: is 25 less than 30? And if it is, we'll print elif worked. So let's try running this, and elif worked. Now we can do as many of these elif statements as we want. Let's just try a few of them right here, so we'll
say elif 25 is less than 20, is less than 21, and let's do 40, and let's do 50, so we'll print elif, elif 2, elif 3, and elif 4. Now if you look at this, the first one that is actually going to work is this 25 is less than 40 right here. Once this one is checked and it comes out as true, none of the other elif or else statements will run. So let's try this one; it should be elif 3, and this one ran properly. Now within our condition so far we've only used a comparison operator. We can also use a logical operator like and or or. So we can say if 25 is less than 10, which it's not, and let's say or 1 is less than 3, which is true. If we run this now it will actually work. So we can use several different types of operators within our if statement to see if a condition is true or not, or if several conditions are true. There's also a way to write an if else statement in one line if you want to do that. So we can write print, we'll say it worked, and then we'll come over here and say if 10 is greater than 30, and then we'll write else print, and we'll say it did not work, just like we had before, except now it's all occurring on one line. So let's just try this and see if it works. It's saying print it worked if 10 is greater than 30, which it wasn't, so it went to the else statement and printed out our body right here, even though we didn't have any indentation or multiple lines. It was all done in one line.

Now there's one other thing that we haven't looked at yet, and I'm going to show it to you really quickly, and that's a nested if statement. When we run this it's going to say it worked. It works because it says 25 is less than 10 or 1 is less than 3, and since this is true it's going to print out it worked. But we can also do a nested if statement, so we can do multiple if statements as well. We're going to hit enter and we'll say if, and we'll do a true statement here, so we'll say if 10 is greater than 5, let's do a colon, hit enter, then we'll say print, and then we'll type a string saying this
nested if statement, oops, worked. Now let's try this out and see what we get. So it went through the first if statement, it said it was true, and it printed out it worked. We're still in the body of code, so it goes down to this next if statement, and it says if 10 is greater than 5 we're going to print this out. You could do this on and on; it can basically go on forever and you can create really in-depth logic, and that actually happens a lot when you start writing more advanced code. So I hope that this was helpful. I hope that you understand the if else statement better, and I hope that you understand how nested if statements work as well. Thank you guys so much for watching; if you like this video be sure to like and subscribe below, and I'll see you in the next video.

Hello everybody, today we're going to be learning about for loops in Python. The for loop is used to iterate over a sequence, which could be a list, a tuple, an array, a string, or even a dictionary. Here's the list that we'll be working with throughout this video, and I have this little diagram right here which kind of explains how a for loop works. The for loop is going to start by looking at the very first item in our sequence or our list, and that's going to be our one right here. It's going to ask, is this the last element in our list? It is not, so it's going to go down to this body of the for loop. Now we can have a thousand different things happen in the body of the for loop, as we're about to look at in just a second. Then it's going to go up to the next element and ask, is this the last element reached? It'll be no again, because we'll be going to the two, and then the three, and then the four, and the five. Once it reaches the five it'll go to the body of the for loop, and then when it asks if that's the last element the answer will be yes, because it has iterated through all the items within the list, and then we exit the loop and the for loop is over. Now that may not have made perfect
sense, but let's actually start writing out the syntax of a for loop so we can understand this better. To start our for loop we're going to say for, and then we're going to give it a temporary variable for this for loop. As it iterates through these numbers it's going to assign that number to the variable. So for this one we're just going to say number, because it's pretty appropriate since these are all numbers, and then we're going to say in integers. Now right here you can put just about anything: this could be a list, this could be a tuple, this could even be a string, but that is what we're going to iterate through. So we're saying for number, which is each of these numbers, in this list of integers, and then we're going to write a colon. Next is the body of code that's actually going to be executed when we iterate through our list. For our first example we're going to start off super simple, and all we're going to do is say print, open parenthesis, and say number. As it iterates through the 1, 2, 3, 4, and 5, number becomes our variable that is going to be printed. So during that first loop our one will be printed because that will be assigned right here, then through the next iteration the two will be assigned and put right here, in each loop until the very end. So let's hit shift enter, and as you can see it did exactly that. Now in this body, and I'll copy and paste this down here, we really can do just about anything we want. We don't even have to use this variable number right here; we can just print yep if we want to, and what it's going to do is, for each iteration, all five of those, every time it loops through it's going to print off yep. So let's hit shift enter, and it printed it off for us. So really we weren't even using the numbers within the list; we were just using it as almost a counter. Now let's copy this integers list once again, let's go right up here, and let's copy this for loop that we wrote. Now we do not
have to call this number. This can be anything you want, any variable name that you'd like. We could call it jelly, and we can do jelly plus jelly. I think you're getting the picture, right? When it loops through the one it's doing 1 plus 1, when it loops through the two it's doing 2 plus 2. That is basically how a for loop works. Now for a dictionary it's going to handle it a little bit differently, so let's create a dictionary really quickly. We'll say ice_cream_dict is equal to, and we're going to do squiggly brackets. We're going to say name, then a colon, and we need to assign our value for that item, so we're going to say Alex Freberg. We'll do our next one separated by a comma, and we'll say weekly intake, and I'll say five scoops per week. The next one we will do is favorite ice creams, and for this one we're going to do something a little bit different: we're going to have a list within this dictionary. So within our list of my favorite ice creams we'll say mint chocolate chip, and I'll just do MCC for that, and we'll separate that out by a comma and say chocolate. So now we have this dictionary, ice_cream_dict, and within it we have my name, my weekly intake, and my favorite ice creams with a list in there as well. Let's hit shift enter, and now we're going to start writing our for loop. The for loop is going to look very similar, but calling a dictionary is just a little bit different. So we're going to say for cream in ice_cream_dict.
values, and then we're going to do parentheses and then a colon. Now we're going to print the cream. In order to indicate what we actually want to pull, we have to specify it within the dictionary: are we pulling the item, are we pulling the value? We need to specify this, so that's why we have this dot values right here. So let's run this and see what we get. As you can see we are pulling in the values right here; that's why we're pulling in Alex Freberg, 5, and mint chocolate chip and chocolate. Now we are able to call both the key and the value, so let's go right down here and do that. We can pull two things at one time, and we're going to do this by saying dot items. We could also do dot keys if we just wanted the keys, but we want items, so we're going to do both of them. So we're going to go right down here and say for key, value in ice_cream_dict.items, print, and let's write key, and then we'll do a comma, and then let's give it a little arrow or something like that, and then we'll do a comma and say value. Let's print this off and see what we get. So it's looping through, and for each key and value it's saying here is the key, so that's name, then we have weekly intake, then we have favorite ice creams. It's giving us a little arrow, and then we're also printing off the value. So we have name, Alex Freberg; weekly intake, five; favorite ice creams, mint chocolate chip and chocolate.

So now let's talk about nested for loops. We've looked at for loops, and we understand how they work and why they do what they do, but what about a nested for loop, a for loop within a for loop? For this example let's create two separate lists. Let's create flavors, and let's make that a list by using a bracket. We'll do vanilla, the classic, chocolate, and then cookie dough, all great flavors. So that's our first list, and then we're going to say toppings, and we'll do a bracket for that as well, and we'll say hot
fudge, and then we'll do Oreos, and then we'll do marshmallows. Is that how you spell marshmallows? I think it's an e. That looks wrong; I might be spelling it wrong, but that's okay. So let's save this by clicking shift enter, and now we have our flavors and our toppings. Now let's write our first for loop. We're going to say for one, as in our number one for loop, in flavors, and we'll do a colon and click enter. Now we can write our second for loop, so we're going to say for two in toppings, and then we'll do a colon and enter. Then we're going to say print, and we'll do an open parenthesis, and then we're going to say one, so we're printing the one from flavors, then a comma, then I'm going to say topped with, comma, two. So what this is essentially going to do is, for one, we're going to take the very first item in flavors, and then we're going to loop through all of two as well, so we're going to loop through hot fudge, Oreos, and marshmallows. Once we print that off, we will loop all the way back to flavors and look at the next item within the first for loop. So let's run this really quickly and see what we get. As you can see it goes vanilla, vanilla, vanilla, and vanilla is topped with the hot fudge, the Oreos, and the marshmallows, and then we move on to the second item in our first for loop. So there's that hierarchy: we're iterating completely through the inner loop before we go back to the very first for loop and move to its next item. That is essentially how a nested for loop works. These nested for loops can get very complicated; in fact for loops in general can get very complicated the more you add to them and the more you want to do with them, but that is basically how a for loop and a nested for loop work. Thank you guys so much for watching; be sure to like and subscribe below, and I'll see you in the next video.

Hello everybody, today we're going to be
taking a look at while loops in Python. The while loop in Python is used to iterate over a block of code as long as the test condition is true. Now the difference between a for loop and a while loop is that a for loop is going to iterate over the entire sequence regardless of a condition, but the while loop is only going to keep iterating as long as a specific condition is met. Once that condition is not met, the code is going to stop and it's not going to iterate through the rest of the sequence. So if we take a look at this flowchart right here, we're going to enter this while loop and we have a test condition right here. The first time that this test condition comes back false, it's going to exit the while loop. So let's start actually writing out the code and see how this while loop works. Let's create a variable; we're just going to say number is equal to one. Then we'll say while, and now we need to write the condition that needs to be met in order for the block of code beneath this to run. So we're going to say while number is less than five, and then we'll do colon, enter, and now this is our block of code. We're going to say print and then number. Now what we need to do is basically create a counter. We're going to say number equals number + 1. If you've never done something like this, it's kind of like a counter. Most people start it at zero; in fact let's start it at zero. Each time it runs through this while loop it's going to add one to this number up here, and it's going to become a one, a two, a three each time it iterates through this while loop. Once this number is no longer less than five, it'll break out of the while loop and it will no longer run. So let's run this really quick by hitting shift enter. It starts at zero, and it's going to say while the number is less than five, print number. The first time that it runs through, it is zero, and so it prints zero, and then it adds one to number, and then it continues that while loop right
here, and it keeps looping through this portion. It never goes back up here to this line of code, this is just our variable that we start with. And then once this condition is no longer met, once it is false, then it's going to break out of that code. Now that we basically know how a while loop works, let's look at something called a break statement. So let's copy this right down here, and what we're going to say is if number is equal to three, we're going to break. Now with the break statement we can basically stop the loop even if the while condition is true. So while this number is less than five it's going to continue to loop through, but now we have this break statement, so it's going to say if the number equals three we're going to break out of this while loop, but if this is false we're going to continue adding to that number just like normal. So let's execute this. So as you can see, it only went to three instead of four like before, because each time it was running through this while loop it was checking if the number was equal to three, and once it got to three this became true and then we broke out of this while loop. The next thing that I want to look at, and we'll copy this right down here, is an else statement, much like in an if statement. We can use the else statement with a while loop, which runs the block of code, and when that condition is no longer true, then it activates the else statement. So we'll go right down here and we'll say else, and we'll do a colon and enter, and then we'll say print, and we'll say 'no longer less than five'. Now because this if statement is still in there, it will break, so let's change the three to a six, and then we'll run this. And so it's going to iterate through this block of code, and once the while condition is no longer true, once the loop ends on its own, we're going to go to our else statement. Now as long as this condition is true it's going to continue to iterate through, but once this condition is not met, then it will go to our else statement and we'll run that line of
code. Now the else statement is only going to trigger if the while condition is no longer true. If we have something like this if statement that causes it to break out of the while loop, the else statement will not run. So let's say if the number is three, and we run this, the else statement is no longer going to trigger, so this body of code will not be run. Now the next thing that I want to look at is the continue statement. If the continue statement is triggered, it basically skips all remaining statements in the current iteration of the loop and goes to the next iteration. Now to demonstrate this I'm going to change this break into a continue. So before, when we had the break, if the number was equal to three it would stop the loop completely. But when we change this to continue, which we'll do right now, what it's going to do is it's no longer going to run through any of the subsequent code in this block of code, it's just going to go straight up to the beginning and restart our while loop. So what's going to happen when we run this is it's going to come to three, it's going to continue back into the while loop, but it's never going to have that number increased by one to move the while loop along. This will basically create an infinite loop. Let's try this really quickly, and as you can see, it's going to stay three forever. Eventually this would time out, but I'm just going to stop the code really quick. So if we just change up the order in which we're doing things, we're going to take this line from there and we're going to put it down here. So what it's going to do now, instead of printing the number immediately and then adding to the number later, we're going to add to the number right away, and then we're going to say if it is three we're going to continue, and otherwise it's going to print the number. So let's try executing this and see what happens. So as you can see, we no longer have the three in our output. What it did was, when we got to the number three, it continued and
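As a rough sketch of the three while loop variations walked through so far (break, else, and the reordered continue), the loops below collect what would have been printed into lists. The `printed_*` list names are hypothetical and stand in for the print calls in the notebook.

```python
# 1) break: the loop stops at 3 even though number < 5 is still true,
#    and the else block is skipped because the loop ended via break.
number = 0
printed_break = []
while number < 5:
    printed_break.append(number)          # stands in for print(number)
    if number == 3:
        break
    number = number + 1
else:
    printed_break.append("no longer less than 5")

# 2) else: with the check changed to 6 the break never fires,
#    so the else block runs once the condition becomes false.
number = 0
printed_else = []
while number < 5:
    printed_else.append(number)
    if number == 6:
        break
    number = number + 1
else:
    printed_else.append("no longer less than 5")

# 3) continue: incrementing first avoids the infinite loop,
#    and the value 3 is skipped in the output.
number = 0
printed_continue = []
while number < 5:
    number = number + 1
    if number == 3:
        continue
    printed_continue.append(number)

print(printed_break)
print(printed_else)
print(printed_continue)
```

Placing the increment before the continue check is the design choice that matters here: if the increment sits after the continue, the counter never changes once it reaches 3.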
didn't execute this right here, which prints off that number. So that really is the basics of the while loop. I hope that this was helpful, I hope that you learned something in this video. If you did, be sure to like and subscribe below, and I'll see you in the next [Music] video. Hello everybody, today we're going to be taking a look at functions in Python. A function is a block of code which is only run when you call it. So right here we're defining our function, and then this is our body of code that, when we actually call it, is going to be run. So right here we have our function call, and all we're doing is putting the function name with the parentheses, that is basically us calling that function, and then we have our output. Throughout this video I'm going to show you how to write a function as well as pass arguments to that function, and then a few other things like arbitrary arguments, keyword arguments, and arbitrary keyword arguments. All of these things are really important to know when you are using functions. So let's get started by writing our very first function together. We're going to start off by saying def, that is the keyword for defining a function. Then we can actually name our function, and for this one we're just going to do first_function, and then we do an open and close parenthesis, and then we'll put a colon. We'll hit enter and it'll automatically indent for us, and this is where our body of code is going to go. Now within our body of code we can write just about anything, and in this video I'm not going to get super advanced, we're just going to walk through the basics to make sure that you understand how to use functions. So for right now all we're going to say is print, we'll do an open parenthesis, we'll do an apostrophe, and we'll say 'we did it'. And now we're going to hit shift enter, and this is not going to do anything, at least you won't see any output from this. If we want to see the output, or we actually want to run that function, and some functions don't have outputs, but if
we want to run that function, what we have to do is just copy this and put it right down here, and now we're going to actually call our function. So let's go ahead and hit shift enter, and now we've successfully called our first function. This function is about as simple as it could possibly be, but now let's take it up a notch and start looking at arguments. So let's go right down here and we're going to say def number_squared, we'll do a parenthesis and our colon as well. Now really quickly, when you're naming your function it's kind of like naming a variable. You can use something like x or y, but I tend to like to be a little bit more descriptive. But now let's take a look at passing an argument into a function. The argument is going to be passed right here in the parentheses, so for us I'm just going to call it number. And then we're going to hit enter, and now we'll write our body of code, and all we're going to do for this is type print and open parenthesis, and we'll say number, and we'll do two stars, at least that's what I call it, a star, and a two. And what this is going to do is it's going to take the number that we pass into our function, it's going to put it right here in our body of code, and then for what we're doing it's going to raise it to the power of two. And so when the user or you run this and call this function, this number is something that you can specify, it's an argument that you can input that will then be run in this body of code. So let's copy this right here, and then we'll put it right down here into this next cell and we'll say five, and so this five is going to be passed through into this function and be used right here in this print statement. Let's run it, and it should come out as, I believe, 25. That is my fault, I forgot to actually run this block of code, so I'm going to hit shift enter. So now we've defined our function up here and now we can actually call it. So now we'll hit shift enter and we got our output of 25. Now in this function we
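A minimal sketch of the two functions defined up to this point, assuming the names first_function and number_squared as spoken in the video:

```python
def first_function():
    # Nothing happens when this cell is run; the body only executes on a call.
    print("We did it")


def number_squared(number):
    # ** is the exponent operator, so this prints number raised to the power of 2.
    print(number ** 2)


first_function()     # calling the function runs its body
number_squared(5)    # the 5 is passed in as the number argument
```

As in the notebook, the cell containing each def has to be run before the call, otherwise Python raises a NameError for the undefined function.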
only defined one argument, but you can basically have as many arguments as you want, you just have to separate them by commas. So let's copy this and we'll put it right down here. Now we'll say number_squared_custom, and then we'll do number and then we'll do power. So now we can specify our number as well as the power that we want to raise it to. So instead of having two, which is what you call hardcoded, we can now customize that, and we'll put power here. And now when we call this function we can specify the number and the power, and both of those will go into this body of code and be run, and we can customize those numbers. So let's copy this and we'll say 5 to the power of 3, and let's make sure I ran this, so let's do shift enter. And now we will call our function, and let's hit shift enter, and we got 5 to the power of 3, which is 125. And just one last thing to mention is if you have two arguments within your function and you are calling it right here, you have to pass in two arguments, you can't just have one. So if we have just a five right here, it's going to error out. We have to specify both arguments for it to work. Now let's take a look at arbitrary arguments. Now arbitrary arguments are really interesting, because if you don't know how many arguments you want to pass through, if you don't know if it's one, two, or three, you can specify that later when you're calling the function, so you don't have to do it upfront and know that information ahead of time. So let's define our function. We're going to say def, and then we're going to say number_args, and we'll do an open parenthesis and a colon. Now within our parentheses right here, typically we would just specify here's what our argument will be, it will be number or it will be a word, right? But what we're going to do is something called an arbitrary argument, so it's unknown. So we're going to put star and then we'll say args. Now you will see something exactly like this typically if you're looking at tutorials that'll
have star args in there, or if you're looking at just a generic piece of code, this is what it will look like. But for us we're going to actually put number. So again we have the star and then we have our arbitrary argument right here. And then we'll hit enter and we're going to say print, open parenthesis, and this is where it's going to get a little bit different. So we're going to say number, and then we're going to do an open bracket and let's say zero, and then we'll do times, and then we'll say number again with a bracket of one. So in a little bit, once we run this and then we call this number_args function right here, we're going to need to supply the values at position zero and position one that are going to be used. So let's go ahead and run this, and then we are going to call it, and let's say 5, 6, 1, 2, 8. So right up here we did not know how many arguments we were going to pass through, it could be five, it could be a thousand. We could also pass in a tuple, and that's what this is right here, we're passing in a tuple. So what it's going to do now is, when it reads this number at index zero, it's going to take the very first value within that tuple, which will be that five, and then it'll also take this number at index one, which is the six. So let's hit shift enter, and it's going to multiply these numbers together, so 5 * 6 is equal to 30. Now like I just said, this is a tuple, so we don't actually have to write out these numbers like we just did, we can pass through a tuple when we are actually calling this function. Let's do that right up here. Let's just create, let's call it args_tuple, and we'll do open parentheses and we'll do the same numbers, let's just copy it, make it easier. And now we've created this tuple right here which we can then pass in, and this is a lot more handy, a lot more specific, and this is most likely how someone would do something like this. But let's now create this, and now we can copy args_tuple and pass it through. Now really quickly, this is going to
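The arbitrary argument example described so far might look like the sketch below, assuming the function name number_args and the five values read out in the video:

```python
def number_args(*number):
    # *number collects every positional argument into a single tuple,
    # so number[0] is the first value passed and number[1] is the second.
    print(number[0] * number[1])


number_args(5, 6, 1, 2, 8)   # only the first two values (5 and 6) are used
```

Because the body indexes positions 0 and 1, a call with fewer than two arguments would raise an IndexError.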
fail, and I'm doing that on purpose, but I want to show you what you need to do in order to pass through this tuple. So right now it's going to say tuple index out of range. All you have to do in order to use this is specify a star before it. Just like you did when you were creating your argument up here, you have to put a star in front of the tuple that we just passed through. And now let's try running this, and now it works properly. Now the last two things that we're going to look at are keyword arguments and arbitrary keyword arguments. There are more things that you can learn and do within functions, but again I'm just trying to teach you the basics to make sure that you understand how they work. So let's go right up here, and a keyword argument is kind of similar to this right here, and let's actually copy this and put it right down here. Now a keyword argument is very similar in that you're going to specify your arguments right here. But what we did up here, let me bring this down, when we actually called the function, what we did was we just put in a five and a three, and when we did that it automatically assigned number to five and power to three. And that's totally fine and you can do that, but if you want a little bit more control you can use a keyword argument. So right here we could say power is equal to five and number is equal to three. So I just switched it around, right? Number was assigned to five and power was assigned to three, but I just switched it to show you how this might work. So let's run both of these, and now it's 3 to the power of 5, which is 243. So that essentially is a keyword argument. Again, it just gives you a little bit more control, you don't have to put them in specific positions like you do if you're just passing multiple arguments. Now let's come right down here, we're going to create basically another custom function. So for this one we're going to write def number_kwarg, and then we'll do an open parenthesis, a colon, and enter, and what this one
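A small sketch of the two ideas just covered, unpacking a tuple with a star and calling with keyword arguments. The functions return their results here instead of printing them, which is a change from the notebook made only so the values are easy to check.

```python
def number_args(*number):
    return number[0] * number[1]


def number_squared_custom(number, power):
    return number ** power


args_tuple = (5, 6, 1, 2, 8)

# Without the star, the whole tuple arrives as ONE argument,
# so number[1] does not exist and Python raises an IndexError.
try:
    number_args(args_tuple)
except IndexError as err:
    print("IndexError:", err)

unpacked = number_args(*args_tuple)                   # star spreads the tuple out
positional = number_squared_custom(5, 3)              # number=5, power=3
keyword = number_squared_custom(power=5, number=3)    # order no longer matters

print(unpacked, positional, keyword)
```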
is, this one is an arbitrary keyword argument. Now to specify an arbitrary argument, all we did was a star and then we input number, but if we're doing an arbitrary keyword argument we actually have to have two stars right here. So let's start taking a look. And again, if you're doing arbitrary it means we don't really know how many keyword arguments we want to pass into our function, so we're just going to put star star number, and then later within our body of code and when we're calling it we'll be able to specify them. And just like the arbitrary argument before, the arbitrary keyword argument means we really just don't know how many keyword arguments we're going to need to pass into our function. So to demonstrate this, let's write print, do an open parenthesis, and we'll say my, oops, need to do an apostrophe, 'my number is', we'll do just like that, a little space, and we'll say plus, and this is kind of where it gets a little interesting or a little bit more tricky. So we're going to say number, so this is us calling our number, and then we're going to do a bracket. And then I'm actually going to go to calling the function. It's a little bit backward or a little bit different than what you might think, but when we're calling it, what I'm going to do is I'm going to say integer is equal to, let's just do some random number. Now when we're using that keyword within our body of code, what we're going to do is we're going to actually type out 'integer' just like this. And this looks a little bit different, but what this allows us to do is we can put as many keyword arguments in here as we want later, and I'll show you in just a second. But for us, we're just creating this key and this value when we are calling the function. So now when we create this and we run this, oh whoops, I forgot this has to be a string, so let's run this again. Now it will say my number is 2309. Then we're going to add, we'll say plus, and this isn't going to look great, but we'll say my other number
because this will all be in the same line, that's okay, 'my other number', and then we'll say number, and we can specify again what we want in there. So now we can go down here to where we're calling it, we'll put a comma and we'll say integer, oops, integer2 is equal to, we'll do a random number, and then we'll put in a two right here, and then we'll add plus right here so we don't error out. We'll create this, we'll run this, and as you can see both numbers were passed through. Again, the formatting is terrible, but now you can see that you have this arbitrary keyword argument right here, and all we have to do is put star star number, and we can pass through as many of these arbitrary keyword arguments as we want, as long as we refer to them within our function when we're calling it. So that's all we're going to look at in today's video on functions. There are of course other things that you can do within functions and it can get a little bit more advanced, but I wanted to show you the basics, the meat and potatoes of things I definitely think you should know in order to get started using functions. I hope that you were able to understand functions better because of this video. If you did, be sure to like and subscribe below, and I will see you in the next [Music] video. Hello everybody, today we're going to be talking about converting data types in Python. In this video I'm going to show you how to convert several different data types including strings, numbers, sets, tuples, and even dictionaries. So let's start off by creating a variable. We'll say num_int is equal to 7, and we can check that data type by saying type and then inserting our variable num_int, and that will tell us that our data type for this variable is an integer. Let's go ahead and create another one. We're going to say num_string is equal to, and for this one we'll also do a seven, this time in quotes. Let's check the type, and we'll do an open parenthesis and we'll say the type of num_string, and that one is a string. Now let's say we
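Looking back at the arbitrary keyword argument example that closed the functions video, a sketch could look like this. The function name number_kwarg and the second value '4512' are assumptions, since the video only calls the second value a random number.

```python
def number_kwarg(**number):
    # **number collects all keyword arguments into a dict,
    # so each value is looked up by the keyword used in the call.
    print("My number is: " + number["integer"]
          + " My other number: " + number["integer2"])


# The values are passed as strings because + cannot join a str and an int.
number_kwarg(integer="2309", integer2="4512")
```

Passing the numbers as integers instead would raise a TypeError on the string concatenation, which is the mistake corrected live in the video.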
wanted to add those. We'll say num_sum, so the sum of num_int plus num_string. Now when we're adding these two values it is not going to work. It's going to give us an error, and it's going to say unsupported operand types for int and str, so it cannot add an integer and a string. What we need to do in order to add these two numbers is convert that string into an integer. So let's go right up here, let's add another cell, and let's say num_string_converted is equal to, and we want to convert it into an integer, so all we have to do is type int and then, in parentheses, num_string. And that is as easy as it's going to get, all we have to do is say int with our num_string inside of it and then it's going to convert it. And we can even check it right after by saying type of num_string_converted. Let's run this, and now we can see that it was converted into an integer. So now let's use that num_string_converted right here, let's copy it and replace that string with the converted one, and let's actually print out that num_sum, and it worked properly. Now we did not specify what type of value this num_sum was going to be, but because it was two integers in here it's going to automatically apply that data type of integer to num_sum. Let's go right down here, and now let's look at how we can convert lists, sets, and tuples. So now let's say we have a list_type and that's equal to 1, 2, 3, and we can check it again by saying type, and that is a list. Let's say we want to convert it to a tuple. It's fairly easy, all we're going to do is write tuple and pass in list_type. That list_type is now going to be a tuple, and we can check that by saying type and wrapping it around this tuple call, and it shows us that it is converting that list into a tuple. Now we can also convert a list into a set, but it may change the actual values within it. Let's check that out really quickly. So let's say we have this list, and let's add a few more values to
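The conversions covered so far can be sketched as below. The repeated values added to list_type are assumed, since the video does not read them out.

```python
num_int = 7
num_string = "7"

# Adding an int and a str fails with a TypeError.
try:
    num_sum = num_int + num_string
except TypeError as err:
    print("TypeError:", err)

# int() converts the numeric string, and the sum of two ints is an int.
num_string_converted = int(num_string)
num_sum = num_int + num_string_converted
print(num_sum, type(num_sum))

# Assumed example values with duplicates, to show what a set does to them.
list_type = [1, 2, 3, 3, 2, 1]
as_tuple = tuple(list_type)   # keeps every value and the original order
as_set = set(list_type)       # keeps only the unique values
print(as_tuple, as_set)
```

The tuple keeps all six values while the set keeps three, which is why converting to a set can change the data rather than just its container.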
this, just like that. Now let's say we want to convert it to a set. So we're going to run this and we'll say set of list_type, and let's try running this and see what the output is. So this is something that you really need to be aware of when you are converting data types, because a set does not act the same as a list. A set is basically going to take the unique values in the list and keep only those, and that fundamentally changes the data that was in that original list. And just to check the data type we can say type, I'm just doing this for all of them, and as you can see that is now a set. Now let's go down here and take a look at dictionaries. Now let's say we have a dictionary called dictionary_type, and we'll do a squiggly bracket and we'll say name, and we'll do a colon, and we'll say Alex. Then we'll do age and a colon and we'll say 28, and then we'll do hair, colon, and N/A. And so really quickly let's take that dictionary_type and just confirm that it is a dictionary, and it is. And now what we're going to do is take a look at all of the items within that dictionary, so we're going to do dictionary_type.
items, open and close parenthesis, and this is going to show us all the items within it. Now we can also take this and look at something like the values, and when we run that, these are our values. So within our dictionary we have items, and that's what this is right here, this is one item, and then within that we have our values, which are right here, so Alex, 28, and N/A. And then we have something called a key, and this is the key. Name, age, and hair are all keys, and we can look at those by saying dot keys. So let's say we want to take all of the keys and put them into a list. What we're going to do is take this right here, say list, we'll do an open parenthesis, and put that in right there, so it says list and we're converting these keys into a list. Let's run that, and now this is a list, and let's just check the type as well just to confirm. And as you can see, it was converted properly into a list. And we can do the exact same thing with values, and the values can also be converted into a list. Now we can also convert longer strings that aren't just numbers like we did above in our very first example. So let's do long_string and we'll say 'I like to party'. Now we're going to take this string and we're going to say list of long_string, so we're going to convert this string into a list, and let's see what happens. So it took every single character in that string and put it into a list. And we could also do a set as well. That one's a lot shorter because it's only looking at unique values. So that is how you convert data types in Python. Thank you guys so much for watching, I really appreciate it. If you like this video, be sure to like and subscribe below, and I'll see you in the next [Music] video. [Music] Hello everybody, today we're going to be working on building a BMI calculator in Python. Now before we get started I want to show you this BMI calculator that I found online. It shows you the basic calculation that they use, and that's the one we're going to use in this video, and they
also have this calculator right down here and some ranges that we can use for our calculator as well. So for reference, I weigh about 170 and I'm about 5'9". Let's calculate this. So I'm about a 25.1 BMI, which falls into the overweight category. That's unfortunate, but we can see exactly how this works and how ours should work when we actually build it, so we're going to kind of reference this throughout the video. So let's go right over here to our BMI calculator. We need to get weight and height and then run this calculation right here, so let's go ahead and copy this and we're going to put it right down here, and so now we have our calculation. So what we need is input from a user, and there is an input function within Python that we're going to be using. So let me give myself a few more cells. The first thing that we need to get is their weight. Let's type out weight right here, we'll say weight is equal to, and this is where we'll use our input function. So we'll say input, and when we actually run this it's just going to give us this blank box where a user can input something. We'll say Alex. So this output is what the user actually typed in, and it does save it to this variable, so if we say print weight it will still print out Alex. Now this is where we want the user to input their weight, so we want to kind of give them a prompt for this. We'll put a string in here, so I'll do a double quote and then I'll say 'Enter your weight in', and we're using pounds, so 'pounds', colon, space. So now when we do this it'll say enter your weight in pounds, I'll say 170, and then when we run this it does store that. Now let's print weight again, I should have saved that. Oops. Now it's only storing the value of 170, it's not actually storing this prompt string right here, so that's really important for when we do our calculations later. I'm going to save this right down here because I'm sure I'm going to use that later. So we have that
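Here is a small sketch of the prompt step. So that it can run without someone typing at a keyboard, the hypothetical read_weight helper accepts a stand-in reader; in the notebook the call is simply input with the prompt string.

```python
def read_weight(reader=input):
    # input() shows the prompt and returns whatever was typed, always as a string.
    return reader("Enter your weight in pounds: ")


# Stand-in for a user typing 170; in a notebook, call read_weight() with no arguments.
weight = read_weight(reader=lambda prompt: "170")
print(weight, type(weight))
```

The stored value is the text the user typed, not the prompt, and it is a string rather than a number.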
it's working. Now we need to also do our height, so let's copy this and we'll put it right here, and we'll do height and 'Enter your height in inches'. So now for this one, if we hit enter, it's actually still running, let's stop it really quick and interrupt it. Let's try running this. So it's going to say enter your weight in pounds, that's the first input, I'll say 170, and then when I hit enter it's going to prompt me for that second input. And so in inches, 5'9" is 69 inches, and then I can hit enter again, and now we have both of our inputs. Now we need this calculation right down here, just like that. So now we have weight in pounds times 703, divided by height in inches times height in inches. So we actually have weight, and it's already written in there, but I'm just going to do it like this. We'll do weight times 703, so that's our weight in pounds times 703, divided by, and now we have our height in inches times the height in inches, in parentheses. So this is our calculation right here. So let's do this exact same thing, let's run this, and this 'times' of course is not going to work, whoops, we need to use our star for both of these. All right, now this is our calculation, so let's run this. So we have 170, and that's pounds, and inches was 69, hit enter, and it says can't multiply sequence by non-int of type str. Ah, that's because these are being stored as strings. If right down here I do type of height and we run that, this is actually a string. We don't need that check anymore. So we don't want it to be a string, we need those to be integers or floats, it just needs to be numerical. So let's do int and we'll wrap that input in it, and we'll do the same thing for this one. Now we have an integer for our weight and an integer for our height, so now when we're running this calculation it should work properly. Let's run this again. Our pounds are 170, our height is 69 inches, and it's not giving us our output because we're not
printing anything. Okay, so I just need to do print BMI. So let's try this again, 170, 69, and there is our BMI, 25.1, so it worked exactly the same as this one. So we input our weight, we input our height, and then it calculated our BMI. The next thing that we need to do is give the user some context. Is that good? Is their BMI within a good range, a bad range? We don't know. So let's go ahead, and I'm going to see if I can copy this, I don't know if this will work or not. Let's go ahead and copy this right down here. Perfect. So what we now need to do is say, okay, if the user has given us this input, we want to tell them if they are normal weight, overweight, obese, severely obese, anything like that, and we have these ranges, so that should help us out quite a bit. So let's just write our if statement and then we'll include it up here. Let's go down here and we'll say if, and then we'll do BMI, and let's just say BMI is greater than zero. So if it's greater than zero, if they had any input where the BMI came out above zero, which should be every time if they do it properly and they don't, you know, put a string in there or type out 'forty', which maybe we should make a prompt for, if that happens. Then we can say if, we'll do BMI, and now we need to give that first range, so this range right here. So if it's under 18.5, we need to do a less than, so if it's less than 18.5, and it just says under, it doesn't say under or equal to, so I'll keep it at 18.5. So if it's under 18.5, then let's give the output. We'll say print, and the output, or basically the message, is underweight, so we'll just say 'you are underweight', just like that. Then we're going to pass several elif statements through here. Well, let's just say else first, so I guess this would be if they don't input something properly, if something messes up. Maybe we could write something like print, oops, I'm
thinking all this through, we can write print 'enter valid input' or something like this, and we can always change that. But really quickly let's run this. Okay, so I'm not in that range. Let's make the next one, so then I can be within a certain range. Oops, and we should need one more at a minimum. So we'll say elif, and elif. These next two are, this one is 24.9. So it's going to check this one first, so if it's below 18.5 it's automatically going to print this one. So for this next one we don't have to do a range or anything, we can just say if it's below, if it's between 25 and 29.9, so this one actually should be less than or equal to, this one is normal, oh whoops, 24.9, so this one is 24.9, and this one is going to say 'you are normal weight'. So let's run this now. Let's see, BMI was 25.1. Oh guys, I'm just messing up here, I apologize. All right, this next one is the one that I'm part of, so I'm part of the overweight crowd. Now let's run this, and now our message is 'you are overweight', because remember the BMI was saved right here as 25.1. Down here, if we run through this, it's saying no, you're not, oops, get rid of that, no, you're not under 18.5, you're not at or under 24.9, and if you're at or under 29.9 you are overweight. So that did work properly, so that's really good. And I don't think I want this to be our only output for the person, because we're going to add this up here, it's just going to give us the BMI and then the output is going to say you are overweight. Let's make it a little bit more customized. I'm going to say name is equal to input, and then we'll say 'Enter your name'. So it'll be enter your name, we'll do Alex, 170, 69, there's our BMI. Now it's going to run through this logic, or it will run through this logic in just a second when we actually finish this. So then we have 34.9, and let's do one more, oops, and then this one's going to be for 39.9. So this one was overweight, this one is obese, this one is severely obese, so we'll say severely, is that how you spell it, severely obese, and then anything
that's over that, 40 and over, so if it's not one of these, anything else should be morbidly obese. So actually this statement right here should say 'you are severely obese', and this one is going to say 'morbidly obese'. Now I added that name up here because I wanted to add that down below, actually. So we're going to say name plus, and then we'll do, like, comma, 'you are underweight', so it'll be a little bit more personalized. I think it'll be a nice touch, I really do. We'll do it like this, and we'll say you, and let's go back and do that to all of them, and let me see how quickly I can do this. Oh whoops, what did I do? Get rid of that. Name plus you, like that. Geez, you guys are seeing me mess up. Name plus you, and then name plus you. So now let's run this, and now it's a little more personalized, it says 'Alex, you are overweight'. So this is all really good. Now this is an if statement. What we had done before, I think, is actually what we should put right down here, so we'll say else, and then if that doesn't work we'll say, what did we say, 'enter valid input', we'll just put that. And let me see if I can test this out, I don't know if this will error out or if this will even work. Let me just see if I can mess with it and see if I can get it to work. Actually let's copy this, we're going to copy this whole thing, we're going to include it right here, and now we have basically our entire calculator. So let's run this. Enter your name, we'll say Alex, enter your pounds, 170, enter your inches, 69, and then it's going to say 25.1, 'Alex, you are overweight', and that's perfect. We could even go as far as adding some feedback. We say 'you are overweight', and then it would be a period, and we could say 'you need to exercise more and stop sitting and writing so many Python tutorials'. So now if we run this, we'll do Alex, 170, 69, it says 'Alex, you are overweight. You need to exercise more and stop sitting and writing so many Python tutorials.' And that's it, this is the
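Putting the pieces of the calculator together, a sketch might look like the following. The helper names bmi_from_imperial and bmi_message are hypothetical, and the inputs are passed in as plain values here instead of being read with int(input(...)) as in the notebook, so the block can run unattended.

```python
def bmi_from_imperial(weight_lb, height_in):
    # Formula from the video: weight in pounds * 703 / (height in inches squared).
    return weight_lb * 703 / (height_in * height_in)


def bmi_message(name, bmi):
    # The thresholds follow the ranges read out in the video; each elif only
    # runs if every check above it was false, so no lower bound is needed.
    if bmi > 0:
        if bmi < 18.5:
            return name + ", you are underweight."
        elif bmi <= 24.9:
            return name + ", you are normal weight."
        elif bmi <= 29.9:
            return name + ", you are overweight."
        elif bmi <= 34.9:
            return name + ", you are obese."
        elif bmi <= 39.9:
            return name + ", you are severely obese."
        else:
            return name + ", you are morbidly obese."
    else:
        return "Enter valid input."


bmi = bmi_from_imperial(170, 69)
message = bmi_message("Alex", bmi)
print(round(bmi, 1))
print(message)
```

Note the parentheses around the height term: without them the expression would divide by the height once and then multiply by it again.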
entire project. You can go a ton farther: you can include much more complex logic, and you could even build out a UI to create your own app, just like this one, where it has this input and this UI. You can build that out within Jupyter notebooks with Python, but that's not really what this tutorial is for. This is just to help you think through some of the logic of creating something like this. So I hope that this was helpful and I hope that this was fun. I like creating stuff like this. We have two other projects that we're going to do, and maybe I'll include more, but we have two right now that I have planned, and I hope those are helpful. This is probably our easiest one, and they'll get a little bit more difficult in the next projects. So I hope that you can now utilize those Python skills that you've been working on. If you like this video, be sure to like and subscribe below, and I'll see you in the next video. [Music]

Hello everybody, today we're going to be creating an automatic file sorter for your files in File Explorer. Now, out of all the projects that we've done in this series so far, I think this one might be the most difficult, but I also think this one is the coolest, because it has some real-life applications. So without further ado, let's take a look at some files that we have right down here in my File Explorer. I have this beautiful picture of Rosie right here, which is a PNG file, and I have a CSV file and a text file, and I want to sort all of them into their own folders depending on what kind of file each one is. If I go right in here, click on this one, and go to Properties, I can see that this is a PNG file. If I go into this one, it's a CSV file, and of course this one is a text file. So I want three separate folders in here, and I want the files to automatically go into those folders without me having to drag and drop and click.
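Looking back at the BMI calculator that was just wrapped up, the if/elif chain can be sketched roughly as follows. This is a minimal sketch, not the exact notebook code: the function names are hypothetical, and the imperial formula (pounds times 703 divided by inches squared) is an assumption carried over from the earlier part of the project, since it is not spelled out in this section.

```python
def bmi_from_imperial(weight_lb, height_in):
    """Assumed formula: pounds * 703 / inches squared."""
    return (weight_lb * 703) / (height_in ** 2)


def bmi_category(bmi):
    """Walk the thresholds from lowest to highest, as in the walkthrough.

    Each elif only needs an upper bound, because every lower range
    has already been ruled out by the checks above it.
    """
    if bmi < 18.5:
        return "underweight"
    elif bmi <= 24.9:
        return "normal weight"
    elif bmi <= 29.9:
        return "overweight"
    elif bmi <= 34.9:
        return "obese"
    elif bmi <= 39.9:
        return "severely obese"
    else:
        return "morbidly obese"


def bmi_message(name, weight_lb, height_in):
    """Build the personalized message, or ask for valid input."""
    if weight_lb <= 0 or height_in <= 0:
        return "Enter valid input"
    bmi = bmi_from_imperial(weight_lb, height_in)
    return name + ", you are " + bmi_category(bmi)


print(round(bmi_from_imperial(170, 69), 1))
print(bmi_message("Alex", 170, 69))
```

In a notebook, the three arguments would come from input() calls, with the weight and height converted to numbers before being passed in.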
we only have four files here, but imagine if we had thousands of files, and how much time that could save us. So let's get out of here and start writing our code. We're going to say import os, comma, and then shutil. Now, os obviously stands for operating system. For shutil, I don't know what it's actually supposed to stand for, but what it will allow us to do is some high-level operations on our files in File Explorer. So we're going to go ahead and import those. Now that we have those imported, something that's going to be very important for us throughout this whole thing, and this is any time I'm working with directories or something like this, is that we want to get this path down. So I'm going to go ahead and copy this path, and we're just going to say path is equal to, and we'll paste it right here. Let's run this, and I need to put an r right here to make this a raw string. When you don't have the r, Python is going to treat these backslashes as the start of escape sequences; if we add the r, it's just going to read it in as the raw string, and that's what we want. So here's what we need to do. There are a few different things that have to happen when we are writing this out. One is that we need to go in here, look at this path, and see whether there are folders in here already. If not, we need to create a folder, so that's one of the first things that we need to do. The next thing is that the script needs to check each of these files individually, identify what kind of file it is, and then put it into the correct folder. So we have to create the folders, then check the files, and then place each one into the correct folder. So let's go right out of here. What we're going to start doing is working with these paths and these directories, and some of these things you may never have seen before, but that's okay, I'll try to explain as I go through. So the first thing that we're going to write is os.
listdir, and what this is actually going to do is show us all the files in there. We're going to say path, so it should show us all the files within path, and here are our results. We have the data professional results file, the fake text file, our image, and our other image. So this is actually showing us what files are in that path, and that's super important because we're probably going to have to loop through this in some way later. I wrote this all out before, so I kind of remember, but I'm doing this all off the top of my head, so I guarantee you that throughout this I'll make some mistakes. What we now need to do is create the folders, or rather check if there's a folder and create it if it isn't there. That's the next step that we need to take. So let's go right down here, and we want to check if this path exists already, so whether that folder already exists. We're going to say os.path.exists. This is going to check whether a path, just like this path up here, already exists. Then we're going to do an open parenthesis and we'll say path, so that's our path. Now we need to add a folder name to this. We could hardcode it, so we could do plus, and we could say "csv files", and that could work. It would ask whether this path already exists, and we can try running this, and it's going to say False, so this doesn't already exist. But the thing is, we need to create three separate paths. We could do this by just hardcoding it in, by saying "csv files", "image files", and "text files", or we can just put this all in a list and loop through it. I think it's just going to be easier to do that, or, I don't know, visually it's going to be easier. So we'll do folder_names, and we'll say is equal to, and we'll create a list. I think I want to call them "csv files", comma, "image files" (or "png files", whatever you want to write), and then "text files". And then we can go right down here and write a little for loop. I think what we'll do, actually, is let's write folder_names
and then we can put something like, let's write loop, why not. So a little trick for the for loop is you're going to say for, and we'll say loop, and we'll just do a range, because we want it to basically go through here. We don't want it to actually give us these file names; we just want it to count 0, 1, and 2. So if we do range from 0 to 2, that's 0, 1, 2, and that should work. If we do this, then when it loops through it's going to call folder_names at index 0, which would be "csv files", and then "image files" and "text files". Yeah, I need a colon. Let's run through this really quickly. It shouldn't do anything, but what we can do now is say, okay, if this does not exist, we can actually create it. So we'll say if not, so if this does not exist, then what we're going to do is take this and we'll say os.makedirs, and then we'll do it just like that. I think it's makedirs, with an s; I think that's correct. So let's test this out really quickly and see if this works. Invalid syntax: I need a colon. Okay, so I just ran this. Let's see if it did actually make those folders. Let's refresh it, and it didn't. So let's just print this off. So if not, let's just print, and let's see whether this actually works. Ah, okay, so I think I know what might be happening. Let's check this really quick and go to Python Tutorials. Oh no, yeah, it's creating these Python tutorial image folders right here, whoops. Okay, so I just figured it out. Let's go back into Python Tutorials (don't take a look at any of those notebooks, those are secret). We were creating them in the wrong place, and that's because of this right here: we need a backslash. We need to actually include a backslash right here at the end of this path, and we didn't have that. "EOL while scanning string literal": okay, so this trailing backslash could cause an issue in a raw string. Let's see if I can do forward slashes on all of these. Just stick with me, guys. I might cut this out, I might
not, we'll see if this is important. I'm just going to keep talking while we're doing it. Let's run this. Okay, so now that we're using these forward slashes, let's make sure we can still list those files. Good. Now when we loop through this, I can print it off, it doesn't matter. I'm going to print it, and we'll see if that name works. And then I said if, so if it exists then make it? No, no, no. So "if not": I think the not did make sense, we just weren't sure and had to do some checking. So if it does not exist, then we're going to create it, and we'll keep the print in there because it doesn't really matter. So it created the CSV and image folders but didn't create the text one. Let's see. Okay, I don't know why this would work, but let's run it. Okay, so I think I just had the wrong range: the end of a range is not included, so 0 to 2 only gives 0 and 1. So now we have all three folders. Now we need to write a script that will read in these files, check what kind of file each one is, and place it into the correct folder. So let's come right down here and see what we need to do. I think we need to use this right here, the os.listdir call; I think we need to loop through it to be able to check each file. So we need to name this, and we'll just do file_name is equal to, and run that. So now we have this file_name, and what we can do is loop through it. Let's say for file in file_name, so we're going to loop through this. Now when it goes through, it's going to check the file path, and in the file path it'll say.
txt or .csv. So let's say if, and I think it should be ".csv", so let's test it on this one. If ".csv" is in file_name, or actually it's file, so if it's in file, and not in, oh, not in. If it's also not in this folder, I believe, because we're going to check each of those folders. So we're going to loop through, and it's going to check whether that ".csv" string is in the file, and then what we want to do is check that the file is also not already in here. That's actually just the folder. Also, we're not using that earlier for loop anymore. Okay, I'm sorry, I'm talking this through and figuring it out as I go, because I may have forgotten some of this. So we're going to say this, that's the "csv files" folder, so we need to check this one. Let's do it like this. Oops. Okay, so it's going to check "csv files", and I think it needs a slash in between, so it's going to say the path, so there's our path, plus slash, plus "csv files". Actually no, it needs to be like this, because we're going to check that. Then, I got it, all right, I figured it out now. Then we're going to check if this file is in there. Yeah, so that's right. So it says if ".csv" is in the file, which is right, where am I looking, oh, file_name. So if it's in that list of the actual files, which is all of these, if we find ".csv" in any of these files and the file is not already in here, so it's going to say path plus "csv files" (did I say files? yeah, "csv files") plus file. Okay, that all looks correct. So if it's not in there, we're going to use shutil.
move. Now this is how we actually move the file; it gives us the ability to move what we want. Then, inside move, we need to take it from our initial path to our new path, so we're going to specify both, separated by a comma. We need to specify its original path, which should just be this without the folder name. I think it should be path plus file, because this is where it is now: it's in this path with that file name. Then we need to say we want to move it to here. That is what we want to do. So let's check it with just this one and see if it works. Okay, it ran through it. Let's go check. Aha, now that CSV file is gone, perfect, that is exactly what we wanted to happen. Now we can just recreate this for both our PNG files, our image files, and our text files. So we'll say elif and elif, and let's do ".png", then we'll do "image files" and "image files", because again we're just doing the exact same thing. The next one's going to be "text files" and "text files", so this one's going to check for ".txt". Now do we need anything else? We'll just say else, and we'll print "This file type is not included", or, if there are multiple files, we'll say "There are files in this path that were not moved". Okay, so if we run through this, it's going to catch our CSV, catch our PNG, catch our text, and if not it'll say "There are files in this path that were not moved!", exclamation point. All right, now let's run through this. Uh oh. That's because it goes if, elif, elif, and then it's going to this else statement. I don't know, let's circle back around to that in a second. All of the files were moved properly, and that's really good. I'm going to take that else out for now, so I'm just going to run it. We may or may not go back to that, but let's check and see if everything worked properly. Let's go into the CSV folder, and we have our CSV file. Let's go into our image files folder, and we have our images. And let's go into our text files folder, and there are our text files. Now is there
anything else that we need to do? I don't believe so, but what I can do is take all this, include it in here, and basically restart it, just to see if it works properly from scratch. I just want to make sure that I didn't miss anything, and we'll delete these folders. So I'm just going to rerun everything. We imported our modules, we created our path, and these are our file names. Then when we run this, it should take our folder names and check through them, and if they aren't already created, it's going to create them. We don't need it to print, so let's get rid of that. Then, for each file within our file names, it checks each one. We check whether there's a ".csv" in the name and whether the file is already in that folder. If it's in that folder, then it doesn't do anything, but if it's not in there, it is going to move it to that location. So it's going to check CSV, PNG, and text. I think everything should work properly. Let's run this, and it looks like it's working. Good, good, good, and perfect, it worked exactly how I had hoped. That's great. So this is the automatic file sorter in File Explorer project. You can go even a step further. I had to come in here and manually run this, but you can put a timer on it so that it automatically does this maybe every hour, every day, or every 30 minutes. You can run this in the background, especially if you create something like an executable for it. If you are curious how to do that, I think I did something similar in my Amazon web scraping project, if you want to go check that one out, but we're not going to do it in this project. This is all I wanted to show you how to do. So I hope that this was helpful, I hope that this project was interesting and that you liked it, and I hope that you learned something. If you did, be sure to like and subscribe below, and I will see you in the next video.
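The whole sorter can be condensed into one function. This is a minimal sketch under a few assumptions rather than the exact notebook code: the function name and the extension-to-folder mapping are hypothetical, and it swaps the string concatenation and the ".csv" in file check from the walkthrough for os.path.join and os.path.splitext, which avoids the trailing-slash problem. It also skips anything that is not a regular file, which is one plausible reason the else branch fired in the video: the newly created folders are themselves entries returned by os.listdir.

```python
import os
import shutil

# Hypothetical mapping; the folder names follow the ones used in the walkthrough.
FOLDER_FOR_EXTENSION = {
    ".csv": "csv files",
    ".png": "image files",
    ".txt": "text files",
}


def sort_files(path):
    """Move files in `path` into subfolders chosen by extension.

    Returns the names of files whose extension is not in the mapping.
    """
    # Create each target folder only if it does not already exist.
    for folder in set(FOLDER_FOR_EXTENSION.values()):
        os.makedirs(os.path.join(path, folder), exist_ok=True)

    not_moved = []
    for name in os.listdir(path):
        source = os.path.join(path, name)
        if not os.path.isfile(source):
            continue  # skip the folders themselves

        extension = os.path.splitext(name)[1].lower()
        folder = FOLDER_FOR_EXTENSION.get(extension)
        if folder is None:
            not_moved.append(name)
            continue

        destination = os.path.join(path, folder, name)
        # Leave the file where it is if a copy is already in the folder.
        if not os.path.exists(destination):
            shutil.move(source, destination)
    return not_moved
```

Calling sort_files with the raw-string path from the notebook would sort that directory in place; printing the returned list plays the role of the "files that were not moved" message.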

What's going on everybody, welcome back to another video. Today we're going to be starting our Python web scraping tutorial series. Now, this is more of a continuation of the Python tutorial series, but because we're going to be focusing on web scraping for three or four videos, I wanted to make it its own little miniseries. In this series I'm going to show you the basics of web scraping: how to actually look at HTML, how to inspect a web page, how to pull that data in, and then even how to put it into a CSV file so you can save it and use it. In this series we're just covering the basics, which is a fantastic place to start, but in future series I'll be going into some of the more advanced web scraping topics as well. So without further ado, let's jump onto my screen and get started with web scraping. Now, the first thing that we need to learn is HTML. HTML stands for HyperText Markup Language, and it's used to describe all of the elements on a web page. When we actually go to a website and start pulling data and information, we need to know HTML so we can specify exactly what we want to take off of that website. So that's where HTML comes in, and we're going to look at the basics, understanding just the basic structure of HTML. Then we'll go look at a real website, and you'll see that it's a little bit more difficult than what we have right here, but these are the basic building blocks to get to what the HTML actually looks like on a website. Now, this is basically what HTML looks like. We have these angle brackets with things like html, head, title, and body, and you'll notice that we'll have a <body> and then a </body> at the bottom. This forward-slash body denotes that this is the end of the body section in the HTML, so everything in between is within the body. So there is this hierarchy within HTML. We have <html> at the top and </html> at the bottom, which encapsulates all the HTML on the website, and then we have things like <head> and </head>, and <body> and </body>. Now within these sections we
usually have things like classes, tags, attributes, text, and all these other things that we'll get to in different lessons. One of the easiest ones to notice and look at is tags, things like a p tag or a title tag. Within these tags, because this is a super simple example, we have these strings here, like "My First Web Page". This is what's called a variable string, and this is actual text that we could take out of this web page. Now that you understand the super basics of HTML, let's actually go to our website. I'm going to have a link down below, but it's going to be this one right here. This is basically just a website that you can practice web scraping on; it's called scrapethissite.com. What we're going to do is look at the HTML behind this web page, and you can do this on any website that you go on. So we're going to right-click and go down to Inspect. Now, right off the bat, this looks a lot more complicated than the very simple illustration that we were looking at, but let's roll this up just a little bit. You'll notice we have <html> and </html> at the bottom, we have a head and the end of the head, and then a body and the end of the body. So in a super simple sense it is similar; it's just that the information within it is a lot more involved. Now if we look at this title right here, this is our title tag. If we click this little arrow, this is our dropdown, and you'll notice that here we have the string "Hockey Teams: Forms, Searching and Pagination". Now let's say we didn't want to click through and go find it there. There's something that's super helpful within this inspection panel that you can click on right here; it says "Select an element in the page to inspect it". So we're going to click on that, and as we go through our page and click on this title, it's going to take us to exactly where it is in our HTML. This is extremely helpful. For example, let's say the data I want is down
here. I want to take in the Boston Bruins. I can click on it, and it's going to take me to exactly where that is in the HTML. This is where we can start writing our web scraping script to specify: okay, I'm looking for a tr tag, I'm looking for a td tag, I'm looking for the class called team. This is all information that we can use to specify exactly what we want to pull out of our web page. Now, there are other things that we didn't really look at in our simple illustration. Let's come right over here. There are things like hrefs, and these are hyperlinks. This is just regular text, but inside of it is this hyperlink, where if we clicked on it, it would take us to another website, and typically that's denoted by this href right here. Then you'll typically see things like a p tag, which usually stands for a paragraph. Now the last thing that I want to show you while we're here, and we're going to learn a lot more in the next several lessons, is that if we come right down here there is this actual entire table. Let's try to find this table. I'm having trouble selecting the entire thing, but let's select this team name, and if we look at it you can see that this table tag is encapsulating the whole table. These are super helpful because a table tag takes in the entire table. If we collapse this and look just at it, it says class table, and then we have the end of the table tag. When we open it, it's going to have all of this information. So as you can see as I'm hovering over it, we have these tr tags, which are the rows, and inside them these th tags for the headers and these td tags, which are the individual data cells. This is something that we'll look at when we're actually scraping all of the data from this table in a future lesson. So this is how we can use HTML, how we can inspect the web page and see exactly what's going on under the hood, and in future lessons we'll see how we can use this HTML to specify exactly what data we want to pull
out. Thank you guys so much for watching. If you like this video, be sure to like and subscribe below, and I will see you in the next lesson. [Music]

Hello everybody, in this lesson we're going to be taking a look at Beautiful Soup and requests. These packages in Python are really useful, and they're the two main ones that I used when I was first starting out with web scraping. They can get a lot of what you want done in order to get that information out. Of course, there are other packages that you can use that may be a little bit more advanced, but again, this is just the beginner series. In a future series we'll look at other packages that have some more advanced functionality. So what we're going to do is import these packages, then get all of the HTML from our website and make sure that it's in a usable state. In the next lesson we're going to query around in the HTML and pick and choose exactly what we want; we'll look at things like tags, variable strings, classes, attributes, and more. So let's get started by importing our packages. What we're going to say is from bs4, which is the module that we're taking it from, import, and then BeautifulSoup. Then we're going to come down and say import requests. Now let's go ahead and run this. I'm going to hit Shift+Enter, and it works well for me. If this does not work for you, you may need to actually install bs4, so you may have to go to your terminal window and say pip install bs4. I'll just let you Google how to do that if you need to, because it's pretty easy, but if you're using Jupyter notebooks through Anaconda, like how we set it up at the beginning of this Python series, then you should be totally fine; it should be there for you. The next thing that we need to do is specify where we're taking this HTML from. So what we need to do is come right over here to our web page and get the URL. So we're going
to go here and copy this URL, and I'm just going to put it right here for a second. We're going to be using this URL quite a bit, so we want to assign it to a variable. So we'll just say url is equal to, and then we'll put it right in here. Now we can get rid of that. So this is our URL going forward, and this is where we're going to be pulling data from. Let's go ahead and run this. Now we're going to use requests, and what we're going to do is say requests.get, and then we're going to put in url. This get function from the requests library is going to send a GET request to that URL, and it's going to return a response object. Let's go ahead and run this. As you can see here, I got a response of 200, which means the request succeeded. If you got something like a 204, or a 400, 401, or 404, all of these are potentially bad. Something like a 204 would mean there was no content in the response. A 400 means a bad request, so it was invalid and the server couldn't process it. If you got a 404, that might be one that you're familiar with: that's an error that means the page you asked for could not be found on the server. The next thing that we're going to do is take the HTML. If you remember, when we come right back here and inspect this, we have all this HTML right here. Now, this web page specifically is completely static; it's not a bunch of moving stuff or anything like that. On something like Amazon, those web pages can update, but when you actually pull a page into Python you're basically getting a snapshot of the HTML at that time. So what we're going to do is bring in all of this HTML, which is our snapshot of our website, and then we can take a look at it. So we're going to come right down here, and now we'll use the Beautiful Soup package, or library. So we need to say BeautifulSoup, and we're going to do an open parenthesis, and we're going to do two
things; there are two parameters that we need to put in here. First we need to put in this GET request, so we actually need to name it, and we'll call it page. We'll say page is equal to, and let's run this. Now we're going to put that page in here, and what we're going to say is .text. So page is the response that came back from the request, and .text is what retrieves the actual raw HTML that we're going to be using. Then we're going to put a comma here, and what we need to specify is how we're going to parse this information. This is HTML, so what we're going to pass is 'html', just like this. This is a standard that's already built into the library, so we don't need to go any further; it's basically going to parse the information as HTML. Let's go ahead and run this and see what we get. As you can see, we have a lot of information, and as we scroll down I'll try to point out some things that we've already looked at in previous lessons. Something like this th tag should look very familiar; that's the header, the column title. Then we have these td tags, and of course if we scroll down even further we'll have things like a tr tag. So these are all things that we looked at in that first lesson when learning about HTML. Now again, we want to assign this to a variable, so we're going to say soup is equal to this information right here. I'm not going to go into all the history behind Beautiful Soup. What I will say is that the guy who created this Beautiful Soup library said that it takes this really messy HTML or XML, which you can also use it for, and makes it into this kind of beautiful soup. I just thought that was kind of funny, but that's why we're calling it soup right here. We're going to go ahead and run this, and we'll come right down here and say print soup, and let's run it. And now we have everything in here. We have our html, our head, and some hrefs and some links in here. Let's scroll down a little bit more
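The steps so far fit in a few lines. The sketch below rests on some assumptions: the helper names are hypothetical, it passes "html.parser" (Python's built-in parser) rather than the 'html' string used in the video, and the demo parses a small inline HTML string instead of the live page so it runs without a network connection. The status descriptions come from the standard library's http.HTTPStatus.

```python
from http import HTTPStatus

import requests
from bs4 import BeautifulSoup


def describe_status(code):
    """Return a code with its standard reason phrase, e.g. for 404."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def fetch_soup(url):
    """Send a GET request and parse the returned HTML snapshot."""
    page = requests.get(url, timeout=10)
    page.raise_for_status()  # raises an exception on 4xx and 5xx responses
    return BeautifulSoup(page.text, "html.parser")


# Offline demo: the same parsing step applied to a tiny, made-up page.
sample_html = (
    "<html><head><title>My First Web Page</title></head>"
    "<body><p>Hello</p></body></html>"
)
soup = BeautifulSoup(sample_html, "html.parser")
print(soup.title.text)

for code in (200, 204, 400, 404):
    print(describe_status(code))
```

With a connection available, fetch_soup(url) would take the URL copied from the browser and return the same kind of soup object that the notebook prints.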
and then we have our body right there, and of course we have a bunch of information in here. In the next lesson, what we're going to be doing is learning how to query all of this to take specific information out, and to understand a lot of what's going on in this HTML so we can actually get what we need. Now, if this looks really messy to you and it just doesn't make a lot of sense, there is one more thing that I'm going to show you. We'll come right down here and say soup.prettify(). If you've used other programming languages, prettify is very common in a lot of them; it just makes things a little bit easier to visualize. You'll notice that the output has this hierarchy built in, whereas if we scroll up there's no hierarchy; it's all just down the left-hand side. So if you want to view the HTML and visually see the structure, this does help a lot, but it doesn't actually help when you're querying it or using find and find_all, which is what we're going to look at in the next lesson. So that is our lesson on Beautiful Soup and requests. In the next two lessons we're going to be looking at find and find_all, as well as really diving into things like variable strings, tags, classes, and all those things. Then in the last lesson we're going to do a mini project where we try to get all the data from the table on this web page that we've been using and put it into a pandas DataFrame. So thank you guys so much for watching, I really appreciate it. If you like this video, be sure to like and subscribe below, and I will see you in the next lesson. [Music]

Hello everybody, in this lesson we're going to be taking a look at find and find_all. Really, we're going to be looking at a ton of different things in this lesson; this is where we start digging in and seeing how we can extract specific information from our web page. But in order to do that, let's
set everything up where we actually bring in the HTML like we did in the last lesson. We're just going to write all this out one more time, for practice if nothing else, and then we'll get into actually getting that information from the HTML. So we're going to start by saying from bs4 import BeautifulSoup, there we go, and import requests. We'll go ahead and run this. Then we're going to come up here and grab our URL, so we'll say url is equal to, and we'll have that right here. Now we need to say page is equal to requests.get, and then we'll put in our url right here, and we're going to come over here and run this. Lastly we need our soup, so we'll say soup is equal to BeautifulSoup, there we go, and then within our parentheses we need to specify page.text, because we need that, and our parser, which is 'html'. And there we go. Let's go ahead and run this and print it out to make sure it's working. There we go, so we have our soup right here. All of this should look really similar to our last lesson. So now we've brought in the HTML from our page, and we have a lot, a lot, a lot of information in here. Now really quickly, let's come over and inspect our web page. In here we have a ton of information, right? We have a bunch of different tags and classes and all these other things, but how do we actually use these? Well, that's where find and find_all are going to come into play. They're pretty similar, and you'll see that in just a little bit. Let's say we want to take one of these tags; let's say we just want to take this div tag. There are going to be a lot of different div tags in our HTML, but let's just come right here. Let's go down and call our soup: we're going to say soup, that's all of our information, and then .find. Within our parentheses we can specify a lot of different things, but we're going to keep it really simple right now; we're just going to say 'div'. Let's
go ahead and run this. What this is going to bring up is the very first div tag in our HTML, and that's going to be this information right here. Now let's copy this, and we're going to do the exact same thing except we're going to say find_all. Let's run this, and now we have a ton more information. Really, all find and find_all do is find the information. find is only going to find the first result in our HTML; that's the div with class container. Let's go back up to the top: that's our div class container. But find_all is going to find all of them, and it'll put them in a list for you. So it's going to have this first one, which goes down to this closing </div>, which should be right here, and then we have a comma, which separates it from our next div tag. So that is how we can use it. Now what if we want to specify one of these div tags? We pulled in a ton of them, but we want to look for just one. This is where the class comes in handy, because right now we have class equal to container and class equal to col-md-12. I don't know what these are off the top of my head, but usually they'll be somewhat unique, and we can use them to help specify what we're looking for. For example, just glancing at this, we could also use this a tag if we wanted to. We could say we're looking for these hrefs: we have an href here, and right down here we have this href as well, which, if you remember from the previous lesson, is a hyperlink. Something like the class or the href or these ids are all attributes, so we can filter down based on them. Now let's try it. What we can do is class first, and this is kind of the default: within something like find_all you can write class_, with an underscore. We can come right back up: we have this div, and then here's our class. So again, we have to have the div and the class. If we took this a tag instead, this is an a tag which would go
right here, with the class of something like nav-link. Again, down here we need to specify that more, but we have our div, so we'll say class_ equals col-md-12 right here. Let's go ahead and run this, and now it's going to pull in just that information. Now, we're still getting a list because we have multiple of these; this div class col-md-12 doesn't just happen once. If we scroll down we'll see it multiple times, something like right here, or actually, let me see, right here. So here's this comma, then here's our next one. So we have two of these div tags with a class of col-md-12, and in each of these we have different information. This looks like a paragraph, with this p tag right here.

Let's scroll back up. I also think we should try out something like this p tag. Typically these p tags stand for paragraphs, or they have text information in them. Let's try the p tag really quickly and see what we get. Let's run this, and it looks like we get multiple p tags. Now if we come back here, you can see that there's this information, and it's this information that we're pulling in; I'm just noticing that from right here. Then we have this information right here, and it looks like there's one more, which is this href, which looks like this open source, so data via, and then that hyperlink right there. So we have three different p tags.

Now, just to verify and make sure that's correct, what we could do is come over here and click on this paragraph. It's going to take us to that p tag where the class is equal to lead. Let's come over here and look at this paragraph: now we have another p tag right over here with the class equal to glyphicon glyphicon-education. I have no idea what that means. And then we'll go to our last one, which is right here, where the p tag has an a tag with an href, a class, and a bunch of other information. So let's say we just wanted to pull in this paragraph right here.
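As a rough sketch of the difference between find and find_all described above, here is a small self-contained example. The HTML is a made-up stand-in for the lesson's page (the class names container, col-md-12, and lead come from the walkthrough, everything else is an assumption), and it uses Python's built-in html.parser so it runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the page used in the lesson.
html = """
<div class="container">
  <div class="col-md-12">
    <p class="lead">First paragraph.</p>
  </div>
  <div class="col-md-12">
    <p>Second paragraph.</p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns only the first matching tag.
first_div = soup.find("div")

# find_all returns a list-like collection of every matching tag.
all_divs = soup.find_all("div")

# class_ (with the trailing underscore) filters on the class attribute.
col_divs = soup.find_all("div", class_="col-md-12")

# Tags can be searched without any attribute filter as well.
paragraphs = soup.find_all("p")

print(first_div["class"])
print(len(all_divs), len(col_divs), len(paragraphs))
```

The trailing underscore in class_ is there because class is a reserved word in Python; the result of find_all stays a list even when only one tag matches.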
Let's go here and see how we can specify this information. It looks like the class is equal to lead, and that looks like it's going to be unique to just that one. So if we come down here, we're going to say comma, and it was class, so you can do class_ is equal to, and then we're going to say lead. Let's try running this, and we're just pulling in that information.

Now let's say we actually want to pull in this paragraph; we actually want this text right here. This is a very real use case: let's say I'm trying to pull in some information or a paragraph of text. Well, let's copy this, and what we're going to do then is say .text, and let's run this. Now we're going to get an error right here, and this is a very common error, because we're trying to use find_all. Unfortunately, what find_all returns does not have a text attribute; we actually need to change this to find. Typically when I'm working with find and find_all, I'm using find_all most of the time until I want to start extracting text; then, once I've specified it, I'll change this back to find, just like this. Now let's try this, and now we're getting this information in parentheses. Now this is all wonky; it definitely needs to be cleaned up a little bit. But if we go back up, it's no longer in a list, and we no longer have things like these p tags in here or this class attribute, so we're really just pulling out this information. Now again, this does not look perfect. We could even try something like .strip(); it looks like there's some white space. That cleans it up a little bit; this definitely looks a little better. We could definitely go in here and clean this up more, but just as an example, this is how we can then extract that information.

Now let's look at one more example. This is some information, and this is what we're going to do our little mini project on in the next lesson. Let's say we wanted to take all this information: what if we wanted to pull in something like the team name?
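The .text error mentioned above can be reproduced on a tiny example. This is only a sketch: the HTML string is invented for illustration, and the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph with surrounding white space, like the one in the lesson.
html = '<p class="lead">\n   Some paragraph text.   \n</p>'
soup = BeautifulSoup(html, "html.parser")

# find_all gives back a list-like result, which has no .text attribute.
matches = soup.find_all("p", class_="lead")
try:
    matches.text
except AttributeError as err:
    print("find_all result has no .text:", type(err).__name__)

# find gives back a single tag, so .text works on it.
paragraph = soup.find("p", class_="lead")
raw_text = paragraph.text            # still has the line breaks and spaces
clean_text = paragraph.text.strip()  # white space trimmed from both ends

print(repr(raw_text))
print(repr(clean_text))
```

Another option, when a list is really what you want, is to loop over the find_all result and call .text on each tag in turn.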
That's going to be right here in this tr tag, and each of these tr tags has th tags underneath it. If we scroll down, you'll notice that each row is this tr tag. So let's go ahead and search; let's do th, let's just search for that first. Let's come right back up here, let's use this find_all, and we'll get rid of this text for right now. Let's just say we want to look for the tr... is that what we said we were looking for? No, th. So let's say we're looking for th, and let's go ahead and run this. Underneath this th we have team name, year, wins, losses, and notice these are all the titles. These titles are the only ones with these th tags. If we go down, you'll notice that the data is actually in td tags. So now let's go back and look for td; we'll say td, and this is going to be a lot longer. We have a lot of information, but these are all the rows of data.

Let's see if we can get just one piece of this data. We're going to go back; we want just this team name, that's all we're trying to pull in for now. Then we'll try to get this row, and then in the next lesson we're going to try to get all of this information, make it look really nice, and put it into a pandas DataFrame. So let's just get this team name right now. We're going to say th, let's run this, and we have this th. Now that we know we're getting this information in, we can do find. Let's run this, so there's our team name. We're just going to say .text, and again we can do .strip(), just like that, and bam, we have our team name.

So you can start getting the idea of how we're pulling this information out. We're really just specifying exactly what we're seeing in this HTML. What's really, really helpful, and something that I do all the time, is inspecting it. I'm just searching: what piece of information do I want? Then I go ahead and click on it, and I'm looking at where it's sitting in the hierarchy. It's within the body, it's within this table with the class of table, then it's down here in this tr tag and then this td tag. So I'm looking at the hierarchy, and I'm specifying exactly what I'm looking for.

So that is what we looked at in today's lesson: how we can use find and find_all. We were able to look at classes, tags, attributes, and variable strings, which is this right here, getting that text, and we looked at find and find_all, how they pull that information in, and how we can specify exactly what we're looking for. In the next lesson, which is definitely going to be the most exciting one, we're going to try to pull in all of this information, every single thing, because we'll be able to put it all into a data frame, and then we can use pandas to really search and manipulate that data. So with that being said, that is the end of this lesson. If you liked this video, be sure to like and subscribe. I will see you in the next lesson.

Hello everybody. In this lesson we are going to be scraping data from a real website and putting it into a pandas DataFrame, and maybe even exporting it to CSV if we're feeling a bit spicy. Now, in the last several lessons we've been looking at this page right here, and I even promised that we were going to be pulling this data, but as I was building out the project I honestly thought it was a little bit too easy, since
in the last lesson we kind of already pulled out some information from this table, and I want to throw you guys off. So we're going to be pulling from a different table: we're going to go on to Wikipedia and look at the list of the largest companies in the United States by revenue, and we're going to be pulling all of this information. So if you thought this was going to be an easy little mini project, it's now a full project, because why not?

So let's get started. What we're going to do is import BeautifulSoup and requests, get this information, and see how we can do this. It's going to get a little bit more complicated and a little bit more tricky; we're going to have to format things properly to get them into our pandas DataFrame, to make it look good and make it more usable. So let's go ahead and get rid of this easy table; we don't want that one. We're going to come in here and just start off, and this should look really familiar by now. We're going to say from bs4 import BeautifulSoup. I don't know if you've noticed, but I've messed up spelling BeautifulSoup in every single video. Let's run this.

Now we need to go ahead and get our URL, so let's come up here and say url is equal to, and we'll just keep it all in the same thing really quickly, because we know this by heart by now, right? We'll say requests.get and then url, to make sure that we're getting that information. It'll give us a response object; hopefully it'll be 200, which means a good response. Then we'll say soup is equal to BeautifulSoup, and we'll pass our page.text, so we're pulling in the information from this URL, and then we use our parser, which will be, oops, html. Let's go ahead and run this. Looks like everything went well, so let's print our soup. Now, this is completely new to you, it's completely new to me; I don't know what I'm doing, but it looks like we're pulling in the information, am I right? So we've got a lot of things going for us: the stuff was imported properly, we got our URL, and we got our soup, which is not beautiful in my opinion, but let's keep on rolling.

Let's come right down here. Now what we need to do is specify what data we're looking for, so let's inspect this web page. The only information that we're going to want is right in here. We're going to want these titles or headers, whoops, so we're going to want rank, name, industry, etc., and then we are for sure going to want all of this information. Let's just scroll down and see if there's anything tricky in here. All right, that looks pretty good. And there is another table, so there's not just one table in here; there are two tables on this page, so that might change things for us. But let's come right back and inspect our page by using this little button right here. Let's see if I can highlight just this page... oh, it's not going... oh, let's do that right there. So now we have this wikitable sortable. Now I'm actually going to come right here and copy; I'm going to say copy the outer HTML and just paste it in here real quick. And that's a ton of information; I didn't think it was going to copy all of it, so we're just going to delete that. I just wanted to keep that class, because I wanted to then come right down here at the bottom and see what this table looks like. I don't know if it's part of it or if it's its own table; I can't tell. Let's look at this rank and come up. So it's under this table, and it looks like it's its own table, but it says wikitable sortable jquery-tablesorter, and wikitable sortable jquery-tablesorter. So it looks like there are two tables with the same class, which shouldn't be a problem if we're using find to get our text, because we should be taking the first one, which will be this table, and this is the table we want. If we wanted this other one, we could just use find_all, and since it's a list, we could use indexing to pull this table, right? But I think we're going to be okay with just pulling in this one.

So let's go ahead and do our find. We'll do soup.find, and we could find_all or we could just do find, table. Let's just try this and see what we get, and if it pulls in the right one that we're looking for, that'd be great. Now this does not look correct at all. I don't know what table it's pulling in... oh, maybe it's this right here; this might be a table. Yeah, it is, so we have this box, more citations. So actually we are going to have to do exactly what I was talking about. Let's pull this, and, well, we could do comma class right here... let's do both. You know what, this is a learning opportunity, let's do both. Let me go back up to the top, because I need these. What we're going to do, let's come right down here; I want to add in another thing... actually, I'll just push this one up. There we go. So we're going to say find_all; let's run this. So now we have multiple, and again we got that weird one first, but if we scroll down, here's our comma, and then here's our wikitable sortable, and then we have rank, name, industry, all the ones that we were hoping to see. And I guarantee you, if you scroll all the way to the bottom, we're going to see, potentially, Wells Fargo, Goldman Sachs; I'm pretty sure those are... let's see, yeah, here we go: Ford Motor, Wells Fargo, Goldman Sachs. That's this table right here, so now we're looking at the third table. But again, this is a list, so we can use indexing on this, and we'll just choose not position zero, because that's this one right
here, which we did not like. Well, now we'll take position one. Let's run this, go back up to the top, and this is our table right here: rank, name, industry. This is the information that we were actually wanting. Just to confirm: rank, name, industry, etc. So this is the information we want, and we're able to specify that with our find_all. We now want to make this the only information that we're looking at, so I'm just going to copy this. We didn't need to use our class for this one; you probably could have, so let's actually put this right down here. This will be our table; we'll say table is equal to. But then I'll come right here and say soup.find; this is just for demonstration purposes. We do table, comma, class_ is equal to, and then we'll look at this right here... whoops, let me do this. Let's see if we get the correct output. Let's run this, and it looks like we're getting a NoneType object. If I remember right, it looks like the actual class is this right here, so let's run this instead, and I've got to get rid of the index. There we go. Okay, so we were able to pull it in just using find: find, table, class_, and it says wikitable sortable. At least that's the HTML that we're pulling in right here. Let me go back, because I don't know if that's what I was seeing earlier. Let's just get this rank. Let's go back up; where's the rank? There we go. So here's our rank, and let's go up to the table, and there's our class. Yeah, and to me that's a little bit odd. It says wikitable sortable jquery-tablesorter right here, but in the actual Python script that we're running it was only pulling in wikitable sortable; it wasn't pulling in the jquery-tablesorter. Why? I'm not 100% sure, but these are all things that we're working through, and we were able to figure it out.

So we're going to make this our table. We're going to say table is equal to soup.find_all, and let's run this. If we print out our table, we have this table. Now this is the only data that we're looking at.

Now, the first thing that I want to get is these titles or headers right here; that's what we're going to get first. So let's go in here. We can just look in this information; you can see that these are in th tags, and we can pull out those th tags really easily. Let's come right down here; we're just going to say th, and we can get rid of this. Let's run this. Now these are our only th tags, because everything else is a tr tag for these rows of data. So these th tags are pretty unique, which makes it really easy, which is really great, because then we can just say world_titles is equal to. So now we have these titles, but they're not perfect. What we're going to do is loop through them. So I'm going to say world_titles, and I'll walk through what I'm talking about: it's in a list, and each one is within these th tags, so th and then there's the string that we're trying to get. So we can easily take this list and use a list comprehension, and we can do that right down here. I'm going to keep this where we can see it. We'll do world_table_titles, that's equal to, and now we'll do our list comprehension. It should be super easy: we'll just say for title in world_titles, and then what do we want? We want title.text. That's it, because we're just taking the text from each of these. We're just looping through and getting rank, then looping through getting name, looping through getting industry; that's it. So let's go and print our world_table_titles and see if it worked. And it did. This looks like it needs to be cleaned up just a little bit, so let's go ahead and do that while we're here, before we actually put it into the pandas DataFrame. Oops, I just wanted this, actually. So what we're going to do is try to get rid of those backslash n's. If we do .strip() on it, that may actually not work... yeah, because this is a list. What we can actually do is .text.strip() right here. Let's try to do it in there. There we go. So now we have this, and now this world_table_titles is good to go.

Now I'm actually noticing one thing that may be odd. Yeah, so we have rank, name, industry, and it goes to headquarters, but then in here we're getting rank, name, industry, and then the profits, which is from this table right here, which we don't want. Let's scroll back up, backtrack, and see where this happened. We did find_all table, we're looking at the first one, right? And then we're doing... headquarters... so we're doing print table... ah, okay, I think I found the issue here. Let's backtrack. Again, we're working through this together; we're going to make mistakes. The table is what we actually wanted to search; we just did soup.find_all th, which is going to pull in that secondary table. Jeez, we were not thinking here. So now we need to do find_all on the table, not the soup, because we were looking at all of them. Oh, what a rookie mistake. Okay, let's go back. Now let's look at this; now it's just down to headquarters. Okay, let's go ahead and run this. Let's run this; now we just have headquarters. Now let's run this. Now we are sitting pretty. Excuse my mistakes. Hey, listen, if it happens to me, it happens to you, I promise you. This is a little project we're creating here, so we're going to run into issues, and that's okay; we're figuring it out as we go.

Now, what I want to do before we start pulling in all the data is put this into our pandas DataFrame. We'll have the headers there for us to go, so we won't have to get that later, and it just makes it easier in general, trust me. So we're going to import pandas as pd; let's go ahead and run this. Now we're going to create our data frame. We have these world_table_titles, so what we're going to do is pd.DataFrame, and then in here for our columns, we'll say columns is equal to world_table_titles. Let's just go ahead and say that's our data frame, and call our data frame right here. Let's run it; there we go. So we were able to pull out and extract those headers, those titles of the columns, and put them into our data frame. So we're set up and ready to go; we're rocking and rolling.

The next thing we need, let's go back up, is to start pulling in this data right here, so we have to see how we can pull this data in. Now, if you remember, we had those th tags; those were our titles, as you can see, I'm highlighting over it. But down here we have these td tags, and those are all encapsulated within a tr tag. So these tr tags represent the rows, and the td represents the data within those rows: r for rows, d for data. So let's see how we can use that to get the information that we want. Let's go back up here; I'm just going to take this, because again we're only pulling from table, not soup. Not soup, what were we thinking? And let's go ahead and look at tr. Let's run this. Now when we're doing this tr, these do come in with the headers, so later on we're going to have to get rid of these; we don't want to pull those in and have them as part of our data. But if we scroll down, there's our Walmart, we have the location; these are all in these td tags. And then of course it's separated by a comma; then we have our td 2, so above we had our td 1: row one, row two, row three, all the way down. Now we'll easily be able to use this, right? Because this is our column data, and we can even call it that: column_data is equal to. We'll run that. What we're going to do is loop through that, because it was all in a list. So we're going to loop through that information, but instead of looking at the tr tag, we're going to look at the td tag. So let's come right down here; we'll say for row in column_data,
and we'll do a colon. Now we need to loop through this. We'll do something like row.find_all, and then what are we looking for? We're not looking for the tr, we're looking for the td. Just for now, let's print this off and see what it looks like. Apparently I didn't run this column_data; that's why. Let's run this. What we actually need to do is something almost exactly like this, and I'm going to put it right below. Instead of printing this off, because again this is all in a list (we're using find_all, so we're printing off another list, which isn't actually super helpful for all the data that we're pulling in), what we can do is call this the row_data, and then we'll put the row data in here. So we'll say for data in row_data, and we'll take the data, we'll exchange that. And now, instead of world_table_titles, we can change this to individual_row_data, right? Now let's print off the individual row data. So it's the exact same process that we were doing up here, and that's how we cleaned it up and got this. We may not need to strip, but let's just run this and see what we get. There we go. And strip, I'm sure, was helpful. Let's actually get rid of this... yeah, strip was helpful; it's the exact same thing that happened on the last one, so let's keep that. Let's run this, and now let's just glance at this information. This looks exactly like the information that's in the table. Let's just confirm with this first one: 25, two... what am I saying? 572,754, 2.4, 2,300... 572,754, 2.4, 2,300. So this looks exactly correct.

Now we have to figure out a way to get this into our table, because again, these are all individual lists. It's not like we're putting all of this in at one time; we can't just take the entire table and plop it into the data frame. We need a way to put this in one at a time. Now, if you're just here for web scraping and
you haven't taken my pandas series, that's totally fine; that's not what we're here for anyway. But what we can do is take our individual row data and put it in one at a time. Now, the reason we have to do that is because when we had it like this, and let's go back, when we had it like this, it's printing out all of it. But what it's really doing, and let's get rid of it, is kind of doing it like this: it's printing it off one at a time, and it's only going to save that current row of data. This last one, it's only going to save that as it's looping through. So what we actually want to do is, every time it loops through, append this information onto the data frame. So as it goes through, and eventually it's going to end up with this one, but as it goes through, let's run this, it puts this one in, and then the next time it loops through it puts this one in, and the next time it loops through, etc., all the way down.

So let's see how we can do this. We have our data frame right here; let's get rid of this and bring our data frame in. Now again, like I just mentioned, if you don't know pandas and you haven't learned that, go take my series on that. It's really good, and we do something very similar to this in that series, so I'm not going to walk through the entire logic. But there is something called loc, which stands for location, when you're looking at the index on a data frame, and we're going to use that to our advantage. So we're going to say the length of the data frame, so we're looking at how many rows are in this data frame, and then we're going to say that's our length. Then we're going to take that length and use it when we're actually putting in this new information. Pretty cool. So we're going to say df.loc, then a bracket, and we're putting in that length. So we're checking the length of our data frame each time it's looping through, and then we're going to put the
information in the next position. That's exactly what we're doing. So let's go ahead and put in the individual row data. Let's just recap. We're looping through these tr tags; this is our column data, so each tr is our row of data. Then as we're looping through it, we're doing find_all and looking for td tags; that's our individual data, so that's our row data. Then we're taking each piece of data, getting out the text, and stripping it to clean it, and now it's in a list for each individual row. Then we're looking at our current data frame, which has nothing in it right now; we're looking at the length of it, and we're appending each row of this information into the next position.

So let's go ahead and run this. It's working, it's thinking, and it looks like we've got an issue: "cannot set a row with mismatched columns". Now, we're encountering an issue, not one that I got earlier, but we're going to cancel this out and figure this out together. So let's print off our individual row data and look at this. This first one is empty. I'm almost certain this is the issue. I didn't encounter this when I wrote this lesson, but I'm almost certain that this is the issue right here. So let's do the column data, but let's start at position... let's try one. And not parentheses, I need brackets, because this is a list, right? So it should work... and there we go. So now that first one's gone, and we just have the information. I didn't even think about that just a second ago, but I'm glad we're running into it, in case you ran into that issue. Let's go ahead and try this again, and it looks like it worked. So let's pull our data frame down; I could have just written df. Let's pull our data frame down, and now this is looking fantastic. These three dots just mean there's information in there; it just doesn't want to display it. But it looks like we have our rank, we have our name, we have the industry, revenue, revenue growth,
employees, and headquarters for every single one. So this is perfect; this is exactly what I was hoping to get. Now you can go in and use pandas to manipulate this and change it and dive into all the information in there. But we can also export this into a CSV, if that's what you're wanting. We could easily do that by saying df.to_csv, and then within here we're just going to do r and specify our file path. So let's come down here to our file path, then we'll go to our folder for our output. We're just going to take this path, and let me do it like that. So I have this path in my OneDrive, Documents, Python web scraping, folder for output; I already made this, and I'm just going to put this right down here. Now I do have to specify what we're going to call this. We'll just call this Companies, and then we have to say .csv; that is very important. Now if we run this, I already know, just because we have this rank and this index here, we're going to keep this index in the output. Not great, but let's run it and look at our output. There's our Companies file, and when we pull this up, as you can see, this is not what we want, because we have this extra thing right here. If we were automating this, this would get super annoying. So what we're going to do is go back and just say index equals False. Let's go out of here, and now we're just going to come right down here and say comma, index equals False. So it's going to take this index and it's not going to import, or actually export, it into the CSV. Now let's go ahead and run this, pull up our folder one more time, and refresh just to make sure. Should be good, and now this looks a lot better. So we're able to take all of that information and put it into a CSV, and it's all there.

So this is the whole project. If we scroll all the way back up, let's just glance at what we did here. Scroll down: we brought in our libraries and packages, we specified our URL, we brought in our soup.
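The row loop, the df.loc append, and the export without the index that were walked through above can be sketched on a tiny made-up table. Everything here is an assumption for illustration: the two companies and their values are invented, the variable names only loosely follow the ones spoken in the lesson, and the CSV is written to a string instead of the file path used in the video.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical two-row table; names and values are made up for illustration.
html = """
<table class="wikitable sortable">
  <tr><th> Rank </th><th> Name </th></tr>
  <tr><td> 1 </td><td> Company A </td></tr>
  <tr><td> 2 </td><td> Company B </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable sortable")

# Search the table, not the whole soup, so other tables are not picked up.
world_table_titles = [th.text.strip() for th in table.find_all("th")]
df = pd.DataFrame(columns=world_table_titles)

# Skip the first tr: it holds the th headers and has no td cells.
column_data = table.find_all("tr")
for row in column_data[1:]:
    row_data = row.find_all("td")
    individual_row_data = [data.text.strip() for data in row_data]
    # len(df) is the next free position, so each row is appended in turn.
    df.loc[len(df)] = individual_row_data

# index=False leaves the 0, 1, 2... index column out of the CSV.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file path as the first argument to to_csv, as in the lesson, writes the same content to disk instead of returning it as a string.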
And then we tried to find our table. Now that took a little bit of testing out, but we knew that the table was the second one, so in position one, so we took that table. We were also able to specify it using find, but then we used the class. And of course we just wanted to work with that table; that's all the data we wanted, so we specified this is our table, and we worked with just our table going forward. Of course, we encountered some small issues, user errors on my end, but we were able to get our world titles, and we put those into our data frame right here using pandas. Next, we went back and got all the row data and the individual data from those rows, and we put it into our pandas DataFrame. Then we came below and exported this into an actual CSV file. So that is how we can use web scraping to get data from something like a table and put it into a pandas DataFrame. I hope this lesson was helpful. I know we encountered some issues; that's on my end, and I apologize, but if you run into those same issues, hopefully that helped. If you liked this, be sure to like and subscribe below. I appreciate you, I love you, and I will see you in the next lesson.

So the first thing that we need to do is import our pandas library. We're going to say import pandas. Now, this will import the pandas library, but it's pretty commonplace to give it an alias, and as a standard when using pandas, people will say as pd. So this is just a quick alias that you can use. That's what I always use, and I've always used it because that's how I learned it, and I want to teach it to you the right way, so that's how we're going to do it in this video. Let's hit Shift+Enter. Now that that is imported, we can start reading in our files. Right down here I'm going to open up my file explorer, and we have several different types of files in here: CSV files, text files, JSON files, and an Excel worksheet, which is a little bit different than
a CSV. So we're going to import all of those. I'm going to show you how to import them within pandas, as well as some of the different things that you need to be aware of when you're importing. The first thing that we need to say is pd dot, and let's read in a CSV, because that's a pretty common one. We'll say read_csv, and this is literally all you have to write in order to call it in. Now, it's not going to call it in as a string, like it would in one of our previous videos if you're just using regular Python; when you're using pandas, it calls it in as a data frame, and I'll talk about some of the nuances of that. So let's go down to our file explorer. We have this countries of the world CSV. You just need to click on it, right-click, and copy as path, and that's literally going to copy that file path for us. You don't have to type it out manually; you can if you'd like. And we're just going to paste it in between these parentheses. Now if we run it right now, it will not work; I'll do that for you. It's saying we have this Unicode error. Basically what's happening is it's reading in these backslashes and this colon, all those backslashes in there, and this period at the end. What we need to do is read this in as raw text, so we're just going to say r, and now it's going to read this as a literal string, or a literal value, and not interpret all these backslashes, which does make a big difference. When we run this, it's going to populate our very first data frame. So let's go ahead and run it, and now we have this CSV in here with our country and our region. Now if we go and pull up this file, and let's do that really quickly, let's bring up this countries of the world file: it automatically populated those headers for us in the data frame, but we don't have any column for those 0, 1, 2, 3. So if we go back, as you can see right here, there's this index, and that's really
important in a data frame. It's really what makes a data frame a data frame, and we use the index a lot in pandas. We're able to filter on the index, search on the index, and a lot of other things, which I'll show you in future videos. But this is basically how you read in a file. Now, if we go right up here in between these parentheses and we hit Shift+Tab, this is going to come up for us. Let's hit this plus button, and what this is is all the arguments, all the things that we can specify when we're reading in a file, and there are a lot of different options, so let's go ahead and take a look. Really quickly, I wanted to give a huge shout out to the sponsor of this entire pandas series, and that is Udemy. Udemy has some of the best courses at the best prices, and it is no exception when it comes to pandas courses. If you want to master pandas, this is the course that I would recommend. It's going to teach you just about everything you need to know about pandas. So huge shout out to Udemy for sponsoring this pandas series, and let's get back to the video. The first thing is obviously the file path. We can specify a separator, which shows no default here, so when we're reading in the CSV it's automatically going to assume it's a comma, because it's a comma-separated file. You can choose delimiters, headers, names, index columns, and a lot of other things, as you can see right here. Now, I will say that I don't use almost any of these. The few that I'm going to show you in just a second are up at the very top, but you can do a ton of different things, and I'm just going to slowly go through them, so that's what those are. You can also go down here. This is our docstring, and you can see exactly how these parameters work. It'll give you a description and walk you through how to do this. Again, most of these you'll probably never use, but things like a separator could actually be useful, and things like a header could be useful, because it is
possible that you want to either rename your headers, or you don't have a header in your CSV and you don't want it to auto-populate that header. So that is something that you can specify. So for example this header one, I'll show you how to do this. The default behavior is to infer the column names: if no names are passed, this behavior is identical to header equals zero. So it's saying that the first row, that first index, which is like right here, that zero, is going to be read in as a header. But we can come right over here and we'll do comma, header is equal to, and we can say None, and as you can see there are no headers now. Instead it's another index, so we have indexes on both the x-axis and the y-axis, and right now we have this zero and one index indicating the first column and the second column. If we want to specify those names, we can keep header equals None and then say names is equal to, and we'll give it a list. The first one was country, and what's that second one? Oh, region. So right here, that's the first row, but we'll rename it and we'll just say country, region. When we run that, we've now populated the country and the region. We're just pretending that our CSV does not have these values in it and we have to name the columns ourselves. That's how you do it. But let's get rid of all that, because we actually do want those in there, so we're just going to get rid of those and read it in as normal, and there we go. Now, typically when you're reading in a file, you want to assign that to a variable. Almost always, when you see any tutorial or anybody online, or even when you're actually working, people will say df is equal to. df stands for data frame. Again, this is a data frame. In the next video in this series I'm going to walk through what a series is as well as what a data frame is, because that's pretty important to know when you're working with these data frames. But we'll assign it to this variable.
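The read_csv steps above can be sketched roughly like this. It's a minimal sketch, not the exact notebook from the video: the file path is a hypothetical one shown only for the r prefix, and the CSV text is a small stand-in held in memory so the code runs without the real countries of the world file.

```python
import io
import pandas as pd

# Hypothetical Windows path, shown only to illustrate the r (raw string) prefix.
# Without the r, Python would try to read "\U" as the start of an escape sequence.
csv_path = r"C:\Users\you\Downloads\countries of the world.csv"

# Stand-in for the file contents (assumption: two columns, Country and Region).
csv_text = "Country,Region\nAfghanistan,Asia\nAlbania,Europe\n"

# Default behavior: the first row becomes the column headers (same as header=0).
df = pd.read_csv(io.StringIO(csv_text))

# header=None: no row is treated as a header, so the columns are numbered 0 and 1
# and the original header row is read in as ordinary data.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names=[...] supplies the column names for a file that has no header row.
headerless_text = "Afghanistan,Asia\nAlbania,Europe\n"
named = pd.read_csv(
    io.StringIO(headerless_text), header=None, names=["Country", "Region"]
)

print(df.columns.tolist())
print(no_header.columns.tolist())
print(named)
```

With a real file you would pass csv_path to pd.read_csv in place of the io.StringIO object.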
we'll call it by saying df, and we'll run it, and that's typically how you'll do things, because you want to save this data frame so that later on you can do things like df. and call different methods on it. It's not as easy to do that if you're calling this entire CSV and importing it every time. So let's copy this, because now we're going to import a different type of file. We've been doing read_csv, but we can also import text files, and you can do that with read_csv. Let's look at this one. We have the same one, it's countries of the world, except now it's a text file, because I just converted it for this video. I'll copy that as a path, and so now when we do this, oops, let me get those quotes in there, it'll say world.txt. It will still work, but as you can see, this did not import properly. We have this Country\tRegion, and then all of our values are the exact same, with this \t. That's because we need to use a separator, and I'll show you in just a little bit how we can do this in a different way, but with read_csv this is how we can do it: we'll just say sep is equal to, and we need to do \t. Now let's try running this, and as you can see it now has it broken out into country and region. We could also do it the more proper way, and this is the way you should do it. I'll get rid of these really quickly, but I just want to keep them there in case you want to see that. You can also do read_table, and let's get rid of this separator, so now we have no separators, we're just reading it in as a table. Let's run this, and it reads it in properly the first time. This read_table can be used for tons of different file types, but typically I've been using it for things like text files. We can also read in that CSV, so let's change this right here to CSV. We can read it in as a CSV, but just like we did in the last one, when we read in the text file using read_csv, with this read_table you're going to need to specify
the separator, so I'll just copy this and we'll say comma, and now it reads it in properly. Again, you can use that for a ton of different file types, but you just need to specify a few more things if you don't want to use the more specific read_ function when you're using pandas. Now let's copy this again. We're going to go right down here, and now let's do JSON files. JSON files usually hold semi-structured data, which is definitely different from very structured data like a CSV, which has columns and rows. So let's go to our file explorer. We have this JSON sample. We will copy this in as the path, let's paste it right here, and we'll do read_json. Again, these different functions were built out specifically for these file types; that's why each one has a different name. So now we're reading this in as JSON. Let's read it in, and it read it in properly. Now let's go ahead and copy this and take a look at Excel files, because Excel files are a little bit different from the other ones that we've looked at. So let's just do read_excel, and let's go down to our file explorer and actually open up this workbook. As you can see, we have Sheet1 right here, but we also have this world population sheet, which has a lot more data. Let's say we just wanted to read in Sheet1. We can do that, but by default it's going to read in this world population sheet, because it's the first sheet in the Excel file. Well, let's go ahead and take a look at that. Let's get out of here, and, oops, I forgot to copy the file path. Let's go ahead and copy as path, and we'll put it right here, and let's just read it in with no arguments or parameters in there. When we read it in, it's reading in that very first sheet, so this is the one that has all of the data. Now let's say we wanted to read in that extra sheet, the second sheet. We'll just go comma, sheet_name, say is equal to, and then we can specify the sheet. Was it Sheet1, like this? Yes, it was. So we just had to
specify the sheet name right here, and then it brought in that sheet instead of the default, which is the very first sheet in that Excel file. Now, that definitely covers a lot of how you read in those files. Again, you can come in here and hit Shift+Tab and this plus sign and take a look at all the documentation, and you can specify a lot of different things, things that I didn't think were very important for you guys to know, especially if you're just starting out. The ones that we looked at today are the ones that I use almost all the time, so I wanted to show you those, but if you're interested in any of these other ones, or you have very unique data and you need them, it's worth really getting in here and figuring things out. There are a few other things that I wanted to show you in this first video, this intro video on how to read in files. One thing that you may have noticed, especially in this file right here, is that we're only looking at the first five rows and then the last five. All the rest of the data is hidden behind these little three dots right here, right? We want to be able to see that data, but right now we can't, and that's because of some settings that are already within pandas, and all we need to do is change them. This one has 234 rows and four columns, so obviously we can see all the columns. Well, let's just change the rows. All we'll say is pd.set_option. Now, we're going to change the rows, we're not going to change the columns, at least not on this one, so we'll say, in quotes, display.max.
rows. Now, whatever data we bring in, it's going to be able to show up to that max number of rows, and then we'll say 235. Although there are 234 rows, I'm just going to be safe. Let's run this, and now it has changed it. So let's read in this file again and you'll see how it's changed. Now we have all the numbers, and we have this little bar on the right that allows us to go all the way down to the bottom and all the way up to the top. So now we can actually look and skim and see our values. I like that better than just having that shorter version. We can do the exact same thing on columns as well. If we look at this one, this is our JSON file, it has the same thing right here. We have, what was it, 38 columns, but we can only see, I think it's maybe 20 or something like that, I can't remember. We have 38 and we can only see, let's say, 15 or 20 of them. We'll do the exact same thing, and we'll just say pd.set_option with display.max.columns, and we'll set that to 40 for this one. When we run this one again, oops, let's get over here, we can now scroll over and see every single one of our columns. Now that one is, in my opinion, a lot more useful. I like being able to see every single column, so it's definitely something that you should be using, especially when you have these really large files. You want to be able to see a lot of the data and a lot of the columns, so that when you're slicing and dicing and doing all the things that we're about to learn in this pandas series, you know what you're looking at. I also want to show you how to look at your data in these data frames as well. That's also pretty important. So let's go right down here. The very last one that we imported was this one right here, this read_excel, so this data frame is the only one that df is going to hold. Let's run it. This is the last one to be run, so this variable right here, df, won't be applied to all these other ones, which we can always go back and change
those. Typically you'll do something like df2, you want to do something like that, so let's keep df2. So what we're going to do is we're going to bring df2 right down here, and we want to take a look at some of this data. We want to know a little bit more about it. Something that you can do is df2.info with an open and close parenthesis, and when we run this it's going to give us a really quick breakdown of our data. So we have our columns right here: Rank, CCA3, Country, and Capital. It's saying we have 234 non-null values in each of those columns, and because there are 234 rows, scroll up here, that tells me that there's no missing data in here, at least not completely missing like null values. There is something in each of those rows. The count tells me it's non-null, so there are no null values, and it tells me the data type, so it's reading these in as an integer, an object, an object, and an object. It also tells us how much memory it's using, which is also pretty neat, because when you get really, really large data sets, memory usage and knowing how to work around that stuff does become more important than when you're working with these really small sample sizes that we're looking at. We can also do, oops, let me get rid of that, df2 and we'll do shape, and for this one we do not need the parentheses. All this is going to tell us is that we have 234 rows and four columns. We're also able to look at the first few rows in each of these data frames, so we can just say df2.
head, and if we do that it's going to give us the first five rows, but we can specify how many we want. We can say head 10, and it'll give us the first 10 rows right here. We can do the exact same thing, let's go right down here, and we'll say tail, so that'll give us the last 10 rows within our data frame. Now let's copy this, and let's say we don't want to actually look at all of these columns. We can specify that by saying df2, and, oops, let's get rid of all of this, and we'll say, with a bracket and a quote, Rank, and now we can take a look at just the Rank data. Now, we can't do that with the index, or at least not like this. If we want to use this index that is right here, we can, but there are two special indexers called loc and iloc for that, and I'm going to have an entire video on this because it does get a little bit more complex. But there's df2, and there's loc and iloc, which stand for location and integer location. Those are only for the indexes, whether it's the x-axis or the y-axis, those are the indexes. For location, it's looking for the actual label, the actual value of the index. So if we come up here to that df2, we can specify 224 and it'll give us this information right here in a little different format. So let's go bracket and we'll say 224, and when we run this it gives us our Rank, CCA3, Country, Capital, with our values over here, kind of like a dictionary almost. Now let's copy this and we'll say df2.iloc, and right now these look the exact same, but we haven't really talked a lot about changing the index. You can change the index to a string or a different column or something like that, and we'll look at that in future videos. iloc looks at the integer location, so even if, let's go right up here, even if this index had changed to, let's say, this Rank or this CCA3 or Country or whatever you make this index, iloc will still look at the integer location. So that 224 would still be 224 even if the label was Uzbekistan. So then when we look at this it's going to be the
exact same, but if we had changed that index, loc is the one that we could search on, and we could search Uzbekistan. Is that how you spell Uzbekistan? Hey, I nailed it. So that is how you use loc and iloc. Again, I just wanted to show you a little bit about how you can look at your data frame or search within your data frame. In future videos I'm going to dive a lot deeper into a lot of the concepts that we just looked at, because I just kind of touched on them. I wanted you to have a brief introduction to them so that in future videos I'm not just dropping everything on you all at once. So hopefully this was a good quick introduction to those topics. You should be able to read in a file now, see your data frame, and look at it in the few different ways that we just covered. I hope that was helpful, and if it was, be sure to check out all my other videos on Python and pandas, and if you liked this video, be sure to like and subscribe below, and I will see you in the next video. Hello everybody, today we're going to be looking at filtering and ordering data frames in pandas. There are a lot of different ways you can filter and order your data in pandas, and I'm going to try to show you all of the main ways that you can do that. So let's kick it off by importing our data set. We're going to say df is equal to, and we'll say pandas, and I need to import my pandas, so we'll say import pandas as pd. That's pretty important, I think. So pd.
read_csv, and we'll do r, and then we'll paste in the world population CSV path. Let's run this, and here's our data frame right here. This is the data frame that we're going to be filtering through and ordering in pandas, so let's kick it off. The first thing that we can do is filter based off of the columns, so the data within our columns, like Asia, Europe, Africa, or whatever data we may have in that column. Let's go right down here. We're going to say df, and then within it we're going to specify what column we're going to be filtering on, so we're going to say df with another bracket, and we'll say Rank. So we're going to be looking at this Rank column right here, and we'll say in that Rank column we want to do greater than 10, and that's actually going to be a lot of them, so let's do less than. When we run this, it's only going to return the rows where the rank is less than 10. We can also do less than or equal to, you know, all of these comparison operators. So less than or equal to, and now we have all of the ranks 1 through 10. Now if we look at these countries, we can filter by specific values, almost exactly like we did here, but instead of doing a comparison operator like we did right here and spelling out those names, let's say Bangladesh and Brazil, we can use the isin function, almost like IN in SQL, if you know SQL. So let's go right down here and we're going to say specific_countries. Right now we're just going to make a list of the countries that we want, and we'll say Bangladesh and Brazil. So let's go right down here and we'll say, okay, for these specific countries, from the data frame, let's do our bracket, and we'll say in this Country column, so we'll do df and then another bracket for Country. In this Country column we can do isin, and then an open parenthesis, and then look for our specific countries. So we're looking at just this column and we're saying isin, so we're asking whether these values are within this column. And we're getting this error.
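The two column filters described above, the comparison and isin, can be sketched like this. It's a minimal sketch on a small made-up data frame standing in for the world population CSV; the column names Rank and Country follow the video, and the rows are only illustrative.

```python
import pandas as pd

# Small stand-in for the world population data; the rows are illustrative only.
df = pd.DataFrame({
    "Rank": [8, 7, 10, 1, 36],
    "Country": ["Bangladesh", "Brazil", "Mexico", "China", "Afghanistan"],
})

# A comparison on a column gives a True/False Series; passing that back
# into df[...] keeps only the rows where it is True.
top_ten = df[df["Rank"] <= 10]

# isin keeps the rows whose value appears in the given list, like IN in SQL.
specific_countries = ["Bangladesh", "Brazil"]
picked = df[df["Country"].isin(specific_countries)]

print(top_ten)
print(picked)
```

Note that the whole condition goes inside one pair of outer brackets, df[ ... ], with the column lookup and the isin call inside it.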
very odd. Let me, this doesn't look right. There we go. I just had some syntax errors, I apologize, I made it way more complicated than it needs to be. But here's how you use this isin function. We're looking for Bangladesh and Brazil, and it returns those rows with Bangladesh and Brazil. We can also do a contains function, kind of similar to isin, except it's more like LIKE in SQL. I'm comparing a lot of this to SQL, because when you're filtering things my brain always goes to SQL, but in pandas it's called contains. So let's actually copy this, because I don't want to make the same mistake again. Let's do that, and we'll do the bracket, but instead of .isin we're going to do .str.
contains, and then an open parenthesis. So we're going to be looking for a string, checking whether it contains, let's do United, like United States or any other United. So let's run this, and as you can see we have United Arab Emirates, United Kingdom, United States, United States Virgin Islands. So we can search for a specific string or value within our data, within that Country column. Now, so far we've only been looking at how you can filter on these columns. We can also filter based off of the index as well, and there are two main ways you can do it: there's filter, and then there's loc and iloc. loc stands for location and iloc stands for integer location, and if you've seen my previous videos, I've mentioned those. So we can take a quick look at all of those. Really quickly, we need to set an index, because the index right now is not the best. We'll set our index to Country, so let's say df2 is equal to df.set_index, and we'll say Country. I'm just doing df2 because later on I want to use df again, so I'm going to assign it to another data frame so that we can easily switch back and forth. So now we have the country as the index, and what we can do is use the filter function. Let's go down here, we'll say df2.filter, and we'll do an open parenthesis, and now we can specify our items. These are actually going to be specifying which columns we want to keep, so we're going to say items is equal to, then we'll make a list. We'll say Continent, hope that's how we spell continent, I'm always messing up my spelling here, then we'll do CCA3, because why not. You can specify whichever ones you want. When we run this, it's only going to bring in those two columns. Now, by default it's choosing the axis for us, but we can also specify which axis we want to search on. So if we say axis is equal to zero, it's actually going to search this axis, the row labels. This is the zero axis, this is the
one axis. So where our columns are is one, so if we go back and do one, we're searching on that one axis, the header axis, again. That is the default, but you can specify it. So if you want to filter on the row labels right here, you can do that. Let's actually copy this and do that right down here, just so you can see what it looks like. Let's search for Zimbabwe, so we'll type Zimbabwe, and we'll be looking at the zero axis, which is the up and down on the left-hand side. When we filter on that, we can filter by Zimbabwe by looking just at the country index. We can also use like, just like we did before, and I'll show you the same demonstration that we did. You can say like is equal to, and instead of having to put in the exact text, you can just say United, just like we did before, and we're searching where the axis is equal to zero, which again is this left-hand axis. So now we're looking for United, and it's going to give us all of the countries, all the index values, that have United in them, like we were talking about before. We also have loc and iloc, so we can say df2.
loc. Now, this takes a specific value, so we'll do United States. loc is just looking at the actual name, the label, not its position. So if we search for United States, it's going to give us this right here, where it gives us all of the columns for United States and then all of the values for United States. Or we can do iloc, which is the integer location, and that's not the same thing. For loc we're looking at this string, but underneath it there still is a position, and that's the integer location. Let's do a completely random one, let's just say three. If we look at position three, it's going to give us ASM, which I'm not exactly sure what it is, but it still gives us basically the same kind of output, which is the columns and the values. So that's another way that you can search within your index when you're actually trying to filter down that data. Now let's go look at the order by, and let's start with the very first one that we looked at. Let's do df, that's why I kept it, because I wanted to use it later. Now we can sort and order these values instead of it just being kind of a jumbled mess in here. We can sort these columns however we would like, ascending, descending, multiple columns, single columns, so let's look at how to do that. We'll say df, and then we'll do df, look at Rank again, just like we were doing above, and let's do df where it's less than 10. I should have just gone and copied this, I apologize. So now we have this data frame where the rank is less than 10. Now we can do sort_values, and this is the function that's going to allow us to sort everything that we want to sort. So we can do by is equal to, and we'll just order it by the exact same thing that we were filtering on, we'll do Rank. So now what this is going to do is order our Rank column, and as you can see it did that: 1, 2, 3, 4, 5. We can also do it with ascending or descending, so if you want to you can look here and see
what you can do. So we'll do ascending, we'll say that's equal to True, and that's the default, so that didn't change anything. But if we say False, it's going to be descending, from highest to lowest, so now we have it in the opposite direction. Now, we don't have to sort this on just one single column. We can do multiple columns, and we can do that by making a list right here, whoops, make a list just like that, and we'll input different ones as well. So now let's input our Country, and when we run this it will give us ranks of 9, 8, 7, 6 as well as the countries Russia, Bangladesh, Brazil. Now, if you noticed, the order really didn't change, because the rank stayed the exact same. That's because there's an order of importance here, and it starts with the very first one. If we change this around and put Country first with a comma right here, now the country is going to be sorted descending and the rank comes second, so the rank isn't really going to have any effect here. So now we have the countries United States, Russia, Pakistan, and the rank really didn't get ordered at all. Now, if we want to see how that can actually work, let's do Continent and actually put it right here, and do Country here. So if we run this, it's first going to sort the continent, then it's going to come back and sort the country. So keep your eye right here on this Asia area, because we're going to sort this differently than just descending. We have ascending False, and that applies to both of these, it's False and False, but we can specify which one we want to use for each. We can do a False here and a True here, so we'll do False, comma, True, and what this is going to do is say False for the continent, so the continent right here is going to stay the exact same, and True for the country. And so that is a lot of how you can filter and order your data within pandas. I hope that this was helpful, I hope that you enjoyed this video. If
you liked it, be sure to like and subscribe below, check out all my other videos on Python and pandas, and I will see you in the next video. Hello everybody, today we're going to be looking at indexing in pandas. If you remember from previous videos, the index is an object that stores the axis labels for all pandas objects. The index in a data frame is extremely useful because it's customizable, and you can also search and filter based off of that index. In this video we're going to talk all about indexing: how you can change the index and customize it, as well as how you can search and filter on that index. Then we're also going to be looking at something a little bit more advanced called multi-indexing. You won't always use it, but it's really good to know in case you come across a data frame that has it. So let's get started by importing pandas: import pandas as pd. Now we'll get our first data frame. We say df is equal to pd.read_csv, and I've already copied this, but we're going to do r and we're going to paste in this file path. I have this world population CSV, and I will have that in the description just like I do in all of my other videos. Let's run df and take a look at this data frame. We have a lot of information here. We have rank, country, continent, population, as well as the default index from zero all the way up to 233. Now, if you haven't watched any of my previous videos on pandas, the index is pretty important, and it's basically just a number or a label for each row. It doesn't necessarily have to be a unique number. You can create or add an index yourself if you want to, and it doesn't have to be unique, but it really should be unique, especially if you want to use it appropriately for what we're doing. The country is actually going to be a pretty great index, because the country is going to be all unique, since every single row is a different country with its own population. So let's go ahead and create this
country, or add this country, as our index. Now, we can do this in a lot of different ways, but the first way that you can do this, if you already know what you are going to create that index on, is to go right in here when we're reading in this file, and we'll say comma, index, oops, I spelled that completely wrong, index_col, and we'll say that is equal to, and then we're going to say, in quotes, Country. So we're taking this Country column and we're going to assign it as the index. Now let's read this in, and as you can see, this is our index now. It looks a little bit different. We didn't have this Country header right here before, which is specifying that this is still the country, but you can tell that this is the index based off the bold letters as well as it being on the far left. All the regular column headers for the data are over here, while the Country header is right here, and it's lower than all the others. That's just a quick way that you can see that that is the index. Now, before we move on, I want to show you some other ways that you can do this as well, but first I'm going to show you how to reverse this index. So we had our data frame right here, and we have df. and we'll say reset_index, and then we'll say inplace is equal to True, which means we don't have to assign this to another variable and all that stuff, it'll just change df itself. So now when we run that data frame again, the index was reset to the default numbers. Now let's go down here and I'll show you how to do this in a different way. You can do df. and we'll say set_index, and then we'll just say Country, so very similar to when we were reading in that file and we said index_col equals Country. If we do this and run it, it works, but if we say df right down here, it's not going to have saved that. If we want to save it, just like we did above, we're going to say inplace is equal to True. That is going to save it to where we don't have
to assign it to another variable. So now when we run this, the data frame right here is going to be updated because we said inplace is equal to True, so Country will now be our index again. Let's run this, and there we go. Now, what's really great about this index is that we're able to search based off just this index, so we can filter on it and basically look through our data with it. There are two different ways that you can do that, or at least this is a very common way that people who use pandas will search through that index. There's loc and iloc, which stand for location and integer location. Let's look at loc first. Let's say df.loc and then we'll do a bracket. Now we're able to specify the actual string, the label, so let's go right up here and let's say Albania. So we'll say Albania, and again this is just looking at the location. Let's run this, and now it's going to bring up all the Albania data, just like here, where it kind of looks like a column. And we can get this exact same data using iloc right here. When we ran loc we were searching based off Albania, which is in the 1 position, so if we use iloc with the integer 1, we can look at the 1 position, and this should give us the exact same data. Now let's take a look at multi-indexing, and we'll come back to a little bit of this in a second. Multi-indexing is creating multiple indexes. We're not just going to create the country as the index, now we're
going to add an additional index on top of that. Let's pull up our data frame. Right now we have the country as the index, so let's do df.reset_index and say inplace equals True, and run it. So now we have our data frame back. Now let's set our index, but this time we're going to add the country as the index as well as the continent. We'll say df.set_index, open a parenthesis, and instead of just passing "Country" like we did before, we're going to pass a list: "Continent", then a comma, then "Country". So we have Continent and Country, and let's say inplace is equal to True. Now when we run this we're going to have two indexes, so let's see what this looks like. Now we have Country as well as Continent as our index. You may notice that the values are repeating themselves in this Continent index: we have Europe right here and Europe right here, as well as Asia and Asia, and it looks a little bit funky. But we are able to sort these values and make them look a lot better, so let's go ahead and try this. We'll do df.sort_index, and when we run this it should sort our index alphabetically. We can also look in here and see what kinds of things we can specify. We can specify the axis, but it defaults to zero: the rows are zero and the columns are one, so we have two axes within our data frame. You can choose the level, whether it's ascending or not, inplace, kind, sort_remaining, all of these different things. The only one that I really think is worth looking at is ascending; we already know some of these other ones. Let's run it. Now it's sorted these, so they're grouped together: we have Africa and all the African countries, as well as South America and all the South American ones. Let's really quickly say pd.set_option and we'll say display.max.
rows, just like this. Let's run it; I need to specify the number right here, so let's see how many rows we have: 235, so let's do 235. Let's run this, and now you can see that Africa is all grouped together with all the countries in alphabetical order under it, and then we go all the way down to Asia, again all in alphabetical order. If we wanted to, we could say ascending equals False, and then when we run this it's the exact opposite: it starts with South America, the last one, and goes in reverse alphabetical order. We could also make it a list, False comma True, and then it would sort this first level descending and this next level ascending. So you can really customize it, but for what we're doing we don't need any of that; we just need to be able to see this right here. Now let's try to search by our index like we did before with df.loc. If we specify just "Angola", it's not going to work properly, because it's searching in this first index level for the string that we give it. We can search "Africa", and now we have all of the African countries. If we want to narrow it down to Angola, we can go down another level by adding "Angola", and now we have what we were looking at before, where we're calling all the data for that one row. But we couldn't get there with just one label, because we had an additional index right here, so once we called both indexes we get this view. Now let's look at iloc really quick. Let's just say one, because right up here we have Algeria at zero and then one, so you'd think it may pull up Angola. Let's go ahead and run this, and it's still pulling up Albania. If you remember, when we didn't have the multiple indexes it was also pulling up Albania. The difference when you're doing these multi-indexes is that
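The multi-index walkthrough above (two index levels, sorting, and loc versus iloc) can be sketched roughly as follows. The four-row frame and its values are made up for illustration, and the column names are assumptions rather than the exact ones in the video's file.

```python
import pandas as pd

# Hypothetical four-row stand-in for the population data; values are made up.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Angola"],
    "Continent": ["Asia", "Europe", "Africa", "Africa"],
    "Rank": [36, 138, 34, 42],
})

# Two index levels: Continent first, then Country.
df.set_index(["Continent", "Country"], inplace=True)

# sort_index returns a new, sorted frame; df itself keeps its original row order.
sorted_df = df.sort_index()                         # both levels ascending
mixed_df = df.sort_index(ascending=[False, True])   # continents Z-A, countries A-Z

africa = sorted_df.loc["Africa"]               # every row under the first level
angola = sorted_df.loc[("Africa", "Angola")]   # one row, both levels supplied

# iloc ignores the labels and counts rows in whatever order the frame is in.
second_unsorted = df.iloc[1]        # still Albania
second_sorted = sorted_df.iloc[1]   # Angola
```

The last two lines are the point of the comparison: iloc only ever sees row positions, so the answer depends on whether you index the original frame or the sorted copy.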
loc is able to use those index labels, whereas iloc does not go based off the multi-index at all; it goes based off the integer position of each row, and since we never saved the sorted frame, the row at position one is still Albania. So that's a lot about indexing in pandas. We'll cover even a few more things in future videos as we get more and more into pandas, but this is a lot of what indexing looks like, and again it's super important to learn and know how to do, because it's a pretty important building block as we go through this pandas series. So I hope you enjoyed this video on indexing. If you did, be sure to like and subscribe below, and I will see you in the next video. [Music] Hello everybody, today we're going to be taking a look at the groupby function and aggregating within pandas. Groupby is going to group together the values in a column and display them all on the same row, and this allows you to perform aggregate functions on those groupings. So let's start by reading in our data and take a look. We're going to do import pandas as pd, and then we're going to say our data frame is equal to pd.
read_csv. We'll do an open parenthesis, r, and our file path, and we're going to be looking at the flavors CSV right here. So we have our flavor of ice cream, we have our base flavor, whether it was vanilla or chocolate, whether I liked it or not, the flavor rating, the texture rating, and its overall or total rating. Now, these are all my own personal scores, and I've spent years researching this, so these are all very accurate, but this should be a low-stress environment to learn groupby and the aggregate functions. The first thing we can do is look at our groupby. You can group by flavor, but as you can see these are all unique values. What we need is something that has duplicate values, or similar values on different rows, that will group together. So this base flavor column is actually a perfect one to group on, and we'll do that by saying df.groupby, open parenthesis, and we'll just specify "Base Flavor", and this will group those flavors together. Let's run this, and as you can see it's actually its own object, a DataFrameGroupBy object. Now that we've grouped them, let's give it a variable: we'll say group_by_frame is equal to, copy this in, and run it. What we need to do now is run our aggregations in order to get an output, so we're going to say .mean, and that's all we're going to put for now, just to get an output that we can take a look at, and then we'll build from there. Let's go ahead and run this. Right here we have our base flavor, which is now the index, with chocolate and vanilla, and it's taking the mean, or the average, of all the columns that have integers. Notice that it did not take the Liked column and it did not take the Flavor column, because those are strings and it cannot average those, and we'll take a look at that later. But it
took all the values that have integers and gave us the average of those ratings. So right off the bat, on average the ones with chocolate bases have a much higher rating overall than the ones with vanilla bases. Now, we can actually combine all of this together into one line: we'll say df.groupby, specify the column, and then .mean, just like this, and this will run. Before, we didn't have any aggregating function on there, so all we got back was the groupby object, but now that we combine it all into one it gives us the output. There are a lot of different aggregate functions, but I'm going to show you some of the most common ones that you will see. Let's copy this right here. We can do .count, and when we run this it shows us the count of the rows that were aggregated. For chocolate we have three, so there are going to be threes all the way across, and for vanilla we had six, so we're looking at a higher count of vanilla. If you're comparing it to this mean up here, that could be a big skew towards chocolate, because if you have one or two good chocolates it can really pull the numbers up, whereas if you had two good vanillas but all the other ones were bad, it pulls that average down. So knowing the count of something is really good. Let's take a look at the next ones, min and max, and I'll just run these really quickly. We can do .min, and when we run this the first thing you should notice is that it now has a Flavor and a
Liked column, and that's because min and max also work on strings: they compare them alphabetically, starting with the first letter, and return the lowest or highest value. So Chocolate, starting with C-H, is the very first, or minimum, value for the chocolate group, and Cake Batter is the minimum value in vanilla. Now with Liked it's interesting, because apparently I liked all the chocolate ones. I'm going to go take a look: chocolate, I liked; chocolate, I liked; chocolate, I liked. So there is no "No" in the Liked column for chocolate; "Yes" was the only option. Now let's look at max, and it should do the exact opposite, which is take the highest value even if it's a string. So Rocky Road, the letter R comes later in the alphabet, so that's what it's picking, and so does Vanilla, and then we have "Yes" as well. And of course it's also taking the max value of the number columns. Before, when we were looking at min, I just focused on the strings, but it does the exact same thing to these integer columns as well. So the max for the vanilla group is 10, which came from Mint Chocolate Chip, one of the vanilla-based flavors. Then we can also look at .sum, and there are all the sums for these. Again, it only does the integer columns, because we can't add the strings. Here are the total values for all of them, and since we had six rows grouping into vanilla, we now have a much higher total for vanilla. Now, that's a really simple way to do your aggregations, but there is actually an aggregation function, so let's take a look at this. It's a little bit more complex, although when I write it out I hope it makes a lot of sense. We can do .agg, so this is our aggregate function, and what we need to pass into it is a dictionary. So let's do an open parenthesis and we're going to do a curly bracket,
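The single-function aggregations covered above (mean, count, min, max, sum on a grouped frame) can be sketched like this. The ratings are invented and the column names are assumptions standing in for the flavors CSV. One deliberate difference from the video: `numeric_only=True` is passed to mean and sum, because recent pandas versions no longer skip string columns silently.

```python
import pandas as pd

# Made-up ratings standing in for the flavors CSV; column names are assumptions.
df = pd.DataFrame({
    "Flavor": ["Mint Chocolate Chip", "Chocolate", "Vanilla", "Rocky Road", "Cake Batter"],
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Chocolate", "Vanilla"],
    "Liked": ["Yes", "Yes", "No", "Yes", "Yes"],
    "Flavor Rating": [10, 9, 5, 8, 9],
    "Texture Rating": [8, 8, 6, 9, 7],
})

groups = df.groupby("Base Flavor")   # a DataFrameGroupBy object, no output yet

means = groups.mean(numeric_only=True)   # averages of the numeric columns only
counts = groups.count()                  # number of rows per group, per column
minimums = groups.min()                  # strings are compared alphabetically
maximums = groups.max()
totals = groups.sum(numeric_only=True)
```

Reading `counts` next to `means` is the check suggested in the video: a group with only a couple of rows can have its average pulled around by one or two values.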
and then we need to specify what column we're going to be aggregating on. Let's do this flavor rating; we'll put "Flavor Rating" in as a string, then a colon, and now we can specify what aggregate functions we want. We've done sum, count, mean, min, and max, and we can put all of those in here and perform all of those aggregations on just one column. So let's make a list and say mean, max, count, and, what's another one, sum. Let's do all four of those only on this Flavor Rating column. When we run this, we have our base flavor right here, chocolate and vanilla, but now instead of multiple rating columns we have the one Flavor Rating column with a sub-column for each of our aggregations. It's also possible to pass in multiple columns like that. We'll do texture rating: we'll come right here and do a comma, then "Texture Rating", then a colon, and then the exact same list. Now when we run it we're getting the same columns, mean, max, count, and sum, for Flavor Rating, and then mean, max, count, and sum for Texture Rating. Now, so far we've only grouped on one column, but we can actually group on multiple columns. Let's go back and look at our data. Really we only grouped it on this base flavor, but you can group by multiple columns, so let's do our base flavor, which we did already, as well as the Liked column. We're going to say df.groupby, open parenthesis, and then instead of just passing in one string we're going to pass a list: "Base Flavor", comma, and then "Liked". So now when it groups this it should give us two levels of grouping. Let's add .mean and run this. Now we have our chocolate and our vanilla, and remember, chocolate only
had "Yes", so that's the only value it's going to group on, but vanilla had a "No" and a "Yes". So if we look at vanilla, we have our base flavor Vanilla, and then within Liked we have "No" and "Yes", which shows us that within vanilla our "No" ratings were really low but our "Yes" ratings were really high; they had very close to the same rating as the ones we liked in chocolate. And just like we did above, we can take this .agg, and I'm going to copy this, and it'll perform it on each of those rows. What did I do wrong? Oh, I need the curly bracket. And it'll show us each of those: the mean, max, count, and sum for all of the chocolate and vanilla as well as the Liked groupings of "Yes" and "No". Now, after we've looked at all that, and that's how I usually do it, there is one shortcut function that can give you some of these things really quickly. Let's go back up here and take this; it's just called describe. It's going to give you a high-level overview of some of those different aggregations. Let's run this, and it gives us our chocolate and vanilla, and within each column it gives us our count, our mean, our standard deviation, our minimum, the 25th, 50th, and 75th percentiles, and then our max, and then the same again for the next column. So that's a lot of those aggregate functions, but describe is a very generalized function; we can't get as specific as we were with the previous ones. I just wanted to throw this out there in case it's something you'd be interested in, because it technically is showing a lot of those aggregate functions all at one time. So that is our groupby and aggregate functions within pandas. I hope that was helpful and that you understood everything we were working on. If you like this video, be sure to like and subscribe, and check out all my
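The agg dictionary, the two-column grouping, and describe from the groupby video can be sketched together as below. As before, the data is invented and the column names are assumptions, so treat this as an illustration of the call shapes rather than a reproduction of the video's output.

```python
import pandas as pd

# Made-up ratings; column names are assumptions standing in for the flavors CSV.
df = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Chocolate", "Vanilla"],
    "Liked": ["Yes", "Yes", "No", "Yes", "Yes"],
    "Flavor Rating": [10, 9, 5, 8, 9],
    "Texture Rating": [8, 8, 6, 9, 7],
})

stats = ["mean", "max", "count", "sum"]

# Dictionary form of agg: column name -> list of aggregations to run on it.
per_base = df.groupby("Base Flavor").agg(
    {"Flavor Rating": stats, "Texture Rating": stats}
)

# Grouping on a list of columns gives one row per (Base Flavor, Liked) pair.
per_base_and_liked = df.groupby(["Base Flavor", "Liked"]).agg({"Flavor Rating": stats})

# describe gives count, mean, std, min, quartiles and max for each numeric column.
summary = df.groupby("Base Flavor")[["Flavor Rating", "Texture Rating"]].describe()
```

The results have two-level column headers, so a single cell is addressed with a tuple, for example `per_base.loc["Vanilla", ("Flavor Rating", "sum")]`.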
other videos on Python as well as pandas, and I will see you in the next video. [Music] Hello everybody, today we're going to be talking about merging, joining, and concatenating data frames in pandas. This whole video is basically about being able to combine two separate data frames together into one data frame. These join types are really important to understand when we're actually using merge and join. Right here we have what's called an inner join, and the shaded part is what's going to be returned: it's only the things that are in both the left and the right data frames. Then we have an outer join, or a full outer join, and this will take all the data from the left data frame and the right data frame, including everything that matches, so basically it just takes everything. We also have a left join, which is going to take everything from the left, and if there's anything in the right that matches, it'll also include that. The exact opposite of that is the right join, which is going to give us everything from the right data frame plus everything that matches, but it's not going to give us anything that is unique to the left data frame. This is just for reference, because in a little bit when we start merging, these become very important, so I wanted to show you how that works visually. Let's get started by pulling in our files. First we're going to say import pandas as pd, and we'll run this. Then we'll have a data frame one and a data frame two, and these are the left and the right data frames that we'll be using to join, merge, and concatenate. We'll say df1 is equal to pd.read_csv, and we'll do r and our file path, so we have this
CSV, that's our Lord of the Rings CSV, and let's call that really quickly so we can see what's in there. I'm having a dyslexic moment, because the file name is supposed to be LOTR.csv; I apologize for that. But this is our data frame one. We have three columns: the fellowship ID, 1001, 1002, 1003, and 1004; the first name, Frodo, Samwise, Gandalf, and Pippin; and their skills, hiding, gardening, spells, and fireworks. So this is our very first data frame that we're going to be working with. Let's go down a little bit and pull this down here, and we're just going to say df2, and this is the Lord of the Rings 2 file, so let's pull this one in. As you can see it's very similar. We have fellowship ID 1001, 1002, 1006, 1007, and 1008, so we have three different IDs here; we don't have 1006, 1007, and 1008 in the first data frame. We also have the first name, so Frodo and Samwise are in both the first and the second data frame, but now we have three new people, Boromir, Elrond, and Legolas. And now we have this age column, which again is unique to just this second data frame. Really quickly, I want to give a huge shout out to the sponsor of this video, and that is Zendesk. I've been using Zendesk for my company's customer analytics and it has been absolutely phenomenal. They're going to be hosting a conference called Zendesk Relate on May 10th, and they're going to talk all about customer analytics, chatbots, and AI in this space. You can attend in person in San Francisco or you can attend virtually, but space is limited, so be sure to apply if you want to attend. So if you are a business leader and you want to make the most out of your customer data, or you want to learn customer data analytics, I will leave links in the description. Again, huge shout out to Zendesk for sponsoring this video. Now, the first one that I want to look at is merge, and I want to look at merge first because I think this one is the most important; I use this one more than any of the other ones that we're going to talk about
today. The merge is just like the joins that we were just looking at, the outer, the inner, the left, and the right, and there's also one called cross. I'll show you that one, although if I'm being honest I don't use it that much, but it's worth showing just in case you come into a scenario where you do want to do that. So let's go right down here, and I want to be able to see these while we do it. We're going to say df1, and when we say df1.merge, df1 is automatically going to be our left data frame. Then if we do our parentheses right here and say df2, this is our right data frame. Let's see what happens when we do this. We didn't specify this, it's just the default, but it's going to do an inner join, so it's only going to give us output where the keys are the same. You can't see this happening, but it's taking this fellowship ID and saying I have 1001 here and 1002 here, and these are the exact same as the fellowship IDs 1001 and 1002 up here. But when we look at 1003 and 1004, those aren't in this right data frame, and 1006, 1007, and 1008 are not in this left data frame. So the only ones that match are 1001 and 1002, and that's why they get pulled in down here. But because we didn't explicitly say what we want to merge on between these two data frames, it's actually looking at both the fellowship ID and the first name, the columns the two frames share. So it's also matching these values of Frodo and Samwise, which are the same in both, which is why it pulled them over. Really quickly, let's check and make sure that we did an inner join, because again we didn't specify anything; that was just the default. We're going to say how is equal to "inner", and if we run this it's going to be the exact same, because again inner is the default. But now, just to show you how it's joining these two data frames
together, I'm going to say on is equal to, and then I'm only going to put the fellowship ID. Let's run this. The first thing you may have noticed is this first name _x and this first name _y. What merge does by default, when you're only joining on the fellowship ID and not on the first name, is separate the first name columns from the left and right data frames into an _x and a _y. Even though they have the exact same values, since we are not merging on that column it automatically splits it into two separate columns so we can see the values from each side. If we go into this on and make it a list, and we say comma and then write first name, and then run this, it's going to look exactly like it did before. Again, it automatically used both of these columns when it was merging the first time, even though we didn't write anything; if we actually write this out, it's doing exactly what it was doing when we just had df2. Now, there are other arguments that we can pass into this merge function. Let's hit Shift+Tab and scroll down here. Within this merge function we have a lot of different arguments you can pass in. First we have right, which is the right data frame, our df2. Then we have how and on, which we've already shown. There's left_on, right_on, left_index, and right_index, not something you'll probably use that much, but you definitely can if you want to look into that, and there are docstrings which show you exactly how to use all of these, so if you're interested in left_on, right_on, and the index options, it's all in here. One that is really good is sort, which you can set to either False or True. Then we
have these suffixes. If you remember, when we took the first name out of on, it automatically put in these _x and _y. You can customize that and put in whatever strings you'd like instead of _x and _y. We also have indicator and validate, again all things you can go in here and look at. I'm just going to show you the stuff that I use the most. So now that we've looked at the inner join, let's copy this right down here and look at the outer join. These get a little bit more tricky; I think the inner join is probably the easiest one to understand. Outer is spelled o-u-t-e-r; I don't know why I always want to say o-t-t-r. Let's run this and see what we get. Now this looks quite different. The inner join only gave us the rows whose keys are the exact same; this one is going to give us all of the rows regardless of whether they match. So we have 1001, 1002, 1003, 1004, 1006, 1007, and 1008. Let's scroll back up here: we have 1001 through 1004 in the left, and 1001, 1002, 1006, 1007, and 1008 in the right, so we don't have a 1005. And if you notice in this data frame right here, if we can't join on the fellowship ID or the first name, like Legolas, who doesn't have a matching value in the left data frame, it just gives us a NaN, which is not a number. It's going to do that for any value where it couldn't find a match on the ID and first name. In age we also have that for the ones that weren't in the right data frame. We only had 1001 and 1002 there, so we'll have the age for both Frodo and Sam, but for Gandalf and Pippin we don't have corresponding IDs, so it's just going to be blank, and you can see that right here. So again, outer joins are kind of the opposite of inner joins: they're going to return everything from both, and if there is
overlapping data, it won't be duplicated. Now let's go on to the left join. I'm going to pull this down right here, and now we're just going to say how is equal to "left", and let's run this. What this is going to do is take everything from the left data frame, so everything from df1, and then if there is any overlap it'll also pull whatever we're able to merge on from df2. Let's go back up to our df1 and df2. It's going to pull everything from this left data frame, because we're specifying a left join. We're also going to try to bring in everything from the right, but only if it matches, so just this information right here will come over. We weren't able to join on 1006, 1007, or 1008, so none of that information is going to come over. Let's go down and check on this. Again we have 1001, 1002, 1003, and 1004, and all of the first name and skills data is in here. Then we're trying to bring over the age, but we only have matches on 1001 and 1002, so only those two values will come in. Let's look at the right join, because it's basically the exact opposite of the left: now we're keeping everything from the right-hand data frame, and if there's something that matches in df1, we pull that in. So this basically just looks like df2, except we're pulling in that skills column, and since only 1001 and 1002 are the same, that's why only those skills values are here. Now, those are the main types of merges that I use when I'm trying to merge data frames, but there is also one called a cross join, so let's look at this one, and this one is quite a bit different. Here we go, let's run this. This one is different in that it takes each row from the left data
frame and pairs it with each row in the right data frame. So for Frodo in the left data frame, it pairs him with Frodo, Samwise, Legolas, Elrond, and Boromir in the right data frame. Then it goes to the next value, Samwise, and does the exact same thing: Frodo, Samwise, Legolas, Elrond, Boromir. And it does that for every single row. Let's go right back up here. It's taking this 1001 and matching it to rows one, two, three, four, five; then it's taking Samwise and matching him to one, two, three, four, five; Gandalf, one, two, three, four, five; Pippin; and you kind of see the pattern. That's what a cross join is. In my opinion there are very few reasons for a cross join. If you ever do an interview where you're being asked about Python, you will sometimes be asked about cross joins, but there aren't a lot of instances in actual work where you really need one. Now let's take a look at join. Joins are pretty similar to the merge function and can do a lot of the same things, except in my opinion the join function isn't as easily understood as merge; it's a little bit more complicated. But let's take a look and see how we can join these data frames together using the join function. Let's go right up here and say df1.join and then df2, very similar to how we did it before. Let's try running this, and it's not going to work. When we did the merge function it had a lot of defaults for us. Let's go down and see what this error is. It says the columns overlap but no suffix was specified. So it's telling us that both frames have the fellowship ID and first name columns, the same ones merge used, except it's not able to distinguish which is which, so we need to go in there and help it out a little bit. Again, a little bit more hands-on than merge, but let's see what we can do to make this work. Let's do comma and we'll say on, and really quickly let's open this up
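The merge variants walked through above can be sketched side by side as follows. The two frames are rebuilt by hand from the values read out in the video; the column names, the ages, and which ID belongs to which new character are assumptions made only so the example runs.

```python
import pandas as pd

# Hand-built stand-ins for the two Lord of the Rings files.
# Column names, ages and the ID-to-name pairing in df2 are assumptions.
df1 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1003, 1004],
    "FirstName": ["Frodo", "Samwise", "Gandalf", "Pippin"],
    "Skills": ["Hiding", "Gardening", "Spells", "Fireworks"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1001, 1002, 1006, 1007, 1008],
    "FirstName": ["Frodo", "Samwise", "Legolas", "Elrond", "Boromir"],
    "Age": [50, 39, 2931, 6520, 51],
})

keys = ["FellowshipID", "FirstName"]

inner = df1.merge(df2)                        # default: inner join on all shared columns
on_id = df1.merge(df2, on="FellowshipID")     # FirstName splits into _x and _y
outer = df1.merge(df2, how="outer", on=keys)  # every row from both sides
left = df1.merge(df2, how="left", on=keys)    # every row from df1
right = df1.merge(df2, how="right", on=keys)  # every row from df2
cross = df1.merge(df2, how="cross")           # every left row paired with every right row
```

Comparing `len()` of each result is a quick way to confirm which join you got: 2 rows for inner, 7 for outer, 4 for left, 5 for right, and 20 for cross with these frames.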
and kind of see what we have. This one has fewer options than merge does. We have other, and that's our other data frame. We can do on, where we specify what column we want to join on, and then how, whether we want a left, an inner, or an outer, the same types of joins as merge. Then we have lsuffix and rsuffix, and that right there is part of the issue we were just facing: those columns are the same in both frames, but if we set lsuffix we can give any string we want to the columns that are in both the left and the right, so they get unique names and we no longer have that issue. And then we can also sort it like we did on the other one. Anyway, let's go back to our on. We'll say on is equal to the fellowship ID and try running this, and we're still getting an error; it's just not as simple as merge. So let's keep going. Now let's specify the type: we'll say how is equal to "outer", and if we run this it still doesn't work; we're still getting the exact same issue with the left suffix and the right suffix. So now let's finally resolve it; I just wanted to show you how it's a little bit more frustrating. Let's say lsuffix is equal to, and where merge automatically used _x, let's do _left, and then a comma and rsuffix is equal to _right. Now when we run this it should work properly. So this is our output, and it obviously looks quite a bit different. Over here we have this fellowship ID, then we also have fellowship ID _left, first name _left, fellowship ID _right, and first name _right, so it just doesn't look right. Something I didn't specify when I first started this, because I kind of wanted to show you, is that join is usually better for when you're working with indexes. Before, when we were using the
merge, we were using the column names, and that worked really well and was pretty easy to do, but as you can see right here, when we try to use these column names with join it's not working exceptionally well. Let's go ahead and create our index, and then I can show you how this works a little bit better when we're working with just the index, although you can get it to work the same as merge; it's just a lot more work. So let's go right down here and create a new data frame, df4. We'll say df1.set_index, open parenthesis, and we want to set this index on the fellowship ID. Then we're going to do the join: so we've set that index on the fellowship ID, and now we're going to join it on df2.set_index, and we're also going to set that on the fellowship ID, and I'll just copy this. Now we also want to specify the left and the right suffix, so I'll just copy this, as we do need to specify it. Now let's try running df4. Really quick, just to recap: we were doing the same thing above, where we joined df1 with df2. Now we're joining df1 with df2 again, except in both instances we're setting the index to the fellowship ID, so we're now joining on that index. Let's run this, and this should look a lot more similar to the merge than the join that we did above, except now the fellowship ID right here is actually an index, so it's just a little bit different. We can still go in here and do how is equal to "outer", so we can still specify the different types of joins for combining these data frames; again, it's just a little bit different. And that's why for most instances I'm using that merge function, because it's
just a little bit more seamless, a little bit more intuitive. The join function can still get the job done, but as you can see it takes a little bit more work. Now let's look at concatenate. Concatenating data frames can be really useful, and the distinction between a merge or join and a concatenate is that concatenating is like putting one data frame on top of the other, rather than putting one data frame next to the other, which is what the merge and the join do. So it's just a little bit different in how it operates, but let's actually write this out and see how it looks. Let's go up here and say pd.concat, open parenthesis, and then we're going to concatenate df1 and df2, passed together in a list. That's all we have to write. Let's run this. Just like I said, it literally took the first data frame, 1 2 3 4, and put it on top of the right data frame, 1 2 6 7 8. So that is our left data frame, this is our right data frame, and they're literally just sitting one on top of the other. But just like when we merged with either a left or a right, when you have these Skills and there aren't any values that populate for them, it's going to say NaN, not a number. And since we're not actually joining on 1 and 2, even though these rows are the same in both, it's not populating that value, because again we're not joining these together, we're just concatenating and putting one on top of the other. Now if we go into this concat and hit Shift+Tab, there are a lot of different things we can do. If you remember, axis 0 is the left-hand index and axis 1 is the top index, which is the columns, so you can specify that. We can also do joins, and that's the one I'm going to take a look at, but there are other ones you can look into as well. Let's look at join: we'll do a comma and say join is equal to inner. Let's see what happens with this. As you can see, it is only taking
the columns that are the same. That's what this inner is doing: it's joining these columns together, and the ones that were different it didn't take, because they aren't shared between both frames. Let's do an outer, and now it's going to take all of them. Like I said, that's doing this on the columns right here, but we can also do it on the other axis. So let's say axis is equal to 1, and when we run this, now it's joining on this index right here of 0 1 2 3 4. So now these are being joined together and it's putting them side by side, much like a merge would. So that's how concatenate works. I'm going to show you one more thing, and it's not up here in the title because it's not one that I recommend, and that's one called append. The append function is used to append rows from one data frame to the end of another data frame, and it returns that new data frame. So let's do df1.append, open parenthesis, and we'll pass df2, very similar to how we've been doing other things. Let's run this, and as you can see this is almost exactly what concatenate did when we first ran it. But if we read this warning, it's saying the frame.append method is deprecated and will be removed from pandas in a future version, use pandas.concat instead. So it's literally warning us that append is on its way out; if you want to do exactly what you're doing right here, use concat, because that'll do the exact same thing. So I'm not going to show you any other variations of append, because there's no reason to when it's on its way out. So that is our video on merge, join, concatenate, and append in pandas. I hope that was helpful and that you learned something. This stuff is really important, because oftentimes you're not just working with one CSV or one JSON or one text file, you're working with
multiple of them and you need to combine them all into one data frame, so this is a really, really important concept to understand. With that being said, be sure to like and subscribe, check out all my other videos on Python and pandas, and I will see you in the next video. [Music]
Hello everybody, today we're going to be building visualizations in pandas. In this video we'll look at how we can build visualizations like line plots, scatter plots, bar charts, histograms, and more. I'll also show you some of the ways you can customize these visualizations to make them just a little bit better. With that being said, let's go right over here and start importing our libraries. We'll start with import pandas as pd, and this one is really all you need to create the visualizations in pandas. But we may get a little bit crazy, so we're going to import a few others as well: import numpy as np, and then import matplotlib.pyplot as plt. I may or may not use these, but when I get into visualizations I may want to change some different things, so we're going to at least have them here in case we want them. Let's go ahead and run this. Now let's get the data set we're going to be using: let's say df is equal to pd.read_csv and get this in right here. We're going to be using these ice cream ratings. Let's take a look at this really quickly. These values are completely randomly generated, they are not real in any way, but I wanted something generic that wouldn't be confusing, just numerical values you can understand. Let's also set the index really quick: we'll say df.set_index, pass in Date, and assign that back to df. Now we have this Date column right here as our index, so we have January 1st, 2nd, 3rd, 4th, and then our ratings right here. Again, these are all just numbers, and they make it really easy to demonstrate how you can visualize things, so that's why we're using them today. The way we visualize something in pandas is with something called plot. Let's take our data frame and do df.plot with our parentheses. Now let's go in here really quickly and hit Shift+Tab. This is going to come up, and it's pretty important because it tells us what we can do within this plot. Unfortunately there isn't a quick overview, we just have this docstring, but we have our parameters right here, and these are what we can pass in to customize our visualization. The data is going to be our data frame, then we have our x and y labels. We can specify the kind, and this one's important because it sets what kind of visualization we want: a line plot, a vertical or horizontal bar plot, a histogram, a box plot, and a few others including area, pie, and density. We can also specify if we want subplots. You can use a different index, add titles, grids, legends, styles, all these different things. You can go through here, because there are a lot, and customize all of them. We won't be going into all of them, but I will show you the ones that I use the most and that I think are the most useful to know right away. So let's get out of here and just do df.plot(). When we run this we get this right here, and that was super easy: we created a line plot by doing just about nothing. By default it's going
to give us a line plot. If we come up here and say kind, and let me get that out of the way, is equal to line and run this, we get the same thing, so without us having to input anything it's giving us that line plot as a default. As you can see, we already have all of our data right here. We didn't have to specify anything; it automatically took it in, it's visualizing all three of these columns, and it has this little legend right here. We can specify where we want that; there is an argument for it. It also gave us these tick marks at 0.2, 0.4, 0.6, 0.8, and 1.0: it read the data in, saw it only goes from 0.0 to 1.0 at the peak, and chose the ticks for us. That's another thing you can specify: you can make it go up to 2, 5, 10, 1,000, whatever you want it to be. And it's plotting all of this against the date value right here. Really quickly, I wanted to give a huge shout out to the sponsor of this entire pandas series, and that is Udemy. Udemy has some of the best courses at the best prices, and it is no exception when it comes to pandas courses. If you want to master pandas, this is the course that I would recommend; it's going to teach you just about everything you need to know about pandas. So huge shout out to Udemy for sponsoring this pandas series, and let's get back to the video. If we wanted to break these out by column, we could go in here and say subplot is equal to True, and it's actually subplots, whoops. Now we can run that, and we can see each of those columns broken out by itself: instead of them all being in one visualization, it's now three separate visualizations. Now let's go right over here and get rid of the subplots. I want to show you some of the different arguments you can use to make this look nice, because I don't want to do this on every single visualization, I just want to show you what you can do. So we have
this one right here. We can add a title; notice there's no title or anything telling us what this is. So we can say comma, title, and we'll say Ice Cream Ratings. If we run this, we now have this nice title right here. We can also customize the labels for the x and y axes. It automatically took this Date, which is our date index, for the x-axis, but we can customize that if we'd like. All we have to do is add a comma and say xlabel is equal to, and we can say Daily Ratings. Then we can do the y label: we'll say ylabel is equal to, and for this one we can say Scores. I hope you cannot hear my dog in the background, because they're being insane. Let's go ahead and run this, and now we have Daily Ratings on the x-axis and Scores on the y-axis. Now let's go right down here and take a look at our next kind of visualization, which is a bar plot. We'll do df.plot, kind is equal to, and for this one we're going to say bar. This is what your typical bar plot looks like, and a lot of the arguments we just used on the line plot also apply to the bar plot. Something that's unique to the bar plot is that you can make it a stacked bar plot. All we have to do is go in here, add a comma, and say stacked is equal to True, so now it's going to be a stacked bar chart instead of your regular bar chart. Let's run this, and as you can see the columns are now stacked on top of one another, each representing its values. Now, we don't always have to plot every single column; we can also specify the column we want. Let's take the flavor rating, for example. We could do flavor, oops, flavor rating, good night, flavor rating. Then it's only going to take in that flavor rating column. And if you notice, we don't have a legend; that's only there when you have multiple columns, and we are
only looking at this one column, so all the values are right here. Now, this bar chart defaults to a vertical bar chart, but you can change it to a horizontal bar chart. Let's take a look at how to do that. We'll bring back all the columns, and we'll do df.plot.barh. I don't know if I can keep that kind equals bar, so let me run this. Yeah, I need to get rid of that, because barh is its own function. So now when I run this, it should still be a stacked bar chart, except horizontal. And you can see this worked properly. It's basically the exact same thing as a vertical bar chart, just horizontal, which may look better depending on the values you have. The next one we're going to look at is the scatter plot. We're going to say df.plot.scatter, and if we run this we get an error. In order to run this properly, we need to specify the x and the y axes. So let's go here and say x is equal to, and we can take any of the columns we have up here, so we'll say x is equal to texture rating, and then, oops, y is equal to overall rating. Now when we run this it should work properly. Let's take a look. If we go in here and hit Shift+Tab, we can see some other things we can specify. We have our x and our y, which are the ones we just set. We can also pass an s, which changes the size of the dots in our scatter plot, and a c, which is the color of each point. Let's start with s: let's say s is equal to 100 and see what that looks like. So we have much larger dots. Let's do 500 and see what that looks like. So we can make these much larger on our
visualization, depending on what you're looking for. We can also look at the color. Let's put a comma and c, so for color we can say it's equal to, let's do yellow, and see if this works. So now we've changed it to yellow. That looks absolutely terrible, but it does work. Now let's move on to the histogram. The histogram is always a good one. It looks similar to a bar chart, but what's great about a histogram is you can specify the bins. So let's say df.plot.hist, open parenthesis, and hit Shift+Tab in here to take a look at this one as well. Some of our parameters are the columns of the data frame we want to pull in, and you can choose the bins, which have a default of 10. Let's see how this works: we'll just run it as it is, so this is what the histogram looks like by default. Now let's specify our bins. It was 10 by default, so let's do 20 and see what that looks like. So there are narrower bars right off the bat. Remember, histograms are really good for showing the distribution of a variable; that's really what a histogram is for. Of course, since these are completely random numbers, this histogram isn't going to mean much, but you can at least see visually how it works. And if I didn't mention it before, which I should have, the bins are how many intervals the values get grouped into along the bottom here. So if we do just one, there's only going to be one very large bar. We could go down from 10 and do 5, so now there's only one, two, three, four, five. With fewer bins things get more compact, and with more, like 100, it spreads out a lot. That's what it shows: the distribution of the values across however many bins you want. The default of 10 is usually pretty good for a lot of different things. Now let's go down here and look
at the box plot. The box plot is a pretty interesting one. Let's visualize it really quickly and then I'll explain how it works. Let's do df.boxplot() and run this. Really what we're looking at is some different markers within our data. This line right here is the minimum value within that column. Then we have the bottom of the box, which is the 25th percentile of all the values within just this column, this line is the 50th percentile, then we have the 75th, and up here we have our maximum value. So I can take a glance at this and see that we have a low minimum, a high maximum, and it definitely skews towards the lower range. Whereas if I look over here, we have a lower minimum and a higher maximum, and you can see that the median is at 0.6 versus 0.4 over here, so it skews a lot higher. Now let's go down here and take a look at an area plot. We'll do df.plot.area() and just run this; this is what we get by default. Now, something I wanted to show you earlier and just haven't gotten around to is figure size, or figsize. This looks small and a little bit cramped, so let's say we want to increase the size. We'll say figsize, oops, figsize is equal to, and in parentheses 10 comma 5. That should make it a lot larger. Just something I wanted to throw in there. I look at these area charts as pretty similar to a line chart; if we compared them they'd be pretty similar, just different visually. You absolutely can use these, but I don't use this one a lot if I'm being honest, and that's why it's towards the end of the video. Let's go on to our very last one of the video, the beautiful pie chart. Let's say df.plot.pie, open parenthesis, and run it. We get this error, and that's because we need to specify
what column we're working with here. So let's just pass the y, and that's what we need. Let me open this up: right here we have our y, and this is the label or column that we're going to plot. That's really all we need. So we can say y is equal to flavor rating, oops, flavor rating. Let's run this, and now we get this visualization right here. Let's make this one a little bit bigger: figsize is equal to 10 comma 6. Now it's a little bit bigger. This legend is going to auto-populate, you can make this as big as you want, and obviously it's going to look a little better if it's larger. These colors auto-populate too. You can customize the colors, although I've found that when you have a lot of them it's harder to customize them easily, but definitely look into it. Almost everything in here is something you can customize in some way, although it does get a little tricky, and you'll have to do some research and some Googling to figure out how. Now, one last thing that I wanted to show, and something I could have probably done at the beginning, is that you can change the look of these visuals, and we can do that pretty easily within matplotlib, which has different styles. So let's go right here, add a new cell, and say print, and we'll do plt, so that's the matplotlib import from up here, plt.style.available. What this is going to do, whoops, what this is going to do is show us all the different stylings you can use to change up the visualization. Then once we find one that we like, we'll just do plt.style.use and in the parentheses specify which one we want. There are all these Seaborn ones, and Seaborn is a really great library. Let's try seaborn-deep; I haven't tried this one at all. Let's go ahead and try it, and it just changes some of the colors, some of the visuals. We can try something like fivethirtyeight. Let's try this. That looks quite a bit different. And let's try something like classic; I don't know what this one looks like, let's just try it. So you can try out all these different styles, find one you think looks really nice, and run with it through all your visualizations. So this has been our video on visualizing data in pandas. I think it's a really good introduction to how you can visualize data within Python, and in future videos we'll look at matplotlib and Seaborn, which are some really great libraries for visualizing data that I use a lot. I hope you enjoyed this video. If you did, be sure to check out all my other videos on Python and pandas, and I will see you in the next video. [Music]
Hello everybody, today we're going to be cleaning data using pandas. Now, there are literally hundreds of ways you can clean data within pandas, but I'm going to show you some of the ones that I use a lot and that I think are really good to know when you're cleaning your data sets. We're going to start by saying import pandas as pd and run that. Now we're going to import our file, so we'll say df is equal to pd, that's pandas, dot read underscore, and we actually have this in an Excel file, so we'll say read, oops, read_excel, open parenthesis, then an r, and we'll paste the path right here. Now we're just going to call that variable, df, to read it in and look at the data. Let's scroll down here and take a look at this data frame, this Excel file we're reading in. Right off the bat we have this customer ID
that goes from 101 all the way down to 1020. We have this first name, and everything looks pretty good there. In this last name column, though, it looks like we have some errors: some forward slashes, some dots, some null values. So we're definitely going to have to clean that up, because we don't want that in the data. We have a phone number, and it looks like we have a lot of different formats as well as NaNs, not a number, just lots of different stuff. So we're going to need to clean it up and then standardize it so it all looks the same. We also have address, and on some of these we just have a street address, but on others we have a street address and another location, as well as a zip code in some of them, so we'll probably want to split those out. We have a paying customer column, which is yeses and nos, and some of those are not written the same way, so I'll have to standardize that. We have a do not contact column, which is the same kind of thing as paying customer. And we have this not useful column, which we'll probably just want to get rid of. Okay, so the scenario is that we got handed this list of names and we need to clean it up and hand it off to the people who are actually going to make calls to this customer list. They want all the data standardized and cleaned so the people making those calls can do so as quickly as possible, but they also don't want columns and rows that aren't useful to them. So things like this not useful column we're probably going to get rid of, and for the ones where do not contact says yes, we should not contact them, so we'll probably want to get rid of those somehow. That's a lot of what we're going to be doing to clean this data set. Normally the very first thing that I do when I'm working with a data set, except in the very rare cases when you're actually supposed to have duplicates, is drop the duplicates from the data set completely.
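A minimal sketch of that first step, dropping exact duplicate rows. The frame here is a made-up stand-in for the customer list; the column names and values are assumptions for illustration, not the real file.

```python
import pandas as pd

# Hypothetical miniature of the customer list (names and IDs are illustrative only).
df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1003],
    "first_name": ["Frodo", "Harry", "Anakin", "Anakin"],
    "last_name": ["Baggins", "Potter", "Skywalker", "Skywalker"],
})

# drop_duplicates returns a new frame and keeps the first copy of each repeated row.
deduped = df.drop_duplicates()

print(f"{len(df)} rows before, {len(deduped)} rows after")
```

Only rows that match in every column are treated as duplicates here, which is the default behavior.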
All you have to do for that is say df.drop_duplicates(), so they make it super easy for you. Let's just run it. Up here in our original data set we have this 19 and 20, and those are obviously duplicates: they have the exact same data, it's just a duplicate row that we need to get rid of. If we look right down here, we no longer have that 20; we now just have one row of Anakin Skywalker. Of course we want to save that, so we're going to say df is equal to df.drop_duplicates(), which saves it to the data frame variable again, and now when we run this our data frame does not have any duplicates. That's definitely one of the easier steps we're going to look at. Things are going to get quite a bit more complicated as we go, but I'm starting out simple so we can get a feel for it, and then we'll get into the really tough stuff. The next thing I want to do is remove any columns that we don't need; I don't want to clean data that we're not going to use. Looking through here, they may need first name, last name, and phone number for sure, and address might give them some information about where they're calling or the time zone, so we want that. This not useful column looks like a pretty good candidate to delete, and it's very easy to do. We're going to go right down here and say df.drop, open parenthesis. Drop just means we're dropping that column, and we specify it by saying columns is equal to and pasting in the column we want to delete. Let's run this and see what it looks like. It literally just drops that column, exactly like we were talking about. Again, we want to save that. If you've followed this tutorial series, you know you can always do inplace equals True and that'll save it as well, but for our workflow, most of the time I'm going to assign it back to that variable.
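The two ways of saving a dropped column described above can be sketched side by side. The frame and the column name not_useful are hypothetical stand-ins for the real spreadsheet.

```python
import pandas as pd

# Hypothetical frame; "not_useful" stands in for the column being removed.
df = pd.DataFrame({
    "first_name": ["Frodo", "Harry"],
    "phone": ["123-545-5421", "876|678|3469"],
    "not_useful": [True, False],
})

# Option 1: drop returns a new frame, so assign the result to a variable.
trimmed = df.drop(columns="not_useful")

# Option 2: inplace=True modifies the frame it is called on and returns None.
other = df.copy()
other.drop(columns="not_useful", inplace=True)

print(list(trimmed.columns))
print(list(other.columns))
```

Both routes end with the same columns; assigning the result back simply keeps every step in the same "variable equals something" shape.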
That's just to keep things consistent. Now let's go column by column and see what we need to fix, starting on this left-hand side. This customer ID looks perfectly fine to me; I'm not going to mess with it at all. The first name at a glance also looks perfectly fine. I don't see anything wrong with it visually, which is a good thing, although sometimes that can be deceiving and can cause errors down the line, but we're not going to assume there are errors in here. Now let's look at this last name. Here I'm seeing some obvious things that we talked about when we were first looking at this data set. We have this forward slash, which we definitely need to get rid of, we have null values, so NaN right here, and we have some periods as well as an underscore right here. I think we should clean all of those up so that when the person is making these calls it's all cleaned up for them. So how are we going to do that? We can do this in several different ways, but let's just copy this last name. The first one I'm going to show you is strip, and we'll write it like this: we'll say df and then specify the column we're working with, because we don't want to strip these values from everywhere, we only want to do it on this column. If we do this and we don't specify the column name, it will apply everywhere. So if we're trying to remove, let's say, these underscores, maybe that
would mess with something else in another column, and we don't want that. So we just want to specify this last name column. Let's go last name, .str.strip. Now, what strip does, and let's see if we can open this up really quickly. No, we can't; I was hitting Shift+Tab in here to see if it could bring up some of the notes on it. What strip does is remove values from the left and right-hand sides: lstrip takes from the left side, rstrip takes from the right side, and strip takes from both, and we can specify the values to remove. For what we're doing in this column we can just use strip, because as you can see this forward slash, these dots, and this underscore are all on the far sides. If there was a value like Swan_son, strip wouldn't work at all, because the underscore is not on the outside of the word. So we can use strip. I'll also show you how to use replace, which is another really good option for things like this, but let's start with strip and see if we can get what we need done. Let's just run this for now and see what happens. It looks like nothing has changed, because we're not specifying any value, and by default it only takes out white space, so spaces that shouldn't be there. Now we can specify exactly what values we want to take out, so let's do that. Let's say lstrip and try to take out these dots real quick: in the parentheses we'll put dot dot dot as a string. Now let's run this and see what it looks like. For this one, Potter, the dots are now gone. Those three dots were there before; let's just show it. So they were there, and then when I ran it like this, they're gone. That's what lstrip does: it takes it only off the left-hand side. Now we can also do a forward slash, so we'll do something like this and it'll get rid of the
slash on White, but as you can see, now we aren't taking out these three dots, so they're still there. Now, is it possible to do something like this, where we put these values inside of a list? Let's try it. We'll write it just like this, one, two, three, and run it. And no, it doesn't work. lstrip expects a single string of characters to remove, not a list, so let's keep it simple, especially since we're only taking a few values out. What we're going to do is dot dot dot, and take them out one by one. Now, we want to save this, but we don't just want to say df equals, because that would be very bad: it would say the whole data frame is now equal to only these values we're seeing right here. We want to apply it only to this column, so we're going to assign it like this. Now when we do it and then call the entire data frame, it's only applied to this one column, the last name column. Let's run it, and when we go down to Potter right here, it's cleaned up. We're going to do the same thing for those other values, just like this: a forward slash with a left strip. Then I'll do the left strip on this underscore too, just to show you that it won't work, and we'll go on from there. So it's not removing it, because we're looking at the left-hand side only; we need to use rstrip. Let's use rstrip, and now that looks perfect, it has no underscore. So that's how you can use strip for the left side, the right side, or just strip by itself, which covers both sides. Now, I showed you all of that because I'm going to show you a different way to do it, and I apologize, because I somewhat lied to you earlier. Let's run this right here. Actually, we're just going to pull the data in again like this, and we're going to remove the duplicates again, bear with me.
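The left, right, and both-sides behavior walked through above can be sketched on a small series. The values below are made up to mimic the kinds of stray characters described; they are not the real data.

```python
import pandas as pd

# Hypothetical last names with the kinds of stray characters described above.
names = pd.Series(["...Potter", "/White", "Schrute_", "  Baggins  ", None])

# With no argument, strip only removes surrounding whitespace.
whitespace_only = names.str.strip()

# lstrip works on the left edge only, so a trailing underscore is untouched.
left_only = names.str.lstrip("_")

# One character at a time, as in the walkthrough: left dots, left slash, right underscore.
cleaned = names.str.lstrip("...").str.lstrip("/").str.rstrip("_")

print(cleaned.tolist())
```

The argument is treated as a set of characters to remove from the edge, so passing "..." behaves the same as passing ".". Missing values stay missing throughout.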
We're going to drop that column, and now we're sitting with that data frame again with those exact same mistakes; I just wanted to reset it for a second. There is a way you can do this in one go, and I wanted to show you how. We'll say, again, we're looking at just this column and we're using strip, and let's get rid of the r because we want to apply it to both sides. You can put all of those values into one string and it will clean it up. So let's say we want to get rid of numbers: we'll do one two three. Then we can do the dot, so that's for the periods in dot dot dot Potter. We can also add the underscore and the forward slash. So we put it all in one string right here. Let's get rid of this really quickly and take a look, and all of them were removed. I showed you the other way first because that's how my mind would think about it: I'd think, oh, I can put it in a list and run it through this lstrip or rstrip and it would work. But that's not how strip works; you have to combine it all into one value. So yes, I deceived you, I apologize. But now when we assign what we just did back to the last name column and call the data frame, everything should look perfect, and it does. So our customer ID, first name, and last name are all cleaned up. Now we come to a much more difficult one. If I'm being honest, I said we were going to work up, but this is probably the hardest one of the whole video: working with phone numbers. Look at all these different formats. It's not going to be fun. And imagine there are 20,000 of these; you can't just go and manually clean those up, you need something to automate it. So that is what we're going to do. Let's go right down here, and we'll copy the data
frame and I'm going to pull it right here so now we need to clean up this phone number what we want is for it all to look exactly the same unless it's blank and then we'll keep it blank we don't want to populate that data but we want all of them to look exactly like this one and what we're going to do right off the bat is take all of the non-numeric characters and just completely get rid of them strip it down to just the numbers so this 123 dash 643 or the one with forward slashes will just be the numbers same with these bars and these slashes and everything all of these will just be numeric then we'll go back and reformat it how we want it which will look exactly like this one but we want to do it for the entire column so let's go right up here and we're going to try replace for the first time so let's do phone number oops that's not what I wanted so we're going to do df bracket phone number then .str.replace just like we did before now we're going to use a regular expression in here and I'll do a really high level overview although I'm not going to dive super deep into regular expressions then we're going to do a parenthesis and within there we're going to do a bracket I can't remember what this is called is it called a caret I think it's called a caret I'm just going to call it that it may not be correct but it's the upper arrow so it's the upper arrow then a-z A-Z and then 0-9 now at a super high level what that first character is doing is saying we're going to match any character except the ones we specify so anything a to z upper or lowercase and actually I think this should be like this lowercase a to z and then 0 to 9 so any value like a b c 1 2 3 those are not going to be matched it's going to match everything except these values and then we're going to replace them by saying comma and we're going to replace them with nothing so this is just an empty string so literally we're taking
everything that is not an a b c or a 1 2 3 so not a letter or a number and we're replacing all of that with nothing so let's run this and see what it looks like and it looks like that worked properly now we do have this Na because we had an N/a for I don't remember maybe that was Creed Bratton but it worked for basically everything else we're going to go through the entire process and then at the end we'll remove any of those values we want them to just be completely blank we don't want people to even see NaN and wonder what that is we just want it to be blank and we'll do that at the very end so now that we know that that worked let's assign it we'll do df phone number is equal to and then we'll call the data frame and this looks a lot more standardized than it did before already but now what we want to do is try to format this and I've done this many many times I always use a lambda you can definitely use a for loop I just don't do it that way myself so I'm going to show you how to do it using a lambda let's get rid of this and we're going to say df phone number we've already done that I'm just going to get rid of it now we're going to say df phone number then we're going to say .apply we'll do an open parenthesis and then this is where we're going to build out our lambda so we'll say lambda x colon now this is where we're going to format it so what I want to do is take the first three characters 1 2 3 then add a dash and then the next three characters then add another dash and then the last four and that will be the value that's returned so it's not super difficult we're just going to do x then a bracket let me get rid of that an x and then a bracket and then we want 0 to 3 so it goes 0 1 2 it doesn't include the 3 it goes up to 3 so 0 1 2 those are our first three values then we'll do plus and do a quote and do a dash so this is our first sequence and I'm just going to copy this we'll do plus and
instead of zero we're going to start at three because the start of the slice is inclusive so we're going to go from three all the way up to six so it should be 3 4 5 our next three values then we have a dash and we'll copy this and we'll say plus and now we go from six all the way to 10 now let's try running this and as you can see we get an error now I already know what the error is float object is not subscriptable which means we're trying to index into it like a string but right now it's not a string it's actually a number so let me get rid of this for just a second I'm going to show you what it's talking about so right now we have values that are floats and values that are strings or not even a number so if we want to index through each value like this they all have to be strings so we need to change this entire column into strings before we can apply this formatting now when I was creating this if I'm being honest my first thought was to do it like this str of df phone number let's just run that and this is what the values look like I can't remember why it was doing this but I looked into it quite a bit and I realized I need to apply this string conversion to each value not the entire column so how we can do that is actually fairly easy because we've already done a lot of the heavy lifting we're just going to copy this and we're going to say str of x and again a lambda is like a little anonymous function so you could do this with a for loop and say for every x in this column it equals the str of x and that changes it to a string but a lambda just lets us write it a lot quicker so let's do that really quickly and all of our values look exactly the same and that's how we want it so we're just
going to copy this apply it good and now we're going to take this and run this again just ignore all my commented out stuff pretend I don't have that so now when we run this it should work there we go now if we look at these numbers we get 123-545-5421 and it does that for every single one where there are values even when there's NaN or Na it's still adding those dashes but we expected that so let's apply it say is equal to and then we'll look at the data frame and this looks almost exactly like what we're hoping for we just need to get rid of these so this nan-- and this Na-- we need to get rid of those and that is super easy to do so now that we've done it we'll comment that out we'll say df and let's copy this ignore the messiness I do apologize for that it's very messy but if you're following along with me you get what we're doing so df phone number so only on the phone number say .str.replace and open parenthesis now we can specify this value so we want to take this exact value and replace it with nothing and let's just see if that does work it does now we still have these Na-- values so I'll paste that right down here we're going to say this is equal to and then we're just going to take this entire line put it right here and put this value as what we're looking for and replacing and then when we call that data frame it should work properly and it is perfectly cleaned so we have every single value in the exact same format they don't have different characters or different formatting and we got rid of all the ones that we don't need all the ones that were just random values so this column is now completely cleaned up again definitely one of the more difficult ones one that I've done a thousand times I've had to work with a lot of phone numbers and stuff like that this one does get very tricky especially if you have like a plus one which is a country code um that
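The phone number cleanup described above can be condensed into one helper. This is a minimal sketch rather than the exact steps from the walkthrough: the column name `Phone_Number`, the sample values, and the helper `format_phone` are assumptions for illustration, and it keeps only digits (instead of letters and digits) so that placeholders such as N/a come out blank without a second replace pass.

```python
import re

import pandas as pd


def format_phone(value):
    """Return 'XXX-XXX-XXXX' for a 10-digit number, else an empty string."""
    if pd.isna(value):
        return ""
    # Convert each value to a string first, then drop every non-digit character.
    digits = re.sub(r"[^0-9]", "", str(value))
    if len(digits) != 10:
        return ""
    # Slices are start-inclusive and end-exclusive: 0-2, 3-5, 6-9.
    return digits[0:3] + "-" + digits[3:6] + "-" + digits[6:10]


# Made-up sample rows that mimic the mixed formats in the walkthrough.
df = pd.DataFrame({
    "Phone_Number": [
        "123-545-5421",
        "123/643/9775",
        "876|678|3469",
        1235455421,
        "N/a",
        None,
    ]
})

df["Phone_Number"] = df["Phone_Number"].apply(format_phone)
print(df)
```

Returning an empty string for anything that is not exactly ten digits is a design choice for this sketch; numbers with a leading +1 country code would need their own rule.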
can get tricky as well but at a high level this is how you can do that and it's pretty neat how you can actually clean up and standardize those phone numbers so let's go right down here let's run it the next thing that we're going to look at is this address now let's just pretend that the people in the call center want all of these separated into three different columns so they can read it more easily see what the zip code is where they live whatever they want it for let's just say we want to do that and again for this use case it may not make sense but you will have to do this I do this all the time you need to split those columns now luckily all of these things are separated by a comma so we can specify that we're going to split on this comma and then we'll be able to create three separate columns based off of this one column which is exactly what we want then we can name them as well and we can do that very easily by using split so we're going to say df and we want to specify oh jeez not again so we want to specify that we're looking at the address then we're going to say .str.
split we'll do an open parenthesis now the very first value that we need to specify is what we're splitting on so we want to split on the comma so we specify that and then we need to specify the maximum number of splits it should make from left to right for now we'll just start with one and then we'll go from there let's just see what this looks like so it doesn't really look like it did anything let's do two well let's go back to one and then let's say expand equals True when we expand it it's actually going to separate it into columns I believe okay so now we're expanding but we're only doing this with one comma so we're only looking at the very first comma and splitting on it but in just one of these there is an additional comma so we should do it up to two let's do this okay so now we have three columns if we just save it like this it's going to give us these 0 1 2 these indexed values for the column names and we don't want that we want to specify what these actually are and we can do that by saying df and let me just do is equal to we'll do a bracket and then within there we're going to specify our list so we have three of them so the first one is the street address so we'll say street address the next one the Shire is not a state but these others are states so I'm just going to say state and then the very last one looks like a zip code so we'll say zip and we'll do code in fact I also want to do street underscore address so what this is going to do is assign these three columns to these three names and they'll basically be appended it doesn't replace the address we're not saying df address equals df address we're not replacing it we're creating new columns so let's run it and then let's also call it so they're right over here on the right hand side I couldn't see them at first but it did exactly what we needed it to do so now if we wanted to at the very end if
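A small sketch of the address split described above. The column name `Address` and the sample rows are assumptions for illustration; the new column names follow the ones chosen in the walkthrough. The whitespace strip at the end is an extra step that the walkthrough does not perform.

```python
import pandas as pd

# Made-up rows: one, zero, and two commas respectively.
df = pd.DataFrame({
    "Address": [
        "123 Shire Lane, Shire",
        "93 West Main Street",
        "298 Drugs Driveway, Pennsylvania, 18503",
    ]
})

# n=2 caps the number of splits at two, so at most three pieces per row.
# expand=True returns the pieces as separate columns instead of lists.
parts = df["Address"].str.split(",", n=2, expand=True)

# Extra step (not in the walkthrough): trim the space left after each comma.
parts = parts.apply(lambda col: col.str.strip())

# Assigning to a list of new names appends columns; Address itself is kept.
df[["Street_Address", "State", "Zip_Code"]] = parts
print(df)
```

Rows with fewer than two commas end up with missing values in the trailing columns, which matches the blanks seen in the walkthrough.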
we want to we're not going to we could just delete this address and keep the street address the state and the zip code another really common thing that you can do this with happens often with first name and last name where you'll have Alex Freeberg but it's Alex comma Freeberg or Alex space Freeberg and you can separate those out into different columns now the next one that we want to look at is this paying customer and the paying customer and do not contact columns are very similar in the fact that it's yes no N Y yes no N Y and so let's go right on down here and we're going to say df dot and we want to replace these values so they're all yeses or all nos but with the same formatting just to keep it consistent so we could make anything that's an N into a no and anything that's a Y into a yes but instead of spelling it out let's change anything that's a yes into a Y and anything that's a no into an N that's usually how I do it it just saves on storage because there are fewer characters although the saving is often very minimal but let's specify the paying customer we'll say df bracket paying customer then we'll do .str.
replace so now we're just going to look for those specific values so if it's a capital Y then we'll say yes now let's run it and now we have no more lone Ys although now the existing yes values have turned into yeses okay we don't want to do that because it's literally looking up here and saying okay here's a Y let's change that Y into a yes even inside a word that already says yes we don't want that so let's look for the yes and change it into a Y now when we run this that looks a lot better so we'll do df paying customer is equal to and then we'll copy this we'll do the exact same thing for no and N then let's call it and now that entire column looks really good except for that value right there but I'm going to leave that because I'm going to apply the fix to the entire data frame all at once to get rid of those at the end instead of going column by column and then the next column is literally going to be the exact same thing so I'm not even going to scroll down whoops I'm just going to put it right up here because this is the exact same thing I'm going to save us all some time and when we run this this looks exactly like what we're looking for again there are some not a number values but we can get rid of those in just a second by doing a replace over the entire data frame and that is basically the end of cleaning up individual columns now let's go right down here we're going to say df.str.
replace and then we'll first do these values oops let me do that there we go and replace that with nothing and let's just see what it looks like oops DataFrame object has no attribute str well that's because we were looking at columns before yeah I think I just need to get rid of this str we're really just doing it across the entire data frame now let's try that okay that worked appropriately and we'll just say df is equal to and then we'll copy this and we'll do the NaN as well and now when we do this it is not going to replace these because these aren't actually a string value and we're looking for that string I completely forgot this I'm not going to lie to you let's get rid of this to get rid of those values because it's literally not a number it is technically empty we can do df.fillna so we're going to fill these values if there's nothing in them and we're going to pass an empty string and when we run that every value that doesn't have something in it is going to show up blank even over here where we only had a few all of them throughout the data frame if it doesn't have a value it is now blank so let's apply that and we'll run this and now all of our cleaning of the individual columns is completely done we've removed columns we've split columns we've formatted and cleaned up phone numbers we've also stripped characters off of this last name column and we've standardized paying customer and do not contact now they also asked us to only give them a list of phone numbers that they can call so if we take a look some of these do not contact values are Y which means we cannot contact them and then there are some that don't even have phone numbers so we don't want to give the call center people who don't have
numbers so we want to remove those now there are a few different ways that we can do this but let's start with this do not contact column it seems like the most obvious one now if it's blank we want to give them a call we only want to not call them if they've specifically said we cannot call them so if it's Y we're not going to call them so what we need to do is not anything like this we probably need to loop through this column and then look at each row that has a value of Y and drop that entire row and we'll probably need to do that based off the index instead of doing it based off just this column that may not make sense yet but let's actually start writing it so we'll do for x in and we need to look at our index so we're just going to do df.index and we'll do a colon enter and then we want to look at these index labels how do we look at them we use loc so that's going to be df.loc and then we need to look at the value which is this x right here so each time through the loop it's looking at the row for that index label but we want to look at the value of this column do not contact I don't know if I copied this before let me copy it we only want to look at the value in this one column if we didn't it would look at a different value so we don't want that so we're looking at just that value and whether it's equal to Y so if this value is equal to Y then we want to drop it so we actually need to say if so if the value at index x in this column is equal to Y then we want to do df.drop and then we'll say x and I think we have to say inplace equals True here otherwise it won't take effect otherwise you have to say like df is equal to df.drop yeah I don't want to start messing with that let's just do inplace equals True and let's see if that works I can't remember if this is going to work or not invalid syntax okay it needs a colon and now let's try to run this okay yeah if we look at our index we can already tell that there are ones
missing the one is missing the three is missing let's see and the 18 is missing so we already got rid of those rows and you can see that there are no Ys in here anymore which is really good we can if we want to and we probably should populate the blanks really quickly let me just go up here really quick I'll copy this and I didn't plan on doing this so if it's blank oops if it's blank give it an N and we want to assign it to do not contact whoops let's see if that works and we probably need to do .str let's just see if it works so if it's blank okay I don't know why it's giving us a triple N maybe I need to strip this or something okay never mind let's not do that but now we basically need to do the exact same thing for this phone number because if it's blank we don't want them calling it so we can copy this entire thing go right down here but now we're looking at phone number so now we're looking just at the values within phone number and we only want to look at whether it's blank so if it literally has no value we want to get rid of it let's run this and see if it works again it should good and now our list is getting much smaller so you can see in our index a lot of those rows were removed and okay good actually this worked itself out because these all have Ns so right now we're sitting really good everything looks really standardized and cleaned everything looks great I might drop this address if you want to you can drop this address but besides that this is all looking really good the yeses and nos in this paying customer column aren't really a concern now we could and we probably should before we hand this off to the client or the call center reset this index because they might be confused as to why there are numbers missing or they might use this index to show how many people they've called or I don't
know something like that so let's go right down here we're going to say df dot and then we'll do reset_index and let's just see what this looks like it does work but as you can tell it didn't get rid of that old index completely it actually took the original index and saved it as a column we do not need to save that whoops let's put it right in here now we're just going to do drop equals True and when we do that it completely resets it drops the original index and gives us a new index and that is what we want let's do df equals and this is our final product now one thing that you definitely could have done here and I made this probably a little more complicated than it needed to be that was just how my brain was working at the time when I was typing this out we could have done df.dropna which is literally going to look at these null values we couldn't do that with the do not contact column because there we're not looking at nulls we're looking at Ys but for the phone number we're looking at null values so we could have also done dropna and done subset is equal to and then done it just on this phone number column and then done inplace equals True so we could have also done this and then said df equals I mean I can run it it's just not going to do anything now I could run it on a different column but that'll mess everything up but this is another way you can do it and I'll just save it in case you want to I'll say another way to drop null values there you go and that'll just be a note for us in the future but this is our final product it looks a lot different than when we first started I mean we had mistakes here completely different formatting in the phone number a different address format everything that we just talked about and this looks just a lot better and you can tell why it's really important to do this process because again we're working on a very small data set I purposely you know created this
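The loop-and-drop approach above can also be written with a boolean mask, which is a different technique from the one used in the walkthrough. A minimal sketch, assuming columns named `Do_Not_Contact` and `Phone_Number`, made-up rows, and blanks stored as empty strings after the fillna step:

```python
import pandas as pd

# Made-up rows standing in for the cleaned call list.
df = pd.DataFrame({
    "Do_Not_Contact": ["N", "Y", "", "N", "Y"],
    "Phone_Number": ["123-545-5421", "123-643-9775", "", "876-678-3469", ""],
})

# Keep a row only if the customer has not opted out AND a phone number exists.
keep = (df["Do_Not_Contact"] != "Y") & (df["Phone_Number"] != "")

# drop=True discards the old index instead of saving it as a new column.
call_list = df[keep].reset_index(drop=True)
print(call_list)
```

The mask avoids modifying the data frame while iterating over it, and the result is assigned to a new name so the original rows stay available for comparison.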
data set with these mistakes because when you're looking at data that has tens of thousands hundreds of thousands or a million rows these are all things that are going to be applied at a much larger scale and you won't be able to see them as easily you'll have to do some exploratory data analysis to find these mistakes and then you're going to need to clean the data or do it at the same time as you're exploring the data so you'll clean it up as you go but these are a lot of the ways that I clean data a lot of the things that you can do to make your data a lot more standardized and a lot cleaner visually and then it really helps later on with visualizations and your actual data analysis so I hope that that was helpful I know that this was a long video but I hope that you got something out of this and learned some of the techniques for cleaning data in pandas if you like this video be sure to like and subscribe check out all my other videos on pandas as well as Python and I will see you in the next video hello everybody today we're going to be looking at exploratory data analysis using pandas exploratory data analysis or EDA for short is basically just the first look at your data during this process we'll look at identifying patterns within the data understanding the relationships between the features and looking at outliers that may exist within your data set during this process you are looking for patterns and all these things but you're also looking for mistakes and missing values that you need to clean up during your cleaning process in the future now there are hundreds of ways to perform EDA on your data set but we can't possibly look at every single thing so I'm just going to show you what I think are some of the most popular and the best things that you can do when you're first looking at a data set the first thing that we're going to do is import our libraries so we'll do import
pandas as pd we're also going to import seaborn and matplotlib now during this exploratory data analysis process I often like to visualize things as I go because sometimes you just can't fully comprehend it unless you visualize it and it gives you a broader glimpse of everything so we're going to import seaborn oops as sns and then we'll import matplotlib.pyplot as plt let's run this this should work okay perfect now we need to bring in our data set so we've worked with that world population data set and that is the exact one that we're going to use now so we'll say df equals pd.read_csv then an r for a raw string and we'll paste in our CSV path and this is what it should look like although your path may be different so be sure that you have the correct file path then we'll read it in now this data set should look extremely familiar if you've done some of my previous pandas tutorials but I did make some alterations to this one I took out a little bit of data and put in a little bit of data here and there to change things up because if it was just exactly how I pulled it and I got this data set from Kaggle if it was exactly how we pulled it like we've looked at in the previous videos it's too simple we wouldn't actually be able to do some of the things that I would like to show you so be sure to download this exact data set for this video because it is a little bit different but what we're going to do now is just try to get some high level information from this now if yours looks just a little bit different like your values are in scientific notation I have applied this setting so many times I think it's still applied here you can do something about it and we'll write it right down here we're going to do pd.set_option and we'll do an open parenthesis and we'll say display.
float_format and so we're going to change that float format by just saying lambda x colon and then we're going to change how many decimal places we're looking at so we'll do a quote percent sign point 2f so we're going to format it and we'll do percent x this is going to format it appropriately I can run it and actually it will change it this was at point one I believe last time I did it so let's run this and then let's run this again it'll change it to point two so that's two decimal places I like it at point one we don't really need any more well let's keep it at point two why not that's how you change that and I like looking at it like this a lot better than scientific notation so just something to point out let's go down here and let's just pull up the data frame so we have this data one of the first things that I like to do when I get a data set is to just look at the info so we're going to do df.info and this gives us some really high level information this is how many columns we have here are the column names here is how many values we have and if you notice this is where it gets interesting we have 234 in each of these columns until we get to this 2022 population once we get there we start losing some values and then at the world population percentage we have all of our values again all 234 of them the count tells us how many are non null so those do have values in them and then we also have the data types and these come in handy later these are really great to know and we'll be able to use those in a few different ways later on in this tutorial really quickly I wanted to give a huge shout out to the sponsor of this entire pandas series and that is Udemy Udemy has some of the best courses at the best prices and it is no exception when it comes to pandas courses if you want to master pandas this is the course that I would recommend
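The display option and the info call described above can be sketched as follows. The data frame here is a tiny made-up stand-in for the world population file, so the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Show floats with two decimal places instead of scientific notation.
pd.set_option("display.float_format", lambda x: "%.2f" % x)

# Toy stand-in for the world population file (values are made up).
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "2022 Population": [1400000000.0, 510.0, None],
})

print(df)   # floats are rendered through the lambda above
df.info()   # column names, non-null counts, and dtypes
```

Changing `%.2f` to `%.1f` gives one decimal place; the option only affects how values are displayed, not the stored data.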
it's going to teach you just about everything you need to know about pandas so huge shout out to Udemy for sponsoring this pandas series and let's get back to the video the next thing that I really like to do is df.describe this allows you to get a high level overview of all of your numeric columns very quickly you can get the count the mean the standard deviation the minimum value and the maximum value as well as the 25th 50th and 75th percentiles of your values so just at a super quick glance there is a row somewhere in here where the country's population is 510 for 2022 and in fact if you go back to 1970 it was higher it was at 752 that's just interesting then if we look at the max population one has 1.42 billion I believe that's China and then over here in 1970 we have 822 million again I still believe that's China but this gives you a really nice high level view of all these different calculations that you can run on it and we can run all of these individually even on specific columns but it's just a nice high level overview one thing that we just talked about was the null values that we're seeing in here I'd like to see how many values we're actually missing because that is a problem we don't want to have too many missing values or it could really obscure or change the data set entirely and we don't want that so we'll say df.isnull and then we'll do a parenthesis and we'll say .sum and when we do this whoops dot sum there we go when we do this it's going to give us all the columns and how many values we're actually missing in each now we have 234 rows of data and the counts are 4 1 4 7 7 5 5 4 2 4 so we definitely have data missing what we choose to do with it in the data cleaning process maybe we want to populate it with a median value maybe we just want to delete those countries entirely if the data is missing I don't think you're going to do that but these are things that you need to think
about when you're actually finding these missing values this is what the EDA process is all about we want to find outliers missing values things that are wrong with the data or we can find insights into it while we're doing this as well so this is definitely something that I would consider when I'm actually going through that data cleaning process really important information to know now let's go right down here go to our next cell and say df.unique and actually it's nunique this is going to show us how many unique values are actually in each of these columns and this one makes the most sense for continents because I think there are only seven continents right but we have six right here and then rank country and capital should all be unique that makes perfect sense as well as these populations they're such specific numbers and such large numbers I would be shocked if any of these were the same and then for these world population percentages it's much lower and again that makes a lot of sense because when we're looking at and we'll pull it up right here when we're looking at these world population percentages a lot of them are really low 0.00 0.01 like this one 0.2 there are a lot of really low values for those small countries and so those all count as one unique value now let's say we just have this data right here and we want to take a look at some of the largest countries and we can easily do that we could even say max and take a look at the largest country but I want to be a little bit more strategic I want to be able to look at some of the top range of countries and we can do that based off this 2022 population so we'll say df.sort_values this is how we sort and not filter but order our data so we'll do sort_values and then we'll do by is equal to and then we'll specify that we want this 2022
population and then we're going to say comma and actually let's just run this as is but we'll do head because we just want to look at the top rows so what we're sorting on is this 2022 population and we're actually looking at the very bottom values because it's sorting ascending so from lowest to highest so this Vatican City in Europe is 510 that's the value that we were looking at earlier now we can do comma ascending equal to False because it was by default True we can do False whoops and then it'll give us the very largest ones so if we just take a look at the top five largest by population we're looking at China India United States Indonesia and Pakistan and we can even specify that we want the top 10 in this head we can bring in the top 10 so we also have Nigeria Brazil Bangladesh Russia and Mexico and you can do this for literally any of these columns whether you want to look at continent capital or country you can sort on these and look at them and you can even look at things like growth rate or world percentage this one seems really interesting let's just look at this one really quickly before we move on to the next thing if we look at this world percentage just China alone I believe yeah just China alone is 17.88% of the world so 17.88 is the world population percentage again just getting in here looking around that's all we're really doing now I want to look at something and I have always liked doing this which is looking at correlations so correlation between usually only numeric values we can do that by saying df.corr and a parenthesis and we'll run this and what this is doing is comparing every column to every other column and looking at how closely correlated they are so this 2022 population if we look across the board I mean this is a one to one this is highly correlated to each other
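The missing-value count, unique count, sort, and correlation steps described above can be sketched on a toy frame. The column names mirror the ones mentioned in the walkthrough, but the rows and numbers are made up for illustration. One assumption worth flagging: depending on the pandas version, calling corr on a frame that still contains text columns may raise an error, so this sketch restricts the frame to numeric columns first, which the walkthrough does not do.

```python
import pandas as pd

# Toy rows (not the Kaggle data); populations are in arbitrary units.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Continent": ["X", "X", "Y", "Z"],
    "2022 Population": [1400.0, 1350.0, 0.5, None],
    "1970 Population": [820.0, 560.0, 0.7, 96.0],
})

# Missing values per column.
missing = df.isnull().sum()

# Number of distinct values per column (missing values are not counted).
unique_counts = df.nunique()

# Largest countries first; ascending defaults to True, so flip it.
largest = df.sort_values(by="2022 Population", ascending=False).head(2)

# Pairwise correlation, limited to numeric columns.
correlations = df.select_dtypes(include="number").corr()

print(missing, unique_counts, largest, correlations, sep="\n\n")
```

Rows with a missing sort key are placed last by default, so they do not show up in the head of a descending sort.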
And that's true for almost all of these populations: they're very closely tied to each other, which makes perfect sense, because for most countries the population is steadily increasing, so the columns are probably almost exactly correlated. But if you look at the area, it's only somewhat correlated, and that's because some countries have a very high population but a small area, or vice versa, a large area and a small population. So there isn't a one-to-one correlation there. It's hard to just glance at this table and understand everything that's there, though. We could visualize it and it would be a lot easier, so let's go ahead and do that.

Let's go down here and visualize this using a heat map. We're going to say sns.heatmap(), and the data that we're going to pass in is df.corr(), the correlation, and then we also want to say annot=True; I'll show you what that looks like in just a little bit. Then let's do plt.show(), and this will be our first look. (I need to say show, not shot.) We get a little glimpse of what it looks like, but it looks absolutely terrible, so let's change the figure size really quick to make this much larger than it is. We'll do plt.rcParams and then, in quotes, 'figure.figsize'. This needs to be in square brackets, I believe, not parentheses. We set that equal to the value that we want: let's do 10 comma 7 and see if this looks any better. No, that doesn't look good. Let's try 20. Okay, that looks a lot better.

This is just a quick way to read the table, because it gives you a color-coded system: highly correlated is this tan, all the way down to basically no correlation, or even negative correlation, which is black. So when we're looking at these 2022 populations, and these are the populations right down here on this axis, we can see very quickly that all of them are extremely highly correlated, whereas the rank is negatively correlated; it doesn't really move with them. Then the population columns and the world population percentage are again quite correlated with each other, but not with area, density, and growth rate. I find that really interesting, that the density, the growth rate, and the area aren't really all that correlated with the population numbers. I would have assumed that on some level they went hand in hand. The area does make some sense: larger area, larger population, that kind of thing. Growth rate I can see, because that's a percentage value that could easily be uncorrelated, but I thought the density would be more correlated than it is. All that to say, this is one way to look at your data and see how correlated the columns are to one another, and that can definitely help you know what to analyze later when you're actually doing your data analysis.

Let's go right down here. Something that I do almost all the time when I'm doing any type of exploratory data analysis like this is group columns together and start looking at the data a little bit closer. So let's go ahead and group on the continent, so let's look
at it right here. Let's group on this continent. Sometimes when you're doing this EDA you already know what the end goal of the data set is; you know what you're looking for and what you're going to visualize at the end, and that really comes in handy. But sometimes you don't, and you're just going in blind. So far we've really just been going in blind: we're throwing things at the wall, seeing some overviews, looking at correlation. That's all we've done. Now I want to get more specific. I want to have a use case, something to aim for, without doing full data analysis or diving into the depths. So the question for us is: are there certain continents that have grown faster than others, and in which ways? We want to focus on the continents; we know that's the most important column for this very fake use case. So we can group on Continent and look at these population columns. You can see a growth rate, but for the density per square kilometer we don't have multiple values; it's just a single static value, and the same goes for growth rate and world population percentage. The populations, though, we have over a long span, about 50 years of data, so with those we can see which continents have really grown. So without talking about it even more, let's do df.groupby() and pass in Continent (let me just copy the name, because I can't spell it), and then we'll do .mean(). We can do it just like this, and now we have Africa, Asia, Europe, North America, Oceania, and South America. If I'm being completely honest, I knew most of these. I'm no geography expert, but I knew most of them. I don't know what this Oceania is, though. I genuinely don't know what that is.
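The group-and-average step can be sketched roughly like this. The numbers are invented, and `numeric_only=True` is an assumption added here so that text columns such as Country are skipped instead of raising an error on recent pandas versions.

```python
import pandas as pd

# Invented rows; only the column names follow the walkthrough.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Continent": ["Asia", "Asia", "Europe", "Oceania", "Oceania"],
    "1970 Population": [100, 300, 50, 2, 4],
    "2022 Population": [200, 600, 60, 4, 8],
})

# One row per continent, holding the mean of every numeric column.
by_continent = df.groupby("Continent").mean(numeric_only=True)

print(by_continent)
```

The continent names become the index of the result, which is why they later show up along the axis when the grouped frame is plotted.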
Let's just search for that value and see. We'll come back up here in just a second, but I want to understand what this is. So we're going to say df, then index into Continent (let me sound that out for you guys), then do .str.contains(), and I want to look for Oceania. Let's run this. Oh, I need to wrap it in the data frame's brackets, like this. Now let's run it. So now we're looking at our data frame filtered to the rows where the continent is Oceania. Okay, so these look like islands, I'm guessing. We have Fiji, Guam, New Zealand, Papua New Guinea. Yeah, I'm guessing based off the continent name... Oceana... Oceania... guys, this is tough for me, okay? I'm doing my best. This is part of the EDA process: I don't know what that means. Geez, I'm just going to call it Oceana. That's so wrong, but it's so easy for me to say. So now I'm seeing this, and it looks like islands, which would make sense, because they have the highest average rank, and I'm guessing that's because they're mostly small countries. So let's order this really quickly. We're going to do .sort_values(), open parenthesis, and I want to sort on the population, the average population, so we'll do by= with that column and ascending=False. So we're looking at the mean population: Asia has the highest population on average, then we have South America, Africa, Europe, North America, and then Oceana at the very bottom, which makes perfect sense: again, small islands. For world population percentage, each of those countries in Asia makes up about 1% on average, which is really interesting to know. And the density in Asia is far higher, almost double every other continent. Really, really interesting, actually, now that I'm
looking at this. But that's something I would actually look into. I would be like, what is this Oceana, or Oceania, what does that mean? Let me explore that more, because I'm trying to really understand this data set well. What I want to do now, though, is visualize this, because it's hard to take in just by looking at the table. Again, the use case is: which continent has grown the fastest? It could be percentage-wise, or it could be as a whole, on average. Let's take a look. We're going to copy this and bring it right down here. I already know it's not going to look good, just based off how the data is sitting, so let's save it as df2 (I don't strictly need to do that, but I will) and then do df2.plot() and run it just like this. As you can see, we have Asia, South America, Africa, Europe, North America, and Oceana along the axis. We can kind of understand what's happening, but the lines being drawn are the value columns, not the continents, which is what I wanted. Switching that is actually pretty easy, and it's good to know: we can transpose the data frame so that the continents become the columns and the columns become the index. All we have to do is say df2.transpose(), with the parentheses. Let's look at it and then save it. So now all the old columns are down the side as the index, and the old index values are the columns. Let's say df3 is equal to that; I'm just doing that so I don't write over df or my earlier data frames. So now we have this data frame three, and let's do df3.plot(), and it should look quite a bit different. Whoops, I didn't run this. Let's run this and then this.

As you can see, this does not look right at all, and the reason is that we're not looking at only the correct columns. We have density in here, world population percentage, rank; we don't need any of those. The only ones we want to keep are the population columns. We can go right up to where we created that df2 that we transposed, and specify that we only want specific columns. Now, you can go through and handwrite all of these, and by all means go for it, but I'm going to go down here and say df.columns and run it. It gives us a list of all of our columns, and you can just copy the ones you need and put them right in here. I think it needs to be a list, like this. Let me try running this. Okay, so this worked properly. You can do it just like that, or, if you want a shortcut (and I would hope you would), you can use df.columns just like we looked at down here. Since this is an index, we can slice it. So we count: zero, one, two... okay, so we can do 5 up to 13, I think, and let's see if this works. It may not; I may actually need to write it like this. Let's see... there we go. So you can use the indexing to save yourself some visual space, and it gives you the exact same output. So now we have our df2. Let's go down and transpose it, so now we just have these populations down the side and our continents across the top, and then we plot it. This looks good, although it's backward. So what I actually want to do is not this; the slice is a quick way to do it, although not the best way. So I'm actually going to copy all of
these, and although I said the shortcut would save us time, it did not at all. I'm going to put a bracket right here, paste the column names in, and literally reorder them by hand. I might speed this up, or I might just have you sit through it, because this is an interesting part of the process and I want you to get the full experience. You know what, now that I'm talking about it, that's what we're going to do; you guys can hang out with me, this is a good time. We have 2010, 2015, 2020, and 2022. Now let's run it. What did I do? Oh, too many brackets. There we go. So now it's ordered appropriately: we have 1970 all the way up to 2022, which is how we want it. Let's transpose it, run it, and now we basically have the mirror image of the earlier plot.

Now, just at a glance, and we haven't done anything to this data except select what we're looking at, we can see that from 1970 Asia is already in the lead by quite a bit, and it continues to go up drastically. Especially in the 2000s, right here, it shoots straight up, then keeps rising and starts leveling off. Every other continent, especially Oceana, is just really low; it never does much. Let's look at green: green has gone up from, let's say, 0.1 to about 0.2, so they've almost doubled in the last 50 years. Again, you can get a high-level overview of each of these continents over this span of time. So this is one way that we can look at that use case. We're not going to harp on it too long; I just wanted to give you an example. Sometimes when you're looking at data you'll have something in mind that you're looking for, and sometimes you go exploring and just find what's out there.

The next thing I want to look at is a box plot. I personally love box plots. They're really good for finding outliers, and
there's a lot of outliers here. I already know this, because the average and the 25th and 50th percentiles are very low, and then there are some really big outliers. For your data set it may not be that way, and those outliers may be something that you really need to look into. Box plots are something I've used a lot: I found outliers that way, started digging into the data, and came across things where I'm like, oh, I have to clean this up, or I have to go back to the source. It's really powerful and useful to be able to find these. All you have to do is df.boxplot(), and let's take a look at it. This already looks good as is; maybe I'll make it a little bit wider. Let's do figsize equal to, let's try, 20 by 10. Okay, that didn't help at all, I apologize; I thought it would. But let's keep going. What this is showing us is these little boxes down here, which are usually much larger when you have a more even distribution of values. The box is where the middle of our values lies, this line right here is the upper range, and all these open circles stand for outliers. So when we're looking at the 2022 population, there are a lot of outliers. Now, knowing your data set is really important: for ours, outliers are to be expected, especially when most countries are small. All of these little dots are outlier values, and each value corresponds to a country. If this were a different data set, I would be searching on these to see what's wrong with them, if anything, or whether they are real numbers. Say this was revenue: everyone's revenue is way down here, and then there's one company making something like 10 trillion dollars. That'd be an outlier up here, and it would definitely be something that you want to look into too.
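To make the "open circles" a bit more concrete: with matplotlib's default settings (which pandas' box plot relies on), a point is drawn as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles. Here is a small sketch of that rule on invented numbers; the series name is borrowed from the walkthrough, the values are not.

```python
import pandas as pd

# Eight small values and one very large one, standing in for a skewed column.
pop = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 1000], name="2022 Population")

q1 = pop.quantile(0.25)   # lower edge of the box
q3 = pop.quantile(0.75)   # upper edge of the box
iqr = q3 - q1             # height of the box

# Anything beyond these limits is what the plot shows as an open circle.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = pop[(pop < lower_limit) | (pop > upper_limit)]
print(outliers)
```

Filtering like this is one way to get the actual rows behind the circles so you can decide whether they are real values or something to clean up.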
For our data set, knowing that we're looking at population, this is more than acceptable, oddly enough. But that's what box plots are really good for: showing you the quartiles, the upper and the lower, as well as marking the points that fall outside of those normal ranges for you to look into. So, really useful.

Now let's go down here and pull up our data frame again. We've kind of just zoomed through the whole EDA process. There is one last thing I wanted to show you, and we're ending on a low point, if I'm being honest, because the last few things were much more exciting. But there is df.dtypes. Let's run this. Just like info(), it gives us the data type of each column, but we're actually able to search on these types: object, float, and integer. That's really great, because we can pass include= with something like 'number'. None of these explicitly say number, right? But when we run it... I'm getting an error, 'Series' object is not... oh, that's because dtypes gives back a Series; what we need is select_dtypes(). Now let's run this. Now it's only returning the columns in this data frame whose data types count as numbers, so you won't see Country or any of the text columns. If we want those, we go in here and say 'object' and run that. This is another really quick way to filter the columns down to a specific type. We could even put 'float' in here, and now it's not including Rank, which was an integer. So we can specify the data type and it'll filter all of the columns based off of that. When you're doing work like this it's good to know what kind of data types you're working with, and to be able to look at just those types, because there might be some
type of analysis you want to perform on just those columns, whether it's the numeric, the string, or the integer columns within your data set. So again, ending on a low note, I apologize. Everything else that we looked at is something I typically do in some way or another when I'm looking at a data set. Exploratory data analysis is really just the first look. After it you're going to be cleaning the data up, and then you're going to be doing your actual data analysis: finding those trends and patterns and visualizing them to find some kind of meaning, insight, or value in the data. There are a thousand different ways you can go about this, and it does typically depend on the data set, but these are a lot of the ways that you'll clean a lot of different data sets, and that's why I went into the things that we looked at in this video. I hope that you enjoyed something in this tutorial. If you liked this video, be sure to like and subscribe, as well as check out all my other videos on pandas and Python, and I will see you in the next video.

What's going on, everybody? Welcome back to another video. Today we are back with another data analyst portfolio project, where we will be scraping data from Amazon using Python.

Now, you may be asking, do I need to know web scraping to become a data analyst? And the answer is no, you absolutely don't need to know it. But it is a very cool skill to learn, and I have in fact used it in my job in the past, so it is useful. One thing it's used for is creating your own data sets, and that's what we're going to be doing today, but there are a lot of other uses for web scraping, and I'm sure I'll talk a little bit more about those while we're actually walking through the project.
One last thing I want to say before we get started is that this is most likely an intermediate project. If you are just now learning the basics of Python, this might be a little bit challenging for you, but I still recommend going through it, because I will do my best to walk through every single step of the way and explain all the concepts, so you can still learn something even if you aren't super good at Python right now. With that being said, let's jump over to my screen and get started on the project.

All right, so we are going to get started. If you didn't watch the last project, I had people download Anaconda, and we use Jupyter notebooks. I'll show you how to get to that in just a second, and I'll leave the link in the description if you haven't done that already and are just doing this project. You'll download Anaconda, which is super easy, and then open up Jupyter Notebook. I already have it open, but I'll launch another one just for the purposes of demonstration.

What we're going to do today is what people voted on. There were something like 8,000 people who voted in the poll I made about what data you wanted me to scrape. The options were Amazon, cryptocurrency, weather, and something else I don't remember. Overwhelmingly, something like 70% of people, maybe even 80% (don't fact-check me on that), voted for Amazon, so I'm going to do it. Now, there are a ton of things that you can scrape off of Amazon, and I'm going to show you how to do it, how to make it useful, and how to make a data set. It's going to be really interesting. There are lots of other ways to do this, and I have already kind of created it: I'm going to show you how to do it off of this page, when you're actually in an item, and you can scrape basically anything in here, and I'll
show you how to do that. Another thing is a little bit more advanced, and that's why this first video is starting off on the easier side. It's not easy, but it's easier. The next video that I'm going to make is how to do multiple items, right? So this item, this item, this item, this item, and then traverse through the different pages. There are 20 pages and you want all of that data; how do you get all of it? That'll be the next project. I don't know when I plan on doing that. It's like 90% of the way done, but I had this one completed, so I wanted to get it out to you guys now. I think that one is much more difficult, so if you can understand this one, then you should be able to understand the next project too; it's just a little bit more complicated.

With that being said, we are going to actually get into the project. I'm going to delete one of these. All we're going to do is go to New, then Python 3, and it'll open up a new notebook. We'll call this Amazon Web Scraper Project. Did I spell it right? Perfect. The first thing that we should do is import our libraries, so I'm going to say... import... oops, what am I doing, it's off to a terrible start... there we go: import libraries. Now, I'm not going to write out all the libraries. There are only a few things that I'll be copying and pasting throughout this, and you can take a quick glance at them. They're things I just don't want to waste time on, because this could be a long video. There will be a link below, if you haven't clicked it already, that goes to the GitHub page where you can literally have all of this code already
written. I do recommend writing it all yourself, because you will learn it much better, I promise: you'll make mistakes and you'll figure them out, and all that good stuff. But you will have that code available, so you can just go copy and paste it. What we are going to be using today is something called Beautiful Soup, plus requests, and then time and datetime. And there's a potential extra one, which I'm going to show you at the end. It's not really part of the project, it goes above and beyond, but this library here is for sending emails to yourself. I'll show you how you can use it if you want to. I already have the whole code written out, so you can just steal it, try it out yourself, and see if you can get it to work. But this one is not as important, so I'll put it down here. Let's move on.

Now, one thing I want to say before we get too far into it is that right here in front of me is a different laptop. It took me a solid 10 hours or so to write all of this, over the course of about two weeks in my free time, an hour here and an hour there. I made a ton of mistakes and messed a bunch of things up, and I finally got it to work after a bunch of revisions. That's typically how things go when I do projects. So I'm about to give you a streamlined version, because I have all the code right down here and I'm going to be glancing at it a lot, just so I don't make this video 20 hours of trying to remember all the code off the top of my head. I already did the project, it works, it's a good project, and I don't want to waste your time. I just want you to know that nobody should expect to do this off the top of their head in an hour. Most people can't. It takes time.
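As a rough sketch, the imports for the libraries listed above might look like the following. The email library is not named at this point in the video, so `smtplib` (Python's standard SMTP client) is an assumption here. The HTML string and the `demo` id are also made up, so the example runs without touching the network or any real site.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4
import requests                # third-party package, used later to fetch the page
import time
import datetime
import smtplib                 # assumption: the library for emailing yourself

# A made-up snippet of HTML, parsed with Python's built-in HTML parser.
html = "<html><body><span id='demo'>  hello  </span></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Look an element up by its id and pull out its text without the padding.
text = soup.find(id="demo").get_text().strip()

# datetime is handy for stamping each scraped row with the day it was collected.
today = datetime.date.today()

print(text, today)
```

Beautiful Soup and requests need to be installed separately if they are not already in your environment; time, datetime, and smtplib ship with Python.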
You make mistakes. But let's get started on the project. What we have to do now is tell Beautiful Soup and requests where we are actually getting this data from: what website, and some information about our computer. Again, there's going to be a little copying and pasting in here, because you will never, ever need to memorize this. Right here we're going to connect to the website, so I'm just going to write "connect to website", and then we're going to say URL is equal to, and let's go get our URL. We have this page right here, so literally just go up to the address bar, press Ctrl+A, and copy that. Oops, that's the actual project; get rid of that. Paste it in here, and that is our URL. We will use it in just a second. Let me just get some room here.

Then what we need is something called headers. Again, you will never need to know this by heart, so I'm going to copy this in, and I'll show you how to get it really quick and why you don't need to know any of it. What this headers holds is something called a User-Agent. You need to get the one for your own computer, and you can do that by going to this link right here, which I'll put in the description. There's something on that page called the user agent, so all you have to do is copy it, just like this. I'll go back and paste mine in to show you it's the exact same. So there you go, it's the exact same. All of this extra stuff, Accept-Encoding, Accept, this HTML stuff, Connection: close, you don't need to know any of it. I promise it'll never come in handy, ever, in life. Actually, there will be one person for whom it comes in handy, and then they'll
message me. But we are now connecting, using our computer and this URL. Next we want to write page equals, and this is where we start using these libraries: we're going to use requests.get(), pull in that URL, and say headers=headers. This is where we actually start bringing in the data. It's not going to look like much at first, but I'll try to print things out as we go so that you can see what it looks like and how we're going to make it more useful, because it comes in very dirty when we first get it, and some of the things I'm going to show you will help clean that up. Before we go any further, I don't want my head to be here the entire time, so I'm going to get rid of myself so you can just see the page. It's less distracting. I hate when I feel like people are always watching me, and I want people to just focus on the code. So I will see you in a little bit. Let's get back into it.

All right, so now we're actually going to start using the Beautiful Soup library. We're going to say soup1 is equal to, and you guessed it, BeautifulSoup, and then in parentheses page.content. Again, these aren't really things that you need to memorize; we're just pulling in the content from the page, that's really all we're doing right now. It comes in as HTML, so we pass in 'html.parser'. Let's see if I can print it out... actually, let me just name it soup1 in lowercase; I don't like using capitals on stuff. Let's see if anything prints out real quick. So we are literally pulling in all of the HTML.

Let me go show you really quick, because we're going to get to this in a second anyway. If you come here, this is basically a static page written in HTML. If you have never seen HTML before, a lot of this is stuff that most people will never use, but some of it is good to know. As you see, I'm scrolling on this right side. By the way, I did right-click and Inspect, or Ctrl+Shift+I, whichever works better for you. As I scroll over this, you should see it highlighting different areas of the page. It's hard to land on what you want that way, so let's say we want this title: I can click the select-element tool, go right here, and select the header or title of the page. For now, though, I just want to show you what we're pulling in. We're pulling in this DOCTYPE html, all of this is coming in; that's what this is right here in the notebook, this DOCTYPE html. We're pulling in every single thing.

Let's go down a little bit and do soup2, which is basically just an upgrade to soup1. We'll do BeautifulSoup again, and this time we pass in soup1, so we're pulling in that content again, and we call .prettify() on it. If you don't know what that is, it's common in a lot of different languages and tools: it just makes things look better, that's really all it is. I don't know why I'm using double quotes; you can use single ones if you want. Now let's print soup2, and it should be better formatted. Let's see if that's true... and it is. So before, if you
could tell, it didn't have basically any formatting; it has a little bit of formatting now. It'll help in a second, and you'll see that. But now we want to go back and actually get the data that we want. You can get any data you want; I'm going to show you simple things that are really easy. In my opinion it gets more difficult the more complicated the stuff you start pulling, and you'll understand that as we go. So what I'm going to do is use the select tool and pick the title; I want that. If you look at it, it's a span with id equal to productTitle, so we need to remember that. We don't need to know the class, I believe; we're going to be using that id, id equals productTitle. Class will come in in the next video, when we start looking at these, but not in this one. So let's remember id equals productTitle, and go back over here. We have this soup2, which is basically all of that HTML right down here; that is what we're pulling in, so we need to specify what we actually want. Let's say title, since that's what we're getting, and we're going to take soup2, with all that content, and do .find(), open parenthesis, and say we want the element whose id is equal to productTitle. Then we're going to do .get_text() with parentheses. Now let's print the title and see what we get. All right, so that is exactly what we're looking for: it's the funny Got Data MIS T-shirt. That is what we're trying to pull in, so that's perfect.

Let me just do this to save me some time later on. We don't only want the title; we are also going to be pulling in the price. So if you can guess, we'll be building a data set on the actual pricing. Let's go back here, and we're going to again use
this right here, and we're going to go to this price. Again we're going to look at the ID: the `id` equals `priceblock_ourprice`. Fairly easy. You can copy this; I'm just going to write it out. We're going to say `price` is equal to `soup2.find`, and then it's going to be, again, `id` is equal to `priceblock_ourprice`. Did I get that right? Oops, excuse me. There we go. And then the exact same thing, `.get_text()`, with parentheses. There are other, similar methods too, so just know that `get_text` is the specific one we're using here; we might use a different one later on. But that is what we have. (Why do I have all this? Too much space.) So let's print the title and print the price and see what we get. Okay, so we have our title and we have our price. I don't know what all this white space is over here, but it looks like there's a lot of it. We'll have to get rid of that in a little bit as we clean things up. If you want (this is up to you, and I'm not going to do it right now, I'm just going to show you how), you can also pull in the ratings, which could be really useful if you want to look at ratings over time or what the ratings are for specific products. You can pull basically anything. You can go down to the product details and look at dimensions, anything you want on this page. It is static, so you can go in here and pull anything; you just have to know where you're looking in the HTML and pull it in. Now, when we go back here (excuse me), I'm going to show you how to use this, because we have this, but how are we going to use it? That's the important part, I think. The first thing we need to do is clean this up a little bit, because if we try to use this it
wouldn't be super useful, because it's a little bit dirty; it's not super clean. So let's start with the price, why not. We're going to say `price.strip()`, and that's just going to take the junk off of either side. Let's run that real quick. So this is what we have. But I also don't want that dollar sign; I just want the numeric value. Later on we are going to be creating a process to put this into an Excel file (again, we're trying to create a data set). I don't want you to have to copy and paste stuff; it's all going to be automated to input this data into an Excel file or a CSV file for you, so think about making it useful in a CSV or in Excel later on. What we can do is add a bracket with a one and a colon, `[1:]`, which means position one and everything after it. Since positions start at zero, that skips the dollar sign and keeps the rest. Let's run that, and there we go. So let's just say `price` is equal to `price.strip()` and pull everything after that first... what am I saying, what's the word for that? I can't remember the word. The first space; that's not the right word. But all right, let's do the title. This is basically going to be the exact same thing, super easy. We're just going to do `title.`
`strip()`, with open and close parentheses, and, if you want, you can do this exact same thing here. So now we have it, and it's a little bit cleaner. This is what it originally looked like, and now this is what it looks like. Nothing super crazy, but something interesting to know. Now, in the very next part (let me just add a few of these cells, because it makes me feel better), we're going to create our CSV and insert this data into it, and then later on I'm going to show you how to automate this process of pulling the data to create a data set. Just pulling this one time and putting it into a CSV really doesn't do anything; you could just copy and paste that and save yourself a lot of time. What I'm going to show you is doing it over time and having it automated in the background. I guess that's a spoiler, but what we need to do is create the CSV, insert the data into it, and then create a process to append more data to that CSV. I'm doing a lot of talking; let's do some writing. I should have done this at the top (maybe I'll go back and add that later on): we're going to do `import csv`. In a CSV, you want headers and then you want the data. So for our headers, we're going to call it `header` and open a bracket. Let's make the first one `'Title'`; you can call it product or whatever you want, but because I've been using title, I'm going to call it `Title`. Then we'll also have `'Price'`. Now we need our data, so I'm going to say `data` is equal to... now, this is important. Look at how our data is right now. I can do this right here: we're going to type `type(title)`, or no, let's do `type(price)`. These are strings, and that's important to know. Again, I
don't want to get too much into dictionaries and arrays and lists and strings and all these things, but this is a string, and right now it's not super usable. What we're going to do is make this a list. So I'm doing an open bracket, and I'm going to say our `data` is `[title, price]`. Now if I do `type(data)` and run that, it's a list. This is important because you can run into a lot of issues with this stuff. It's really important to remember what type your data is: is it a list, is it an array, is it a dictionary? These things are important, and they have a big impact, especially with this type of work, so I just wanted to show you that really quick. What we are now going to do is create a CSV. (I sometimes call a CSV an Excel file; call it whatever you want.) We are going to say `with open(`, and now we're going to name our file. You can name this whatever you want. I'm going to call it `AmazonWebScraperDataset` (that's real long), dot
CSV. Then we're going to do underscore W... oh, whoops, that's not right; I was wondering why that was in black. We're going to do `'w'`, which means write. Then we're going to do `newline=''`. If you don't know what `newline` is, all it does is make sure that when we insert the data there isn't a blank line in between each row of the CSV. Then we're going to say `encoding` is equal to `'UTF8'`, and that is it, and we'll just say `as f`. Some of that stuff you don't need to know, and some of it's useful. This `'w'` you definitely need to know. This `newline` is good to know, and I might take it out just to show you what it actually does, because it's annoying if you don't have it, I promise. This encoding is good to know; I think it's often the default anyway. What we're going to do now is use something within the `csv` library called `csv.writer`. We're going to do `csv.writer(`, open parenthesis, and pass in that `f`, and we'll just call the result `writer`. This is where we need to actually create the header, so we're going to do `writer.writerow`. This is just for the initial insertion of the data into the CSV; that's what's important here. The next one that we're going to write is for when we're actually appending the data, which is going to be a little bit different. Anyway, we're going to do `writerow(`, open parenthesis, and this is where that `header` is going to go, so these headers are going to be the title and the price. Then for our last one we're going to actually write the data, which is this `data` right here, and we're going to say `writer.`
`writerow`, and we're going to pass in `data`. So with this one we are creating the CSV, and then we are inserting the header and inserting the data. Super easy; I think that's fairly straightforward. Now let's run this and see what happens. I just ran it, so let's go over here; it's in here somewhere: `AmazonWebScraperDataset`. Let's open that up, and there we go. (Oh jeez, this isn't good: it can't verify my subscription.) Why does it say 6.99? I'm going to go back and look, but I think I know the issue. Otherwise, this is exactly what we want. Of course we want more data, and maybe a little bit more useful data, and I'll show you how to get that in just a second, but we just created that out of thin air; I didn't have that saved before. So we have this data set. The issue was that I ran the cleaning cell multiple times, and each run slices off another leading character: so now it's 6.99, if I do it again it's .99, and if I keep doing it, it gets rid of everything. So I'm just going to run this again, and run this again, and now everything's back to normal. Okay, so now if we run this, it's going to overwrite this `AmazonWebScraperDataset` dot
CSV, and it will put the data in properly. So there we go. (Oh jeez, guys, this is embarrassing. No, I don't want this. Okay, perfect. I don't know why I don't have my subscription activated; it's not going to matter for this video, I guess, but that's really random.) So we got what we need; that's perfect. Now, what I think is important after this is some more useful data. Something that I like to do a lot when I do this type of work is to have some type of date stamp or timestamp to know when I collected the data. It usually comes in handy later on, and I have never regretted putting it in there. I'll show you really quick how you can do it. You're going to do `import datetime` (geez, I hate having to format stuff like that), and then you can do `datetime.date.today()`, with open and close parentheses, and that is going to give us this right here. So we'll call it `today`, set it equal to this, and say `print(today)`. There we go: today's date is the 20th of August, 2021. Actually, I'm going to get rid of that cell and put the import back up here, right there, and run it again. Then let's add this right here: in the header we'll add `'Date'`, and in the data we'll add `today`, and we'll just run this again. To check the data without having to open up the file every single time, which is super annoying, we're going to use pandas. Again, I should have imported this at the top; I'm not doing this completely off the top of my head, but I didn't have it 100% planned. So `import pandas as pd`, and we're just going to say `pd.`
`read_csv`, and then we'll read the file in. What I often do to get the path is go to the file's properties, go right here, and copy this; then boom, boom, a backslash, and the file name right here. This part I am doing off the top of my head. I don't do it often, but I think I have it memorized by now, I hope. And then we'll do print... oh no, we don't have to do print, we'll just do this. What do I do here? An `r`. Let's actually call this `df`, for data frame, and we'll do `print(df)`. Let's see what happens. Perfect. Okay, so what we have now is our new header and the new data that we added in there: we have our title, we have our price, and we have our date. Again, you can customize this. Whatever you want to add, go back here and find it: do you want to make sure it has a men's option, or different colors, or do you want to pull in this information? It really does not matter; it just matters that you get what you need for whatever purpose you're making this for. This is more of an introductory video on how to scrape data from Amazon. The next video will probably be a little bit more difficult and in-depth, but this one is about getting you guys started. So we now have this, and this is beautiful. Now, when you're scraping data over time, and that's what we're doing, since this is going to be almost like a price tracker, you want to then append data to this file. We can't only create it, and that's what this cell does: if I run this 100 times, it'll still only give me this first row. We need to now append data to it. So let's pull this down here. (I haven't added a bunch of notes. I'm going to write "Now we are appending data to the CSV"; I'll try to go back afterwards and add more notes for people who like to read them.) What we are now going to do is change this `'w'` to an `'a+'`. This is going to be how we append the data.
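The difference between the two file modes can be sketched with only the standard library. This is a rough illustration, not the exact notebook code: the title and price values are made up, and the file is written to a temporary folder so nothing real gets overwritten.

```python
import csv
import datetime
import os
import tempfile

# Hypothetical values standing in for the scraped, cleaned title and price.
title = "Example T-Shirt"
price = "16.99"
today = datetime.date.today()

# A throwaway location so this sketch never touches a real data set.
path = os.path.join(tempfile.mkdtemp(), "AmazonWebScraperDataset.csv")

header = ["Title", "Price", "Date"]
data = [title, price, today]

# 'w' creates the file, or overwrites it if it already exists.
with open(path, "w", newline="", encoding="UTF8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

# 'a+' appends instead: no header this time, just one more row at the end.
with open(path, "a+", newline="", encoding="UTF8") as f:
    writer = csv.writer(f)
    writer.writerow(data)

# Read everything back to check what ended up in the file.
with open(path, newline="", encoding="UTF8") as f:
    rows = list(csv.reader(f))

print(rows)
```

Running the `'w'` block again would reset the file to a header plus one row, which is why that step is only meant to run once.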
And we no longer need the header, so we aren't going to write the header anymore. There we go. So now, instead of creating that header again and creating that first row of data again, we are leaving the existing data alone, going to the next free row, and appending data, which means adding data on to the end. If I run this (which I wasn't going to do right now, but why not) and then read the file in, there's our data. I'll run it a few more times. I ran it like three or four more times, I read it in, and there we go. Now, it's all the exact same data, super boring, but very good to have. We don't want to have to come in here and run this every day. Let's say we're going to do this daily: we want a way where it runs while we sleep, in the background on our laptop, and is easy to do. I don't want to set an alarm on my phone and come in here every single morning; I want to automate this. So how are we going to do that? Give me one second. If you didn't know, I have three kids, and one of them is waking up. I'll be right back. All right, I think he is asleep; at least, let's hope he's asleep. So now what we're going to do is put this all into this `check_price`. You may never have used... oh geez, what are these things called? They're used all the time, and you'll know what it is. Not a function... I don't even remember what it's called. Maybe it is a function. I'm having a writer's block or whatever that is. We're going to put it all in here, and then we're going to be able to use this price check later, because we want to be able to automate this. So let's go back all the way up here. We are going to use this, so let's copy all of that in and... oh jeez, I hate this. All right, everything indented, just like that.
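A minimal sketch of what a `check_price` function along these lines might look like. To keep it runnable without a network connection, the scraping step is replaced by a hypothetical `fetch_raw` callable that returns the raw title and price strings; in the notebook that part is the requests and BeautifulSoup code. The `path` parameter and the stub values are assumptions for illustration too.

```python
import csv
import datetime
import os
import tempfile


def check_price(fetch_raw, path):
    """Fetch once, clean the values, and append one row to the CSV at `path`.

    `fetch_raw` is a hypothetical stand-in for the scraping code: any
    callable that returns the raw (title, price) strings.
    """
    raw_title, raw_price = fetch_raw()

    title = raw_title.strip()
    price = raw_price.strip()[1:]  # drop the leading currency symbol
    today = datetime.date.today()

    # 'a+' appends a row; the header is assumed to have been written earlier.
    with open(path, "a+", newline="", encoding="UTF8") as f:
        csv.writer(f).writerow([title, price, today])

    return [title, price, str(today)]


def fake_fetch():
    # Made-up raw strings with the kind of surrounding whitespace seen earlier.
    return "\n      Example T-Shirt      \n", "\n$16.99\n"


path = os.path.join(tempfile.mkdtemp(), "prices.csv")
first = check_price(fake_fetch, path)
check_price(fake_fetch, path)

with open(path, newline="", encoding="UTF8") as f:
    rows = list(csv.reader(f))

print(rows)
```

Passing the fetch step in as an argument is only a convenience for testing here; putting the scraping code directly inside the function body works the same way.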
So this pulls in all of our data, down to the title and the price. We want to make it look right, so we're going to put the cleanup right here, and now we have it formatted properly. We want to add our datetime, just like that (I'm sure there's a better way to do this). Then we need this right here, so now we have our header and our data, and then we want to pull in this right here. Boom, boom, boom. Okay, so everything that we just wrote out we are now putting into this `check_price`. You can call it whatever you want; it doesn't matter. Let's run that and see if we get any errors. We don't, so this is now good to go. What we are going to do is put this on a timer. Have you ever wanted to check something once a day, once every 10 seconds, or once a minute, without having to actually pull up your phone and look at it? This is how we are going to do that. We had this library up here, `time`, and that's what we're going to use right now. We're going to say `while True:` (oops), with a colon, then `check_price()`, which is what we just wrote out, and then `time.sleep(`. How much time you put in here is completely up to you. For the purposes of demonstration I'm going to put 5 seconds, which means every 5 seconds it is going to run through this entire process. Let's run this really quick. I'm going to run it for, let's say, 30 seconds, and then I'm going to read the file in right here. When we looked at it earlier, we had four, well, five rows of data. In just a second I'm going to stop this, maybe after 30 seconds or so, and we're going to see how much data is in there. Let's stop it right now; it's been going long enough. And let's
run it. So now we have rows five, six, seven, eight, so I guess it ran for about 20 seconds. That was for demonstration purposes. I'd never do anything every 5 seconds, unless it was like Black Friday on Amazon. You can make this as long or as short as you want; you can run it every second if you want, which doesn't make sense to me, but you can. What we can do is a little bit of math. I don't know this off the top of my head, so I'm going to do the math with you live. Pretty exciting stuff; I've got the calculator out. This goes by seconds, by the way, and you could write out the calculation up here, but I'm just going to put in the number. There are 60 seconds in a minute and 60 minutes in an hour, so that's one hour, and there are 24 hours in a day, so that's 86,400, I believe. Did I read that right? Yes. So now, if I run this, it is going to check the price once every day. That is a big part of this project: we want to create our own data set. Something that I personally really love is a data set that I can do some type of time series with. This one is probably not super exciting, but you get the idea: if this price were to change, we would see that reflected in the data at some point. You can do this on any item you could ever imagine on Amazon; it's the exact same process. Some items change often; this T-shirt will most likely never change. So again, this is for demonstration purposes. The code itself will be nice to put in a project, although the data set that you get from this probably won't be the best, I would imagine. But notice that this is running. I can then minimize this
and this can run on my computer basically as long as my computer is working. One thing I will say before I go on: when I created something similar to this for myself, I put it in Visual Studio Code rather than in Jupyter notebooks. That's a personal preference, and I would look into it if that is something that you want. I think Visual Studio Code is a little bit easier for automating these types of tasks, but for illustrative and demonstration purposes you cannot beat Jupyter notebooks, and that's why I used them here. With all that being said, that is basically the end of the project. I'm not going to stop this and read the file in again, but you get the point. We now have... (oh jeez, all this again. It's hounding me. Let me get out of here. This is embarrassing, guys.) We now have a CSV file with data in it. You can run this in the background of your computer. I have done it; I've run it for weeks, I've run it for months. If you restart your computer, just come back in here and start the process running again. It's the same for any automated process, unless you start using some online automation service, which will run it regardless of your computer, either in the cloud or on some server. So this is a really good option. Again, if you restart your computer, or something happens and you lose your connection, just come in here and run through this script again, except for the cell that creates the CSV in `'w'` mode, because that one overwrites the file and deletes all your data. Don't run that one again; only run that one time. In fact, what I would do is come in here and comment that cell out, so that any time I come back in here I would never accidentally delete all my data. But that is what this project does. Now, something
really interesting, something that I have done in the past that I thought was really cool and really useful: I actually did it for some watches that I was watching, especially on Black Friday. I was interested in a price drop or a specific price change. What I basically did was say, if the price is lower than, let's say, $14, it would then send an email. I'm going to show you the script that I used; it still works. If this is something that you are interested in, it could be a completely different project. I just think it's interesting and wanted to show it to you, although I wouldn't say this is part of the final project. Let me just come in here, and we are going to create this. It's not super simple: we're sending mail, so we're connecting to a server, we're using Gmail, and we're logging into our account. That is my email; you will not get my password. We're creating the subject and the body, we put this message together, and then we send the mail. So I have this `def`, this `send_mail`. I am blanking on what this is called; I'm going to call it a function, but that's probably not right. If the price drops below a certain point, it'll send me an email. I have used this, and I was able to buy a watch that was, let's say, 140 bucks for like 90 bucks in a Black Friday sale. I was really, really happy about that. So this can be used in that way as well. It's not something you need to write into your project, just something I'm going to include down here if you want to try it. I think it's super interesting and really fun to mess around with; I enjoyed this. With that being said, this is the project. The next one, I promise you, is probably going to get a lot more difficult. If you thought this one was easy, which I hope you do, then
that means you're pretty good at Python. As for the next web scraping project, I hope to do many of these; I might even do all the ones that I put in that poll, but I started with the one that was the most popular. If you were able to get through this, I think that is fantastic. I think this is a solid project for creating a data set, so use this how you will. You can copy my code exactly; I don't have a problem with that. Again, I don't think this is beginner level. There are some slightly more advanced things in it, not even advanced, just intermediate-level things that you learn as you get into it. I hope that this was instructional, I hope I explained it well, and I hope that this is useful. When you actually use this, you'll have 22, 23, 24, 25, and you'll see a price change, a price change, a price change. Pick a product that you're interested in or that you know fluctuates often. There are plenty of those on Amazon, I promise you; there are some that literally change almost every other day, down a dollar, up a dollar, and then Black Friday just goes crazy with these price changes. I'm doing this because I think it's really interesting and really useful. This, to me, was a really good introduction to web scraping, because the next one gets quite a bit more difficult. On a scale of difficulty, this is maybe a four, and it'll probably jump up to a seven on the next one; it's just much more technical and coding heavy. So look forward to that, if that's something you look forward to. With that being said, I'm going to go back over here for my send-off. I hope this was helpful, and I hope that you learned something.
Don't get mad at me if it was too easy, and don't get mad at me if it was too hard. I'm doing my best over here, so I appreciate your patience. Thank you so much for watching; I really appreciate it. If you like this video, be sure to like and subscribe below, and I will see you in the next video.

What's going on, everybody? Welcome back to another video. Today we're going to be creating a script to automatically pull data from a crypto API. This project stems from an earlier video that I did where I walked through what an API is and how you can use it. In that video I showed you how to use CoinMarketCap's API so you could start pulling in their crypto data, and in this video we're going to take it one step further and automate that process. We're going to do a little bit of transformation with the data, I'm going to show you some cool stuff about how you can use it, and maybe we'll do a little bit of visualization at the end, but that is not the main point of this video. It's mostly about the automation piece, and a little bit about the data cleaning piece as well. Fair warning: this is not a beginner-level project. It's probably more like an intermediate project, and it's not even a complete project per se, because we're not doing all the data cleaning or all the visualizations. But if you follow along, we're going to cover a lot of different things, and you're really going to set yourself up to be able to do just about anything you want with this data or with other APIs that you pull from. So with that being said, let's jump onto my screen and get started with the project.

All right, so this is where we stopped in our last video. If you haven't watched it, now is the time to go back and do that; I'll have a link in the description. Also, all the code that we're going to be looking at and working through today is going to be in a GitHub repo below, so you can go and get all the code, have it completely finished, and just follow along, or
you can code it from scratch along with me. I do recommend writing it from scratch if you can, because I think you'll learn more; you'll make mistakes and you'll learn from them as we go through it. But it is up to you. So let's get started. As you can see, we have the script right here, and I'm starting basically from scratch. I have a completed one up here; I'm actually going to get rid of those cells. We're going to start from exactly where we left off in our last one. I'm going to run the script: this is going to pull from our API, and we're going to look at the dictionary, set our display option, and do our `json_normalize`. This is where we literally left off in the last video. So we have all of this data, and what we want to do is automate that process, because we don't want to have to come in here, run this, and put it into a CSV manually or something like that. We want to automate this data collection process so that we can just have the data ready for us to use. We're going to be using this script, but we might want to add a little bit more to it first. Something that I like to do when I'm creating these automation scripts is add a timestamp. The reason is that I want to know when each of those loops, you could say, runs through and does those automated runs. If I do it every day, I want to know what time of day I ran it, to make sure each run ran successfully. So all I'm going to do is add a new column at the end and just call it `timestamp`. Let's go right up here, and we're going to say `pd.`, and there's something called to-datetime, so we're going to do `to_datetime`, and then we're going to pass `'now'`. What this is literally going to do is take the timestamp of right now, when it's running, and
it's going to show that. Now we of course need to add a new column for that. So we're going to say data frame... whoops, let me see real quick. We just have the data, so we need to create the data frame right here: `df` equals this `json_normalize`. Then we're going to say `df`, open a bracket, and say `'timestamp'` (are all these lowercase? We're going to keep with the lowercase), close that bracket, and say equals. What this is going to do is, first off, assign this `df` as our data frame, and then add this `timestamp` as a new column. Let's run this really quickly and go all the way to the right. This is our timestamp: this is the day that I'm running it and the time that I'm running it, so this is working properly. If you look, there is a `last_updated` in here, and it is very close to this timestamp, but it is not the same thing. If you dig into this data a little bit, `last_updated` is coming from CoinMarketCap's API, and it's when the actual cryptocurrency was updated in their system. So it is going to be really close, but it's not going to be exact. I don't like to rely on built-in ones that are coming from an API or something; I want to make one myself that's running on the system where I'm creating the automated process. It's just something I do. So now we have this original data frame created; we have what we need. But what we want to do is keep adding data to it. We don't want it to just create these 5,000 rows; we want it to create 5,000, 5,000, 5,000 over time, whether it's every day, every hour, or every week, however often you want to run it. So what I'm
actually going to do is limit this a lot. I just want to look at the top, let's say, 15, so we're going to set that and run through all this again. Now I just have the top 15. It's going to be easier to see, and it won't take as much time to run our scripts. Again, you can keep as many as you'd like: if you want 100, 200, or all 5,000, do whatever you'd like. What we are now going to do is create a function using this original script. We have this data frame, and we are going to create an automated process that appends data to this data frame right here. That's the big thing that we're trying to accomplish in this project. So let's go up here, and we'll just take from here all the way to here, copy it, and paste it down here. Now we need to create a function, so we're going to say `def`, and we're going to call this `api_runner`, because this is going to run our API whenever we need it to run. When you're writing a function, the body needs to be indented properly, so what we need to do is go over here and hit Tab. We're going to do this all the way down; I'm just going to skip forward to when it's all done. All right, so now we have this URL. Because this function is going to run through this automated process, what we want is to also add this right here, so we need to take this and add it; we'll just put it down here. Okay, let's do that. What we have so far is really close to what we want our function to be. We have this function that we're going to be running: it's going to call the API, we're going to use our key, we're going to test it, load it, and format it
right here, then we're going to add this timestamp, and then we will have this. Right now it's just going to print this data frame, but that's not what we want. What we want is to actually append this data. So when it gets to this data right here, since we already have the original data frame set up up top, we now want to say that this is going to be df2, and we're going to append it. So we're going to say df2.append and we're going to say df2. All this says is that the new data that's going to be coming in every time (let's say it's a loop, just pulling the data, pulling the data, pulling the data) gets its own data frame, we add the timestamp like we want, and then we append that to the original data frame. As of right now this looks good; we'll run it in a second. I just created it, so now we need to create our script to run this automatically. We're going to import os. Let me tell you, there are a thousand different ways to do this, and there are better ways, but they are much more complicated and some cost money. I'm going to show you different options for automating your Python scripts in future videos, but this is one I've used many times for different projects, and it works. So I'm not going to show you the most complicated thing in the world; I'm going to show you something I've used a lot. We're going to say from time import time, and from time import sleep; that one's important. Now we're going to create our loop. What time, sleep, and os (your operating system) are going to do is give us the
ability to track the time, and we're going to be able to call this function at whatever intervals we want. So let's create our for loop. We're going to say for i in, and you can write this part in different ways, but I'm going to say range of 1 to, let's say, 333. I say 333 because, if you remember from the first video on the API, you only have 333 runs per day, so if I ran this 333 times today that would be our max. That's why I'm using 333, just for reference. Now we're going to call api_runner, so in this loop we're calling the function up here. Then I want an output to show that this is running through successfully. You can write anything here; we're just going to print 'API Runner completed'. I was going to add 'successfully' but I can't remember how to spell it, so I'm just going to say completed. Now we're going to use this sleep right here. It counts in seconds; you can convert that to minutes, hours, whatever. We're going to have it run every minute, which is every 60 seconds, so it's going to sleep for one minute, and then we're going to say exit. This is fairly simple: it's just a for loop that calls the API, tells us that it ran successfully, waits 60 seconds, and runs again. That's it. So let's run this and see if what we did works. So it ran the first time. Now, I'm not going to bore you. I'm doing this live, so exactly what we're about to get is what we're going to use. I didn't run it overnight or for a week so that we'd have a bunch of data; what you are going to work
with, I'm going to work with as well. So I'm going to wait a few minutes and let this run, and I want you to do the same thing. I'll let it run for maybe five minutes or so, and we'll work with what we have and keep going with the project. The point of this project is not to create the final product or all the visualizations; that will most likely be in another video where we take all this data and do all these things with it. The point of this video is to automate it and clean it up to where we can really use it, and then I'm going to let you guys loose and you can do whatever you want with it. I think it's really setting you up for a lot of successful projects in the future that you can do all by yourself, without me having to walk you through it. As you can see, it's already run through twice. I'm going to pause for a second, let it run through a few more times, and then we will continue with the project. All right, we are back, and of course it's only run, what, five times. It has not reached the limit of 333, so we are perfectly fine. I'm going to stop this by clicking this square up here; it's going to give us an error, and then we'll check and see what we have. I don't know why it's taking so long, if I'm being honest. All right, so I interrupted it. Let's run this and see what we got. I hope we have more than 15, because if not I'm going to be very upset. Okay, well, I made a mistake. I was supposed to put df right here and I had df2. So change your script; do not do what I just did. It's supposed to be df.append, and we're supposed to be appending this df2 to the original df. I messed up on that one. Let's rerun that. Let's see: local variable 'df' referenced before assignment. Okay.
this is perfect, because this has happened to me before. We're running into all sorts of good stuff. I like to keep this stuff in my videos. I laugh because I hate running into mistakes, but everybody says they're happy that I do this, so I'm going to keep doing it. I'm not going to cut this out, I promise. What we actually need to do is go back up to this function. What happened is that we assign to this data frame, and because that's inside a function, it's treated as what they call a local variable. What we need to do now is state that this is a global; it's just called global, that's all it is. So we're going to hit Tab and say global df. This should declare it as a global variable and let this run properly. Let's hope it does. All right, it's running. Let me tell you something while we're here for just a second: on this project I ran into probably a hundred mistakes or errors that I had to research for hours and hours. I was legitimately on Stack Overflow, just Googling and figuring these things out. There were a lot of new things that I had never run into before, just on this project. Everything that you're seeing is from after I fixed all of those things and really worked through them. It was frustrating at times; I just couldn't figure it out. What you're looking at is the polished version, now that I have everything laid out, because I can't spend 10 hours on a project; nobody would watch it. So just know that if you run into some of these mistakes, or run into mistakes later on when you're expanding this project, that's completely normal. We're going to let this run for a little bit, and then after maybe three or four minutes we'll come back and keep going with the
project. All right, so let's run this and check whether we have the data that we're looking for, and it looks like we do. Let's actually go back up here really quick. We want to set display.max_rows, because I want to be able to see all the rows and not just a few of them. That gives us this scrolling instead of that dot dot dot that shows just a few. So there's our original 15, then we have the next loop, and then the next loop. Let me scroll over to the timestamps and I'll show you what I mean. This one was run on 5/26, and if we go down, you can see the later ones were each run one minute after the other. My original one was from earlier, about 15 minutes ago, when I first ran the original data frame. All right guys, this is Alex from the future. I've actually completed this entire project in the video, and you're about to see all that after this, but I wanted to show you one more thing you can do in this function up here that I didn't show originally, and that's how to put the data into a CSV. All we've done so far is keep it all in a data frame, and that may be great, but a lot of you are going to want to automate this and put it into a CSV, so I want to show you how to do that. Right here in this folder I have all these different API threes and fours; these were tests that I did before. Instead of just putting the data into a data frame, you can append it to a CSV and have that CSV sitting out there for you. There are a lot of uses for that. You may want to have that file separate from here just in case something
times out or something breaks, or your computer shuts off, or something like that; that is a legitimate concern. So we're going to say if not (this is basically an if statement) os.path.isfile. What this does is check whether there's already a file under this name. Then we're going to put an r in front of the path. If you've never done CSV stuff before, it's really important that you put that r (it makes Python read the backslashes in the path literally), or you're going to get an error every time. So we're going to take this path right here, copy it, and put it right here, then add a slash and name the file. Let's name this API, because I don't think I have that one in there; I think I deleted it. Yeah, I don't have API, so I'm just going to keep it API.csv. Then I'm going to close that parenthesis and add a colon right here. If that file does not exist, we're going to write this to it and create it. So we're going to say df, that's this data frame right here, dot to_csv, and we're going to do that r and the path again, so let's just copy it in like that. Then we're going to say comma, header is equal to 'column_names'. I'll talk about this in a little bit, because we'll have to change this up slightly, but what this does is check whether this file exists, and if it does not, it creates it and creates the column headers based off this data frame. Now we want to say else, and this next part says that if the API file is already there, we want to append the data. We don't want to overwrite it or anything like that; we want to append the data. So we're
going to basically copy this (maybe not the whole thing, but I already did), and we're going to say mode equals 'a', and a stands for append. Then we're going to say header equals False, which means that when it appends the data it's not going to write the column headers every time. You don't want that, because if you added the headers on every append, then every 15 rows you'd have another set of headers that you'd have to go into the CSV and filter out. So we're going to say header equals False. Now, just a second ago I said you would need to change this a little bit, and you would, because each time you'd be writing out this data frame, which is already being appended to, so you'd be creating a lot of duplicates if you kept it exactly as is. What you need to do is take it back to its bones and keep it like this. Then you just run this and it should work. Let's test it really quick, because I'm promising you something and I want to make sure it actually works. Let's run it. Okay, so it just ran for the first time, so it should have created this file. Let's go see. So it just created that file, and now we're going to see if it actually appends the data. Let's wait for one more run and then I'm going to stop it. Again, I'm just verifying that what I'm telling you actually works, because if it doesn't I would feel terrible. While that's running, I'm going to add this, because I want to show you how to read it back in. Super easy: we're just going to do pd.
read_csv. We do that, pass the same path just like that, and then we're going to assign it to a data frame. We'll call it df72, something random, because I've already done this whole project and I don't want to mess anything up. So now let's stop this, and once it stops we're going to run this and make sure it actually pulled the data in. All right, we interrupted it and the file is ready to be read, so let's read it in. There's our file. Let's see, did I mess anything up? No, I didn't mess anything up. This column is the index that was saved in the file, and we already had an index in here, so we could probably get rid of it. But if you look, we have zero, one, two, three, all the way to 14, and then zero, one, two, three again. And if we look at the timestamps, they should be one minute apart: this one says 11:19:45 and the next says 11:20:45, so this worked exactly as planned. Again, you have two different options. You can just keep it how it was before, and I'll leave both options in the script so that you can choose which one you want. So right here you're appending to a CSV file, and if you keep just this part and get rid of all this, you're appending to a data frame. Now please continue with the rest of the video that I already have done; again, I'm future Alex. Okay, so we have all this data and so many columns. If you want to go and do your own thing at this point, you absolutely can. I'm going to mess around with a few things and show you something I did that I thought was really interesting, in order to visualize this data a little bit and transform it to make it more usable. But this project is not a full data cleaning. I'm not doing a full data cleaning of this data; that
would be a very large undertaking, because honestly this needs a lot of work. One thing I do want to clean up really quick is this right here. The math will be fine; it's just that the numbers are shown in scientific notation, and I don't like it. So I'm going to get rid of that. We're going to say pd.set_option, open parenthesis, 'display.float_format', comma, and now we're going to use a lambda: lambda x, colon, the format string '%.5f', percent x. If you don't know what lambdas are, I highly recommend looking those up; again, this is not a beginner tutorial. Whoops: no such keys, 'display.floor_format'. That makes sense; I typed floor, and it should be float. Yeah guys, this is not a beginner's level. You can't use floor_format; it's float_format. All right, so now let's take a look at this data frame. We're just going to type df and run it, and now our numbers are a little more readable. I prefer it this way; you do not have to do this. So let's jump right into it. Something I thought was really interesting when I saw this data is these percent changes: 1 hour, 24 hours, 7 days, 30 days, 60 days, 90 days. If you're not in crypto or don't do investing, what this shows is how much the price of the coin has changed over the last hour, 24 hours, seven days, and so on. As you can see, it's barely fluctuated over the past 24 hours, a little bit over the past seven days, and a lot over the last 30, 60, and 90 days: 20, minus 26%, minus 33%. We're in May, and we just had kind of a crash in
crypto a couple of weeks ago, so this tracks. I want to visualize this and see whether I can gain any insight from having it all displayed, but in its current state we really cannot do that. Another thing we have to take into consideration is that we have Bitcoin right here and Bitcoin right here, from different pulls. We just ran them a minute after each other, but for your project you may do a run each day, or every hour, or something like that, and then your data could be very different between pulls. You may just want to take the first one, but for the sake of this project I'm going to group them. So let's go down here and say df.groupby. If you've ever done something like SQL, this is basically how you group by in pandas. We're going to group by the name (Bitcoin, Ethereum, and so on), so we do that on 'name', and I'm going to say sort is equal to False. I'm not going to sort it; you could say True there, but we're not going to, and I guess you'll see why later. Then we do an open bracket, and now we need to choose which columns we're going to keep. So I'm going to do another open bracket and just copy and paste these column names. I'm going to start right here at the quote percent change for one hour, then go over one and take 24 hours, paste that, comma, then the 7-day. For the 30-day I'm going to paste the same one and manually change it to 30 days, and get rid of that bit at the end; I don't know what that is. Then we do 60 days, comma, and our last one, which is 90 days. Let's see what that gives us. It doesn't give us anything. Okay, I know
what's wrong here. We're grouping by something, so we need an aggregation, like an average, a mean, a mode, or something like that. All we have to do is go to the end right here and take the mean. So let's say this is Bitcoin: it's going to take this one-hour number for every row where the name is Bitcoin, group them all together, and average them. So for the past five minutes that it's been running, we're taking the mean. Let's run this again. Oops, I meant down here; let's run this now. Now what we have is all 15 of our cryptos and the average for 1 hour, 24 hours, 7 days, 30 days, 60 days, and 90 days. We have our cryptocurrencies down the side, our percent changes up top, and our averages in the cells. Now, if you try to visualize this as is, it doesn't really work, because these percent changes are columns, and for creating the visualization we really need them to be rows. My initial thought was that of course I need to pivot, if you've ever used pivot in Excel or Power BI or something like that. I tried everything and could not get it to work, and I almost gave up, until I ran across something called stacking. I think I have used it before, but to be completely frank I couldn't remember how to do it. Once I saw what it was, I just did stack. Let's make that df4. You don't have to do this; you can keep it all in the original data frame. I just like it for visual purposes: you
can see the progression that we're making. I like to create a new data frame each time so I can always go back and look at, say, df3 as we go, but you don't have to do that. So now let's take a look at this. Up here we had Bitcoin with all these columns, and these numbers as rows, but now we have all of these as rows as well. This is much more usable, and if you've ever done something like pivoting or stacking before, you'll know that you kind of have to do it if you really want to visualize this well. But because we just stacked it, it changed the type. Let's do type of df3; this is before we stacked it, and it was a DataFrame. Now let's look at df4: this is a Series. It is no longer a DataFrame, and that's really important to remember, because we can no longer treat it as a DataFrame. We want to get it back to a DataFrame, because you can't really use it the same way as a Series. Let me just create a few cells so you can see up here better. Now we're going to say df4 dot to_frame, so we're going to make this into a frame, and we're going to specify the name. That doesn't mean the 'name' column we have right here; it means the name of these values, the column that came out of the stacking process. Let's just call it 'values', make this df5, and look at the output. Whoops, I meant df5. So there's that values column, and this already looks a lot better. Let's look at type of df5: so now it's in a
DataFrame, but the issue is that this name is acting like an index, which we don't want, because we want to be able to use it as a column. So it doesn't really have a usable index at the moment, and we need to give it one. Typically when you set an index you'll do something like df5.set_index and then pass something like 'name'. Let's just say df6 is equal to that and see what happens. It's going to give us an error. Oops, what I meant was df5, bracket, 'name', which would be a column, right? It's basically going to say that's not going to work. What I want to do in this video is create numbers. I really just want it to be numbered one, two, three, four, five. But we don't have that right now, and I can't just will it into existence, so we're going to create an index basically out of thin air. We're going to do pd.
Index, and we basically want it to count how many rows are in here. You can make this dynamic, and it probably wouldn't be that hard, but I'm going to take the super lazy route. Let's do df5.count, and there are 90 values in here, so I'm going to do a range of 90. I would definitely make this dynamic normally; again, I'm just being a little bit lazy. We'll say index is equal to that, and I'm going to put this index right here, so now it's going to number the rows for us. Now, I've run into this issue many times: what I actually need to do is reset the index, and do it properly the first time. So let's get rid of this and reset the index, and it actually fixed itself. What was happening is that we were indexing something that was already indexed, and that was causing issues, in a nutshell. So we reset the index, and now this is what it looks like, and this is exactly what we want. This is how we wanted it formatted for our visualizations: we have multiple rows for Bitcoin, and each of those columns is now a row with its value attached. For whatever reason it names that column level_1. I don't know why, but we're just going to rename that column really quickly. So we're going to do df6.
rename, open parenthesis, columns equals, then one of these bad boys, a curly bracket, then 'level_1', a colon, and what we want to change it to, which I'm just going to call 'percent_change'. Let's call this df7. Again, you don't have to do that; I'm just doing it. So now this looks much, much better. Now let's try to visualize this, because we haven't done any visualizations yet; we've just been messing with the data a little bit. This is something I'm personally interested in, so I wanted to visualize how these changed over these time periods. But we need to import some things to be able to visualize this, so we're going to import seaborn as sns, and we're going to import matplotlib as well. I don't know if we'll use it right now or at all, but we're going to add it in here either way. So now those are added. We're going to come right here and do sns.catplot. We're going to say x is equal to 'percent_change', and we want the y-axis to be these values right here, so comma, y is equal to 'values'. Then we want to basically create a legend, I guess you could call it, so we say hue is equal to 'name'. I'll show you what it looks like without it, and then you can see why we need it. We say data is equal to df7, and then we say the kind is equal to... now let's run this and see what we get. Super quickly, with just limited inputs, here's what we have. Now this looks really good. We can
narrow this down to fewer coins if we wanted to, because there's a lot here and a lot of colors, but that's just because we have a lot of different coins. There are a few that are doing really well (I think this is Tron) and a few that are not doing so well. But if you look down here, the axis labels are really hard to read, and that's because of the long column names. So I want to change these values so that when we visualize it, it doesn't look like that. I want this to be at least one good visualization you can take away from here; it's definitely not perfect or complete by any means. So, I did Alt+Enter, which adds another cell; I could have just pushed the plus button, so that was kind of the lazy way. What I'm going to do is change these values in here. I'm going to take df7, and we only want to look at this one column, so we select that, and then we say dot replace, open parenthesis, and then a bracket. Just to show you one of them, I'm going to put in this one-hour value, then a comma and another bracket, and this is what it's going to change to; I'm just going to say one hour. We'll do this one really quick, and I don't want you to have to watch me type all this out, but I'm going to go through and do the same for the others. Let's look at this really quick. As you can see, where originally it said quote.
USD percent change 1 hour, it's now only 1 hour. Now, this didn't actually change anything yet; we need to apply it to this right here, so I'm going to assign the result back, and then we'll run df7 again. So now that has actually changed that value. Now I'm going to go through and update every other one. All right, so I basically just put in the other ones we wanted to change, with commas after each. I have 24 hours, then 7 days, 30 days, 60 days, 90 days, and then this bracket over here, which tells it what to change them to: 24 hours, 7 days, 30 days, 60 days, 90 days. Let's run this; I haven't even tried it yet. It looks like it worked properly. Now let's go back down here and run the plot again, and look at that: it looks so much cleaner, so much nicer. All of them show very little change over 1 hour, and then you can look further back and see that within 90 days a lot of these have gone down, which again, if you're following crypto, you know there was a big crash recently, especially with all these altcoins you're seeing right here, which went down a ton. I think this is Avalanche or Dai or whichever these are; they went down dramatically, whereas there's one up here, this lone wolf, that did really well for whatever reason. So it's really interesting to see. Now, this is a pretty specific visualization that I personally wanted to see and thought was interesting. You can do absolutely whatever you want with this data. There's so much here, especially depending on how long you track it. I only did this over the course of about five minutes, but you can set this up and track it over a much longer time. Now let's say you wanted to do something much simpler: you just wanted to look at Bitcoin over the time that you took the data in. That's
going to be a lot simpler than what we just did, and I'll show you how to do that really quickly. So we're going to look at the data frame, and we're going to take specific columns; we just want a few columns that we want to keep or pull from. So we're going to take the name column, then (it might be easier if I copied them, but I'm just going to write them out) quote.USD.price, which is the price of the actual cryptocurrency, and then we're going to do timestamp. Let's make this a data frame, and we're just going to call it 10 for absolutely no reason; maybe if I'd made it 9 it would have been easier. So now we just have these columns. The reason I want to show you this is that you can query this really quickly and just take the data that you want. So let's say we just wanted to look at Bitcoin. We're going to say data frame 10 .query, open parenthesis, and we're going to say name is equal, and equal is not like that; when you're doing it like this you need to say equal equal, to Bitcoin, and we're going to do it just like that, and we're going to say data frame 10 is equal to. Let's try running that. I think something's wrong with it; let's try it like this. All right, let's try that. There we go; I needed a double quotation instead of a single quotation, that was the issue. So now we have Bitcoin, we have the price, and we have these timestamps, so this is the actual time when we ran it. This is the original data frame, and then in this project it took me 15 more minutes to get this one, and then we had it running properly for the next five minutes, so that's actually what we have. Now, if we want to visualize this really simply, we're going to do sns.lineplot, and that's going to be like a little line
chart or line graph, whatever you want to call it. Then we're going to say x is equal to, and we'll say quote... no, actually, we wanted the timestamp to be on the x-axis, and then we'll do y is equal to quote.USD.price, and let's see if that works. "Could not interpret timestamp for the parameter": that's because it doesn't know where the data comes from, so we add data equals data frame 10. Now let's try this. All right, so this looks terrible; let me just say sns.set_theme, open parenthesis, and we'll do style is equal to darkgrid. This looks a little better. Now again, we are looking at just a very, very short time series, but we can look at just Bitcoin or we could look at multiple, and we're showing this line that gives us the trajectory over time. So you can get really creative with this. You can run this for a long time; you can show Bitcoin over days, weeks, or months, however long you run this. And so that's really all I've got. Honestly, like I said, I wouldn't say this is a complete, full project, but I'm showing you how to do something to enable you to run with the ball and do basically whatever you want with this. You can pull data from a different API, or you can use this exact API and data, but I wanted to show you just a few things that I initially saw that I might do with the data. And you have so much; let me go back to this original data frame, we'll use this one right here. Look at all this data; you have so, so much data. Actually, let's go to this one, this one's better. You have so much data, so many numbers here, so many columns that we didn't even look at that you can use. So there's a lot that you can use here, and I'm really trying to just set you up so that you can run with it and do whatever you want. I could have done a thousand different things here, but I tried to just show you two things that you can do with the data
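(A hedged sketch of the Bitcoin filter and line chart described above. The sample rows are made up, and the frame names df and df10 are assumptions for illustration; the real data comes from the repeated API pulls. The plotting part assumes seaborn is installed and is wrapped in a function so the filtering part runs on its own.)

```python
import pandas as pd

# Made-up sample of two API pulls, one minute apart, for illustration only.
df = pd.DataFrame({
    "name": ["Bitcoin", "Ethereum", "Bitcoin", "Ethereum"],
    "quote.USD.price": [43000.0, 2300.0, 43050.0, 2295.0],
    "timestamp": pd.to_datetime([
        "2023-01-01 12:00", "2023-01-01 12:00",
        "2023-01-01 12:01", "2023-01-01 12:01",
    ]),
})

# Keep only the columns of interest.
df10 = df[["name", "quote.USD.price", "timestamp"]]

# Inside query(), equality is written ==, and the string value needs its own
# quotes that differ from the quotes wrapping the whole expression.
bitcoin = df10.query('name == "Bitcoin"')


def plot_price(frame):
    """Draw price over time; seaborn is imported here so the filter above runs without it."""
    import seaborn as sns

    sns.set_theme(style="darkgrid")
    # Column names given as strings are only resolved when the frame is passed via data=.
    return sns.lineplot(x="timestamp", y="quote.USD.price", data=frame)


print(bitcoin)
```

(Calling plot_price(bitcoin) in a notebook should draw the line chart; leaving out data= is what triggers the "could not interpret" error mentioned above.)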
that I thought were pretty interesting or simple to do, and I want you guys to go out and do something way, way better than what I did. So I hope that this was helpful. I hope that this showed you how to automate that process so you don't have to sit there and click it and append it and do all these different things, and hopefully that will be helpful in your future projects. So with that being said, thank you so much for watching. If you made it all the way to the end, you guys are fantastic. If you like this video, be sure to like and subscribe below. I'll see you in the next video. [Music]

What's going on everybody, welcome back to another video. Today I'm going to be walking you through how to create your very own portfolio website. [Music] Now, we just completed our data analyst portfolio project series, where we walked through four projects in SQL, Tableau, and Python, and so if you have completed those projects, you now want to share them with potential employers, and I think the best way to do that is to create your own website. In just a little bit I'm going to show you two options for how you can actually create your own website. The first one is a website builder like wix.com, and the second one is hosting your own website through something called GitHub Pages. Now, if you have never created your own website before, it can sound a little bit daunting, but don't worry, I'm going to walk you through every single step of the way, from the very start to the very end, and once you reach the end you'll have a complete data analyst portfolio website. So without further ado, let's jump on my screen and get started. All right, so the website that you're looking at right now is the actual website that we are going to build in this video. It is hosted on GitHub Pages, or github.io, so this is actually being hosted right now by GitHub Pages. I'll leave a link in the description; if you type this in, you will
get this page and you can check it out for yourself if you don't want to just watch me look at it. So it has this little header, and you can write a little bit about yourself, and then these are our actual projects. This is our data cleaning in SQL project, and then there's the COVID data exploration, Tableau dashboards, movie correlation with Python, and this is a future video; I plan on doing a few more of these projects because I just really enjoy them. And then there's this contact information at the bottom. So it's a really simple website and it gets the point across. I have something similar to this for my own personal one; I use a different variation, but this all comes from this website, HTML5 UP. There are lots of templates, lots of options that you can use. Again, the one we're going to be working with is this one, but I use a different one for mine, and they are really good, I mean super easy to build and customize yourself. And I will say again, I have no experience doing this; I just watched a YouTube video that showed me how to do this, and now I am creating my own YouTube video to show you how to do this, so it's coming pretty much full circle. So like I said, there's no real narrative to it; it just links to your project. If you click on this, and let's just open a new tab, it'll take you right to the GitHub project, and then whoever is checking this out, like an employer or a recruiter, can see your code. So, super simple. Another way that you can do this is creating your own website through a template or something like that, almost like a blog style. I imagine it being something very similar to this, where there's this introduction and you can talk about where you got the data set and how you got the data, and then you can have a more narrative approach with screenshots and with some code as well. So this person included screenshots, and then there's the
code right here that I can actually copy and paste, and it just walks through the logic of how the project was done. There's a story to it, really, and so that might be something that you're interested in. Now, I have done something like this in the past, and I used Wix, and you can do this completely for free. The one we're doing today is completely free as well, but if you want the customized URL, you do have to pay for it on Wix; you can get a free Wix website with Wix in the URL. So try this out; these are super easy, and you can find thousands of templates and a million tutorials on how to do them. But that's not the one we're going to be working on today. So with that being said, the very first thing that we need to do before we do anything is actually download Visual Studio Code. This is where we're going to open that HTML, and we're going to be working with it in there. Again, I don't know if I said this before, but it seems a little bit intimidating at first; once we actually start looking at it, it's a lot easier than it looks, I promise you. So if you are like me and you have a Windows computer, you'll just go right here and install it. It's super easy to install; I'm not going to walk you through how to do that. Of course, I already have it up and running down here. Once you have that installed, what you're going to do is come to this website (a link should be in the description), and we are going to download this. All you have to click is the free download. It's going to pop up; I'm going to put it in my downloads and click save. Fantastic. So let's go to the downloads, and it should be right here. Now, if we open this up, it has a few different things in it. I'm using the Brave browser, so that's this symbol right here, but if you're using Google Chrome, that should be the symbol there instead. But this is everything that you should be seeing, and what we want
to do is take it out of this zip folder, because there are things that can read into it with Visual Studio Code, but I want to make this as user friendly as I possibly can. So we're going to create a new folder, and I could call it Massively, or you can call it Port website, whatever you want to call it; I'm just going to do Port website. I'm going to copy this in; I'm not going to cut it, just in case I make a mistake. So I'm going to put all of those things in here, and now we're going to go to Visual Studio Code right here, and you should be greeted with this right here. We're just going to click Open Folder, go to Port website, click Select Folder, and you're going to say yes, I trust this one. Right over here are all of the documents that we were just looking at. Now, the only one that we're really going to be working in (we'll work a little bit in the images, because I'll show you how to add your own images) is this index. So again, it looks complicated; if you've never looked at HTML before, it does look a little bit complicated, but HTML to me is one of the more easily understood languages. Once you start getting into it, which we're about to, as we walk through the entire process, it actually makes a lot of sense and it is pretty simple. Something that you're going to want is something called a live server. So if I click right here and click Open with Live Server (you don't have it yet, I'm guessing, unless you've done this before), it's going to open up this website, and this is what we're looking at right now. It has a bunch of gibberish, or some language that I do not know, and so we can view this live. In just a second I'm going to take myself off screen, but before I do that,
let's search for that live... I think it's called Live Share? Live Server? Let me see what this is called. Yeah, Live Server. So come right here; it's called Live Server. There it is, yeah, that's the one. So this is our Live Server; you just need to click install. It takes like 5 seconds and it should be completely installed. What this does is host a local website. It's not something that anybody can access, but it connects to your code, and when we make updates you can see those updates live. So I'll show you all that in a second; just be sure to install that. With that being said, let's get out of this and go back right here. I am going to take myself off screen so that you can see everything that I am seeing as well. It's been really great seeing you. I have lots of different videos coming up, lots of new projects; I really enjoyed this project series, and I think I'm just going to do more of them. All right, I'm going to get myself off screen, so let's look at what we actually need to do. So let me see; okay, we're already connected to the live... actually, I got rid of it, whoops. Let's pull this over and pull that, and we're going to open in Live Server. Now if we look right over here (and I know this is going to be a little bit squished, and I'm sorry about that), this says "This is Massively". You can change that; that's this right here, and we're going to say Alex the Analyst Portfolio and get rid of this Massively. I'm going to save; you can go up here and hit Save, but I hit Ctrl+S, and just like that it updates on the website. Now again, this is just local, so it's nothing that anybody can see, so don't worry. What we're going to do is I'm going to walk you through the entire process of creating this, and
then at the end I will show you how to host it on GitHub, and honestly it's a fairly easy process; it just takes a little bit of time to customize it all. So let's get into it. So we have this (you may not be able to see it, let me actually pull this up), and it says Massively by HTML5 UP. We're going to customize that as well. Whoops, I don't want to do that every single time; I'm going to try not to go full screen and go back and everything like that. So we're just going to say Alex the Analyst Portfolio, Ctrl+S, and right up here that changed it. (Yeah, don't ask me that again, thank you.) Right up here, you probably can't see it at the moment, we'll see that later, but it customizes this tab, which is really cool. So let's go right down here. Now, this is where it says a free, fully responsive HTML5 template. We can customize that, and I highly encourage you to do so. They actually included their Twitter handle right here, and you can do the same. If you look at this one right here, I included my Alex the Analyst handle that goes to my YouTube channel, and you can do the exact same thing and include your LinkedIn or your GitHub profile or whatever you want to include in there, so be aware that you can do that. So let's say (oops, I need to click back in here) data analyst skilled in... and again, don't write what I'm writing. I'm just going to make it really simple, but this part is meant to be a little bit about you and who you are. So I'm going to say data analyst skilled in SQL, Tableau, and Python, and then I'm just going to get rid of all of this, yep, everything from here over, and Ctrl+S. So, super simple. Actually, where was that... here it is. We don't need that; actually, we don't need anything from here over, probably to here, honestly. Let's see what that looks like. And again, you can use any website right
here that you want, and you can customize what it looks like. So I'm going to say Alex the Analyst, and then whatever URL you want to include in there, that's what you need to put. So now if I hit Ctrl+S, it says Alex the Analyst. So, pretty easy. Now we're going to go down, and you can use this however you want to use it. You can even make this like one of your READMEs, like an "about you", and put the link for that. Again, on this one I decided to include the project that we've done that I thought was the most impressive, or, I don't know, the coolest one. I don't know if you consider data cleaning in SQL cool, but I do, I think it's cool, so I included that one as my very first one. So that's what we're going to do right here. We're going to go down, and it says "This is Massively"; that's not it. Cool, so let's see... oh, okay, I know what that is. We'll come back to this up here in just a little bit; I'm going to go full screen, show you what this is, and then we'll come back to it. But if we go right down here, this is what they're calling a featured post, and then the ones below this are posts. So in our featured post, I'm going to get rid of the date; I don't want them to know that I just created it. Oops, I keep doing Ctrl+A, selecting everything. So we're going to say Data Cleaning in SQL, get rid of this, and Ctrl+S. Again, I'm just saving a lot so that you see what I'm doing and where it's going. We're going to get rid of basically all of this, and we're just going to say, in this project we clean housing data in SQL Server, and Ctrl+S. So, super easy. Again, give a little bit more description; I did in my other one, and you can see that website, so go check it out. Then we'll have an image, and I'm going to show you; at
the end we're going to go back and redo all the images, but I'm not going to do that at this very moment. So now, you can have this Full Story; I chose to do View Project, and if I hit Ctrl+S it says View Project. I think that just looks better, especially if you're displaying a project. Now we go into all the individual posts... actually no, wait, what I want to show you really quick is how you actually link it. So let's go right over here; that's our COVID one, and here's the data cleaning project. All you have to do is take this website, so that's the URL, and you're going to put it right here. Now, there are three different places; these hrefs are places where you can put a link to a website. On here, it references this right here, so they can click on this Data Cleaning in SQL, they can click on the image, because this href is right next to this image, and they can also click on the View Project button, so you can put it in all three. You'll just stick the URL right where that hashtag or pound sign is, and then we're going to save that. Oops, this is embarrassing; I am not a web developer, as you can see. But then if I go in here and I right-click and say open link, it is going to take me to that project. So, super simple, and we're going to do basically that for all of these. I'm only going to show you three and then you can do the rest, but I also want to show you how to put in the Tableau; it's the exact same thing, but it's a little different, so I wanted to show it to you. So the next one that we're going to do is down in posts, and again I'm going to get rid of this date. You can keep that in there if you want (excuse me), and that's totally fine, just update the date. This is that "Sed magna" text again; I think this might be some language that I just don't know about. The next one is Data Exploration in SQL, and I'm
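(A small sketch, in Python, of the linking step just described: each post has three placeholder links, on the title, the image, and the button, and each placeholder is a # inside an href. The HTML below is a simplified, hypothetical stand-in for one post and is not the template's actual markup; the image file name and the project URL are also made up. In the video the URL is pasted in by hand, so this is only an illustration of where it goes.)

```python
# Simplified, hypothetical stand-in for one post (not the template's real markup).
POST_HTML = """\
<article>
    <header>
        <h2><a href="#">Data Cleaning in SQL</a></h2>
    </header>
    <a href="#" class="image fit"><img src="images/housing.jpg" alt="" /></a>
    <ul class="actions special">
        <li><a href="#" class="button">View Project</a></li>
    </ul>
</article>
"""


def link_post(post_html: str, project_url: str) -> str:
    """Point every placeholder link (href="#") in one post at project_url."""
    return post_html.replace('href="#"', f'href="{project_url}"')


# Hypothetical project URL, for illustration only.
PROJECT_URL = "https://github.com/your-username/your-project"

linked = link_post(POST_HTML, PROJECT_URL)
print(linked)
```

(After the replacement, the title, the image, and the View Project button all lead to the same project page.)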
going to get rid of this, and we'll save that. Perfect, and we'll do View Project. Cool. So now we need to customize this summary, and I'm just going to say something really simple: data exploration of COVID-19 data set in SQL Server. There we go, let's save that. We have View Project; now let's go get our project. This is the data exploration; we're going to copy it and put it right in here, and right in here as well, and if you want to, you can also include it right up here, so we have it in all three places. Again, once you click on these, they will come up. Let's go to the next one. We're going to get rid of this; this one is going to be our Tableau projects, so actually let me just copy that while we're here. So if you have one specific project that you want to include, what you need to do is go in here, click view, and grab that URL. What I am doing is just sharing my Tableau Public page, so if you have tons of projects in here and you want them to be able to see all of them and pick and choose what they want to look at, then just choose this URL that we're choosing right here. So in the HTML I'm going to put Tableau Projects, and let's go like this, and then we will get rid of that hashtag, pound sign, whatever you want to call it, and hit Ctrl+S. Oh, we've got to do this as well. This is going to be terrible, don't use this: "This holds all of my Tableau dashboards." This is bad; please don't do this. I am doing this because I don't want to take forever in a video to make it perfect. And then you're going to do the exact same thing. In this one right here I included four, so I'm going to keep four... no, I'm just going to do these three; I'm not going to take up more of our time. So we did those. I'm
just going to keep these three in for visual purposes, but once you get down here, what we're going to do is delete some of this. So this is our data exploration, and where's our Tableau? This is our Tableau right here, so Tableau Projects. They're separated by these articles, so what we're going to do is start right here and go down, down, down to right here. This is going to get rid of all these other articles, or what they're calling posts. So we're going to get rid of those and hit save, and now, as you can see, we have our header, our first project, and our second and our third. I would include those other projects that we've done in here so that it looks good. This is this footer right here; we don't need that because we don't have anything else in there, so we're going to get rid of that as well, and now we just have this information. Now, I don't have anything where they can do the name, email, message; you can keep that in there if you'd like, but I am going to get rid of it. So we're going to go right here; that's the section, so don't delete the section, we want that. I'm going to delete this footer section, as they're calling it, and now we have this address, phone, email, social. I'm going to get to the social in just a second; it's again super easy. But for the address I just put location. I don't want to give somebody my address or put it on a website anywhere; it's not something I want to do. So I'm going to put Dallas, Texas, and we can keep it like that, and we'll hit save, and it'll have Dallas, Texas. I hate the look of the zeros, so for that phone number we're going to do 123-456-7890, and then email, and we'll put alextheanalyst95@gmail.com. If you have issues with this, you can email me; I will try to respond to all your emails. I get a lot, so I will do my best, but that is my actual
email, if you are curious. Now that we have this, we also have the social media. I want to display my LinkedIn, and I also want to display my GitHub. So what I'm going to do right here is go over here and do LinkedIn. Perfect. So I'm going to take my LinkedIn URL, and I am going to get rid of these first two because I'm only going to include two. For this one I'm going to do LinkedIn, and then right here I'm going to replace that with linkedin, and what you're going to do is put this link right here. Then we're going to go get the GitHub, so let's do GitHub. Oh, who is this? Sign up? What is going on? Let's just go back here; that was something I was viewing a while back or something. So we're going to take the GitHub and put that right here. It already has the GitHub. Is this supposed to be lowercase? I think it is; let me see if this is lowercase as well. Yeah, so do it like that, do it lowercase; I forgot that that was how they did it. Oh, that's the label, that doesn't matter as much, but this right here, the class, is actually the important part, because when we go back here there is no LinkedIn image, but when we save it, it has the LinkedIn image, because it's already a class that was created in this HTML template. So we have that. Let me bring this full screen really quick, because there are a few things that we couldn't see in that screen. These right here are things that we could not see before, and these as well. So what we can do is go down here, copy these socials, and replace them right here so they can have those, and then we're going to get rid of these two right here. And this says "This is Massively", and we're going to change that as well. Let's make this full screen for the first time; feels good. I hate doing split screen, but I do it for you guys. So this
is Massively, and we're just going to get rid of these two. It's called the navigator, the different tabs; we're going to get rid of those two tabs, and then for this one I'm just going to call it Projects, and once we go back and update all this, you'll see those changes. So we made those changes. Here's our social media stuff; we're going to copy these two and replace all of these with this. Let's save that and go back. So now, as you can see, those two are gone, this says Projects, there's only two right here, and if you click on it, it's going to go to my LinkedIn, or your LinkedIn when you do it, and this will take you to the GitHub. So it is all working as intended; this is great. When you scroll down and it says Massively, we can change that as well, and we should. Let's do that really quick; we'll just say Alex the Analyst, and we'll update that, and there we go. So in a nutshell, this is a lot of it. We need images, and I don't think I set this up for this video, so I'm going to cut myself off for like 2 seconds and go pull those images in, because it could take a few minutes and I don't want to waste your time, and then I'll come back. So I'll see you in two seconds. All right, so I just pulled over the images that we are going to use. Let's go to the downloads; they're right here: housing, Tableau, and COVID. If I open up this COVID one, this is what the image looks like; this is what we're going to use for that COVID project. So I'm going to copy these, go into the Port website folder that we just made, go to images, and insert these in here. Now that we have those images in here, let's go back and see what we've got. So we just put these images in; you'll have this folder right here, and you can open it up and see all of these that we have. So all we're going to
do is go and replace these temporary images that they had for us, and we should be golden, and then we're going to actually upload it to GitHub and create our website for free. So let's go right down here. This is our very first one, our data cleaning in SQL, which is with the housing data. So this image right over here, it says images/pic01.jpg, so JPEG, I don't know why I said it like that. This is the housing one, so what we're going to do right here is type housing, and it'll autocomplete for us, so that housing image should be in there. The next one is the data exploration in SQL; that was with the COVID data, so we're going to get rid of this and say covid, because that is the image that I have right over here. And then the last one is (excuse me) Tableau, so let's go right over here and do tableau. Oh, I've got to save that; Ctrl+S, perfect. And now let's look at it. There you go. Oh, this one still says Full Story; I'm going to go change that, it just doesn't feel right. View Project... oh, that's not how you spell it. Okay, Ctrl+S, perfect. So now this looks a lot better, and when we host it through GitHub Pages, or github.io, this is what it's going to look like. You can add a lot more to it, you can take away from it, you can add as many projects as you want; you can copy those articles, or those posts, and just keep adding them. So this is kind of what it's going to look like, and it was not that hard. I hope this was not too difficult; I really don't think it is. It's really just using a template and understanding a little of the basics of HTML. So we have this all saved already; what we are going to do now is upload this to GitHub. Let's go right over here and go to repositories, and where's the new one? Oh, I need to sign in. Okay, I'm going to get rid
of this part so you can't see it. So we are going to say new repository, and we're going to call it AlexTheAnalyst2.github.io, so we're going to write it just like that. If your name's Alex Jimmy (I don't know why I said Jimmy), it's AlexJimmy.github.io. You can always go back after the fact and change this, so it's not a big deal. We're going to create this repository, and we're going to say upload an existing file, and instead of choosing them, what we're going to do is go right over here, go to this, and just drag it in. Okay, so we're going to take this and drag it in right here, and it'll take a little bit (it has 75), but it shouldn't take that long, so let's just wait for it. I'm taking a sip of water, I apologize. It is literally uploading everything that we had in there, so all the updates and all the changes that we made. It looks like it's done, so let's just write "initial commit" and commit changes. It is processing it, and it should be done very soon, as long as I have a good internet connection. We shall see; stick with me, it's taking its time. While it's loading, let's go over to... oh, there it is, perfect. So here's everything that we have; it has this README that it generated. Let's go over to Settings, and we have this github.io, and if we go right down here to GitHub Pages: "Pages settings now has its own dedicated tab. Check it out here." So it's currently disabled, but we want it to pull from main, and I think it's /docs; we'll see. I'm going to save this. "Your site is ready to be published." Let's open this up. Okay, site not found; maybe it's from the root. Save. "Your site is having a build problem." Let me see if I can actually change the name. I already have an Alex the Analyst one, but I'm going to see... it's already taken. I'm just going to try this one more time. Oh, and now it's working. I have no
idea why it didn't work before, but this is fantastic. It was giving me all this... maybe I was just reading too much into that. I had never tried to create another .io, or GitHub Pages site, on this. So anyway, thanks for sticking with me through all that. So now we have our actual website. It doesn't look the same up here because of that thing that we were just looking at; it should just be this part right here, but this is an actual website now. It's being hosted through GitHub, and it's completely free. If you want to pay, you can hide this from your GitHub. Something I didn't mention: when you're doing this, your repository has to be public. If I change the visibility to private, you will not be able to see the site anymore; if you want to make this repository private, you then have to pay, I think it's like $4 a month or something like that. So it's worth looking into if you don't want to display that on your GitHub. But this is our final product. I mean, it looks pretty fantastic, and you can use any of these templates. There are lots of different templates that are fantastic; they look amazing, they look professional. It's really up to your style. Like, this one looks kind of cool, a little bit edgy for my taste, but this one looks really good too; you might be able to add some more narrative to that one. So again, go through them, make a good choice, and then update it how we updated it. I will include everything that's in here, and I'll keep this on this GitHub so that you can go in there, and if you want to download the images that I used, you can, or you can go find your own. Just try to get HD images; go to Google Images and search for whatever image you want, and try to get an HD image. With that being said, that is the entire project. I
hope this didn't go too long this may have gone like 30 or 45 minutes but at the end which is where we are now we have an entire website it was completely free and I hope that you can host your projects and create more projects I will be coming out with more projects myself that hopefully will be interesting to you in the future so with that being said thank you guys for joining me for those of you who stuck it out to the very end you are fantastic post your website on LinkedIn and tag me in it because I love seeing you guys do these projects so I'm super excited to see all of these that you guys tag me in on LinkedIn and whatnot so with that being said this is it I hope you learned something I hope that it worked for you and I appreciate you watching be sure to like and subscribe below and I will see you in the next video [Music] goodbye what's going on everybody welcome back to another video today I'm going to help you create a data analyst resume [Music] now when I say data analyst resume it's not that much different than a regular resume except that it's going to be catered for a data analyst job in just a second we're going to take a look on my screen at a sample resume I'll have the template in the description so you can just go and download it and fill in your information but it's a fantastic starting place to actually creating your resume when we're looking at this resume we'll take a look at each section kind of dissect each part of it and then at the very end I'll give some extra tips on what you should include and how to actually write your resume as well so without further ado let's jump onto my screen take a look at the resume and see how you can create your own data analyst resume so here's our sample resume I'm just going to walk through the entire thing super quick and then we'll break down each section individually I'll give my thoughts and some tips on each
section and remember you can download this exact thing in the description below I'll have a link I'll probably put it on my GitHub or somewhere else but it'll be free to download so you can go ahead and do that but let's zoom in just a little bit so at the very top we have our header we have just some basic contact information then we have skills then we have projects and notice the projects are up here at the top and we'll get to that later about the order of where you should be putting your things then we have work experience and then we have education so really quickly I'm going to zoom out and I hope you can still see it the order is actually quite important now there is one piece that is not in here right now and that is a summary section I don't have a summary section on my real resume I just don't think it's useful or helpful you can include one and it would be right up here at the very top now why do we have the skills and projects at the top well it's because most people who are trying to break into data analytics don't have any experience in data analytics if I am reading this resume as a hiring manager and the first thing that I see up here is experience and it's not analyst it's a teacher or a nurse or something I'm going to be like this person doesn't have any experience I don't want to hire them the first thing that you want to have in your resume is something that is good for the hiring manager to see you should put all your best stuff at the top that's what I believe so I think that these skills are really strong a lot of great skills and then these projects are all really good projects now this is just a sample they are real projects they're just not ones that I built myself it's just a sample so then right here we have our work experience now if you're like I said a nurse or a teacher or a lawyer or something that's not
relevant to data analytics you want that at the bottom and then you're going to want to tie in some things in these descriptions and then the education at the bottom my education was terrible okay I had a bachelor's in recreational therapy which had nothing to do with data analytics so for a tech job it was not good I always had mine at the bottom so let's start at the very top and walk through each section so at the very top you want to have maybe a title but for sure your full name you definitely want to include your phone number if you're okay with them calling you but definitely an email for sure include things like a LinkedIn profile or a GitHub profile you can also put your portfolio in fact I highly recommend putting your portfolio because it just looks good and if they check it out that's a really good thing and then your location because sometimes your job is going to be location based whether you're in Dallas or another metropolitan city it's just nice to have that on there this should be the simplest one to fill out if you haven't built out something like a portfolio you just don't include it but this one should be the simplest one right you're just putting contact information maybe a link to a website next we have the skills section and this one on my own personal resume I have at the very top I typically recommend anyone who does not have experience who is trying to break into data analytics to put this at the top as well and have these skills and know these skills that's important but when the hiring manager first sees this there's just going to be a mental check okay they have the skills that we're looking for let's move on to the rest of the resume you want as many mental checks for what they're looking for at the beginning I'm going to keep repeating that this is how I personally write my skills so I write something like SQL and then I'll say SQL Server MySQL PostgreSQL now I have used all these
different types of SQL in my actual job if you haven't done that and you're just starting out maybe you put something like subqueries stored procedures joins whatever the actual things within SQL I don't recommend that as much because typically people know what SQL is like if they use SQL they know what SQL is so they're just going to expect that you know those things now for something like Python it's different because there are packages and libraries within it so you can specify I have worked with pandas in my actual job and I look for people who know pandas as well because we use it so actually specifying these packages or libraries is really helpful so this is how I would put these things on a resume now this is another resume this is our sample two I'm going to maybe include this one down below although I don't like this format as much but if you like it you can use it but here's another way that you can show these skills I want to show you both ways we have like Python and the libraries underneath it I've even seen it to where people will write out almost like let me go down here they'll write out like a narrative they'll do Python and then they'll have like a colon and then they'll say used to manipulate data and I'm not spelling that right in pandas dot dot dot and they'll write it out you can do that as well again I like bullet points because it's to the point it's exactly what you need let's get rid of this one real quick so this is the one that I like so that's the skills section let's move down to the projects now the project section is almost primarily for people who are just starting out once you get experience typically you maybe have one project on there or no projects at all but the project section is used as kind of in lieu of actual experience right I've always said that you need to build projects not just for your
resume but also for the interviews so then when you get into an interview you can point to these projects and say yes I've used SQL I did it in this project and they may have seen it and you can walk them through how you actually used it it gives you more credibility than just saying you know how to use SQL so within the project section we're going to have a project like this one says data science job market exploratory data analysis so this is a personal project and then within it they did some really great stuff here's usually what I recommend and this is in here which is you specify what you did you say I used Python and what did you do to analyze this and gain insights in the job market then you walk through some of the things that you actually did things like regex techniques you used pandas Matplotlib you built a word cloud these are keywords that somebody will look for and they even highlighted them which I personally like and do myself they highlighted these things so that the hiring manager is actually seeing them making sure that they're bold so that they are catching their eye so I personally do this and I recommend this that's all it needs to be it just needs to be I built a Tableau dashboard doing this from this data set I cleaned it in SQL and you show those skills something that's important in both the skills section and the project section is using and highlighting your skills as much as possible especially if you don't have any experience if you've never had a job before once you have a job and you come down to like the work experience then it kind of speaks for you but if you don't you want the projects and the skills to speak towards your skills and credibility so we have this right here now one thing that's not in here that I actually do recommend is a hyperlink maybe right here or actually this being a hyperlink to the project because they might read this and be like we work with you know data science job market data I
don't know and then they'll click on this link and they can see your work that is the one thing that I would change in this other than that this is exactly how I would have it very similar to my own and a lot of this that I did I actually took from other resumes and formatted how I prefer and like it so again some of this is personal preference and you can change it however you want that's just how I like it so that is the project section now we're going to go down to the work experience section now this person does have a little bit of analyst experience so if you don't that's okay but you put your previous experience now here's what I recommend if you've been a teacher for 15 years you've been a nurse for 10 years you've had 10 different jobs don't put all your experience on here maybe put your last two jobs going back maybe three years I don't recommend you filling it up because it's not going to be super relevant unless you're applying for a healthcare data analyst position and you have a nursing degree then it's relevant and that experience is super helpful because it's domain experience right then you may go back five years just use your discretion but what you need to include of course is your title where you worked your location and the dates that's standard for almost any resume but within here what you really want to do is highlight again the skills if you can if you can't that'll change but in here he says implemented a new reporting using Excel pivot and VBA which reduced processing time by 50% these types of quantitative information I reduced time I saved the company money I did something quantitative putting that in here is always helpful always highly recommended although it can be tough to measure these things right typically what I recommend especially if you're first starting out is to highlight skills if you're a teacher you've probably used Excel and you've probably used Excel for closer
to data analytics than you think just in a teacher way and not a data analytics way but you can reword these things and make them sound good if you are a nurse like I was saying you've used Excel you've used a health information system you've used some type of database talk to that include that in here and it can be hard to write these out and I'm going to show you a way in just a little bit to help you write these out or give you ideas we'll get to that in a second lastly we have the education piece this is again really simple at the very bottom education what your degree was where you went and when you actually went and if you have some helpful things to include you can do that now you can include other things in here as well like boot camps if you went to a boot camp or you could also include things like a GPA although I don't personally recommend it GPA has never been anything that I've ever cared about or I've seen anyone care about so you don't normally have to include it one other thing that you can include at the very bottom is something like certifications I personally don't put a lot of stock in certifications unless it is one that I have recommended in a previous video like the Tableau Desktop certification if you're applying to a job that uses Tableau that actually could be really good so definitely include that but ones on Udemy ones on Coursera or like my Alex the analyst boot camp that I have on my channel I wouldn't really include that in your resume it's mostly for learning if you get something like the Tableau one or the AWS cloud one or the Azure cloud one those are all actual certifications that can help you and give you credibility towards a certain skill now really quickly let's just take a glance at the other resume this is resume 2 so we have the education at the top it doesn't have to be at the top unless
it's relevant in which case you could put it at the top we have a skills section then again this is the projects same projects and then work experience this is just a little bit different order so you can do it like this as well it's a different way you can write the skills and you can also include a summary section as well so that's the meat and potatoes of how I would create a data analyst resume now writing it is actually a different beast right you have to actually write it out get something on the resume and then apply using that resume but it can be hard to come up with these ideas so I just want to show you something that a lot of people have been using I personally haven't written a resume in a little while so I haven't used it for my own resume but I will and that's using ChatGPT or some variation whether it's on Bing or you get some different version or some new product that's out there at the moment I'm just going to show you how to do it in ChatGPT some of the things that you can prompt it to do and that'll be it I'm just going to show you some ideas that it can generate for you to help you write these things all right so here on my screen we're on ChatGPT if you haven't used it I'll leave a link in the description I also have a whole video on how to use ChatGPT for data analysis so I like ChatGPT now I've already written out these questions because I don't want to wait for the responses but here's what I asked it to do and you can do some variation of this whether you're a nurse or a lawyer or a teacher or whatever I said I'm a math high school teacher trying to become a data analyst how can I use my experience on my resume to help me get a job this is just to help provoke some ideas and it says you most likely have some skills emphasize your quantitative skills so those are some of the things you can focus on showcase your ability to communicate complex concepts which is really important in data
analytics being able to present information which teachers have highlight your experience with technology hopefully you're using some type of database for students or Excel or something like that you can highlight that and showcase your ability to solve problems now the next thing that I asked it was I built a covid Tableau dashboard using Tableau how can I add this to my resume and then it's going to tell you exactly how you can do that it's going to say include the link to your dashboard which I also recommend provide a brief description highlight your data visualization skills include screenshots or images which is what I would be putting in the project itself not on your resume then provide context for the data all really good stuff now the last thing is kind of what I'm trying to get at as a whole it can help you write things so I said write two sentences highlighting my covid Tableau dashboard to add to my resume and it's going to say developed a covid Tableau dashboard to visualize pandemic trends using real-time data sources demonstrating strong data visualization and analysis skills so this can help you generate those descriptions in your work experience it can help you generate the descriptions in your projects and this can be really helpful to just generate some ideas because I personally really struggle with highlighting my skills and descriptions within those things this can be a way to help you do that so don't just copy and paste but let it prompt you let it give you ideas now the last thing that I want to mention is just your overall resume as a whole the template that I use the template that I recommend is very friendly to these automated systems that check your resume if you did not know most companies especially big companies use these automated systems that scan your resume see if it has what they're looking for and then that resume if
it gets through that system gets passed on to a recruiter or hiring manager typically most companies don't go straight to the hiring manager so you need a resume that can pass through those initial systems and pass those tests the resumes that I've shown you today will do that they have bullet points they have the keywords they have everything you need that's partially why I recommend this type of resume other ones that have images and different fonts and different stylings can cause issues with these automated systems where it just doesn't read it properly or it doesn't read the right words that you want it to read so just know that these types of resumes have different uses right you're not just handing it off to somebody to where they can read it and it needs to be visually stimulating really what you need is for it to get through those initial systems and these resumes if you write them well and you have good skills and the right things on your resume will pass through that first layer to get to those hiring managers so again be sure to download those they are completely free I highly recommend using them I think they're really good just put in your own information be sure to build out your own projects don't just keep the ones that are on there because you'll need to be able to speak to them sometimes recruiters or hiring managers are going to ask you about them how you built it what you did and you can also point to those projects in your actual interview so I hope that this was helpful I hope that your resume is ready to go I hope that you are ready to start applying for those data analyst jobs thank you guys so much for watching I really appreciate it if you like this video be sure to like and subscribe below and I'll see you in the next [Music] video what's going on everybody my name is Alex Freberg and today we're going to be walking through my top three tips on how to
use LinkedIn to land a job LinkedIn is a fantastic place to look for a job it's its own little ecosystem where career-driven people can connect and talk with one another and help each other find jobs I personally have landed jobs through LinkedIn and so I know how effective it can be let's jump over to my screen and I'm going to show you my top three strategies that I have found to be the most successful for actually finding a job so I'm logged into my completely anonymous account here and I'm going to show you the very first tip which is you shouldn't be just applying to a position you should be actually reaching out to the recruiter and I'm going to show you exactly how to do that so the first thing that we have to do is actually find a job that we want to apply to so let's go to the job section right over here and let's search for data analyst and let's do that in Chicago because why not so it's going to search for data analyst positions in Chicago we have one right here let's see what it looks like because I don't want to apply to jobs that I'm not extremely qualified for so this is a job that I want to apply for and before I actually go and apply to the job I want to see if I can reach out to a recruiter and talk to them beforehand so let me show you how to do that so what we're going to do is actually click on the company right here it's going to take us to basically the LinkedIn page for their entire company and we're going to scroll down we're going to go over to people and then we're going to search for recruiter so if we scroll down all the way to the bottom we can see that there are recruiters that actually work in-house for this company and so now would be a time where I actually reach out to some of these recruiters and I say hey I see a job that I really like I think I'm really qualified for it and I would love to talk more about it with you you can ask them things about the job to make sure that it is a good fit for you and
then I highly recommend asking them what they think is the best way to apply for this job to make sure that your resume gets noticed and you get an interview since they are a recruiter who works at this company they may be the one who's actually going to be looking at these resumes and so they may give you a tip on the best way to actually apply they may also just ask you to send them your resume directly so that they can look at it or maybe later on down the line this actually is a person who is reviewing resumes and so if they come across your resume they may be able to put a face to the name and that may give you bonus points I'm going to leave a template script in the description in case you don't know exactly what you want to say to this recruiter and it'll give you just a baseline of some of the things that you might want to say number two is to actually ask for a referral now if you don't know what a referral is it is where somebody who already works at the company can refer you to a specific job and that might get you a little bit higher on the list for interviews so I highly recommend reaching out to somebody who already works at that company and asking if they're willing to be a referral for you I get people reaching out to me all the time asking me to be a referral for them for my company and nine times out of 10 I say yes I always ask to see their resume first just to make sure that their resume aligns with the position at least a little bit but there's basically no harm in me being a referral for somebody in fact I may actually get a bonus if that person ends up getting hired and so for the most part there's almost no risk for the employee in actually being a referral and so a lot of times they will say yes now let me show you how to do that and it is very similar to finding a recruiter so we're going to stay on this people section but instead of searching for a recruiter we're going to search for a job title that is similar to yours so let's actually
see if they do already have any data analysts and if they do that is the person that we're going to reach out to because that is the person we'll probably have the best connection with so it looks like we have six employees and let's scroll down and so it looks like all these people have data related jobs and so I would reach out to these people and say I saw an open data analyst position at your company I would love to know more about your company as a whole and then you can talk to them a little bit and then in the end your goal is to ask them for a referral and if that happens that is fantastic and then you can go ahead and apply for the job and mark them as a referral for you now my third tip on how to get a job through LinkedIn is to actually have recruiters reach out to you so let me show you how to do that the first thing we're going to do is actually go over to my profile here and we'll click view profile now there's a few things that we want to make sure that we have on here so that recruiters can reach out to us the first thing that I want to do is to actually come to this section right here which is show recruiters you're open to work and when I click on this I can actually choose some job titles and some locations where I actually want to apply and have recruiters reach out to me and so right now I have data analyst I have the DFW area which is where I live I can also add titles like business analyst and then maybe junior data analyst entry-level data analyst or things like that that could potentially have recruiters reach out to me for positions that I'm interested in and then you can say that you're immediately and actively applying and you can also say that you're only looking for full-time positions or contract positions and then you can actually add this to your profile and I only want recruiters to see that because I do currently have a job at McDonald's and so I don't want McDonald's firing me because I'm looking for employment elsewhere so
let's save that and it looks like it was updated and so now when recruiters are searching for candidates for a specific position you will be on that list so that they can find you and reach out to you something else I should mention is on your profile page I would try to have some type of professional photo so that you look really good I would also try to include data analyst somewhere in your title if you already have a data analyst job and you're looking for another one you can just have your previous company but if you're looking for a data analyst job you can always put seeking data analyst position or something like that another thing I think is really important is having really good descriptions for your previous work I don't currently have this but I would go a little bit into the work that I actually do make sure that the experience matches kind of what you're looking for if you do have previous experience if not that's totally fine the next section on your profile page that I would recommend looking at and updating is your skills section and so you want to go in there and make sure you have all of your relevant really data analyst heavy skills on there specifically hard skills because soft skills aren't going to translate too much into this section I would definitely stick to things like SQL Python Tableau Excel things that data analysts are going to use because this is where they're going to actually look and see if you have the skills that they are looking for for that position when I was only applying to job postings and not using any of these strategies my success rate was 0.4% which means out of 1,000 applications that I filled out and sent my resume to I only heard back from four of them to actually get an interview but with these strategies I was able to get that up to 10% and at my best I was able to get that up to 15% but that's because I was applying to a lot fewer positions and I was targeting jobs that I really wanted to work
for and so I put in more effort in order to contact people and work with recruiters in order to get that job I genuinely hope that these strategies can be helpful for you especially if you're trying to apply for jobs right now thank you guys so much for watching I really appreciate it if you liked this video and got anything out of it at all be sure to like and subscribe below and I'll see you in the next video hello everybody congratulations if you are watching this that means that you completed the data analyst boot camp if you haven't don't keep watching this is only for people who have completed the data analyst boot camp playlist on my YouTube channel woo all right now that we filtered those people out I'm going to show you how you can download your certificate now that you've completed the data analyst boot camp I will leave a link in the description but let's go on to my screen I'm going to show you how to actually access this and download your certification all right guys don't go around telling people this or sharing this but this is our data analytics boot camp on the Alex the analyst GitHub right up here I will have this link in the description what you can go ahead and do is you can come right here you can download this you'll just right click or click download and you just do something like save image as or you can come to this one this is the one that I think is the real money maker here this is the certificate of completion for the data analytics boot camp I have not my signature but my name as well as my position with a blank space right here to fill in your name feel free to put this on LinkedIn or Twitter or Instagram and tag me in that because I would love to just say congratulations because honestly it's a lot of work to go through all those videos and learn all of those skills so congratulations I hope that you learned something along this journey a new skill a new thought a new idea and I'm proud of
you I'm proud of you for putting in the work it's not easy but you did it and I hope that you came out on the other side better for it so congrats I'll see you in the next [Music] video