Transcript for:
SQL for Data Analytics: Full Course Tutorial

dat a nerds welcome to this full course tutorial on how I use SQL for data analytics this is the course I wish I would have had when I first started learning this tool as it's a super easy to learn skill when taught properly and I feel like you can strongly master the basics by the end of this video now this tool is one of the most highly sought after skills in the data science Industry don't believe me let's check out the data for skills required in data analyst jobs SQL is the number one skill in almost half of job postings for data Engineers it's still number one and it's more than half for data scientists it's the second most important skill right after python now SQL stands for structured query language and it can be spoken either SQL or SQL if I needed to perform analysis on my computer of a data set in a database sqls the programming language I'll be using to extract out any insights and this skill is something that I've been working on refining for years all the way from my first job as a data analyst in Corporate America to even my last role with Mr Beast sequel is everywhere so what will'll be covering in this course well I've broken it up into three chapters we'll be starting at the very Basics and then building way up to an advanced section building our very own Capstone project for the basics chapter we're going to start at the very beginning including breaking down important Concepts behind SQL databases from there we're going to jump head first into actually practicing SQL queries focusing on the basics of common keywords simple analysis and even more advanced topics like joints now don't worry if you have no coding or analytical experience as this chapter is designed for you now after we get the basics down we're going to then jump into Advanced Techniques we're going to walk through creating and setting up your very own database locally on your own computer we're then going to move into more complex analysis using things like CTE and subqueries now you're easily going to forget these basic and advanced concepts unless you actually get into implementing what you learned and the best way to learn something is to build something and that's what we'll be doing in the Final Chapter we're going to be working to solve a real world problem problem analyzing the top skills and jobs in the data science Industry we'll be using the data set powering my app data nerd. Tech that provides insights on data science job postings and by the end of it you'll have your own custom project that you'll be able to showcase on GitHub for the world to see now I'm a firm believer in open- sourcing education so this course is completely free and all the resources that you need for it are included so I'm going to be providing you locations to actually run your SQL queries along with the data sets when you build your project but I'll be honest YouTube isn't paying the bills like it used to so I have an option for those that want to actually support the course while also rewarding you for doing this for those using the link below to contribute I'm going to give you some extra perks that will help you learn SQL even faster you're going to have access to multiple practice problems within each section they break down the problems into easy medium and hard and even provide the solutions for completing the problems which will reinforce all the skills that we're learning in the video additionally I'm providing all the course notes for this video they're going to provide even more details and resources for the sections along with breaking down the queries in the videos a lot further and then once you've completed the course I'm going to award you a certificate of completion now one quick shout out before we jump into the first chapter on Basics explaining what is SQL I want to give a shout out to Kelly Adams She is the brains behind the majority of this content in this course she's a full-time data analyst that I've been following on LinkedIn for years now and I was beyond ecstatic whenever she said she'd team up with me to build this course I'll be linking Kelly's LinkedIn and also portfolio below the like button so be sure to check that out all right with that let's actually jump into the basics chapter all right welcome to this chapter on the basics and for this we're going to be focusing on the very fundamentals in this section and also the next so in this section we'll be breaking down what exactly is SQL and then after that I'm going to give you access to all the different data sets so you can work the problems right alongside me now what I'm showing here are the course notes and for those that have contributed to the course you'll have access to go right alongside me and see all the different notes as I'm walking you through this so with that let's actually dive into understanding what is SQL and for this we need to understand two major Concepts behind this the first concept is around where you'll be writing and executing your SQL queries and the other main concept is around databases where our data is actually stored so let's dive into this first a database is a collection of data and these things were made to hold massives amount of data an Excel file Can Only Hold around a million rows of data but a database can easily hold millions of Excel files so if you do the math that's a lot of rows of data now about those databases there's two major types of databases relational and non-relational relational databases include things like tables very structured data non-relational databases are for unstructured data we're going to dive into each in a little bit but first let's go back to where we're actually running and executing SQL now whenever we write out SQL code in order to request data from a database this is called a query and this SQL code can do a whole a heck lot more that just request data a common acronym associated with what it can do is called crud and it stands for create read update and delete with SQL keywords like create an insert into you can add new records then with keywords like select you can retrieve specific data you need next with the keyword update you can modify existing records and then finally when the scariest of alls if you need to remove records you can use delete now these are all commands we're going to be covering throughout the basic and advanced section so it's nothing you need to commit to memory right now so now we understand the basics of querying and also a database but where the heck are these databases even stored there's a couple major options you can either run it locally on your computer or you can use a server now for this this course we're going to be running the database locally it's going to save a lot of costs and also I feel you learn a lot by running databases locally before you jump into something more complicated like a server local computers are also used during development and testing so you may find yourself from time to time actually downloading a database on your computer so you don't mess with the real one so where a real world databas is typically located and you have two major options one is a company server and that's something that's actually hosted internally and you have has the company full control of This Server this option is nicknamed on Prem now the other options like Cloud providers do this for you companies like AWS Google cloud and aure are known for this and frankly for a price they'll manage all the different headaches of managing your own server so this is conveniently called serverless and that doesn't mean there are no servers it just means you don't have to have the headaches so now that we understand the databases can be stored either on your computer or on a server what are the different databases we can store in these locations well there are two major types relational and non-relational databases relational databases store data in rows and columns this is very much used in transactional applications say I had a database of job postings that companies go to daily in order to transact with an update of what jobs are available you could have one table to keep track of all the different jobs and then you could have an Associated table to keep track of all the different companies and their information there would then be things like IDs within the two in order for you to relate the different tables hence relational databases this course will be primarily focusing on using SQL to interact with relational databases but the other type of databases are non-relational databases commonly known as nosql but nosql doesn't stand for not SQL it stands for not only SQL these type of databases support a host of different options including unstructured data so not only can it include more structured data like column and row based data but also you can have other types like key value payers graph-based and even document based these are using your everyday applications to track your connections on LinkedIn or to manage your docs in Google doc so what are some popular database options well if we navigate to the stack Overflow developer survey from 2023 over 76,000 respondents voted on what is their database of choice if we look at the top five options four of these are relational databases and only one is a non relational database and for this we're going to be focusing on the number one and three option the first one is pronounced postgress you don't have to pronounce the SQL at the end the other one part of the Creator is called SQ light but I'm pretty lazy so I'm just going to call it SQL light so we covered everything we needed to know for databases but where the heck are we going to be actually writing and running these queries well there's a a few popular options for this and it's commonly done in an editor whether that's from a database provider a cloud provider or even just a plain old code editor let's talk about database provided editors first for popular options like postgress and MySQL they'll actually provide you an application to use on your computer to access their databases inside of you'll be able to track status of your different tables and actually look at the progress of it and you can even do something as simple as writing SQL queries now if you're using something like a cloud provider like Google cloud or Azure they're also going to give you the option inside of a web browser in order to interact with your database I keep a lot of my data on Google cloud and it allows you to monitor all the different tables that you have along with an interface to go in and query the data now the third option is the one that we're actually going to be using but I want you to be aware of the other two and this is a code editor during the second and third chapters of this course we're going to be focusing on using a popular option VSS code which allows you to not only manage all your different SQL files that we'll be using for this but it will also allow us to run these queries for the beginner section we're going to be using a code editor right inside of your browser this bad boy is called sqlite viz and it's completely free to use this is a popular open- Source option that allows you to run queries right inside your browser and in the next video we're going to talk about more about what the data and databases we're using inside of things like sqlite VI and vs code all right so now you should have a basic understanding of how and where SQL queries are run and then what and how databases are used in order to store data so with that let's actually dive into seeing where you're going to be playing with all this data all right see you in the next one all right welcome to the section on an intro to a course where we're going to be focusing on understanding what data sets we're going to be using for this and how we're going to be running those data sets but before we even talk about any of these data sets we need to understand what problem we're going going to be solving for the entirety of this course you're going to be taking the perspective of a job Seeker with the goal of identifying what are some of the highest paying jobs along with what are the most optimal skills to learn so in order to solve this problem you need a data set that has this type of data in it well luckily I've been collecting this type of data in my app datan nerd. te and inside of this app it Aggregates job postings in the data science Industry in order for you to see things like what are the top skills for data analyst additionally it also includes pay requirement for the different jobs along with detailing how these skills are paid based on a certain job title so the data set that we're using for this course is from that app and specifically it's for data science job postings in 2023 here is an ER or entity relationship diagram about the different tables in this data set we have four major tables one fact table containing all our different job postings then two Dimension tables that contain key information about the skills required for those jobs and then finally a dimension table around the different companies that are posting these jobs so what the heck are these fact Dimension tables you probably never heard of fact tables contain the core data for the analysis they measure and record actual events so in our case the different job postings because of this there's typically a high volume of records and then there's usually some sort of foreign key to associate it to a dimension table the dimension tables describe attributes or dimensions of the data so in our case skills or company data these are really important in supporting filtering and grouping different sets of data they usually have fewer rows of data but generally are more descriptive anyway going back to the drd the job postings fact table has a job ID column in it and this is a unique ID to all the different job postings this ID is inside the skills job dim table as a foreign key and we have this in a separate table because there can be multiple skill IDs associated with the job ID and conversely this is associated to the skills dim table which actually has the list and the type of skills that are associated to the job back to that job postings fact table we also have a company ID column and this column is associated to the company dim table which in this table has key information about the companies such as its name and its URL that you can access anyway that's all theoretical talk let's actually dive into actually looking at the different data we're going to be using for this and for this you can navigate to this URL right here and it's could provide access via that tool SQL light viz in order to analyze and visualize our SQL queries we're going to dive more into this tool in the next section but for now understand that I'm running a query up at the top and the results are appearing below anyway this is the job posting fact table so for this analysis that we're going to be doing I'm going to be analyzing it from the perspective of a data analyst because I'm a data analyst and that's what I'm interested in but you actually have different job titles available to use here's a list of all the different options you can not only look at data analysts but also data scientists or data Engineers if you wanted to you can explore senior roles or even roles that are slightly related such as business analyst machine learning engineer software engineer and Cloud engineer so feel free during this course to modify any of my queries in order to adapt it to what job you care more about now the data set includes postings from around the world I'm primarily going to be searching it for remote jobs but you can adapt it to any location that you find fit and the last thing to app about is it also includes a lot of companies from diverse perspectives not only tech companies but also Commerce companies now there is one other data set used in this course and we're only going to use it from time to time specifically we need it for our arithmetic operations this is a fictitious data set Based on data science job invoices there's only one table inside of it and it's conveniently named invoices fact we'll go over more of this data set in a future section so we covered what data we're going to be using for this but what data base are we going to be using to actually host all this data well as mentioned in the last video we're going to be focusing on postgress and SQL light this beginner chapter will all be based on SQL light it's a lightweight file based database and it's ideal for small to medium apps with zero configuration it's so small in fact that it's actually running right inside your browser whenever you access this tool SQL like Vis for the second and third chapters in this we'll be using postgress it's an advanced open-source relational database suited for large applications and it supports a lot more complex queries so because it's open source you'll be able to download this database on your computer for free and use it now now I do want to call out there are some slight syntax differences between the two and syntax is the set of rules and structures you have to follow while programming for the majority of course commands like this are going to work in both these different databases it will only come to very Nuance situations where these two databases are going to have differences in it but really we're not going to find a lot now that we've laid all this ground work let's actually jump into doing all these different queries for those that purchase the course notes whenever you navigate to a section it's not going to only show showcase what databases you're using for this but also we'll have an overview of what we're covering for each of the Min sections within the chapter you'll have some basic notes on the topic along with the query and the expected results when we get to the practice problems within the chapter it will have similarly not only the data set but then also the questions and the solution all right so with that enough me yapping let's dive into it see you in the next one all right welcome to this basic section where we have this mini section on the basics we're going to focus on on a handful of keywords that I use on a routine basis and I think you need to have committed to memory in order to know how to use and so we'll be actually implementing it and running queries with these commands to learn more about them first up it's actually instead of one keyword we're going to focus on three and this is going to be saying select star from and a database table select is a keyword that identifies a column or columns we want to connect to and from identifies the table or tables we want to connect to let me show what I mean so routinely I'll need access to some sort of database for my job I'll have to reach out to something like a database administrator and they'll give me access anyway select and from is how I'm going to verify that I have access to the database so in this case let's say I got this email and it says I have now access to this jobs 2023 and it has all these tables in it I want to go now verify that I have access to it so let's verify we have access to this and you're going to navigate over to the URL that I have linked right here and this is going to be our little workspace that we're be working in in order to verify access now when you pop this open you should have two separate Windows one an upper window where you're going to write the query and then a lower window below this that is going to display all the different results that you have we can see here that this query is already executed I can open up this left side pane right here and actually look into the database that we have access to in this which is jobs 2023 then all of the tables within that database so these four tables and then you can further break it down or see what's in it by expanding this and then this displays the columns within that table so I'm going to go ahead and close that out cuz I don't want to really look at it and that first statement should already appear on your window right here you can go ahead and run it again if you want by either pressing run SQL query and it'll go through and fetch results looks like there's 33,500 available you can also as a quick shortcut press controll enter and it will also run the query now anytime I get access to the database this is one of the first commands that I'm running in order to see all the different columns that I have access to and to ensure actually that I have access to that table so in this case job posting is fact if I wanted to I could also see another table in this data set I can put in company dim and press control enter and see that hey yeah I also have access to this one now these commands must be written in this order you would think you would want to say from a table select these columns um because that would be the order but if you tried to actually run this control enter it's going to give you this error syntax it has to be in that specific order of Select and from and I'm going to be sprinkling also best practices in through this so you'll notice the select and from in this case are all capital letters these keywords are not case are not case sensitive so in this case I could have lower case and also uppercase mixed in with each other they're still going to work the values for things like the table name are case sensitive depending on the database we're using SQL light in this case so if I were to use a Capital C it's still going to work whenever we start using postgress in the basic or in the advanced section it is going to be case sensitive so it's best practice to just leave this lowercase as it was intended in general these uppercase keywords and then lowercase of things like the column names or table names makes it easier to read and so that's what you want to do especially whenever you have to get into troubleshooting later you want to be able to easily read it and able to edit it now in this example we selected all columns but there's going to be cases where you don't want to select all columns and I'd argue in most cases you don't want to select all columns it's very resource intensive for the server that's hosting this database to provide all those columns so you'd want to actually fine-tune the specific columns you want to use so in this case let's say I wanted only the two columns of job title short and job location instead of using this asterisk which is used to select all the columns instead I would just specify the column so I'd say job title short and also job location I'm going to put a comma between each one of those to say that hey we're moving on to the next one this case when I press controll enter executing the query we can see we have now these two columns shown now moving into some best practice for this one I like to have it to where whenever I'm selecting multiple columns I put it on separate lines to make it just more readable now with specifying these column names from a table they can actually be specified by saying the table name itself and then using this dot operator to then showcase the column after it I can do this actually for both of the these here because they're both within that same table and whenever I run this query going to get the same results now this is going to be more important later on whenever we're combining multiple tables and you actually need to specify where this column is coming from within a table but in this case because we're only using one table I'm going to say it's not necessary and I'm going to go ahead and remove it but I just want you to be aware of it for the time being and show you that it still works all right so we already understand that quering all the different Columns of data set can be intensive for a Ser EX for really big data sets another thing that we want to do besides limiting The Columns is actually limiting the amount of rows and this can be done via the limit statement we can spef specify any number of values after this and this will specify the number of rows we want it to return right now these results are returning around 33,000 job postings back so in this case let's say I only want to provide back five I'd put limit five and that limit five needs to come after the select and from statements and it's the very last statement you'll ever be writing inside of a SQL statement I'll press controll enter and now we have five rows retrieved we can also see that for the five rows retrieved it only took 026 seconds running this without that limit we can see that it takes around three times longer to get all those different rows now we're working with a relatively small data set so you're like look this is only milliseconds yes whenever we start getting into millions or even billions of rows this is going to add up and limit is going to save you some time one note on best practice so I talked about indenting these lines to make it more readable typical best practice is to have anywhere between two and four spaces personally for me I just hit Tab and it automatically inserts four spaces into here I could also do multiple different tabs and if I were to press control enter in this case it's still going to execute whenever your SQL statement is sent over to the database itself so effectively all that indentation is removed and in this case like it's one line although this is hard for a human to read the database itself can obviously go through it and understand what it needs to be next up is the distinct keyword and this one is going to follow select in order to select a distinct amount of values within the rows of a column this is going to be a very resource intense type of calculation because it has to go through all the values in a colum column and then Aggregate and find what are the distinct values so let's take for example this job title short column right here I'm going to go ahead and reset this to include all the different values if I scroll down it I can see a lot of the values in here such as data scientist or repetitive and then that engineer I've seen multiple times that analyst so let's say I want to get all the unique values from this in this case I'll remove that other column and then specify distinct and from here press contr enter to run it now I also didn't have that limit command remember so now we have 10 rows retrieved and these are all the different values or unique values from this job title short column now this isn't only limited to categorical data like in this case let's say I wanted to get the unique values from the average salary column now I could do the same thing of select a stinct salary year average column from this database press and control enter and then from here I now have all those different unique values within it looks like there's over 2700 unique values now you may notice I have two queries within here and you can see that each one of these queries I've added a semicolon at the end the semicolon sometimes you'll see me use it sometimes you won't it's used to symbolize that this is the end of the SQL statement that you want to execute in this case we have two separate and distinct uh SQL stance uh no pun intended with that and in this case with our editor we can only show one result of a SQL statement mainly that last one in this window right here when we get to the Advance section and we start using a different code editor you'll see how you can actually run all these different queries and have them populate in different tabs anyway for the time being anytime you're running queries I like to just keep it one in there and whether you have a semicolon or not it's up to you I'm going to leave it to my colon free next is the wear statement and this is used in cases when we want to filter out particular data we already talked about specifically now you can do things like select columns or limit the amount now we can go down even further and actually filter into what we need all right going back to a query where we have the four Columns of interest that we're looking at from our job postings fact table and we can see we have a lot of different values from the different job typ short so we can use the wear statement to filter down to something more I'm more interested in such as data analyst the wear statement will be always directly after the from statement and we want to specify a condition we want to specify that that the job title short is equal to data analyst now we can go into either other condition such as greater than less than all that we're going to keep it really simple for now now the main thing to note right is that data analyst is in quotes in this case because it's a string character and it's immediately following that column whether there's spaces in here or not that doesn't really matter I just do this for readability pressing control enter we can see that we've now filtered in on this data and it only has data analysts so previously we had around 30,000 rows now we have less than 10,000 rows and we're not just limited to categorical or that text data we could also do it for numerical data so in the case of that salary yearly average column we could say hey we want to have everything that's greater than $90,000 and then now on scrolling through it we're can see that we do we have around 16,000 values that meet this condition now now that we're adding all of these more complex conditions this adds a great use case of now adding comments comments are denoted by these two dashes or tax if you're ref from the military before something that follows it so anything to the right of these two dashes is ignored no matter where those two dashes are placed and this is the standard practice of documenting your code and if you do something complex keeping track of it typically a comment would come at the beginning of a query to specify what's going on in here and in this case it's right at the front executing this query you can see that it ignored it you can even put it after a SQL command as you can see here it's grayed out so that means that basically showing to you that it's going to be ignored and so in this case when pressing control enter still executed and now I have a documentation for why I maybe filtered out this certain subset of data the other common use case of this is if you're debugging or troubleshooting say I didn't want to care about this column or this column right here I can then put those two dashes in front of it whenever I run the query it only Returns the coln colums that don't have the dashes in front of it now the other comment besides a single line comment is a multi-line comment and it's denoted by a forward slash asteris and then an asteris forward slash that forward or back slash just forward slash we're good anyway this is a multi-line comment and it's commonly used whenever you have to use well multiple lines but when you have to be more robust in your description on what you're trying to convey let's say I wanted to convey to whoever's going to be using this query why do I keep on using these four Columns of interest over and over again well I can leave a note section in here that says hey these are going to be the most common you're going to see throughout these queries and here's the reason why well they provide the most comprehensive view of the data and there's the common factors that most people are looking at um we also don't want to call all the columns because it will just take up too much processing anyway if we go and run this query we can see that hey it is in fact shown and just to show the point if I were to remove these operators right here and try to actually try to run this query it would give me an error near the note because it's going to try to execute this and things it's SQL syntax so I just fix that up real quick all right and the last keyword that we're going to cover before we get into some practice is order by as you guessed it this is used to specify a column and order it by that value so let's go ahead and use this order buy is going to come almost directly last the only thing that would come after order bu is a limit keyword in this case let's say we want to actually Aggregate and be able to see the salary from lowest to high pressing control enter it's going to go ahead and execute now it's going to notice here that we have null values first and that's because null means that there is nothing there for that value so it's even less than if you will Zero but if we scrolled over some pages that I've done here we can see that now yes when we have values inside of here it is going from that low to high those values are slowly rising and in an ordered case now this is lowest to highest which actually you if you write ASC and go ahead and execute this it's doing the same thing here it's doing in ascending order but now we don't need to necessarily write uh ASC um especially as it's sort of repetitive but if you do want it in descending order from highest to lowest you would then specify DEC and executing this query we're now getting it from the highest to lowest this would be really good in this case we want to see what is the highest at salary we can now see it 650,000 now you may be curious what order should you be writing these commands in well here's a little convenient little cheat sheet on what you should be following for this follow it goes select from where group ey having order by and limit now I don't have these commands memorized I even asked chat gbt for demonic on how to memorize this and it said super fast works great beneath the surface and has outstanding balance and Leadership I not going to use that mainly we're going to go with trial and error that's how I've memorized this over the years you're going to make mistakes get out of order but you'll understand what the error message is and you'll adjust from there all right now that we covered all of those Basics keywords now what your time to go in and actually practice this I would play around with all those different keywords that we have also exploring all those different tables that you have available right now now for those that purchase the course notes and certificates you have some specific practice problems available to you um there's about five for this section right now we'll be adding to this um and in it you'll have things like the solution and also the results to make sure that you're on track and actually following along and doing it correctly all right good luck with these see you in the next one all right let's get into this section now on comparison and also logical operators and you've been exposed to this previously when we use that wear keyword before to filter for job titles of data analyst we used an equal operator this is called a comparison operator we're going to go over all the different types for this these type of operators are used after the wear or even the having clause in this section we're only going to be focusing on the wear Clause we'll move on to the having as we get more advanced now in addition to this also we'll be using the logical operators which allows even more advanced functionality to find tune how you want to filter data since we already understand the fundamentals of the equal operator we're going to move on to the not equal operator and we can use use this symbol of basically these two Pacman try need each other or just the keyword not and it specifies that it's not equal to All Right Moving in back into the data let's say we're get this data right here and we want to filter it further let's say I have some Insider information that says those jobs that we get from the job platform AI Tech jobs.net is unreliable and I don't want to use it that's not necessarily true we're just giving it an example here well I include that wear keyword along with job via and that comparison operator of not equal to providing it what statement I want to meet on and then pressing control enter I can now see that they are removed now we could also use that not operator and that's used directly after the where keyword where not job via uh AI jobs.net this is sort of confusing but this in this case is going to rep return all the job Vias of this so in this case if we wanted to not use it I'd put that equal comparison operator in there run it and then we can see that it now has has this like this so not is a way to do basically opposite of what we want to do now besides something like the equal or not equal to operator for those that have numerical columns we can use things like greater than or less than to actually look inside of there and find a value and find things that meet that condition so in this case I'm going to filter for salaries that are going to be only greater than 50,000 it's also going to remove these null values whenever I do this pressing control enter see that it is in Factor move in case like this I would include something like an order buy for that salary year average and I did this out of order uh because I don't have these memorized and putting this in after the wear statement and actually running it it runs correctly and now we can see okay now they are ordered in this order and it's starting at that 50,000 in this case now you're not just limited to greater than or equal to also you have things like less than or equal to I'm assume you have familiar RT with how this actually works so we're going skip an example on that one and we're going to move into logical operators now starting with the first logical operator and this is great in conditions where we want to meet multiple filter conditions so an example shown if I want to meet a certain salary and job title I can now use the and operator to combine this in this case it's only going to show conditions where both of those conditions met are equal to true so it has to have a job title of data analyst and it has to have a salary greater than 100,000 so let's actually test this query out by plugging that in for that wear and specifying those two conditions I'm also going to leave in that order bu after this so we can read it more easily pressing control enter we now have it now remember it's greater than 100,000 not greater than or equal to so we don't have any 100,000 values in this and scrolling through we can see that like we expect it Returns the conditions that we're trying to meet for this now conversely to that and logical operator we also have the or logical operator and this would be used in a condition where we wanted to see if we meet any of the conditions so as the example shows it could be either data analyst or the salary is greater than 100,000 so we could have something like a business analyst that gets greater than 100,000 or we could have something like a data analyst that technically has less than 100,000 replacing that and with an or and then still keeping that order by so we can get through this quickly I'm going to press controll enter and as we can see we have nothing but data analyst positions for the null values and for values that are greater than $100,000 in salary we have everything that's not necessarily a data analyst it could be a data analyst and here's one at $225,000 so an or condition I'll be honest I don't use or as much as the and logical operator now let's let's say I went to search for salaries between 100,000 and 200,000 technically I could use that and operator combined with those logical operators to say hey I want something that's $100,000 or greater than $100,000 and less than $200,000 but there's actually a better way of doing this and that's using the between logical operator still going to be used within that wear statement in this case it's much more readable and much more concise we and say for a given column we want between 60,000 and 990,000 in this case and this is not just limited to numerical data you can also technically use this for Text data or even dates so I've updated that previous query now to have that between statement specifying the values and using that and statement to join it I'm still doing that order bu to make it easy to look at pressing control enter bam we have everything between including that of which is 100,000 and 200,000 now the last logical operator to cover is in and this I find is more common in Text data so the example shown job location if I wanted to search maybe Boston Massachusetts and also anywhere I could do this with an instatement and then I would enclose all those values whether it's two or even more so let's say in our case we wanted to look for both data analyst jobs and also data engineer jobs written currently it's a little bit too robust and has too much wording as we're repetitive with using job title short and then combining it with that or statement it does however if we went ahead and execute it we can see that yep data engineers and data analysts are only in this but instead I'm going to modify this now to include this in logical operator and then have data analysts and data Engineers within parentheses separated by a comma and this is also great because now let's say I wanted to add something else like a data scientist I can just add that in I don't have to type in the keyword again pressing control enter you can see now we have data Engineers data analysts and data scientist all right now it's your turn to try out these comparison and logical operators for those that have purchased the course notes and certificates you have a host of different practice problems you can work through and test your skills for that all right see you in the next one all right let's actually combine everything we've learned from that basic section and the comparison logical operator section into a more advanced query we're going to do this with a practice problem that I feel would be applicable if I was actually job searching so let's say I'm looking for roles I work as a data analyst and I could technically also work as a business analyst so I want to look for both of these rules and for this though I have some conditions for data analyst I only want to search for jobs that are greater than 100,000 and I know from some market research business analyst pays even less so I want to look for those jobs that are greater than 70,000 now in addition to this I also only want to include locations in we'll say that I'm located in Boston Massachusetts or I want any remote works I'll also include any of those that include the location of anywhere which are remote jobs for this we're going to continue to include those four main columns that we previously did so starting out with our core query we have our select statement of the four Columns of Interest we're want to focus on and then our from statement whenever I'm breaking these queries down I like to actually iterate through it so the first thing I'm going to look for and actually query down on is the location CU that seemed like the easiest I want to get Boston and I want to get anywhere so I add this wear statement for this job location and executing the query control enter we can see now okay we're limited only anywhere in Boston all right the next easiest thing to move on to is instead of just getting both that analyst and business analyst let's actually go down even further I want to get data analysts that are greater than 100,000 so I also have to besides meeting the job locations I have to meet this new condition now I'm going to be putting both the job title and then also the salary condition within this so I can use parentheses to say hey I want you to meet all of this and verify this before moving on to verifying the and statement and what do I mean by that well you remember by our order of operations uh from math the parentheses are going to go first so we're going to meet the condition first of is it dat analyst and is it greater than $100,000 are both these commission conditions met it's true and then in the next case it's going to check the job locations whether it's in Boston Massachusetts or anywhere so the parentheses help control this order now remember we also want to meet this condition of business analysts and salary year average greater than eight $80,000 so I can remove this comment here now I could put an or statement after this so we being this data analyst or this one but now we have this statement we're going to meet and job location and this statement and then it's going to be an or for this one so if I were to execute this because of this or statement now I have this business analyst right here and it's in D Texas and it doesn't meet our condition because I actually wanted it to be either in Boston or anywhere although it does meet our salary condition anyway I'm going to fix this by including this or statement now between the data analyst and the business analyst in its own parentheses to further specify hey I want you meet these two end conditions or these two end conditions then from there once that's met then verify the job location let's check this out and now we have everything we want looks like I have a typo here I did 80,000 earlier I'm meant to say 70,000 from our instructions we had 61 oh now we have 64 jobs available anyway this me to the condition of those data analyst or even business analysts anywhere with our salary conditions the key thing to understand from this is you have to iterate through this query I wouldn't ever expect you to just write this out all in one shot and then try and test it you're going to run into so many errors iteration is the key to success for this all right in this section we're going to be covering wild cards and wild cards are used to substitute one or more characters within a string we can use this by one using this keyword of like along with some special operators like the percentage sign and underscore oh and all this is used within the wear Clause let's jump into some examples in the first wild card to cover is the percent sign which in this case shown here we're searching for analyst and this percent sign symbolizes zero one or more characters so let's say I had some white space or some words before analyst or after it would account for that and only look for those words of analyst all right so previously we've been focusing on a job title short column and this is just a shorten version when we actually look at these job postings more in detail we get to it further we'll see that the actual job title and a list in these is a lot longer and sometimes has a a lot of fluff and unnecessary data in it but it could be a condition for this where we want to actually filter down on something see in this case we have this business data analyst and also data analyst let's say we actually wanted to filter down for anything that says analyst within the job title column well adding this wear statement and then specifying the column using that like keyword and then I can put those percentage signs before and after running this now everything in this job title has analyst within it now it's important to understand right I would want it before and after let's say I removed it to the ending in that case it's not going to find anything with values after the analyst so if I run this I'm only going to find analyst on the end of all those statements and I want to add that back because I actually want to see all the different analyst rules now another way this could be used is let's say in this case right here where I have this it's classified as a data analyst but the job title is business data analyst I could put something instead let's say I wanted to search for business analyst roles I could put in business percent sign and then oops that's not a percent sign and then analyst running this now I'll get things like that will meet the business data analyst that may be categorized as a DI analyst maybe categorized as a business analyst but it picks it up now the percent sign symbolizes zero one or more characters let's say I just wanted to represent a single space in between something in that case I would use an underscore going back to in our example instead of having this percent sign here I'll put an underscore in this case I only want to find those that directly say business analyst so I'll expect to see this one in there whenever I run this query pressing control enter yep 343 I can see that anything has business space analyst I meet this along with anything on the left and right using the that percentage sign all right now it's your turn to go ahead and try this like keyword along with testing out those different Wild Card operators of the percent sign and underscore those that paid for the course certificate and notes you have some practice problems you can work through or those that don't feel free to just try out and see how this actually tests out for these different wild cards all right see you in the next one all right this short section is on alosis and I can relate to this cuz sometimes I want to be a different person and sometimes even column names want to be different names this can be very helpful especially when working with other people to pass off data with these really convoluted column names you can rename it before giving it to them so let's say I want to use this table this core table that we've been querying from the get-go in a presentation and I don't want to have this as you can see a lot of it starts with job which is sort of redundant and then it's just the nameing conventions are just really weird well at least weird for somebody that's not familiar with this database somebody that's not familiar with this database just wants the quick information about what it is real quick so I can go through and add these as statements after these column titles right here and then they're now rename this running this query right now pressing control enter I can see all of them are renamed now as or alas can also be used not only for column names but it can also be used for tables and frequently you'll see whenever people post abbreviations for tables they'll use a single letter or multiple letters at the start of the table so in this case I'm renaming it JPC and then if I wanted to although not required in this case I could put it at the front of these different tables right here and when I execute this query still going to work now we haven't gotten into joining multiple tables and whenever you do this is a very common practice so I wanted to make sure that you're aware but before we get to that is we're going to be doing that now the one last thing to note on this is sometimes you'll read other people's syntax especially when it comes to tables they won't necessarily put this as keyword in there instead they're just going to have a space in between the two and it's still whenever I go to execute this it's going to go ahead and work you can also do this for these keywords in here but this I argue makes it hard to read but you will frequently see in this case here you will see it done without that as keyword all right now it's your turn to try out aliases I have a practice problem for you to go and try out feel free to experiment around not only with those column names but also those table names all right see you the next one all right let's now combine what we just learned with wild cards and also that alias in order to solve a problem I want to continue to build on that last problem that we're looking for data analyst or business analyst roles but also we're looking for those that are not senior additionally we'll make aliases for the columns if they don't make sense now before I attack a SQL problem like this I like to break it down further into what are the actual steps that I need to look for for this so what's common among this is or what's not common among this is we want to look for data or business in it so that's going to be using basically an or statement we do want to look for the word analyst in there and then we don't want to include things like senior so let's jump in the and try it out all right so here's the core query that we're going to be working with I'm going to go ahead and start with Alias this first I just rename the location as location and salary is salary so let's just start attacking each one of these filters one at a time and once again we're going to be needing to use that wear keyword the first one we're going to be looking at is we want to look at only job tells that include either data or business so I built or statement now that captures that of matching the data or business and anytime we want to iterate through this to make sure that we're on the right track press control enter it looks like okay we got Data Business data Data Business okay so it looks like we're on the right track now we want to include anything that would be basically data analyst or business analyst so we want to meet this or condition and then basically the analyst is an and condition so what I'm going to do is actually wrap this in parenthe es CU this is the or state we want to meet and then from there I'm going to go in and put in the one about the analyst all right so now we have the analyst one executing this one looks like now I have anything that has business or data along with analyst in it all right so finally this last one don't include any J tiles with senior followed by the character so this would be another and statement here and we're going to be still searching in job title and for this one we don't want to use that it's like senior we want to match basically a not condition and so for this we can put not right before like to say not like this um do we have any seniors in here oh we got this one right here so let's see if this one's still here after we refresh this query and Yep looks like it was taken away so we're meeting all those conditions all right we got a query now let's jump into learning some more about operations all right let's get now into operations or more specifically arithmetic operations we'll be using operators for this like addition subtraction and even things like modulus and for this we're going to be using the invoice database which is located at this URL as a reminder this is a fictitious data set on data science invoices throughout the year of 2023 we're using it for this because this these columns that it has in it are really good for doing arithmetic operations on now we have a column specifically on hourly rate assigned to a specific data nerd in this data set and let's say that our accounting department is actually trying to figure out whether they should raise these rates or lower these rates so we need to provide them with a data set with these updated rates well we can use something like subtraction or even something like addition in order to increase that rate we can use this within the select statement so this is the core query we're going to be working with in this case and we want to see what would be the hourly rate if we say dropped it by $5 or if we raised it by $5 so to make sure I'm doing this operation correct first I keep that core one that we want to change in here still and I'm going to rename that as Raider original then from there all I'm doing is I have this hours rate and a minus5 and that's going to go through in every single column subtract five and then I'm renaming this as the rate drop so let's actually execute this so we can see now that it's going through and it's executed this for a rate drop subtracting all these different columns by five conversely I can add this last statement here for the rate hike of adding $5 and we have that now now of now that's band or whatever we have this whole table made we can send this off to accounting for them to Crunch their numbers now I demonstrated this operation use inside of the select statement but we're not just limited to that it can be used in a whole host of keywords that including things like the wear order bu and group bu that we've used previously and even more that we're going to use in the future so let's actually look at a case of how we can use this even outside of that select statement and for this we're going to be using the multiplication oper Ator figuring out basically a total pay using hour spent times an hourly rate so going back to that core query that we have let's say that accounting comes back now and they say hey we want to only have the projects that are going to cost a total of $1,000 and more after the rate hike and this project total after the hike is going to be equal to that rate hike times those hours spent we don't have hours spent right right now so I'm actually add that here so we can see it pressing control enter got now the hour spent we're going to do it times the rate hike I'm also going to remove this rate drop because we're not concerned with that right now okay so we want a filter for this so we're going to make a wear statement and we want this where the rate hike times that hour spent is going to be meet this condition of greater than 1,000 okay executing this query looks like it's going through usually I like to double check something like this whenever I'm doing a calculation so I'm going to go ahead and add this up here where I have this rate hike of hours rate plus 5 times hour spent as the project total go ahead and execute this to actually see what it is and scrolling over can see that hey yep everything here looks like it's greater than a th000 now as a best practice I want to go ahead since it's defined here replace it down here and then now executing it we still have the same thing and it's more concise and easier to read all right next up is the modulus operator and this operator is used to return the remainder available after a division specifically let's say we have a number of hours say something like 10 if we wanted to find out how many hours past eight that they worked we could take 10 and do the modulus of 8 and this would give us two th showing that somebody worked 2 hours past this so let's say that accounting comes back to us and they want to do some analysis mainly on projects that take between 1 and 2 days they want to find all the projects that don't necessarily exactly end within 8 hours or 16 hours so the modulus so the first thing I want to do is actually visualize this and see what this is going to look like so pressing control enter I have now this hour spent modulus 8 and assigned is this extra hours scrolling over a little bit just to double check my work we can see that at whenever it's 14 we would expect eight goes into 14 once and then it has a remainder of six so kind of wants to analyze these where these conditions are true or where the modulus is not necessarily zero so from here I can just put this wear statement into parenthesis at an end and then from there to find when extra hours are basically not equal to zero or greater than zero and we got it all right now it's your turn to give it a try we have some practice problems for those that purchase the course certificate and notes you can go ahead and go and try out these different operations within the query all right that see you in the next one let's now get into aggregation functions and after using those arithmetic operators previously probably notice that yeah it's nice to be able to do this arithmetic operation across a row but what happens if we want to do it down meaning what we want to sum a row or even count up an entire row on how many values it has in it well this is where aggregation functions come in we have things like sum count average Min and Max now I commonly use these within things like the select statement but they can also be used with keywords like group by and also having for group by we can actually aggregate depending on what value we selected a certain subset in order to sum it up or even aage it or maybe we can also use things like the having function which we've been talking about from the get-go and finally getting introduced to which allows us to actually filter data by an aggregation for this portion we're going to be using the job postings data set specifically we're going to focus a lot of time on this salary column and doing a lot of aggregation methods to it and the first aggregation method to talk about is sum and this is a function that allows us to inside the parentheses next to sum place place things like a column name and from there it will Aggregate and actually sum up all those values so if I want to sum up all those average yearly salaries I could do this and itd also give it an alias of salary sum pressing control enter you can see we have over $2 billion worth of salary and job postings here next aggregation method is count and similarly to some you're going to place either a column value or in this case you can always do something like a special operator the asterisk or Star to actually select all the different columns so in our case say we wanted to see not only what is the sum of all the salaries but we also wanted to see what was the count of all the different rows I could do a new line sending in count specifying that star symbol and then put from there as count rows from here pressing control enter it's going to aggregate the same now I have the salary on the left and then the count of all the rows on the right now I can also use count in conjunction with another keyword distinct in order to filter down on distinct values let's say in our case we want to get a distinct count of all the different job title short values that we have in here say count then add distinct and then from there the actual job title short and then I probably name it something appropriately like job type totle pressing control enter we can see that we have 10 different job types now besides sum and count the other major types of aggregation functions are average Min and Max and as you expect they work all the same we're placing the column value that we want to average inside the parenthesis and then for there it's going to provide either average Min or Max so I could use that average column across that salary year average column and from there we're going to get this pressing control enter an Al an alias of salary average we can see that the average salary inside this column right now is around 123,000 now I can take this a step further adding we onto this and then from there specifying the job title short column of data analyst and from here press in control enter we can see that oh unfortunately that analy the average salary is little bit less than the real average of uh only $93,000 and if I want to see the spread of the data analyst salaries I could add in those men Max functions in order to actually go through and do that and from there we see we have a Min of 25,000 and that Max of 650,000 now we saw how that average salary went from 125,000 down to whenever we filtered it just for data analyst down to around 9,000 so we wanted to dive deeper into this well this is a great way of using the group by keyword and this allows us to specify a column of interest in this case we're going to do the job title short in order to now filter down further and in our case actually be able to see all those different job titles and see where the disparity is so I'm going to get rid of this wear statement in that case and I'm going to add Group by specifying job title short column now I can run this but um it does group it but I don't know what these groupings are so we need to add that into here so I add it in at the top as Jobs pressing control enter we now have this in and we can now see the average Min and Max I'm going throw in a quick order Buy in order to group this by the average salary and as expected with that average being around 125,000 we can see that machine learning Engineers are middle data analysts have some of the lowest whereas senior data scientists and Senior Engineers have some of the highest now another popular keyword besides Group by that need have TR belt is having and having allows us to use an aggregation method to then filter by unfortunately this one my complaints were sequel where you should be able to do this inside of the wear keyword but it doesn't allow it so let's say we want to only analyze this further but for job postings that have basically a certain amount of values if I insert into here I want to find out how many actual counts of these different jobs they have first so we're going to add that into here so I've added this job count column in here to basically total up how many there are in there all right so actually looking into this list further we see we have quite a bit for some of these but then something like Cloud engineer we don't really have a lot of values in it and it may be skewing the data so let's say we want to exclude anything that has less than a job count of 100 well we can insert a having keyword and this needs to go between group buy and Order buy and I can specify it in this case hey we want to do this for the count of job to shorts that are greater than 100 pressing control enter that cloud engineer disappear years is no longer one of our problems now you may be like Luke can't we just use a wear using this job count cuz it's no longer using this aggregation function well removing this having function right here and then keeping in that wear when I actually go to run this I get this error there's a misuse of aggregate count you can't use it inside of wear it's still trying to do this aggregation inside the wear that's what we have to use having all right now it's your turn to dive in and try some of these practice problems we have for aggregation functions for those that purchase the course certificates and notes a lot of different ones to test out and try and go through all right that see you in the next one let's get into a practice problem combining what we've learned previously with the aggregation functions and also with those arithmetic operations specifically we're going to be going back to that previous problem I was talking about the accounting department asking for us to provide numbers around what would happen if we would increase the hourly rate by $5 and so for this we're going to calculate not only the total earnings for a project but then also a theoretical situation of what would happen if we increased it by $5 so previously we were taking that hourly rate displaying it as that rate original and then also doing the addition of five to show that rake hike right now I also have that project ID right there conveniently that we're going to be breaking a down further by so let's first go at tackling that first problem of calculate the total earnings per project which we going to be using hour spent times hours rate here I'm going to first start by just adding in that hour spent cuz I want to keep track of it to make sure I do the calculations right from there I'm going to use the sum function in order to sum up the hour spent times that hourly rate I'm going to rename this project original cost from here I'm going to iterate through this so I'm going to add this up right now now there's a mistake in my SQL query if you haven't caught it yet as you can see we should have multiple different project IDE s right we're trying to Sumit based on that project idea that's at least that's what we want so we need to put a group ey in there and now I'm specifying project ID doing this bam I have this for all the different project IDs and doing some rough math that you always should do whenever doing any of these queries to make sure it's doing correct looks like it's doing that hour spent times rate original to get that project original cost now I add in this for that projection of what it's going to be if we were to increase that hourly rate by $5 and putting it within that sum function and then labeling it as a project projected cost now with this we can actually see it but there's a lot of data here so I'm going to go ahead and actually cling this up to actually make it more visible and contrl enter now we have all of the different comparisons right here in here all right now it's your turn to give it a try this short section is going to be going over null values and we previously encountered null values whenever we were looking at the salary column of our job posting data set and we filed filtered ascending so from low to high those null values appeared first a null is a field with no value and that differs from something like a value where it's zero because that does have a value in it although it is zero or when we're using some sort of string character maybe in it that it contains like a space that still would not be a null value because something is actually there taking up bytes of data we're going to be using this with the wear and having cause in order to filter out and look at this type of data so going back to that previous example where we're actually pulling those common columns that we have then ordering it by that salary we can see that said null values are appearing first so we can use that wear keyword in order to filter this so I can specify that the salary yearly average is null press control enter yep it's still null or I can change this to more specifically is not null all right told you this section was short of null values now I do have a couple practice problems for you now to go in and try it out all right see you the next one all right we're coming almost to the end of this basic section on SQL queries specifically we're going to be focusing in this one on joins now these are the four most common types of joins and we're going to be diving into each one of these separately talking about what their use cases are and an example of each so if youall back from the beginning of this video when we first introduced that job posting data set well for the majority of the time in this section we've been primarily focus on queering this job postings fact table right here and there's actually other tables associated with this in the database and right now you have access to all these different tables that we just spoke about inside this jobs 2023 database which I go ahead here and I actually just showcase each of the different ones quering into it one quick note you may be wondering why in general would you even have these different tables well in the case of that skills dimensional table we can see we used a skill ID to relate it and we have a skills column and a type column we could technically have this all of the skills and type inside the jobs fact table but this is going to be very repetitive also in the case that we have the SK skills of job table or skills job dim table it allows us to be able to Aggregate and put in more than one skill so there's multiple reasons why you would actually have tables external to another table so the first join and by far the most popular join that I find myself using is a left joint and what this is going to be doing here is what it's trying to Showcase with A and B are two separate tables and in it for this it's going to whenever we use this left join it will return all of the contents of table a and then whatever we're matching A and B on it's only going to return the contents from B that it matches a on so going back to quering that job postings fact table I have on here the job ID job title short and then Company ID so we have a lot of different job title shorts and I want to see what the company ID or what is that company it is if I went over to that company dim table as we're quering it here I could see what it is but it's over here I want to combine it with that fact table so the first thing I'm going to do is add this keyword of left join and then specify the table company dim have a little typo right here and we're going to then give it an alias I don't have to type this dim every single time now in addition to this we need to specify how we're going to be actually connecting these two tables we need to use the company ID from this job posting facts table to the company ID of the company dim table so putting that in there and then going ahead and executing it they're now connected but I haven't really brought anything over so now I can add that column from companies now remember this is why I previously talked about you can put the table name in front of a column and this is very important to do in order to make sure you're not confusing different column names between the two because you can see from here companies also has a company ID name but more importantly we want to get that company name so I have it here included as name from here I'm going to go ahead and press control enter to load this all in and so now we see we've now used this left join in order to join that company name along with those job titles this company ID isn't really necessary at this point nor did you need to actually do do it from the get-go I was just doing it for illustrative purposes I'm going to go ahead and run this and actually remove it and then also going in and go in to renaming those column titles okay pretty cool left joins now for those that would like a more illustrative view of how this is being done that job postings fact is that table a that we're joining we're keeping all rows in it so it's that yellow and then we have that company's DM which is that table B in our case we have all the different companies available here so whenever we put these two together we then have that company name right available inside of a results table all right and as you guessed it if there's a left join there's probably a right join and this is going to be somewhat completely opposite we're going to be basically for the B table we're going to say hey we want to match all the contents of this and keep all the contents of this but anything that matches from that a table we want to join it now similarly we're going to be connecting backwards we're going to be connecting the company's table to the job posting Factor table the problem is though is there's more records within the job posting fact than there are in the company's table so only a subset of data is going to be returned so going back to that illustrative example from before of merging those job posting facts to the company table if we're using a right join in this case it's going to do a very similar thing now the thing to note is the company's dim table only has one occurrence of Netflix or meta Experian whatever it may be and there's multiple in the job posting facts what's going to actually happen is when we get a results table there's whenever we join this up we may have that company appear more than once now so in this case I can go ahead and just replace that left one with a right press control enter we'll see that we have right now 33585 rows whenever I do it again this one's a little bit longer takes a little bit more longer time to actually compile and we have it back with all those previous rules so you may be like Luke what's the purpose of this right join then if it's basically the exact opposite thing of this left join well let's say for example in this case if we had more companies listed in that company dim table than we did in our fact table itself well in this case we could spot those irregularities by allowing us to now do this right join and ensure we are actually providing all the key details of that company table so although not necessarily common all the time it does come up from time time so you should be aware of it all right next up is inter jooin and what this is going to do is when we have an A and B table whenever we go to combine them it's only going to return the contents that appear in both so if there's something in a that's not in b not going to appear in a and not going to appear and B so let's say there's a scenario where I want to look at jobs but I only want to look at jobs where a skill is available this is where inner join is going to come in so this is a fictitious making of our data set but we have the job postings fact table along with the job IDs and let's say that this skill job dim table is connected to it and let's also say it's an order by that job ID there aren't any necessarily skills associated with this job ID of three and four so we don't want to necessarily see that now this combination of these two tables is a single inner join in itself but as we talked about earlier there's actually another table so we need another inner join in order to do the same thing of filtering for only skills that are available in this case whenever we look at these skill IDs we can see that 1 three and two are used 1 two and three which correlate to python R and SQL so we're not going to return anything that has Scala or Java in this case so let's actually build this query out working from that left to right the first thing we want to do is an injoin on that skill jobs Dimension table which we're going to be connecting on that job ID so I specified inter jooin specifying the table of skill jobs demm I've shortened that table or change the name to skills to job and then specify what I want to meet it on I want to meet it on that job ID of both of those pressing control enter making sure that it works still working and in now in this case we're seeing that in the case of this job ID machine learning engineer they have quite a bit and we're actually whenever we get into job ID number one we skip over as no longer included because apparently it doesn't have any skills associated with it so we're already seeing good that we're meaning on what we need to do and if I want to actually see all of those different values or those skill IDs that I see I can see okay that's why the machine learning engineer has so many and then moving on to the business data analyst but we're not done with this because we also want to make sure that we have everything only including those skills inside of here that are relevant to the Jaws table so we want to also do an injin with this table so I'm going to go ahead and add in that inner join specifying the skills dim as a skills table and then specifying that hey I'm matching the skill ID of these tables I'll also go ahead and add the skill now pressing control enter to go ahead and execute it bam we have now the machine learning engineer with a lot of different skills that it's requiring um business to analyst and so on so inner join along with that left join are two two of the most popular types of joins that I'm going to be doing on a frequent basis depending on whether I need to one return all the contents of a table or two for the inner join only return those where it meets in both those tables the last joint to talk about is a full outer join basically if you're going to have two tables whenever we join these together it's going to combine them together no matter whether they're are matching ones on the a table or the B table frankly as a data analyst I never have a need for this and don't really have an example use case for his go through and I think it's sort of a waste of your time so we're not going to really focus on this but mainly we're going to just have it in the back of your mind to know that this is available all right now it's your turn to give it a try if you've purchased the course notes and certificate you have some practice problems for you to go in and jump in and try out both a inner join and a left join all right that see you the next one as we're coming to the end of this basic section there's a concept that you need to be aware of especially as you become more advanced in SQL and that is the order of execution of SQL queries now whenever we send a SQL query via this flow path right here the statement itself into a database it goes through a few different processes before it then goes and actually is executed and this execution is broken down into multiple steps once this is done within the database itself these results come back to you now if you notice we have this step first of this parser now I don't want this to be confused with an earlier concept that we talked about that handles in the earlier steps of this in between the parser and Optimizer and that is the order to write commands as we've talked about multiple times you have to go or you have to write this query syntax in a certain order you can't have a wear statement at the very beginning has to follow that from statement once our SQL query we've built has meet these conditions and has gone through that Optimizer it then gets into the execution phase and that's where the order of this execution right here is comes into importance and it follows this general order that we're following right here now you don't need to have this order memorized mainly I just remind you of this because it's going to be important one day down the road when you're writing a SQL query well this order is actually very important because from the perspective of the database this ensures that it's processed efficiently and logically right now we're working with a relatively small database and every time you queer it notice it probably returns it in less than a second but if you get it to billions of rows of data this can take seconds upon minutes and by understanding this order we could potentially early in this phase filter down our data in order to speed up this query so just keep this tool in your tool belt whenever you're approaching a very complex query which we're going to be doing in the advanced section and also within the portfolio project if we need to speed up a query we're going to start here by looking at the order of these and working earlier up in this sequence to filter down and make our data as small as possible to speed up those queries all right that see you the next one all right let's wrap up this section on the basics by doing a practice problem and for this we're going to be focusing on the joins in order to build a more complicated query using a left joint specifically I'm going to find something I feel that's pretty interesting for a given skill I want to find the number of job postings for the skill itself and also the average salary as expected we're going to be using that left join to combine our skills table to our job postings table also have a few hints along the way go help us build upon this process let's jump into it first thing I'm going to do is query the skills table or the skills dim table renamed as skills and then from there just get back all the different skills that we have available so right now we have 225 skills let's build upon it further after now that we've gotten the skills from the skill dim table I want to count how many job postings mention each skill from the skills to job to gym table so the first thing I'm going to do is add in this left join in order to go in and combine that skills gy do uh skills job dim table which I've renamed skills to job linking it on that skill ID now going to that table we can see have a skill ID and the job ID so with this we can get a count of the jobs in there okay and we have this back but we now only have it for one python because now right we're using an aggregation function so we need to group it so I'm going to group it by the skills column and then we have this along with those number of job postings all right now following along my plan of action we've now have the skill names and the number of job postings I want to now calculate the average Yer salary for the job posting associated with each skill so we need to bring in another table in specifically back to that jobs fact table in order to get that salary that we need so once again I do another left join bringing that job posting facts importing it as job postings linking it on that job ID making sure that it still executes it's running fine so now we have access to that average year salary so I can now go ahead and add in the aggregation function for that salary year average I renamed it average salary for skill and we'll go ahead and enter and I forgot a comma so we'll go ahead and add that and press control enter and Bam we got it all right now anytime I get any type of results like this um we pretty much have gone through our entire path um but anytime we get um something like this we want to order it by something I find the most for me I want to see it ordered by salary and for this I'm going to specify for the average salary for skill I'm going to put it in descending order and Bam there we have it actually opening this up all the way this is actually pretty insightful you get to see a lot of these skills especially around web development are more higher paying and then more basic skills like something like Microsoft list vb.net get out of here weback are uh lower paying all right sweet now it's your turn to give it a try and after this this wraps up the entire basic section and now we're going to be moving into more advanced concepts so take a second to reflect on what you've learned so far you really have come a long way I mean look at this we just wrote a 12 line query and hopefully if you're keeping up with this you understand everything that's going on in this that's pretty impressive all right with that see you in the advanced section then nerds Welcome to the Advanced section of this course and this portion will be on about what the advanced section's on and more specifically how we're going to be setting up a database locally on your computer using postgress now since we're going to be have this database locally you're going to be able to do a lot more with this specifically going be able to do things like manipulate it creating altering dropping tables doing a whole host of things and actually getting into the core of the power of SQL now also in this section we're going to be obviously covering a little bit more advanced topics such as how to handle case Expressions subqueries CTS and even unions so it's a whole host of things that I use from time to time and it's who of you to know so previously in the beginner section we were using inside your browser sqlite viz and this actually loaded inside your web browser the SQL database that you were then writing these queries to and executing and then getting those results back now this tool is great especially when you're practicing and learning SQL in order to get it up and running quick without any setup but this is not how you're actually going to be interacting with it in the real world and so I wanted to provide a scenario similar to how you're going to be doing it in a real world scenario so instead of using this app we're going to be using popular editor option VSS code and VSS code allows you to track all your different SQL files that you're using and then also connect to a host of different SQL databases but we'll get to VSS codee in a little bit as the more important thing that we need to actually get installed is the database itself and we're going to be using postgress with this which is an easy to download program which about to go through and able to get it running locally on your machine now as a quick refresher why are we using postgress for this well looking at the 2023 stack Overflow survey the results from this concluded that postgress is one of the most popular options among developers now there's a lot of other popular options in the data lindex Community especially things like MySQL SQL light like you've already used and SQL server and the concepts that you're learning in this course regarding specifically with the SQL syntax can be applied to these different databases so even if you don't use the number one popular option it's still going to be able to have and use those skills in other databases also last dat a point I promise before we get into the downloading with postrest itself it is topping the charts at the most admired and also the most desired SQL database technology that most admired data point is the proportion of users that used postgress and want to continue to use it whereas that desired metric or the blue is the proportion respondents who want to use this technology and in both of these cases it exceeds all of its competitors and so that's why I think it's a great database for you to get into all right let's get into downloading postest and for this you don't have to have a lot of space your computer it doesn't take up a lot at all and so you're going to navigate to this URL right here feel free to just Google postgress and it'll take you right here anyway I'm going to click download and from here we're going to select our operating system of choice I have a Mac right now but I verified that the installation instructions were exactly the same for a Windows user so you're going to follow the exact same steps now I'm going to download the installer now this has a few different options available and also I'm not sure why had to select the operating systems at the beginning cuz you're just selecting it here anyway we're going to go with the most current version so 16.2 I've done this previously on older versions like 10 so don't worry if you have an older version this still should still work so navigate over to your download folder and you should have some sort of installer package like this and just go ahead and click on it it may prompt you if you're sure you want to open the internet yeah go ahead and open it now we're going to walk through the setup process for a lot of these things we're going to be keeping it it's default such as this location right here and as far as installing all these different packages I do want them so we're going to continue once again default for the data location next is password and this is very important that you remember this password because this is going to have to be used in order to access the database and even start it up so if you need to write down what you're going to have for your password for the port number don't change it if you want to change it from this 5432 make sure you write it down as well for the local keep default I don't know what this is okay setup's not ready to install so let's get to installing it all right looks like it's installed and it's asked if it wants to launch stack Builder which can install additional modules to help you out we're not going to need this right now so I'm not going to launch it up and I'm going unclick that check mark So now let's launch postgress and any time you're wanting to actually query this database that we're going to be building on here you need to do this so for some reason you restart your computer and bring it back up you need to restart postgress anyway going to navigate into here this folder postgress sequence 16 and go into PG admin there's the admin dashboard of how you're going to control it and it's loading up so I have it up and running over on the left hand side this little pane is going to have all your different servers and database and all the configuration with it and then right hand side sort of like think of it like your editor where you can do a bunch of configurations anyway I'm going to go ahead and click this open and I have an older edition of postgress well databases at least on my system we're not going to worry about that if you only have one you should see that 16 we'll go ahead and open it up now this is where you have to have your password memorized so I'm going to go ahead and put it in I'm also going to click save password and then go Fed so now that I've done that it's going to ahead and started up my databases I can see this in here by navigating into databases and right now there's only one these are the contents of the database there's only one database in there and it's conveniently titled postgress anyway now Beyond this inside of PG admin is basically beyond the scope of this course as you can come in here and actually run SQL queries to thus interact with your database but and what happens if you want to use another about database such as like MySQL or SQL Server you're not going to use PG admin for this so we're not going to be focus on this for the remainder of this course just understand that you do have to launch PG admin in order to get your database running all with that let's get into getting the code edor so we can actually run some SQL queries all right in this section we're going to get into installing vs code which going to be our code editor in order to run our SQL queries now what is a code editor and this is a location that you can go and organize any type of code you may have in our case we're going to have these SQL queries written out inside of a SQL file and so from there we want to keep it inside of a code editor now another term to keep in mind is integrated development environment also known as idees Ides are basic basically a text editor on steroids it's debatable of what exactly VSS code is technically it's a code editor that we're going to be using but because of the additions we're going to add to it it functions like an IDE nonetheless the terms are going to come up and I want you to be aware of it so once again if we go over to stack Overflow to see what are the popular options among developers right now Visual Studio code also spoken vs code is by far the most popular option on the market so that's frankly why the reason why we're using it and I just frankly like it in general now VSS code is a code editor for a multitude of languages but there are editors that you can use that are specific to SQL and I want you to be aware of them in case down the road you want to decide to use it specifically down here if we scroll down data grip that's one from the team over at jet Brains it's a very popular option although paid and another popular option that didn't make that list there but I know about it from the data analytics is dbeaver this is a free crossplatform database tool for developers and it supports things like postgress an app like this I would say is actually more powerful when it comes to running SQL queries as you can do a lot more functionality with it in regards to SQL but then if I want to run other different languages such as python or R I can't do it inside of this that's why as me as a python user I stick with vs code anyway let's get into downloading and setting up visual studio Studio code so navigate to this URL right here where you're going to go and then download it you can download it for either Mac or Windows I would get the actual stable version and actually get it this installation process is a lot simpler as once you unzip that file that it gives you in like your downloads folder you'll have the app available from there you'll take it and you drop it into your applications folder from there actually navigate into the applications folder and then start this bad boy up so getting into a quick overview of how VSS codee works over on the left hand side we have our activity bar and I have a few extra icons right here don't worry about that too much but overall I find that I'm using this right here this explore and this is going to display all my different files on the lefthand side I can also do things like search through my files and then even add extensions which we're going to get to in a second so let's actually create this project folder that we're going to be working within so I'm going to navigate back here to the explore and click open folder I'm going to navigate to to wherever I want this project folder to be I keep mine within a developer folder and I'm going to add this new folder of where we're going to title it en titled it SQL project data job analysis I use underscore in between this it's just for coding purposes you don't have to if you want to but I like to now with this folder selected I'm going to select open now that we're inside of this folder right here the titled SQL project data jobs I can access or add a file by directly just selecting this icon of adding a file and then naming it appropriately so I named this one test. SQL and that's going to create a SQL file now we have the file located on the right hand side and this is where the editor portion of it is so I just wrote some SQL code just to Showcase that you can do this and it's possible you may have the option down here or you should have the option down here to select your language mode depending on what language you're using you'll want it to actually check this if you don't happen to see this down here come down to this Bottom bar down here right click it and then from there you should be able to select editor language anyway we want SQL for this cuz we want to check it to make sure that it's correct if I were to select something like python it's it's not really checking anything but right here it's going to throw some errors now because it understands with this yellow and this red under statement that this isn't correct python so I'm going to select SQL and those errors go away anyway we can write SQL now in here but the problem is we don't have it connected to our actual database so we need to set it up to where VSS code inside of our editor now connects to this database that is running on our computer so navigate over to the activity bar and select extensions for this you're going to go into the search bar and look for SQL tools doesn't have any spaces in it so that's why it took me a little bit defined anyway we're going to go on ahead and select this this whenever we click install is installing an extension to thus supercharge this code editor of vs code to allow it to connect to SQL so we're going to go ahead and click install now if you read the front print of this it has that to use SQL tools you'll also need to install the appropriate driver extension for your database a driver is now used to connect this extension to a database going back to the search bar I'm going to type in SQL tools and then also postgress SQL and pressing control enter we have it right here this is called SQL tools postest SQL cockroach driver so this installs uh the driver for all these different things and click install so I can go ahead and close out of this no longer need extensions going to explore so SQL tools should be installed on here right now unfortunately if I zoom out by pressing uh command back I can actually see it it's located right here it's pretty important for me so I'm going to just put it right up there anyway I'm going to zoom back in and some other things are going to hide anyway click on SQL tools so it should have add new connection so we're going to be adding a connection of this postgress this initial database in here anytime you want to add a new database you have to use this or set up this add new connection which also can be done bya this icon right here so I'm selecting add new connection selecting that it's postgress this is just the core postgress database so I'll use the connection name of postgress I'll keep all the default settings the way they are for database I'll keep it postgress for username I'll also name it postgress it's going to be really easy for the password we're going to be using SQL tools the driver that we've installed before that that way to verify the credentials scrolling on down I want to say test connection and it says hey this extension wants to sign in using the driver credentials I will allow it and then from here now we need to enter in that password from before four that we access postgress with enter again in press enter boom so successfully connected I'm going to save connection now all right so closing out this now this postrest database is not the database you can see by this database icon right here um is not what we're actually going to be using for this course this is just a core one there I don't really want to mess with it let's actually create our own database that we're going to be using for this course so if we come up here selecting on the database itself and select new SQL file I can then use this to run a query on that database now this SQL file is connected to that postrest database and I know this because if I come down here to this bar down here I can see that it says postgress and this is because of the SQL tools if you don't have this rightclick it and make sure that SQL tools extensions is uh has a check mark next to it in order to show it so that's that shows what database you're connected to that you're about to run a query on that if you PR this run on active connection it's going to do it side note real quick you may see during this this press command I to ask GitHub co-pilot chat to do something start typing dismiss just ignore that this is my GitHub co-pilot an AI coding assistant that I use to actually write SQL queries but that's for a whole another video just ignore that for the time being I may use it from time to time if I do I will explain it anyway I'm going to enter in this SQL command right here of create database SQL course and this is going to create a new database named SQL course so I can select run on active connection or I can highlight whatever SQL code I want to run right click it and then from there go into actually running the selected query which is conveniently on Mac the shortcut command D command D you have to press it twice anyway I'm just going to highlight it press command e it's going to say hey command e was pressed waiting for second key of chord press command e again anytime you run a query this SQL tools extension is going to have a popup on the right hand side if you're running a query asking for a table back or some results back it's going to display it here for us we just created a database so we're not going to actually see anything anyway I'm going to close this out now this new database is created I can go back to PG admin I can come up here and I can actually see that it's created by right clicking postgress SQL 16 and clicking refresh now we have two we have postgress and we have that SQL courses I need to go ahead and click that right now it was gray out and next out that means it wasn't running so now it is running and we have this we're able to connect to it so we're going to go ahead now and create a new connection once again we want to connect to this going to go through the same process I'm going to name it SQL course as far as the database name itself the name is SQL course for the username I'm going to keep it the same of postgress then from here once again I'm going to test connection it's asking it can it use the driver connections yes entering the password successfully connected I want to save the connection all right so I'm going to close out of this I also don't need this create database anymore so I'm going to close out of this I don't want to save it and then now we have this when we click it it's available down and here's the database itself now notice now that I've clicked on it down at the bottom I have SQL course selected and I could navigate between the two we're going to leave it on SQL course for basically the remainder of this video all right we now have vs code completely set up for us I don't really care about this test. SQL file anymore I'm going to go ahead and delete it you also may see this vs code file this is like your configuration and all this kind of crap in there don't worry too much about what's in there and how it gets updated ated just keep it there so this concludes the installation on vs code we're actually be moving into actually understanding more about creating deleting and dropping tables before that we need to cover some data types because it's sort of a prerequisite to understand creating tables so that see you in the next one all right in this short little section we're going to cover data types and it's really important to understand data types because we're about to be setting up and creating tables when we recreate those tables we have to specify the data type for each of the columns now if you recall previously whenever we were working with our job posting data set I didn't call it out specifically but the columns were already set up to handle a specific data type the job ID was only a numeric integer so in this case it was specifies int or integer the job title column had strings in it and so that was cleric Tage under varar job work from home was a false zero value or True Value which is one so therefore this was a Boolean job posted date was time stamp and then salary year average could have decimals in it so it's not an INT it's going to be numeric anyway why is this important well these are necessary when setting it up in order to have data Integrity within your database now if we were to go and add data to a database if if it did not meet the conditions of being an integer to be inserted into something like the job ID then we wouldn't be able to insert it this is really good at just having a first line defense of having clean data now additionally because of these characterizations of these data types it also makes these SQL databases a lot more efficient in processing queries it doesn't have to guess what the data type is inside of a column for integers it automatically knows and it can process this data a lot more efficiently now if I navigate over to the postgress documentation they have a whole host of data types that you can look at we're only going to just focus on a few for us and what we'll be working with I only really think you need to focus on these top eight that I use on a common basis there's only a couple details I want to discuss about this so the first two int and numeric int is for an integer numeric is for a decimal so in the case of this we would specify in parentheses Precision which is the number before the decimal place and then scale the number after the decimal place that we're going to have in this next is text and then varar I'm just calling varar I don't know what other people call it anyway text has an unlimited length for the amount of variables that we can put within it or the amount of string characters we can put in it I don't like to have just this type of Freedom so we're going to use varar mostly with this n variable where inside of it you typically specify a value such as something like 255 255 characters that it's going to allow for the maximum length the next is booing and this is going to accept either true false or null remember our case we had zero or one this is effectively true or false and then finally those around dates and times we have a date one where just date alone time stamp where it's a date and a time and then finally time stamp with time zone where we can go ahead and specify even further what time zone we're in all right so that covers the data types and this is important because as you can see from the SQL query right here whenever we go to create our tables we're going to need to go ahead and specify the data type that we want to make a column this is going to be very important like we said for data security and with that I'll see you in the next one all right let's now get into manipulating tables specifically we're going to be creating modifying and even deleting tables all within that core database that we created titled SQL courses now there's four main ways we can manipulate a table first one's pretty easy just create a table we're creting it from stretch next is insert into so once we have a table actually inserting in Columns of data into it next is altering a table we can add additional columns even remove them change types and then finally just deleting a table whatsoever anytime you're doing any of these type of statements you need to be very careful and double check what you're about to do twice because once you do it you're not able to necessarily just click undo so picking up from where we left off last in the course first inside of PG admin make sure that your database is still running if it's not you just navigate into it and it'll start right up and here we have SQL courses the other thing to verify is inside of vs code navigate over to SQL tools verify that that database is still listed there so it is SQL course and also that's clicked on so that way it's selected down here on the bottom for where we're going to be running all these queries we're about to do all right so I'm going to come up here and we're going to create a new SQL file in order to run our queries on for me I notice this run on active connections may disappear or may stay up there don't worry about too much about that the first statement we're going to look at is actually creating a table and the Syntax for this is relatively simple we're going to be using Create table along with that table name and then enclosed in within parentheses is the column names along with that data type and you can list as many column names as necessary to specify for the table here's our situation we're going to create a table called job applied and this is a table basically that we can track all the different jobs we've applied to in the past when I've been applying for jobs I've made a table similar to this usually in an Excel spreadsheet but we're going to do this in Excel this includes things like maybe a resume I'm using cover letter a contact person at a company and then all those different companies I'm applying to so we're going to create this table called job applied and then I'm going to put those those en closing parentheses around it with a semicolon at the end to signify the end of the statement from there we're going to be putting everything that we need to inside of it so anytime you have any type of database you usually want some sort of ID number so we're going to start with the Java ID number specify that it's an integer we'll also be specifying other portions of the column specifically the application sent date custom resume whether we've modified or not we can include what version it is the resume of the file name a cover letter whether it's sent or not and and the file name of that cover letter and then just a status in General on what is the job search going on with that specific job ID now that we have this SQL query generated let's actually navigate into SQL tools and then we're going to look down this first I want to show if I navigate into the SQL courses if I go down to this drop down of schemas and then the tables there's nothing here right now so if we create this table it's going to go into there now for some strange reason this green icon indicating the active connection switched back to postgress I can also see it down here um so I'm going to go ahead and just press that right here to select it and now I can verify it's connected by this down here at the bottom also of the SQL courses all right so let's run this query to create that table we can either do this of run on a connections selecting that or you can highlight it all rightclick it do run select a query or finally my favorite just pressing command D and then command D and then anytime you run a query it's going to pop up on the right hand inside right here promise you you'll have stuff here when you actually start querying for data back right now we don't so we're just going to close this out so right now there's still nothing here inside these tables what you need to do is come to this icon right here and you need to refresh them and now magically what appeared was the job applied table I can navigate into it and I can see all those different values that we created right here or the same ones from this create table so if I wanted to I could just do a select all from job applied select it all command e command e run it and whenever we get a return back it says that there's no data available now that we have this empty table we need to actually insert into it the data we need and we do this with the statement of insert into and then from there include the table name after this we have a parentheses and we enclose all the different column names that we want to include for inserting the data now for using every single column name inside this table it's not necessary to include this it just makes the assumption that you're using every single one however you have to make sure that whenever you get to this values portion that we going to be specifying the different values put in that you do have at the same order as the table or you're going into issues so just for Best Practices I like to include both the column names and then no matter what whether it's all values or not and then the values themselves so in vs code I'm going to put in that insert into along with that table name I want to insert into now I'm going to specify all the different column names spe specified previously next I'm going to specify values and then start a parentheses to put all those different values in now here's one entry notice it's enclosed inside a parenthesis and then we have a one with a comma and then the next value just date custom resume resume file name cover letter sent and cover letter file name and then finally status now if I wanted to do more than one entry I'm going to just put a comma and then insert more in so now in this case I have two three four five different entries put into this all right so let's now actually insert this into our table and I'm going to go ahead and just select it all and then from there command e and command e again and boom get another popup not necessarily anything there but let's actually verify we inserted into this table so I have this select star from jav applied to basically query it all make sure I did this pressing command e command e it's popping up here I'm actually going to move it over so we can see it fully on this screen so I have the jav ID the application date resume resume file name cover letter cover letter file if we use it or not and then the status all right so sweet we now have created a table from scratch and inserted data into it let's not get into altering a table we're going to be doing this on this jav appli table the statement we're going to use for this is Alter table and then the table name then depending on what you want to do you're going to put one of these four different options that we're going to select from we can either add a column rename a column all alter a column or basically change its data type and then drop a column all right so we're going to start off with that keyword of alter table and then specify job applied and we're going to be creating a table of contact names for whatever company we're reaching out to so I have add contact and then specify that it's going to be a data type of far car and allowing only 50 characters just to verify opening SQL tools opening Java applied we can see that there's nothing there currently for this all right so I almost select it all command e command e and whenever we go back to it we should see after refreshing it now we have this contact up here but now you may be like Luke we quer that table there's going to be nothing in it and You' be right so let's actually look into it command D command D moving it over so we can see it fully so we have all this information and then for the contact name it's all null now this is a bonus command we got to cover and that's update update allows us to modify existing data within a table for this we use the keyword of update specifying the name set the column name that we want to a certain value where we meet a condition and usually this condition is you specify a value within one of the rows so we'll start with that update job applied next we'll do set and then we're going to be doing this to that contact column name and adding erck Bachman and then finally where we're going to be just using that index column of job ID so we want to do this for the job ID of one now we have multiple rows to fill in so we're going to do multiple statements in this case so I'm going to select this all command e and then command e again and it's going to go in and update it going to close out of this and now using that select star from java applied running this as well I can see that inside of the contact it actually inserted all those different contact names into it so now that we put these names into this column called Contact come to the realization that this name is not necessarily appropriate instead I want to rename it from contact to contact name this is where the rename column statement comes in after specifying alter table we're going to go in and specify the original column name to the new one so I'll start with alter table job applied and then rename column contact to contact name and then quering this table to actually inspect the contents of it we can see now that content was changed to contact name now sometimes now that I think of it names can be quite long and in this case we set that contact name actually Let me refresh this right here this contact name to 50 characters in length well I want to change this data type now from varar to text where text doesn't have a character limit that needs to be specified for it you can put any amount of characters into it so I use this alter column statement specifying the column name and then type to then specify the new data type so once again start with that alter table and then specify the name and then alter column column name and finally specifying the type as text so running this command pressing command D command D we have a success message and I find the easiest way just to check this data type is come back into here and we'll refresh this and now we see that it is text now I do want to call out there are certain limitations in changing this data so in this contact name example it has string characters in it if I wanted to actually change this data type to int and then trying to actually execute it by pressing command e command e I'm going to get this column contact name cannot be cast automatically to type integer because it has a string in there it's not going to be able to make any of those that are letters into an integer so it's not going to be able to do this so really it's important that you get this correct on the first time whenever building your table for the first time and now let's say this column of contact names we decided we're not going to go ahead forward with it because instead we're going to be using LinkedIn profile information to gather this information and we're going to be creating a separate table for this so we don't need this contact name column anymore in this case we can use drop column which is the simplest and also probably the most dangerous in that you just specify the column name and remove it so opening up SQL tools I can see that refreshing it that we have that contact name right now as a text so entering in that keyword of drop column and then specifying that contact name I can then see how we can remove it by running this query command e command e okay refreshing this we can see that contact name no more and it's gone all right last command of dropping the tables let's say that we're now tired of having to go into SQL and insert our records using all these different commands we're going to just use spreadsheet software instead which I probably should use anyway anyway we can go in now and specify drop table and then specifying the name itself be extremely careful anytime you're doing these drop columns or drop tables as this is very much permanent so specifying drop table and then job applied let's go ahead and Drop It Like It's Hot and Bam we ran this refreshing this query right now where job is applied is right here it's still there got to wait a second okay it took about a minute for me on my computer but refreshing this now we can see that nothing is here sometimes especially if you're going to be doing large changes to databases or dropping them or removing them there's a lot of code that goes on in the background besides these three simple uh keywords right here that we're able to do this with so you got to wait a little bit of time all right now it's your turn give it a try feel free to walk through this example that I just took you through making up any relative values that you would rather use instead all right with that see you in the next one all right let's now get into loading the database that we're going to be using for this continuation of the advance section and we'll be also be using it as well in the portfolio project section this data is going to be of the same schema using the same tables and column names that we were using previously in SQL light fiz however this data set is a lot more robust and includes a lot more details from 20123 and for those that are maybe a little lazy and want to continue just stay inside of sqlite viz and enter queries from time to time you can continue to do so but I can't guarantee you that any of the queries that we go forward with will work inside of here I highly recommend you just follow along what we're doing so how we're going to do this all well three simple steps first we're going to download all the data which is in CSV files and also SQL files from there we're going to move into vs code and create the tables using some SQL files that I'm going to give you and finally now that we have all these empty tables we need to load the data all into it and that'll be our final step so let's jump into it so if you navigate to the URL on the screen right here you're going to be directed to this Google Drive where it has a couple of contents inside of it first is a folder with all the different CSV files in it and these are conveniently named after the four different tables we're going to be creating inside of our database and you can even peek inside of there and see where the contents are overall it's just comma separated variables inside of it all right so back in the main folder the other main file in there is the SQL load folder and it has the SQL files that we're going to be using to not only create our tables but also modify our tables picking into it we can see that this create tables one it's pretty long so now we're going to just download these now you can download the these folders individually or I actually just zip them into their own little zip file to make this quicker for you to download so you can do either or the one issue with the zip file is you can't scan it for viruses so if you're not comfortable with it just download the other two files exact same thing all right so navigate back inside of a vs code we're going to be now adding those folders navigate into that project folder that you created from previously you may have it open if not open a new window and it should pop under recent and you can just select it there now I'm going to go ahead and uh select the two files that we've downloaded I unzipped them and then I have them right here I'm going to take them and I'm going to actually just drop them right inside of here it says do you want to copy the folders or add the folders to your workspace I'm just going to select copy folders over and it's going to place them right in I did add folders workspace previously and it did something funky I want it inside my project itself so that's why I did the copy over now since we added a copy just going to do some cleanup you don't need this ZIP file or this original folder that I have right here I'm just going select it and delete it and remove it from my uh remove it all right so first step is done now let's actually get into creating these tables for the database so back of vs code I'm going to navigate into this SQL load folder and I have this one here already on create database if you haven't done this already you can go ahead and execute this in order to run and ex and create this database that we're going to need for this but let me just verify real quick that we have this set up so I'm going to click on SQL course I logged out recently so it's just asked me to verify my credentials enter in the password and then I see I have the SQL course database already so I don't need to do this create database now as a reminder again make sure that we maintain this SQL course connection established for this is where we're wanting to create all these tables I'm going go ahead and close out of these so navigating back into the explore back into the create tables in this one is four different SQL queries for creating tables we have one for creating the company dim one for creating the skills dim one for job postings fact and then the fourth one for skills job dim these four commands are nothing different than what we learned in the last section for creating table the only difference now is we added a keyword in here for specifying when we're using a primary key and then when we're using uh something like a foreign key so in this case right here as a quick refresher on primary and foreign keys so if we go back to that core database schema that we have job postings fact so we have this job ID in there and that's unique to this column or this data set right here that is the primary key in job postings fact however in something like skills job dim the job ID is now the foreign key conversely skills dim is the main table when it comes to skills so that skill ID is unique value in there so that's the primary key in this one and in skills job dim is it's the foreign key so back inside of our SQL file we can see that the job ID itself is that primary key in the job postings fact table and then in that skills job dim table it is a foreign key then we also have to specify references specifying what table is the primary key in along with that value we expected to be for that primary key conversely that skills dim table has that skill ID as the primary key and then the skills job Di has that skill ID Associated as a foreign key and we also have that company dim table which has that primary key and then Associates it into that job posting fact table as a foreign key all right there's two major last statements I want to go over first is the owner we go through and actually establish the owner as postgress for this because we've already set up our connections already within vs code using postest we want to make sure it uses this and then finally in order to speed up performance we have this C create index and we're not going to be using this at all don't worry about it it just speeds up the queries a little bit more and it basically specifies the different foreign keys and helps with actually aggregating or creating as an index to speed up the query performance so now that everybody's all comfortable and knows each other let's actually get into executing this query I'm going to go ahead and select it all and then from there verify that cql courses is in fact the connection that I'm connected to and then from there press command d command D now this is going to take some time like previously in order for these tables to show up navigating over to SQL courses schemas public and then tables oh actually looks like they popped in already and just do an inspection everything everything's looking good conveniently we can also see based on this that like jav ID has that primary key which is has this gold icon for a key and then Company ID has this forign key of this silver one all right we're almost done we've now downlo downloaded all the files created tables now we need to load this data in to the tables that we just created so navigate back into our files looking at the explore menu we're going to go into this one now of three of modifying tables this is going to be a new SQL command that we haven't seen before and it consists of three things in order to copy the contents into a certain table based on a certain file so this starts first with the keyword copy and then we're specifying the table we want to C copy data into next is from we're going to then specify the file path location which you're going to have to update all these for your file path location and I will too actually and this is going to specify where the CSV is living and then finally we need to specify that we're copying into here this CSV file and I'm going to specify this with delimiter and then specifying what it's using to separate all the different variables in there which is a comma and then the keyword CSV header header is going to specify there is a header or basically column names at the top row of this data set okay so we need to update the file location for all four of these so I'm going to navigate back into the explore into the CSV files thems we're going to start with that company demm what you can do is you can rightclick this and then select copy path now inside of these parentheses here I'm going to go ahead and just paste this into here and then I'm going to do this for all the remaining being very careful that we're using whatever the name of the CSV file is for the file path itself all right so I've gone through and actually updated them all to make sure that they match I'm going to go ahead and actually save it by pressing command s all right so now it's ready last chance to verify that you have all those correct CSV files and underneath the correct table names select it all command e command D and this is going to take a little bit of time to load all this data into it all right so that was about a minute for me for this to all load I'm going to go ahead once again anytime I do any of these there's nothing really appearing here so I want to verify that I got this data into it I inserted it in correctly I'm going to just create this simple statement real quick of selecting all columns from job posting fact a lot of in there I don't want to query everything just yet so I'm just going to limit this to 100 values selecting all pressing command D command D moving this window over here it looks like all of the data loaded into it so now we're cooking feel free to also go through and check any other tables as well verifying that they all loaded but based on this other one I'm pretty confident that all the other ones are loaded just fine as well all right we did it just created our entire d database from scratch using CSV files but then loaded in using some SQL queries and it's all lited up all right next we're going to be jumping into some date functions and uh some more advanced features so with that see you in the next one all right in this section we're going to be going over dates and also times this is a very critical component as an analyst be able to understand how to manipulate dates and times because on depending where in the world you're working on data from you may have to convert it now for this section we're going to be working inside that table job postings fact and specifically we're going to be working with a column of job posted date this not only has a date but it also has a Time associated with it so it's a timestamp value and this value in each one of these rows it correlates to the date and time that a job is posted so for this we're going to be focusing on three main keywords or operators in order to handle dates the first first is how to cast timestamps as a date next is how can we work with time zones and convert to different time zones and then finally we're going to work with my favorite extract being able to pull out things like year month out of a date all right the first thing we're going to look at is how to cast different values or types to a different data type so in the case of our posted date this is a timestamp and we want to if we wanted to cast it as a date we would use this colon which allows us to specify the data type we want to cast something to this is typically used within the select statement in order to assign it to maybe a column name and we haven't seen this before which you can actually run queries without actually using a from statement to select a database so in this case I can say just select a string value in this case I'm selecting the string value of a date and I'm going to press command e command e to run it and so right now it provided this query of basically the string inside of it but instead I can actually cast it by using this double colon and then from there specifying the data type date from here I'm going to just go ahead and command e command e and run it and now it has this as a date now to show this for values we can actually see inside of here that we've actually converted to a different data type I did some other examples converting this one 123 string to an integer true as a string to Boolean and then 3.14 string to real in this case we can actually see the conversion the en it has this gray box around it Boolean it actually has the green around it to signify that it's a true value and then same with the float the gray around it so let's actually see this in action we're going to get back into actually running queries on our database and remember be sure that you have it set up through SQL tools that you're connected to that SQL course that it's actually active and you have SQL courses as the connection down here at the bottom so I'm going to go ahead and run this query and for it we can see that we have the title the location and then the date and with this date we also have this time stamp associated with it because it is in fact a timestamp value let's say we only needed that date value from this column and we don't really care about that timestamp well I can convert this data type of timestamp to date and we do this by specifying the double Callin and then the data type of date running this all together pressing command D command D we can now see that we have the title location and then date it automatically cleaned it up for us and removed that time all right next up is at time zone a keyword in order to convert timestamps to as you guessed it different time zones now it can be used with time stamp data whether it has a time zone specified or not as a refresher from our data type section timestamp alone includes things like the date and time whereas time stamp with times Z includes all that and then includes either a plus or minus to adjust the time zone based on where it is now if we go back in to see the data that we imported by specifically going to that job posting fact table CSV that was imported into our table and then scroll over to see the job posted date column I have this convenient coloring scheme so you can actually see it but anyway here's one of the dates right here right it's just a uh or the time stamps it's a date and then a time there is no time stamp on the end of this so the data in our database does not include time zone information so here I have that similar query from before where we're looking at title location but also that date time we're going to go ahead and actually investigate what the column looks like and similar to before that date time includes things like date time and we want to pay attention to these top values right here these aren't going to change from query to query on the top 10 values actually I'll just leave it to five we're going to leave these up here and actually pay particular attention to these as we go through this example to understand this more and still the same ones up here but remember that 1746 now if our data came as a time stamp with time zone whenever we use this at time zone keyword we'd only have to specify it once so you'd specify the column name add time zone and then from there specify the time zone you want to go to in this case we're showing Eastern Standard time now in our situation is different so it makes it a little bit more complicated because we don't have time zone information we need to First specify the time zone that this value actually is by saying at time zone and then from there use at time zone again to specify the time zone we want to go to in this example we're showing from UTC converting it to Eastern Standard Time so going back to that previous query that date time these values here I know from actually collecting the data this value is UTC so I'm going to first start by specifying at time zone and then specifying UTC and then we're going to do it again to go to the times we want to go to let's go to Eastern Standard Time so now I have at time zone EST we can see it fully here now Eastern Standard Time is 5 hours prior to UTC so whenever I run this I should expect it to adjust this appropriately and Bam we went from that 1746 to 1246 so 5 hours prior now if you Google the postrest documentation on different time zones you can see that they have a whole host of time zon basically every single time zone available for you to use here in our case we were converting that UTC which is at 00 to that Eastern Standard Time which is that ne5 the last keyboard we're going to be looking at is extract and this is used to extract things out of the date such as the year month or even day this is used like a function within the select statement so I would do something like select and from there specify extract function and then inside of the parentheses specify what I want to get from the column of interest in this case I want to get the month from the column name and then I'm just renaming it as column month so going back to that previous query we have the title location and then also date time let say we want to extract the month out of that job posting date I would first start with that extract function from there I would specify what the value I want from it is then specify the keyword from and then finally the actual column of interest and then we'll name this one date month selecting it all and then running it we can see we have from this the month values from that date time so 977 43 I can even take it a step further and add something like like the year to this and from here command e command D we can see from this we got the year into this column now you may be like luk what is this actually useful for well when we actually use this something like this in combination with something like the group by function I could do larger Trend analysis with SQL specifically let's say we want to look at how job postings are trending from month to month so let's start with a simple query and build on it further I want to first start by just getting things like the job ID and then the month from each of the job posted date columns running this query to double check that it's working I can see that I'm getting it right here now I want to aggregate it so I want to do a count of these different job IDs for each month so I'll start by putting a count around job ID and then from there add a group buy to then specify we're going to group by this new month column running this all command D command D we're getting these first five values now we're down to 12 so I'm just going to go remove ahead this uh limit column and then run this again to actually see all of them so bam it's shown us all the different months and then all the different values now personally I don't really care about data analyst roles so I'm just going to take it even a step further and then use a wear claw specifying the job title short of data analyst and then just for a little cherry on top we're going to do an order by and then in this case we want to obviously do the count so just not to be repetitive I'm going to rename this job posted count and then specify this down here of job posted count okay running this all command e command D we can see that this is ordered from low to high and I obviously don't like that so we're going to just change this to descending and then rerun this query again command D command D and then from this we can see that there looks like there's a trend that earlier in the year so January feary March have higher job posting counts specifically with January having some of the highest and then later in the year like December November and September are lower on the list and this pretty much tracks with what I would expect all right now it's your turn to give it a try of a few practice problems aligned with not only using the different date time functions that we just went over but also aggregating it with some previous functions that we used in order to look at things like salary and CS of jobs all right that s in the next one all right in this section we're going to get into a practice problem using what we just learned on the dates and also we learned previously on create tables specifically we want to create tables for each month of these job postings so I want all of the data for say January in its own individual table little bit of foreshadowing here we're going to be using all these different months in upcoming practice problems when we go over even more advanced operations so anytime I'm doing something like this I want to just start the very Basics so let's start very basic first with the query to actually connect to the database so we'll start with this of Select star from job posting fact and then I want to speed up these queries for the time being so I'm going to just put a limit statement on the time being to speed it up all right so we have all our information so the next thing we want to focus on is this job posted date remember we want to make tables for every single month or at least the first three months January Fair March so let's start by filtering this job posted date column and only get values that have January in it so in this case I can use that wear statement and then we're going to specify extract and we're going to use use the function we want to remember we want to specify that we're going to be using month from this job posted date now we need to specify what this condition actually meets of extract month from job post date in our case when it equals one so I'm going to go ahead and select this all here and press command e and command e again and scrolling over to that job posted date column I can now see that we have nothing but January dates in this column okay we go ahead and close this out I don't need this anymore now we're trying to create a table for this job posted date of only January and remember if we go back to our SQL tools extension on the Le hand side and actually go into our tables folder we can see we have these four folders right now so we want to create a table an additional one inside of here so first I'm going to remove this limit statement and once again make sure that it runs it's going to take a little bit longer to run and Bam we looks like we have around 92,000 values the next thing I want to do is now use a statement of create table we're going to specify the name of this table so in their case January jobs and then we'll use the as Alias to assign this and just to format this better to make it look more appropriate I'm going tab this over bam so that's how we'd want to do this and I'll put a little uh semic at the end of this so we have this for January we also need this for February and March I'm going to show you a little trick that I do this so I'm going to copy all this by pressing command C and then if you go into any AI assistant in this case we're using chat GPT specifically the chat bot for this course I can then specify in there to make this for other months so I specify make this query for all months in the year and then from there paste in that SQL query let's see what it does boom okay so in this case Chad gbt went through and did this all anytime you find very repetitive work you need to jump into chat GB to do this now remember we only need these three months right here and I have gone through and verified that it does in fact use the right syntax chat PT can be wrong sometimes so make sure you're always double-checking it anyway I'm going to copy it by pressing command C and then going back in here and pressing command V I also went in and added some indentation all right let's create all these different Tables by pressing command e command e okay so it looks like once again we got this blank screen that the tables were made I can come over here and I'm going to press refresh inside of SQL tools and Bam there it is January February and March jobs are now inside of here so we created tables using the extract function now I always like to double check my work so I'm going to run this select statement right here where we select job posted date from March jobs running this pressing command D command D we can see that all these different jobs in here looks like we're around 64,000 different values and they're all for March so I'm pretty confident that all the other values should be correct all right it's your turn to give it a try we need to get these tables built inside of your database cuz like I said we'll be using this in some future examples with that see you in the next one all right in this section we're going over case expressions and this is very common where if we want to create a column based on a condition we can do this through this case now if you have any experience with something like spreadsheets or even python this is very similar to an if statement where an if statement usually have some sort of condition whether you're trying to test whether it meets it true or false and then from there it assigns it a value whether it is fact true or false so let's go over the basic syntax on how to use this and commonly it's used within a select statement and that's what we're going to show here but you can use it in a whole host of other things such as we or even groupy to list and expression to satisfy by case we start out with case and then we end it with end and at the end you can also use an alias so we can have in this case as column description so that's what we're going to name that column as whatever we have build with this now this is going to go in logical order so for the first one of when column name equals value one we're going to check whether it does meet that and if it does then we'll provide it this value which will go inside that column description column from there we'll move to that next line of when column name equals value two and then description for column 2 and if it doesn't meet any of those two conditions it will meet the else condition of then other and that will be what it will be assigned so let's start simple with this query where we're just going to look at the job title short column and also the job location column so let's look at a condition where we would maybe want to reclassify where a job is located at and for this we need to look at the job location column so I'm going to go ahead and run this query by pressing Comm Comm D command D and inside of this job location column we can see we have things like either like a state city or even a country City and then we even have things for like remote jobs where we specify that the job location is anywhere so for the scenario I have three different conditions I want to look at I want to create a new column and let's say I'm job searching I'm located in New York and I want to label things like anywhere jobs as remote if they are in New York I want to label them as local and then otherwise I just want to label it as onsite that you have to be in that location so I'm just going to disregard them so with any case statement I'm going to start with that case and then I'm also have the end and the Alias we're going to use for this one is location category now let's go for that first statement of for anywhere we want to label as a remote so when job location is equal to anywhere I'm going to go ahead and move this over a little bit and get this out of the way then we'll label it as remote next when job location equals New York New York then I want it to label it as local and then finally with everything else because this goes in logical order and satisfies whether it meets it I'm going to then specify an else statement of onsite all right let's go ahead and run this all pressing command e command e okay I got an error right here and that's because we don't have a comma after job location because it's starting that new column so make sure you have that in there all right command D command D expand this out a little bit all right now we can see the anywhere rows are now labeled as remote the New York is located as local and then everything else is on site so this is great that know we have this done but from an analysis standpoint I want to dive into this further I want to analyze how many jobs I have I can apply to specifically the local ones and the remot remote ones also look at the outsite as well so we can use something like the group by function in order to aggregate all these different values so I'm going to start by how we're going to aggregate this and that's going to be using a count and I'm going to use that job ID column making sure to include a comma also with this count job ID I'm just going to label this appropriate as number of jobs so let's first start by just starting simple of doing a group ey and we're going to be doing this using that location category let's actually run this query to see what we have so far command D command D all right a lot more onsite verse remote verse local not bad for local 8,000 jobs in New York anyway if you recall I specifically care about the data analyst jobs so we'll specify a filter for where and we'll go with the job title short equal to data analysis running this again seeing how little we declined went from 8,000 jobs to to 3,000 jobs local and down to 13,000 what was it previously 69,000 okay now I normally put an order Buy in here but it looks like for some reason it automatically sorted it from highest to lowest so I'll consider this good enough for now anyway if we wanted to I could di dive deeper into these 3,000 local jobs that I could then apply to all right with that it's your turn to give it a try with case expressions for those that bought the course certificate and notes you now have a practice problem specifically that goes into bucketing different jobs specifically around salaries so assigning different values for each all right with that see you in the next one all right in this section we're going to be going over both subqueries and CTE and the concept behind both of these are that we're going to create basically temporary tables inside of our SQL query that we're performing and then perform an analysis on this temporary table this is very useful whenever we're getting into more and more complex queries that we need to be able to do as we want to break it up into sections and subqueries and CTE allow us to do this now in order to show the power of subqueries and CTE if you recall back from one of our previous practice problems of creating tables for each of the different months we use this statement of create table and then underneath it for it we specified that we wanted to only extract out those values where the job posting date fell in January in this case for the January jobs and then from there it created a table inside of our database this table is sort of like permanent it's there now the only way to get rid of it is we drop it let's tackle subqueries first because they're used for simpler queries let's say in this case we wanted to create a temporary table of January jobs well a subquery is a query as you guessed it inside of another query so in this case we have a select star statement and then from and then within parentheses we're specifying that subquery in our case we want to select only the jobs where the job posting month is January and then we rename this table as January jobs navigating back over to vs code to actually see this in operation I can select this all press command D command e and when we we scroll over to that job posted date we can see that the this is all the January so we did that select star on our subquery now the other popular way to create a temporary table are common table expressions or spoken as CTE and they can be used in even more locations such as select insert update or even delete with this one CTE are defined first using the with statement and then you're saying with this new table name and then the Alias as and then from there within a parenthesis es we're then specifying the entire query that we want to run to put in this new table called January jobs from there we can then run the next query of Select star from January jobs all right to show this actually in vs code let's actually run this one so command e command e and once again rolling over to that job posting date to make sure that it did it correctly and yeah we can see all of them are from January for this so let's jump into some harder practice problems for each of these first one we're going to focus on is subquery remember that's a query with inside another query and we can use it in things like select from where or having we're going to doing an example where using it inside of the where Clause because it's in this parentheses think of order of operations the inside parentheses will be executed first and then everything around it will be operated second so let's say I wanted to get a list of companies that are offering jobs that don't have any requirements for degree currently in our column of job no degree mention we have a true or false value and this says whether a degree is going to be required or mentioned in the job posting and it's located inside the job posting fact table so let's actually go ahead and just look at it real quick so right now I'm pulling company IDs and then the job no degree mentioned and for this all of these are going to be true now if we remember back from our diagram of the table going on here we have that job postings fact table itself that has whether a degree is mentioned or not and we have that Company ID however we don't have the company name in this that's in a separate table so that's why actually in this case the subquery is going to be so powerful because we're going to run a subquery to get the jobs that have the associated Company ID for no degree mention and then from there filter inside of the company dim table so let's make this into a subquery and we're going to start with very simple first we're going to add a select statement and then we'll say company name which is the column name that we want to look at naming it as name for the table we're going to be selecting that company name from we want to use that company dim and now this is we're going to enter in that subquery we're going to specify where that Company ID is in this subquery itself now if you recall whenever we ran just this I'm going to just select this and run the query again command D we're getting back in this a Company ID and then whether job no degree mention is true in this case realistically this column is not even necessary so I'm actually going to clean it out and we're going to go ahead and run this again command e command e and same numbers 1 nine 10 yeah this is what we saw previously here so I know it's pulling the right things so we're trying to say hey where the company ID is within this table so within this table we're going to only want to return those company names that are associated with it so let's run this all pressing command e command e uh company name does not exist oops I have this backwards it's actually name as company name my bad okay we'll run this again command e command D and Bam we can see all these different things I just want to show this further that we are actually getting the right data so I'm going to insert in that company ID and I'm going to do some clean up real quick anyway let's run this again command e command e okay we can see in here 1 3 4 6 7 8 9 so these are all the ones that are accepting it and I realize now whenever we look at this one we're not seeing those other numbers and I think this is just by an order by an issue so I'm going to fix this real quick by although it doesn't really matter it's just inside of here putting an order bu and then specifying Company ID then from there just running this subquery inside of here pressing command D command D okay we can see all the one the same one so 1 three 4 6 and then navigating over to the other one like I see 1 3 4 6 so this is correlating and checking out right and this is how you actually should go through troubleshooting it let's wrap up this section with a final example on using CTE or commentable Expressions this is used similar to a subquery to create a temporary result set which in this video previously I may have been referring to it as a temporary table that was a mistake it's a temporary result set they're two separate things anyway the results of this temporary result set can they then use in things like a select statement insert update or even delete this only exists during the execution of the query and it's defined using a with statement before it then defining the table and then using as and then enclosing all of the in parentheses how we want to create this temporary result set we can then call this within a query below it so let's actually work a problem to see how CDs are used for this we're going to be finding the companies with the most job openings now we need to break this up into two parts and that's why CTS are perfect for this because first we get the need to get the total number of job postings per Company ID which is located inside of our fact table of job postings fact but then once we have this total number we then need to combine it with the company name which was in the company dim column so we're going to start by building our query first for that first bullet of getting the total number of job postings per Company ID so for these queries I want to just start small we're going to be selecting the company ID and looking at the what comes back from this so we can see that it has multiple different IDs in so now we need to go into actually aggregating it for this we're going to be using the count function and we can just put in there the asteris symbol is that's going to count the number of rows but anytime we're do an aggregation function we need to specify the group ey on how we're going to be grouping it and in this case we want to group it by what's given in that table right there of that company ID all right let's run this one okay so now we have these counts so we can actually just go back and verify for zero we had four right here and it looks like we have four returning so this is the core statement that we're going to be using inside of our CTE so we can go ahead and create that now so I Define this using a width statement and then give it a table name of company job count using the Alias as and then inside parentheses all the query that we want to do and then just to get started and making sure that this works properly I'm just going to do a a simple select from statement to actually Define this and this will just query this temporary result set that we've defined up here and running it we have the same results that we had before now it's just through that temporary result set so just refresher on the schema we have that job postings fact data that is connected to our company dim table using a company ID so we need to use a join method in order to combine these two tables together to combine these two tables we're going to be using a left join and for this we want to use our a table to make sure we have everything from it as the company dim table because maybe there may be some companies that don't necessarily have job postings that we aggregated from the B table so we want everything to be listed there so that way if there isn't there's a zero associated with it that there's no job postings so as you guess it B is going to be the fact table that we're going to be combining to this so let's start simple with this basic query down here and we're just going to be looking at first that company dim table and so just running this query right here command D command D we can see all the different names from it so now that we have this let's actually move into joining this I'm going to do a left join specify that temporary result set and then what we're going to match these two left joins on which is the company ID from each table we're not going to necessarily get any different results with this but I'm going to go ahead and exe execute this entire query to make sure it's it's actually working properly okay it's working properly now if you remember we want to get the total number of job postings per Company ID and have it basically associate with a company name so this is from the company dim table and I'm going to rename this as company name so I want this value from this count star statement right now I don't have an ales should have done that before so we're just going to name this as total jobs and then we want to have it appear in this new query that we have here so I'm going to then Define it okay let's actually run this entire query Now command e command e bam and now we have the company names along with their total number of jobs right now remember we want to get at the highest who who has the most so we need to now do an order buy and I'll just add this here at the bottom order buy total jobs in descending order let's now rerun this query boom all right now we got it and it looks like andrigo is one of the highest amount along with other popular companies like city capital 1 Walmart and centure so yeah all right it's now your turn to give it a try and I have for those that purchas the course certificat notes I have multiple different practice problems that you can go to three for subqueries and three also for CTE if you're still not feeling confident on actually tackling these problems on your own just stand by because in the next section we're going to do uh go into another problem of using CTE no pretty fun example so feel free to even hold off until after that section all right with that see you in the next one all right let's get into a practice problem of how to use ctes even further than the previous examples for this we have a pretty exciting example that Kelly came up with and this problem was inspired by my app D nerd. Tech that Aggregates job posting skills depending on a different job title you want to look at so if I want to look at something like data analyst we could see what are the top results here anyway Kelly came with this problem find the count of the number of remote job postings per skill so we're going to be focusing specifically on this because say in my case I'm a data analyst looking for remote jobs and that's what I care about most we'll display the top five skills by their demand and then include other attributes like skill ID name and the count of job postings now how are we going to tackle this well remember the job postings fact table has all of our different job postings it doesn't have necessarily all the different skills in here instead we have to use that skills job dim table to get the all the different correlated jobs to skills and then our final skills dimensional table that includes the name of those skills so the first thing we're going to do is build a CTE that basically collects the number of job postings per skill so we're going to have to do some sort of join between our job posting fact table and our skills job dim table once we have this temporary result set we can then take this a step further and then combine it with our skills dim table to actually give us our final results that have the skill name in it for all the joins within this portion we are trying to get a count of jobs that actually exist we don't really necessarily care about if there's values that don't exist so for this we're going to find that inner join is the best method to use for this so let's start with the very Basics and then build upon there for the CTE I just want to look at first the skill ID column of the skill from the skills to job dim table which we're going to just conveniently rename skills to job messing command e command e all right so we're already seeing repeats in here I'm going to go ahead and just actually showcase the job ID column as well put a common in there run this query again so we can actually see this better because I don't think it's like showing fully what we want to see all right this one's much better this one has so for the job ID let's say zero which is one of the the job IDs it has the skill ID Associated of it of zero and one so there's multiple skills associated with this job so let's go ahead and actually join this on our job postings fact table since that we know that they're correlated through that job ID column so the first thing we do is specify in join after that from statement soing the table and we're going to rename it just simply as job postings from there we're going to specify how we're going to connect this and we're connecting it on the job ID I want to make sure this works so I'm going to just do command e command d and job ID is ambiguous so I need to specify right because it's in both of the skills to job and the job postings doesn't really matter which one we specify we're going to just specify this one right here and then run this query again all right so the query is running there's nothing necessarily new in here that we've included but if you recall we want to look at or filter for the jobs that have work from home as true or remote jobs so the first thing I do is just add to the select statement to actually see it shown in here and it looks like we got a lot of false results I'm sure we'll eventually get some true oh we got a True Result right here so let's actually now Define that where to meet this portion of the question of remote jobs and we want to specify where job postings of job work from home is equal to True bring this query again all right so they're all true in this case we don't need this column anymore listed here so I'm going to go ahead delete it I just showcase it to make sure that we were building the query correctly now getting back to that core problem we want to get the count of the number of remote job hostings per skill so we have this skill ID already we need to now get a count and we're going to be doing that count star method we'll sign this at an alias of skill count anytime we do an aggregation well we need to do a group bu and we're wanting to group it by obviously that skill ID now since we're combining by this skill ID and this aggregation this job ID column is no longer useful and actually we throw off our aggregation so we're going to go ahead and get rid of it and we're going to go ahead and run this query all right so now we're seeing that skill ID and skill count I can see here that skill count these this zero and one Whatever skills they are associated with the IDS are this high right here at 40,000 so we pretty much have our CTE built so let's go ahead and build that CTE out using that withth statement defining it as remote job skills and then using the Alias operator to put that all inside of there so now that we've combined that job postings fact with that skills job dim table and we have this in a common table with a skill ID for it we can now go further and do another inner join with this CTE to that skills dim table so the first thing I'm going to do is just make sure that this CT Works by just doing a select star of this remote job skills running this query I can see that it works so let's move into actually doing that inner join with the skills dim table I specify inner join and then the table assigning the Alias of skill and we're going to connect both of these on that skill ID column and now that it connected we need to specify the columns we want for this so the first we'll just keep it simple with only the skill ID but obviously next we want the skill name with that defined we need to now have the most important value skill count so let's go ahead and see if this entire query works then looks like I have an error here I referred to the table injoin as skill when I reference it up here as skill so we're just going to go ahead and add Nest that whoopsies any we going to select this whole statement again and run it again all right so now we have that skill ID skill name and the skill count couple things we are left to do now is actually order this and then remember we only need the top five results so I'll add an order by statement right here here specifying that we want to order it by that skill count in descending order and then we only want the top five results so I'm just going to throw in a limit statement right here selecting all this query and then pressing command e command D bam now we got it and we can see what were those 40,000 values now it was obviously Python and then SQL followed next by AWS Azure and Spark now I'm going to throw one simple caveat to this question because frankly I really care about data analy jobs so I'm going to filter our CTE further for only data analyst jobs using within that and within the wear statement and and specifically where that job title short equals data analyst all right let's go ahead and run this all to see what it looks like for data analyst and Bam these are more of results that I would expect for a data analyst CU well that's the query anyway we got SQL Excel and then python Tableau and powerbi so regardless of what you're filtering for it goes to show the importance of SQL this is the top skill and so I think it's booo of you that you've spent the time to learn this all right for those that purchased the course certificates and notes feel free to go ahead if you haven't already to work those practice problems working through those example problems for CTE and also subqueries and once you're done with that we'll be moving into the last major topic of this Advanced section focusing on Union with that see you in the next one all right welcome to this last major topic we'll be covering in the advanced section on unions this is very important for combining tables it's directly if you will opposite of how we're doing joins so joins are used in the case whenever we want to combine tables that maybe relate on a single value such as in the case of combining like the job posting fact table with the company dim table we're going to combine this on the company ID column now if you remember previously we created three tables for those job postings in January February and March and moving this over here so we can actually see it better the January table has the same columns as that February table along with that March table so in this case if we wanted to combine these basically rowwise we could use a union operator to do this so the first one we're going to cover is Union and it comines the results from two or more select statements you would use this operator by first defining a statement such as select column name from table one and then specifying Union and then underneath it specifying the next table you want to do it select column name from table two and now in order to Union this there is one specific condition it has to meet they have to have the same number amount of columns in this case we're only doing one column here but in our January February March jobs they can have a multitude of columns they just all have to match and they all have to be the same data type and the last thing to note is it gets rid of all duplicate rows whenever you combine this unlike the next operator Union all so let's start with this simple query I want to look into seeing all the different job titles and Company IDs and job locations for the month of January these are the main although they're not all The Columns of the main attributes that I care about I'm going go ahead and run this query just make sure that it runs perfectly fine and we can see here all the different results that we're seeing from it so now let's actually join this with our February jobs table first I'm going to specify Union then I'm going to do another select statement for the February jobs and I'm making sure that I have the exact same columns listed as the January jobs right now from January jobs we have around 92,000 jobs if we run this all together we can see now we have over 107,000 jobs so let's just take this a step further and now add in all of the March jobs so we have now January February March and we're specifying all the different columns how many jobs are we going to have return and for this one now we're up to 143,000 jobs so that's the union keyword moving on to Union all it's pretty much used in the same exact way we're still using two or more select statements in order to combine different tables and they need to have whenever we're doing this the same amount of columns and the same data type now the thing about the keyword all of Union all is that it returns all rows even duplicates and Kelly and I have talked about this further and we find that we use mostly this one in our jobs as we typically want to get all the data back to make sure we're looking at everything so going back to that previous query that we built here all this takes is adding in all to both of these now remember the previous query we got around 143 ,000 results so let's see what we're going to get back with this Union all we should be getting back more values and we do looks like 220,000 so quite a bit more than this almost 880,000 more values or 80,000 duplicates if you will all right now it's your turn to give it a try we have a few practice problems built in order to implement both the union and the union all if you're still not comfortable with either of these I do have a practice problem coming up where I'm using Union also in combination with some other things like subqueries and CTE so you can feel free to hold off until after that practice problem you're not exactly comfortable yet with unions all right with that see you in the next one all right let's dive into the last practice problem of this advaned section before we actually dive into our project for this I want to do an analysis of the job postings from the first quarter that have salary greater than $70,000 that's like my target range right now so we have those tables already on January February March we're going to use our Union operators to combine them all and then we're going to be using it within a subquery to then analiz it allowing us to thus filter it for those jobs greater than 70,000 and get a snapshot of the jobs we actually want so anytime we building a subquery or CTE I like to just tackle it first for this we're going to be selecting all the different columns from each of these different months tables so I'm going go ahead and just run it right here to show what it looks like okay and we can see that it has all the different columns associated with it so the next thing we need to do is specify that Union and we're going to use all because we don't want to remove any duplicates and then once again specify the next select statement for the February jobs and then our final Union all statement so we can combine this finally with our March jobs let's actually combine this all and see how many jobs we actually have here all right and it looks like we have around 220,000 which checks with what we did last time so let's go ahead and start building this into a subquery so I'm must do select star and then from and then build this Union all portion into the subquery now because we defined this table I'm going to give it an alias of quarter 1 job postings let's go ahead and run this again make sure that it's working just properly okay sweet still returning all the different jobs all right I don't want all of these different columns here so instead what I'm going to do is I'm going to define the four main Columns of interest for this we're going to be doing job tile short job location job via and the job posted date specifically only the date and then remember the last portion of this question is that we want to find the job postings that have an average yearly salary greater than 70,000 so with all of this I need to add a filter on it using a a wear statement and we're going to specify all this for greater than 70,000 going ahead and running this all command e command e boom we now have all this information I'm going to modify this further mainly two things one I want to see the salary and two I really only care about that iist jobs so the first thing I add is quarter one job postings and then I'm going to specify that salary year average and then inside of our weer statement I'm going to use that and keyword and specify where the job title short is equal to data analyst also one other bonus I'm just going to throw in an order bu so that way we have this listed for the salaries from highest to lowest okay let's go ahead and run this bad boy bam now we have all these different jobs and I can see that I need to apply for y combinator and it's a remote job that pays $650,000 now I went ahead and removed this table name prior to these column names some of you may run into problems if you don't include this but I didn't Rite any problems with it so I believe shorter and brevity is better in all cases if I go ahead and remove that all for all those different cases make this look a little bit more concise running this query I still get those same results all right now it's your turn we're going wrap up this final problem so we can finally moveed into to that portfolio project we're building with that see you in the next one that nerds welcome to the project section of this course so you've already learned so much in the basics and also Advanced section with using SQL queries but now it's actually time to put all that knowledge to the test by building a Capstone project now there's a few goals with this project we're going to be working with that same data set that we built inside of the advaned section and exploring it further and your goal for this is twofold you're going to be looking to analyze what are some of the most top paying roles and also skills but there's going to be a little catch to this similar to the previous quers you don't have to necessarily limit this search and follow exactly like I do searching for data analyst remote jobs you can actually fine-tune it to your job of choice Additionally you can change it to things like the location you are within the world you should need to research into the data itself and find a close enough location or locations so for this there's going to be a few different deliverables that you're going to be able to showcase as experience with the work that you've done I'm going to show it here within this project SQL folder that you're going to be building and I have five files here for the five different queries we're actually going to be diving into and finding further insights about not only roles but also skills for top paying jobs additionally with this we're going to be building out your own personal readme file and this is basically a descriptor of the project itself and it's going to go in and actually showcase all the different analysis you did and besides just analysis we can also go as far to show some of the results that we find from it and we're going to be using everything that we find in order to draw some conclusions and help you out in finding out what are the most optimal roles and skills you should be pursuing so this project can not only be rewarding for those looking to learn SQL but also maybe if you're job searching or looking for promotion you may able to find insights within this data set and use it in your real life now all this work you did is great but if you just just keep it on your computer and don't tell anybody about it does really no good so we're going to be working on this also by showcasing this to things like GitHub and Linkedin both of these sections using GitHub and Git along with LinkedIn are completely optional so whenever we get to these one of these sections I'll call it out and let you know that hey this section we're working on building a repository you don't need to pay attention to it if you don't want to but I highly recommend that you do anyway with GitHub we're going to be actually able to Showcase not only all the different SQL that we have which are all these different files right here but we'll be able to Showcase mainly that read me that we built to Showcase all the different work that we did and our findings and you may be wondering what questions are we actually going to be exploring for this well they all revolve around top paying jobs and also skills and so we'll be building on each one of these with each of the queries we do and finally ending up with probably the most exciting one trying to identify the most optimal skill something that's a high demand and also high paying we're going to dive much further into this after the next section where we're going to be actually creating our project repository to thus upload into GitHub one quick note for those that bought the course notes and certificates inside of your notes you're going to have access to the questions and all the different problems that we're trying to solve with it in addition to this we're going to break it down even further inside of here by including things like the reasoning so what's going on behind each step of the query and why we're doing each step along with the final query itself and what expected results you should be finding all right the next video we're going to be diving into actually setting up your GitHub account and setting up the project repository so we can get this thing started to host online so we after we get it completely built we'll be able to put it all there like I said at the beginning this portion of the GitHub integration is completely optional but you learn a highly valuable skill of git so I do recommend that you do it all right with that I'll see you in the next one so what is a repository well it's a personal library for your project where you can keep manage and record every change if you know of track changes in Microsoft Word this is very similar to that so how does this version control work well there's a special folder inside of your project that we're going to be installing that's going to track all the changes that go through it here I have another data science project inside of my finder window and it has all the different contents that are necessarily visible to me but in this one there's actually some hidden files there's this one for VSS code we don't really care about we care about this one right here on dogit and that's how we're going to be maintaining our Version Control inside of this folder it tracks all the different changes that we're going to be doing to our project exploring the contents of this is sort of out of the scope and probably above my PR grade but it's just important to understand the purpose of this folder so on our local computer our s we have this project or if you will a working directory and then as we're working on it and collecting new documents it's going to go into a staging area and once we have something we want to save or commit we can then do that and it will save to our local repository which is in that dogit file now all of this is on our local machine or our computer locally but let's say we wanted to collaborate with others or even share this like we're going to be sharing with this project we could then host it to something like GitHub in this case it will then host a copy of all of the different content that we have locally on our machine so in that case where I showed you my previous project that's located here on my machine I can also go up to GitHub and this is where I'm actually hosting that same content at this is would be my remote repository so let's actually get into doing this for our project and there's a few steps we're going to be walking through for this the first if you haven't done it already we're going to go through and actually install git git is the most popular version control system so we're going to be using it for this next we're going to be diving into setting up a GitHub account if you don't have one already and it's pretty simple once we have that we can now get into actually initializing a repository within our project so basically we're going to be creating that dogit folder inside of our project and then now that we have this local repository we're going to want to push this repository into GitHub and thus we're going to have a remote repository for the world to see so let's get into download and get don't be afraid if you're unsure if you have get installed or not as reinstalling it's not going to cause any issues anyway navigate to this URL that's on the screen right here or just simply Google get download and it should navigate you to this page from here select the operating system of choice that has yours and download it for Windows it's pretty simple as you're going to walk through a similar download process that you did like postgress and it'll take you through all the steps of the process and you'll have it installed for Mac users like me it's a little bit more complicated you're going to want to open up your terminal and then from there insert this Brew install git and you're going to press enter and execute it now if you don't have Homebrew you're going to navigate to the link here that it has and once again it's very simple you're going to just copy this command that it gives you right here for installing home brew and then inside of your terminal paste it and run this command then once you have home brew installed you run this other command of Brew install G anyway I promise you that's the last time you'll see the command line for this course now that gets installed we need to move into creating your GitHub account if you don't have one already because you need an account in order to host your repository here's my GitHub count right here has information about me and all my different Social Links and then from there it has all the different projects I've worked on here along with um more of them in this one right here so you're going to navigate over to github.com signup and it will walk you through the setup process to set up your account inside of GitHub once you have this account set up I'd go in and actually edit your profile putting a picture in and then updating all the relevant information that you want to include on your profile all right so we have git installed on our computer and we have our GitHub account now we need to move into actually creating that local repository and then pushing it to GitHub for our remote repository now there's four major option can do for this we're going to only be going through the VSS code option which I feel is the simplest for those that buy the course notes and certificate I have detailed instructions for the second and third option as well if you want to do that all right so let's get into creating that repository using vs code so inside of here you should have navigated or have your project open for me the project is SQL project data jobs analysis I'm going go ahead over to the activity bar and select Source control and for this we have two different options the first is just initialize a repository so basically create that local repository that dogit folder for you then to have locally but this doesn't push to GitHub so we actually want to do the second option of publish to GitHub as it not only initializes a repository locally but also it pushes all of your different content that you have here to GitHub now I've already logged into GitHub so whenever you click this button it may prompt you to go through the login steps to set up your GitHub account to be associated with your vs code profile right here walk through that process and then you'll get to the next step that I'm going to be at so I'll click publish to get GitHub it's going to ask do I want to publish to a private repository or a public repository I want this to be available for the world to see so I'm going to do public then it's going to ask me which files you should include in this repository this is very important that you get this step right as I had to repeat this earlier I'm going I made a mistake so for one we don't care about these dot files right here these are hidden files that have no effect on showing your work so I'm going to go ahead and uncheck them additionally these are the other folders I have right now I have that SQL load that you had that CSV files of all the different CSV file data that we Import in our database and then I created another folder for advanced SQL that had all those different problems that we worked so far in the advanced section this SQL files folder is extremely large and so because of that GitHub isn't going to allow you to upload all that data to it Additionally you don't really need to be keeping all that stuff on a GitHub anyway to show your work so I'm going to go ahead and actually uncheck it so for you you should only have one folder and maybe two actually created this point that you want to leave checked you're going to go ahead and click okay it should give you this message of successfully published this project to GitHub I'm going to go ahead and open on GitHub and Bam here we have it inside of GitHub we have our folder that I have for advanced SQL of all our different files that I created our sequel load for the files to create our database itself and then this do get ignore file and I'm going make this a little bit bigger if you recall these are the same files that we basically unchecked from before they were put into thisg ignore file hence they're not being tracked in git and because they're not being tracked in git they're not getting uploaded to GitHub so we just did all four of the steps necessary to get git and GitHub set up for this project and just to Showcase this is our project that we have right here I'm going to go ahead and show our hidden folders that we have inside of here such as that get ignore and also get so we can actually see that we're tracking all the changes inside of here don't check don't touch any of these files right here but I just wanted to Showcase where all these are and that they're actually in here in the project now when I navigate back into vs code I'm going to see that yes it has that new doeg ignore file then you're going to also notice that these two folders here are grayed out and that's because they're no longer being tracked by Source control control and because anytime we do any changes inside of them they're not going to be tracked so they're gray it out to let you know so we're going to walk through two different ways to actually Implement changes inside of your working directory and local repository and then have it updated inside of your remote repository so the first thing we're going to be looking at are pushing changes and this is basically uploading your local content that's been adapted or updated and then pushing it up to GitHub so I'm going to create a folder called project and then SQL this is the folder or directory we're going to be using to store all the different queries that we're going be working on during this project section inside of this project SQL I'll go ahead and create a file and I'll name this we'll keep it simple right now query 1sq now after creating it it should open up here on the right and you're going to notice that these contents are green and if we navigate over to our source control tab we can see that it's actually tracking that and not only only it sees that we have created this new folder but also this new file anyway I'm going to just come inside of here um and I'm going to put in a comment uh delete this later and I'm going to save this I'm going close out of this file so let's go ahead and now push these changes so inside of source control I'm going to come up here and give this commit that we're going to be doing a name specifically I'm going to just say name this push example because we're doing a push example from there we're going to go into actually committing it so we'll press commit and now these changes have been updated inside of our local repository and these are stage changes if you will so now we want to get these changes that are on our local repository up to the remote repository so we're going to then sync changes so I can see that there's nothing here in changes staged anymore everything should be UPS to date let's go into GitHub and see if it was updated so inside of GitHub I'm just going to refresh this page to see if it has and now we have this this folder of project SQL which has our query inside of it saying like this layer so that was example of how we can push changes from our loal repository up to our remote repository now let's actually look at an example of pulling changes we're going to make an alteration on our remote repository specifically inside of GitHub and then from there pull it onto our local machine to be in our local repository so you can also make changes inside of GitHub itself in this case right here it's saying hey you want to add a read me and we do want to eventually create a readme so we can just do it here if we wanted to so I'm going to select add a read me and so I'm going to put a message in here to update the contents of this later and now we're going to commit these changes to our remote repository we have our commit message right here of creating a read me then you should have your email that you're using for this we going to commit it directly to the main branch so I'll commit changes all right so now we have this read me here inside of here and I can see the read me down at the bottom to do update contents to this later now back on my local machine inside of vs code this file is not there that's because we haven't done a pull yet to get that information or get that new file here so we can do that by going over to Source control and then selecting these three ellipses right here and then going under pull push and specifically selecting pull now when I navigate over to the explore menu we now have have this readme file right here which the two do update content to this later all right so now we have our repository all set up not only locally but also remotely and we're now ready to go ahead forward with actually doing our queries and getting everything done for building our project now at this point you know everything that you need to know forg however you'd like to learn more I have this video right here where I go into further detail on how I use git as a data analyst and break it down even further so feel free to check that video out I'll link it with all the other resources I'll be providing for this course all right with that I'll see you in the next one where be jumping into the queries all right so let's move into that first question that we're going to be solving using a SQL query in order to understand what are the top paying jobs for my role in my case I'm a using data analyst as a remind we're going to be walking through five separate questions and these all really build on each other so we're going to be gaining insights from each and then by the end we're going to have a more holistic picture of what roles and what skills we should be targeting so if you haven't done so already create that folder where we putting all this I created this one called projector SQL and then from there start a new file I've named this one query one initially I'm going to actually go in and rename it specifically I'm going to give it that one to make sure it's up at the top whenever we actually save these files sequentially and then also name it top paying jobs cuz that's what we're trying to solve so at the top of my file I'm go ahead and put what question we're going to be solving and then also breaking it down into what are some deliverables I want from this so the deliverables are I want to identify the top 10 highest paying data analyst roles that are available remotely on top of this I want to focus on job postings with a salary so I want to remove anything that doesn't include a salary because obviously we don't know if it's top paying or not and finally I just put in there why we're doing this overall it's going to offer insights into our final problem and question and answer we're going to get into of finding the most optimal skills and most optimal roles to be pursuing as a data analyst so let's start building our base query and we're going to start with these six columns first they target things like the job ID the title location what type of job job it is whether it's contract or whether it's full-time the salary what we hear about and then actually job posting date additionally we want to do this from the most important table that we've been doing is that job posting fact table to make sure that we did this right as always as we're inating along I'll be running these queries to see that they're running correctly so pressing command D command D and then selecting what actual database you're going be using for this if you have to uh navigate out of this I'm going to go through and set it back up and boom we're getting these results for the different job tiles job locations job schedule type job posted date okay this looking good so moving into that first bullet point we want to identify data analyst roles and those that are available remotely so for this we'll be using a wear keyword and the first thing I'm going to specify is job title short equal to data analyst and then we want to do an and statement because we also want to get those that are available remotely so we have to specify a location so for location we'll specify it as anywhere so going to go ahead and run this Now command D command D boom looks like we're getting all these data analyst roles anywhere and then oh we're getting a lot of null values for that salary year average column so that's probably what we're going to want to tackle next which is conveniently our second bullet point that we need to do we need to remove all those null values so we're going to do this add to this wear statement more by adding an and statement and then adding that salary year average equal to or sorry is not null so let's try this n command D make sure that we're working and yep we removed all those null values all right so the last things for this are we need to one we want to get the top 10 so how do we do that well first we know we want to do a limit statement we going to limit it to the top 10 and the other thing is we actually need an order buy and for this we'll do the salary your average column again and remember this is initially or by default it's going to do ascending so we we want to do descending running this query we're getting all the different jobs and then they're going from high to low and if you remember crawl from earlier we had one that was around 650,000 and that was here now I'm noticing with this I actually I want to know one other thing uh with this and that's company because this is really great that it tells us these type of roles but I'm actually curious that not just these are remote and it looks like that most of these are all full-time but also what company so we're going to take this one a little bit of a step further I'm going to go ahead and close this out I'm going to add after that from statement I'm going to add a left join and I'm going to specify that we want to connect this to that company dim table that has the company name so and both of these are connected on that Company ID column so I list the the table name in front of the column for both of these and then set them equal to each other to get that left join so now I can come in include a comma that column of the company name is name and we'll actually give it an alias uh to make sure it's not sort of like confusing of company name okay let's go ahead and run this bad boy and moving this over here so we can see it a little better we have all the different things oh and now we get to see the different companies and so we have things like meta so like a tech company AT&T also kind of a tech company Pinterest a tech company it looks like some other Healthcare type companies all right so this is a great start to our first problem we've identified the different jobs that we have and we've also been able to see what could we expect at some of the top paying roles so now have a good frame of mind of what we should be targeting for now when you're working through this Feel Free as I mentioned before to modify this to your need whether you don't have to do data analyst you can do something data scientist data engineers and you can change up to that job location to where you are as we have locations from around the world inside of this data set all right for that we'll see you in the next one all right so let's get into the second query that we're going to be focusing on and it's going to really be building a lot on the previous query that we just did in analyzing the top paying jobs for a data analyst specifically here's the results of that query again and I feel like it's still lacking I know I added that company name into it to dive into it deeper but I don't feel like that really answers the question that I'm truly trying to find out of what skills are also important so that's what we're going to be doing with this diving in to find out what are the top skills inside of each of these roles that are the main drivers behind why the salary yearly average value is so high since we're going to be building on this further I'm going to go ahead and copy this query on over into our new SQL file that're going to be doing for this and I'm going to create a new file start it with with two and I'm going to name it top paying job skills and also make it a SQL file press enter and then I'm going to actually go ahead and paste that query in and additionally I added in what question we're going to be answering with this which are around what skills required for the top paying roles we need to be really doing two main things one using that top 10 highest paying roles that we identified previously query and somehow combine it in and join it with all of our skills now this because we've previously built a query this is a perfect opportunity for something like a subquery or CTE so this is a little bit more complex we're going to be going with a CTE for this once again let's make sure that this query Works while running it I'm going to move it over here and we can see from this there's a few columns we can actually remove from this specifically I don't really care about that job location or job schedule type and the job post to date isn't really providing much value either I'm going to go go ahead and clean that up and move those out of here so let's start by making this into a CTE we do that with that with keyword and we'll specify this as the top paying jobs result set then we'll put the parenthesis to enclose all of this in I'm actually going to select it all tab it over to make it have a nice little formatting and then put some closing parentheses right there make sure the CT is working properly I'm going to just go ahead and do a select all from this top panging jobs query right here or top paying jobs result set and I'm going to go ahead and run it to make sure that it's running properly and looks like it's good if you recall from our diagram of the actual database itself we're going to need to connect two tables of this so not only the skills job dim table but also the skills dim because that's actually was going to have the names inside of it but what join method are we going to be using to join this to our job postings fact table well remember from a join section there's typically only two joins that we're going to do it's going to be a left join or an inner join now in this case that a table is going to be our job postings fact table and we really only care about skills associated with a salary so if there's a job that doesn't have any skills we don't really care about that too much so left join is not going to be applicable in this case and instead we're going to be going with inner join so I'm going to start with an inner join on the skills job dim specifying for this one we want to connect the job ID from our top paying jobs result set with the skills job dim job ID additionally we want to do an inner join on our skills dim table connecting both of these on their skill ID column and right now I have a select star for all the different columns we're just going to go ahead and run this and make sure that it works pressing command D command D and then move moving this over here looks like we were able to join this all on but we have a few columns in here that we don't really care about so we're going to remove those so first for that top paying jobs result set that we have here I want all of the columns from that and I could go ahead and write out all these columns but instead I'm just going to do this of dot top paying jobs and then do a DOT and Then star and this is going to select all the columns from that table additionally I want the skills from this skills dim table okay let's go ahead and run this all right so it looks like the query's working I do want to do one last thing before we dive in and that's make sure that we actually organize this by the salary so I want to start looking at those first so we're going to put a last statement of order by and specify that we're going to do this for salary year and the same thing putting in in that descending order this order bu may or may not be necessary depending on how queries it may come already in an ordered fashion but you should just do this for best practices in case that this does happen to you and you want to get in the order and it doesn't anyway let's select all this and actually run this query and start analyzing it all right so this is pretty interesting I took a peek through all these different jobs and the skills that they're requiring now overall there's a lot of postings we're going to actually use some other tools to analyze this real quick but from what I'm seeing actually looking at it there's a very much a commonality of seeing SQL and python in almost every single one of these rules but with all these results with how much it is you'd really be better off using it in your analytical tool of choice could be something like Excel python whatever and you could do this by extracting this data out by Saving results and you can save the results as a CSV so I'm saving it inside my desktop and exporting it out anyway this portion is not required but I went ahead and gave chat gbt this CSV and told it these are the top 10 diis roles I found in job postings in 2023 can you analyze the skill column and display the insights and after it used some python code to go in and actually analyze it it found that that sqls leading an eight of the different roles python is in seven and Tableau is in six even went a step further and had it make this visualization which shows a lot better visually what is going on here with all this data that we just collected right I'm going to go ahead and copy this Insight right here on the skill breakdown for this and actually go in and paste it down here underneath our query that we did additionally if we want to use these results later I'm going to go ahead also and click this icon here to open the results as a Json file I'm must select all of this by pressing command a and then coming underneath here making sure I'm inside that comment right there actually pasting all these different results so if we wanted to we can go back back later and verify what results I had from this query anyway just to be clear this chat gbt portion completely optional I just wanted to dive into it further and pasting the results completely optional the main portion is we wanted to make sure that we have this query right here as this was solving what we were trying to find all right now it's your turn to give it a try I'm curious to see from your perspective especially if you're using something different from data analy or a remote location how those skills are different well with that we'll see you in the next one all right in this portion we're going to be diving into what are the most in demand skills particularly for me and be looking at data analyst now if we look back to what question are we doing for this we've already answered the first two already we're now on to the third and this one actually we answered previously and well you can actually go ahead and use those results previously or I'm going to actually work at a different method specifically in problem number seven we found the count of the number of remote job postings per skill and whenever you run this query command e command e we found that for data analyst and also for work for home these were the top skills with surprise surprise SQL and also python topping this list anyway if you have these query results saved you can go ahead and probably just copy and paste this then move on to this next section of query 4 but I'm going to go ahead now and work this problem slightly different is this one right here how we have it written is a little bit longer than I'd like it for a query to be I want it to be short and simple as possible because we get into bigger data sets it's going to take more time to run these type so let's try to rewrite this so I'm going to start by creating a new file I'm going to name it top demanded skills and. SQL all right so for this one we're going to be focusing on what are the most in demand skills and to do this we only have a few steps for this we're going to be joining our job posting tables to our skill tables like we previously did so we're going to reuse some code from that one and then from there we're going to limit it down to those top five skills and for this one I want to focus on all job postings not just those remote ones really just see if there's any difference anyway I'm going to go ahead and open up that last one that we did previously and I'm going to go ahead and copy this on over and then mainly pasting in that inner join this is a lot to type in so I'm going to I'm going to just streamline this a little bit so we'll do a simple select star just to start out from this and we'll do from the job postings fact table and we're going to be doing once again that injin to that skills table to the skills job dim table and then finally to the skills dim table let's just go ahead and run this and see if this actually works and already made my first there as I didn't rename the table right here I still had top paying jobs uh for that result set anyway let's actually try to run this again hopefully it works all right this is running for a little bit too long of a time and I understand that because we're combining all these different tables together we're going to be limiting anyway these uh to find the top five jobs anyway so I'm going to go ahead and just throw this limit five statement here to speed up this query I was waiting about a minute and it wasn't uh loading right lot quicker whenever I do that limit five in there don't do what I just did there and it looks like it's combined all these different tables I can scroll the way to end and see that the skills got combined into this so remember what we're trying to do we're trying to find what are the most in demand skills basically we're going to find a aggregation of the sum of skills so we're going to need to provide a count in order to find out how many SQL entries are how many python Excel and so on so first thing I'm going to specify is that skills column because I want to return that and then we're going to do that aggregation method we're going to be doing a count for this let's use the job ID for this and I'll just put in skillsjob dm. jobid and we'll put this as the demand count now remember anytime we do some sort of aggregation we have to do a group buy so for this we'll do the group bu and we obviously want to group bu the skills let's go ahead and run this to see where we're at right this make sure it's running properly okay this is looks like it's tabulating right now right now we're not having any type of sorting so that's what we're going to need to fix next but we can at least see that we're getting those demand counts of those different skills looks like they're actually doing this in alphabetical order so let's go ahead and put that order by in there now so order by and we're going to order it by the demand count run this again command e command e and obviously we got to do it descending run this again all right so now we're getting all the skills right now and it looks like sqls topping the list at around 3 80,000 python very close by now remember if we go back to our original question we want to find out what are the most in demand skills for data analyst so we need to modify this and insert in a wear statement to filter by this so let's use that wear keyword to filter this by we're going to be using that job title short column and then specifying that we want to filter it by data analyst so selecting it all and running it command e command e boom now we got got it and it looks like we have SQL Excel python Tableau and powerbi Topping the list now I am curious to see how it compares to if it was only remote jobs so I'm going to add I'm take this a little bit step further just to compare it I'm going to add the and statement for the wear specify job work from home as true running this command e command e we can see when we compare these we still have so on the left hand side we have those that table that is not uh that is that allows any type of work whereas the one on the right is only allowing remote work or work from home and the general trend is the same with SQL Excel python Tableau powerbi and it looks like it's very much in the same proportion so it's not a lot of difference between the two all right now it's your turn to give it a try feel free to once again modify this for your specific role but also you don't have to stick to this thing where I just did remote you can do your specific location or really anything particular to you that you want to filter down further on with that see you in the next one all right so let's get into the second last skill of analyzing what are the top skills based on salary now this is all in progression to learning what is the most optimal skill if you recall earlier we've already analyzed to find what were the most highest paying jobs with one at around 650,000 for all those high paying jobs that we found with then then dived into the skills themselves and we found popular tools like SQL and python are common in a lot of these rules and so we dived in further to look at how popular they were not just among the top paying jobs but also all jobs and there was still a similar trend of that SQL and Python and even things like Excel Tableau and powerbi for data analyst so our ultimate goal which will be in the next query of what is the most optimal skill looking at what is not only high paying but also highly in demand we need to actually answer that question of what is a high-paying skill so for this query we're going to be looking at what is the average salary associated with a skill specifically fine-tuning this for me for data analyst positions and we want to make sure this data actually does include salary data we're going to exclude any null values and then finally we're going to do this initially regardless of any location but then I'm also going to modify it to me for my remote case where I want to look for remote jobs and what would I expect it to be so back in vs code we're going to start with a new blank SQL document that we're working from and have the problem outlined up at the top so where are we going to start with this one so this one's very similar to the last query in fact because we need the names of the skills from that skills dim table and also we need the salary data from the job postings fact table and previously we did a count of values of this column but now we just need to do an aggregation method basically doing the average of these average salaries so navigating back to our third query that we did I'm actually go ahead and copy all of this as a lot of this is going to be reusable and paste that in right here so if you remember from this one we were doing the count of the skills I'm going to go ahead and actually just execute it to show this was doing a count of the skills basically we want a very similar thing we want the average salary right here so for the time being I'm going to remove this and we have the from job postings fact our inner joins where data analyst is true remember we're going to maintain this to look regardless location so I'm just going to put a comment in here right now and move this and right before this so we'll just move it out of the way for right now but we can put it back whenever we want to analyze it for OT jobs from there we have group by skills and then an order bu we'll have to actually update this as well um I'm going to actually change this limit to a little bit heftier value of just 25 so like I said the lot of this core query is already good enough and you could feel free to start over again with it but I really don't like to repeat myself so the first thing I'm going to do with modifying this query is previously we had the count up here I'm going to add in an average function to actually average that salary and I'll specify the column of salary year average now because we have an aggregation as always we need to group by the appropriate thing in this case we do want to group by the skills so we'll keep keep this as is and remember from our core things we want to focus on roles with specified salaries so I'm going to add an and keyword right here and then for this I'm going to remove any null values that are in that salary column and also help speed up this query a lot faster so I specify salary year average is not null and the last thing to do is we have this order buy here that needs to be filled in so I'm going to go ahead and just put this well we need to assign an alias for this so I'm going to do an as and name this average salary pretty original and then whenever we go down here to the order bu we'll order it by that average salary okay let's actually run this and see if it works fingers crossed oh and it works okay and okay like usual I messed up and uh the order by I did it in ascending CU I did the default but we at least have all the values here okay and it looks like it's working correctly also I'm not noticing these decimal places I don't really is a little bit too much data for me so first thing to fix the order bu I'm going to change this to descending run this again and then let's clean up this salary so there's a function we didn't talk about yet and that is the round function and we can put that outside that average function and with round we need to actually specify not only the column or the aggregation of interest but then ALS so after this we have to use a comma and we have to specify the amount of digits we want to round to I don't really care about any of these digits um so I'm just going to make this zero you could make it two for like two c uh for um down to the 100th place anyway let's go ahead and run this make sure this works okay we have all our values now all right so this is the top 25 and looking through it unfortunately Python and SQL aren't topping the list but it looks like it's more focused specifically for data analyst on web development tools so like things like terraform um and then more Niche tools like data robot so this does make sense anyway just out of curiosity I want to see how this Compares I'm going to close out these other windows I want to see how this Compares for remote work and I'm going to remove this comment right here and then run this section again command neat and looking at this for remote jobs there's a lot more familiar names that I'm seeing on here such as Jupiter that's like a notebook where you can run Python and probably the most surprising of all or coolest of all is postgress at least made the top 25 anyway similar to what we did in query 2 because this is a lot of text out of here a lot of analysis of what is actually going on or what all these different tools are used for I actually went up here and select this floppy disc right here and had copy results as Json to clipboard and I've been having an ongoing conversation with chbt about a lot of the results from this and I just said hey here the top paying skills for D analyst the top 25 can you provide some quick insights into some Trends into these top paying jobs and then providing all this Json value anyway it provided three insights into what are these top paying roles consisting of for data analyst specifically big data and ml skills are a high priority next up comes software development and deployment and then and finally cloud computing so a lot of specialized skills that require not only SQL but also python anyway I went ahead and copi and pasted this into underneath this query to basically break down what we found out in this query all right next one we're going to be diving into what is the most optimal skill so you haven't done this one already do it for you make sure you're adapting it to your need and then we'll be diving into the final query which I'm super excited to get into all right see you in the next one all right finally moving into what is the most optimal skill I should be focusing on as a data analyst or remote jobs now if you recall from our third query we found out what is the demand for certain skills granted we only limited to Five results shown but basically we can get all the different results or the demands for the skills in the last query we analyzed the skills from a different perspective and found what was the average salary for each of those skills skills for a data analyst so honestly the easiest solution to basically build on the code that we've already built is to use a CTE basically a CTE for query 4 a CTE for query 3 and then combine these two results together on something like the skill ID so inside vs code I'm going to create a new file called optimal skills give it that num clure five is the fifth query and up at the top of the file I put in what we're actually trying to solve for what are the most optimal skills to learn and we're going to be focusing on high demand which we've already calculated and then High salaries which is already calculated as well in query 4 so the first thing I'm going to do is going to go back and get the query from query 3 I'm just going to copy that all right here and then paste that in remember we're going to want to put this inside of a CTE we're going to be putting both three and four inside of a CTE so I'm going to title this with skills demand as and then start that CTE right there and then from there tab this over remove that semicolon and close the parenthesis okay so I have the skills demand coming in its own result set using the CTE the next thing I want to bring in is the number four query that we have so I'm going to copy this one also and paste it over here I'll name this one average salary for the temporary result set and then start a parenthesis here tab this over remove that parenthesis and and then from there add in another closing parenthesis right here okay so we have our two CTE located right here so I'm going go ahead and close out of this and close the side window we have what we need for this we now need to move into actually combining these and if you look at it we could technically combine it on the skills because each of those skills are unique if you will but that's not really best practice we need to typically whenever you want to combine anything you want to combine it on the actual um the key itself the either primary or foreign key so we're going to be using the skills ID for this so I'm going to specify skill ID here and then also skill ID here as well now since we're going to be combining these I don't want to limit this to five and I don't want to limit this one to 25 I want to combine all of them that way they do the itation also I want to speed up this query I don't care about an order buy so I'm going to remove that in this case and similarly I'm going to remove that as well here and then the last thing I'm going to modify is the group bu so like I said previously we're going to be connecting these two tables on the skill ID we can do group by skills it's just not breast practice so we're going to change that to the group ey of skill ID same thing for the group ey down here on skill ID so that way we know for sure whenever we do this that we're actually aggregating it correctly the last minor thing to note is for my things we're going to concentrate on remote positions with specified salaries so previously for our query 3 we were looking at all the jobs and not necessarily those that have or are missing a salary value so I'm going to select this from query 4 selecting Sal salary year average is not null and I'm going to also insert it up here as well anyway now we have both of these CTE built we need to actually move into combining them so we're going to move into our select statement we're going to first identify the skill ID we have this in both so we're going to specify the first one of skill demand. skill ID next we need to identify the skills so we'll do similarly with this specifying the skill then we need to bring in from query 3 that demand count along with that average salary from query 4 so now when you move into combining these we'll use a from statement specif if Ying the skills demand table or temporary result set and then doing an inner join because we only care about what exists in both of these tables and we're doing this with the average salary temporary result set and both of these are combined using the skills ID so this has the bulk of what we need already let's go ahead and run this it's already been too long already that I could have made a mistake and not caught it so command D command D and I did right into an error and it resolves around this with statement right here so anytime we're doing multiple CTE you actually group these together so this with keyword transfers this into a temporary result set of the skills demand and then you put a comma and then similarly we do this average salary so we're do two CTS right here so going ahead and running this command e command e and when I added in those skill IDs it was ambiguous sometimes you'll notice in if you bought the SC notes and certificates I'll just include all the different table names in front of these and that's just so you resolve this ambiguity and don't run into these different errors like I'm running into right now so let's try this one more time to see actually if this works pressing command D command D and spoke too soon again I didn't get all the skill IDs again so we got another error and saying skills must appear in the group by Clause so I'm gonna go ahead actually we already have skills up here with our count and that one should be fine so I'm going to remove it here let's see if this works all right so I've been troubleshooting this skills must appear in the group eye and I know it can be done and anyway I come to realize that compared to our I got to close out this so we can actually see it so in our first CT We Have Skills ID and also skills we want that skills to be in there right because we're going to be using this and actually displaying this in our final table and I don't want to do another left joint down the road anyway I was specifying the wrong table earlier it should be skills job dim and in this case I specified now skills di to clear up that ambiguity so now we should actually get this to work command e command D okay and finally hopefully the last error I'm now getting this relation average does not exist I'm like I know I can do average in here I got a dang typo right here on the in join and I specify average space salary so wasn't necessarily working oh my gosh this is killing me right now one another typo I don't have skills properly in there all right so we're finally through that after quite a bit of troubleshooting but honestly that's what I find myself doing from time to time with SLE queries especially when combining it sometimes it looks like it's easy just to combine two queries and put it into a final one and you run up to a lot of hiccups along the way so I wanted to include that to show that sometimes you're going to have to go through these troubleshooting steps with this anyway we have our results back I'm going to move this over to this window and we can see from it we have our skills our demand count and then the average salary associated with it right now it's not really in a particular order so we need to clean this up it looks like it's ordered by skill ID so I'm going go down to the bottom of the query after the injin and put in an order by and really I care about demand first so we're going to put in demand count put in this descending order and also you can put more than one things to order by so in case ever they have the same value we'll go to the second value to thus order that one so in this case we're going to do the average salary and then also do this in descending order and finally we're going to limit this similar to last time of just 25 values so I'm going to go ahead and run this query now to get this all right so here we have it moving on over here we have our results and right now it's ordered by that demand count but then we can still see based on what we know about some of the higher salaries for skills upwards of 150 to 200,000 we're seeing these high demand skills are around 100,000 so I also want to see this this table ordered from the average salary perspective first so I'm going move this around with average salary up before the demand count and then run this aggregation method again command D command e and this time as we can see whenever we do it from this perspective we are getting these higher salaries right here but this demand count is so low for these higher salary jobs based on what I'm seeing from that previous table I'm wondering if we could do a wear Clause to limit it by the demand count to make sure that we still have these high-paying jobs in there and ordering it similarly but have a demand count greater than 10 so I'm going to add in this of where and then specify the demand count greater than 10 okay I'm going to go ahead and run this new quer to see what it looks like for this one all right and I'm liking this one a lot better it's a lot more represen istic of what we would expect because there's actually values quite a bit of job data values for each of these so just analyzing this roughly doing a scan through it I can see that once again we're seeing that one Cloud tools and specifically cloud-based databases are some of the highest in here and then down here near the middle of this list of only 25 we're seeing uh programming languages like Python and R now although this was a great way to demonstrate the use of CTE I would recommend actually making this more concise and so that's what I did here I went through and actually rewrote this entire query keeping it uh pretty concise and all it basically did was actually combine in all those select statements into one that were previously spread out among those two CTE and then I still had the same wear group eyes and ordering the only thing I'll add is that little last cave at the end where I wanted to have the demand count greater than 10 which is just an arbitrary number you can't put an aggregation method inside of a w so I had to do a having keyword here to actually do that anyway running this all I can do command e command e and comparing it to our previous table there's our previous tier table on the left and then our new table it's the exact same results anyway mainly just show you this that there's multiple ways to actually go through and analyze and what can be done and what can't be done all right next thing we're doing is going to be actually packaging up all of our project that we have right here generating a readme and then actually going out and sharing this project on How I would go about doing this all right with that I'll see you in the next one all right so now it's time to actually showcase all the different work we did and doing this by uploading this into GitHub but before we can do that we actually need to streamline and put all of our analysis into one document so inside a vs code right now I have a few different folders you may not have this Advanced SQL that is the the problems that we worked during the advanced SQL section but you should at least have this project SQL folder which has all your different SQL files that we worked on previously along with that SQL load table on how we actually loaded the data into the database so let's actually upload this all on GitHub and see how it looks right now and going into on the activity bar on Source control see that it has all of our new files I renamed one of my files why I scratched out here and I give it this commit message of upload SQL files and then press commit and then I want to sync changes navigating into GitHub I can see my projects located right here at SQL project data job analysis and this right here is what everybody's going to be seeing when they first get to the project and overall there's not a lot here as you can see that read me that created there's nothing really detail it and you have to navigate into all these different folders to even see what is going on here like how do you even navigate it so here's an example I want to work off of I have a course on Chachi GPT for data analytics and then you go through and use chat GPT to generate code and basically generate a project anyway in this project here I have for the readme it details all the different analysis we do for that project there going through describing it all and describing all the different code that's actually located inside the repository so that's what we're going to be doing is building out a read me in order to handle this so for this readme file I want it structured in a logical manner to actually go through and so somebody can read it and actually follow along in all the analysis we did so we're going to start with simple things like an introduction a background and then tools I used from there we're going to move into the analysis the analysis will be the bulk of the session of actually capturing all five of those different queries that we analyzed and then finally we're going to wrap it up with what you learned or what I learned during the analysis and then any final conclusions that we came to or Drew as far as insights so back inside of VSS code let's actually get to work I have the readme here open and I'm going to drop in all those different sections that we're going to be working on for this now conveniently inside of here this is this is a text version right here so you can insert all this kind of text right there but we're going to want to open on the right hand side this open preview and this allows us to see what it's going to look like in the markdown version or the final version whenever we're actually inside of GIB Hub specifically I navigate back to that chat gbt one and I navigate here I can see like hey it's all formatted stuff but if I actually go inside of the readme itself and look at the code basically it's just a bunch of text and so there's certain format that you need to do in order to get things like headers here or links or whatnot let's go over a few brief ones that you should know about as we're building this out so the first thing is headers I actually all of these need to be headers and what do I mean by that I'm put a little pound pound sign right here in a space whenever you use that that is a header one I can also do two pound signs to make a header two or header three whatnot for each of these I'm going to make them individual header ones all right so I have all the different headers now I'm going to be going through and actually filling this out let's start with that introduction section first so I make this intro of dive into the data job market focusing on data analyst roles this project explores top paying jobs and demand skills and where high demand meets High salary in data analytics and then I give it this of seal queries check them out more here so I need to insert a link so for this I want to provide a basically a clickable link so I know that it's going to be going to this Advanced sorry it's going to be going to this project SQL folder when they navigate to it I want them to see this so I'm going to start with some brackets and then from there actually Define the folder that we're going to and that's project SQL folder and that's just what's going to be actual viewable then from there I'm going to put a paren syes next to it as you can notice right here let me close this out right here as you can notice it's now getting clickable but it's not going anywhere so we want to put some sort of Link inside of it so I press forward slash and then inside of here conveniently all the different folders pop up that I want to link to link to I put project SQL and now it should be good so whenever I click this it should want to navigate me over as you're seeing here but whenever I have this in GitHub it will actually navigate them over to that folder next up is background and I'm not going to bore you by reading through all this but basically I went through summarize what the background for the course was also put a link to the course and then from there define the five questions that I wanted to be diving into that we did in our final project analysis next up is tools I used and I wanted to summarize four or five main tools of SQL postgress vs code and then also git and GitHub and I put a little dis claimer across each I'm noticing right now so you can use something like a Tac or a dash to get it in bullet point within here but I'm still having a hard time actually reading what are these different tools I have I can actually bold these different values by putting double Asis before and after any value it then goes ahead and bold cases it all right next up is the analysis we're going to start first with just putting a blanket statement of what we're actually doing here that we're aiming at investigating specific aspect spec so this going to be broken up into multiple sections so the first one I used a header three and then specified hey it's number one one that we did of top paying data l jobs by the way we're not going to go through all five we're going to do one of these from there I'm going to put a quick summary in and so for this one it was just hey to identify the highest paying roles and then what I filtered it by and everything like that now I want to showcase my code from this query so that they don't necessarily have to go back to this file right here and actually see it instead I'm going to copy this code right here I'm going to close this panel we're getting a little smoed and I'm going to put it into here the problem is if you look at it it looks like a hot mess over here so what we can actually do is format this as a code block so for this I'm going to put three back ticks one at the beginning and then one at the end and so if you look inside of here now we can see that this is all formatted correctly and it actually looks like SQL code additionally it's good practice inside of here to specify what language this is and so inside of this markdown file right here it actually if you notice that it formatted it for what it is here and similarly it's doing that here as well with this color highlighting so making it a lot easier to read so definitely recommend doing that following this clo block I wanted to top this all off with the findings so I put in here hey here's the breakdown of the top. unlist jobs in 2023 and then had three different insights of the results that we had found earlier now if you recall from earlier during that second query when we dived into it I had chat gbt visualized the data that was provided well I also had chat gbt visualize the results from this as well to showcase the different salaries associated with the top 10 jobs anyway I want to show some of these visualizations inside of here to better convey some of the analysis that we did here so anytime we need to include these type of images inside of marktown we actually have to include the images inside of the project file itself so I'm going to create a new folder called Assets Now this option of including an image inside of here is completely optional whether you want to do it or not but it shows even more functionality of how to use markdown anyway I can take that folder or take that file itself and then go ahead and drop it into here and so we can see it's right here in top hang roles I'm going to go ahead and then go and select copy relative path now in order to display an image in markdown you need to use this nomenclature here where you're going to have an exclamation point and then brackets around some sort of alternate text for this I'm going to name this top paying roles next we need to do that URL so I copied that relative path I'm going to go ahead and paste it in you notice whenever I do that now it's accessing it right here from our folder so then I just end this off with a little comment at the end in italics that hey this is the bar graph visualizing these results and also that chat gbt generated this graph for my SQL query results all right so now I'm going to repeat this for all five of those other areas as well all right so I've gone through and put all this in this took me about like 30 minutes to go through and do right so here's what I did very similar to that first section as I continued the same layout of just outlining it and then providing an a quick snapshot of what the breakdown is and then if I had a graph I CLI included it now when I got down here into queries 3 4 and 5 I found visualizations were less uh helpful for this so I started inserting tables did that for both four and then also for five proving that final analysis now tables themselves are pretty easy to put into markdown you can put it in with this format right here I found that actually just copying and pasting the results from here in VSS code into something like Chach BT and having to get format for me saves a lot of time so that's what I did for a lot of this so last two sections that I'm going to include the one is what I learned this really going to be depending on your situation where you are in your Learning Journey I just put in these three examples of hey we worked on querying data aggregation and then also just getting into actually analyzing so I named it analytically wizardy all right the last section to move into is the conclusions and I'm going to break this into two separate sections first is insights and I'm going to just capture all the different insights that we had up here from before and capture it into just five main different value points that I found from this analysis and then finally I'm going to have some closing thoughts for this section I'd really be looking at what you actually took away holistically from this project for me just working on this alone really built up my seel skills although I'm already you know pretty confident in those skills I feel actually going through and teaching you through this I learned a lot myself so I put a lot of that in the closing thoughts here anyway so this read me has everything we need inside of it I'm going to go ahead and save it and then I'm going to upload it into GitHub for this is going to be committing those two images that I had in the readme along with the changes to the readme so now when I actually go to this project here I actually have a full-blown layout of all the different work we did and all different analysis here I'm pretty blown away this is a lot of work we put into this you should be super proud of this the other thing I'd suggest is actually pinning this repository to your profile so you can come up here to customize your pins and I'm going to come and select this SQL project data analysis and save those pins and then from here if I wanted to I can drag it around and even put it up at the top all right now that we built this entire file and this readme we need to go forward with actually showcasing it so that's what we'll be doing in the next section of actually getting this onto something like your LinkedIn to Showcase as experience all right with that I'll see you in the next one that nerds congratulations on making it to the end of the course been nothing short of your hard work and there's a couple steps I'd recommend taking further now and actually showcase your work so it's been great that you put this work on GitHub but now you need to get the word out and I recommend doing this via link in there's three main things you can do for this one add the certificate of completion for those that purchased course notes and certificates which it's not too late to do that you can add this to your profile another thing you can do is with your GitHub project now on the internet you can also showcase this project on your LinkedIn profile and the last and third thing that we'll cover is a social media post so after you complete the end of course survey I will send you an email via an Automation and you'll receive the certificate of completion with this email you can go ahead and then download that certificate to your computer after that you should navigate over to LinkedIn specifically to your profile and in it make sure that you have your certificate section and also your project section enabled within your profile by going underneath the recommended and adding these two for the certificate you just navigate to that section click the plus icon and then from there fill it out most of the information here is self-explanatory for the issuing organization you'll be able to put in my name of Luke barus for the skills I would list these five core skills of SQL postgress and SQL light now if you did the GitHub portion of this you can also put in get and GitHub finally it has an option to add media and you can go through and actually upload that certificate that you had from previously from there click apply this obviously in my certificate so I'm not going to save it now for sharing your project you're going to navigate down to the project section and then click a similarly that ad icon now this one's also pretty self-explanatory you're going to put in that project name a short little description I have about what I did and what I investigated and what I found similarly I put in the same skills that I had in the certificate section and if you obviously you did the project and you list a GitHub you can list get in GitHub when it gets to add media that's where you want to actually go in and add the link to your GitHub repository and this way it can be actually directed them right to it from there put that start and end date everything else can pretty much be left blank and for this one I'm going to go ahead since I actually did do this project I'm going to go ahead and save it so now that we've both showcased our certificate and also this project the last thing we have to do is just make a social media post it'll prompt you either after the certificate or even this project to maybe start a post now for this post feel free to tag both me and also Kelly the co-creator of this course in your post I love seeing everybody's progress and specifically diving in and actually checking out your different projects so I'm looking forward to seeing that all right once again congratulations on wrapping up this pretty hefty course on SQL I feel like it captures everything that you need to know to dive in and be confident in your skills no matter what the field you're working on in data science is for what to learn next you probably learn from the data that python is another popular skill and I'll have a tutorial for that coming real soon so stay tuned but until then I find that chat gbt speeds up my process a lot in data analytics so if you're curious about that check out this video right here also for those that didn't buy the course notes certificate still not too late check out this link right here with that I'll see you in the next one