Transcript for:
SQL Basics for Healthcare Data Analysts

SQL is the most important technical skill that you need to know as a data analyst and there's lots of great guides out there that cover the basics of SQL for free but very few of them actually use Healthcare data as an example so today what I'm going to do is I'm going to teach you what you need to know about SQL and I'm going to give you some simulated patient data to work with so that you can start building a portfolio that will impress recruiters and future employers hey everyone I'm excited to be doing this tutorial with you today my name is Josh Matlock I've worked in the healthcare industry since 2015 as a data analyst at several hospitals on the west coast including the Providence Health System and as of this recording Seattle Children's Hospital I'm going to tell you a little bit about what I'm going to be covering in this tutorial today first I'll start by giving you a short intro on why SQL is so important in data analytics and why it's used so often then I'll show you how to install the SQL tool that you'll be using and that's postgres postgres is completely free and easy to use so that's what we're going to be using and then I'll talk about our Healthcare data source that we'll be using which is called Cynthia next I'll show you how to load that data into a database and then finally the most important part I'll show you how to get started using SQL with some basic commands and I'll show you what data in a hospital looks like where I'll give you some example problems to work through let's get started now as a data analyst when I am not in meetings working with my team or my business partners I spend like 70 percent of my time in SQL SQL is basically a programming language that allows you to pull data out of databases prepare that data in whatever way you need using one or many queries and then you can use the final query to either report useful information to your stakeholders or what I like to do is plug that data into a data Vis software like Tableau or power bi and create graphs and dashboards basically telling a story with that data and solving business problems more visually if you've never used SQL before let me explain how it works SQL is almost like a much bigger and structured version of Microsoft Excel or a similar spreadsheet program many people use Excel every day to store data in tabular formats they might track their finances their survey data or in our case patient data there might be multiple different types of data in which case you might create different tabs for them like a tab for patient names demographics a tab for their allergies and the vaccinations that they received but there's a limit to how much data Excel can store in fact Excel can only store around 1 million rows per spreadsheet and if you work at a hospital and you're looking at all the medications that were ever given or all the lab values that were ever drawn against the hundreds of thousands of patients or even millions in your database you're going to have way more data than that and what if you have multiple data analysts wanting to work off of the same Excel spreadsheet at the same time for different reports or what if you wanted to update the Excel spreadsheet with the newest patients that just trickled into your hospital it would be a huge pain to have to manually update the Excel spreadsheet in the hundreds of tabs that you would have to maintain so this is where Excel really starts to struggle and where SQL really starts to shine SQL stores massive amounts of data in various databases it allows multiple people to query against the database and there's a lot of flexibility for data Engineers to update databases with new data on an automated basis it's also a lot easier for data analysts to pull that data and combine one table of data with another table if you want to work as a data analyst in healthcare you will be expected to know the basics of pulling data out of databases and doing some basic data prep so therefore you need to know SQL and that's what this tutorial is all about today so let's Dive Right In All right so before we can actually start playing with the data first we need to install postgres I'm going to show you how to do this two ways first on Windows then on a Mac operating system so to do this on Windows it's super easy we're just going to go to Chrome I'm going to type in www.postgressql.org I'm going to go to download I'm going to navigate to Windows and then I'm going to pick the latest version of this and I'm going to click on this hyperlink here that says download the installer and I'll just click on this download button here and that's just automatically going to start the download down below so I'll wait for that to download fully I'm going to open that file and then we got our setup menu so I'll just click next here I'll click next I don't need the stack Builder so I'm just going to deselect that you're going to click next next now we're going to set up a password so I'll just type my password Here click next I'm going to use port 5432 and I'll click next here next next so we'll wait for that to install all right it looks like it's done installing so I'm going to click finish all right so now what I'm going to do is I'm going to bring up postgres by typing PG admin for that's going to bring up our interface where we're going to query against our databases we're going to plug data into our databases we're going to do all of our work in postgres and this is the graphical interface that's going to allow us to do all of that so I'm going to click on servers I'm going to type in my password that I created and at this point I'm going to switch over to the Mac installation all right so now I'm going to show you how to do this on a Mac so what I'm going to do is I'm going to go to Safari and I'm going to type in postgresql.org that's going to take me to this website here so I will click download and that's going to take me to this screen I'll click on Mac OS I'm going to scroll down to postgres dot app and these instructions are pretty straightforward what I'm going to do is I'm going to click on the download tab up here and I'll click on the latest release and I will wait for that to download Once that's done downloading what I'm going to do is I'm going to click on downloads and I'm going to click on that file that we just downloaded I'm going to drag postgres over to Applications let's try that again oh there we go all right so once that's done we're going to go to finder I'm going to scroll down to Applications I'm going to click on my postgres application I'll click open and then we're going to click this initialize button alright so once that's done I'm going to go back to the introduction and see what the remaining steps tell me so it tells me now to go to the command line so to get to the command line I'll go back to the finder here scroll down to Applications I'll go to utilities and I'll bring up the terminal so we can close out of this and I'm just going to copy all this stuff that I see here so I copy that I'm going to right click and just paste hit enter you need to type in my password Here so I gotta type it again all right so then I'm going to close out of that and now what I'm going to do is I have installed postgres but I still need to install a graphical interface that will allow me to interact with the databases and all the stuff that we're going to create in postgres so to do that I'm going to scroll down until we see this picture of this elephant here that's PG admin4 I click on that it takes me to this site then I click download and I'm going to click on this Mac OS and this most recent version here and then I'm going to download this file this dot DMG file so click on that that's going to download so that'll take a few minutes and once that's downloaded we'll go back to downloads I'll click on that we'll agree and then we're just going to do the same thing again where we drag this over to applications now once that's done I'll go back to my finder scroll down to applications and I'll look for that PG admin that we just installed here it is so double click we'll click open again so let's right click on this and let's register the server I'll just call this PG I'll call this PG 15. and then I'll call this localhost and then I'll save that and then that should create our databases and we'll have this initial one here postgres as well as a a database that has my username here so at this point I showed you how to do this on both the PC and the mac and we're now at the point where both steps will be identical on either operating system so I'm going to switch back to the PC and I'm going to do the rest of the tutorial from the PC all right so we're back to the PC and we've installed our postgres SQL and the next thing we need to do is we need to load data into postgres so to do that I'm going to go back to Google Chrome I'm going to type GitHub first so I'm going to go to GitHub and then I'm going to search for data Dash Wizardry Wizardry and I'm going to look for users here and so here's what my profile looks like Josh Matlock data Dash Wizardry so I click on that we go up to repositories and then you're going to select the SQL webinar one and then there should be this link that says create new tables that's going to give us some SQL code to start with that's going to create some tables that are going to contain some of this fake patient data that we're going to build so I'm just going to click this copy button here it's copyright contents I'm going to go back to postgres now before I load or before I create these tables one thing I should tell you is how databases schemas and tables work so imagine that you have a filing cabinet and that filing cabinet is going to have a drawer okay so the drawer is going to represent your databases so let's say we have a drawer for this database here postgres so that that drawer is going to have some various things including schemas the schema is going to be like your various folders you can have multiple schemas within a database and that folder or that schema is going to have various files so our public folder or schema for example is going to have various files like Aggregates collations domains FTS configurations some of these I never even use I mostly use views tables procedures and functions okay so these are the files that I use the most functions procedures tables views I won't go into all of those things today I'm just going to talk about tables so you can think of tables as various files within your folder which is your schema and then those folders exist within a drawer of your filing cabinet which is your database so just one analogy to kind of help solidify things so what we're going to do is I'm going to go up to my public schema I'm going to right click and I'm going to say query tool and then I'm just going to paste in the contents that I copied from GitHub then I'm just going to click this run button or I could just click the F5 button but once I do that I go to tables and you should well actually I'm going to right click on tables first click refresh and then click this down arrow and you can see I now have four tables they don't contain any data yet though we only created these empty tables that are going to accommodate data very shortly here so now that we've done that I'm just going to hit Ctrl a to select everything I'm going to delete so now that our tables are created what we need to do now is we need to fill those empty tables with data to get to that data I have a website where I've hosted those files and so those are finished downloading what I'm going to do is I'm going to right click on this table that I just created and if you don't see these it might not be fully expanded it might be collapsed so you just click on that Arrow again and this should show up again here or you might have to refresh and then expand out but once you do that you should see these four tables here so next I'm going to right click on conditions and I'm going to import slash export the data I'm going to look for the thing that I just downloaded and that's going to be I'm going to have to click all files here and then we've got conditions here now what you need to do here this is really important to get this to work correctly so you need to make sure that this is in a text format okay so it needs to be in a text format then you need to go to you may you need to make sure that this tab import is selected here I don't select anything for encoding here for options make sure that it looks like this and that oid looks like that make sure that delimiter is just a comma here and that should be the default option and then make sure that this thing here null strings is backward slash n just like that otherwise it won't work okay so we're going to select ok and then that should import you should see a green box here indicating that that successfully loaded so if I were to say select star from uh from public dot conditions this is my schema and this is my table name of the data that we just loaded data into and select star I mean select every single column from that table we do that and we should have data let me just close out of those expand this and there we go here are our conditions next we're going to go to encounters all right so let's right click on this like we did before we're going to go to import export data so import we're going to select the encounters file that we just downloaded so encounters and again making sure that every step is the same as before text import button selected here comma delimiter these options should look like this and then there should be a backward slash n just like that click ok it might take a moment for that data to load but there you go the data is now loaded I'm gonna do this again right select everything just to make sure that it's all working I click this button to run that there's all of our encounters alright so next I'm going to do the same thing import we're going to go to immunizations and we're going to make sure we got that backwards slash and that's loaded and then lastly right click on patience import and we'll go to all files patience all right and there we go by the way if you're wondering where this data comes from don't worry this is not real patient data we're not breaking any hipaas here this is all synthetically generated data and it comes from a program called Cynthia Cynthia is an open source educational tool where you can download fake patient data and it emulates a patient database a electronic medical records system so it has a lot more data than what we see here it doesn't just have conditions and counters immunizations patients but it also has things like allergies and care plans and the surgeries that the patient receive the lab values it shows you the medications that were given and the developers did a really good job of modeling all this data to represent a real world scenario or a real hospital so things like diabetes for example you might actually see the same percentage of people suffering from diabetes cities that exist within the community the people that developed Cynthia actually modeled all of this stuff against Real World patterns that they observe for people suffering from various diseases or how long they tend to stay in the hospital so there's a lot of statistical rigor that was put into the development of Cynthia so it should be pretty real looking and now there's still a lot of things that need to be worked on in Cynthia to make the data look even more real but it's a really good starting point if you want to sink your teeth into a data project that looks like a hospital database another problem with Cynthia is that it uses a coding Paradigm called snomed snomed has all these codes that determine what did the patient suffer from while they were in the hospital what procedures did they have what was the reason for their encounter or their visit to the hospital hospital and while snomed is free in a lot of different countries there are some countries where a user of snomed might be expected to pay some sort of fee so if you happen to live in one of those countries you might not legally be able to actually use Cynthia and display that on a dashboard on Tableau public so to err on the site of caution what I did is I stripped out all of the snomed codes and replaced them with things that you can use like for example ICD-9 codes which are public domain it's long story short my version of Cynthia is a little bit different in order to avoid any possible legal issues of you displaying any of this data that you might develop in a dashboard like Tableau and Publishing that to Tableau public all right so let's continue on so we've loaded all of our data into these tables and I've explained a little bit about how Cynthia works now let's actually get started on some select statements so SQL is basically just a language that allows you to pull data and alter its contents using some very straightforward language and select is one of the first things you type when you're writing a SQL statement usually select is going to select various columns within the table that you're using so for example we see all these things here we've got ID birthday death date social security number drivers passport prefix blah blah blah blah if we wanted to select all of those columns we would use select star from our table name and that's patience I like to preface this with the schema and that's the public schema so I type public dot patients that should bring up all of our columns here but let's say I only needed the patient's name in their birth date so I I see that they have the first name the last name the maiden name I'm just going to select there first last and their birth dates okay birth date so you do that you run that and you get the three columns that we talked about first name last name and birth date so here's all the patients and their birth dates that we have in the database and to write that out you just separate each column with a comma with the exception of the final column you do not include a comma here by the way you can also have these select and from statements all in lower case if you wanted to I actually prefer to write my statements in lower case when I can just because I think it's easier to look at but you know everyone has their own preference another thing that you might see me do every now and then is rather than having the commas on the right sometimes I put them on the left like this so you know the this comma is like basically it being over here for example and this comma is basically the same as having it like over here I just put them over here instead sometimes because it's more clean looking that way if I wanted to create other columns in the future like let's say I wanted to look at their social security number their passport there their ID number it's it's cleaner to have all the commas on this side rather than on this side where they're not aligned and it makes it easier to edit in the future so sometimes I will have my commas on the left side instead and let's just run that I want to talk a little bit about Encounters in our data set just in case maybe you're not familiar with Healthcare data and do you want to know what is Josh talking about when he says encounters well I'll show you what that looks like next encounters is this Healthcare Centric word that refers to a visit really when someone goes to a hospital or they go to a clinic usually you're going to be looking at either emergency room encounters or some kind of ambulatory encounter or some kind of inpatient encounter so let me show you what the different options are in Cynthia I'm going to select distinct and counter class from public dot encounters what that's going to do is it's going to give me all of the distinct entries all of the unique encounter classes so you see here how it says ambulatory ambulatory ambulatory ambulatory is only going to show up once when I do this and it's going to show up with all of the other encounter classes so I'll show you what I mean when I run this so now you just see all of those unique entries here we have ambulatory emergency home hospice inpatient outpatient Skilled Nursing Facility and these encounter classes actually look somewhat similar to what I'm used to seeing when I work at a hospital with the exception of ambulatory and outpatients those are pretty much the same thing so if I were the architect of this database I would probably just lump ambulatory and outpatient within the same category but I'll just kind of walk through what each of these things mean so ambulatory and outpatient that refers to instances where you go to the hospital and you're done in the same day maybe you're a dialysis patient and you're just getting a simple dialysis treatment usually the patient goes in for a few hours they get hooked up to the dialysis machine and then they're sent home that same day emergency visit is you know exactly how it sounds you go to the emergency department and maybe you broke your arm and the doctor just gives you a cast to put the arm in and he mends that broken arm and sends you on your way or maybe someone has chest pains they think they're having a heart attack but it turns out they're just having really bad acid reflux and they're sent home that same day home visits could refer to Home Health where a medical professional sees you at your residence and treats you on the spot Hospice Care is typically reserved for folks who are in the end stages of their life and don't have much longer to live and are given Comfort Care measures in patient care is usually for folks who have some serious condition that needs immediate treatment within the hospital maybe they're getting a kidney transplant and they just receive their new kidney and they just need to stay for several days just to make sure that their body doesn't reject the new kidney or maybe someone originally went into the emergency room to address chest pain and it turns out they have a clogged artery so they have to go in for surgery and maybe they have to sit around for a little while to for their heart to recover skilled nursing facilities are typically reserved for folks who are maybe physically disabled or maybe they're a little bit older urgent care you're probably familiar with but let's say you have a fever and you didn't feel like it was urgent enough to go to the emergency room but you needed some physician to treat it pretty quickly maybe go to an urgent care facility rather than Emergency Care a virtual visit is where you have some kind of visit on the computer with your doctor maybe it's a wellness visit which we'll look at next or maybe it's a consultation with the doctor we are addressing some kind of new problem could be a specialist could be a general practitioner of medicine Wellness visits are basically like a check-in with your family doctor or your primary care provider where you maybe have a physical or maybe you address any medical problems that you might be concerned with in a typical office visit so now that we talked about encounter classes now let's actually explore the encounter table so what I'm going to do is I'm going to select star from public dot encounters so let's run that and now let's say that I only wanted to look at the inpatient encounter class first I want you to look at this number here we've got 455 935 rows or records and notice what happens when I write this statement out so this is where the where Clause comes into play so right now I'm selecting all of my columns from encounters now I'm going to say where and I'm going to say where in counterclass equals in patient we're in counter class equals inpatient once I do that I now have only 3709 rows what the where Clause does is it's basically saying take some column or multiple columns and according to whatever condition I have here in this case I'm saying where the encounter class equals inpatient only give me the rows where that column is inpatient so where is basically reducing the number of rows in your data set based off of some rules that you're dictating on one or multiple columns so I could do other things here too I could say for example that I only want the inpatient in counter class as well as ICU admission so if I said we're in counter class equals inpatient and description equals ICU admission so this was at three three thousand seven hundred and nine rows now once I run this again so now I'm at 162 rows because I said well not only do I want the encounter class to be inpatient but I also want you to also return the rows where they are also ICU admission for the description so now we've got two rules that will reduce my row count even further let's say that I wanted to add another condition where I wanted to look at the dates I only want rows that take place in the year 2023. so we've got our start date here and we've got our stop date here so if I want to restrict this to the year 2023 I need to pick if I want to do this on the start date of that encounter or the stop date of that encounter by the way the stop date of the encounter is commonly referred to as the discharge if you're dealing with inpatient cases like we are in this example so I might say where in counterclass equals inpatient and description is equals to ICU admission and the stop date is greater than or equal to 20 230101 and because this has this extra like time stamp here like the actual like military time I'm going to type out 0 0 colon 0 0. so we run that now we should only have inpatient ICU admissions where the stop time or the discharge date occurred within 2023. now let's say it's a year later like right now as of this recording it is the year 2023. but let's say we wanted to keep this query around and we wanted to run this in the year 2024 but we still wanted to limit this data to 2023. I might add another and requirement and I might say and stop is less than or equal to 2023 01 actually no that would be December that we would need so December 31 23 colon 59. so then this shouldn't change the number of rows in our data set but it will keep this at a consistent number once we reach the year 2024. so just something to keep in mind you you can also write between for the Stop date so if I said stop between a date and another date so this would be 2021 actually sorry 2023 010100 colon zero zero 20 23. 01 ah I keep doing that sorry December 31st and then 23 59. so that should work just as well yep we have the same results here so you can get very complex with your where Clauses you could have dozens of different requirements in your where Clause if you wanted to basically you're just saying I'm gonna filter this based off of various columns so that I can reduce my row count in some way and just trim it down to just the data that I'm interested in now so far I've only been using and all right let's mix this up a little bit let's do something different what if I just wanted to look at ambulatory and outpatient cases recall that when I was describing all of these encounter classes I mentioned the ambulatory and outpatient are basically the same what if I wanted to include either the ambulatory or the outpatient so I'll show you how to do that next I would just erase all of this and say where in counterclass equals outpatient or encounter class equals ambulatory so I do that I run this when I run this query notice that there is not only outpatient encounter classes but there is also ambulatory if I scroll up a little further now you might be asking why didn't I say where in counterclass equals outpatient and in counter class equals ambulatory let me show you what happens when I do that I get nothing the reason for that is because and I'm just going to switch that back to or a column can't be two things at once it can't be outpatient and ambulatory it can only be one thing at a time so if he said we're in counterclass equals outpatient and it's ambulatory it's not two things at once it can only return that one thing that it listed out so in this case I would say return the rows that are either outpatient or it's ambulatory another way that I could write this out and get the exact same result is if I said we're in counter class in outpatient canned ambulatory so I will and close this with parentheses notice that my row count is 329 997 and once I do that I should get the exact same results here I actually prefer to use the in statement when I'm referring to a column with multiple entries that I'm interested in because it's it's less lines of code it looks a little bit cleaner so whenever I have a situation where I'm looking at some column like just one column and I'm just looking at one thing or another thing within that column I almost always use the in statement just because it's more concise now that we've talked a little bit about encounters and explored the where Clauses I'm going to dive into a new table called conditions so let's start a new line of code here and I'm going to say select star from public dot conditions so with conditions what we have here is we have a start date we have a stop date patient encounter code and description so the start and stop dates are going to denote the length of time where there was some problem for the patient some sort of diagnosis code when a patient comes into a hospital they are going to be diagnosed with something if you have a sore throat you might be diagnosed with sore throat if your appendix is about to burst you might be diagnosed with appendicitis if you have a moment of anxiety or depression that that will be diagnosed on your chart you have all this stuff that's saying when you are diagnosed with some particular condition and the code here I'm using an ICD-9 code here to denote that and here's the description of that code so 277.7 is the ICD-9 code for this Metabolic Syndrome X or 285.9 is the ICD-9 code for anemia unspecified and then this particular patient here had that condition from this state and then if it's blank what that means is that we don't know if there's actually an end date for that patient yet but if it does have an end date then some provider or physician or nurse said all right this psychological distress in this example started in 1953 and ended in 1957. so we have when did this thing start and if we know and it stopped when did it stop if it says null then we don't know whether or not it stopped because it hasn't been documented yet so now what I want to do is I want to talk to you about the group bias statement let's say that a doctor comes to me and they say I want to know all of the descriptions that ever happened and how many patients we have or how many occurrences of this thing did we have in our patient population for all of these descriptions they just want to know the number of times that these conditions happened within their patient population and they want that number in descending order so the most popular things happening at the top and the most rarest things happening at the bottom that would really lend itself well to a group by statement so let me show you what I mean by that if I were to start writing this so I would need to First say description followed by count this is going to give me the number of occurrences of this description and I'm going to write from public dot conditions group buy so this is where the aggregation starts to take place I'm specifying that I only want to look at these unique entries and I want to count off how many time these happen so this will make more sense in a moment so Group by description and that's I think that's all we need to do for this statement so I do that notice what happens now we only see these conditions showing up once but there's something right next to it that says count and I could actually call this whatever I want so I'll call this as count of condition so once I do that that count is now count of condition so you can see we have all the occurrences of acute bronchitis now we have all of the occurrences of acute cholecystitis we have all of the occurrences of acute myeloid leukemia without mention of having achieved remission some of these are pretty rare though and we don't necessarily want the rare things to show up with the more frequent things so another thing you can do to organize this data a little bit better is by adding something called the order by statement so order by count so we're going to do order by account and notice that this shows the rarest stuff first the things that only show up like once and that's because this automatically sorts things in ascending order we want to sort this in descending order so I'm going to type desc if I wanted to do it in ascending order like it's doing now I would just type ASC but that's the default and I want descending so I'll type d-e-s-c all right I'll run that and now we see we see all of the most common things that show up for the patient so other psychological or physical stress not elsewhere classified is the most common condition followed by pregnancy followed by acute bronchitis the list goes on there's so many conditions here but this is how you would do this if you wanted to just specifically look at the description and list out the number of times that that description happened within your data set you would use Group by now let's say that the person requesting this data let's say that they only wanted to look at descriptions where you have 2 000 instances of this thing or more so in other words we don't care about unspecified otitis media or streptococcal sore throat or any of those things below because they have less than 2 000 occurrences of that condition so what we would do is we would use the having statement so the having statement actually occurs before order by we're going to type having count and we're going to say greater than two thousand so this functions a lot like the where Clause except it's only used when you're aggregating data using the group by statement I'll show you what that looks like so now we have a much smaller list and we're only returning the conditions that have at least 2 000 instances of this thing occurring and you know you could go even further you could say 5 000 if you wanted to and make it even smaller the having Clause Works somewhat like the where quads except it functions only on these Group by statements where you're doing aggregations so that's the only scenario where having is going to be used and so you're saying well I want to use this count as my filter so you would say having count and then you'd plug in your number right here now you might be asking so we're using the having Clause here having count greater than five thousand could I use the where Clause at the same time that I use the having clause and the answer is yes you can do that so let's say that I'm done running my query and the doctor or business partner looks at this and they say hey this looks great except I don't really care about body mass index that's something that's not really a disease that's just kind of like a situation that they have they don't care about that let's get rid of that so this is a great example where we don't necessarily need to use the having Clause here to trim this down even further what I would do in this situation is I would just say where the column description does not contain this thing so I'll show you what that looks like so I'm going to say where where description is not like this thing or it is not equal this thing is what I should say so that exclamation mark and that equal sign means this does not equal this thing so I'm going to control C to copy this right here I'm going to do control V to paste it and I'm going to remove this number here I'm going to remove this double quote and replace that with a single quote it looks like it created some extra quotes for me that I don't need so I'll get rid of that so it should look like this body mass index with this range and then these single quotes these apostrophes here so now it's going to say all right give me all of the stuff in the conditions table where the count of that thing was greater than 5000 and you also want to kick out anything that says body mass index of 30 to 30.9 adult so we run that and it gets removed so yes you can combine the where Clause with the having Clause when you're using these aggregations these group bias statements so what I'd like to do now is test your knowledge of what you've been learning so far and see if you can apply what you've learned so I'm going to give you a few problems to work through in SQL what I want you to do is try and write out the code yourself see if you can solve it so you'll pause the video as soon as I give you the question you'll see if you can answer it try taking a stab at it for 10 15 minutes however long you need and then resume the video and then I'll kind of work through how I solved the problem so what I want you to do is just write a query that selects all of the patients from Boston pause that try it for several minutes try writing out a query and see if you can come up with the answer and then I'll go over that once you resume the video all right so now I'm going to go over my answer so in order to get all the patients from the city of Boston um you know let's just start by looking at what we have in our data set we've got all these columns here you'll notice that there is a city column here okay we've got all these cities and all you got to do is just write a where Clause so select star from public dot patients that's our table where the city is equivalent to Boston so we are basically telling the query that we want to trim down our rows to only the city of Boston so once we do that we should have 1016 rows so that might have been a little too easy I'll see if I can give you something that's a little bit more complicated now for our second question I'm going to try and make this a little bit more complicated let's say that we have a doctor who's interested in every single time that someone was diagnosed with chronic kidney disease let's say that the doctor says he already knows the ICD-9 codes that he wants to give you and they are 585.1 585.2 585.3 and 585.4 how would you write a query that returns all of the occurrences of there being some condition with one of those ICD-9 codes so remember we are interested in all of them okay so take a few minutes to see if you can work through the problem and then resume the video when you want to see the answer all right so what I would do in this scenario is I would write select star from conditions okay so we have all of our conditions here and recall that you can use this where something is in something else so this column code is in 585.1 585.2 585.3 585.4 so we're basically saying this code has to contain one of these codes so it's either this thing or it's this thing or it's this thing or it's that thing and once we do that once we run that we get all of these occurrences of some diagnosis being made where the patient had chronic kidney disease and that's stage one through stage four okay so that would be the way that you would approach this question you should have 3869 rows here I'm going to give you one more practice problem before we move on and I'm going to make this harder so I've actually prepared it right here so I have this comment that says write a query that does the following one it lists out the number of patients per City in descending order and that city does not include Boston and you must have at least 100 patients from that City now when you're doing this just keep in mind that that patience table I think I mentioned this earlier but that patient's table only lists out each patient once okay so see if you can figure out how to write a statement a query that does all this stuff so again I'll give you a few minutes and then resume the video when you want me to go over the answer all right so here's how I would do this so we've got this patience table so first let's bring up the patient's table again public dot patients so I'm going to run this and notice that there is a city column in that table what we want to do is we want to group on the city and we want to pull up the number of patients coming from each City by using count so I'm just going to going to kind of repeat this answer here I'm going to just kind of gradually go through this so select star from public dot patients and actually we need to replace the star with City and count to kind of set the stage for this group by City And I'll show you how that looks so far we run that and we have all this stuff in here but right now it's including Boston somewhere there's just so many cities here so I'm not gonna like look for it but we don't want Boston in this data set here so once I exclude it this should drop to 407 so I'm going to say where city is not equal to Boston now if you haven't noticed this already keep in mind that when I'm writing these statements this is case sensitive all of these cities are capitalized at their first letter so we do want to make sure that that first letter is capitalized so I run this now this drops to 407. all right so this drops to 407 because we kicked out Boston but we're not done yet we still need to make this in descending order as well so remember that we have the order by and then we're going to say count and we're going to make that descending so we do that that's going to look better but we're still not done yet because if we scroll to the bottom we still have like all these cases here where there's just one one patient coming from some of these cities and the prompt said that we need to include at least a hundred patients from each City so after the group by and before the order by we have the having count greater than or equal to a hundred all right so we run that and we should have 16 rows as our final answer with chester at the top and Revere at the bottom by the way you might be wondering what's up with this section here like why is this a different color than the rest of the sections and shouldn't that be causing some error in the code well no because this is commented out code so when it looks like this when it's kind of like this brown color that's just basically saying this part is not going to be read by SQL this can come in handy when you're trying to write a SQL script that clearly labels out everything that you're doing and it's a good habit to describe as you're going through your code what each thing is doing so for instance I might label out this section here to say and I need to include two hyphens as I'm writing this at least two hyphens and I say this query gives me the count of all patients from their city of residence and there must have been at least 100 patients cities do not include Boston okay you could also do the same thing if you let's like erase all of this here and then let's do forward slash asterisk and then backwards slash asterisk or sorry um asterisk uh forward slash okay so notice that it still has that same color when we do that this is just another way of commenting out your code now we only have a few minutes left before we end our webinar so I'm going to wrap things up by talking about this Final Table called immunizations and I'm going to give you a high level overview of the next steps in your SQL learning which would be joins so I'm going to select everything from this immunizations table to show you what we're dealing with here so we've got all those patient identification numbers that's going to identify the patients in the patient table so these two IDs for example these are this this is the same patient who received these two immunizations on 2011 July 27th they got the herpes vaccine and the seasonal flu vaccine now you might be looking at this and you might say you know it's I see this ID but that tells me nothing about the patient so what this identification number allows you to do is it allows you to do a join a join is a way to pull in other columns from other tables so you could make this table wider by including other columns from other tables and in this case we're thinking about pulling in maybe the first name or the last name or the birth date of the patient or all three included with all this data here that we're currently looking at so that patient identification number would be translated into the three things we want to look at first name last name birth date to pull in those three extra columns we would have to do either a left join or inner join there's several different join types and the most common of them are left join inner join full outer join and right join I only use left join and inner join for the most part and I could spend another whole hour talking about joins because this is a very complex topic and you they can get very detailed I'll show you what it looks like here so if we scroll down here is an example so I run this and now I've got first and last name coming from the patient table and that's lined up perfectly with our immunizations data so this data I have highlighted here comes from the immunizations table this data right here comes from the patients table we're combining two tables together using the join and I actually forgot birth date here so let me add that really quick so we've got the birth date now so there should be birth date over here and there's several pieces to this that you have to keep in mind when you're setting this up you have to have a starting table in which case we're picking immunizations we have to figure out what our joining table is going to be we already know that that's patience we need to determine the join type and I won't go into those in detail today but we've got these four different types of join I choose to do a left join in this particular example you need to specify the thing that you're going to join on and that's where this part comes into play left join public patients on T1 dot patient equals t2.id these two columns are basically the same thing in the table immunizations the patient identification number is called patient so that column is called patient and then in the patients table that same thing is called ID so I'm just basically joining on the same type of column they just call different things in those two tables so this on statement is basically saying look up all these identification numbers on this patient column in immunizations and match that on the patients table in that ID column so that ID column is also going to have these same codes it's going to match those two things together and it's going to pull each of those rows to line them up where those identification numbers meet so by doing that by forming that connection that enables you to pull in these columns but there's other nuances that you have to keep in mind when you're doing this if you do a join poorly or if you don't plan it properly you can end up getting more rows than you started with and if you use an inner join usually that will reduce the number of rows so when you're doing a join there are circumstances in which you might end up with more rows than you started with or fewer rows than you started with so you do have to keep that in mind when you're building out these queries and to understand why that happens you need to study relationship cardinalities so that goes into one to one one to many many to one many to many relationships there's so much to talk about that is just too much for today's webinar but this should get you started with just dipping a toe in the water with SQL well we've covered a lot in this past hour and there's still a lot to talk about when it comes to SQL like I said we need to talk about joins and the relationship cardinalities we need to talk about functions too functions play a vital role in SQL and understanding how everything ties together we haven't talked about common table expressions and how to divide multiple queries and bring them together at the end so I'll cover things like that in future videos if I've piqued your interest a little bit about how to become a data analyst in healthcare I have a whole video for that check this one out thank you so much for watching and I'll see you in another video