Hello everybody, and welcome to the fourth and final video in this Python web scraping tutorial series. In this video I'm going to be showing you how we can find the cheapest in-stock products on Newegg. Now, I was going to have a bunch of other websites combined into this search, however that's just going to take way too long to do, so what I show you here you can modify slightly and apply to other websites, and you can combine, say, four or five websites together when you're searching for a specific product.

The product I'm going to be targeting here is a graphics card. You can really do this with any product; in fact, the script I'm going to write here should work with pretty much any product you can search up on Newegg. Again, we're just going to be looking for the cheapest in-stock graphics card, and we can pick whatever graphics card model that is. In fact, I'll make it dynamic, so it'll ask you what graphics card you want to look for and then search for it. What we're going to do is just spit out a huge list that has the price of the graphics card, the name of the graphics card (obviously there's a ton of different models), and then the link for that GPU in case you want to go and actually purchase it. So, that said, let's go ahead and get started.

[Music]

All right, so in front of me I have my Python script. I have these imports: BeautifulSoup, requests, and re for regular expressions. We're going to need all of these. The first thing I'm going to do, though, is ask the user what graphics card they want to search for. So I'm going to say gpu equals input, and I'll say "What GPU do you want to search for?" Really, we could change this to "What product do you want to search for?", and then we'll put a space at the end there. Nice. So the user is going to input what product they want to search for. Then what I'm going to do is send a request to Newegg, so I'm going to say url is equal to, and I'm just going to copy in the URL
here, but I will go to this URL so you can see what it looks like. This is actually the URL that we want, and let me just describe what some of this is. First of all, let's actually just go to Newegg. Notice I'm on Newegg, I searched 3080, and then I went here and checked the in-stock option under availability. What that did is change the URL up here to say d equals 3080, so that's our search term, and then N equals 4131, which is the filter. That's saying only give me ones that are in stock, so you can see it shows in stock right here. If I remove the &N like that, notice it's no longer giving me only the ones that are in stock. So that is kind of how that works. If we want to change what we're searching for, we can just change this d parameter. Change that to 3070, notice this now changes to 3070, and now we're looking for 3070s. So that's what the URL is. I'm on .ca; obviously, if you're in a different country, change the extension, or whatever you call that, to .com or whatever.

Okay, so anyways, this is what we want to send the request to. We're going to say page is equal to requests.get (if we spell this correctly), I want to get the url, and then .text. Then I want to read this in with BeautifulSoup, so say doc is equal to BeautifulSoup, page, and then this is going to be html.parser. Or is it parser.html? I forget. No, html.parser is correct. So let me just put these together.

Okay, and now that we have this, the first thing that I actually want to do here is figure out how many pages of results we have. If you go here, notice how we have five pages, right, page one of five. I want to figure out how many pages we have because I don't want to just look on the first page, I want to look on all of the pages. And note, if I go to, like, page four here, see how the URL changes and says &page=4, so we can easily send another request to get the
next page if we have the &page and we know how many pages we need to look through. So what I want to do is find this text right here on this page (excuse the voice crack), find what this number is, and then use that to determine how many pages we have, loop through all of the pages, and repeat finding all of the products.

It's a little bit complicated to do this, but I just right-clicked on what I want, so on the text here, and pressed inspect, and notice it actually highlights it in this window for me. We're inside of this strong tag, and it's inside of a span that has the class list-tool-pagination-text. So what we can do is search in the document for a tag that has this class, then look in the strong tag, and then try to find this number five. Again, not super easy, but I will show you how we do that.

So let's go here, just make this full screen, and I'm going to say page_text is equal to doc.find, and then it's going to be class_ equals, and then put this inside of quotes. Okay, so let's just print this out and see what we get: page_text. Okay, so what product do you want to search for? Let's go with 3080. Actually, I realize I can't use this to do that, because Sublime Text doesn't work for console input for Python, so sorry, just excuse me as I run this script: python, web scraping 4 I think, .py. Okay, so I'm just going to run it from the terminal; you can excuse what's happening. Anyways, let's go to 3080 and see what we get.

Okay, so we get this span, and we want what's actually inside of the strong tag, right, it says one over four. Since we're getting one over four, what I want to do is grab the four. So let's first get the strong tag; let's do .strong, and now let's see what we get. Okay, let's just go 3080, and now notice we actually get this. Sorry, now what I want to do is grab this 4.
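The lookup so far can be tried offline. This is a minimal sketch, not the script from the video: `SAMPLE_HTML` is a hand-written stand-in, and both the class name spelling and the HTML comments inside the strong tag are assumptions based on what the inspector appears to show.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the pagination markup (an assumption, not
# copied from the live site).
SAMPLE_HTML = (
    '<span class="list-tool-pagination-text">Page<!-- --> '
    '<strong>1<!-- -->/<!-- -->4</strong></span>'
)

doc = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Find the span by its class, then step into its <strong> child.
page_text = doc.find(class_="list-tool-pagination-text")
strong = page_text.strong

print(strong)         # the whole <strong> tag, comments included
# With several children inside the tag, .string comes back as None,
# so the page count cannot be read directly from it.
print(strong.string)
```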
However, if I try to access the string that's inside of here, I'm not going to get the number back, because these comments are kind of messing everything up (with more than one child inside the tag, .string just gives you None). So what I'm actually going to do seems a little bit counterintuitive: I'm going to turn this whole thing into a string, split the string and find what's on the right-hand side of the slash, and then split the string again such that I can grab this 4. It's going to seem a little bit weird, but just follow along with me here.

I'm going to say pages equals page_text, I'm going to convert this to a string, and then I'm going to say .split. I'm going to split this at the slash and get the last element. Actually, sorry, I'm going to get the second-last element, and you'll see why in a second. If we look at this here, notice that we have a slash, and then we have this slash in the closing tag. So what happens when I split this string is it's going to give me a list that has this as the first element, this as the second element, and this as the third element. I want the second element, or really the second-last element, and so I'm going to grab that, which will give me this. Now if I print pages and I go here and run this, let's go to 3080, notice that I'm getting this.

Now, since I don't know if the number of pages is going to be more than one digit, so like 25 or 100 pages or something like that, I can't just grab this index. I need to split the string again, at this symbol (it's the greater-than sign). Then I'm going to remove the very last character and grab whatever number is here, and that should work if we have multiple digits. So let's do this again. I'm going to split this at this symbol, and I'm going to actually just look at what that is to start. So let's go here and let's go 3080.
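The two splits are easier to follow on a concrete string. This sketch assumes the string form of the strong tag looks like the literal below (a hypothetical example, the live markup may differ), and `total_pages` is a helper name introduced here for illustration.

```python
def total_pages(strong_html):
    """Pull the page count out of the string form of the <strong> tag."""
    # Splitting at "/" also splits at the slash in "</strong>", so the
    # piece holding the number is the second-last one.
    after_slash = strong_html.split("/")[-2]    # e.g. '<!-- -->4<'
    # Splitting at ">" leaves the number followed by a stray "<".
    number_part = after_slash.split(">")[-1]    # e.g. '4<'
    # Drop that last character and convert what remains.
    return int(number_part[:-1])


# Assumed string form of the tag, as produced by str(page_text).
print(total_pages("<strong>1<!-- -->/<!-- -->4</strong>"))    # 4
print(total_pages("<strong>1<!-- -->/<!-- -->25</strong>"))   # 25
```

Because the number is cut out by splitting rather than by a fixed index, a page count with two or three digits comes through the same way.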
And now notice that's what this looks like. So I want to grab the last element and then remove that last character. Let's grab that: let's go negative one, and then we're going to go colon negative one, and what that will do is remove the last character for us. So now let's print this out, excuse me, 3080, and we get 4. The last thing we need to do here is convert this to an int, so we're going to say int of this, and now if we go here, same thing, it should just work. Let's actually go 3070 and see if we get a different number of pages, and here we get five pages. Perfect.

So now that we have the pages, we want to loop through all of the pages and grab all of the elements on those pages. I'm going to say for page in range(pages), and then I'm going to copy this right here. The reason I'm going to copy this is because now I want to send another request, but I want to change this by adding &page equals page, right, so I'm going to look for the specific page. Now, I think this is fine. Yeah, that looks fine to me. I was going to do something else, but I think that's actually okay for right now. So what this will do is give us the GPU search result at a specific page.

Ah, what I need to do is change the range, so it goes from 1 to pages plus one. The reason I'm making this change is because we don't want to start at zero. Whenever we're in a for loop and we just use range with an end value, we start at zero. I want to start at page one and go up to whatever the last page number is, so I need to add one to this, otherwise we won't hit the last page, which is how the range function works.

Okay, so now we are sending a request and grabbing every single page, and on every single page I need to grab every single item that says 3080 in it. The thing is, not all of the items that are on here are actually going to say 3080,
I'm pretty sure. So what I want to do is just do my own filter and make sure that I'm only grabbing items that actually say 3080 in them, or whatever the GPU is that I'm looking for. So we're going to do that. We're going to say doc.find_all. We'll call this results, actually no, we'll call this items, is equal to doc.find_all, and I'm going to say text is equal to re.compile, and I'm just going to look for the text of whatever the GPU is that I'm looking for. Now, I really should call this something else, so let's rename this from gpu to search_term, just so that it's more dynamic.

Okay, and the reason I'm doing re.compile here is because when I do this it will actually match any text that contains this. If I just put search_term and didn't put it inside of the regular expression, it would not match text that had anything different from the search term. I misexplained that. Really what I'm saying is, let's say we have the text 3080 and we're looking for 3080.
This will match with 3080, but if I had, like, "3080 for the win" and I was just looking for 3080, it wouldn't match, because this has extra characters in it, whereas using re.compile both of these will be a match. Hopefully that makes sense, but that's why I'm using re.compile.

Now I want to loop through all of the items, so I'm going to say for item in items, print item, and let's just look at what items we're actually getting here. Let's run this, let's look for a 3080, and let's see. Okay, so obviously we're getting a ton of stuff, and it's going to keep printing stuff out. Whoa, okay, we have many, many things here. Yeah, so we're getting quite a few things printing out. That's fine. Anyways, we're going to continue. Now we're going to look for the name of the item, the price of the item, and the link for the item.

We will continue in one second; we need to quickly thank the sponsor of this video and this series, which is AlgoExpert. AlgoExpert is the best platform to use when preparing for your software engineering coding interviews. They have over 160 coding interview practice questions that cover everything you need to know to ace your software engineering interviews. Check them out from the link in the description and use the code techwithtim for a discount on the platform.

All right, so I was just examining the output that we got there, and I realized that we weren't quite getting exactly what I was looking for. Let me just inspect one of these right here and show you that what I actually want to look for is only items that are inside of this div right here. If you look at this div, it is where all of the content is from my search results. If I just look for, say, 3080 or 3070, it's going to give me the text up here as well. I don't want that; I only want actual items. So what I'm going to do is copy this class name, because this is the div that has the table inside of it, and then I'm going to look only
inside of this div for any text that matches what product we're looking for. So how am I going to do that? I'm going to say div is equal to doc.find, and then we're going to say class_ equals that, and this needs to be inside of quotation marks. Again, all this code will be available in the description in case I'm losing you here. Then, rather than saying doc, we're going to change this to div, so now we're only looking inside of the div.

Okay, so now when I print item we should get some better results. Let me clear this, let's run this, and let's go 3080, and now let's see what we get. That's much better. Notice we're only getting the text; we're not getting a bunch of other random crap that we don't need. That's exactly what I want.

Now that I have this text, what I need to do is look at the parent of this text, because that text is great, but that is just this, right. Oops, let me get out of this. Okay, so let me right-click on this. This is the text that we're getting: we're getting what's inside of this link. So the first thing I can do is grab the href from this link. The parent of all this text is going to be a link, so what I will do is grab the href, so the actual URL, from the link that holds the text.

Let's do that now. Let's go parent is equal to item.parent, and then what I want to grab is the href. Now, the thing is, I'm still going to get some results here whose parent, for some reason, will not actually be an a tag. Weird things can occur, and if I try to access the href attribute of a tag that doesn't have one, I'm going to get an error. So I just need to make sure that the parent is actually an a tag before I try to do that. I'm going to say if parent.name equals equals "a", then what I can do is say link equals parent["href"], like that, and then I can print the link. And I'm just going to say link equals None for now, just so that this print statement
doesn't crash. So now let me try this. I'm going to look in the parent, which will be the a tag, and I'm going to try to grab the link, so the href, from that. Actually, let me just clear this first because it's probably better, and now let's run it, 3080, and see what we get.

Okay, so now notice we're getting a bunch of links and we're also getting a bunch of Nones. It seems like we get link, None, link, None, link, None, and I think when we're getting None that's because we're finding 3080 as maybe the attribute of some tag or something along those lines. So what I'm going to do is say, okay, if the parent tag is not an a tag, then we're just not going to bother adding it in; we'll just skip it. Let's do that now. We'll say if parent.name does not equal "a", then we will continue; otherwise we'll say the link is equal to parent["href"], and we can remove this link equals None. Great.

So now we're getting the link, and we already have the name, which is stored in item. The last thing we need is the price. How are we going to get the price? Let's go back here. The price is kind of hard to find, but if you notice here, inside of item-container we have this, which is the image, and we have item-info. Inside of item-info is where our actual name is. So from the name, our parent is item-info, and the parent of that is item-container, and that's what we want. So now we want to look up two parents; we want to go to the parent of the parent so we get item-container. Then from item-container, if we go inside of here, let's look at this, we want to look at item-action, price, and then whatever the current price is, so price-current.

Let's do that now. We're going to say next_parent is equal to parent.parent, and then we're going to try to find the price. So we're going to say price is equal to next_parent.find, and this is going to be price-current, which is the class name, so we're going to say class_ equals that, and then we'll print out the price and
see what we're getting. Okay, so let's try this: clear, and 3080, and we get a bunch of Nones. Okay, so that's interesting. Why are we getting a ton of Nones? Let me have a look here and see if I have messed this up somehow.

All right, so I think I'm having some issue with the parents here. For some reason parent.parent is not giving me the div that I'm looking for. So what I'm going to do is show you kind of an interesting thing that we can do here to locate a specific parent. If we go here, the parent that we actually want is item-container, right, we want the one that has item-container as the class. So rather than trying to figure out how many parents there are, I'm going to use an interesting method that we have here, which is item.find_parent. What this will do is look for any parent, so any ancestor in the tree, that has a specific class name, and in our instance I want this to be item-container. So now let's see if this is actually going to work. Again, this will look up in the tree and find the first parent that has this class name. Let me make sure that's correct. Yes, that looks correct.

Okay, so let's clear this, run this, and go 3080, and now notice we're actually getting all of the prices, right. They're showing up inside of this span and inside of, what do you call it, the strong tag. So what I'm going to do now is look for the strong tag of this price-current class, or price-current div, or whatever it is. So now let's do .strong and let's see if we get anything better. Okay, 3080, and then notice we get all of this. Now that I have the strong tag, I can just look at the string of this, so let's just go .string like that, and let's run this, 3080, and now notice I'm going to get all of the prices. That's exactly what I'm looking for: I have now found all of the prices of all of the GPUs.

So now what I want to do is take all of this and put it inside of some type of data structure so that I can use it and print
it out later on in the program. Up here I'm going to say items_found is equal to, and this is going to be a dictionary. Then what I'm going to do is say items_found at the name of the item, so I'm going to say item, which is the actual text that contains 3080, is equal to another dictionary that has the price, so we can put the price, and has the link, which will be equal to the link. So there we go. Now down here I can print out items_found.

Let's go ahead and run this. I'll pause for a second so you guys can see what I just wrote, but I just created the dictionary and added, as a key, sorry, not as a key, as a value, another dictionary, and as the key I'm using the actual text that contains 3080. Okay, so let's go here. And I keep saying 3080; it's whatever text you type in here. Like, if I do 3070 we'll get different results, obviously, but 3080 is just the one that I've been using. This is going to take a second because it needs to go through all of the pages before we get any output, but now we're going to get the output, and it is kind of a mess here. We'll clean this up in a second, but notice we have price, we have the link, and so on and so forth.

Now the last thing I need to do is convert this price to an integer, because right now, as a string, it's not that useful to us. For me to be able to convert this to an integer, I need to remove the comma from the price, so I'm just going to say .replace, and I will replace the comma with an empty string. Okay, so now we should have prices that do not have the comma in them.

Now what I want to do is sort all of these items by their price and then have some nicer output for them. This is a little bit more advanced Python code, to sort a dictionary. To sort a dictionary you need to first convert it to a list of some sort, sort the list, and then convert it back to a dictionary. Now, in this case I'm actually
happy with my items being in a list, and so what I'm going to do is the following (in fact, it kind of defeats the purpose of making a dictionary, but regardless, I'll show you how this works). I'm going to say sorted_items is equal to sorted, and we're going to pass in items_found.items(). If you're unsure what items does, it's going to give us tuples that each have the key and then the value. Then I'm going to sort it with the following function: lambda x: x[1]["price"].

This might seem a little bit confusing, but what this is going to do is create all of the tuples that contain the key and the value; the value is a dictionary that has price and link inside of it. Then I'm passing the function that I want to use to sort this as a lambda, so this is an anonymous function. We take one parameter, x, which will be each of these items in turn; we then say x[1], which gives us the second item, the dictionary right here; and then we access the price. Again, a little bit confusing, but hopefully that makes sense. We're just sorting by the price of all the items in the dictionary.

Now what I can do is loop through sorted_items and print them out. I'm going to say for item in sorted_items, and here we will print the name of the item, the price of the item, and the link of the item on different lines. So I'm going to say print item[0], which will be the name, because that's the key; it's the first, what would you call this, index of the tuple. Then I'm going to print item[1]["price"], and I'm going to put a dollar sign near it too, so I'm going to make an f-string and put a dollar sign. Oops, I did this the wrong way. I want a dollar sign and then I want to put this inside of my curly braces. Okay, so now we'll actually print the dollar sign, then the price, and then I'm going to print the link, so we'll say item[1]["link"].

Now, just to clarify, because I'm sure some of you are kind of confused
here: when I do .items(), what this gives me is a list of items. The items would look something like this: you know, "3080 for the win", and then we would have price, let's just go like 2999, whatever, and then we would have link, and then there'd be some link here. That's what all of our items are going to look like, so that's why I'm accessing them in this way. I'm going to sort all of the items by the price, and then what I'm going to do is print out item[0], which is the name, and then items, sorry, this is item, not items, so item[1]["price"], which will give me the price, and then item[1]["link"], which will give me the link. Hopefully that makes sense.

Okay, so let's now run this and see what we get. Let's go here and let's go 3070 this time. Oops, never mind, 3080, and: 'NoneType' object has no attribute 'string'. Interesting. Okay, so .strong, apparently, for some reason, is not giving us the tag. Let me see what's wrong with that.

All right, so I was just debugging this for a minute or two and found a few mistakes that I'll go through right now. First of all, this is a modification: for some reason, I guess a few of the entries don't have a strong tag. I'm not sure why; this will happen sometimes, where some of the HTML will be off on some of the stuff that you're scraping. To fix this (not very good practice-wise) I just put in a try and except block, so pretty much if this fails and I'm not able to find the string from the strong tag, we just skip that item. I also changed this to find("strong") rather than just .strong. That shouldn't make a difference, but I was just trying it in case it was going to. Anyways, I put the try/except in and changed it in this way, and then all is good.

I also added key equals. I forgot to do this before; I had just the lambda like this. You need to do key equals lambda, and then that will actually allow you to sort, so let's make sure we have that. Then one thing I'm going to do before I show you the output here is I'm just going to print a bunch of
dashes here, just to kind of separate them, because it's really hard to read otherwise. And now let's go here (you can see I was testing it), let's run this, let's go 3080, and let's see what we get. So now we get all of our results. We can see that the cheapest 3080 in stock is two thousand dollars, then 2223, four oh nine, 2613, 2649, and then the link is available for all of them.

If I click on this one right here, let's grab it and actually paste it in and see if this is available. And there you go, so we have the 3080 Ti. I believe we had the correct price for that; yes, we did, and then we could go ahead and purchase it. Now, I'm curious if this is just showing me 3080 Tis. No, okay, it's showing me 3080s as well. So let's just try this link out and make sure this is working. Let's grab this and paste it in here, and there we go, we have this. Perfect. Okay, so this is one you have to buy in a combo, I guess, or something like that. Regardless, it worked.

So with that said, I think I'm going to end the video here. I just wanted to show you how you would actually go about doing this to automate some type of task. In my opinion this is pretty useful and pretty interesting, especially if you're looking for graphics cards, or any product really. Rather than going and having to find it yourself, you could write a little script like this, although, granted, that might take more time than searching for it. Regardless, I hope you guys enjoyed the video. If you did, make sure to leave a like, subscribe to the channel, and I will see you in another one.

[Music]
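For reference, the per-page steps from the video (filter by text, keep only links, climb to the item container, read the price, sort) can be tried without hitting the site. This is a sketch under stated assumptions, not the author's exact script: `SAMPLE_HTML` is hand-written, the wrapper class `item-cells-wrap` is a stand-in for the class name copied in the video, `find_items` is a helper name introduced here, and an explicit None check is used in place of the try/except. It also passes `string=` to `find_all`, which is the current name for the `text=` argument used in the video.

```python
import re
from bs4 import BeautifulSoup

# Hand-written stand-in for one page of results (illustrative only).
SAMPLE_HTML = """
<div class="item-cells-wrap">
  <p>Showing results for 3080</p>
  <div class="item-container">
    <div class="item-info">
      <a href="https://example.com/card-a">Brand A RTX 3080 OC</a>
    </div>
    <ul><li class="price-current">$<strong>1,299</strong><sup>.99</sup></li></ul>
  </div>
  <div class="item-container">
    <div class="item-info">
      <a href="https://example.com/card-b">Brand B RTX 3080</a>
    </div>
    <ul><li class="price-current">$<strong>999</strong><sup>.99</sup></li></ul>
  </div>
  <div class="item-container">
    <div class="item-info">
      <a href="https://example.com/card-c">Brand C RTX 3080 Ti</a>
    </div>
    <ul><li class="price-current"></li></ul>
  </div>
</div>
"""


def find_items(html, search_term):
    """Return {name: {"price": int, "link": str}} for matching product links."""
    doc = BeautifulSoup(html, "html.parser")
    div = doc.find(class_="item-cells-wrap")
    items_found = {}
    # A compiled pattern matches any text that contains the search term.
    for item in div.find_all(string=re.compile(search_term)):
        parent = item.parent
        if parent.name != "a":
            continue  # matched text that is not a product link
        # Walk up the tree to the first ancestor with this class.
        container = item.find_parent(class_="item-container")
        price_tag = container.find(class_="price-current") if container is not None else None
        strong = price_tag.find("strong") if price_tag is not None else None
        if strong is None or strong.string is None:
            continue  # no readable price for this entry, skip it
        price = int(strong.string.replace(",", ""))
        items_found[str(item)] = {"price": price, "link": parent["href"]}
    return items_found


items_found = find_items(SAMPLE_HTML, "3080")
# .items() yields (name, info) tuples; sort them by the price inside info.
sorted_items = sorted(items_found.items(), key=lambda x: x[1]["price"])

for item in sorted_items:
    print(item[0])
    print(f"${item[1]['price']}")
    print(item[1]["link"])
    print("-" * 30)
```

To use this against live pages, the same function could be called once per page inside the `range(1, pages + 1)` loop, merging each page's dictionary into one before sorting.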