Transcript for:
Getting Started with Web Scraping

Hey everyone, welcome to GeeksforGeeks. I'm Ishaan Sharma, and in today's video I'll be talking about web scraping and how you can get started with it. Let's get started.

This video is going to be more of a tutorial, in which I will walk you through the steps that are required, what code you need to type, and what that code really means, and we'll also see the end result. Okay, so first of all, let's talk about the requirements, the dependencies that you will need to install. The first thing is going to be Python. We are using Python for this, and if you want to get started with web scraping, Python is a great language. Simply go to python.org and you can install Python. I hope you can do that much.

Now I'll show you how you can get the two modules that will be required. The first is Beautiful Soup and the second is called requests. Okay, so now let's take a look at what you need to do by going on to the laptop. Let's do this.

Alright, so as you can see, I have Visual Studio Code opened up and I have just made a file called scraper.py. That's the Python file that we'll be using. So first things first, we'll need to actually get Beautiful Soup.

So for that, I'll just go to Chrome and search for Beautiful Soup, and that is right there. This is the complete documentation. I highly suggest that all of you go through it. You can also look at our GeeksforGeeks blogs on this, but let's go back and look at how we can do this. Just search for pip, that is, the Python package installer. After that, we'll just click on this and copy this command. Once you've done that, open up your terminal (let me just go full screen here) and type pip3 install followed by the whole thing that they wanted you to type. I am using a Mac, so on that I have to use pip3.

But if you are using Windows, you can just use pip and that should work fine for you. Just press Enter. For me, it says "Requirement already satisfied", so that means that I am good to go.
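A quick way to confirm that the installs worked is to ask Python whether the modules can be found. This is only a minimal sketch and not something shown in the video. The helper name check_modules is made up for illustration. Note that Beautiful Soup 4 is distributed on pip as beautifulsoup4 but imported as bs4.

```python
from importlib.util import find_spec

# Import names, not pip package names: Beautiful Soup 4 is installed as
# "beautifulsoup4" but imported as "bs4".
REQUIRED_MODULES = ["bs4", "requests"]


def check_modules(names):
    """Return a dict mapping each module name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in names}


# Report the status of each module without actually importing it.
status = check_modules(REQUIRED_MODULES)
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

If either line says "missing", rerunning the pip (or pip3) install command for that module is the usual fix.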

Now let's take a look at how we can install the requests module. I'll just type pip3 install requests and press Enter, and that is also satisfied for me. So now I have installed both of the modules that were required. Now let's go on and take a look at our file here.

Let's try to understand what web scraping essentially is. Basically, you want to take some website and scrape data from it: you want to take some information that you can use for your own purposes. Maybe you want to make a price tracker for something that you want to buy online, or maybe you just want to see what the temperature is right now, or maybe you just want to see the latest blog on a particular website. If you want to do things like that, and that particular service or website does not have an API, then you can go ahead and use something like this. But keep one thing in mind: not all websites are happy when you scrape them. So make sure that the website you want to scrape allows you to do the scraping.
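One common signal of what a site allows is its robots.txt file. The video does not mention robots.txt, so treat this as an assumption about how you might check, and the site's terms of use still apply. The sketch below uses Python's standard urllib.robotparser on a made-up robots.txt so that it runs without network access. The example.com URLs and the helper name is_allowed are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The first path is not disallowed; the second falls under /private/.
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/blog/post"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/data"))
```

For a real site, RobotFileParser can also load the file itself with set_url followed by read.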

That's one thing to keep in mind always. So now what I'll do is just go full screen with this. First of all, we'll need to import both modules, so I'll just say import requests, and then from bs4, that is Beautiful Soup 4, import BeautifulSoup. Next, I will make a request for a particular website. I'll call it req and set it equal to requests.get, followed by brackets, and in between them I'll need to put the URL of the website. Now, it is not strictly required that you have a website.

You can also just link it to an HTML document that you have in your own folder. But web scraping is generally used for websites that are online, not for pages in your local folder, since for those you can just open them and see whatever data is in them. Anyway, we will actually get a website. So I'll go to GeeksforGeeks, geeksforgeeks.org, and here, let me just close all of this, I will click on Inspect, but we'll do that afterwards. I'll just copy this link for geeksforgeeks.org, go back, and paste it right here.

Okay, so after this is done, the next thing we'll do is create something called soup. Here we'll just create an instance of BeautifulSoup. Think of BeautifulSoup as a class, and you're just creating an object of it called soup. So here I'll say that I want to get the content from this particular website, so I'll use req.content, and then I'll say that the parser I want to use is html.parser. That's all that you need.

Now, first of all, let's just see what this soup actually has. I'll print out soup.prettify(). This prettify is important, because otherwise we'll just get some hodgepodge data. I'll save this, then go into my terminal and say python3 scraper.py. Again, if you are a Windows user, you will just have to say python and that should work for you.

All right, so what did we see here? We just got a bunch of data, a lot of data, as you can see right here. It is indented because we used .prettify(), and if we had not, we would just get a bunch of, you know, random gibberish.

Now let's try to understand how we can get specific elements from this. The first thing I will do is go here and say soup.get_text(). Let's just try that first, so get underscore text, that's what I will type here. Let's close all the brackets, and now let's try to run this.
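Putting the steps so far together, scraper.py might look roughly like the sketch below. It is a reconstruction from the spoken description, not the exact file from the video. To keep it runnable without network access, it parses a small made-up HTML string, and the live request is wrapped in a helper, fetch_soup, whose name and timeout argument are additions for illustration.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url):
    """Download a page and parse it, mirroring the steps described above."""
    # The timeout is an addition for safety; it is not mentioned in the video.
    req = requests.get(url, timeout=10)
    return BeautifulSoup(req.content, "html.parser")


# Small made-up HTML document so the example runs without network access.
SAMPLE_HTML = (
    "<html><head><title>Sample page</title></head>"
    "<body><h1>Hello</h1><p>Some text.</p></body></html>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# prettify() returns the parsed markup as an indented string.
print(soup.prettify())

# get_text() returns only the text found inside the elements.
print(soup.get_text())

# For the live site used in the video, the call would be:
# soup = fetch_soup("https://www.geeksforgeeks.org/")
```

Swapping the sample string for the fetch_soup call gives the behavior described in the walkthrough, with the output depending on whatever the live page contains at that moment.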

So let's run it. And as you can see, we get all of the text. This is the text that is inside all of the elements on that page. Okay, so this is how powerful web scraping is.

Now, what else can I do? There are actually a lot of things that you can do here. Let me show you another example. I could say soup.a, or let's just go for soup.title. Let's say I just want to get the title of this HTML page, so I can save this as res. Once we have done that, we can just say res.prettify(). Let's save this, and now I'll try to run this once again. What do you see? You get the title. And of course you can also use get_text here, so let me do get_text, close the bracket, save this, and go to the terminal and run it once again. As you can see, we just get the text, not the title tag and all.

So there's actually a lot that you can do, but this was just an introduction. You can also select a particular element based on the id and the class it belongs to, so you can add a lot of features on top of that. This was just me scratching the surface of how you can learn web scraping; this is how you get started. You can also get the links of the images on a particular web page, download them using the requests module, and then use them in some other website and showcase that. There are actually a lot of things; I won't go into the details. This was just a tutorial for beginners to understand how web scraping works. You can check out our blogs on web scraping, on Beautiful Soup, on requests, and on parsing; all of that will be in the description. Thank you for watching this video, and I hope to see you in the next one.
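The last few ideas in the walkthrough (the title tag, its text, lookup by id and class, and collecting image links) can be sketched as follows. The HTML here is made up for illustration, as are the id, class, and file names in it. The variable res follows the name used in the video, and the rest is one possible way to write it.

```python
from bs4 import BeautifulSoup

# Made-up page used only for this example.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div id="main">
      <p class="intro">Welcome!</p>
      <a href="/blog">Blog</a>
      <img src="/static/logo.png" alt="Logo">
      <img src="/static/banner.jpg" alt="Banner">
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# soup.title is the <title> tag; get_text() strips the tag and keeps the text.
res = soup.title
print(res.prettify())
print(res.get_text())

# Look up elements by id and by class (class_ avoids the Python keyword).
main_div = soup.find(id="main")
intro = soup.find("p", class_="intro")
print(intro.get_text())

# Collect the src attribute of every image that has one.
image_links = [img["src"] for img in soup.find_all("img") if img.has_attr("src")]
print(image_links)
```

Downloading an image would then be a requests.get call on its link, writing the response's content to a file. Relative links such as the ones above would first need to be joined with the page's base URL, for example with urllib.parse.urljoin.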