
Python Web Scraping with Beautiful Soup

Jul 14, 2024

Introduction

  • Special Python tutorial on web scraping using Beautiful Soup.
  • Thanks to FreeCodeCamp for the guest opportunity.
  • Mentioned personal YouTube channel "Gymshape Coding" for additional tech content.

Overview

  • Objective: Teach web scraping concepts using the Beautiful Soup library in Python.
  • Example use cases: monitoring a bank account, job sites like LinkedIn, Wikipedia, sports sites, etc.
  • Plan: 3 Parts:
    1. Scrape a basic HTML page to understand concepts.
    2. Scrape a real website.
    3. Store the scraped information.

HTML Page Structure Breakdown

  • Basic components: title, paragraphs, button, price.
  • HTML Tags: <html>, <head>, <body>, <div>, <h1>.
  • Special Notes:
    • The <head> tag holds meta tags and a <link> tag for the stylesheet.
    • The <body> tag holds the content displayed on the page.
    • Important HTML classes: card, card-header, card-body, etc.

Python with Beautiful Soup

  1. **Installing Beautiful Soup and the lxml parser:**
    • pip install beautifulsoup4
    • pip install lxml
  2. **Importing Libraries:**
    • from bs4 import BeautifulSoup
  3. **Reading HTML Files in Python:**
    • Use with open to read the file content.
    • Example: html_file.read()
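The steps above can be sketched as follows. The file name home.html and its contents are assumptions for illustration; the file is written inline here so the example is self-contained, whereas in the tutorial an existing page on disk is read:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4 lxml

# Create a stand-in for the tutorial's local page (assumed content),
# so the reading step below has something to open.
with open("home.html", "w") as f:
    f.write("<html><body><h5>Python for Beginners</h5></body></html>")

# Read the HTML file's content, as in the steps above.
with open("home.html") as html_file:
    content = html_file.read()

soup = BeautifulSoup(content, "lxml")
print(soup.prettify())  # formatted view of the parsed HTML
```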

Basic Web Scraping

  • Creating Beautiful Soup instance:
    • soup = BeautifulSoup(content, 'lxml')
    • Use soup.prettify() to print formatted HTML.
  • Extracting HTML tags:
    • soup.find vs. soup.find_all for one or multiple elements.
    • Examples: extracting <h5> tags for course names, prices, etc.
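A minimal sketch of find vs. find_all; the card markup and class names below are assumptions echoing the page described earlier:

```python
from bs4 import BeautifulSoup

# Hypothetical markup similar to the course-card page above.
html = """
<div class="card">
  <div class="card-body"><h5>Python for Beginners</h5><a class="btn">Start for 20$</a></div>
</div>
<div class="card">
  <div class="card-body"><h5>Web Development</h5><a class="btn">Start for 50$</a></div>
</div>
"""
soup = BeautifulSoup(html, "lxml")

# find returns the first matching element; find_all returns all of them.
first = soup.find("h5")
courses = soup.find_all("h5")
print(first.text)
for course in courses:
    print(course.text)
```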

Scraping Real Websites

  • Requests Library:
    • pip install requests
    • Using requests.get(url).text to fetch webpage content.
  • Identifying HTML Structure:
    • Use browser's inspect tool to find relevant tags and classes.
    • Example: extracting job ads based on class names.
  • Combining Beautiful Soup with Requests:
    • Creating Beautiful Soup instance with fetched content.
    • Extracting desired tags and attributes using filters.
    • Example: soup.find_all('li', class_='job-entry')
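A sketch of combining the two libraries. The HTML string below stands in for the response text so the parsing logic is visible without a network call; the tag and class names ('job-entry', 'company') are assumptions, not the real site's markup:

```python
from bs4 import BeautifulSoup

# In a real run the content would be fetched first:
#   import requests  # pip install requests
#   html_text = requests.get("https://example.com/jobs").text
# A hypothetical snippet stands in for the response here.
html_text = """
<ul>
  <li class="job-entry"><h2>Python Developer</h2><span class="company">Acme</span></li>
  <li class="job-entry"><h2>Data Engineer</h2><span class="company">Globex</span></li>
</ul>
"""

soup = BeautifulSoup(html_text, "lxml")
# Note the trailing underscore in class_ (class is a Python keyword).
jobs = soup.find_all("li", class_="job-entry")
for job in jobs:
    print(job.h2.text, "-", job.find("span", class_="company").text)
```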

Advanced Filtering

  • Extracting Conditional Content:
    • Extract based on post dates, job titles, skills, etc.
    • Example: filtering jobs posted within a few days.
  • String Manipulation:
    • Removing extra whitespace using replace and strip methods.
    • Extracting and printing job details such as company name, required skills, and posting date.
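The cleanup and date filtering might look like this; the ad snippet and its class names are assumptions, with messy whitespace typical of scraped text:

```python
from bs4 import BeautifulSoup

# Hypothetical job ad with the kind of stray whitespace scraping produces.
html = """
<li class="job-entry">
  <h2> Python Developer </h2>
  <span class="skills">Python, Django,   SQL</span>
  <span class="date">Posted few days ago</span>
</li>
"""
soup = BeautifulSoup(html, "lxml")
job = soup.find("li", class_="job-entry")

# strip() trims surrounding whitespace; replace() collapses inner runs.
title = job.h2.text.strip()
skills = job.find("span", class_="skills").text.replace("  ", "").strip()
date = job.find("span", class_="date").text

# Keep only recently posted jobs.
if "few" in date:
    print(f"{title} | skills: {skills}")
```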

Automating and Storing Scraped Data

  • **Writing to Files:**
    • Store each job ad information in a separate text file in a directory.
    • Use file I/O operations: with open('<filename>', 'w') as f and f.write(<content>)
  • **Automating Scraping:**
    • Using a loop and time.sleep() to run scripts at regular intervals.
  • **Dynamic User Input:**
    • Allow user to enter unfamiliar skills to filter out irrelevant job posts.
    • Example: input() to take user input and filter results accordingly.
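The storage and filtering steps can be sketched as below. The job records, file naming, and the "Spark" skill are assumptions; in the full script the records come from find_all and the skill from input():

```python
import time

# Hypothetical scraped results; in the real script these come from find_all.
jobs = [
    {"title": "Python Developer", "company": "Acme", "skills": "Python, SQL"},
    {"title": "Data Engineer", "company": "Globex", "skills": "Spark, SQL"},
]

unfamiliar_skill = "Spark"  # in the tutorial this comes from input()

def save_jobs(jobs, unfamiliar_skill):
    for index, job in enumerate(jobs):
        if unfamiliar_skill in job["skills"]:
            continue  # skip posts requiring a skill we lack
        # One text file per ad; the naming scheme is an assumption.
        with open(f"job_{index}.txt", "w") as f:
            f.write(f"{job['title']} at {job['company']}\nSkills: {job['skills']}\n")

# Re-run at regular intervals, e.g. every 10 minutes:
# while True:
#     save_jobs(jobs, unfamiliar_skill)
#     time.sleep(600)
save_jobs(jobs, unfamiliar_skill)
```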

Conclusion

  • Final Program:
    • Runs in intervals, scrapes job postings, filters based on user input, and stores results.
    • Dynamic and useful for tracking changes or updates on websites like job boards.
  • **Potential Challenges:**
    • Accepting multiple unfamiliar skills as input.
    • Adjusting code for websites with frequently changing HTML structure.

Extra Resources

  • Mention of further readings or videos on the YouTube channel "Gymshape Coding".
  • Encouraged to explore more advanced web scraping projects and customizations.