Lecture: Building a Custom Filtering Engine for Search Results
Introduction
- Problem: Google search results often include irrelevant pages.
- Solution: a custom filtering engine that re-ranks results according to personal criteria.
Goals
- Show how to build a custom filtering engine yourself.
- Organize and improve search results for specific queries (e.g., "best baby stroller").
Overview of the Project
- Use the Google Custom Search JSON API.
- Use Python to query the search engine and store the results.
- Write filters to re-rank results based on quality.
Components Needed
- Google Custom Search JSON API
  - Create a programmable search engine on Google.
  - Obtain an API key.
  - Free for up to 100 queries/day.
- IDE & Tools
  - Use PyCharm, VS Code, or JupyterLab.
  - Python libraries: Flask, pandas, Requests, BeautifulSoup.
Setting Up the Project
- Requirements File
  - Create requirements.txt with the necessary packages.
- Settings File
  - Store the API key, search engine ID, country code, and search URL.
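
A settings file along these lines might look like the sketch below. The variable names, the environment variable names (GOOGLE_SEARCH_KEY, GOOGLE_SEARCH_ID), and the build_search_url helper are assumptions made for illustration. Only the endpoint and its key, cx, q, start, num, and gl parameters come from the Custom Search JSON API itself.

```python
# settings.py (hypothetical layout, not the lecture's exact file)
import os
from urllib.parse import quote_plus

# Read secrets from the environment so they never land in source control.
SEARCH_KEY = os.environ.get("GOOGLE_SEARCH_KEY", "")  # API key
SEARCH_ID = os.environ.get("GOOGLE_SEARCH_ID", "")    # search engine ID ("cx")
COUNTRY = "us"                                         # country code ("gl")

# Custom Search JSON API endpoint; placeholders are filled in per request.
SEARCH_URL = (
    "https://www.googleapis.com/customsearch/v1"
    "?key={key}&cx={cx}&q={query}&start={start}&num=10&gl={country}"
)


def build_search_url(query, start=1):
    """Fill the URL template for one page of results (hypothetical helper)."""
    return SEARCH_URL.format(
        key=SEARCH_KEY,
        cx=SEARCH_ID,
        query=quote_plus(query),  # URL-encode spaces and punctuation
        start=start,
        country=COUNTRY,
    )
```

Each API request returns at most 10 results, so start=11 would request the second page.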
- Database Storage
  - Use SQLite to store search results.
  - Create a class dbStorage to handle database interactions.
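
A minimal sketch of such a storage class, using the standard-library sqlite3 module. The table layout, column names, and method names are assumptions; the class is spelled DBStorage here to follow Python naming conventions, while the notes call it dbStorage.

```python
import sqlite3


class DBStorage:
    """Thin wrapper around a SQLite database of search results (sketch)."""

    def __init__(self, path="links.db"):
        # ":memory:" can be passed instead of a file path for experiments.
        self.con = sqlite3.connect(path)
        self.setup_tables()

    def setup_tables(self):
        # Assumed schema: one row per (query, link) pair.
        self.con.execute(
            """CREATE TABLE IF NOT EXISTS results (
                   id INTEGER PRIMARY KEY,
                   query TEXT,
                   rank INTEGER,
                   link TEXT,
                   title TEXT,
                   snippet TEXT,
                   html TEXT,
                   created TEXT,
                   relevance INTEGER,
                   UNIQUE(query, link)
               )"""
        )
        self.con.commit()

    def insert_row(self, query, rank, link, title, snippet, html):
        # "with self.con" commits on success and rolls back on error.
        with self.con:
            self.con.execute(
                "INSERT OR IGNORE INTO results "
                "(query, rank, link, title, snippet, html, created, relevance) "
                "VALUES (?, ?, ?, ?, ?, ?, datetime('now'), 0)",
                (query, rank, link, title, snippet, html),
            )

    def query_results(self, query):
        # Return stored rows for a query, best rank first.
        cur = self.con.execute(
            "SELECT rank, link, title, snippet, html, relevance "
            "FROM results WHERE query = ? ORDER BY rank ASC",
            (query,),
        )
        return cur.fetchall()
```

The UNIQUE(query, link) constraint together with INSERT OR IGNORE means that re-running the same search does not create duplicate rows.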
- Search Functionality
  - Use the Google API to fetch results for a query.
  - Scrape the full HTML of each result page for filtering.
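
The two steps above could be sketched as follows. To keep the example free of third-party installs it uses the standard library's urllib rather than the Requests library listed earlier. The function names and the shape of the returned dictionaries are assumptions; the link, title, and snippet fields are part of the API's response items.

```python
import json
from http.client import HTTPException
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.googleapis.com/customsearch/v1"


def parse_items(items):
    """Keep only the fields needed later and number results from 1."""
    return [
        {
            "rank": i,
            "link": item["link"],
            "title": item.get("title", ""),
            "snippet": item.get("snippet", ""),
        }
        for i, item in enumerate(items, start=1)
    ]


def search_api(query, key, cx, country="us", pages=1):
    """Query the API page by page; each page holds up to 10 results."""
    items = []
    for page in range(pages):
        params = urlencode({
            "key": key, "cx": cx, "q": query, "gl": country,
            "start": page * 10 + 1, "num": 10,
        })
        with urlopen(f"{API_URL}?{params}", timeout=10) as resp:
            data = json.load(resp)
        items.extend(data.get("items", []))
    return parse_items(items)


def fetch_html(link, timeout=5):
    """Download a page's HTML; return an empty string if anything fails."""
    try:
        req = Request(link, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, HTTPException):
        # Network errors, timeouts, and malformed URLs all end up here.
        return ""
```

Returning an empty string for pages that cannot be fetched lets the rest of the pipeline continue; a filter can then treat such a page as having no content.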
- Web Server Application
  - Create a Flask app to show the search form and results.
  - Use HTML and CSS for styling.
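
As a rough illustration of the page such an app serves, the helper below builds the HTML for the search form and the result list with the standard library only. It is framework-independent on purpose: a Flask view function could return the string it produces. The markup, the "result" CSS class, and the render_results name are assumptions.

```python
from html import escape

# A plain search form that posts the query back to the same URL.
SEARCH_FORM = """<form action="/" method="post">
  <input type="text" name="query">
  <input type="submit" value="Search">
</form>"""


def render_results(results):
    """Return the search form followed by one block per result."""
    parts = [SEARCH_FORM]
    for r in results:
        parts.append(
            '<p class="result"><a href="{link}">{title}</a><br>{snippet}</p>'.format(
                # Escape everything that came from the web before embedding it.
                link=escape(r["link"], quote=True),
                title=escape(r["title"]),
                snippet=escape(r["snippet"]),
            )
        )
    return "\n".join(parts)
```

Escaping titles and snippets matters here because they come from third-party pages and would otherwise be interpreted as markup.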
Filters
- Content Filter
  - Penalize pages with fewer words than the median for the result set.
- Tracker Filter
  - Penalize pages that load many tracker scripts.
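
Both filters can be sketched together as below. This version parses HTML with the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs. The tracker domain list, the penalty sizes, the tracker_limit threshold, and all function names are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from statistics import median
from urllib.parse import urlparse

# Illustrative list only; a real filter would load a much longer blocklist.
TRACKER_DOMAINS = {"google-analytics.com", "googletagmanager.com", "doubleclick.net"}
CONTENT_PENALTY = 10  # assumed rank penalty for thin pages
TRACKER_PENALTY = 10  # assumed rank penalty for tracker-heavy pages


class PageStats(HTMLParser):
    """Count visible words and collect the hosts of external scripts."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.script_hosts = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.script_hosts.append(urlparse(src).hostname or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())


def page_stats(html):
    """Return (word_count, tracker_script_count) for one page."""
    parser = PageStats()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    trackers = sum(
        1 for host in parser.script_hosts
        if any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS)
    )
    return parser.words, trackers


def rerank(results, tracker_limit=2):
    """Apply both filters and return results sorted by adjusted rank."""
    stats = [page_stats(r["html"]) for r in results]
    med = median(words for words, _ in stats) if stats else 0
    scored = []
    for r, (words, trackers) in zip(results, stats):
        penalty = 0
        if words < med:               # content filter
            penalty += CONTENT_PENALTY
        if trackers > tracker_limit:  # tracker filter
            penalty += TRACKER_PENALTY
        scored.append({**r, "rank": r["rank"] + penalty})
    return sorted(scored, key=lambda r: r["rank"])
```

Adding penalties to the original rank, rather than replacing it, keeps Google's ordering as the baseline and only pushes down pages that trip a filter.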
Storing Relevance
- Store a relevance score that marks each result as good or bad, for use in a future machine learning application.
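
One way to record that score is a single UPDATE against the results table, sketched below with sqlite3. The table layout, the update_relevance name, and the encoding (1 for good, -1 for bad, 0 for unrated) are assumptions.

```python
import sqlite3


def update_relevance(con, query, link, relevance):
    """Set the relevance label for one stored result; return rows changed."""
    with con:  # commits on success, rolls back on error
        cur = con.execute(
            "UPDATE results SET relevance = ? WHERE query = ? AND link = ?",
            (relevance, query, link),
        )
    return cur.rowcount


# Small in-memory demonstration with an assumed, simplified table.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE results (query TEXT, link TEXT, relevance INTEGER DEFAULT 0)"
)
con.execute(
    "INSERT INTO results (query, link) VALUES (?, ?)",
    ("best baby stroller", "https://example.com"),
)
changed = update_relevance(con, "best baby stroller", "https://example.com", 1)
```

Labels collected this way accumulate into a small training set, with the stored HTML available as input for later feature extraction.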
Implementation Details
- Database setup: open the connection, create the tables, and store and retrieve data.
- Search.py: functions to query the API, scrape pages, and manage the search process.
- App.py: Flask routes for the search form and the results display.
- Filter.py: applies the filters to re-rank search results.
Future Work
- Use the stored relevance scores to train a machine learning model that improves filtering.
- Customize and extend the filters further.
Conclusion
- Built a custom search engine with improved filtering.
- Next step: extend it with machine learning and other improvements.
Note: Handle API keys securely and take privacy into account when implementing similar solutions.