Creating a Custom Search Filtering Engine

Dec 6, 2024

Lecture: Building a Custom Filtering Engine for Search Results

Introduction

  • Problem: Google search results often include irrelevant or low-quality pages.
  • Solution: A custom filtering engine that re-ranks results based on personal criteria.

Goals

  • Show how to build a custom filtering engine yourself.
  • Organize and improve search results for specific queries (e.g., best baby stroller).

Overview of the Project

  • Use Google Custom Search JSON API.
  • Use Python to query the search engine and store the results.
  • Write filters to re-rank results based on quality.

Components Needed

  1. Google Custom Search JSON API

    • Create a programmable search engine on Google.
    • Obtain API key.
    • Free for up to 100 queries per day.
  2. IDE & Tools

    • Use PyCharm, VS Code, or JupyterLab.
    • Python libraries: Flask, Pandas, Requests, BeautifulSoup.
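
To make the API component concrete, here is a minimal sketch of querying the Custom Search JSON API. It assumes the standard library (`urllib`) rather than Requests so the example stays self-contained. The function names (`build_search_url`, `parse_results`, `search_api`) and the choice of fields kept from each result are illustrative assumptions, not the lecture's code.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, search_id, start=1, country="us"):
    """Build the request URL for one page (up to 10 results) of a query."""
    params = {
        "key": api_key,   # API key obtained from Google
        "cx": search_id,  # ID of the programmable search engine
        "q": query,
        "start": start,   # 1-based index of the first result to return
        "num": 10,        # the API returns at most 10 results per request
        "gl": country,    # country code used to localize results
    }
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_results(payload, first_rank=1):
    """Keep the link, title, and snippet of each item in a decoded response."""
    return [
        {
            "rank": first_rank + i,
            "link": item["link"],
            "title": item.get("title", ""),
            "snippet": item.get("snippet", ""),
        }
        for i, item in enumerate(payload.get("items", []))
    ]


def search_api(query, api_key, search_id, pages=1):
    """Fetch `pages` pages of results; each page uses one query of the daily quota."""
    results = []
    for page in range(pages):
        start = page * 10 + 1
        url = build_search_url(query, api_key, search_id, start=start)
        with urllib.request.urlopen(url, timeout=10) as response:
            payload = json.load(response)
        results.extend(parse_results(payload, first_rank=start))
    return results
```

Because every page counts against the 100 free queries per day, storing results locally and reusing them for repeated queries keeps usage within the free tier.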

Setting Up the Project

  1. Requirements File

    • Create requirements.txt with necessary packages.
  2. Settings File

    • Store API key, search engine ID, country code, and search URL.
  3. Database Storage

    • Use SQLite to store search results.
    • Create a class dbStorage to handle database interactions.
  4. Search Functionality

    • Use Google API to fetch results based on query.
    • Scrape full HTML of pages for filtering.
  5. Web Server Application

    • Create a Flask app to show search form and results.
    • Use HTML and CSS for styling.
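
The database storage step above could look roughly like the following sketch. The table layout, column names, and method names are assumptions made for illustration; the class is spelled `DBStorage` here, following the `dbStorage` class named in the notes.

```python
import sqlite3
from datetime import datetime, timezone


class DBStorage:
    """Thin wrapper around a SQLite table of search results."""

    def __init__(self, path="links.db"):
        # Pass ":memory:" for a throwaway database, e.g. in tests.
        self.con = sqlite3.connect(path)
        self.setup_tables()

    def setup_tables(self):
        # One row per (query, link); `relevance` is filled in later.
        self.con.execute(
            """
            CREATE TABLE IF NOT EXISTS results (
                id INTEGER PRIMARY KEY,
                query TEXT,
                rank INTEGER,
                link TEXT,
                title TEXT,
                snippet TEXT,
                html TEXT,
                created TEXT,
                relevance INTEGER DEFAULT 0,
                UNIQUE (query, link)
            )
            """
        )
        self.con.commit()

    def insert_row(self, query, rank, link, title, snippet, html):
        # INSERT OR IGNORE skips links already stored for this query.
        created = datetime.now(timezone.utc).isoformat()
        self.con.execute(
            "INSERT OR IGNORE INTO results "
            "(query, rank, link, title, snippet, html, created) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (query, rank, link, title, snippet, html, created),
        )
        self.con.commit()

    def query_results(self, query):
        # Return stored rows for a query in their original rank order.
        cur = self.con.execute(
            "SELECT rank, link, title, snippet, html FROM results "
            "WHERE query = ? ORDER BY rank",
            (query,),
        )
        return cur.fetchall()
```

Checking `query_results` before calling the API lets a repeated query be served from SQLite instead of spending another API request.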

Filters

  1. Content Filter

    • Penalize pages whose word count is below the median of the result set.
  2. Tracker Filter

    • Penalize pages that load many tracker scripts.
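
The two filters can be sketched as follows. This is one possible reading of the notes, not the lecture's implementation: it uses the standard library's `html.parser` instead of BeautifulSoup, and the tracker domain list, the `tracker_limit` threshold, and the fixed `penalty` added to a result's rank are all hypothetical values chosen for illustration.

```python
from html.parser import HTMLParser
from statistics import median
from urllib.parse import urlparse

# Hypothetical blocklist; a real one would come from a maintained tracker list.
TRACKER_DOMAINS = {"google-analytics.com", "doubleclick.net", "googletagmanager.com"}


class PageStats(HTMLParser):
    """Collect the visible word count and the hosts of external scripts."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.script_hosts = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.script_hosts.append(urlparse(src).hostname or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())


def page_stats(html):
    """Return (word_count, tracker_script_count) for one page."""
    parser = PageStats()
    parser.feed(html)
    parser.close()
    trackers = sum(
        any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS)
        for host in parser.script_hosts
    )
    return parser.words, trackers


def rerank(results, tracker_limit=2, penalty=10):
    """Re-rank dicts with 'rank' and 'html' keys; a lower adjusted rank is better."""
    stats = [page_stats(r["html"]) for r in results]
    if not stats:
        return []
    median_words = median(words for words, _ in stats)
    scored = []
    for r, (words, trackers) in zip(results, stats):
        adjusted = r["rank"]
        if words < median_words:
            adjusted += penalty  # content filter: thinner than the median page
        if trackers > tracker_limit:
            adjusted += penalty  # tracker filter: too many tracker scripts
        scored.append({**r, "adjusted_rank": adjusted})
    return sorted(scored, key=lambda r: r["adjusted_rank"])
```

Adding a penalty to the original rank, rather than discarding a page outright, keeps penalized results visible but pushes them down the list.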

Storing Relevance

  • Store a relevance score for each result, marking it as good or bad, so the labels can later be used for machine learning.
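
A minimal sketch of storing relevance in SQLite follows. The encoding (1 for good, -1 for bad, 0 for unlabeled), the reduced table layout, and the function names are assumptions for illustration only.

```python
import sqlite3


def setup_results_table(con):
    # Reduced, hypothetical schema: only the columns this example needs.
    con.execute(
        """
        CREATE TABLE IF NOT EXISTS results (
            id INTEGER PRIMARY KEY,
            query TEXT,
            link TEXT,
            relevance INTEGER DEFAULT 0
        )
        """
    )
    con.commit()


def update_relevance(con, query, link, relevance):
    """Label one stored result; returns the number of rows updated."""
    cur = con.execute(
        "UPDATE results SET relevance = ? WHERE query = ? AND link = ?",
        (relevance, query, link),
    )
    con.commit()
    return cur.rowcount


def labeled_examples(con):
    """Return every result that has been marked good or bad."""
    cur = con.execute(
        "SELECT query, link, relevance FROM results "
        "WHERE relevance != 0 ORDER BY id"
    )
    return cur.fetchall()
```

The rows returned by `labeled_examples` are the kind of labeled data a later machine learning step could train on.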

Implementation Details

  • Database setup: Open the connection, set up tables, store and retrieve data.
  • Search.py: Functions to query API, scrape pages, and manage search process.
  • App.py: Flask routes for search form and results display.
  • Filter.py: Apply filters to re-rank search results.

Future Work

  • Use the stored relevance scores as training data for machine learning to improve filtering.
  • Potential for further customization and enhancement of filters.

Conclusion

  • Built a custom filtering engine that re-ranks Google search results.
  • The engine can be extended with machine learning and other improvements.

Note: Ensure proper handling of API keys and privacy considerations when implementing similar solutions.