Guide to Web Scraping E-commerce Sites

Sep 24, 2024

Web Scraper Tutorial Notes

Introduction

  • A demonstration of scraping a simple e-commerce site with Web Scraper.

Site Structure Overview

  • The site has a two-level navigation:
    • Categories
    • Subcategories
  • Each subcategory contains a list of products leading to product pages.

Creating a Sitemap

  • Open Web Scraper's toolbar and developer tools.
  • Create a new sitemap: ecommerce.
  • Set the start URL to the landing page of the e-commerce site.
  • Web Scraper will start scraping from this URL and navigate the rest of the site.
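At this stage the sitemap is just an ID and a start URL, with no selectors yet. A minimal sketch of that starting point as a Python dictionary follows. The key names (`_id`, `startUrl`, `selectors`) are modeled on Web Scraper's exported sitemap JSON and should be checked against a real export; the URL is a placeholder, since these notes do not name the site.

```python
import json

# Hypothetical start URL; replace with the landing page of the site being scraped.
START_URL = "https://example.com/shop"


def new_sitemap(sitemap_id, start_url):
    """Return an empty sitemap: an ID, a start URL, and no selectors yet."""
    return {"_id": sitemap_id, "startUrl": [start_url], "selectors": []}


sitemap = new_sitemap("ecommerce", START_URL)
print(json.dumps(sitemap, indent=2))
```

Every selector added in the later steps would end up as an entry in the `selectors` list.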

Building Selectors

  • Selectors are arranged in a tree that mirrors the structure of the site.

Category Link Selector

  • Create a new selector called category link.
  • Change its type to Link.
  • Click the Select button and choose the category link elements on the page.
  • Check the Multiple checkbox, because the page contains more than one category link.
  • Validate the selector using data preview.
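To make the effect of the Multiple checkbox concrete, here is a rough Python sketch of what a link selector does conceptually: collect the `href` of every matching anchor, or only the first one when multiple is off. The markup, the `category-link` class name, and the assumption that an unchecked selector keeps only the first match are illustrative; this is not Web Scraper's own code.

```python
from html.parser import HTMLParser

# Hypothetical landing-page markup.
SAMPLE_HTML = """
<ul>
  <li><a class="category-link" href="/computers">Computers</a></li>
  <li><a class="category-link" href="/phones">Phones</a></li>
  <li><a class="footer-link" href="/about">About</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


def select_links(html, css_class, multiple=True):
    """Return all matching links, or only the first when multiple is False."""
    collector = LinkCollector(css_class)
    collector.feed(html)
    return collector.links if multiple else collector.links[:1]


print(select_links(SAMPLE_HTML, "category-link"))
print(select_links(SAMPLE_HTML, "category-link", multiple=False))
```

With multiple enabled the sketch returns both category links; with it disabled, only the first. The same kind of check is what data preview provides inside the tool.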

Subcategory Link Selector

  • Navigate to a category page to create a subcategory link selector.
  • Click the category link selector in the selector list so that new selectors are created beneath it and run on category pages.
  • Create a Link Selector for subcategories, following the same steps as for the category link selector.
  • Validate with data preview.

Product Page Navigation

  • On the subcategory page, create a link selector for product pages.
  • Confirm that the selectors navigate from the start URL to product pages.
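One way to reason about this confirmation step is to treat each selector as pointing at its parent and walk the chain back to the root. The sketch below does that in Python. The selector IDs and the `_root` name are assumptions chosen to match the steps above, not values taken from the tool.

```python
# Hypothetical selector IDs; each maps to the ID of its parent selector.
PARENTS = {
    "category-link": "_root",
    "subcategory-link": "category-link",
    "product-link": "subcategory-link",
}


def path_from_root(selector_id, parents):
    """Walk parent pointers up to _root and return the path top-down."""
    path = [selector_id]
    seen = {selector_id}
    while path[-1] != "_root":
        parent = parents.get(path[-1])
        if parent is None or parent in seen:
            # Either a selector has no parent entry or the chain loops.
            raise ValueError(f"{selector_id} is not reachable from _root")
        path.append(parent)
        seen.add(parent)
    return list(reversed(path))


print(" -> ".join(path_from_root("product-link", PARENTS)))
```

If a link selector were attached to the wrong parent, the path would either skip a level or fail to reach the root, which is the kind of mistake this step is meant to catch.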

Data Extraction from Product Pages

  • Under the product link selector, add multiple data extraction selectors:
    • Text Selector for the product name.
    • Text Selectors for price and description.
    • Image Selector for the product image URL.
  • Use data preview to check that these selectors work.
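As an illustration of what the text and image selectors above pull out of a product page, the following Python sketch extracts the same four fields from a small HTML snippet. The markup and class names are hypothetical, and the parser is deliberately simple (it assumes each text field contains no nested tags).

```python
from html.parser import HTMLParser

# Hypothetical product-page markup and class names.
PRODUCT_HTML = """
<div class="product">
  <img class="product-image" src="/images/laptop.png">
  <h4 class="title">Example Laptop</h4>
  <h4 class="price">$295.99</h4>
  <p class="description">15.6-inch display, 8 GB RAM</p>
</div>
"""


class ProductParser(HTMLParser):
    # Maps a CSS class on the page to the output field name.
    TEXT_FIELDS = {"title": "name", "price": "price", "description": "description"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # text field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "img" and "product-image" in classes:
            # Image selector: keep the image URL, not the image itself.
            self.record["image"] = attrs.get("src") or ""
        for css_class, field in self.TEXT_FIELDS.items():
            if css_class in classes:
                self._field = field
                self.record[field] = ""

    def handle_data(self, data):
        if self._field:
            self.record[self._field] += data

    def handle_endtag(self, tag):
        self._field = None


def extract_product(html):
    """Return a dict with name, price, description and image URL."""
    parser = ProductParser()
    parser.feed(html)
    return {key: value.strip() for key, value in parser.record.items()}


print(extract_product(PRODUCT_HTML))
```

Each key in the resulting dictionary corresponds to one data extraction selector, and in the tool each would become one column in the exported data.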

Overview of Sitemap

  • Open the selector graph to visualize how the sitemap is structured.
  • Confirm navigation through link selectors to product pages.
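The selector graph is essentially a tree drawn from parent-child relationships. A small Python sketch that prints the same shape as indented text is shown below. The selector IDs and type labels are hypothetical names that mirror the steps in these notes.

```python
from collections import defaultdict

# Hypothetical selectors: (selector id, selector type, parent selector id).
SELECTORS = [
    ("category-link", "link", "_root"),
    ("subcategory-link", "link", "category-link"),
    ("product-link", "link", "subcategory-link"),
    ("name", "text", "product-link"),
    ("price", "text", "product-link"),
    ("description", "text", "product-link"),
    ("image", "image", "product-link"),
]


def render_tree(selectors, root="_root"):
    """Return the selector tree as a list of indented lines."""
    children = defaultdict(list)
    types = {}
    for selector_id, selector_type, parent in selectors:
        children[parent].append(selector_id)
        types[selector_id] = selector_type

    lines = [root]

    def walk(node, depth):
        for child in children[node]:
            lines.append("  " * depth + f"{child} ({types[child]})")
            walk(child, depth + 1)

    walk(root, 1)
    return lines


print("\n".join(render_tree(SELECTORS)))
```

The three link selectors form a single chain from the root, and the four data selectors sit side by side under the product link, which is the shape to look for in the graph.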

Running the Scraper

  • Launch the scraper locally.
  • Open the scrape view and click the start scraping button.
  • A pop-up window will show URLs being loaded to extract data.
  • Use the refresh button to check the scraping progress.
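The list of URLs shown while scraping reflects the scraper following link selectors level by level. The sketch below simulates that with an in-memory site in Python. The URLs are made up, and breadth-first order is an assumption for illustration; the order Web Scraper actually uses may differ.

```python
from collections import deque

# Hypothetical in-memory site: each URL maps to the links found on that page.
SITE = {
    "/": ["/computers", "/phones"],
    "/computers": ["/computers/laptops"],
    "/phones": ["/phones/touch"],
    "/computers/laptops": ["/product/1", "/product/2"],
    "/phones/touch": ["/product/3"],
}


def crawl(start_url, site, product_depth=3):
    """Follow links level by level; return (all visited URLs, product URLs)."""
    queue = deque([(start_url, 0)])
    seen = {start_url}
    visited = []
    products = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)  # corresponds to a URL being loaded
        if depth == product_depth:
            products.append(url)  # data would be extracted here
            continue
        for link in site.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited, products


visited, products = crawl("/", SITE)
print(len(visited), "pages loaded")
print(products)
```

Three levels of links (category, subcategory, product) separate the start URL from the product pages, which is why `product_depth` defaults to 3 here.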

Exporting Scraped Data

  • After scraping finishes, open the Export data view to get a download link for the scraped data.
  • Open the exported file and verify that the data is complete and correct.
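Part of this verification can be automated. As a minimal sketch, assuming the export is a CSV file with one column per data selector, the Python below flags rows where a required field is empty. The column names and sample rows are hypothetical; a real export would be read from the downloaded file and may contain extra columns.

```python
import csv
import io

# Hypothetical export contents, inlined so the example is self-contained.
SAMPLE_CSV = """name,price,description,image
Example Laptop,$295.99,"15.6-inch display, 8 GB RAM",/images/laptop.png
Example Phone,,Touch screen,/images/phone.png
"""

REQUIRED = ["name", "price", "description", "image"]


def find_incomplete_rows(csv_text, required):
    """Return (row number, missing columns) for rows with empty required fields."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        missing = [col for col in required if not (row.get(col) or "").strip()]
        if missing:
            problems.append((number, missing))
    return problems


print(find_incomplete_rows(SAMPLE_CSV, REQUIRED))
```

An empty result would mean every row has all four fields. A non-empty one points to product pages where a selector matched nothing and may need adjusting.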

Additional Resources

  • For more information, visit webscraper.io, which has tutorials and documentation on selector types.