👕

Building LLM Tool for T-Shirt Store

Aug 3, 2024

Lecture: Building an End-to-End LLM Project for a T-Shirt Store

Overview

  • The project aims to build a tool similar to ChatGPT that converts natural language questions into SQL queries to interact with a MySQL database.
  • The store data includes t-shirt inventory and discounts.
  • The project will use Google PaLM, LangChain, and other technologies.
  • The goal is to allow the store manager to query the database in plain English.

Project Requirements

  • Store Data: Includes four brands (V Hussein, Levis, Nike, Adidas) stored in a MySQL database.
    • Tables:
      • t-shirts: Maintains inventory count, price per unit.
      • discounts: Maintains discount percentage for each t-shirt.
  • Key People:
    • Tony Sharma: Store manager, uses software to query the database.
    • Lok: Data analyst who writes SQL queries, often busy.
    • Peter P: Data scientist tasked with building the query tool.

Technical Architecture

  • LLM (Google PaLM): Converts natural language questions into SQL queries.
  • LangChain Framework: Utilizes Google PaLM within LangChain's SQLDatabaseChain class.
  • Few-Shot Learning: Improves query accuracy by preparing a dataset of sample questions and corresponding SQL queries.
  • Vector Database: Stores embeddings of the few-shot learning data.
    • Options include Pinecone, Milvus, ChromaDB (using ChromaDB for this project).
  • Streamlit: Used to build the UI.
  • API Keys: Set up through Google Maker Suite.

Implementation Steps

MySQL Database Setup

  • Use MySQL Workbench for database operations.
  • Create necessary tables (t-shirts and discounts) and populate sample data.

Jupyter Notebook Experimentation

  • Setup: Import necessary libraries and set up Google PaLM API key.
  • Data Connection: Connect Jupyter Notebook to the MySQL database.
  • Creating LLM Object: Create and test Google PaLM LLM object.
  • Sample Queries: Run sample queries to ensure the setup works.
  • Handling Errors: Use few-shot learning to improve the model's performance on more complex queries.

Few-Shot Learning

  • Purpose: Improve the LLM's ability to handle complex queries by providing sample question-query pairs.
  • Embeddings: Use Hugging Face to generate embeddings for the sample data.
  • Vector Database: Store embeddings in ChromaDB.
  • Example Selector: Use semantic search to find the most similar examples from the vector database.
  • Custom SQL Prompt: Use LangChain's pre-built SQL prompt to guide the LLM.
  • Testing: Run various queries to test improvements.

Streamlit Interface

  • Setup: Create a main Python file (main.py) to house the Streamlit code.
  • UI Elements: Text input for questions, display answers.
  • Integration: Transfer code from Jupyter Notebook to Python files, modularize the code.
  • Run Streamlit: Test the user interface and functionality.

Conclusion

  • The tool allows the store manager to query the database using natural language, improving efficiency.
  • Encouraged to practice coding and explore the code provided in the video description.

Additional Resources

  • LangChain Basics Video
  • Vector Database Explanation Video
  • MySQL Tutorial Video
  • Embeddings Introduction Video