Lecture: Building an End-to-End LLM Project for a T-Shirt Store

Overview

The project aims to build a tool similar to ChatGPT that converts natural language questions into SQL queries to interact with a MySQL database.
The store data includes t-shirt inventory and discounts.
The project will use Google PaLM, LangChain, and other technologies.
The goal is to allow the store manager to query the database in plain English.

Store Data: Includes four brands (V Hussein, Levis, Nike, Adidas) stored in a MySQL database.
- Tables:
  - t-shirts: Maintains inventory count, price per unit.
  - discounts: Maintains discount percentage for each t-shirt.
Key People:
- Tony Sharma: Store manager, uses software to query the database.
- Lok: Data analyst who writes SQL queries, often busy.
- Peter P: Data scientist tasked with building the query tool.

LLM (Google PaLM): Converts natural language questions into SQL queries.
LangChain Framework: Utilizes Google PaLM within LangChain's SQLDatabaseChain class.
Few-Shot Learning: Improves query accuracy by preparing a dataset of sample questions and corresponding SQL queries.
Vector Database: Stores embeddings of the few-shot learning data.
- Options include Pinecone, Milvus, ChromaDB (using ChromaDB for this project).
Streamlit: Used to build the UI.
API Keys: Set up through Google Maker Suite.

Setup: Import necessary libraries and set up Google PaLM API key.
Data Connection: Connect Jupyter Notebook to the MySQL database.
Creating LLM Object: Create and test Google PaLM LLM object.
Sample Queries: Run sample queries to ensure the setup works.
Handling Errors: Use few-shot learning to improve the model's performance on more complex queries.

Purpose: Improve the LLM's ability to handle complex queries by providing sample question-query pairs.
Embeddings: Use Hugging Face to generate embeddings for the sample data.
Vector Database: Store embeddings in ChromaDB.
Example Selector: Use semantic search to find the most similar examples from the vector database.
Custom SQL Prompt: Use LangChain's pre-built SQL prompt to guide the LLM.
Testing: Run various queries to test improvements.

Setup: Create a main Python file (main.py) to house the Streamlit code.
UI Elements: Text input for questions, display answers.
Integration: Transfer code from Jupyter Notebook to Python files, modularize the code.
Run Streamlit: Test the user interface and functionality.

The tool allows the store manager to query the database using natural language, improving efficiency.
Encouraged to practice coding and explore the code provided in the video description.