Building an End-to-End LLM Project

Jun 25, 2024

Project Overview

  • Objective: Create a tool similar to ChatGPT that converts human language questions into SQL queries to interact with a MySQL database for a t-shirt store.
  • Store Data: T-shirt store with data stored in a MySQL database.
  • Features: Convert natural language questions into SQL to fetch data from the database.

Technologies Used

  • Database: MySQL
  • Language Model (LLM): Google Palm, accessed via LangChain
  • Framework: Streamlit for UI, LangChain for the LLM-Database interface
  • Vector Database: ChromaDB
  • Embeddings: Hugging Face Library
  • Prompting: Few-shot learning technique for better contextual understanding

Tables in Database

  1. T-shirts Table
    • Maintains inventory count
    • Columns: Brand, Color, Size, Stock Quantity, Price per Unit
  2. Discounts Table
    • Stores discount information
    • Columns: T-shirt ID, Discount Percentage
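The two tables can be sketched as follows. This uses sqlite3 purely for portability; the real project uses MySQL, and the exact table and column names here are assumptions based on the descriptions above.

```python
import sqlite3

# In-memory sketch of the store schema (names assumed; project uses MySQL)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE t_shirts (
        t_shirt_id     INTEGER PRIMARY KEY,
        brand          TEXT,
        color          TEXT,
        size           TEXT,
        stock_quantity INTEGER,
        price          REAL
    )
""")
cur.execute("""
    CREATE TABLE discounts (
        t_shirt_id   INTEGER,
        pct_discount REAL,
        FOREIGN KEY (t_shirt_id) REFERENCES t_shirts (t_shirt_id)
    )
""")
cur.execute("INSERT INTO t_shirts VALUES (1, 'Nike', 'White', 'M', 20, 15.0)")
cur.execute("INSERT INTO discounts VALUES (1, 10.0)")

# The kind of query the LLM is expected to generate: post-discount value
# of all white Nike stock.
cur.execute("""
    SELECT SUM(t.stock_quantity * t.price * (1 - d.pct_discount / 100))
    FROM t_shirts t JOIN discounts d ON t.t_shirt_id = d.t_shirt_id
    WHERE t.brand = 'Nike' AND t.color = 'White'
""")
total = cur.fetchone()[0]
print(total)  # 20 * 15.0 * 0.9 = 270.0
```

Joins across these two tables (stock value after discount, revenue projections) are exactly where a naive text-to-SQL setup tends to fail, which motivates the few-shot learning step later.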

Project Requirements

  • User: Tony Sharma, Store Manager
  • Issue: Existing store software cannot answer complex, ad-hoc questions; getting those answers requires manual effort or SQL knowledge
  • Solution: Build a tool that lets Tony ask questions in plain English, converts them to SQL, executes them against the MySQL database, and returns the answer.

Technical Architecture

  1. Language Model: Use Google Palm (via LangChain framework)
  2. SQL Execution: LangChain's SQLDatabaseChain class
  3. Few-shot Learning: For handling complex queries where out-of-the-box solutions fail
  4. Embeddings: Convert text data into embedding vectors using Hugging Face Library
  5. Vector Database: Use ChromaDB to store vectors
  6. UI: Streamlit for a simple interface
  7. Documentation: Ensure LangChain basics are clear and understood

Steps to Build the Project

1. Setup Google Palm

  • API Key: Get from makersuite.google.com
  • Model: Use the text-bison model from Google Palm
  • Temperature: The "creativity" parameter; lower values give more deterministic output, which suits SQL generation
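A minimal setup sketch, assuming LangChain's GooglePalm wrapper; the live call is commented out because it needs a valid key, and the key is read from the environment rather than hard-coded.

```python
import os

def get_palm_api_key():
    # Keep the key out of source control; load it from an environment
    # variable (or a .env file via python-dotenv).
    return os.environ.get("GOOGLE_API_KEY")

# Hypothetical initialization via LangChain's GooglePalm wrapper;
# temperature is the "creativity" knob -- keep it low for SQL generation.
# from langchain.llms import GooglePalm
# llm = GooglePalm(google_api_key=get_palm_api_key(), temperature=0.1)
```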

2. Setup MySQL Database

  • Tool: MySQL Workbench
  • Schema and Data: Import provided SQL script to create and populate tables
  • Verification: Confirm tables are correctly set up with sample data
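Connecting LangChain to the database is done with a SQLAlchemy-style URI. The credentials and database name below are placeholders, and the SQLDatabase call is commented out since it needs a running MySQL server.

```python
# Hypothetical credentials -- replace with your own; the database name
# is an assumption.
db_user = "root"
db_password = "your_password"
db_host = "localhost"
db_name = "tshirt_store"

# SQLAlchemy-style URI in the form LangChain's SQLDatabase.from_uri() accepts
db_uri = f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}"

# from langchain.utilities import SQLDatabase
# db = SQLDatabase.from_uri(db_uri, sample_rows_in_table_info=3)
```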

3. Jupyter Notebook Setup

  • Installation: Ensure all required libraries are installed via requirements.txt
  • Code Execution:
    1. Initialize Google Palm and SQL database objects
    2. Create and test LLM object with sample prompts
    3. Create SQL Database Chain object
    4. Execute queries to verify connection and correct SQL generation
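The wiring in steps 1–4 can be sketched as below. The chain construction is commented out (it needs the llm and db objects from the earlier steps), and a thin `ask` helper plus a stand-in chain are included so the call path can be exercised without a live LLM.

```python
# Chain wiring (class names from LangChain's API; needs llm and db objects):
# from langchain_experimental.sql import SQLDatabaseChain
# db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
# db_chain.run("How many white Nike t-shirts do we have left?")

def ask(chain, question):
    """Thin wrapper so callers never depend on the chain's interface directly."""
    return chain.run(question)

class FakeChain:
    """Stand-in chain used to exercise the wrapper without a live LLM."""
    def run(self, question):
        return f"(would generate and execute SQL for: {question})"

print(ask(FakeChain(), "How many total t-shirts are left?"))
```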

4. Integrate Few-Shot Learning

  • Complex Queries: Supply few-shot examples for questions the chain fails to answer out of the box
  • Embedding Generation: Use Hugging Face to create embeddings
  • Vector Store: Store embedding vectors in ChromaDB
  • Example Selector: Use semantic similarity to find similar queries
  • Custom SQL Prompt: Create a prompt template for better query generation
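The example-selection step can be illustrated with plain cosine similarity. In the actual project the vectors come from Hugging Face embeddings stored in ChromaDB, and LangChain's SemanticSimilarityExampleSelector and FewShotPromptTemplate do this work; the toy 2-d vectors and placeholder SQL below are purely illustrative.

```python
import math

# Toy (question, SQL) few-shot examples; the SQL strings are placeholders.
few_shots = [
    ("How many white Nike t-shirts are in stock?", "SELECT ... -- placeholder"),
    ("What is the total price of all small t-shirts?", "SELECT ... -- placeholder"),
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_example(query_vec, example_vecs):
    """Return the index of the stored example most similar to the query."""
    return max(range(len(example_vecs)),
               key=lambda i: cosine_similarity(query_vec, example_vecs[i]))

# Pretend 2-d embeddings for the two examples and an incoming question:
example_vecs = [[1.0, 0.0], [0.0, 1.0]]
best = select_example([0.9, 0.1], example_vecs)
print(few_shots[best][0])  # nearest stored question
```

The selected (question, SQL) pair is then spliced into the prompt template, giving the LLM a worked example close in meaning to the incoming question.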

5. Building the Streamlit UI

  • UI Components: Input box for natural language questions, display area for responses
  • Code Integration: Move code from Jupyter Notebook to Python scripts; encapsulate functionality in helper functions
  • Execution: Run the Streamlit app, verify functionality
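A minimal page sketch, with the Streamlit-specific lines commented out; `get_few_shot_db_chain` is a hypothetical helper assumed to live in the Python script the notebook code was moved into. The UI-to-chain glue is kept as a separate function so it can be tested without Streamlit.

```python
# Streamlit page sketch (run with: streamlit run main.py):
#
# import streamlit as st
# from langchain_helper import get_few_shot_db_chain  # hypothetical helper
#
# st.title("T-Shirt Store: Database Q&A")
# question = st.text_input("Question:")
# if question:
#     chain = get_few_shot_db_chain()
#     st.header("Answer")
#     st.write(chain.run(question))

def answer_question(chain, question):
    """UI-to-chain glue, separated from Streamlit so it can be unit-tested."""
    return chain.run(question) if question else None
```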

Final Notes

  • Practice: Follow along with the coding steps for practical understanding
  • Learning: Don't just watch; run the code examples yourself
  • Resources: All code files and additional resources provided in the video description

Reminder: Always secure sensitive information like API keys and database credentials using environment variables or secure storage.