Building an End-to-End LLM Project
Project Overview
- Objective: Create a tool similar to ChatGPT that converts human language questions into SQL queries to interact with a MySQL database for a t-shirt store.
- Store Data: T-shirt store with data stored in a MySQL database.
- Features: Convert natural language questions into SQL to fetch data from the database.
Technologies Used
- Database: MySQL
- Language Model (LLM): Google Palm using LangChain
- Framework: Streamlit for UI, LangChain for the LLM-Database interface
- Vector Database: ChromaDB
- Embeddings: Hugging Face Library
- Short Learning: Few-shot learning technique for better contextual understanding
Tables in Database
- T-shirts Table
- Maintains inventory count
- Columns: Brand, Color, Size, Stock Quantity, Price per Unit
- Discounts Table
- Stores discount information
- Columns: T-shirt ID, Discount Percentage
Project Requirements
- User: Tony Sharma, Store Manager
- Issue: Existing software struggles with complex custom questions, requiring manual effort or SQL knowledge
- Solution: Create a tool where Tony can input questions in plain English, convert to SQL, execute on MySQL database, and get the response.
Technical Architecture
- Language Model: Use Google Palm (via LangChain framework)
- SQL Execution: SQL Database Chain class in LangChain
- Few-shot Learning: For handling complex queries where out-of-the-box solutions fail
- Embeddings: Convert text data into embedding vectors using Hugging Face Library
- Vector Database: Use ChromaDB to store vectors
- UI: Streamlit for a simple interface
- Documentation: Ensure LangChain basics are clear and understood
Steps to Build the Project
1. Setup Google Palm
- API Key: Get from makersuite.google.com
- Model: Using the text-bison model from Google Palm
- Creativity Parameter: Set to control the creativity level of the responses
2. Setup MySQL Database
- Tool: MySQL Workbench
- Schema and Data: Import provided SQL script to create and populate tables
- Verification: Confirm tables are correctly set up with sample data
3. Jupyter Notebook Setup
- Installation: Ensure all required libraries are installed via requirements.txt
- Code Execution:
- Initialize Google Palm and SQL database objects
- Create and test LLM object with sample prompts
- Create SQL Database Chain object
- Execute queries to verify connection and correct SQL generation
4. Integrate Few-Shot Learning
- Error Handling: Handle complex queries using few-shot learning examples
- Embedding Generation: Use Hugging Face to create embeddings
- Vector Store: Store embedding vectors in ChromaDB
- Example Selector: Use semantic similarity to find similar queries
- Custom SQL Prompt: Create a prompt template for better query generation
5. Building the Streamlit UI
- UI Components: Input box for natural language questions, display area for responses
- Code Integration: Move code from Jupyter Notebook to Python scripts; encapsulate functionality in helper functions
- Execution: Run the Streamlit app, verify functionality
Final Notes
- Practice: Follow along with the coding steps for practical understanding
- Learning: Ensure to not just watch but also try out code examples
- Resources: All code files and additional resources provided in the video description
Reminder: Always secure sensitive information like API keys and database credentials using environment variables or secure storage.