Building a Voice Assistant with LLM

Sep 28, 2024

Lecture Notes: Creating a Voice Assistant Using a Large Language Model

Introduction

  • Demonstration of creating a voice assistant using a large language model.
  • Example of a customer placing an order from an Indian menu.

Order Process Overview

  • Customer interacts through voice.
  • Voice input converted to text using Faster Whisper model.
  • Order example: "One Idly and one Coke."
  • Total cost calculated and displayed.
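
The pricing step above can be sketched as a simple menu lookup. The item names and prices below are illustrative assumptions, not the actual menu from the demo:

```python
# Minimal sketch of the order-total step. Menu items and prices are
# illustrative assumptions, not the demo's actual menu.
MENU = {
    "idly": 30.0,  # assumed price per item
    "coke": 40.0,
}

def order_total(items):
    """Sum the price of each (name, quantity) pair in the order."""
    return sum(MENU[name] * qty for name, qty in items)

# "One Idly and one Coke."
print(order_total([("idly", 1), ("coke", 1)]))  # 70.0
```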

Faster Whisper Model

  • Faster Whisper is a reimplementation of OpenAI's Whisper model.
  • Advantages:
    • Up to 4 times faster than OpenAI Whisper.
    • Uses less memory.
    • Improved efficiency with 8-bit quantization on CPU/GPU.
  • Recommended for better performance over standard Whisper.
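
A minimal transcription sketch: faster-whisper yields an iterator of segments, which are joined into one transcript string. The model size, audio filename, and decoding parameters in the commented usage are assumptions, not the demo's exact configuration:

```python
# Join faster-whisper's segment iterator into a single transcript string.
def join_segments(segments):
    return " ".join(seg.text.strip() for seg in segments)

# Actual usage (requires `pip install faster-whisper`); the model size
# and audio filename below are placeholders:
#
#   from faster_whisper import WhisperModel
#   model = WhisperModel("base.en", device="cpu", compute_type="int8")
#   segments, info = model.transcribe("order.wav", beam_size=5)
#   print(join_segments(segments))
```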

Requirements for Installation

  • Python version 3.8 or greater required.
  • For GPU users, install the NVIDIA CUDA libraries (cuBLAS and cuDNN).
  • Installation command for CPU users: pip install faster-whisper.

RAG Pipeline Overview

  • RAG (Retrieval-Augmented Generation) processes the transcribed text to resolve the order.
  • Textual data includes restaurant menu and beverage information.
  • Data is split into chunks, converted to embeddings, and saved to Qdrant DB.
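
The chunking step can be illustrated with a plain word-window splitter; the chunk size and overlap are arbitrary assumptions, and the embedding/Qdrant upsert that follows in the real pipeline is omitted:

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping word windows (sizes are assumptions)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

# A stand-in for the menu document; real input would be the restaurant
# menu and beverage text.
menu_doc = " ".join(f"item{i}" for i in range(120))
chunks = chunk_text(menu_doc)
# Each chunk would then be embedded and upserted into a Qdrant collection.
```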

Example Process

  • Customer requests a beverage (e.g., Pepsi).
  • Text processed to fetch relevant chunks from the database.
  • Retrieved data processed by a large language model (LLM).

Text-to-Speech Conversion

  • Uses the gTTS (Google Text-to-Speech) package for voice output.
  • Example usage of gTTS: gTTS('Hello').save('hello.mp3').

Maintaining Chat Memory

  • Chat memory retains user interaction history.
  • Implements LlamaIndex's ChatMemoryBuffer to avoid repetitive queries.
  • Ensures conversational context is maintained across multiple requests.
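
The memory buffer can be sketched as a token-limited message list, in the spirit of LlamaIndex's ChatMemoryBuffer. The whitespace-based token count and the budget value are simplifying assumptions:

```python
class ChatMemory:
    """Keep recent messages under a rough token budget (whitespace tokens)."""

    def __init__(self, token_limit=50):
        self.token_limit = token_limit
        self.messages = []  # (role, text) pairs, oldest first

    def add(self, role, text):
        self.messages.append((role, text))
        # Drop oldest messages until the history fits the budget.
        while self._tokens() > self.token_limit and len(self.messages) > 1:
            self.messages.pop(0)

    def _tokens(self):
        return sum(len(text.split()) for _, text in self.messages)

mem = ChatMemory(token_limit=8)
mem.add("user", "One idly and one coke please")      # 6 tokens
mem.add("assistant", "That will be seventy rupees")  # 5 tokens; oldest dropped
```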

Application Structure

  • Main application consists of three files:
    1. app.py - Starting point of the application.
    2. voice_service.py - Manages text-to-speech functionality.
    3. rag_module.py - Contains the RAG implementation and configuration.

Creating Virtual Environment

  • Command: python -m venv <env_name>.
  • Activate the environment and install required packages with pip install -r requirements.txt.

Code Walkthrough

  • Main method setup for voice assistant.
  • Utilizes models based on device type (CPU/GPU).
  • Loops through customer interactions until the conversation ends.
  • Checks if audio input is silent; if not, processes it for transcription.
  • Transcriptions are generated and used for further processing.
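
The silence check in the loop above can be sketched as an RMS energy threshold on raw samples; the threshold value is an assumption, and the loop skeleton uses hypothetical helper names rather than the app's actual functions:

```python
import math

def is_silent(samples, threshold=500.0):
    """True if the RMS energy of 16-bit PCM samples is below threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold

# Main-loop skeleton: record, skip silence, otherwise transcribe and answer.
# Helper names are placeholders for the real app's functions:
#
#   while True:
#       samples = record_chunk()       # audio capture
#       if is_silent(samples):
#           continue
#       text = transcribe(samples)     # faster-whisper
#       reply = rag_answer(text)       # RAG + LLM
#       speak(reply)                   # gTTS playback
```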

Running the Application

  • Ensure all necessary services (e.g., the Ollama inference server and Qdrant DB) are running.
  • Use Docker to manage the Qdrant DB container.
  • Application communicates with the inference server to handle requests.
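
The two external services can be started with standard commands; the Docker image name and port are Qdrant's published defaults, and `ollama serve` assumes Ollama is installed locally (a sketch, not the exact setup from the demo):

```shell
# Start Qdrant in Docker (default REST port 6333), persisting data locally.
docker run -d -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant

# Start the Ollama inference server.
ollama serve
```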

Example Order Interaction

  • Customer places an order (e.g., an idly and a beverage).
  • Assistant confirms order and provides total cost.
  • Interaction continues until customer confirms or modifies their order.

Conclusion

  • Overview of building a voice-based assistant using LLM and RAG.
  • Emphasis on maintaining conversation context and efficient processing.