Building a Voice Assistant with LLM

Sep 28, 2024

Lecture Notes: Creating a Voice Assistant Using a Large Language Model

Introduction

  • Demonstration of creating a voice assistant using a large language model.
  • Example of a customer placing an order from an Indian menu.

Order Process Overview

  • Customer interacts through voice.
  • Voice input converted to text using Faster Whisper model.
  • Order example: "One Idly and one Coke."
  • Total cost calculated and displayed.
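
The pricing step above can be sketched as a simple menu lookup. The item names and prices below are illustrative assumptions, not the actual menu from the demo:

```python
# Minimal sketch of the order-total step. Menu items and prices are
# illustrative assumptions, not the demo's actual menu.
MENU = {
    "idly": 30.0,  # assumed price per item
    "coke": 40.0,
}

def order_total(items):
    """Sum the price of each (name, quantity) pair in the order."""
    return sum(MENU[name] * qty for name, qty in items)

# "One Idly and one Coke."
print(order_total([("idly", 1), ("coke", 1)]))  # 70.0
```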

Faster Whisper Model

  • Faster Whisper is a reimplementation of OpenAI's Whisper model.
  • Advantages:
    • Up to 4 times faster than OpenAI Whisper.
    • Uses less memory.
    • Improved efficiency with 8-bit quantization on CPU/GPU.
  • Recommended for better performance over standard Whisper.
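
A minimal transcription sketch: faster-whisper yields an iterator of segments, which are joined into one transcript string. The model size, audio filename, and decoding parameters in the commented usage are assumptions, not the demo's exact configuration:

```python
# Join faster-whisper's segment iterator into a single transcript string.
def join_segments(segments):
    return " ".join(seg.text.strip() for seg in segments)

# Actual usage (requires `pip install faster-whisper`); the model size
# and audio filename below are placeholders:
#
#   from faster_whisper import WhisperModel
#   model = WhisperModel("base.en", device="cpu", compute_type="int8")
#   segments, info = model.transcribe("order.wav", beam_size=5)
#   print(join_segments(segments))
```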

Requirements for Installation

  • Python version 3.8 or greater required.
  • For GPU users, install the NVIDIA CUDA libraries (cuBLAS and cuDNN).
  • Installation command for CPU users: pip install faster-whisper.

RAG Pipeline Overview

  • RAG (Retrieval-Augmented Generation) processes the transcribed text to resolve the order.
  • Textual data includes restaurant menu and beverage information.
  • Data is split into chunks, converted to embeddings, and saved to Qdrant DB.
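
The chunking step can be illustrated with a plain word-window splitter; the chunk size and overlap are arbitrary assumptions, and the embedding/Qdrant upsert that follows in the real pipeline is omitted:

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping word windows (sizes are assumptions)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

# A stand-in for the menu document; real input would be the restaurant
# menu and beverage text.
menu_doc = " ".join(f"item{i}" for i in range(120))
chunks = chunk_text(menu_doc)
# Each chunk would then be embedded and upserted into a Qdrant collection.
```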

Example Process

  • Customer requests a beverage (e.g., Pepsi).
  • Text processed to fetch relevant chunks from the database.
  • Retrieved data processed by a large language model (LLM).

Text-to-Speech Conversion

  • Uses the gTTS (Google Text-to-Speech) package for voice output.
  • Example usage of gTTS: gTTS('Hello').save('hello.mp3').

Maintaining Chat Memory

  • Chat memory retains user interaction history.
  • Implements LlamaIndex's ChatMemoryBuffer to avoid repetitive queries.
  • Ensures conversational context is maintained across multiple requests.
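
The memory buffer can be sketched as a token-limited message list, in the spirit of LlamaIndex's ChatMemoryBuffer. The whitespace-based token count and the budget value are simplifying assumptions:

```python
class ChatMemory:
    """Keep recent messages under a rough token budget (whitespace tokens)."""

    def __init__(self, token_limit=50):
        self.token_limit = token_limit
        self.messages = []  # (role, text) pairs, oldest first

    def add(self, role, text):
        self.messages.append((role, text))
        # Drop oldest messages until the history fits the budget.
        while self._tokens() > self.token_limit and len(self.messages) > 1:
            self.messages.pop(0)

    def _tokens(self):
        return sum(len(text.split()) for _, text in self.messages)

mem = ChatMemory(token_limit=8)
mem.add("user", "One idly and one coke please")      # 6 tokens
mem.add("assistant", "That will be seventy rupees")  # 5 tokens; oldest dropped
```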

Application Structure

  • Main application consists of three files:
    1. app.py - Starting point of the application.
    2. voice_service.py - Manages text-to-speech functionality.
    3. rag_module.py - Contains the RAG implementation and configuration.

Creating Virtual Environment

  • Command: python -m venv <env_name>.
  • Activate the environment and install required packages with pip install -r requirements.txt.

Code Walkthrough

  • Main method setup for voice assistant.
  • Utilizes models based on device type (CPU/GPU).
  • Loops through customer interactions until the conversation ends.
  • Checks if audio input is silent; if not, processes it for transcription.
  • Transcriptions are generated and used for further processing.
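
The silence check in the loop above can be sketched as an RMS energy threshold on raw samples; the threshold value is an assumption, and the loop skeleton uses hypothetical helper names rather than the app's actual functions:

```python
import math

def is_silent(samples, threshold=500.0):
    """True if the RMS energy of 16-bit PCM samples is below threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold

# Main-loop skeleton: record, skip silence, otherwise transcribe and answer.
# Helper names are placeholders for the real app's functions:
#
#   while True:
#       samples = record_chunk()       # audio capture
#       if is_silent(samples):
#           continue
#       text = transcribe(samples)     # faster-whisper
#       reply = rag_answer(text)       # RAG + LLM
#       speak(reply)                   # gTTS playback
```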

Running the Application

  • Ensure all necessary services (e.g., the Ollama inference server and Qdrant DB) are running.
  • Use Docker to manage the Qdrant DB container.
  • Application communicates with the inference server to handle requests.
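
The two external services can be started with standard commands; the Docker image name and port are Qdrant's published defaults, and `ollama serve` assumes Ollama is installed locally (a sketch, not the exact setup from the demo):

```shell
# Start Qdrant in Docker (default REST port 6333), persisting data locally.
docker run -d -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant

# Start the Ollama inference server.
ollama serve
```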

Example Order Interaction

  • Customer places an order (e.g., an idly and a beverage).
  • Assistant confirms order and provides total cost.
  • Interaction continues until customer confirms or modifies their order.

Conclusion

  • Overview of building a voice-based assistant using LLM and RAG.
  • Emphasis on maintaining conversation context and efficient processing.