🎤

Building a Voice Assistant with AI

Sep 28, 2024

Creating a Voice Assistant Using a Large Language Model

Overview of Voice Assistant Demo

  • A voice assistant is created to facilitate order placement.
  • Example interaction: user orders "1 idly and 1 Coke."
  • Total cost of the order: INR 70.

Voice Recognition Process

  • Conversion of voice to text using the Faster Whisper model:
    • Faster Whisper is an optimized reimplementation of OpenAI's Whisper.
    • Provides up to 4x faster transcription while using less memory.
    • Further efficiency can be achieved with 8-bit quantization (see the sketch below).
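As a rough illustration, a single transcription call with Faster Whisper might look like the sketch below; the model size, device, and audio file name are placeholder choices, and compute_type="int8" enables the 8-bit quantization mentioned above.

    from faster_whisper import WhisperModel

    # Load Whisper with 8-bit quantization to cut memory use.
    # "base" and the file name are placeholder choices.
    model = WhisperModel("base", device="cuda", compute_type="int8")

    # Transcribe a recorded audio chunk and join the segment texts.
    segments, info = model.transcribe("customer_order.wav", language="en")
    text = " ".join(segment.text.strip() for segment in segments)
    print(text)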

Requirements

  • Python: 3.8 or greater.
  • CUDA Toolkit: Includes the necessary GPU libraries (cuBLAS, cuDNN).
    • Installing the toolkit is recommended over installing these libraries individually.

Workflow of Voice Assistant

  1. Voice to Text Conversion:
    • Use Faster Whisper to transcribe audio input.
    • Input example: Customer states "Pepsi and pizza."
  2. Retrieval-Augmented Generation (RAG):
    • The text goes through RAG, which fetches relevant information from a menu database.
    • The menu is split into chunks, and their embeddings are saved to Qdrant DB.
  3. Interaction with Large Language Model:
    • The retrieved context and the transcription are sent to an LLM to generate a response (a sketch of this step follows the list).
    • gTTS (Google Text-to-Speech) converts the response back to voice.
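A minimal sketch of the RAG and LLM steps, assuming the llama-index 0.10+ package layout with the Qdrant vector-store, Ollama LLM, and HuggingFace embedding integrations installed; the collection name, menu folder, and model names are placeholders rather than values from the original project.

    import qdrant_client
    from llama_index.core import (
        Settings,
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
    )
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.ollama import Ollama
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    # Local LLM served by Ollama and a local embedding model.
    Settings.llm = Ollama(model="llama3")
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    # Chunk and embed the menu documents, storing the embeddings in Qdrant.
    client = qdrant_client.QdrantClient(host="localhost", port=6333)
    vector_store = QdrantVectorStore(client=client, collection_name="menu")
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    documents = SimpleDirectoryReader("menu_data").load_data()
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

    # Retrieve the relevant menu items and generate a reply for the transcribed order.
    response = index.as_query_engine().query("Pepsi and pizza")
    print(response)

In the full assistant, the one-shot query engine would typically be swapped for a chat engine so that the memory buffer described in the next section can be attached.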

Chat Memory Feature

  • Implements chat memory to retain conversation context:
    • Essential for maintaining an interactive dialogue without repetitive queries.
    • Utilizes LlamaIndex's ChatMemoryBuffer (see the sketch below).
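Reusing the index from the RAG sketch above, the memory buffer could be attached roughly as follows; the token limit and the "context" chat mode are assumptions rather than settings confirmed by the demo.

    from llama_index.core.memory import ChatMemoryBuffer

    # Keep roughly the last 1500 tokens of the conversation in context.
    memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

    # A chat engine that reuses the memory across turns, so the assistant
    # remembers items mentioned earlier in the same order.
    chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)

    print(chat_engine.chat("One Pepsi and one pizza, please."))
    print(chat_engine.chat("Actually, make that two Pepsis."))  # relies on memory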

Code Structure

  • Main Components:
    • app.py: Starting point of the application.
    • voice_service.py: Handles text-to-speech functionality (a minimal sketch follows this list).
    • RAG Module: Manages data retrieval and processing.
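The text-to-speech part of voice_service.py could be as small as the sketch below; gTTS is the library named in the demo, while the playsound-based playback and the play_text name are assumptions for illustration.

    import os
    import tempfile

    from gtts import gTTS
    from playsound import playsound  # assumed playback dependency


    def play_text(text: str, lang: str = "en") -> None:
        """Convert the LLM response to speech and play it back."""
        tts = gTTS(text=text, lang=lang)
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
            path = tmp.name
        tts.save(path)
        playsound(path)
        os.remove(path)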

Virtual Environment Setup

  • Create a virtual environment using: python -m venv venv
  • Install required packages from requirements.txt.

Main Method Functionality

  • Model and device configurations set in the main method.
  • Continuous loop to listen for customer input until the conversation ends.
  • Audio recording and transcription handling:
    • Audio chunks processed and transcribed.
    • The transcription is sent to the LLM for a response (a sketch of the loop follows this list).
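Putting the pieces together, the main loop in app.py might look roughly like the sketch below; it reuses WhisperModel, the chat engine, and play_text from the sketches above, while build_chat_engine, record_audio_chunk, and the "goodbye" check are hypothetical stand-ins for the project's actual wiring, recording, and end-of-conversation logic.

    from faster_whisper import WhisperModel


    def main() -> None:
        model = WhisperModel("base", device="cuda", compute_type="int8")
        chat_engine = build_chat_engine()  # hypothetical: index + memory from the RAG module

        while True:
            # Record a short audio chunk from the microphone (hypothetical helper).
            audio_path = record_audio_chunk()

            # Transcribe the chunk with Faster Whisper.
            segments, _ = model.transcribe(audio_path, language="en")
            text = " ".join(s.text.strip() for s in segments)
            if not text:
                continue
            if "goodbye" in text.lower():  # placeholder end-of-conversation check
                break

            # Send the transcription to the LLM and speak the reply.
            reply = chat_engine.chat(text)
            play_text(str(reply))


    if __name__ == "__main__":
        main()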

Running the Application

  • Ensure all dependencies are running (Ollama for the LLM, Docker for Qdrant).
  • Quick commands to initialize: ollama serve and docker run -p 6333:6333 qdrant/qdrant
  • Monitor and manage processes to avoid port conflicts.

Conclusion of Demo

  • Customer interaction example provided.
  • Accurate order handling and response generation.
  • Emphasis on adapting components for more extensive databases when necessary.

Summary of Key Libraries and Tools Used

  • Faster Whisper: Voice-to-text transcription.
  • gTTS: Text-to-speech conversion.
  • RAG: Retrieval-Augmented Generation technique for grounding responses in the menu data.
  • CUDA Toolkit: GPU support for model execution.
  • LlamaIndex: Chat memory management.
  • Qdrant DB: Vector database for storing chunks and their embeddings.