Building a Voice Assistant with AI
Sep 28, 2024
Creating a Voice Assistant Using a Large Language Model
Overview of Voice Assistant Demo
A voice assistant is created to facilitate order placement.
Example interaction: user orders "1 idly and 1 Coke."
Total cost of the order: INR 70.
Voice Recognition Process
Conversion of voice to text using the Faster Whisper model:
Faster Whisper is an optimized reimplementation of OpenAI's Whisper.
Provides up to 4x faster performance and uses less memory.
Further efficiency can be achieved with 8-bit quantization (see the transcription sketch after the requirements below).
Requirements
Python: 3.8 or greater.
CUDA Toolkit: includes the necessary GPU libraries (cuBLAS, cuDNN).
Recommended to install the toolkit rather than installing those libraries individually.
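A minimal transcription sketch using the faster-whisper package (the model size "base.en", the device choice, and the audio.wav filename are assumptions, not taken from the demo); compute_type="int8" corresponds to the 8-bit quantization mentioned above:

from faster_whisper import WhisperModel

# Assumption: a CUDA-capable GPU is available; use device="cpu" otherwise.
model = WhisperModel("base.en", device="cuda", compute_type="int8")

# transcribe() returns a generator of segments plus metadata about the audio.
segments, info = model.transcribe("audio.wav")
text = " ".join(segment.text for segment in segments).strip()
print(text)  # e.g. "1 idly and 1 Coke"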
Workflow of Voice Assistant
Voice to Text Conversion:
Use Faster Whisper to transcribe the audio input.
Input example: customer states "Pepsi and pizza."
Retrieval-Augmented Generation (RAG):
The text goes through RAG, which fetches relevant information from a menu database.
The menu information is split into chunks, and the resulting embeddings are saved to Qdrant DB (see the indexing sketch below).
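A sketch of the indexing step using LlamaIndex with a Qdrant vector store; the embedding model, the Ollama LLM, the collection name "menu", and the ./menu_data folder are illustrative assumptions, since the notes do not record the demo's exact choices:

from qdrant_client import QdrantClient
from llama_index.core import (
    Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Assumed models: a local Ollama LLM and a HuggingFace embedding model.
Settings.llm = Ollama(model="llama3")
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Connect to the Qdrant instance started via Docker (default port 6333).
client = QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="menu")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load the menu documents, split them into chunks, embed the chunks,
# and store the embeddings in Qdrant.
documents = SimpleDirectoryReader("./menu_data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

print(index.as_query_engine().query("How much is one idly and one Coke?"))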
Interaction with the Large Language Model:
The retrieved information is sent to an LLM to generate a response.
gTTS (Google Text-to-Speech) is used to convert the response back to voice (see the sketch below).
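A minimal text-to-speech sketch with gTTS; the speak() helper name, the output filename, and the example sentence are assumptions for illustration:

from gtts import gTTS

def speak(text: str, path: str = "response.mp3") -> str:
    """Convert the LLM's text reply to speech and save it as an MP3 file."""
    tts = gTTS(text=text, lang="en")
    tts.save(path)
    return path

# The saved MP3 can then be played back with any audio player
# (the notes do not specify which playback method the demo uses).
speak("Your order of 1 idly and 1 Coke comes to INR 70.")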
Chat Memory Feature
Implements chat memory to retain conversation context:
Essential for maintaining an interactive dialogue without repeating earlier queries.
Utilizes the LlamaIndex ChatMemoryBuffer (see the sketch below).
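A sketch of wiring a ChatMemoryBuffer into a chat engine over the Qdrant-backed index from the indexing sketch; the token limit, chat mode, and system prompt are illustrative assumptions:

from llama_index.core.memory import ChatMemoryBuffer

# Assumption: "index" is the VectorStoreIndex built over the menu data above.
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt="You are a food-ordering assistant. Answer only from the menu.",
)

print(chat_engine.chat("1 idly and 1 Coke"))
print(chat_engine.chat("How much is my order so far?"))  # relies on retained context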
Code Structure
Main Components:
app.py: starting point of the application.
voice_service.py: handles text-to-speech functionality.
RAG module: manages data retrieval and processing (a possible layout follows).
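A possible project layout consistent with the components named above (the rag/ folder name is an assumption; only app.py and voice_service.py are named in the notes):

app.py              # entry point: recording loop, transcription, LLM interaction
voice_service.py    # text-to-speech helper (gTTS)
rag/                # RAG module: Qdrant indexing, retrieval, chat engine
requirements.txt    # Python dependencies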
Virtual Environment Setup
Create virtual environment using:
python -m venv vnv
Install required packages from requirements.txt (a sample listing follows).
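A plausible requirements.txt for the stack described in these notes; the demo's exact packages and versions are not shown, so treat this listing as an assumption:

faster-whisper
gTTS
llama-index
llama-index-llms-ollama
llama-index-embeddings-huggingface
llama-index-vector-stores-qdrant
qdrant-client
sounddevice
scipy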
Main Method Functionality
Model and device configurations set in the main method.
Continuous loop to listen for customer input until the conversation ends.
Audio recording and transcription handling:
Audio chunks processed and transcribed.
Transcription sent to the LLM to generate a response (see the loop sketch below).
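A sketch of the main loop, assuming microphone capture with sounddevice, a fixed-length recording per turn, and a simple phrase-based stop condition; none of these details are recorded in the notes. chat_engine and speak() refer to the earlier sketches:

import sounddevice as sd
from scipy.io.wavfile import write
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000
CHUNK_SECONDS = 5  # assumption: fixed-length recording per customer turn

def record_chunk(path: str = "chunk.wav") -> str:
    """Record a short chunk of microphone audio and save it as a WAV file."""
    audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    sd.wait()
    write(path, SAMPLE_RATE, audio)
    return path

def main():
    # Model and device configuration, as described above.
    model = WhisperModel("base.en", device="cuda", compute_type="int8")
    while True:
        segments, _ = model.transcribe(record_chunk())
        text = " ".join(s.text for s in segments).strip()
        if not text:
            continue  # nothing was said in this chunk
        if "thank you" in text.lower():  # assumption: simple end-of-conversation check
            break
        reply = chat_engine.chat(text)   # chat engine from the RAG/memory sketches
        speak(str(reply))                # gTTS helper from the earlier sketch

if __name__ == "__main__":
    main()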
Running the Application
Ensure all dependencies (Ollama, and Docker for Qdrant) are running.
Quick commands to initialize (the Docker port mapping follows Qdrant's standard quickstart):
ollama serve
docker run -p 6333:6333 qdrant/qdrant
Monitor and manage processes to avoid port conflicts.
Conclusion of Demo
Customer interaction example provided.
Accurate order handling and response generation.
Emphasis on adapting components for more extensive databases when necessary.
Summary of Key Libraries and Tools Used
Faster Whisper: voice-to-text processing.
gTTS: text-to-speech conversion.
RAG: information retrieval framework.
CUDA Toolkit: GPU support for model execution.
LlamaIndex: chat memory management.
Qdrant DB: database for storing embeddings and chunks.