PromtEngineer / Verbi
A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, CartesiaAI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.
Welcome to the Voice Assistant project! 🎙️ Our goal is to create a modular voice assistant application that allows you to experiment with state-of-the-art (SOTA) models for various components. The modular structure provides flexibility, enabling you to pick and choose between different SOTA models for transcription, response generation, and text-to-speech (TTS). This approach facilitates easy testing and comparison of different models, making it an ideal platform for research and development in voice assistant technologies. Whether you're a developer, researcher, or enthusiast, this project is for you!
Configuration is handled through a .env file and config.py for easy setup and management.

Project structure:

voice_assistant/
├── voice_assistant/
│   ├── __init__.py
│   ├── audio.py
│   ├── api_key_manager.py
│   ├── config.py
│   ├── transcription.py
│   ├── response_generation.py
│   ├── text_to_speech.py
│   ├── utils.py
│   ├── local_tts_api.py
│   └── local_tts_generation.py
├── .env
├── run_voice_assistant.py
├── setup.py
├── requirements.txt
└── README.md
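The modules above map onto a record → transcribe → respond → speak flow. The sketch below is purely conceptual: every function name is a hypothetical stand-in for the corresponding module (audio.py, transcription.py, response_generation.py, text_to_speech.py), not the project's actual API.

# Conceptual sketch of how the Verbi modules fit together (hypothetical names).

def record_audio() -> str:
    # audio.py: record from the microphone and return a file path
    return "input.wav"

def transcribe(audio_path: str) -> str:
    # transcription.py: send audio to the selected transcription backend
    return "hello verbi"

def generate_response(text: str) -> str:
    # response_generation.py: query the selected language model
    return f"You said: {text}"

def synthesize(text: str) -> str:
    # text_to_speech.py: convert the reply to audio with the selected TTS backend
    return "output.wav"

def play_audio(path: str) -> None:
    # audio.py: play the generated audio file
    print(f"playing {path}")

if __name__ == "__main__":
    reply = generate_response(transcribe(record_audio()))
    play_audio(synthesize(reply))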
Clone the repository:
git clone https://github.com/PromtEngineer/Verbi.git
cd Verbi
Using venv:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Using conda:
conda create --name verbi python=3.10
conda activate verbi
Install the required packages:
pip install -r requirements.txt
Create a .env file in the root directory and add your API keys:
OPENAI_API_KEY=your_openai_api_key
GROQ_API_KEY=your_groq_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
LOCAL_MODEL_PATH=path/to/local/model
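These values are read from the environment at runtime. As a minimal sketch of how such keys are typically loaded with python-dotenv (the project's own config.py may do this slightly differently):

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads the .env file from the current working directory

openai_key = os.getenv("OPENAI_API_KEY")
groq_key = os.getenv("GROQ_API_KEY")
deepgram_key = os.getenv("DEEPGRAM_API_KEY")
local_model_path = os.getenv("LOCAL_MODEL_PATH")

print("OpenAI key loaded:", bool(openai_key))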
Edit config.py to select the models you want to use:
import os

class Config:
    # Model selection
    TRANSCRIPTION_MODEL = 'groq'   # Options: 'openai', 'groq', 'deepgram', 'fastwhisperapi', 'local'
    RESPONSE_MODEL = 'groq'        # Options: 'openai', 'groq', 'ollama', 'local'
    TTS_MODEL = 'deepgram'         # Options: 'openai', 'deepgram', 'elevenlabs', 'local', 'melotts'

    # API keys and paths
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    GROQ_API_KEY = os.getenv("GROQ_API_KEY")
    DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
    LOCAL_MODEL_PATH = os.getenv("LOCAL_MODEL_PATH")
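For illustration, this is roughly how a key-manager layer can pick the right key for the selected transcription backend. It is a hedged sketch only; the actual logic lives in voice_assistant/api_key_manager.py and may differ:

import os

# Hypothetical helper mirroring what api_key_manager.py does conceptually:
# return the API key that matches the configured transcription backend.
def get_transcription_api_key(transcription_model: str) -> str | None:
    key_by_model = {
        "openai": os.getenv("OPENAI_API_KEY"),
        "groq": os.getenv("GROQ_API_KEY"),
        "deepgram": os.getenv("DEEPGRAM_API_KEY"),
    }
    # Local backends such as 'fastwhisperapi' or 'local' need no key.
    return key_by_model.get(transcription_model)

if __name__ == "__main__":
    print(get_transcription_api_key("groq"))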
If you are running the LLM locally via Ollama, make sure the Ollama server is running before starting Verbi.
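For example (the model name below is just an example; pull whichever model you have configured):

ollama serve              # start the Ollama server (skip if it already runs as a background service)
ollama pull llama3        # download an example model to use with Verbi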
python run_voice_assistant.py
🤖 Install FastWhisperAPI
Optional step if you need a local transcription model
Clone the repository:
cd ..
git clone https://github.com/3choff/FastWhisperAPI.git
cd FastWhisperAPI
Install the required packages:
pip install -r requirements.txt
Run the API
fastapi run main.py
Alternative Setup and Run Methods
The API can also run in a Docker container or in Google Colab.
Docker:
Build a Docker container:
docker build -t fastwhisperapi .
Run the container:
docker run -p 8000:8000 fastwhisperapi
Refer to the repository documentation for the Google Colab method: https://github.com/3choff/FastWhisperAPI/blob/main/README.md
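However you start it, FastAPI exposes interactive docs by default, so a quick way to confirm the transcription server is reachable (assuming the default port 8000 and that the docs page has not been disabled) is:

import requests  # pip install requests

# Assumes FastWhisperAPI is listening on the default port 8000.
response = requests.get("http://localhost:8000/docs")
print("FastWhisperAPI reachable:", response.status_code == 200)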
🤖 Install Local TTS - MeloTTS
Optional step if you need a local Text to Speech model
Install MeloTTS from GitHub
Use the following link to install MeloTTS for your operating system.
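As a rough guide only (the authoritative steps are in the MeloTTS documentation, and the commands below assume installation from the myshell-ai/MeloTTS GitHub repository into your active virtual environment):

git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
pip install -e .
python -m unidic download   # dictionary download required by MeloTTS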
Once the package is installed in your local virtual environment, you can start the API server using the following command.
python voice_assistant/local_tts_api.py
The local_tts_api.py file implements a FastAPI server that listens for incoming text and generates audio using the MeloTTS model.
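For reference, a client call to that server might look roughly like the sketch below. The endpoint path, port, and payload fields here are hypothetical; check local_tts_api.py and local_tts_generation.py for the actual route and parameters:

import requests  # pip install requests

# Hypothetical endpoint and payload; the real route is defined in local_tts_api.py.
resp = requests.post(
    "http://localhost:8000/generate-audio",
    json={"text": "Hello from Verbi"},
)
resp.raise_for_status()

with open("reply.wav", "wb") as f:
    f.write(resp.content)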
To use the local TTS model, update the config.py file by setting:
TTS_MODEL = 'melotts' # Options: 'openai', 'deepgram', 'elevenlabs', 'local', 'melotts'
You can now run the main file to start using Verbi with local models.
File descriptions:

run_verbi.py: Main script to run the voice assistant.
voice_assistant/config.py: Manages configuration settings and API keys.
voice_assistant/api_key_manager.py: Handles retrieval of API keys based on configured models.
voice_assistant/audio.py: Functions for recording and playing audio.
voice_assistant/transcription.py: Manages audio transcription using various APIs.
voice_assistant/response_generation.py: Handles generating responses using various language models.
voice_assistant/text_to_speech.py: Manages converting text responses into speech.
voice_assistant/utils.py: Contains utility functions like deleting files.
voice_assistant/local_tts_api.py: Contains the API implementation to run the MeloTTS model.
voice_assistant/local_tts_generation.py: Contains the code to use the MeloTTS API to generate audio.
voice_assistant/__init__.py: Initializes the voice_assistant package.

Here's what's next for the Voice Assistant project:
We welcome contributions from the community! If you'd like to help improve this project, please follow these steps:
Create a new branch (git checkout -b feature-branch).
Commit your changes (git commit -m 'Add new feature').
Push to the branch (git push origin feature-branch).