
Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2


socialdroids/whisper_ros

 
 


whisper_ros

This repository provides a set of ROS 2 packages that integrate whisper.cpp into ROS 2 using audio_common 4.0.3. In addition, silero-vad is used to perform VAD (Voice Activity Detection).

License: MIT

ROS 2 Distro   Branch   Build status    Docker Image   Documentation
Humble         main     Humble Build    Docker Image   Doxygen Deployment
Iron           main     Iron Build      Docker Image   Doxygen Deployment
Jazzy          main     Jazzy Build     Docker Image   Doxygen Deployment
Rolling        main     Rolling Build   Docker Image   Doxygen Deployment

Table of Contents

  1. Related Projects
  2. Installation
  3. Docker
  4. Usage
  5. Demos

Related Projects

  • chatbot_ros → This chatbot, integrated into ROS 2, uses whisper_ros to listen to people's speech and llama_ros to generate responses. The chatbot is controlled by a state machine created with YASMIN.

Installation

To run whisper_ros with CUDA, first, you must install the CUDA Toolkit.
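
As a quick sanity check before building, the sketch below looks for the `nvcc` compiler on `PATH` and chooses the colcon arguments accordingly. It assumes that a visible `nvcc` means the CUDA Toolkit is installed, which may not hold on every setup (for example, when the toolkit is installed outside `PATH`). The helper name `colcon_build_command` is hypothetical and not part of whisper_ros.

```python
import shutil


def colcon_build_command(nvcc_path):
    """Return the colcon command line, adding the CUDA flag only when nvcc was found."""
    command = ["colcon", "build"]
    if nvcc_path:
        # -DGGML_CUDA=ON is the flag used in this README's install steps.
        command += ["--cmake-args", "-DGGML_CUDA=ON"]
    return command


if __name__ == "__main__":
    # shutil.which returns None when nvcc is not on PATH.
    print(" ".join(colcon_build_command(shutil.which("nvcc"))))
```

Run it from any directory; the printed command is the one to execute from the workspace root.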

Notes: Miguel

LM-Studio

LM-Studio must be installed and set up with a model to run. If this is not done, the model will not respond.


After downloading, go to the "Developer" section (marked in green and located on the right-hand sidebar). Run a model (I suggest Llama) and set the status slider to "Running". After this, you can proceed.
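
To confirm that the server is actually up before launching the ROS 2 nodes, a minimal check along the following lines may help. It assumes LM-Studio is serving its OpenAI-compatible API at the default address `http://localhost:1234` and that the `/v1/models` endpoint returns a JSON object with a `data` list; adjust the URL if your setup differs. The function names are illustrative only.

```python
import json
import urllib.error
import urllib.request

# Assumption: LM-Studio's local server listens on its default address.
LMSTUDIO_MODELS_URL = "http://localhost:1234/v1/models"


def model_ids(payload):
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_running_models(url=LMSTUDIO_MODELS_URL, timeout=2.0):
    """Return the model ids reported by the server, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return model_ids(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply all count as "not ready".
        return None


if __name__ == "__main__":
    models = list_running_models()
    if models is None:
        print('LM-Studio server not reachable; is the status slider set to "Running"?')
    else:
        print("Models available:", models)
```

If the script reports that the server is not reachable, recheck the "Developer" section before continuing with the installation.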

Install package

mkdir -p ~/ros2_ws/src
cd ~/ros2_ws/src
git clone https://github.com/mgonzs13/audio_common.git
git clone git@github.com:socialdroids/whisper_ros.git
pip3 install -r whisper_ros/requirements.txt
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build --cmake-args -DGGML_CUDA=ON # for CUDA; omit "--cmake-args -DGGML_CUDA=ON" for a CPU-only build

Usage

Run Silero for VAD and Whisper for STT:

ros2 launch whisper_bringup whisper.launch.py

Demos

Try the example of a whisper client:

ros2 run whisper_demos whisper_demo_node
