FunASR-APP is a speech application toolkit built on FunASR's open-source speech models. It packages these models into ready-to-use applications so that they are easy to run and to integrate into other projects.
- 10/17 Fixed a bug where selecting multiple periods returned a video of the wrong length.
- 10/10 ClipVideo now supports speaker diarization: select 'yes' under 'Recognize Speakers' to get recognition results with a speaker ID for each sentence. You can then clip out the periods of one or more speakers (e.g. 'spk0' or 'spk0#spk3').
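The speaker selection format mentioned above ('spk0' or 'spk0#spk3') can be illustrated with a minimal sketch. It is not ClipVideo's own code: the function names, the `sentences` structure, and the `start_ms`/`end_ms` fields are hypothetical and exist only to show how a `#`-separated speaker list could select periods.

```python
from typing import Dict, List, Tuple


def parse_speakers(spec: str) -> List[str]:
    """Split a speaker spec such as 'spk0#spk3' into ['spk0', 'spk3']."""
    return [part.strip() for part in spec.split("#") if part.strip()]


def periods_for_speakers(sentences: List[Dict], spec: str) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of the sentences spoken by the requested speakers.

    The sentence layout (keys 'spk', 'start_ms', 'end_ms') is an assumption
    made for this example, not the format ClipVideo produces.
    """
    wanted = set(parse_speakers(spec))
    return [(s["start_ms"], s["end_ms"]) for s in sentences if s["spk"] in wanted]


# Hypothetical recognition results with one speaker ID per sentence.
sentences = [
    {"spk": "spk0", "start_ms": 0, "end_ms": 1200},
    {"spk": "spk1", "start_ms": 1200, "end_ms": 2500},
    {"spk": "spk3", "start_ms": 2500, "end_ms": 4000},
]

print(periods_for_speakers(sentences, "spk0#spk3"))
```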
As the first application in FunASR-APP, ClipVideo lets users clip .mp4 video files or .wav audio files by selecting text segments from the recognition results generated by the Paraformer-long model.
With ClipVideo you can get video clips in a few steps (in the Gradio service):
- Step1: Upload your video file (or try the example videos below)
- Step2: Copy the text segments you need into 'Text to Clip'
- Step3: Adjust subtitle settings (if needed)
- Step4: Click 'Clip' or 'Clip and Generate Subtitles'
git clone https://github.com/alibaba-damo-academy/FunASR-APP.git
cd FunASR-APP
# install modelscope
pip install "modelscope[audio_asr]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
# install Python dependencies
pip install -r ClipVideo/requirments.txt
(Optional) To clip video files with embedded subtitles:
- ffmpeg and ImageMagick are required
- On Ubuntu
apt-get -y update && apt-get -y install ffmpeg imagemagick
sed -i 's/none/read,write/g' /etc/ImageMagick-6/policy.xml
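- On macOS
brew install imagemagick
# the sed shipped with macOS (BSD sed) needs an explicit, possibly empty, backup suffix after -i
sed -i '' 's/none/read,write/g' /usr/local/Cellar/imagemagick/7.1.1-8_1/etc/ImageMagick-7/policy.xml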
- Download the font file to ClipVideo/font
wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ClipVideo/STHeitiMedium.ttc -O ClipVideo/font/STHeitiMedium.ttc
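Before clipping with embedded subtitles, it can help to confirm that the optional prerequisites above are in place. The following is a minimal sketch, not part of ClipVideo: the function name `missing_prerequisites` is hypothetical, and it assumes ImageMagick is detectable through its `magick` (version 7) or `convert` (version 6) executable on the PATH.

```python
import os
import shutil
from typing import Callable, List, Optional


def missing_prerequisites(
    font_path: str = "ClipVideo/font/STHeitiMedium.ttc",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of subtitle prerequisites that could not be found."""
    missing = []
    if which("ffmpeg") is None:
        missing.append("ffmpeg")
    # Assumption: ImageMagick 7 provides `magick`, ImageMagick 6 provides `convert`.
    if which("magick") is None and which("convert") is None:
        missing.append("imagemagick")
    # The font path matches the wget target above, relative to the repository root.
    if not os.path.isfile(font_path):
        missing.append(font_path)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("missing: " + ", ".join(problems))
    else:
        print("all subtitle prerequisites found")
```

Run it from the repository root (FunASR-APP) so that the relative font path resolves.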
You can try ClipVideo in ModelScope Space: link.
You can also run your own ClipVideo service, identical to the one in ModelScope Space, as follows:
python clipvideo/gradio_service.py
Then visit localhost:7860. You will get a Gradio service in which you can use ClipVideo by following the steps listed above.
ClipVideo also supports recognizing and clipping from the command line:
# working in ClipVideo/
# step1: Recognize
python clipvideo/videoclipper.py --stage 1 \
--file examples/2022云栖大会_片段.mp4 \
--output_dir ./output
# now you can find recognition results and entire SRT file in ./output/
# step2: Clip
python clipvideo/videoclipper.py --stage 2 \
--file examples/2022云栖大会_片段.mp4 \
--output_dir ./output \
--dest_text '我们把它跟乡村振兴去结合起来,利用我们的设计的能力' \
--start_ost 0 \
--end_ost 100 \
--output_file './output/res.mp4'
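The two stages above can also be driven from a script. The sketch below only assembles the same command lines shown above and runs them with `subprocess`; the helper names (`recognize_cmd`, `clip_cmd`, `run_pipeline`) are hypothetical, and the values of `--start_ost` and `--end_ost` are passed through unchanged because their meaning and units are not described here.

```python
import subprocess
import sys
from typing import List

# Path relative to ClipVideo/, as in the commands above.
SCRIPT = "clipvideo/videoclipper.py"


def recognize_cmd(file: str, output_dir: str) -> List[str]:
    """Build the stage 1 (recognition) command."""
    return [sys.executable, SCRIPT, "--stage", "1",
            "--file", file, "--output_dir", output_dir]


def clip_cmd(file: str, output_dir: str, dest_text: str, output_file: str,
             start_ost: int = 0, end_ost: int = 100) -> List[str]:
    """Build the stage 2 (clipping) command with the flags shown above."""
    return [sys.executable, SCRIPT, "--stage", "2",
            "--file", file, "--output_dir", output_dir,
            "--dest_text", dest_text,
            "--start_ost", str(start_ost), "--end_ost", str(end_ost),
            "--output_file", output_file]


def run_pipeline(file: str, output_dir: str, dest_text: str, output_file: str) -> None:
    """Run recognition, then clipping; raises CalledProcessError if a stage fails."""
    subprocess.run(recognize_cmd(file, output_dir), check=True)
    subprocess.run(clip_cmd(file, output_dir, dest_text, output_file), check=True)
```

For example, calling `run_pipeline("examples/2022云栖大会_片段.mp4", "./output", "我们把它跟乡村振兴去结合起来,利用我们的设计的能力", "./output/res.mp4")` from inside ClipVideo/ would reproduce the two commands above. Passing the arguments as a list avoids shell quoting problems with non-ASCII file names and text.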
FunASR aims to build a bridge between academic research and industrial applications of speech recognition. By supporting the training and fine-tuning of the industrial-grade speech recognition models released on ModelScope, it lets researchers and developers study and deploy speech recognition models more conveniently, and promotes the development of the speech recognition ecosystem. ASR for Fun!