forked from ggerganov/whisper.cpp
Update README.md and finalize the whisper.wasm example
Showing 7 changed files with 39 additions and 6 deletions.
@@ -1,3 +1,27 @@
# whisper.wasm

Live demo: https://whisper.ggerganov.com
Inference of [OpenAI's Whisper ASR model](https://github.com/openai/whisper) inside the browser

This example uses a WebAssembly (WASM) port of the [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
implementation of the transformer to run inference inside a web page. The audio data does not leave your computer:
it is processed locally on your machine. Performance is modest, but you should be able to reach 2x to 3x
real-time for the `tiny` and `base` models on a modern CPU and browser (i.e. transcribe 60 seconds of audio in
about 20-30 seconds).

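To make the real-time figure concrete, the sketch below estimates processing time from clip length and a speed-up factor. The function name `estimateTranscriptionSeconds` and its parameters are illustrative only and are not part of whisper.wasm; the 2x and 3x factors are the rough figures quoted above, not measurements.

```typescript
// Hypothetical helper, not part of whisper.wasm: estimates how long a
// transcription takes given the clip length and a real-time speed factor.
function estimateTranscriptionSeconds(audioSeconds: number, realTimeFactor: number): number {
  if (audioSeconds < 0 || realTimeFactor <= 0) {
    throw new RangeError("audioSeconds must be >= 0 and realTimeFactor must be > 0");
  }
  // A factor of 2 means the audio is processed twice as fast as it plays.
  return audioSeconds / realTimeFactor;
}

console.log(estimateTranscriptionSeconds(60, 2)); // 30
console.log(estimateTranscriptionSeconds(60, 3)); // 20
```
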
This WASM port uses [WASM SIMD 128-bit intrinsics](https://emcc.zcopy.site/docs/porting/simd/), so make
sure that [your browser supports them](https://webassembly.org/roadmap/).

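One way to check for SIMD support before loading the example is to validate a tiny WASM module that uses a 128-bit SIMD instruction. This is a sketch and not code from this repository: the name `supportsWasmSimd` is hypothetical, and the probe module (a function returning `v128`, built from `i8x16.splat` and `i8x16.popcnt`) is modeled on the kind of check feature-detection libraries perform.

```typescript
// Minimal WASM module, equivalent to:
//   (module (func (result v128) i32.const 0  i8x16.splat  i8x16.popcnt))
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,   // "\0asm" magic number + version 1
  1, 5, 1, 96, 0, 1, 123,        // type section: one type, () -> v128
  3, 2, 1, 0,                    // function section: one function of type 0
  10, 10, 1, 8, 0,               // code section: one 8-byte body, no locals
  65, 0, 253, 15, 253, 98, 11,   // i32.const 0; i8x16.splat; i8x16.popcnt; end
]);

// Hypothetical helper: true when the runtime accepts SIMD instructions.
function supportsWasmSimd(): boolean {
  return typeof WebAssembly === "object" && WebAssembly.validate(SIMD_PROBE);
}

console.log(supportsWasmSimd() ? "WASM SIMD available" : "WASM SIMD not available");
```
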
The example can run all models up to and including `small`. Beyond that, the memory requirements and
performance are unsatisfactory. The implementation currently supports only the `Greedy` sampling strategy. Both
transcription and translation are supported.

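Greedy sampling means that at each decoding step the token with the highest score is chosen, with no randomness and no beam search. The sketch below illustrates the idea on a plain array of logits; `greedyToken` is a hypothetical name, and this is not the whisper.cpp implementation.

```typescript
// Hypothetical illustration of greedy sampling: return the index (token id)
// of the largest logit. Ties resolve to the lowest index.
function greedyToken(logits: ArrayLike<number>): number {
  if (logits.length === 0) {
    throw new RangeError("logits must not be empty");
  }
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) {
      best = i;
    }
  }
  return best;
}

console.log(greedyToken([0.1, 2.5, -1.0, 2.4])); // 1
```
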
Because the model data is large (74MB for the `tiny` model), you need to load the model into the web page manually.

The example supports both loading audio from a file and recording audio from the microphone. Audio length is
limited to 120 seconds.

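A length cap like this can be enforced by truncating the sample buffer before inference. The sketch below assumes mono PCM samples in a `Float32Array` at 16 kHz (the sample rate Whisper models expect); `clampAudio` and the constant names are hypothetical and are not taken from the example's source.

```typescript
const SAMPLE_RATE_HZ = 16000;   // assumed input sample rate
const MAX_AUDIO_SECONDS = 120;  // limit stated in the README

// Hypothetical helper: keep at most maxSeconds of audio. Returns a view on
// the same buffer (no copy) when truncation is needed.
function clampAudio(
  samples: Float32Array,
  sampleRate: number = SAMPLE_RATE_HZ,
  maxSeconds: number = MAX_AUDIO_SECONDS,
): Float32Array {
  const maxSamples = Math.floor(sampleRate * maxSeconds);
  return samples.length > maxSamples ? samples.subarray(0, maxSamples) : samples;
}

const recording = new Float32Array(SAMPLE_RATE_HZ * 150); // 150 s of silence
console.log(clampAudio(recording).length / SAMPLE_RATE_HZ); // 120
```
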
## Live demo

Link: https://whisper.ggerganov.com

![image](https://user-images.githubusercontent.com/1991296/197348344-1a7fead8-3dae-4922-8b06-df223a206603.png)
File renamed without changes.