Update on model development #26
Hi @gwcangtip! The first model supported is the basic 24 kHz model presented by Suno in their demo. It should be available by the end of this week.

This will be really awesome. Can't wait to use it!

When will ready-to-use models be available?

Hi @gwcangtip! Thanks for the interest in the repo. Here's a quick update for anyone interested:
Hello everyone! Quick update on the progress made over the last week.

All the components (the 3 encoders and the Codec model) are now implemented and working. The end-to-end pipeline runs, and I do obtain high-quality audio as output. However, I have spotted 2 bugs (one in the tokenizer, one in the fine encoder) that make the model produce nonsense for some inputs. After fixing these two bugs, we should have a first working version of bark.

Regarding performance, the model currently takes 17 seconds on my MacBook Pro M2 to generate 2 seconds of audio. There are still many improvements to be made to the codebase (removing unnecessary memory copies, for instance), and I expect a significant speedup once we support mixed precision and quantization. We have a dedicated issue (#46) for benchmarks, and I'll publish them in the README once the aforementioned bugs are fixed.
Thanks for all the work you've put into this, @PABannier! I can't wait to see this evolve as it gets more efficient.
On a Ryzen 3600 using 6 threads, I see about 2 minutes for the "this is an audio" prompt. That's with AVX2 enabled for GGML. I tried with OpenBLAS, but that was even slower; I'm not sure why.

Also, what needs to be done to reuse the model across subsequent calls to bark_generate_audio? I can put the calls to bark_generate_audio in a loop with the already-loaded model, but after 5 or so calls it crashes because it can't allocate any more memory. I'm not sure what needs to be cleaned up between calls.

I've also tried different seed values, and most of them sound terrible or are not spoken audio at all.
Hi @jzeiber! Thanks for the info. As for the nonsense output, I have yet to fix a bug in the fine encoder; this is why we get poor output for most prompts. As for memory allocation, have you tried recreating a GGML context for each model every time you generate a prompt? As for speed, I'm sure there are some memory leaks or unnecessary copies that I'll need to track down, but first I'm focusing on fixing the aforementioned bug in the fine encoder.
Alright, that makes sense. It was just quite curious that seed value 0 seems to be the best across different prompts; I'm not sure what's special about that seed.

I haven't tried that. I was trying to avoid having to reload the entire model each time, but if I can just recreate the model contexts each time, that should work.

Yes, that sounds good. Get the basics done first to get good output, then improve what's there. Great work so far!
Quick update: I wrote 3 unit tests comparing the output of the fine encoder against the original bark implementation. All tests are currently failing, meaning that the fine encoder is not correctly implemented. More interestingly, the absolute difference in the logits of the fine encoder is not significant for all token sequences. After investigation, the bug is in the non-causal self-attention block, even though the queries and keys are identical to the original implementation. Pinging @jzeiber @Green-Sky @kskelm @jmtatsch as they are following the updates on the model development.
For those interested in Bark: we now have a first working, stable version of bark.cpp that supports quantization with #139! Feel free to send me any feedback :)
Could you make a simple, ready-to-use model for this test program? I'm not very good at Python; sorry to bother you.