
Update on model development #26

Open
ClarissaGazalaEvanthe opened this issue Jul 31, 2023 · 11 comments
Comments

@ClarissaGazalaEvanthe

Please make a simple model for this test program that can be used immediately. I'm not very good at Python; sorry to bother you.

@PABannier
Owner

Hi @gwcangtip! The first model supported is the basic 24 kHz model presented by Suno in their demo. It should be available by the end of this week.

@planatscher

This will be really awesome. Can't wait to use it!

@ClarissaGazalaEvanthe
Author

> Hi @gwcangtip! The first model supported is the basic 24 kHz model presented by Suno in their demo. It should be available by the end of this week.

When will ready-to-use models be available?

@PABannier
Owner

Hi @gwcangtip! Thanks for your interest in the repo. I'm posting a quick update for anyone interested in bark.cpp. I've spent the past week cleaning up the repo and making sure the implementations of the 3 encoders were right. I have yet to integrate encodec.cpp (already implemented here) into bark.cpp. I'm on the final stretch of work this week.

@PABannier
Owner

PABannier commented Aug 11, 2023

Hello everyone! Quick update on the recent progress made in the last week.

All the components (the 3 encoders and the codec model) are now implemented and working. The end-to-end pipeline works fine, and I do obtain high-quality audio as output. That said, I have spotted 2 remaining bugs (one in the tokenizer, one in the fine encoder) which make the model produce nonsense for some inputs. After fixing these two bugs, we should have a first working version of bark.

Regarding performance, the model takes 17 seconds on my MacBook Pro M2 to generate 2 seconds of audio. There are still a lot of improvements to be made to the codebase (eliminating unnecessary memory copies, for instance). Furthermore, I expect a significant improvement in speed once we support mixed precision and quantization. We have a dedicated issue (#46) for benchmarks, and I'll publish them in the README once the aforementioned bugs are fixed.
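The quantization mentioned above can be illustrated with a minimal sketch. The block below shows symmetric 8-bit block quantization with a per-block scale, similar in spirit to (but not identical to) ggml's quantized formats; the `BlockQ8` layout and block size of 32 are illustrative assumptions, not bark.cpp's actual scheme.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative sketch: symmetric 8-bit block quantization with a per-block
// scale. Each block of 32 floats is stored as one float scale + 32 int8s,
// shrinking memory roughly 4x versus float32.
struct BlockQ8 {
    float scale;   // dequantize: x ≈ q * scale
    int8_t q[32];  // 32 quantized values per block
};

static std::vector<BlockQ8> quantize(const std::vector<float>& x) {
    assert(x.size() % 32 == 0);
    std::vector<BlockQ8> out(x.size() / 32);
    for (size_t b = 0; b < out.size(); ++b) {
        float amax = 0.0f;  // largest magnitude in this block sets the scale
        for (int i = 0; i < 32; ++i)
            amax = std::max(amax, std::fabs(x[b * 32 + i]));
        out[b].scale = amax / 127.0f;
        for (int i = 0; i < 32; ++i)
            out[b].q[i] = out[b].scale > 0.0f
                ? (int8_t)std::lround(x[b * 32 + i] / out[b].scale)
                : (int8_t)0;
    }
    return out;
}

static std::vector<float> dequantize(const std::vector<BlockQ8>& blocks) {
    std::vector<float> out;
    for (const BlockQ8& b : blocks)
        for (int i = 0; i < 32; ++i)
            out.push_back(b.q[i] * b.scale);
    return out;
}
```

The round-trip error per value is bounded by half the block scale, which is why quantization trades a small accuracy loss for large memory and bandwidth savings.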

@PABannier changed the title from "Can this be used already? how to get models?" to "Update on model development" on Aug 11, 2023
@kskelm

kskelm commented Aug 11, 2023

Thanks for all the work you've put into this, @PABannier ! I can't wait to see this evolve as it gets more efficient.

@jzeiber
Contributor

jzeiber commented Aug 12, 2023

> Regarding performance, the model takes 17 seconds on my MacBook Pro M2 to generate 2 seconds of audio.

On a Ryzen 3600 using 6 threads, I see about 2 minutes for the "this is an audio" prompt. That's with AVX2 enabled for GGML. I tried with OpenBLAS, but that was even slower. I'm not sure why it's so slow.

Also, what needs to be done to reuse the model for subsequent calls to bark_generate_audio? I can call bark_generate_audio in a loop with the already-loaded model, but after 5 or so calls it crashes because it can't allocate any more memory. I'm not sure what needs to be cleaned up between calls.

I've also tried different seed values, and most of them sound terrible or are not spoken audio at all.

@PABannier
Owner

Hi @jzeiber! Thanks for the info. As for the nonsense output: I have yet to fix a bug in the fine encoder, which is why we get poor output for most prompts.

As for memory allocation, have you tried re-creating a GGML context for each model every time you generate a prompt?

As for speed, I'm sure there are some memory leaks or unnecessary copies that I'll need to track down. But first I'm focusing on fixing the aforementioned bug in the fine encoder.
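The pattern suggested above can be sketched in a few lines. The block below is a hypothetical illustration, not bark.cpp's actual API: the `Arena` type and `generate_once` function are stand-ins showing why giving each generation its own short-lived scratch context (as one would with a per-call ggml context) prevents intermediate allocations from piling up across calls.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of per-generation scratch memory. Names are
// illustrative, not bark.cpp's API: the model weights would live in a
// long-lived context, while each call builds its intermediates in a fresh
// arena that is destroyed when the call returns.
struct Arena {
    std::vector<unsigned char> buf;
    size_t used = 0;
    explicit Arena(size_t cap) : buf(cap) {}
    void* alloc(size_t n) {
        if (used + n > buf.size()) return nullptr;  // out of scratch memory
        void* p = buf.data() + used;
        used += n;
        return p;
    }
};

// One generation: all intermediates come from a freshly created arena.
static bool generate_once(size_t work_bytes) {
    Arena scratch(16 * 1024);  // recreated per call, like a per-call ggml ctx
    return scratch.alloc(work_bytes) != nullptr;
}  // arena destroyed here: every byte of per-call memory is released
```

Calling `generate_once` in a loop never exhausts memory, whereas a single long-lived arena reused across calls would fill up after a few generations, which matches the crash described above.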

@jzeiber
Contributor

jzeiber commented Aug 12, 2023

> Hi @jzeiber! Thanks for the info. As for the nonsense output: I have yet to fix a bug in the fine encoder, which is why we get poor output for most prompts.

Alright, that makes sense. It was just quite curious that seed value 0 seemed to work best across different prompts. I'm not sure what's special about that seed.

> As for memory allocation, have you tried re-creating a GGML context for each model every time you generate a prompt?

I haven't tried that. I was trying to avoid reloading the entire model each time, but if I can just recreate the model contexts each time, that should work.

> As for speed, I'm sure there are some memory leaks or unnecessary copies that I'll need to track down. But first I'm focusing on fixing the aforementioned bug in the fine encoder.

Yes, that sounds good. Get the basics done first to get good output, then improve what's there. Great work so far!

@PABannier
Owner

PABannier commented Aug 14, 2023

Quick update: I wrote 3 unit tests comparing the output of the fine encoder against the original bark implementation.

```
./data/fine/test_fine_1.bin
run_test_on_codes : failed test
       abs_tol=0.0100, rel_tol=0.0100, abs max viol=0.0917, viol=80.0%
   TEST 1 FAILED.
./data/fine/test_fine_2.bin
run_test_on_codes : failed test
       abs_tol=0.0100, rel_tol=0.0100, abs max viol=89.0242, viol=100.0%
   TEST 2 FAILED.
./data/fine/test_fine_3.bin
run_test_on_codes : failed test
       abs_tol=0.0100, rel_tol=0.0100, abs max viol=0.1022, viol=89.4%
   TEST 3 FAILED.
```

All tests are currently failing, meaning that the fine encoder is not correctly implemented. More interestingly, the absolute difference in the fine encoder's logits is small for some token sequences (e.g. test 1, with an abs max violation of only 0.0917). In practice, this produces noisy output or missing words in the generated audio. However, when the difference is large (e.g. test 2, with an abs max violation of 89.0242), the model spews out nonsense.
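The tolerance check behind the log above can be expressed roughly as follows. This is a sketch, not the repo's actual test harness: the `compare_logits` function and `CompareResult` struct are illustrative, assuming an allclose-style rule where a value is a violation when its absolute difference exceeds `abs_tol + rel_tol * |ref|`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of an allclose-style comparison (illustrative, not the repo's exact
// harness). Reports the maximum absolute difference among violating entries
// and the fraction of entries that violate the combined tolerance.
struct CompareResult {
    float abs_max_viol;  // largest |ours - ref| among violations
    float viol_frac;     // fraction of entries violating the tolerance
};

static CompareResult compare_logits(const std::vector<float>& ours,
                                    const std::vector<float>& ref,
                                    float abs_tol, float rel_tol) {
    CompareResult r{0.0f, 0.0f};
    size_t n_viol = 0;
    for (size_t i = 0; i < ref.size(); ++i) {
        float diff = std::fabs(ours[i] - ref[i]);
        // A value passes if it is within the absolute OR relative tolerance.
        if (diff > abs_tol + rel_tol * std::fabs(ref[i])) {
            ++n_viol;
            if (diff > r.abs_max_viol) r.abs_max_viol = diff;
        }
    }
    r.viol_frac = ref.empty() ? 0.0f : (float)n_viol / (float)ref.size();
    return r;
}
```

Under this reading, test 1's small abs max violation with an 80% violation rate suggests a systematic but mild numerical drift, while test 2's huge violation points at a structural bug rather than precision loss.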

After investigation, the bug is in the non-causal self-attention block. Although the queries and keys are identical, KQ is completely different from q @ k.transpose(-2, -1) and is full of near-zero values. This is strange: I've checked the dimensions, the strides (making the key and query tensors contiguous did not change anything), and obviously the coefficients, as stated previously.
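A naive reference for those attention scores makes the symptom easy to check. The block below is a sketch (not bark.cpp's code) of the unmasked score matrix S = Q·Kᵀ; when Q and K are identical, S must be symmetric and each diagonal entry equals ‖q_i‖², so a score matrix full of near-zero values is immediately suspicious.

```cpp
#include <vector>

// Naive reference for non-causal attention scores S = Q * K^T (no mask).
// Q and K are n x d row-major matrices; the result S is n x n.
static std::vector<float> qk_scores(const std::vector<float>& Q,
                                    const std::vector<float>& K,
                                    int n, int d) {
    std::vector<float> S(n * n, 0.0f);
    for (int i = 0; i < n; ++i)          // each query row
        for (int j = 0; j < n; ++j)      // against each key row
            for (int t = 0; t < d; ++t)  // dot product over head dim
                S[i * n + j] += Q[i * d + t] * K[j * d + t];
    return S;
}
```

With Q == K, asserting symmetry (S[i][j] == S[j][i]) and positive diagonals (S[i][i] > 0 for nonzero rows) against the ggml output is a cheap way to localize this kind of bug.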

Pinging @jzeiber @Green-Sky @kskelm @jmtatsch as they are following the updates on the model development.

@PABannier
Owner

For those interested in Bark,

We now have a first stable working version of bark.cpp that supports quantization, thanks to #139!
Make sure to pull the latest versions of Encodec and Bark by following the instructions.

Feel free to send me any feedback :)
