Working on a training attempt with stabilityai/stablelm-2-12b-chat as the base #117
-
Awesome! I would recommend interleaving the Home Assistant requests dataset with another one that is more "instruct"-focused if you want to retain more of the model's general-purpose abilities. Starting from a chat or instruct fine-tuned model also helps with that. WizardLM70k (the original dataset) is already supported.
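For reference, one way to do the interleaving is with the Hugging Face `datasets` library. A minimal sketch is below; the dataset ids, column names, and the 50/50 ratio are placeholder assumptions for illustration, not settings confirmed in this thread:

```python
# Minimal sketch of mixing a domain dataset with a general instruct dataset.
# Dataset ids, column names, and the mix ratio below are assumptions.
from datasets import load_dataset, interleave_datasets

home = load_dataset("acon96/Home-Assistant-Requests", split="train")         # assumed id
wizard = load_dataset("WizardLM/WizardLM_evol_instruct_70k", split="train")  # assumed id

# interleave_datasets() requires matching features, so first flatten each
# example into a single "text" field (this formatting is hypothetical).
def to_text(example):
    return {"text": f"{example.get('instruction', '')}\n{example.get('output', '')}"}

home = home.map(to_text, remove_columns=home.column_names)
wizard = wizard.map(to_text, remove_columns=wizard.column_names)

# Sample roughly half from each source so the model keeps its general
# instruct abilities while learning the Home Assistant requests.
mixed = interleave_datasets(
    [home, wizard],
    probabilities=[0.5, 0.5],
    seed=42,
    stopping_strategy="all_exhausted",
)
```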
Looking forward to seeing results!
-
Mostly off-topic but related: I just saw that Llama 3 was released today, with an 8B-parameter model. Once support exists in llama.cpp, I might try training one of those on the function-calling dataset above plus an instruct dataset, after I make a good pass on the StableLM-2-12B model here.
-
So, like the title says, I've got enough GPU to try training with a larger model. I don't expect this first run to work properly (I think I still need to adjust a few things, since this model has its own function-calling syntax), but once I get something that looks remotely usable, would anyone else be interested in helping test/play with it? I haven't optimized things for training yet, so at the current speed I expect each run to take about two weeks. (I'm using a Tesla A16 (64GB, i.e. 4x16GB) and an A6000 48GB right now, but I might swap in an A5000 24GB after futzing with my other server, where I'm replacing the A5000 with an Intel A770 for VDI instead of AI.)
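For anyone wanting to try something similar on less VRAM, here's a rough sketch of a 4-bit QLoRA setup for this base model using `transformers` and `peft`. The hyperparameters and target module names are illustrative guesses, not the settings from my run:

```python
# Hedged sketch: 4-bit QLoRA fine-tuning setup for stablelm-2-12b-chat.
# All hyperparameters below are illustrative defaults, not tuned values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "stabilityai/stablelm-2-12b-chat"

# Load the base model in 4-bit NF4 to fit on smaller GPUs.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # shards the model across all visible GPUs
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections (module names are
# an assumption about this architecture -- verify against the model).
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```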
My main goal in using a larger model is to get something that can hold a longer, more coherent conversation about the home. Later I'd like to play with other ideas, like multi-modal models with camera/picture support, but first I need to learn how to get the training working.