Working on a training attempt with stabilityai/stablelm-2-12b-chat as the base #117
-
Awesome! I would recommend interleaving the Home Assistant requests dataset with another one that is more "instruct"-focused if you want to retain more of the model's general-purpose abilities. Starting from a chat or instruct fine-tuned model also helps with that. WizardLM70k (the original dataset) is already supported.
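For reference, one way to do the interleaving is with the Hugging Face `datasets` library. A minimal sketch is below; the dataset ids, column names, and the 50/50 ratio are placeholder assumptions for illustration, not settings confirmed in this thread:

```python
# Minimal sketch of mixing a domain dataset with a general instruct dataset.
# Dataset ids, column names, and the mix ratio below are assumptions.
from datasets import load_dataset, interleave_datasets

home = load_dataset("acon96/Home-Assistant-Requests", split="train")         # assumed id
wizard = load_dataset("WizardLM/WizardLM_evol_instruct_70k", split="train")  # assumed id

# interleave_datasets() requires matching features, so first flatten each
# example into a single "text" field (this formatting is hypothetical).
def to_text(example):
    return {"text": f"{example.get('instruction', '')}\n{example.get('output', '')}"}

home = home.map(to_text, remove_columns=home.column_names)
wizard = wizard.map(to_text, remove_columns=wizard.column_names)

# Sample roughly half from each source so the model keeps its general
# instruct abilities while learning the Home Assistant requests.
mixed = interleave_datasets(
    [home, wizard],
    probabilities=[0.5, 0.5],
    seed=42,
    stopping_strategy="all_exhausted",
)
```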
Looking forward to seeing results!
-
Mostly off-topic but related: I just saw that Llama 3 was released today, with an 8B-parameter model. Once support exists in llama.cpp, I might try training one of those on the function-calling dataset above plus an instruct dataset, after I make a good pass on the StableLM-2-12B model here.
-
So, like the title says, I've got enough GPU to try training with a larger model. I don't expect this first run to work properly (I think I still need to adjust a few things, since this model has its own function-calling syntax), but once I get something that looks remotely usable, would anyone else be interested in helping test/play with it? I haven't optimized things for training yet, so at the current speed I expect each run to take about two weeks. (I'm using a Tesla A16 (64GB, i.e. 4x16GB) and an A6000 48GB right now, but I might swap in an A5000 24GB after futzing with my other server, where I'm replacing the A5000 with an Intel A770 for VDI instead of AI.)
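For anyone wanting to try something similar on less VRAM, here's a rough sketch of a 4-bit QLoRA setup for this base model using `transformers` and `peft`. The hyperparameters and target module names are illustrative guesses, not the settings from my run:

```python
# Hedged sketch: 4-bit QLoRA fine-tuning setup for stablelm-2-12b-chat.
# All hyperparameters below are illustrative defaults, not tuned values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "stabilityai/stablelm-2-12b-chat"

# Load the base model in 4-bit NF4 to fit on smaller GPUs.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # shards the model across all visible GPUs
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections (module names are
# an assumption about this architecture -- verify against the model).
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```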
My main goal in using a larger model is to get something that can hold a longer, more coherent conversation about the home. Later I'd like to play with other ideas, like multi-modal models with camera/picture support, but first I need to learn how to get the training working.