[BUG] Llama2 and some Mistral based ggufs seem to not work. #96
Comments
The reason these GGUFs don't work is that their metadata doesn't include a chat template. I think that this is a relatively new thing for GGUF, so models that are old-ish (like the ~1yr old model you linked) generally won't work.

Other backends (i.e. the llama.cpp server and ollama) don't actually implement the chat templates properly, but instead they try to detect what model is being used, and then they have their own template for each supported model. These templates are not exactly the same as the ones that are intended for the model, but just something close enough. I strongly suspect that this results in sub-par responses, compared to using the proper upstream templates.

We did discuss falling back to some kind of default template, in cases where the provided GGUF file doesn't provide one. I'm leaning towards this being a bad idea. Silently applying a somewhat-random chat template to a bunch of models could result in them outputting subpar responses, without the user noticing.

A relevant question is why you're interested in using "old" models like llama2. My impression is that the llama3 models are strictly better in all regards.
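For reference, here is a minimal sketch of checking this yourself with the gguf-py package (pip install gguf). The `tokenizer.chat_template` key name is the llama.cpp convention, and the `GGUFReader` usage here reflects my understanding of that library's API rather than anything NobodyWho ships:

```python
# Minimal sketch: check whether a GGUF file's metadata contains a chat template.
# Assumes gguf-py's GGUFReader API (pip install gguf).
import sys

from gguf import GGUFReader


def has_chat_template(path: str) -> bool:
    # GGUFReader parses the metadata key/value section at the front of the file.
    reader = GGUFReader(path)
    return "tokenizer.chat_template" in reader.fields


if __name__ == "__main__":
    path = sys.argv[1]  # path to the .gguf file to inspect
    if has_chat_template(path):
        print("This model includes a chat template.")
    else:
        print("No chat template in the metadata; loading this model will fail.")
```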
Maybe the error message could be improved? We could add a message like: "It looks like your GGUF model doesn't include a chat template. Could it be that you are using an older model? Try using a model newer than , or manually add a chat template to the GGUF metadata using gguf-py."
These were all models that came pretty heavily recommended for roleplay, which is more or less the exact desired behavior of "pretend to be an NPC". They also seem to perform pretty well in general. I'm sure the new models probably all work better, as you've mentioned, but being new means they're not as well recommended, because they're not as "tried and tested". The one Llama 3 model I have performs incredibly sluggishly (near unusably so) with this extension for some reason, despite being pretty speedy using a different local chat client running the same file; not sure what the deal is there?

I would agree that the error message should be improved if you're not intending to support older models in any form, so that users of the extension are aware that this isn't a bug but rather intended functionality, and that they'll need to find some newer models (maybe providing guidance on where to find "new" models, or how to tell which are valid, would also be helpful in this regard, since these are large files and it'd be a pain to find and download one only to discover it doesn't work).

If possible, it would be good if the extension could prescan the model to see if it IS supported at the time it is selected, and display something in the editor instead of at runtime. I'm not sure what the format of these files is, but if the information is early in the header this shouldn't be too difficult to do.
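On the prescan idea: GGUF does keep all of its metadata right at the front of the file, immediately after a small fixed-size header, so a selection-time check is feasible without reading the whole multi-gigabyte file. As a purely hypothetical illustration (this is not how the extension currently works), reading that fixed header looks roughly like this; actually locating `tokenizer.chat_template` means walking the key/value section that follows, which libraries like gguf-py already handle:

```python
# Hypothetical sketch: read only the fixed GGUF header, to illustrate that the
# metadata (where a chat template would live) sits at the start of the file.
# Layout per the GGUF spec: 4-byte magic "GGUF", uint32 version,
# uint64 tensor count, uint64 metadata key/value count (little-endian).
import struct
import sys


def read_gguf_header(path: str):
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return version, tensor_count, kv_count


if __name__ == "__main__":
    version, tensors, kvs = read_gguf_header(sys.argv[1])
    print(f"GGUF v{version}: {tensors} tensors, {kvs} metadata key/value pairs")
    # The key/value pairs follow immediately after this header, so an
    # editor-time prescan only needs to read the first part of the file.
```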
I'm interested in this, but it doesn't seem related to this bug report. Maybe open a new one, or hop in the group chat, if you would prefer to talk about it more informally. I would like to know more about this.
Closing this issue because the library works as intended. We don't support models that don't include a chat template, and the error message when loading the model clearly states that it fails to fetch a chat template. If you continue to believe that this is a bug, and that more should be done to address it, feel free to reopen this issue with a suggestion on what should be done.
Let me preface this by saying that I don't have permission to reopen this issue, so I may need to open a new issue for this if there's no response.
I have no issue with not supporting models that don't include a chat template, but I disagree entirely with the assertion that the error message "clearly states that it fails to fetch a chat template", and even more so with the closing of the issue.

The core of the issue here is that if users are to receive an error message that isn't the result of a bug, the message needs to be actionable within a reasonable expectation of user understanding. The fact that models without chat templates are not supported is not clearly stated. Beyond that, even if it were clearly stated: it's not clear how to even tell if a model has a chat template. From a quick search, it looks like you'd need dedicated tools to inspect it to find out?

If the issue is that there is no chat template present, then simply and clearly state in the error message: "This model does not contain a chat template; models without chat templates are not supported." That information cannot be inferred from the error message in its current form without a lot of unsafe assumptions on the user's part. This won't make it any less frustrating to then need to go digging to find a model that does have a chat template, without knowledge of which ones do and don't; however, at least it is now googleable in some way, shape or form.

TL;DR: Change the error message. In its current state, it's completely inscrutable without knowledge of either the inner workings of the extension, the structure of the model, or both.
I'm more than happy to provide any information I can. I'm not sure what you mean by "the group chat", or I would gladly hop in and explain. For the time being, though, I'll at least answer the questions provided.

What hardware: RTX 4080, 64G GPRAM, 14900K

What the parameter count and quantization levels of the model are: I don't know how to answer these questions at present, or at least not without just guessing or assuming. I can look into it. However, in the interest of answering the questions while they're still hot, I have already provided the model in a different issue, so I'll just provide you the same link: llama3.8b.hathor_fractionate-l3-v.05.gguf_v2.q8_0.gguf

Different local chat client: Backyard.ai, formerly known as Faraday from what I understand
Ah damn. I guess I didn't realize how permissions work on GitHub issues.
That's totally fair. I guess this is a "curse of knowledge" thing, where it only becomes clear if one has spent most of the past months staring at llama.cpp errors and taking apart GGUF files 😅 The point of NobodyWho is precisely to let people use local LLMs without understanding the internals of llama.cpp. Let's improve the error message.
Hm, it's proprietary and doesn't run on Linux, so it's a bit more difficult for me to examine closely. I wonder if they're using CUDA on NVIDIA machines. I expect CUDA to perform somewhat better than what we currently use, so that could be it.
The "group chat" I'm referring to is the matrix or discord chat that we link in the README. Feel free to use that if you want to, but github issues is also totally fine. |
Also, I wrote a much more detailed error message for failing to fetch chat templates. I hope you agree that this is more actionable. Let me know if you disagree.
Just took a look, it's leagues better! Explains the issue, likely cause, and even provides actionable information for next steps as an end user. Checks all the boxes. Thank you very much!
Ah, excellent, that makes sense. I'll probably pop into one of those chats then.
Describe the bug
I have tried several models and many of them immediately crash, reporting
<nobodywho::NobodyWhoChat as godot_core::gen::classes::node::re_export::INode>::physics_process: Model worker crashed: Lama.cpp failed fetching chat template: the model has no meta val - returned code -1
<C++ Source> src\lib.rs:152 @ <nobodywho::NobodyWhoChat as godot_core::gen::classes::node::re_export::INode>::physics_process()
To Reproduce
Steps to reproduce the behavior:
Or preferably a godot .tscn file with inbuilt scripts (this can be achieved by checking the "built in" checkbox when adding a script to a node)

Sure thing:
Expected behavior
The model should run, and return a response, without crashing, just like it does when using the recommended Gemma 2B model.
Environment:
Additional context
I know basically all of the GGUFs I used should work (if it even makes sense for one to "not work" at all), as I have seen them functioning very well on other backends on this very same machine.