
JoyCaption support #5

Open
2snEM6 opened this issue Oct 30, 2024 · 3 comments

2snEM6 commented Oct 30, 2024

Hi! I love your tool and I'd love to contribute by integrating JoyCaption (https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two)

Do you think it can be integrated?

Thanks

FennelFetish (Owner) commented:

Hi! That would be great!

Yes, I think it should work and the package requirements should already be met.
I see the JoyCaption example code shows how to set up the conversation:
https://github.com/fpgaminer/joycaption

When integrating it into qapyq, it should follow the structure of the other backends.
For example: https://github.com/FennelFetish/qapyq/blob/main/infer/backend_qwen2vl.py
It shows how the config is applied, how to iterate over the input prompts/conversations, and how to build the answer dictionary. But each model is different.
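Very roughly, the pattern described above could be sketched like this. All class and method names here are assumptions modeled on the description, not qapyq's actual API:

```python
# Hypothetical sketch of a qapyq backend; not the real interface.
class JoyCaptionBackend:
    def __init__(self, config: dict):
        # Apply the config (model path, token limit, ...).
        self.modelPath = config.get("model_path")
        self.maxTokens = config.get("max_tokens", 300)

    def caption(self, imagePath: str, prompts: list[dict]) -> dict:
        # Iterate over the input prompts/conversations and build the
        # answer dictionary: one entry per prompt name.
        answers = {}
        for conversation in prompts:
            for name, prompt in conversation.items():
                answers[name] = self._generate(imagePath, prompt)
        return answers

    def _generate(self, imagePath: str, prompt: str) -> str:
        # Placeholder for the actual model call.
        return f"(caption for {imagePath}: {prompt})"
```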

A GenerationConfig object should be used instead of passing max_new_tokens=300, do_sample=True, etc. directly to llava_model.generate().
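With Hugging Face transformers, that would look roughly like this. The sampling values are illustrative, not JoyCaption's actual defaults:

```python
from transformers import GenerationConfig

# Build the sampling settings once instead of passing keyword
# arguments to generate() on every call.
genConfig = GenerationConfig(
    max_new_tokens=300,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Then pass it to the model:
# output = llava_model.generate(**inputs, generation_config=genConfig)
```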

For building the device map:
DevMap.fromConfig() reads the layer count from the model's config.json.
For the visual layers the key would be "vision_config.num_hidden_layers".
I didn't see a value for the LLM layers, though. You'd have to figure that out.
You can use this static function to print the layer names of the model:
https://github.com/FennelFetish/qapyq/blob/main/infer/devmap.py#L88
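Reading a dotted key like "vision_config.num_hidden_layers" out of the model's config.json can be done with the standard library; this helper is a hypothetical sketch, not qapyq's DevMap code:

```python
import json

def read_config_value(config_path: str, dotted_key: str):
    # Walk a dotted key like "vision_config.num_hidden_layers"
    # through the nested dictionaries of the model's config.json.
    with open(config_path) as f:
        config = json.load(f)
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value
```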

The new backend then has to be registered here: https://github.com/FennelFetish/qapyq/blob/main/main_inference.py#L9
and here: https://github.com/FennelFetish/qapyq/blob/main/infer/model_settings.py#L14

For testing, this example script could be of use:
(place it in qapyq's top folder and run it in the same virtual environment)

from infer.backend_molmo import MolmoBackend

def main():
    # Backend configuration (paths are examples from my setup).
    config = {
        "model_path": "/mnt/ai/Models/MM-LLM/Molmo-7B-D-0924/",
        "gpu_layers": -1,
        "vis_gpu_layers": -1,
        "quantization": "none",
        "max_tokens": 20
    }

    molmo = MolmoBackend(config)

    # One named prompt; caption() returns the answer dictionary.
    prompts = [{"caption": "Describe this image in two sentences."}]
    answers = molmo.caption("/home/rem/Pictures/red-tree-with-eyes.jpeg", prompts)
    print(answers)

if __name__ == "__main__":
    main()

Thanks! Please feel free to ask any questions if needed.

FennelFetish (Owner) commented:

Oh, and another note:
Everything printed to standard output is sent to the parent process, so stray prints will garble the output and break the protocol.

Error messages and debug output should be written to stderr instead.
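In Python that just means directing all diagnostic output to sys.stderr, for example (helper name is my own):

```python
import sys

def debug(msg: str):
    # Debug/error output goes to stderr so it never interferes with
    # the stdout protocol between the child and parent process.
    print(msg, file=sys.stderr)
```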

FennelFetish (Owner) commented:

@2snEM6 Any updates? Are you still interested in integrating JoyCaption?
Otherwise I'll have a go at it.
