
JoyCaption support #5

Open
2snEM6 opened this issue Oct 30, 2024 · 3 comments

2snEM6 commented Oct 30, 2024

Hi! I love your tool and I'd love to contribute by integrating JoyCaption (https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two)

Do you think it can be integrated?

Thanks

FennelFetish (Owner) commented:

Hi! That would be great!

Yes, I think it should work and the package requirements should already be met.
I see the JoyCaption example code shows how to set up the conversation:
https://github.com/fpgaminer/joycaption

When integrating it into qapyq, it should follow the structure of the other backends.
For example: https://github.com/FennelFetish/qapyq/blob/main/infer/backend_qwen2vl.py
It shows how the config is applied, how to iterate over the input prompts/conversations, and how to build the answer dictionary. But each model is different.
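Very roughly, the pattern described above could be sketched like this. All class and method names here are assumptions modeled on the description, not qapyq's actual API:

```python
# Hypothetical sketch of a qapyq backend; not the real interface.
class JoyCaptionBackend:
    def __init__(self, config: dict):
        # Apply the config (model path, token limit, ...).
        self.modelPath = config.get("model_path")
        self.maxTokens = config.get("max_tokens", 300)

    def caption(self, imagePath: str, prompts: list[dict]) -> dict:
        # Iterate over the input prompts/conversations and build the
        # answer dictionary: one entry per prompt name.
        answers = {}
        for conversation in prompts:
            for name, prompt in conversation.items():
                answers[name] = self._generate(imagePath, prompt)
        return answers

    def _generate(self, imagePath: str, prompt: str) -> str:
        # Placeholder for the actual model call.
        return f"(caption for {imagePath}: {prompt})"
```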

A GenerationConfig object should be used instead of passing max_new_tokens=300, do_sample=True, etc. directly to llava_model.generate().
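With Hugging Face transformers, that would look roughly like this. The sampling values are illustrative, not JoyCaption's actual defaults:

```python
from transformers import GenerationConfig

# Build the sampling settings once instead of passing keyword
# arguments to generate() on every call.
genConfig = GenerationConfig(
    max_new_tokens=300,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Then pass it to the model:
# output = llava_model.generate(**inputs, generation_config=genConfig)
```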

For building the device map:
DevMap.fromConfig() reads the layer count from the model's config.json.
For the visual layers the key would be "vision_config.num_hidden_layers".
I didn't see a value for the LLM layers, though. You'd have to figure that out.
You can use this static function to print the layer names of the model:
https://github.com/FennelFetish/qapyq/blob/main/infer/devmap.py#L88
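Reading a dotted key like "vision_config.num_hidden_layers" out of the model's config.json can be done with the standard library; this helper is a hypothetical sketch, not qapyq's DevMap code:

```python
import json

def read_config_value(config_path: str, dotted_key: str):
    # Walk a dotted key like "vision_config.num_hidden_layers"
    # through the nested dictionaries of the model's config.json.
    with open(config_path) as f:
        config = json.load(f)
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value
```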

The new backend then has to be registered here: https://github.com/FennelFetish/qapyq/blob/main/main_inference.py#L9
and here: https://github.com/FennelFetish/qapyq/blob/main/infer/model_settings.py#L14

For testing, this example script could be of use:
(place it in qapyq's top folder and run it in the same virtual environment)

from infer.backend_molmo import MolmoBackend

def main():
    # Backend configuration (paths are examples from my setup).
    config = {
        "model_path": "/mnt/ai/Models/MM-LLM/Molmo-7B-D-0924/",
        "gpu_layers": -1,
        "vis_gpu_layers": -1,
        "quantization": "none",
        "max_tokens": 20
    }

    molmo = MolmoBackend(config)

    # One named prompt; caption() returns the answer dictionary.
    prompts = [{"caption": "Describe this image in two sentences."}]
    answers = molmo.caption("/home/rem/Pictures/red-tree-with-eyes.jpeg", prompts)
    print(answers)

if __name__ == "__main__":
    main()

Thanks! Please feel free to ask any questions if needed.

FennelFetish (Owner) commented:

Oh, and another note:
Everything printed to standard output is sent to the parent process, so stray prints will garble the output and break the protocol.

Error messages and debug output should be written to stderr instead.
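In Python that just means directing all diagnostic output to sys.stderr, for example (helper name is my own):

```python
import sys

def debug(msg: str):
    # Debug/error output goes to stderr so it never interferes with
    # the stdout protocol between the child and parent process.
    print(msg, file=sys.stderr)
```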

FennelFetish (Owner) commented:

@2snEM6 Any updates? Are you still interested in integrating JoyCaption?
Otherwise I'll have a go at it.
