Any advice to train on Chinese Dataset using Caption Contrastive Fine-tuning? #33

Open
WeihongM opened this issue Jan 23, 2025 · 2 comments


WeihongM commented Jan 23, 2025

Hello, thanks for the impressive work. I find that the model performs poorly with Chinese captions. If I want to use the caption contrastive fine-tuning loss to train an LLM that supports Chinese (such as Qwen), which dataset would you advise me to use?
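
For context, the caption contrastive fine-tuning being asked about is, at its core, an InfoNCE loss over LLM caption embeddings, where two captions of the same image are positives and the rest of the batch are negatives. Below is a minimal sketch assuming a SimCSE-style setup with masked mean pooling; the model name is a placeholder for any Chinese-capable LLM, and this is an illustration, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "Qwen/Qwen2-1.5B"  # placeholder: any Chinese-capable LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(1) / mask.sum(1)       # masked mean pooling
    return F.normalize(pooled, dim=-1)

def caption_contrastive_loss(captions_a, captions_b, temperature=0.05):
    # captions_a[i] and captions_b[i] describe the same image (positive pair);
    # every other caption in the batch acts as an in-batch negative
    za, zb = encode(captions_a), encode(captions_b)
    logits = za @ zb.T / temperature                    # (B, B) cosine similarities
    labels = torch.arange(len(captions_a))
    # symmetric InfoNCE over both matching directions
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
```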

raytrun (Contributor) commented Jan 24, 2025

Thank you for your interest in our work.
The WuKong Dataset is a large-scale collection of Chinese image-text pairs and could be a good choice given its substantial volume of data. However, I'm not entirely sure about the quality of the captions in this dataset, so you may need to check whether it's suitable.
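
If it helps, here is a rough screening pass for the Wukong data, assuming the released shards are CSVs with a (url, caption) column layout; the length thresholds, the filter pattern, and the shard file name are arbitrary starting points, not values from the release or the paper.

```python
import pandas as pd

def filter_wukong(csv_path, min_len=6, max_len=64):
    # assumes each released shard is a CSV with url and caption columns
    df = pd.read_csv(csv_path, names=["url", "caption"], header=0)
    df = df.dropna(subset=["caption"])
    lengths = df["caption"].str.len()
    keep = (lengths >= min_len) & (lengths <= max_len)
    # drop captions that are obviously file names or URL fragments
    keep &= ~df["caption"].str.contains(r"\.(?:jpg|png|gif)|http", regex=True)
    return df[keep]

shard = filter_wukong("wukong_release_0.csv")  # hypothetical shard name
print(f"kept {len(shard)} image-text pairs")
```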

MrPanda007 commented

@raytrun Thanks for your great work.
Right now I'm using your caption-contrastive fine-tuned Llama model to fine-tune the EVA model on CC15M plus 1 million of my own Chinese image-text pairs, but it performs worse than Chinese-CLIP.
If I want to use the LLM2CLIP model for Chinese, should I use your caption-contrastive fine-tuned Llama model to train CLIP, or do I need to fine-tune both the Llama and CLIP models?
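
For what it's worth, my understanding of the intended recipe is that the caption-contrastive fine-tuned LLM stays frozen as the text encoder, while only the vision encoder plus small projection adapters are trained with the usual CLIP loss. A minimal sketch of that stage follows; the module names and dimensions are placeholders, not the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LLM2CLIPStage2(nn.Module):
    # llm_dim / vis_dim / embed_dim and the module names are illustrative only
    def __init__(self, vision_model, llm_dim, vis_dim, embed_dim=1024):
        super().__init__()
        self.vision_model = vision_model                  # e.g. EVA; trainable
        self.text_proj = nn.Linear(llm_dim, embed_dim)    # trainable adapter
        self.vis_proj = nn.Linear(vis_dim, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ln(1/0.07)

    def forward(self, images, text_embeds):
        # text_embeds: caption features from the *frozen* CC-fine-tuned LLM;
        # since that LLM never updates, they can be precomputed offline
        z_t = F.normalize(self.text_proj(text_embeds), dim=-1)
        z_i = F.normalize(self.vis_proj(self.vision_model(images)), dim=-1)
        logits = self.logit_scale.exp() * z_i @ z_t.T     # (B, B)
        labels = torch.arange(images.size(0), device=images.device)
        # symmetric image-to-text / text-to-image contrastive loss
        return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
```

Because the LLM is frozen in this sketch, its caption features can be cached once over the whole dataset, so only the vision tower and the two adapters see gradients.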
