Why use different dataset for Training Ovis1.5-Gemma2-9B-S3 and Ovis1.5-Llama3-8B-S3 #39

Open
LIRENDA621 opened this issue Dec 1, 2024 · 2 comments

Comments

@LIRENDA621

[image]

You used 73 datasets to train Ovis1.5-Gemma2-9B-S3, but only 71 datasets to train Ovis1.5-Llama3-8B-S3. Why is this the case? Is it a typo, or is there another reason?

@runninglsy
Collaborator

The difference in the number of datasets used to train Ovis1.5-Gemma2-9B-S3 and Ovis1.5-Llama3-8B-S3 is not a typo. The 9B version was trained after the 8B version, and in the interim we constructed new data. These additional datasets were included in the training of the 9B model.

@YangYang-DLUT

Great work, truly powerful models. Where can I find the open-source part of the data used to train Ovis2? Is it here: https://huggingface.co/datasets/AIDC-AI/Ovis-dataset
If not, where can I find the list?
