Proper size of English font dataset #8

Open
tackgeun opened this issue Sep 16, 2023 · 5 comments

Comments

@tackgeun

Thank you for sharing the code for the wonderful work.

I am currently developing a new font generation method based on the dataset setting introduced in DeepVecFont-v2, which is more convenient than the standard SVG-Fonts dataset.

My question is about the size of the newly introduced dataset splits.
In the paper, the train split has 8035 fonts and the test split has 1425 fonts, while the available pre-processed dataset in 'vecfont_dataset/eng/' has 10261 fonts in the train split and 1386 fonts in the test split.

Am I missing a setting in the code implementation?

Thank you in advance.

@yizhiwang96
Owner

Thanks for the feedback! Yes, the train/test split is different because I revised some criteria for selecting fonts when I released the code. I will upload the train/test split reported in the paper soon.

@yizhiwang96
Owner

yizhiwang96 commented Sep 17, 2023

Please check the 8035/1425 split in v1_train_font_ids.txt and v1_test_font_ids.txt. The ids correspond to the otf/ttf files I released in data/font_ttfs. You can use this 8035/1425 split for all your experiments.
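
As a quick sanity check, the sizes of the two id lists can be compared against the numbers reported in the paper. This is only a minimal sketch: it assumes each file holds one font id per line, which the thread does not state, and `read_font_ids` and `check_split_size` are hypothetical helper names rather than part of the released code.

```python
from pathlib import Path

# Split sizes quoted in this thread (taken from the paper).
EXPECTED_SIZES = {"v1_train_font_ids.txt": 8035, "v1_test_font_ids.txt": 1425}


def read_font_ids(path):
    """Read font ids, ASSUMING one id per non-empty line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_split_size(font_ids, expected):
    """Return True when the list has exactly the expected number of ids."""
    return len(font_ids) == expected


if __name__ == "__main__":
    for name, expected in EXPECTED_SIZES.items():
        if Path(name).exists():
            ids = read_font_ids(name)
            print(name, len(ids), "ok" if check_split_size(ids, expected) else "mismatch")
```

If the files use a different layout (for example comma-separated ids), only `read_font_ids` would need to change.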

@tackgeun
Author

Thank you for your explanation. I will follow your instructions.

@tackgeun
Author

I failed to preprocess the dataset while following the instructions, due to a mismatch in the number of files in the released folders data/font_ttfs and data/font_sfds.

I found that the released font files in data/font_ttfs do not match the split files v1_train_font_ids.txt and v1_test_font_ids.txt: only 6296 and 1192 of the listed files for the train and test splits, respectively, are included in the current release. I also found that the released folder data/font_sfds has more processed files, 6693 and 1411 for the train and test splits, which still do not match the original numbers.
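
A mismatch of this kind can be reproduced by listing the ids that have no corresponding font file on disk. The sketch below is an illustration under assumptions: it supposes each font is stored as `<id>.ttf` or `<id>.otf` directly inside the given directory, and `find_missing_fonts` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def find_missing_fonts(font_ids, font_dir, extensions=(".ttf", ".otf")):
    """Return the ids with no file named <id><ext> in font_dir.

    ASSUMPTION: the file stem equals the id string exactly
    (no zero padding or prefix). Adjust if the naming differs.
    """
    stems = {
        p.stem
        for p in Path(font_dir).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    }
    return [fid for fid in font_ids if fid not in stems]


if __name__ == "__main__":
    # Hypothetical location; the thread only names data/font_ttfs.
    font_dir = Path("data/font_ttfs/train")
    ids_file = Path("v1_train_font_ids.txt")
    if font_dir.is_dir() and ids_file.exists():
        ids = [s.strip() for s in ids_file.read_text().splitlines() if s.strip()]
        missing = find_missing_fonts(ids, font_dir)
        print(f"{len(ids) - len(missing)} of {len(ids)} listed fonts found")
```

The same function can be pointed at data/font_sfds by passing the matching extension, for example `extensions=(".sfd",)`, again assuming that naming scheme.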

@yizhiwang96
Owner

yizhiwang96 commented Sep 25, 2023

Sorry, I made a mistake in v1_train_font_ids.txt and v1_test_font_ids.txt by using indices 1, 2, 3, ... rather than 0, 1, 2, .... Please see the latest versions of v1_train_font_ids.txt and v1_test_font_ids.txt. I have checked that they match the ttf files. For the sfd files, you need to generate them yourself according to the instructions I provided in the readme. If you have difficulty, I can help you generate them. Before generating them, first check whether they are already contained in the data.zip I provided.
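
To illustrate the off-by-one issue described above: when files are numbered from 0 but an id list is written starting from 1, every id points one position too far. The following sketch shows the shift on toy data; `to_zero_based` is a hypothetical helper, and this is not the script actually used to regenerate the id files.

```python
def to_zero_based(one_based_ids):
    """Shift ids written as 1, 2, 3, ... to the 0, 1, 2, ... convention."""
    shifted = [i - 1 for i in one_based_ids]
    if any(i < 0 for i in shifted):
        raise ValueError("input contained an id below 1")
    return shifted


# Toy example: four files on disk, numbered 0..3.
available = {0, 1, 2, 3}
listed = [1, 3, 4]  # ids written with 1-based numbering

# Id 4 does not exist on disk, so the raw list cannot be matched.
unmatched_before = [i for i in listed if i not in available]

# After shifting, every id refers to an existing file.
unmatched_after = [i for i in to_zero_based(listed) if i not in available]
```

Here `unmatched_before` is `[4]` and `unmatched_after` is empty, because the shifted list is `[0, 2, 3]`.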
