DataFlow-Gen is a system for automatically generating high-quality data. We mainly support state-of-the-art (SOTA) algorithms from academic papers with strong theoretical support.
We currently support text, image, video, and multimodal data types.
| Module\Modality | Text | Image | Video | Image-Text Pair | Video-Text Pair |
|---|---|---|---|---|---|
| Data Evaluation | ✅ | ✅ | ✅ | ✅ | ✅ |
- [2024-12-27] 🎉 Our first data generation system is now open source.
conda create -n dataflow-gen python=3.10 -y
conda activate dataflow-gen
pip install -r requirements.txt
cd path/to/DataFlow-Gen
python run_pipeline.py --config configs/TextGeneration.yaml # Text Generation
python run_pipeline.py --config configs/ImageCaption.yaml # Image Captioning
python run_pipeline.py --config configs/ImageGeneration.yaml # Image Generation
python run_pipeline.py --config configs/VideoCaption.yaml # Video Captioning
python run_pipeline.py --config configs/VideoGeneration.yaml # Video Generation
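
To run several pipelines back to back, the commands above can be wrapped in a small loop. This is only a sketch, not part of DataFlow-Gen: the `run_all` function and the `DRY_RUN` variable are hypothetical names introduced here for illustration, and the script assumes it is started from the repository root, where `run_pipeline.py` and `configs/` live. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: iterate over the five example configs listed above.
# run_all and DRY_RUN are illustrative names, not DataFlow-Gen options.
set -eu

run_all() {
  for name in TextGeneration ImageCaption ImageGeneration VideoCaption VideoGeneration; do
    cfg="configs/${name}.yaml"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: print the command instead of executing it.
      echo "python run_pipeline.py --config ${cfg}"
    else
      # With DRY_RUN=0, run the pipeline; set -e stops at the first failure.
      python run_pipeline.py --config "${cfg}"
    fi
  done
}

run_all
```

Setting `DRY_RUN=0` before invoking the script executes the pipelines in order instead of printing them.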
For details on how to use evaluation, please refer to the following documents 👇