Open-DataFlow/Dataflow-Gen

Chinese homepage

Dataflow-Gen

License: Apache-2.0

DataFlow-Gen is a data generation system that automatically produces high-quality data. We mainly support state-of-the-art (SOTA) algorithms from academic papers that have strong theoretical support.

We currently support text, image, video, and multimodal data types.

Module and Modality Support

Module \ Modality | Text | Image | Video | Image-Text Pair | Video-Text Pair
Data Evaluation

News

  • [2024-12-27] 🎉 Our first data generation system is now open source.

Installation

conda create -n dataflow-gen python=3.10 -y
conda activate dataflow-gen
pip install -r requirements.txt

Quick Start

cd path/to/DataFlow-Gen
python run_pipeline.py --config configs/TextGeneration.yaml   # Text Generation
python run_pipeline.py --config configs/ImageCaption.yaml     # Image Captioning
python run_pipeline.py --config configs/ImageGeneration.yaml  # Image Generation
python run_pipeline.py --config configs/VideoCaption.yaml     # Video Captioning
python run_pipeline.py --config configs/VideoGeneration.yaml  # Video Generation
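
To run several of the pipelines above in one go, the commands can be wrapped in a small driver script. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `run_all`, the `dry_run` parameter, and the `--run` switch are hypothetical and introduced only for illustration. It assumes the script is started from the repository root, so that `run_pipeline.py` and the `configs/` directory listed above resolve as relative paths.

```python
import subprocess
import sys
from pathlib import Path

# Config names taken from the Quick Start commands above.
CONFIG_NAMES = [
    "TextGeneration",
    "ImageCaption",
    "ImageGeneration",
    "VideoCaption",
    "VideoGeneration",
]


def build_command(config_name, config_dir="configs"):
    """Return the argv list for one Quick Start command (hypothetical helper)."""
    config_path = Path(config_dir) / f"{config_name}.yaml"
    # sys.executable reuses the interpreter of the active conda environment.
    return [sys.executable, "run_pipeline.py", "--config", config_path.as_posix()]


def run_all(config_names=CONFIG_NAMES, dry_run=True):
    """Run each pipeline in order and stop at the first failure.

    With dry_run=True the commands are only printed; nothing is executed.
    """
    for name in config_names:
        cmd = build_command(name)
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical switch: pass --run to execute; the default only prints.
    run_all(dry_run="--run" not in sys.argv[1:])
```

By default the script only prints the commands it would run, which makes it easy to check the config paths before launching long generation jobs.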

Data Generation Documentation

For details on using data generation, please refer to the following documents 👇

Text Documentation

Image Documentation

Video Documentation
