🥶VILIO🥶


State-of-the-art Visio-Linguistic Models 🥶

Updates

06/2021 - Hateful Memes CSV Files

  • The CSV files used for the scores in the Vilio paper are now available here

06/2021 - Inference on any meme

Ordering

Vilio aims to replicate the organization of Hugging Face's transformers repository: https://github.com/huggingface/transformers

  • /bash Shell files to reproduce hateful memes results

  • /data Default directory for loading data & saving checkpoints

  • /ernie-vil ERNIE-ViL sub-repository written in PaddlePaddle

  • /fts_lmdb Scripts for handling .lmdb extracted features

  • /fts_tsv Scripts for handling .tsv extracted features

  • /notebooks Jupyter Notebooks for demonstration & reproducibility

  • /py-bottom-up-attention Sub-repository for .tsv feature extraction, forked & adapted from here

  • src/vilio All implemented models (see the Architectures section below for a quick overview)

  • /utils Pandas & ensembling scripts for data handling

  • entry.py files Scripts used to access the models and apply model-specific data preparation

  • pretrain.py files Points of entry for pre-training; same purpose as the entry files, but for the pre-training stage

  • hm.py Training code for the Hateful Memes challenge; the main point of entry

  • param.py Arguments for running hm.py (see the sketch after this list)
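
The sketch below shows, in rough form, how these entry points fit together: param.py declares the flags, hm.py parses them and drives training, and the entry files construct the requested model. It is an illustration only; the flag names, defaults, and the build_model helper are assumptions, not the actual Vilio interface.

```python
# Rough sketch of how param.py, hm.py and the entry files relate.
# Flag names, defaults and build_model are illustrative assumptions,
# not the actual Vilio interface.
import argparse


def parse_args():
    # param.py-style: a single place that declares the CLI flags for hm.py
    parser = argparse.ArgumentParser(description="Hateful Memes training (sketch)")
    parser.add_argument("--model", default="U", help="single-letter model key, e.g. U for UNITER")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--epochs", type=int, default=5)
    return parser.parse_args()


def build_model(key):
    # entry.py-style: dispatch on the model key and apply the
    # model-specific data preparation / tokenization
    print(f"(sketch) would construct and prepare the '{key}' model here")


def main():
    # hm.py-style: read the configuration, build the model, run training
    args = parse_args()
    build_model(args.model)
    # ... training loop on the Hateful Memes data would follow here ...


if __name__ == "__main__":
    main()
```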

Usage

Follow SCORE_REPRO.md for reproducing performance on the Hateful Memes Task.
Follow GETTING_STARTED.md for using the framework for your own task.
See the paper at: https://arxiv.org/abs/2012.07788

Architectures

🥶 Vilio currently provides the following architectures with the outlined language transformers (the single-letter keys are summarized as a Python mapping after the list):

  1. E - ERNIE-VIL ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
  2. D - DeVLBERT DeVLBert: Learning Deconfounded Visio-Linguistic Representations
  3. O - OSCAR Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
  4. U - UNITER UNITER: UNiversal Image-TExt Representation Learning
  5. V - VisualBERT VisualBERT: A Simple and Performant Baseline for Vision and Language
  6. X - LXMERT LXMERT: Learning Cross-Modality Encoder Representations from Transformers
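
For quick reference, the single-letter keys above can be written out as a plain Python mapping; treating these letters as programmatic model selectors (e.g. for a --model flag) is an assumption for illustration, not a documented interface.

```python
# Single-letter keys and the architectures they stand for, as listed above.
# Using them as model-selector values is an illustrative assumption.
MODEL_KEYS = {
    "E": "ERNIE-ViL",
    "D": "DeVLBERT",
    "O": "OSCAR",
    "U": "UNITER",
    "V": "VisualBERT",
    "X": "LXMERT",
}

if __name__ == "__main__":
    for key, name in MODEL_KEYS.items():
        print(f"{key} -> {name}")
```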

To-do's

  • Clean up import statements & Python paths, and find a better way to integrate transformers (right now, imports only work when run from the main folder)
  • Enable loading and running models via import statements alone (without having to clone the repo)
  • Find a way to better include ERNIE-VIL in this repo (PaddlePaddle to Torch?)
  • Move tokenization in entry files to model-specific tokenization similar to transformers

Attributions

The code borrows heavily from the following repositories; thanks for their great work:
