- Media Intelligence Laboratory (MIL@HDU)
- Hangzhou, Zhejiang
Stars
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
This repository contains the source code for the paper First Order Motion Model for Image Animation
PyTorch package for the discrete VAE used for DALL·E.
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
ncnn is a high-performance neural network inference framework optimized for the mobile platform
nbgao / mt-captioning
Forked from MILVLG/mt-captioning
A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning
A PyTorch reimplementation of bottom-up-attention models
Recent Papers including Neural Symbolic Reasoning, Logical Reasoning, Visual Reasoning, planning and any other topics connecting deep learning and reasoning
A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.
COVID-Net Open Source Initiative
Tianwen-1, a simulation of the Hohmann transfer orbit from Mars to Earth
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning
Use a GAN to generate landscape images (paint->photo / photo->paint)
A User Interface for DETR built with Dash. 100% Python.
Grid features pre-training code for visual question answering