  • Media Intelligence Laboratory(MIL@HDU)
  • Hangzhou, Zhejiang

Organizations

@MILVLG

Visual-Language Pretraining

1 Updated Jul 21, 2021

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

Python 56 13 Updated Jun 13, 2023

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

Python 2,039 303 Updated Mar 19, 2024

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

1,152 104 Updated Aug 19, 2022

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support

Python 8,511 1,049 Updated Mar 21, 2025

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Python 14,497 2,104 Updated Jul 24, 2024

This repository contains the source code for the paper First Order Motion Model for Image Animation

Jupyter Notebook 14,769 3,249 Updated Nov 14, 2024

PyTorch package for the discrete VAE used for DALL·E.

Python 10,831 1,936 Updated Jan 31, 2024

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Python 5,605 637 Updated Feb 17, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 27,999 3,503 Updated Jul 23, 2024

ncnn is a high-performance neural network inference framework optimized for the mobile platform

C++ 21,150 4,220 Updated Mar 18, 2025

A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning

Python 1 Updated Sep 4, 2020

A PyTorch reimplementation of bottom-up-attention models

Jupyter Notebook 1 Updated Sep 1, 2020

Recent Papers including Neural Symbolic Reasoning, Logical Reasoning, Visual Reasoning, planning and any other topics connecting deep learning and reasoning

309 36 Updated May 30, 2022

Deep Multimodal Neural Architecture Search

Python 28 8 Updated Nov 15, 2020

A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.

Go 91,733 13,809 Updated Mar 7, 2025

🚀 A project for learning Spring Boot in depth and putting it into practice.

Java 33,511 10,952 Updated Jul 24, 2024

A daily-curated record of papers in computer vision, deep learning, machine learning, and related areas

6,475 1,279 Updated Jul 8, 2023

COVID-Net Open Source Initiative

Jupyter Notebook 1,154 479 Updated Feb 16, 2023

Tianwen-1, a simulation of the Hohmann transfer orbit from Mars to Earth

Python 34 7 Updated Aug 9, 2020

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 7,513 825 Updated Mar 18, 2025

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python 5,551 940 Updated Mar 19, 2025

A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning

Python 25 7 Updated Sep 4, 2020

fitlog is a tool that helps users record logs and manage code during deep learning training

Python 1,504 131 Updated Jan 16, 2024

Use GAN to generate landscape image (paint->photo / photo->paint)

Python 1 Updated Jun 26, 2020

A User Interface for DETR built with Dash. 100% Python.

Python 178 39 Updated Feb 16, 2023

data science competitions with money

201 11 Updated Feb 2, 2021

Grid features pre-training code for visual question answering

Python 269 45 Updated Sep 17, 2021