Stars
[NeurIPS 2023] LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research
A framework for drone racing research, built on Microsoft AirSim.
Using Tree-of-Thought Prompting to boost ChatGPT's reasoning
A framework for prompt tuning using Intent-based Prompt Calibration
Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
Stable Diffusion web UI
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
One minute of voice data is enough to train a good TTS model (few-shot voice cloning)
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
How to use OpenAI's Whisper to transcribe and diarize audio files
Official Implementation of ICLR 2024 paper: "Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning"
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Reading list for research topics in multimodal machine learning
A programming framework for agentic AI 🤖 | PyPI: autogen-agentchat | Discord: https://aka.ms/autogen-discord | Office Hour: https://aka.ms/autogen-officehour
[IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks.
Robust Speech Recognition via Large-Scale Weak Supervision
SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches (CVPR 2022)
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
Code for the paper "Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text"
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
An open-source framework for training large multimodal models.
Implementation of 🦩 Flamingo, a state-of-the-art few-shot visual question answering attention net from DeepMind, in PyTorch
This is the official repository for the LENS (Large Language Models Enhanced to See) system.