[NeurIPS2023] LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering

Jupyter Notebook 13 Updated Jan 5, 2024

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

C++ 16,716 4,643 Updated Jun 22, 2024

A framework for drone racing research, built on Microsoft AirSim.

Python 201 43 Updated Aug 14, 2023

Using Tree-of-Thought Prompting to boost ChatGPT's reasoning

711 67 Updated Dec 9, 2023

Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 4,829 556 Updated Oct 22, 2024

A framework for prompt tuning using Intent-based Prompt Calibration

Python 2,314 204 Updated Nov 23, 2024

Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI

Jupyter Notebook 9,013 2,500 Updated Jan 21, 2025

Stable Diffusion web UI

Python 146,182 27,395 Updated Dec 28, 2024

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

369 19 Updated Dec 10, 2024

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

Python 67,052 8,183 Updated Jan 9, 2025

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 39,075 4,413 Updated Jan 18, 2025

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)

Python 279 11 Updated Jan 20, 2025

InfiMM-Eval dataset and corresponding tools

6 Updated Dec 4, 2023

How to use OpenAI's Whisper to transcribe and diarize audio files

Jupyter Notebook 320 41 Updated Oct 12, 2022

Official Implementation of ICLR 2024 paper: "Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning"

Python 356 46 Updated Oct 20, 2024

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Python 87 7 Updated May 8, 2023

Reading list for research topics in multimodal machine learning

6,214 860 Updated Aug 20, 2024
Python 178 14 Updated May 10, 2023

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 38,061 5,544 Updated Jan 22, 2025

[IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks.

Python 1,259 156 Updated Apr 16, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 74,838 8,938 Updated Jan 4, 2025

SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches, CVPR2022

Python 249 28 Updated Jun 1, 2022

Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"

Python 255 10 Updated May 3, 2024

Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)

Python 214 17 Updated Jul 16, 2024

Code for the paper "Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text".

Python 44 9 Updated Mar 2, 2023

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Python 1,427 211 Updated Apr 3, 2024

An open-source framework for training large multimodal models.

Python 3,803 291 Updated Aug 31, 2024

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

Python 1,228 59 Updated Oct 18, 2022

This is the official repository for the LENS (Large Language Models Enhanced to See) system.

Jupyter Notebook 351 12 Updated Dec 9, 2023