Stars
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Examining how large language models (LLMs) perform on various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates.
A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, etc.
Use PEFT or full-parameter training to fine-tune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…
Code for the paper "Deep Entity Matching with Pre-trained Language Models"
This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"
This repository contains code and extensive prompt examples to reproduce and extend the experiments in our papers "Using ChatGPT for Entity Matching" and "Entity Matching using Large Language Models".
Awesome Pretrained Chinese NLP Models: a curated collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models.
Pre-Training with Whole Word Masking for Chinese BERT (the Chinese BERT-wwm model series)
A Survey on Text-to-Video Generation/Synthesis.
Mora: More like Sora for Generalist Video Generation
Open-Sora: Democratizing Efficient Video Production for All
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
[CVPR 2024🔥] EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection
Set of tools to assess and improve LLM security.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
This project shares the technical principles behind large models along with hands-on experience (LLM engineering and real-world application deployment).
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
Large World Model -- Modeling Text and Video with Million-Length Context
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
(CVPR 2024) A benchmark for evaluating multimodal LLMs using multiple-choice questions.
An Open-source Toolkit for LLM Development
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Best practices for distilling large language models.
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.