Stars
Enhances Overleaf by allowing article searches and BibTeX retrieval from DBLP and Google Scholar
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
[ICLR 2025] Benchmarking Agentic Workflow Generation
Instruction Tuning with GPT-4
A curated list of awesome instruction tuning datasets, models, papers and repositories.
A comprehensive benchmark for evaluating Large Multimodal Models' capacity for understanding deep visual semantics.
Arena-Hard-Auto: An automatic LLM benchmark.
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
A server GPU monitoring program that sends a WeChat notification when GPU attributes meet preset conditions
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
📰 Must-read papers and blogs on Speculative Decoding ⚡️
[NeurIPS D&B 2024] Generative AI for Math: MathPile
SWE-bench [Multimodal]: Can Language Models Resolve Real-world GitHub Issues?
Just a bunch of benchmark logs for different LLMs
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
Code for EMNLP 2023 Findings paper: "Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning"
[NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin Liu, Lei Li, Shuhuai Ren, Rundong Gao, Shicheng Li, Sishuo Ch…
A repository for research on medium-sized language models.
This repository contains everything you need to become proficient in ML/AI research and research papers