Toseic/LLM-inference-arxiv-daily: 🎓 Automatically update LLM inference systems papers daily using GitHub Actions (updates every 12 hours)

Updated on 2025.01.24

inference
MoE

inference

Publish Date	Title	Authors	PDF	Code
2025-01-20	Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference	Pouya Hamadanian et.al.	2501.11779	link
2025-01-20	Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas	Nishant Balepur et.al.	2501.11549	link
2025-01-19	GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation	Shashikant Ilager et.al.	2501.11006	null
2025-01-17	A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks	Xinzhe Li et.al.	2501.10069	null
2025-01-16	Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition	Takaaki Hori et.al.	2501.09258	null
2025-01-15	Guiding Retrieval using LLM-based Listwise Rankers	Mandeep Rathee et.al.	2501.09186	link
2025-01-14	Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Paul Joe Maliakel et.al.	2501.08219	null
2025-01-14	PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving	Ahmet Caner Yüzügüler et.al.	2501.08192	null
2025-01-14	Hierarchical Autoscaling for Large Language Model Serving with Chiron	Archit Patke et.al.	2501.08090	null
2025-01-12	MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference	Wenxuan Zeng et.al.	2501.06807	null
2025-01-05	TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms	Jovan Stojkovic et.al.	2501.02600	null
2025-01-04	AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference	Zhuomin He et.al.	2501.02336	link
2025-01-03	Efficient LLM Inference with Activation Checkpointing and Hybrid Caching	Sanghyeon Lee et.al.	2501.01792	null
2025-01-03	BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference	Wonsuk Jang et.al.	2501.01144	null
2025-01-02	FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	Zihao Ye et.al.	2501.01005	link
2024-12-23	Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs	Dibakar Gope et.al.	2501.00032	link
2024-12-29	TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication	Zongwu Wang et.al.	2412.20501	link
2024-12-28	LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System	Hyucksung Kwon et.al.	2412.20166	null
2024-12-19	GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors	Chengming Zhang et.al.	2412.19829	null
2025-01-02	A Survey on Large Language Model Acceleration based on KV Cache Management	Haoyang Li et.al.	2412.19442	link
2024-12-27	An Engorgio Prompt Makes Large Language Model Babble on	Jianshuo Dong et.al.	2412.19394	link
2024-12-25	Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Libo Zhang et.al.	2412.18934	null
2024-12-21	SYMPHONY: Improving Memory Management for LLM Inference Workloads	Saurabh Agarwal et.al.	2412.16434	null
2024-12-20	WebLLM: A High-Performance In-Browser LLM Inference Engine	Charlie F. Ruan et.al.	2412.15803	link
2024-12-18	A Survey on LLM Inference-Time Self-Improvement	Xiangjue Dong et.al.	2412.14352	link
2024-12-18	Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models	Seungeun Oh et.al.	2412.12687	null
2024-12-17	A System for Microserving of LLMs	Hongyi Jin et.al.	2412.12488	null
2024-12-16	CSR:Achieving 1 Bit Key-Value Cache via Sparse Representation	Hongxuan Zhang et.al.	2412.11741	null
2024-12-15	Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning	Yun Qu et.al.	2412.11120	link
2024-12-15	NITRO: LLM Inference on Intel Laptop NPUs	Anthony Fei et.al.	2412.11053	link
2024-12-13	SCBench: A KV Cache-Centric Analysis of Long-Context Methods	Yucheng Li et.al.	2412.10319	null
2024-12-17	TurboAttention: Efficient Attention Approximation For High Throughputs LLMs	Hao Kang et.al.	2412.08585	null
2024-12-11	Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths	Naryeong Kim et.al.	2412.08281	null
2024-12-12	TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch	Xingchen Song et.al.	2412.08237	null
2024-12-09	Asynchronous LLM Function Calling	In Gim et.al.	2412.07017	null
2024-12-09	SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs	James Vo et.al.	2412.06198	null
2024-12-08	XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference	Weizhuo Li et.al.	2412.05896	null
2024-12-06	GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments	Yanyu Chen et.al.	2412.04788	null
2024-12-03	Multi-Bin Batching for Increasing LLM Inference Throughput	Ozgur Guldogan et.al.	2412.04504	null
2024-11-29	BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching	Zhen Zheng et.al.	2412.03594	null
2024-12-03	Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity	Da Ma et.al.	2412.02252	null
2024-12-02	PLD+: Accelerating LLM inference by leveraging Language Model Artifacts	Shwetha Somasundaram et.al.	2412.01447	null
2024-12-02	Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking	Marco Federici et.al.	2412.01380	null
2024-12-05	RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy	Geonho Lee et.al.	2412.01129	null
2024-12-02	TruncFormer: Private LLM Inference Using Only Truncations	Patrick Yubeaton et.al.	2412.01042	null
2024-11-29	A dynamic parallel method for performance optimization on hybrid CPUs	Luo Yu et.al.	2411.19542	null
2024-12-03	Puzzle: Distillation-Based NAS for Inference-Optimized LLMs	Akhiad Bercovich et.al.	2411.19146	null
2024-11-29	InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks	Xinyao Zheng et.al.	2411.18191	null
2024-11-28	MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache	Akshat Sharma et.al.	2411.18077	null
2024-11-24	Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou et.al.	2411.17741	null
2024-11-26	PIM-AI: A Novel Architecture for High-Efficiency LLM Inference	Cristobal Ortega et.al.	2411.17309	null
2024-11-26	Star Attention: Efficient LLM Inference over Long Sequences	Shantanu Acharya et.al.	2411.17116	link
2024-11-26	Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation	Chaoyi Jiang et.al.	2411.17089	null
2024-11-25	MixPE: Quantization and Hardware Co-design for Efficient LLM Inference	Yu Zhang et.al.	2411.16158	null
2024-11-24	eFedLLM: Efficient LLM Inference Based on Federated Learning	Shengwen Ding et.al.	2411.16003	null
2024-11-24	Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format	Chao Fang et.al.	2411.15982	null
2024-11-24	Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems	Wenxiang Lin et.al.	2411.15715	null
2024-11-22	XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	Yixin Dong et.al.	2411.15100	null
2024-11-21	Disentangling Memory and Reasoning Ability in Large Language Models	Mingyu Jin et.al.	2411.13504	link
2024-11-20	Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding	Hyun Ryu et.al.	2411.13157	null
2024-11-21	LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts	Zhuohan Gu et.al.	2411.13009	null
2024-11-15	An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2	Pepijn de Reus et.al.	2411.12758	link
2024-11-19	SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference	Jiho Shin et.al.	2411.12692	null
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-15	AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference	Janghwan Lee et.al.	2411.09909	null
2024-11-14	Squeezed Attention: Accelerating Long Context Length LLM Inference	Coleman Hooper et.al.	2411.09688	link
2024-11-15	Communication Compression for Tensor Parallel LLM Inference	Jan Hansen-Palmus et.al.	2411.09510	null
2024-11-14	Pie: Pooling CPU Memory for LLM Inference	Yi Xu et.al.	2411.09317	null
2024-11-12	Towards Low-bit Communication for Tensor Parallel LLM Inference	Harry Dong et.al.	2411.07942	null
2024-11-12	The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving	Kyoungmin Kim et.al.	2411.07447	null
2024-11-08	AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality	Ilias Bournias et.al.	2411.05555	null
2024-11-07	Hardware and Software Platform Inference	Cheng Zhang et.al.	2411.05197	null
2024-11-07	SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference	Gabriele Oliaro et.al.	2411.04975	null
2024-11-05	CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration	Hongpeng Jin et.al.	2411.02829	null
2024-11-04	RAGViz: Diagnose and Visualize Retrieval-Augmented Generation	Tevin Wang et.al.	2411.01751	link
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-11-02	RA-WEBs: Remote Attestation for WEB services	Kosei Akama et.al.	2411.01340	null
2024-11-02	NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference	Xuanlin Jiang et.al.	2411.01142	null
2024-11-01	LLM-Based Misconfiguration Detection for AWS Serverless Computing	Jinfeng Wen et.al.	2411.00642	null
2024-11-04	ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models	Anbang Wang et.al.	2411.00533	null
2024-11-01	Attention Tracker: Detecting Prompt Injection Attacks in LLMs	Kuo-Han Hung et.al.	2411.00348	null
2024-10-31	LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators	Krishna Teja Chitty-Venkata et.al.	2411.00136	link
2024-10-31	Interpretable Language Modeling via Induction-head Ngram Models	Eunji Kim et.al.	2411.00066	link
2024-10-31	ALISE: Accelerating Large Language Model Serving with Speculative Scheduling	Youpeng Zhao et.al.	2410.23537	null
2024-10-30	BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference	Junqi Zhao et.al.	2410.23079	link
2024-10-29	Scaling LLM Inference with Optimized Sample Compute Allocation	Kexun Zhang et.al.	2410.22480	link
2024-10-29	SVIP: Towards Verifiable Inference of Open-source Large Language Models	Yifan Sun et.al.	2410.22307	null
2024-10-28	ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference	Hanshi Sun et.al.	2410.21465	link
2024-10-27	FIRP: Faster LLM inference via future intermediate representation prediction	Pengfei Wu et.al.	2410.20488	null
2024-10-29	Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management	Tuowei Wang et.al.	2410.19274	null
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-30	Dynamic Vocabulary Pruning in Early-Exit LLMs	Jort Vincenti et.al.	2410.18952	link
2024-10-25	A Survey on Speech Large Language Models	Jing Peng et.al.	2410.18908	null
2024-10-24	BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching	Peizhuang Cong et.al.	2410.18701	null
2024-10-25	Fast Inference for Augmented Large Language Models	Rana Shahout et.al.	2410.18248	null
2024-10-23	POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference	Aditya K Kamath et.al.	2410.18038	null
2024-10-22	FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs	Haoran Lin et.al.	2410.16663	null
2024-10-22	Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency	Prafulla Kumar Choubey et.al.	2410.16597	null
2024-10-20	EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models	Junhao Hu et.al.	2410.15332	null
2024-10-19	IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System	Minseok Seo et.al.	2410.15008	null
2024-10-23	Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching	Jie Peng et.al.	2410.14740	null
2024-10-18	A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference	You Wu et.al.	2410.14442	link
2024-10-18	Revisiting SLO and Goodput Metrics in LLM Serving	Zhibin Wang et.al.	2410.14257	null
2024-10-17	RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs	Jiatan Huang et.al.	2410.13987	null
2024-10-17	Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs	Tianyu Guo et.al.	2410.13835	link
2024-10-17	Progressive Mixed-Precision Decoding for Efficient LLM Inference	Hao Mark Chen et.al.	2410.13461	null
2024-10-17	Data Defenses Against Large Language Models	William Agnew et.al.	2410.13138	link
2024-10-19	In-context KV-Cache Eviction for LLMs via Attention-Gate	Zihao Zeng et.al.	2410.12876	null
2024-10-10	RecurFormer: Not All Transformer Heads Need Self-Attention	Ruiqing Yan et.al.	2410.12850	null
2024-10-16	Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning	Huiwen Wu et.al.	2410.12130	null
2024-10-15	Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix	Yingyu Liang et.al.	2410.11261	null
2024-10-14	DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads	Guangxuan Xiao et.al.	2410.10819	link
2024-10-16	SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization	Akrit Mudvari et.al.	2410.10759	null
2024-10-12	Power-Softmax: Towards Secure LLM Inference over Encrypted Data	Itamar Zimerman et.al.	2410.09457	null
2024-10-09	SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration	Heming Xia et.al.	2410.06916	link
2024-10-08	ParallelSpec: Parallel Drafter for Efficient Speculative Decoding	Zilin Xiao et.al.	2410.05589	null
2024-10-06	RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference	Yige Xu et.al.	2410.04519	link
2024-10-14	Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective	Jinhao Li et.al.	2410.04466	null
2024-10-04	SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation	Aurick Qiao et.al.	2410.03960	null
2024-10-04	EXAQ: Exponent Aware Quantization For LLMs Acceleration	Moran Shkolnik et.al.	2410.03185	link
2024-10-03	LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences	Zhenxiao Fu et.al.	2410.02950	null
2024-10-03	Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration	Yun Qu et.al.	2410.02511	link
2024-10-03	LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services	Małgorzata Łazuka et.al.	2410.02425	link
2024-10-04	Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation	Xiaoqun Liu et.al.	2410.02220	null
2024-10-02	Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads	Yuxiang Huang et.al.	2410.01805	link
2024-10-02	ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving	Yifan Qiao et.al.	2410.01228	null
2024-10-01	TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices	Zonghang Li et.al.	2410.00531	link
2024-09-30	The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems	Linke Song et.al.	2409.20002	null
2024-09-26	Control Industrial Automation System with Large Language Models	Yuchen Xia et.al.	2409.18009	link
2024-09-26	Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores	Shaobo Ma et.al.	2409.17870	null
2024-09-25	Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction	Zhenmei Shi et.al.	2409.17422	link
2024-09-25	Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations	Amey Agrawal et.al.	2409.17264	null
2024-09-25	Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference	Zongyue Qin et.al.	2409.16560	null
2024-09-25	AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization	Yifan Tan et.al.	2409.16546	link
2024-09-23	Eagle: Efficient Training-Free Router for Multi-LLM Inference	Zesen Zhao et.al.	2409.15518	null
2024-09-24	UELLM: A Unified and Efficient Approach for LLM Inference Serving	Yiyuan He et.al.	2409.14961	null
2024-09-22	RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph	Linxi Wei et.al.	2409.14556	null
2024-09-16	Do Large Language Models Need a Content Delivery Network?	Yihua Cheng et.al.	2409.13761	link
2024-09-19	PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)	Mahmoud Nazzal et.al.	2409.12699	link
2024-09-12	LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs	Han Xu et.al.	2409.11424	null
2024-09-04	ISO: Overlap of Computation and Communication within Seqenence For LLM Inference	Bin Xiao et.al.	2409.11155	null
2024-09-18	RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval	Di Liu et.al.	2409.10516	link
2024-09-08	InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference	Xiurui Pan et.al.	2409.04992	null
2024-09-07	Achieving Peak Performance for Large Language Models: A Systematic Review	Zhyar Rzgar K Rostam et.al.	2409.04833	null
2024-09-06	A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage	Huan Yang et.al.	2409.04040	null
2024-09-13	Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study	Jianwei Zhu et.al.	2409.03992	null
2024-09-05	Sirius: Contextual Sparsity with Correction for Efficient LLMs	Yang Zhou et.al.	2409.03856	link
2024-08-31	HSF: Defending against Jailbreak Attacks with Hidden State Filtering	Cheng Qian et.al.	2409.03788	null
2024-09-03	Contemporary Model Compression on Large Language Models Inference	Dong Liu et.al.	2409.01990	link
2024-09-02	CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification	Junhui He et.al.	2409.01366	null
2024-09-04	Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference	Barys Liskavets et.al.	2409.01227	null
2024-09-01	Research on LLM Acceleration Using the High-Performance RISC-V Processor "Xiangshan" (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product)	Xu-Hao Chen et.al.	2409.00661	null
2024-08-28	Decentralized LLM Inference over Edge Networks with Energy Harvesting	Aria Khoshsirat et.al.	2408.15907	null
2024-08-28	Efficient LLM Scheduling by Learning to Rank	Yichao Fu et.al.	2408.15792	link
2024-08-28	Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation	Lujun Gui et.al.	2408.15562	null
2024-08-22	NanoFlow: Towards Optimal Large Language Model Serving Throughput	Kan Zhu et.al.	2408.12757	link
2024-09-04	Parallel Speculative Decoding with Adaptive Draft Length	Tianyu Liu et.al.	2408.11850	link
2024-08-21	MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Elias Frantar et.al.	2408.11743	link
2024-08-20	Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models	Artem Vazhentsev et.al.	2408.10692	null
2024-08-19	PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars	Sumanth Prabhu et.al.	2408.08869	null
2024-08-23	ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models	Chao Zeng et.al.	2408.08554	link
2024-08-14	LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference	Seungjae Moon et.al.	2408.07326	null
2024-08-12	LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration	Zhiwen Mo et.al.	2408.06003	null
2024-08-10	LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale	Jaehong Cho et.al.	2408.05499	link
2024-08-05	SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving	Andreas Kosmas Kakolyris et.al.	2408.05235	null
2024-08-08	Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning	Ke Cheng et.al.	2408.04323	null
2024-08-07	Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference	Zeyu Zhang et.al.	2408.04107	null
2024-08-07	MPC-Minimized Secure LLM Inference	Deevashwer Rathee et.al.	2408.03561	null
2024-08-05	Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning	Hao Zhou et.al.	2408.02549	null
2024-08-02	The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines	Matias Martinez et.al.	2408.01050	null
2024-08-01	DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency	Jovan Stojkovic et.al.	2408.00741	null
2024-08-01	Designing Efficient LLM Accelerators for Edge Devices	Jude Haris et.al.	2408.00462	null
2024-08-01	Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control	Hao Zhou et.al.	2408.00214	null
2024-07-23	ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency	Yuhang Yao et.al.	2408.00008	null
2024-08-01	Responsive ML inference in multi-tenanted environments using AQUA	Abhishek Vijaya Kumar et.al.	2407.21255	null
2024-07-25	An Efficient Inference Framework for Early-exit Large Language Models	Ruijie Miao et.al.	2407.20272	null
2024-07-29	Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Sania Nayab et.al.	2407.19825	null
2024-07-29	Teaching LLMs at Charles University: Assignments and Activities	Jindřich Helcl et.al.	2407.19798	null
2024-07-22	RazorAttention: Efficient KV Cache Compression Through Retrieval Heads	Hanlin Tang et.al.	2407.15891	null
2024-07-22	vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving	Jiale Xu et.al.	2407.15309	link
2024-07-19	LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference	Qichen Fu et.al.	2407.14057	null
2024-07-17	Struct-X: Enhancing Large Language Models Reasoning with Structured Data	Xiaoyu Tan et.al.	2407.12522	null
2024-07-17	LLM Inference Serving: Survey of Recent Advances and Opportunities	Baolin Li et.al.	2407.12391	null
2024-07-17	Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models	Ayush Kaushal et.al.	2407.12327	link
2024-07-16	PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation	Branden Butler et.al.	2407.11798	null
2024-07-21	Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference	Yuan Feng et.al.	2407.11550	link
2024-07-15	Fast Matrix Multiplications for Lookup Table-Quantized LLMs	Han Guo et.al.	2407.10960	link
2024-07-12	Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference	Zongyue Qin et.al.	2407.09722	null
2024-07-09	Metron: Holistic Performance Evaluation Framework for LLM Inference Systems	Amey Agrawal et.al.	2407.07000	link
2024-07-08	Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU	Daliang Xu et.al.	2407.05858	link
2024-07-07	A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length	Yuqing Yang et.al.	2407.05347	null
2024-07-05	Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design	Yiyang Huang et.al.	2407.04292	link
2024-07-04	Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems	Grant Wilkins et.al.	2407.04014	null
2024-07-02	MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Huiqiang Jiang et.al.	2407.02490	link
2024-06-29	Teola: Towards End-to-End Optimization of LLM-based Applications	Xin Tan et.al.	2407.00326	null
2024-06-25	T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge	Jianyu Wei et.al.	2407.00088	link
2024-06-28	InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management	Wonbeom Lee et.al.	2406.19707	null
2024-06-24	Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters	Euiin Yi et.al.	2406.16758	link
2024-06-28	SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention	Qianchao Zhu et.al.	2406.15486	null
2024-06-21	Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models	Qi Liu et.al.	2406.14848	link
2024-06-20	Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data	Johannes Treutlein et.al.	2406.14546	link
2024-06-20	LiveMind: Low-latency Large Language Models with Simultaneous Inference	Chuangtao Chen et.al.	2406.14319	link
2024-06-19	SDQ: Sparse Decomposed Quantization for LLM Inference	Geonhwa Jeong et.al.	2406.13868	null
2024-06-19	Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style	Zeping Li et.al.	2406.13170	null
2024-06-16	Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization	Jungi Lee et.al.	2406.12930	null
2024-06-18	LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization	Masafumi Enomoto et.al.	2406.12494	null
2024-06-18	LLMs Are Prone to Fallacies in Causal Inference	Nitish Joshi et.al.	2406.12158	null
2024-06-14	Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning	Hui Liu et.al.	2406.11890	null
2024-06-17	Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference	Donghyeon Joo et.al.	2406.11674	null
2024-06-17	QTIP: Quantization with Trellises and Incoherence Processing	Albert Tseng et.al.	2406.11235	link
2024-06-16	New Solutions on LLM Acceleration, Optimization, and Application	Yingbing Huang et.al.	2406.10903	null
2024-06-16	Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference	Jiaming Tang et.al.	2406.10774	link
2024-06-15	Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study	Hao Hao et.al.	2406.10675	link
2024-06-08	QCQA: Quality and Capacity-aware grouped Query Attention	Vinay Joshi et.al.	2406.10247	null
2024-06-12	Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference	Christopher Wolters et.al.	2406.08413	null
2024-06-12	PowerInfer-2: Fast Large Language Model Inference on a Smartphone	Zhenliang Xue et.al.	2406.06282	null
2024-06-09	A Superalignment Framework in Autonomous Driving with Large Language Models	Xiangrui Kong et.al.	2406.05651	null
2024-06-06	Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism	Jiahao Liu et.al.	2406.03853	null
2024-06-04	Language Models can Infer Action Semantics for Classical Planners from Environment Feedback	Wang Zhu et.al.	2406.02791	null
2024-06-08	Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach	Yuxuan Chen et.al.	2406.02616	null
2024-06-04	SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices	Ruslan Svirschevski et.al.	2406.02532	link
2024-06-03	Demystifying Platform Requirements for Diverse LLM Inference Use Cases	Abhimanyu Bambhaniya et.al.	2406.01698	link
2024-06-03	PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration	Ziqian Zeng et.al.	2406.01394	null
2024-06-01	A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation	Dugang Liu et.al.	2406.00333	null
2024-05-31	No Free Lunch Theorem for Privacy-Preserving LLM Inference	Xiaojin Zhang et.al.	2405.20681	null
2024-05-30	Decentralized AI: Permissionless LLM Inference on POKT Network	Daniel Olshansky et.al.	2405.20450	null
2024-06-01	S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs	Wei Zhong et.al.	2405.20314	null
2024-05-30	Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models	Yuxiao Luo et.al.	2405.19850	null
2024-05-29	MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models	Taehyun Kim et.al.	2405.18832	null
2024-05-29	PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN	Fei Zheng et.al.	2405.18744	null
2024-06-02	Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference	Hao Mark Chen et.al.	2405.18628	link
2024-05-25	FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference	Chenqi Lin et.al.	2405.16241	null
2024-05-23	EdgeShard: Efficient LLM Inference via Collaborative Edge Computing	Mingjin Zhang et.al.	2405.14371	null
2024-05-23	MiniCache: KV Cache Compression in Depth Dimension for Large Language Models	Akide Liu et.al.	2405.14366	null
2024-05-21	PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference	Dongjie Yang et.al.	2405.12532	null
2024-05-12	Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization	Xinyuan Zhang et.al.	2405.07140	null
2024-05-11	Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving	Chengyi Nie et.al.	2405.06856	null
2024-05-21	Vidur: A Large-Scale Simulation Framework For LLM Inference	Amey Agrawal et.al.	2405.05465	link
2024-05-13	KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation	Minsik Cho et.al.	2405.05329	null
2024-05-12	DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature	Dawei Li et.al.	2405.04819	link
2024-05-10	QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving	Yujun Lin et.al.	2405.04532	link
2024-05-07	vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention	Ramya Prabhu et.al.	2405.04437	null
2024-05-07	Optimizing Language Model's Reasoning Abilities with Weak Supervision	Yongqi Tong et.al.	2405.04086	null
2024-05-06	AlphaMath Almost Zero: process Supervision without process	Guoxin Chen et.al.	2405.03553	link
2024-05-03	Efficient and Economic Large Language Model Inference with Attention Offloading	Shaoyuan Chen et.al.	2405.01814	null

MoE

Publish Date	Title	Authors	PDF	Code
2025-01-22	Autonomy-of-Experts Models	Ang Lv et.al.	2501.13074	null
2025-01-22	LLM4WM: Adapting LLM for Wireless Multi-Tasking	Xuanyu Liu et.al.	2501.12983	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR	Guodong Ma et.al.	2501.12602	null
2025-01-21	Modality Interactive Mixture-of-Experts for Fake News Detection	Yifan Liu et.al.	2501.12431	null
2025-01-21	SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection	Xiaocheng Zhang et.al.	2501.12430	null
2025-01-21	Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models	Samira Abnar et.al.	2501.12370	null
2025-01-21	MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks	Qishen Zhou et.al.	2501.12281	link
2025-01-21	Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models	Zihan Qiu et.al.	2501.11873	null
2025-01-18	FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models	Xinglin Pan et.al.	2501.10714	null
2025-01-17	OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning	Jinyuan Feng et.al.	2501.10062	null
2025-01-17	LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading	Kuan-Ming Liu et.al.	2501.09636	null
2025-01-14	MiniMax-01: Scaling Foundation Models with Lightning Attention	MiniMax et.al.	2501.08313	null
2025-01-14	GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism	Chen Tang et.al.	2501.07890	null
2025-01-18	PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration	Xiaoshui Huang et.al.	2501.07762	null
2025-01-13	A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis	Binyu Zhang et.al.	2501.07016	link
2025-01-12	Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning	Hanwen Zhong et.al.	2501.06884	link
2025-01-10	TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning	Yinghao Zhu et.al.	2501.05661	link
2025-01-09	Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing	Mengfan Liu et.al.	2501.05313	null
2025-01-07	LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes	Xiang Xu et.al.	2501.04004	link
2025-01-07	mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training	Xudong Liao et.al.	2501.03905	null
2025-01-08	Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection	Donatella Genovese et.al.	2501.03432	null
2025-01-12	Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning	Zhongyi Zhou et.al.	2501.02198	null
2025-01-03	MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders	Jiajun Cao et.al.	2501.01709	null
2025-01-01	REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization	Huyen Nguyen et.al.	2501.00779	null
2025-01-06	Superposition in Transformers: A Novel Way of Building Mixture of Experts	Ayoub Ben Chaliah et.al.	2501.00530	link
2024-12-31	CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection	Xiaolei Wang et.al.	2501.00346	null
2024-12-29	Multimodal Variational Autoencoder: a Barycentric View	Peijie Qiu et.al.	2412.20487	null
2024-12-29	A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement	Sidra Nasir et.al.	2412.20468	null
2024-12-28	UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity	Jingbo Lin et.al.	2412.20157	link
2024-12-28	Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection	Yaning Zhang et.al.	2412.20156	null
2024-12-27	DeepSeek-V3 Technical Report	DeepSeek-AI et.al.	2412.19437	link
2024-12-26	AskChart: Universal Chart Understanding through Textual Enhancement	Xudong Yang et.al.	2412.19146	link
2024-12-30	Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection	Xiaoyu Huang et.al.	2412.19108	null
2024-12-24	Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making	David Shoresh et.al.	2412.18593	link
2024-12-24	BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing	Yingjie Ma et.al.	2412.18065	link
2024-12-23	UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition	Li Fu et.al.	2412.17507	null
2024-12-23	BrainMAP: Learning Multiple Activation Pathways in Brain Networks	Song Wang et.al.	2412.17404	null
2024-12-22	Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models	Elie Antoine et.al.	2412.16971	null
2024-12-20	Theory of Mixture-of-Experts for Mobile Edge Computing	Hongbo Li et.al.	2412.15690	null
2024-12-19	MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale	Swapnil Gandhi et.al.	2412.15411	null
2024-12-19	Qwen2.5 Technical Report	Qwen et.al.	2412.15115	link
2024-12-19	ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing	Ziteng Wang et.al.	2412.14711	link
2024-12-18	A Survey on Inference Optimization Techniques for Mixture of Experts Models	Jiacheng Liu et.al.	2412.14219	link
2024-12-18	SEKE: Specialised Experts for Keyword Extraction	Matej Martinc et.al.	2412.14087	link
2024-12-18	MedCoT: Medical Chain of Thought via Hierarchical Expert	Jiaxiang Liu et.al.	2412.13736	link
2024-12-17	SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks	Mátyás Vincze et.al.	2412.13053	null
2024-12-17	Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning	Moritz Reuss et.al.	2412.12953	null
2024-12-17	CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition	He Wang et.al.	2412.12760	null
2024-12-16	Investigating Mixture of Experts in Dense Retrieval	Effrosyni Sokli et.al.	2412.11864	null
2024-12-18	Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture	Jingze Shi et.al.	2412.11834	link
2024-12-16	Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation	Svetlana Pavlitska et.al.	2412.11608	null
2024-12-16	Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture	Jingyu Xu et.al.	2412.11557	null
2024-12-14	DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification	Yuhao Wang et.al.	2412.10650	link
2024-12-13	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Zhiyu Wu et.al.	2412.10302	link
2024-12-13	Llama 3 Meets MoE: Efficient Upcycling	Aditya Vavre et.al.	2412.09952	link
2024-12-12	Memory Layers at Scale	Vincent-Pierre Berges et.al.	2412.09764	link
2024-12-12	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang et.al.	2412.09278	link
2024-12-12	Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective	Minh Le et.al.	2412.08285	null
2024-12-11	Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification	Xuanze Chen et.al.	2412.08193	null
2024-12-10	MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems	Yao Fu et.al.	2412.07067	null
2024-12-07	Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts	Arturo Rodriguez et.al.	2412.06842	null
2024-12-09	Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset	Xiao Wang et.al.	2412.06647	link
2024-12-09	UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts	Zhen Wan et.al.	2412.06340	null
2024-12-08	Hallucination-aware Optimization for Large Language Model-empowered Communications	Yinqiu Liu et.al.	2412.06007	link
2024-12-10	An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism	Qing Zhang et.al.	2412.05821	null
2024-12-10	RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts	Xu Liu et.al.	2412.05679	link
2024-12-07	SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts	Gengze Zhou et.al.	2412.05552	link
2024-12-07	Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers	Boxun Xu et.al.	2412.05540	null
2024-12-06	Steps are all you need: Rethinking STEM Education with Prompt Engineering	Krishnasai Addala et.al.	2412.05023	null
2024-12-09	Monet: Mixture of Monosemantic Experts for Transformers	Jungwoo Park et.al.	2412.04139	link
2024-12-05	Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks	Zhaoyang Liu et.al.	2412.03850	null
2024-12-04	Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond	Loukas Ilias et.al.	2412.03483	null
2024-12-05	MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption	Siddhant Dutta et.al.	2412.01858	null
2024-12-05	Yi-Lightning Technical Report	01.AI et.al.	2412.01253	null
2024-11-30	Mixture of Experts for Node Classification	Yu Shi et.al.	2412.00418	null
2024-11-30	HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting	Shaohan Yu et.al.	2412.00316	null
2024-11-27	Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference	Andrii Skliar et.al.	2412.00099	null
2024-11-29	LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References	Shuguo Jiang et.al.	2411.19758	null
2024-11-28	On the effectiveness of discrete representations in sparse mixture of experts	Giang Do et.al.	2411.19402	null
2024-11-28	Bayesian Cluster Weighted Gaussian Models	Panagiotis Papastamoulis et.al.	2411.18957	link
2024-11-27	UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS	Haomin Zhuang et.al.	2411.18797	null
2024-11-27	Complexity Experts are Task-Discriminative Learners for Any Image Restoration	Eduard Zamfir et.al.	2411.18466	null
2024-11-27	Mixture of Experts in Image Classification: What's the Sweet Spot?	Mathurin Videau et.al.	2411.18322	null
2024-11-26	$H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs	Selim Furkan Tekin et.al.	2411.17792	link
2024-11-25	Staleness-Centric Optimizations for Efficient Diffusion MoE Inference	Jiajun Luo et.al.	2411.16786	null
2024-11-29	MH-MoE: Multi-Head Mixture-of-Experts	Shaohan Huang et.al.	2411.16205	null
2024-11-25	LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy	Peng Cui et.al.	2411.16095	null
2024-11-24	Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution	Haiquan Wang et.al.	2411.15871	null
2024-11-24	LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training	Xiaoye Qu et.al.	2411.15708	link
2024-11-23	Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts	Qizhou Chen et.al.	2411.15432	null
2024-11-23	Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation	Fahao Chen et.al.	2411.15419	null
2024-11-20	MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification	Yuxuan Chen et.al.	2411.13004	null
2024-11-23	KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning	Ming Yin et.al.	2411.12950	null
2024-11-19	Ultra-Sparse Memory Network	Zihao Huang et.al.	2411.12364	null
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-16	Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts	Jinqiang Long et.al.	2411.10669	link
2024-11-15	Weakly-Supervised Multimodal Learning on MIMIC-CXR	Andrea Agostini et.al.	2411.10356	link
2024-11-21	Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models	Wei Wang et.al.	2411.10003	null
2024-11-13	Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection	Vima Gupta et.al.	2411.08982	null
2024-11-13	Sparse Upcycling: Inference Inefficient Finetuning	Sasha Doubov et.al.	2411.08968	null
2024-11-13	LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing	Xiaonan Nie et.al.	2411.08446	null
2024-11-12	Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach	Renzi Wang et.al.	2411.08232	null
2024-11-12	PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model	Yilun Liu et.al.	2411.08212	null
2024-11-12	Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge	Emmanuel Azuh Mensah et.al.	2411.07834	null
2024-11-11	Adaptive Conditional Expert Selection Network for Multi-domain Recommendation	Kuiyao Dong et.al.	2411.06826	null
2024-11-11	WDMoE: Wireless Distributed Mixture of Experts for Large Language Models	Nan Xue et.al.	2411.06681	null
2024-11-09	Learning Mixtures of Experts with EM	Quentin Fruytier et.al.	2411.06056	null
2024-11-08	NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts	Yen-Ting Lin et.al.	2411.05945	null
2024-11-05	DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts	Zelin Yao et.al.	2411.03025	link
2024-11-05	Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts	Yuan Xie et.al.	2411.02787	null
2024-11-06	Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent	Xingwu Sun et.al.	2411.02265	null
2024-11-04	FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation	Ziwei Zhan et.al.	2411.02115	null
2024-11-03	RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering	Hui Lin et.al.	2411.01595	null
2024-11-03	Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation	Mingrui Liu et.al.	2411.01457	null
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-11-07	HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy	Shuqing Luo et.al.	2411.01288	link
2024-11-02	PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment	Dongxu Liu et.al.	2411.01245	null
2024-11-01	MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition	Cheng Yang et.al.	2411.01016	null
2024-11-01	LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nam V. Nguyen et.al.	2411.00918	link
2024-11-01	MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization	Jingming Guo et.al.	2411.00662	link
2024-10-31	Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts	Xiang Deng et.al.	2410.23836	null
2024-10-30	Efficient and Interpretable Grammatical Error Correction with Mixture of Experts	Muhammad Reza Qorib et.al.	2410.23507	link
2024-10-30	Stealing User Prompts from Mixture of Experts	Itay Yona et.al.	2410.22884	null
2024-10-30	MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning	Xujia Wang et.al.	2410.22782	null
2024-10-29	ProMoE: Fast MoE-based LLM Serving using Proactive Caching	Xiaoniu Song et.al.	2410.22134	null
2024-10-29	Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging	Li Shen et.al.	2410.21804	null
2024-10-29	Neural Experts: Mixture of Experts for Implicit Neural Representations	Yizhak Ben-Shabat et.al.	2410.21643	null
2024-10-28	FinTeamExperts: Role Specialized MOEs For Financial Analysis	Yue Yu et.al.	2410.21338	null
2024-10-28	Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving	Jiyao Wang et.al.	2410.21086	null
2024-10-27	Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation	Maohao Shen et.al.	2410.20336	null
2024-10-27	GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields	Yusuke Sekikawa et.al.	2410.20306	null
2024-10-25	DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction	Zelin Zang et.al.	2410.19504	link
2024-10-25	Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis	Weikai Li et.al.	2410.19225	link
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-24	Mixture of Parrots: Experts improve memorization more than reasoning	Samy Jelassi et.al.	2410.19034	null
2024-10-24	MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases	Zhisheng Lin et.al.	2410.18406	null
2024-10-23	Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches	Kexin Feng et.al.	2410.18298	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2024-10-23	ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference	Xin He et.al.	2410.17954	null
2024-10-23	Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition	Artem Basharin et.al.	2410.17765	null
2024-10-22	Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling	Jialong Li et.al.	2410.17043	null
2024-10-21	LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset	Ruikun Zhang et.al.	2410.16095	link
2024-10-22	CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts	Zhenpeng Su et.al.	2410.16077	link
2024-10-21	Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Qiao Sun et.al.	2410.15774	link
2024-10-21	ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts	Xumeng Han et.al.	2410.15732	null
2024-10-20	Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs	Xin Zhou et.al.	2410.15438	null
2024-10-20	LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration	Yuang Ai et.al.	2410.15385	link
2024-10-19	MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning	Suning Huang et.al.	2410.14972	null
2024-10-18	MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts	Rachel S. Y. Teo et.al.	2410.14574	link
2024-10-18	ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction	Haoyu He et.al.	2410.14099	link
2024-10-17	Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks	Jinze Zhao et.al.	2410.13964	null
2024-10-16	On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs	Herun Wan et.al.	2410.12600	null
2024-10-16	Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts	Fanqi Yan et.al.	2410.12258	null
2024-10-16	EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference	Yulei Qian et.al.	2410.12247	null
2024-10-15	MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router	Yanyue Xie et.al.	2410.12013	null
2024-10-15	MoH: Multi-Head Attention as Mixture-of-Head Attention	Peng Jin et.al.	2410.11842	link
2024-10-15	GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation	Fei Tang et.al.	2410.11841	link
2024-10-15	Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models	James Vo et.al.	2410.11654	null
2024-10-16	Quadratic Gating Functions in Mixture of Experts: A Statistical Insight	Pedram Akbarian et.al.	2410.11222	null
2024-10-16	Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free	Ziyue Li et.al.	2410.10814	link
2024-10-14	Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts	Guorui Zheng et.al.	2410.10626	link
2024-10-14	Learning to Ground VLMs without Forgetting	Aritra Bhowmik et.al.	2410.10491	null
2024-10-14	Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts	Xu Liu et.al.	2410.10469	null
2024-10-15	Ada-K Routing: Boosting the Efficiency of MoE-based LLMs	Tongtian Yue et.al.	2410.10456	null
2024-10-14	Tighter Risk Bounds for Mixtures of Experts	Wissam Akretche et.al.	2410.10397	null
2024-10-14	Scalable Multi-Domain Adaptation of Language Models using Modular Experts	Peter Schafhalter et.al.	2410.10181	null
2024-10-14	Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models	Jun Luo et.al.	2410.10114	null
2024-10-14	AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality	Peijun Qing et.al.	2410.10054	link
2024-10-13	ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL	Zhanqiu Guo et.al.	2410.09781	null
2024-10-11	Semi-Supervised Learning of Noisy Mixture of Experts Models	Oh-Ran Kwon et.al.	2410.09039	null
2024-10-11	Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering	I-Chun Chen et.al.	2410.08589	link
2024-10-10	Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts	Sukwon Yun et.al.	2410.08245	link
2024-10-10	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	Gen Luo et.al.	2410.08202	null
2024-10-10	Efficient Dictionary Learning with Switch Sparse Autoencoders	Anish Mudide et.al.	2410.08201	link
2024-10-10	More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing	Sagi Shaier et.al.	2410.08003	null
2024-10-10	SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture	Jiayi Han et.al.	2410.07739	null
2024-10-10	Upcycling Large Language Models into Mixture of Experts	Ethan He et.al.	2410.07524	null
2024-10-09	MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts	Peng Jin et.al.	2410.07348	link
2024-10-09	Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders	David Noever et.al.	2410.06462	null
2024-10-09	Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs	Ruijia Niu et.al.	2410.06431	null
2024-10-08	Probing the Robustness of Theory of Mind in Large Language Models	Christian Nickel et.al.	2410.06271	null
2024-10-08	MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More	Wei Huang et.al.	2410.06270	link
2024-10-08	Aria: An Open Multimodal Native Mixture-of-Experts Model	Dongxu Li et.al.	2410.05993	link
2024-10-08	Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models	Siqi Wang et.al.	2410.05661	null
2024-10-07	Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild	Xinyu Zhao et.al.	2410.05357	link
2024-10-07	Multimodal Fusion Strategies for Mapping Biophysical Landscape Features	Lucia Gordon et.al.	2410.04833	link
2024-10-06	Realizing Video Summarization from the Path of Language-based Semantic Understanding	Kuan-Chen Mu et.al.	2410.04511	null
2024-10-09	Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding	Wei Wu et.al.	2410.03553	null
2024-10-04	Exploring the Benefit of Activation Sparsity in Pre-training	Zhengyan Zhang et.al.	2410.03440	link
2024-10-03	MLP-KAN: Unifying Deep Representation and Function Learning	Yunhong He et.al.	2410.03027	link
2024-10-03	On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions	Huy Nguyen et.al.	2410.02935	null
2024-10-03	Neutral residues: revisiting adapters for model extension	Franck Signe Talla et.al.	2410.02744	null
2024-10-03	Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping	Ziye Huang et.al.	2410.02475	null
2024-10-03	MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction	Zhaojian Yu et.al.	2410.02241	null
2024-10-03	Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts	Minh Le et.al.	2410.02200	null
2024-10-04	Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices	Andres Potapczynski et.al.	2410.02117	link
2024-10-04	EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing	Haotian Sun et.al.	2410.02098	null
2024-10-02	Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL	Ghada Sokar et.al.	2410.01930	null
2024-10-02	Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models	Shayekh Bin Islam et.al.	2410.01782	link
2024-10-02	Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging	Tingfeng Hui et.al.	2410.01610	null
2024-10-02	The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Hong Li et.al.	2410.01417	null
2024-10-01	MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards	Sheng Wang et.al.	2410.00938	null
2024-10-01	UniAdapt: A Universal Adapter for Knowledge Calibration	Tai D. Nguyen et.al.	2410.00454	null
2024-10-01	Robust Traffic Forecasting against Spatial Shift over Years	Hongjun Wang et.al.	2410.00373	link
2024-09-29	IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method	Chaohui Xu et.al.	2410.00059	null
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566	null
2024-10-02	CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Jihai Zhang et.al.	2409.19291	link
2024-09-27	SciDFM: A Large Language Model with Mixture-of-Experts for Science	Liangtai Sun et.al.	2409.18412	null
2024-09-26	Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Xun Zhu et.al.	2409.17508	link
2024-09-26	A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction	Guangyu Wang et.al.	2409.17440	link
2024-09-24	Leveraging Mixture of Experts for Improved Speech Deepfake Detection	Viola Negroni et.al.	2409.16077	null
2024-10-02	Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts	Xiaoming Shi et.al.	2409.16040	link
2024-09-24	Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM	Fengrun Zhang et.al.	2409.15905	null
2024-09-24	Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks	Jiayi He et.al.	2409.15695	null
2024-09-23	A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts	Hugo Inzirillo et.al.	2409.15161	link
2024-09-23	Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond	Hong Chen et.al.	2409.14993	null
2024-09-21	Routing in Sparsely-gated Language Models responds to Context	Stefan Arnold et.al.	2409.14107	null
2024-09-20	On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists	Dongyang Fan et.al.	2409.13931	link
2024-09-20	Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning	Annette Spooner et.al.	2409.13791	null
2024-09-19	Robust Audiovisual Speech Recognition Models with Mixture-of-Experts	Yihan Wu et.al.	2409.12370	null
2024-09-18	GRIN: GRadient-INformed MoE	Liyuan Liu et.al.	2409.12136	null
2024-09-18	Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0	Zhiyong Wang et.al.	2409.11909	null
2024-09-17	LPT++: Efficient Training on Mixture of Long-tailed Experts	Bowen Dong et.al.	2409.11323	null
2024-09-19	LOLA -- An Open-Source Massively Multilingual Large Language Model	Nikit Srivastava et.al.	2409.11272	link
2024-09-16	Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression	Yi-Hsin Li et.al.	2409.10101	null
2024-09-14	MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Enming Zhang et.al.	2409.07267	link
2024-09-10	DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models	Maryam Akhavan Aghdam et.al.	2409.06669	null
2024-09-10	STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Jaeseong Lee et.al.	2409.06211	null
2024-09-10	VE: Modeling Multivariate Time Series Correlation with Variate Embedding	Shangjiong Wang et.al.	2409.06169	link
2024-09-09	Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models	Hongyang Lei et.al.	2409.05929	null
2024-09-09	Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks	Bo Xu et.al.	2409.05726	null
2024-09-09	Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection	Tianwu Lei et.al.	2409.05611	null
2024-09-05	Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions	Zemian Ke et.al.	2409.03282	null
2024-09-05	ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding	Zhengzhuo Xu et.al.	2409.03277	null
2024-09-05	xLAM: A Family of Large Action Models to Empower AI Agent Systems	Jianguo Zhang et.al.	2409.03215	link
2024-09-04	Configurable Foundation Models: Building LLMs from a Modular Perspective	Chaojun Xiao et.al.	2409.02877	null
2024-09-04	Pluralistic Salient Object Detection	Xuelu Feng et.al.	2409.02368	null
2024-09-03	OLMoE: Open Mixture-of-Experts Language Models	Niklas Muennighoff et.al.	2409.02060	link
2024-09-05	Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model	Hukai Huang et.al.	2409.02050	null
2024-09-02	Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning	Soumajyoti Sarkar et.al.	2409.01483	null
2024-09-02	Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching	Sungmin Yun et.al.	2409.01141	null
2024-09-04	Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack	Guanzhong Chen et.al.	2409.00960	link
2024-09-02	Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts	Youngseog Chung et.al.	2409.00879	null
2024-08-29	Gradient-free variational learning with conditional mixture networks	Conor Heins et.al.	2408.16429	link
2024-08-28	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Yuncheng Yang et.al.	2408.15915	link
2024-08-28	Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts	Nikolas Gritsch et.al.	2408.15901	null
2024-08-28	LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	Fangxun Shu et.al.	2408.15881	link
2024-08-28	Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts	Lean Wang et.al.	2408.15664	null
2024-08-27	Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis	Sakhinana Sagar Srinivas et.al.	2408.15305	null
2024-08-27	MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce	Hao Jiang et.al.	2408.14968	null
2024-08-24	Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings	Sagar Srinivas Sakhinana et.al.	2408.13622	null
2024-08-23	The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities	Venkatesh Balavadhani Parthasarathy et.al.	2408.13296	null
2024-08-23	Guiding IoT-Based Healthcare Alert Systems with Large Language Models	Yulan Gao et.al.	2408.13071	null
2024-08-23	DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation	Xiaowei Mao et.al.	2408.12809	null
2024-08-23	Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth	Yuxiang Wei et.al.	2408.12803	null
2024-08-23	La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection	Hang Zou et.al.	2408.12793	null
2024-08-22	SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging	Mohammadreza Pourreza et.al.	2408.12733	null
2024-08-22	Jamba-1.5: Hybrid Transformer-Mamba Models at Scale	Jamba Team et.al.	2408.12570	null
2024-08-22	Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators	Dingkang Yang et.al.	2408.12325	link
2024-08-21	MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing	Hao Zhou et.al.	2408.11396	link
2024-08-21	KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting?	Xiao Han et.al.	2408.11306	link
2024-08-21	FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts	Hanzi Mei et.al.	2408.11304	null
2024-08-20	Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data	Atmika Gorti et.al.	2408.11247	null
2024-08-20	Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting	Jianxiang Zhou et.al.	2408.10822	link
2024-08-20	AnyGraph: Graph Foundation Model in the Wild	Lianghao Xia et.al.	2408.10700	link
2024-08-20	HMoE: Heterogeneous Mixture of Experts for Language Modeling	An Wang et.al.	2408.10681	null
2024-08-19	AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2408.10284	link
2024-08-17	FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models	Xiaochen Wang et.al.	2408.10276	link
2024-08-19	Customizing Language Models with Instance-wise LoRA for Sequential Recommendation	Xiaoyu Kong et.al.	2408.10159	link
2024-08-19	A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method	Hang Zou et.al.	2408.09752	null
2024-08-16	Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection	Haohao Zhu et.al.	2408.08551	null
2024-08-17	BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts	Qizhen Zhang et.al.	2408.08274	null
2024-08-14	Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation	CanYi Liu et.al.	2408.07427	null
2024-08-13	A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning	Prateek Yadav et.al.	2408.07057	null
2024-08-13	Layerwise Recurrent Router for Mixture-of-Experts	Zihan Qiu et.al.	2408.06793	link
2024-08-13	AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies	Bo-Wen Zhang et.al.	2408.06567	null
2024-08-10	HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou	Xu Wang et.al.	2408.05430	null
2024-08-08	Understanding the Performance and Estimating the Cost of LLM Fine-Tuning	Yuchen Xia et.al.	2408.04693	link
2024-08-08	Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training	Weilin Cai et.al.	2408.04307	null
2024-08-08	LaDiMo: Layer-wise Distillation Inspired MoEfier	Sungyoon Kim et.al.	2408.04278	null
2024-08-07	MoExtend: Tuning New Experts for Modality and Task Extension	Shanshan Zhong et.al.	2408.03511	link
2024-08-05	Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization	Changtao Miao et.al.	2408.02306	null
2024-08-02	HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction	Xingyu Lou et.al.	2408.01332	null
2024-08-01	Multimodal Fusion and Coherence Modeling for Video Topic Segmentation	Hai Yu et.al.	2408.00365	null
2024-08-12	MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts	Xi Victoria Lin et.al.	2407.21770	null
2024-07-31	PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning	Min Jae Jung et.al.	2407.21571	null
2024-07-30	Distribution Learning for Molecular Regression	Nima Shoghi et.al.	2407.20475	null
2024-07-29	Time series forecasting with high stakes: A field study of the air cargo industry	Abhinav Garg et.al.	2407.20192	null
2024-07-30	Mixture of Nested Experts: Adaptive Processing of Visual Tokens	Gagan Jain et.al.	2407.19985	null
2024-07-28	Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models	Mohammed Al-Maamari et.al.	2407.19610	link
2024-07-26	Wolf: Captioning Everything with a World Summarization Framework	Boyi Li et.al.	2407.18908	null
2024-07-26	MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition	Chang Liu et.al.	2407.18616	link
2024-07-26	Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition	Hukai Huang et.al.	2407.18581	link
2024-07-25	How Lightweight Can A Vision Transformer Be	Jen Hong Tan et.al.	2407.17783	null
2024-07-24	Exploring Domain Robust Lightweight Reward Models based on Router Mechanism	Hyuk Namgoong et.al.	2407.17546	null
2024-07-24	M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis	Junyu Li et.al.	2407.17267	link
2024-07-25	Cheems: Wonderful Matrices More Efficient and More Effective Architecture	Jingze Shi et.al.	2407.16958	null
2024-07-22	Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget	Vikash Sehwag et.al.	2407.15811	link
2024-07-22	Norface: Improving Facial Expression Analysis by Identity Normalization	Hanwei Liu et.al.	2407.15617	link
2024-07-19	Mixture of Experts with Mixture of Precisions for Tuning Quality of Service	HamidReza Imani et.al.	2407.14417	null
2024-07-19	EVLM: An Efficient Vision-Language Model for Visual Understanding	Kaibing Chen et.al.	2407.14177	null
2024-07-19	Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models	Qiong Wu et.al.	2407.14093	null
2024-07-18	Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts	Francesco Folino et.al.	2407.13526	null
2024-07-18	Mixture of Experts based Multi-task Supervise Learning from Crowds	Tao Han et.al.	2407.13268	null
2024-07-15	MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration	Yulin Ren et.al.	2407.10833	null
2024-07-18	Qwen2 Technical Report	An Yang et.al.	2407.10671	link
2024-07-15	Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering	Francesco Di Sario et.al.	2407.10389	null
2024-07-13	Low-Rank Interconnected Adaptation Across Layers	Yibo Zhong et.al.	2407.09946	link
2024-07-13	MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts	Zhenpeng Su et.al.	2407.09816	link
2024-07-12	Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts	Zeliang Zhang et.al.	2407.09590	null
2024-07-11	An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio	Siding Zeng et.al.	2407.08239	null
2024-07-10	MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations	Vignesh Prasad et.al.	2407.07636	link
2024-07-10	Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation	Szymon Płotka et.al.	2407.07514	link
2024-07-09	A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts	Atilla Özgür et.al.	2407.06718	null
2024-07-06	SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation	Guoan Wang et.al.	2407.04938	null
2024-07-06	Completed Feature Disentanglement Learning for Multimodal MRIs Analysis	Tianling Liu et.al.	2407.04916	null
2024-07-05	YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation	Sungkyun Chang et.al.	2407.04822	link
2024-07-05	Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement	Yongji Wu et.al.	2407.04656	null
2024-07-05	MobileFlow: A Multimodal LLM For Mobile GUI Agent	Songqin Nong et.al.	2407.04346	null
2024-07-04	Mixture of A Million Experts	Xu Owen He et.al.	2407.04153	null
2024-07-02	Terminating Differentiable Tree Experts	Jonathan Thomm et.al.	2407.02060	null
2024-07-05	Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models	Zihan Wang et.al.	2407.01906	link
2024-07-01	Uncertainty Quantification in Table Structure Recognition	Kehinde Ajayi et.al.	2407.01731	link
2024-07-01	Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning	Yixiao Wang et.al.	2407.01531	null
2024-07-01	Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation	Nadezhda Chirkova et.al.	2407.01126	null
2024-07-01	Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs	Enshu Liu et.al.	2407.00945	link
2024-07-03	Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules	Xinglin Pan et.al.	2407.00599	link
2024-06-28	One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts	Ruochen Wang et.al.	2407.00256	link
2024-06-28	LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models	Renzhi Wang et.al.	2406.20030	null
2024-06-28	Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model	Longrong Yang et.al.	2406.19905	link
2024-06-28	SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR	Qiuming Zhao et.al.	2406.19706	link
2024-06-27	A Teacher Is Worth A Million Instructions	Nikhil Kothari et.al.	2406.19112	null
2024-06-27	Towards Personalized Federated Multi-scenario Multi-task Recommendation	Yue Ding et.al.	2406.18938	null
2024-06-26	Mixture of Experts in a Mixture of RL settings	Timon Willi et.al.	2406.18420	null
2024-06-26	A Closer Look into Mixture-of-Experts in Large Language Models	Ka Man Lo et.al.	2406.18219	link
2024-06-26	SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR	Shuaishuai Ye et.al.	2406.18021	null
2024-06-24	Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction	Bruce Rushing et.al.	2406.17150	link
2024-06-24	LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training	Tong Zhu et.al.	2406.16554	link
2024-06-25	OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser	Jingze Shi et.al.	2406.16495	link
2024-06-24	Theory on Mixture-of-Experts in Continual Learning	Hongbo Li et.al.	2406.16437	null
2024-06-22	SimSMoE: Solving Representational Collapse via Similarity Measure	Giang Do et.al.	2406.15883	null
2024-06-20	Voice Disorder Analysis: a Transformer-based Approach	Alkis Koudounas et.al.	2406.14693	link
2024-06-19	Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation	Qian Chen et.al.	2406.13583	null
2024-06-19	AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models	Zihao Zeng et.al.	2406.13233	link
2024-06-18	Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts	Haoxiang Wang et.al.	2406.12845	link
2024-06-18	P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts	Yuhao Dan et.al.	2406.12548	null
2024-06-18	Variational Distillation of Diffusion Policies into Mixture of Experts	Hongyi Zhou et.al.	2406.12538	null
2024-06-18	GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory	Haoze Wu et.al.	2406.12375	link
2024-06-17	Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding	Ukyo Honda et.al.	2406.12060	link
2024-06-17	DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence	DeepSeek-AI et.al.	2406.11931	link
2024-06-17	Graph Knowledge Distillation to Mixture of Experts	Pavel Rumiantsev et.al.	2406.11919	link
2024-06-17	MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts	Guanjie Chen et.al.	2406.11353	link
2024-06-17	Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts	Tong Zhu et.al.	2406.11256	link
2024-06-14	Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion	Anke Tang et.al.	2406.09770	link
2024-06-13	DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts	Joel Ong et.al.	2406.08742	link
2024-06-12	Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark	Pingzhi Li et.al.	2406.08155	link
2024-06-11	Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters	Yixin Song et.al.	2406.05955	null
2024-06-08	Flexible and Adaptable Summarization via Expertise Separation	Xiuying Chen et.al.	2406.05360	link
2024-06-07	MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter	Jitai Hao et.al.	2406.04984	link
2024-06-07	MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks	Xingkui Zhu et.al.	2406.04801	link
2024-06-05	Style Mixture of Experts for Expressive Text-To-Speech Synthesis	Ahad Jawaid et.al.	2406.03637	null
2024-06-05	Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach	Haoyu Han et.al.	2406.03464	null
2024-06-05	Continual Traffic Forecasting via Mixture of Experts	Sanghyun Lee et.al.	2406.03140	null
2024-06-05	Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models	Raeid Saqur et.al.	2406.02969	null
2024-06-04	Parrot: Multilingual Visual Instruction Tuning	Hai-Long Sun et.al.	2406.02539	link
2024-06-04	Demystifying the Compression of Mixture-of-Experts Through a Unified Framework	Shwai He et.al.	2406.02500	link
2024-06-02	Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model	Clement Etienam et.al.	2406.00889	link
2024-06-01	A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers	Daniel Waxman et.al.	2406.00570	link
2024-06-01	Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks	Jiacheng Wang et.al.	2406.00408	null
2024-05-30	Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach	Reza Arabpour et.al.	2405.20094	null
2024-06-02	MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors	Renzhi Wang et.al.	2405.19086	null
2024-06-02	Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design	Markus J. Buehler et.al.	2405.19076	link
2024-05-29	Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization	Shengcai Liu et.al.	2405.18884	link
2024-05-29	MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models	Taehyun Kim et.al.	2405.18832	null
2024-05-29	Yuan 2.0-M32: Mixture of Experts with Attention Router	Shaohua Wu et.al.	2405.17976	link
2024-05-28	LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design	Rui Kong et.al.	2405.17741	null
2024-05-27	Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node	Andreas Charalampopoulos et.al.	2405.16836	link
2024-05-26	Mixture of Experts Using Tensor Products	Zhan Su et.al.	2405.16671	link
2024-05-30	A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts	Mohammed Nowaz Rabbani Chowdhury et.al.	2405.16646	null
2024-05-26	Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation	Rongyu Zhang et.al.	2405.16486	link
2024-05-25	MoEUT: Mixture-of-Experts Universal Transformers	Róbert Csordás et.al.	2405.16039	link
2024-05-23	Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training	Xianzhi Du et.al.	2405.15052	link
2024-05-23	Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast	Chufan Shi et.al.	2405.14507	link
2024-05-23	Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models	Yongxin Guo et.al.	2405.14297	link
2024-05-23	Graph Sparsification via Mixture of Graphs	Guibin Zhang et.al.	2405.14260	link
2024-05-23	Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts	Huy Nguyen et.al.	2405.14131	null
2024-05-23	Mixture of Experts Meets Prompt-Based Continual Learning	Minh Le et.al.	2405.14124	link
2024-05-22	Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts	Huy Nguyen et.al.	2405.13997	null
2024-05-22	xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token	Xin Cheng et.al.	2405.13792	link
2024-05-24	MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models	Jingwei Xu et.al.	2405.13053	link
2024-05-21	Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts	Ruichen Zhang et.al.	2405.12472	null
2024-05-21	Ensemble and Mixture-of-Experts DeepONets For Operator Learning	Ramansh Sharma et.al.	2405.11907	null
2024-05-19	Learning More Generalized Experts by Merging Experts in Mixture-of-Experts	Sejik Park et.al.	2405.11530	null
2024-05-18	Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts	Yunxin Li et.al.	2405.11273	link
2024-05-16	Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts	Ruolin Su et.al.	2405.09744	null
2024-05-15	M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts	Yufeng Jiang et.al.	2405.09446	link
2024-05-13	Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition	Zhiyong Yang et.al.	2405.07780	link
2024-05-07	SUTRA: Scalable Multilingual Language Model Architecture	Abhijit Bendale et.al.	2405.06694	null
2024-05-09	A Mixture of Experts Approach to 3D Human Motion Prediction	Edmund Shieh et.al.	2405.06088	link
2024-05-09	A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds	Christopher Z. Cui et.al.	2405.06059	null
2024-05-09	EWMoE: An effective model for global weather forecasting with mixture-of-experts	Lihao Gan et.al.	2405.06004	link
2024-05-09	CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts	Jiachen Li et.al.	2405.05949	link
2024-05-16	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model	DeepSeek-AI et.al.	2405.04434	link
2024-05-07	Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts	Changyuan Zhao et.al.	2405.04198	null
2024-05-06	Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training	Zexuan Zhong et.al.	2405.03133	null
2024-05-06	WDMoE: Wireless Distributed Large Language Models with Mixture of Experts	Nan Xue et.al.	2405.03131	null

