🎓 Automatically update LLM inference systems papers daily using GitHub Actions (updates every 12 hours)

Toseic/LLM-inference-arxiv-daily

 
 

Updated on 2025.01.24

inference

Publish Date Title Authors PDF Code
2025-01-20 Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference Pouya Hamadanian et.al. 2501.11779 link
2025-01-20 Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas Nishant Balepur et.al. 2501.11549 link
2025-01-19 GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation Shashikant Ilager et.al. 2501.11006 null
2025-01-17 A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks Xinzhe Li et.al. 2501.10069 null
2025-01-16 Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition Takaaki Hori et.al. 2501.09258 null
2025-01-15 Guiding Retrieval using LLM-based Listwise Rankers Mandeep Rathee et.al. 2501.09186 link
2025-01-14 Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings Paul Joe Maliakel et.al. 2501.08219 null
2025-01-14 PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving Ahmet Caner Yüzügüler et.al. 2501.08192 null
2025-01-14 Hierarchical Autoscaling for Large Language Model Serving with Chiron Archit Patke et.al. 2501.08090 null
2025-01-12 MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference Wenxuan Zeng et.al. 2501.06807 null
2025-01-05 TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms Jovan Stojkovic et.al. 2501.02600 null
2025-01-04 AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference Zhuomin He et.al. 2501.02336 link
2025-01-03 Efficient LLM Inference with Activation Checkpointing and Hybrid Caching Sanghyeon Lee et.al. 2501.01792 null
2025-01-03 BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference Wonsuk Jang et.al. 2501.01144 null
2025-01-02 FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Zihao Ye et.al. 2501.01005 link
2024-12-23 Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs Dibakar Gope et.al. 2501.00032 link
2024-12-29 TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication Zongwu Wang et.al. 2412.20501 link
2024-12-28 LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System Hyucksung Kwon et.al. 2412.20166 null
2024-12-19 GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors Chengming Zhang et.al. 2412.19829 null
2025-01-02 A Survey on Large Language Model Acceleration based on KV Cache Management Haoyang Li et.al. 2412.19442 link
2024-12-27 An Engorgio Prompt Makes Large Language Model Babble on Jianshuo Dong et.al. 2412.19394 link
2024-12-25 Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference Libo Zhang et.al. 2412.18934 null
2024-12-21 SYMPHONY: Improving Memory Management for LLM Inference Workloads Saurabh Agarwal et.al. 2412.16434 null
2024-12-20 WebLLM: A High-Performance In-Browser LLM Inference Engine Charlie F. Ruan et.al. 2412.15803 link
2024-12-18 A Survey on LLM Inference-Time Self-Improvement Xiangjue Dong et.al. 2412.14352 link
2024-12-18 Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models Seungeun Oh et.al. 2412.12687 null
2024-12-17 A System for Microserving of LLMs Hongyi Jin et.al. 2412.12488 null
2024-12-16 CSR:Achieving 1 Bit Key-Value Cache via Sparse Representation Hongxuan Zhang et.al. 2412.11741 null
2024-12-15 Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning Yun Qu et.al. 2412.11120 link
2024-12-15 NITRO: LLM Inference on Intel Laptop NPUs Anthony Fei et.al. 2412.11053 link
2024-12-13 SCBench: A KV Cache-Centric Analysis of Long-Context Methods Yucheng Li et.al. 2412.10319 null
2024-12-17 TurboAttention: Efficient Attention Approximation For High Throughputs LLMs Hao Kang et.al. 2412.08585 null
2024-12-11 Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths Naryeong Kim et.al. 2412.08281 null
2024-12-12 TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch Xingchen Song et.al. 2412.08237 null
2024-12-09 Asynchronous LLM Function Calling In Gim et.al. 2412.07017 null
2024-12-09 SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs James Vo et.al. 2412.06198 null
2024-12-08 XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference Weizhuo Li et.al. 2412.05896 null
2024-12-06 GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments Yanyu Chen et.al. 2412.04788 null
2024-12-03 Multi-Bin Batching for Increasing LLM Inference Throughput Ozgur Guldogan et.al. 2412.04504 null
2024-11-29 BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching Zhen Zheng et.al. 2412.03594 null
2024-12-03 Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity Da Ma et.al. 2412.02252 null
2024-12-02 PLD+: Accelerating LLM inference by leveraging Language Model Artifacts Shwetha Somasundaram et.al. 2412.01447 null
2024-12-02 Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking Marco Federici et.al. 2412.01380 null
2024-12-05 RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy Geonho Lee et.al. 2412.01129 null
2024-12-02 TruncFormer: Private LLM Inference Using Only Truncations Patrick Yubeaton et.al. 2412.01042 null
2024-11-29 A dynamic parallel method for performance optimization on hybrid CPUs Luo Yu et.al. 2411.19542 null
2024-12-03 Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Akhiad Bercovich et.al. 2411.19146 null
2024-11-29 InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks Xinyao Zheng et.al. 2411.18191 null
2024-11-28 MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache Akshat Sharma et.al. 2411.18077 null
2024-11-24 Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments Nikoleta Iliakopoulou et.al. 2411.17741 null
2024-11-26 PIM-AI: A Novel Architecture for High-Efficiency LLM Inference Cristobal Ortega et.al. 2411.17309 null
2024-11-26 Star Attention: Efficient LLM Inference over Long Sequences Shantanu Acharya et.al. 2411.17116 link
2024-11-26 Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation Chaoyi Jiang et.al. 2411.17089 null
2024-11-25 MixPE: Quantization and Hardware Co-design for Efficient LLM Inference Yu Zhang et.al. 2411.16158 null
2024-11-24 eFedLLM: Efficient LLM Inference Based on Federated Learning Shengwen Ding et.al. 2411.16003 null
2024-11-24 Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format Chao Fang et.al. 2411.15982 null
2024-11-24 Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems Wenxiang Lin et.al. 2411.15715 null
2024-11-22 XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models Yixin Dong et.al. 2411.15100 null
2024-11-21 Disentangling Memory and Reasoning Ability in Large Language Models Mingyu Jin et.al. 2411.13504 link
2024-11-20 Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding Hyun Ryu et.al. 2411.13157 null
2024-11-21 LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts Zhuohan Gu et.al. 2411.13009 null
2024-11-15 An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 Pepijn de Reus et.al. 2411.12758 link
2024-11-19 SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference Jiho Shin et.al. 2411.12692 null
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-15 AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference Janghwan Lee et.al. 2411.09909 null
2024-11-14 Squeezed Attention: Accelerating Long Context Length LLM Inference Coleman Hooper et.al. 2411.09688 link
2024-11-15 Communication Compression for Tensor Parallel LLM Inference Jan Hansen-Palmus et.al. 2411.09510 null
2024-11-14 Pie: Pooling CPU Memory for LLM Inference Yi Xu et.al. 2411.09317 null
2024-11-12 Towards Low-bit Communication for Tensor Parallel LLM Inference Harry Dong et.al. 2411.07942 null
2024-11-12 The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving Kyoungmin Kim et.al. 2411.07447 null
2024-11-08 AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality Ilias Bournias et.al. 2411.05555 null
2024-11-07 Hardware and Software Platform Inference Cheng Zhang et.al. 2411.05197 null
2024-11-07 SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference Gabriele Oliaro et.al. 2411.04975 null
2024-11-05 CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration Hongpeng Jin et.al. 2411.02829 null
2024-11-04 RAGViz: Diagnose and Visualize Retrieval-Augmented Generation Tevin Wang et.al. 2411.01751 link
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-11-02 RA-WEBs: Remote Attestation for WEB services Kosei Akama et.al. 2411.01340 null
2024-11-02 NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference Xuanlin Jiang et.al. 2411.01142 null
2024-11-01 LLM-Based Misconfiguration Detection for AWS Serverless Computing Jinfeng Wen et.al. 2411.00642 null
2024-11-04 ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models Anbang Wang et.al. 2411.00533 null
2024-11-01 Attention Tracker: Detecting Prompt Injection Attacks in LLMs Kuo-Han Hung et.al. 2411.00348 null
2024-10-31 LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators Krishna Teja Chitty-Venkata et.al. 2411.00136 link
2024-10-31 Interpretable Language Modeling via Induction-head Ngram Models Eunji Kim et.al. 2411.00066 link
2024-10-31 ALISE: Accelerating Large Language Model Serving with Speculative Scheduling Youpeng Zhao et.al. 2410.23537 null
2024-10-30 BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference Junqi Zhao et.al. 2410.23079 link
2024-10-29 Scaling LLM Inference with Optimized Sample Compute Allocation Kexun Zhang et.al. 2410.22480 link
2024-10-29 SVIP: Towards Verifiable Inference of Open-source Large Language Models Yifan Sun et.al. 2410.22307 null
2024-10-28 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun et.al. 2410.21465 link
2024-10-27 FIRP: Faster LLM inference via future intermediate representation prediction Pengfei Wu et.al. 2410.20488 null
2024-10-29 Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management Tuowei Wang et.al. 2410.19274 null
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-30 Dynamic Vocabulary Pruning in Early-Exit LLMs Jort Vincenti et.al. 2410.18952 link
2024-10-25 A Survey on Speech Large Language Models Jing Peng et.al. 2410.18908 null
2024-10-24 BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching Peizhuang Cong et.al. 2410.18701 null
2024-10-25 Fast Inference for Augmented Large Language Models Rana Shahout et.al. 2410.18248 null
2024-10-23 POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference Aditya K Kamath et.al. 2410.18038 null
2024-10-22 FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs Haoran Lin et.al. 2410.16663 null
2024-10-22 Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency Prafulla Kumar Choubey et.al. 2410.16597 null
2024-10-20 EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models Junhao Hu et.al. 2410.15332 null
2024-10-19 IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System Minseok Seo et.al. 2410.15008 null
2024-10-23 Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching Jie Peng et.al. 2410.14740 null
2024-10-18 A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference You Wu et.al. 2410.14442 link
2024-10-18 Revisiting SLO and Goodput Metrics in LLM Serving Zhibin Wang et.al. 2410.14257 null
2024-10-17 RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs Jiatan Huang et.al. 2410.13987 null
2024-10-17 Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo et.al. 2410.13835 link
2024-10-17 Progressive Mixed-Precision Decoding for Efficient LLM Inference Hao Mark Chen et.al. 2410.13461 null
2024-10-17 Data Defenses Against Large Language Models William Agnew et.al. 2410.13138 link
2024-10-19 In-context KV-Cache Eviction for LLMs via Attention-Gate Zihao Zeng et.al. 2410.12876 null
2024-10-10 RecurFormer: Not All Transformer Heads Need Self-Attention Ruiqing Yan et.al. 2410.12850 null
2024-10-16 Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning Huiwen Wu et.al. 2410.12130 null
2024-10-15 Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix Yingyu Liang et.al. 2410.11261 null
2024-10-14 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Guangxuan Xiao et.al. 2410.10819 link
2024-10-16 SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization Akrit Mudvari et.al. 2410.10759 null
2024-10-12 Power-Softmax: Towards Secure LLM Inference over Encrypted Data Itamar Zimerman et.al. 2410.09457 null
2024-10-09 SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration Heming Xia et.al. 2410.06916 link
2024-10-08 ParallelSpec: Parallel Drafter for Efficient Speculative Decoding Zilin Xiao et.al. 2410.05589 null
2024-10-06 RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference Yige Xu et.al. 2410.04519 link
2024-10-14 Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective Jinhao Li et.al. 2410.04466 null
2024-10-04 SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation Aurick Qiao et.al. 2410.03960 null
2024-10-04 EXAQ: Exponent Aware Quantization For LLMs Acceleration Moran Shkolnik et.al. 2410.03185 link
2024-10-03 LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences Zhenxiao Fu et.al. 2410.02950 null
2024-10-03 Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration Yun Qu et.al. 2410.02511 link
2024-10-03 LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services Małgorzata Łazuka et.al. 2410.02425 link
2024-10-04 Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation Xiaoqun Liu et.al. 2410.02220 null
2024-10-02 Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads Yuxiang Huang et.al. 2410.01805 link
2024-10-02 ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving Yifan Qiao et.al. 2410.01228 null
2024-10-01 TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Zonghang Li et.al. 2410.00531 link
2024-09-30 The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems Linke Song et.al. 2409.20002 null
2024-09-26 Control Industrial Automation System with Large Language Models Yuchen Xia et.al. 2409.18009 link
2024-09-26 Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores Shaobo Ma et.al. 2409.17870 null
2024-09-25 Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Zhenmei Shi et.al. 2409.17422 link
2024-09-25 Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations Amey Agrawal et.al. 2409.17264 null
2024-09-25 Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference Zongyue Qin et.al. 2409.16560 null
2024-09-25 AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization Yifan Tan et.al. 2409.16546 link
2024-09-23 Eagle: Efficient Training-Free Router for Multi-LLM Inference Zesen Zhao et.al. 2409.15518 null
2024-09-24 UELLM: A Unified and Efficient Approach for LLM Inference Serving Yiyuan He et.al. 2409.14961 null
2024-09-22 RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph Linxi Wei et.al. 2409.14556 null
2024-09-16 Do Large Language Models Need a Content Delivery Network? Yihua Cheng et.al. 2409.13761 link
2024-09-19 PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) Mahmoud Nazzal et.al. 2409.12699 link
2024-09-12 LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs Han Xu et.al. 2409.11424 null
2024-09-04 ISO: Overlap of Computation and Communication within Seqenence For LLM Inference Bin Xiao et.al. 2409.11155 null
2024-09-18 RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Di Liu et.al. 2409.10516 link
2024-09-08 InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference Xiurui Pan et.al. 2409.04992 null
2024-09-07 Achieving Peak Performance for Large Language Models: A Systematic Review Zhyar Rzgar K Rostam et.al. 2409.04833 null
2024-09-06 A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage Huan Yang et.al. 2409.04040 null
2024-09-13 Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study Jianwei Zhu et.al. 2409.03992 null
2024-09-05 Sirius: Contextual Sparsity with Correction for Efficient LLMs Yang Zhou et.al. 2409.03856 link
2024-08-31 HSF: Defending against Jailbreak Attacks with Hidden State Filtering Cheng Qian et.al. 2409.03788 null
2024-09-03 Contemporary Model Compression on Large Language Models Inference Dong Liu et.al. 2409.01990 link
2024-09-02 CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification Junhui He et.al. 2409.01366 null
2024-09-04 Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference Barys Liskavets et.al. 2409.01227 null
2024-09-01 Research on LLM Acceleration Using the High-Performance RISC-V Processor "Xiangshan" (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product) Xu-Hao Chen et.al. 2409.00661 null
2024-08-28 Decentralized LLM Inference over Edge Networks with Energy Harvesting Aria Khoshsirat et.al. 2408.15907 null
2024-08-28 Efficient LLM Scheduling by Learning to Rank Yichao Fu et.al. 2408.15792 link
2024-08-28 Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation Lujun Gui et.al. 2408.15562 null
2024-08-22 NanoFlow: Towards Optimal Large Language Model Serving Throughput Kan Zhu et.al. 2408.12757 link
2024-09-04 Parallel Speculative Decoding with Adaptive Draft Length Tianyu Liu et.al. 2408.11850 link
2024-08-21 MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models Elias Frantar et.al. 2408.11743 link
2024-08-20 Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models Artem Vazhentsev et.al. 2408.10692 null
2024-08-19 PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars Sumanth Prabhu et.al. 2408.08869 null
2024-08-23 ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models Chao Zeng et.al. 2408.08554 link
2024-08-14 LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference Seungjae Moon et.al. 2408.07326 null
2024-08-12 LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration Zhiwen Mo et.al. 2408.06003 null
2024-08-10 LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale Jaehong Cho et.al. 2408.05499 link
2024-08-05 SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving Andreas Kosmas Kakolyris et.al. 2408.05235 null
2024-08-08 Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning Ke Cheng et.al. 2408.04323 null
2024-08-07 Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference Zeyu Zhang et.al. 2408.04107 null
2024-08-07 MPC-Minimized Secure LLM Inference Deevashwer Rathee et.al. 2408.03561 null
2024-08-05 Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning Hao Zhou et.al. 2408.02549 null
2024-08-02 The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines Matias Martinez et.al. 2408.01050 null
2024-08-01 DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency Jovan Stojkovic et.al. 2408.00741 null
2024-08-01 Designing Efficient LLM Accelerators for Edge Devices Jude Haris et.al. 2408.00462 null
2024-08-01 Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control Hao Zhou et.al. 2408.00214 null
2024-07-23 ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency Yuhang Yao et.al. 2408.00008 null
2024-08-01 Responsive ML inference in multi-tenanted environments using AQUA Abhishek Vijaya Kumar et.al. 2407.21255 null
2024-07-25 An Efficient Inference Framework for Early-exit Large Language Models Ruijie Miao et.al. 2407.20272 null
2024-07-29 Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost Sania Nayab et.al. 2407.19825 null
2024-07-29 Teaching LLMs at Charles University: Assignments and Activities Jindřich Helcl et.al. 2407.19798 null
2024-07-22 RazorAttention: Efficient KV Cache Compression Through Retrieval Heads Hanlin Tang et.al. 2407.15891 null
2024-07-22 vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving Jiale Xu et.al. 2407.15309 link
2024-07-19 LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Qichen Fu et.al. 2407.14057 null
2024-07-17 Struct-X: Enhancing Large Language Models Reasoning with Structured Data Xiaoyu Tan et.al. 2407.12522 null
2024-07-17 LLM Inference Serving: Survey of Recent Advances and Opportunities Baolin Li et.al. 2407.12391 null
2024-07-17 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Ayush Kaushal et.al. 2407.12327 link
2024-07-16 PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation Branden Butler et.al. 2407.11798 null
2024-07-21 Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference Yuan Feng et.al. 2407.11550 link
2024-07-15 Fast Matrix Multiplications for Lookup Table-Quantized LLMs Han Guo et.al. 2407.10960 link
2024-07-12 Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference Zongyue Qin et.al. 2407.09722 null
2024-07-09 Metron: Holistic Performance Evaluation Framework for LLM Inference Systems Amey Agrawal et.al. 2407.07000 link
2024-07-08 Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU Daliang Xu et.al. 2407.05858 link
2024-07-07 A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length Yuqing Yang et.al. 2407.05347 null
2024-07-05 Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design Yiyang Huang et.al. 2407.04292 link
2024-07-04 Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems Grant Wilkins et.al. 2407.04014 null
2024-07-02 MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Huiqiang Jiang et.al. 2407.02490 link
2024-06-29 Teola: Towards End-to-End Optimization of LLM-based Applications Xin Tan et.al. 2407.00326 null
2024-06-25 T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge Jianyu Wei et.al. 2407.00088 link
2024-06-28 InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management Wonbeom Lee et.al. 2406.19707 null
2024-06-24 Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters Euiin Yi et.al. 2406.16758 link
2024-06-28 SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention Qianchao Zhu et.al. 2406.15486 null
2024-06-21 Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models Qi Liu et.al. 2406.14848 link
2024-06-20 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Johannes Treutlein et.al. 2406.14546 link
2024-06-20 LiveMind: Low-latency Large Language Models with Simultaneous Inference Chuangtao Chen et.al. 2406.14319 link
2024-06-19 SDQ: Sparse Decomposed Quantization for LLM Inference Geonhwa Jeong et.al. 2406.13868 null
2024-06-19 Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style Zeping Li et.al. 2406.13170 null
2024-06-16 Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization Jungi Lee et.al. 2406.12930 null
2024-06-18 LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization Masafumi Enomoto et.al. 2406.12494 null
2024-06-18 LLMs Are Prone to Fallacies in Causal Inference Nitish Joshi et.al. 2406.12158 null
2024-06-14 Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning Hui Liu et.al. 2406.11890 null
2024-06-17 Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference Donghyeon Joo et.al. 2406.11674 null
2024-06-17 QTIP: Quantization with Trellises and Incoherence Processing Albert Tseng et.al. 2406.11235 link
2024-06-16 New Solutions on LLM Acceleration, Optimization, and Application Yingbing Huang et.al. 2406.10903 null
2024-06-16 Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Jiaming Tang et.al. 2406.10774 link
2024-06-15 Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study Hao Hao et.al. 2406.10675 link
2024-06-08 QCQA: Quality and Capacity-aware grouped Query Attention Vinay Joshi et.al. 2406.10247 null
2024-06-12 Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference Christopher Wolters et.al. 2406.08413 null
2024-06-12 PowerInfer-2: Fast Large Language Model Inference on a Smartphone Zhenliang Xue et.al. 2406.06282 null
2024-06-09 A Superalignment Framework in Autonomous Driving with Large Language Models Xiangrui Kong et.al. 2406.05651 null
2024-06-06 Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism Jiahao Liu et.al. 2406.03853 null
2024-06-04 Language Models can Infer Action Semantics for Classical Planners from Environment Feedback Wang Zhu et.al. 2406.02791 null
2024-06-08 Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach Yuxuan Chen et.al. 2406.02616 null
2024-06-04 SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices Ruslan Svirschevski et.al. 2406.02532 link
2024-06-03 Demystifying Platform Requirements for Diverse LLM Inference Use Cases Abhimanyu Bambhaniya et.al. 2406.01698 link
2024-06-03 PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration Ziqian Zeng et.al. 2406.01394 null
2024-06-01 A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation Dugang Liu et.al. 2406.00333 null
2024-05-31 No Free Lunch Theorem for Privacy-Preserving LLM Inference Xiaojin Zhang et.al. 2405.20681 null
2024-05-30 Decentralized AI: Permissionless LLM Inference on POKT Network Daniel Olshansky et.al. 2405.20450 null
2024-06-01 S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs Wei Zhong et.al. 2405.20314 null
2024-05-30 Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models Yuxiao Luo et.al. 2405.19850 null
2024-05-29 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models Taehyun Kim et.al. 2405.18832 null
2024-05-29 PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN Fei Zheng et.al. 2405.18744 null
2024-06-02 Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference Hao Mark Chen et.al. 2405.18628 link
2024-05-25 FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference Chenqi Lin et.al. 2405.16241 null
2024-05-23 EdgeShard: Efficient LLM Inference via Collaborative Edge Computing Mingjin Zhang et.al. 2405.14371 null
2024-05-23 MiniCache: KV Cache Compression in Depth Dimension for Large Language Models Akide Liu et.al. 2405.14366 null
2024-05-21 PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference Dongjie Yang et.al. 2405.12532 null
2024-05-12 Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization Xinyuan Zhang et.al. 2405.07140 null
2024-05-11 Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving Chengyi Nie et.al. 2405.06856 null
2024-05-21 Vidur: A Large-Scale Simulation Framework For LLM Inference Amey Agrawal et.al. 2405.05465 link
2024-05-13 KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation Minsik Cho et.al. 2405.05329 null
2024-05-12 DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature Dawei Li et.al. 2405.04819 link
2024-05-10 QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Yujun Lin et.al. 2405.04532 link
2024-05-07 vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention Ramya Prabhu et.al. 2405.04437 null
2024-05-07 Optimizing Language Model's Reasoning Abilities with Weak Supervision Yongqi Tong et.al. 2405.04086 null
2024-05-06 AlphaMath Almost Zero: process Supervision without process Guoxin Chen et.al. 2405.03553 link
2024-05-03 Efficient and Economic Large Language Model Inference with Attention Offloading Shaoyuan Chen et.al. 2405.01814 null

MoE

Publish Date Title Authors PDF Code
2025-01-22 Autonomy-of-Experts Models Ang Lv et.al. 2501.13074 null
2025-01-22 LLM4WM: Adapting LLM for Wireless Multi-Tasking Xuanyu Liu et.al. 2501.12983 null
2025-01-22 UniUIR: Considering Underwater Image Restoration as An All-in-One Learner Xu Zhang et.al. 2501.12981 null
2025-01-22 BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR Guodong Ma et.al. 2501.12602 null
2025-01-21 Modality Interactive Mixture-of-Experts for Fake News Detection Yifan Liu et.al. 2501.12431 null
2025-01-21 SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection Xiaocheng Zhang et.al. 2501.12430 null
2025-01-21 Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Samira Abnar et.al. 2501.12370 null
2025-01-21 MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks Qishen Zhou et.al. 2501.12281 link
2025-01-21 Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Zihan Qiu et.al. 2501.11873 null
2025-01-18 FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models Xinglin Pan et.al. 2501.10714 null
2025-01-17 OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning Jinyuan Feng et.al. 2501.10062 null
2025-01-17 LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading Kuan-Ming Liu et.al. 2501.09636 null
2025-01-14 MiniMax-01: Scaling Foundation Models with Lightning Attention MiniMax et.al. 2501.08313 null
2025-01-14 GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism Chen Tang et.al. 2501.07890 null
2025-01-18 PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration Xiaoshui Huang et.al. 2501.07762 null
2025-01-13 A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis Binyu Zhang et.al. 2501.07016 link
2025-01-12 Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning Hanwen Zhong et.al. 2501.06884 link
2025-01-10 TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning Yinghao Zhu et.al. 2501.05661 link
2025-01-09 Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing Mengfan Liu et.al. 2501.05313 null
2025-01-07 LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes Xiang Xu et.al. 2501.04004 link
2025-01-07 mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training Xudong Liao et.al. 2501.03905 null
2025-01-08 Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection Donatella Genovese et.al. 2501.03432 null
2025-01-12 Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning Zhongyi Zhou et.al. 2501.02198 null
2025-01-03 MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders Jiajun Cao et.al. 2501.01709 null
2025-01-01 REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization Huyen Nguyen et.al. 2501.00779 null
2025-01-06 Superposition in Transformers: A Novel Way of Building Mixture of Experts Ayoub Ben Chaliah et.al. 2501.00530 link
2024-12-31 CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection Xiaolei Wang et.al. 2501.00346 null
2024-12-29 Multimodal Variational Autoencoder: a Barycentric View Peijie Qiu et.al. 2412.20487 null
2024-12-29 A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement Sidra Nasir et.al. 2412.20468 null
2024-12-28 UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity Jingbo Lin et.al. 2412.20157 link
2024-12-28 Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection Yaning Zhang et.al. 2412.20156 null
2024-12-27 DeepSeek-V3 Technical Report DeepSeek-AI et.al. 2412.19437 link
2024-12-26 AskChart: Universal Chart Understanding through Textual Enhancement Xudong Yang et.al. 2412.19146 link
2024-12-30 Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection Xiaoyu Huang et.al. 2412.19108 null
2024-12-24 Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making David Shoresh et.al. 2412.18593 link
2024-12-24 BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing Yingjie Ma et.al. 2412.18065 link
2024-12-23 UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition Li Fu et.al. 2412.17507 null
2024-12-23 BrainMAP: Learning Multiple Activation Pathways in Brain Networks Song Wang et.al. 2412.17404 null
2024-12-22 Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models Elie Antoine et.al. 2412.16971 null
2024-12-20 Theory of Mixture-of-Experts for Mobile Edge Computing Hongbo Li et.al. 2412.15690 null
2024-12-19 MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale Swapnil Gandhi et.al. 2412.15411 null
2024-12-19 Qwen2.5 Technical Report Qwen et.al. 2412.15115 link
2024-12-19 ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Ziteng Wang et.al. 2412.14711 link
2024-12-18 A Survey on Inference Optimization Techniques for Mixture of Experts Models Jiacheng Liu et.al. 2412.14219 link
2024-12-18 SEKE: Specialised Experts for Keyword Extraction Matej Martinc et.al. 2412.14087 link
2024-12-18 MedCoT: Medical Chain of Thought via Hierarchical Expert Jiaxiang Liu et.al. 2412.13736 link
2024-12-17 SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks Mátyás Vincze et.al. 2412.13053 null
2024-12-17 Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Moritz Reuss et.al. 2412.12953 null
2024-12-17 CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition He Wang et.al. 2412.12760 null
2024-12-16 Investigating Mixture of Experts in Dense Retrieval Effrosyni Sokli et.al. 2412.11864 null
2024-12-18 Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture Jingze Shi et.al. 2412.11834 link
2024-12-16 Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation Svetlana Pavlitska et.al. 2412.11608 null
2024-12-16 Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture Jingyu Xu et.al. 2412.11557 null
2024-12-14 DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification Yuhao Wang et.al. 2412.10650 link
2024-12-13 DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Zhiyu Wu et.al. 2412.10302 link
2024-12-13 Llama 3 Meets MoE: Efficient Upcycling Aditya Vavre et.al. 2412.09952 link
2024-12-12 Memory Layers at Scale Vincent-Pierre Berges et.al. 2412.09764 link
2024-12-12 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang et.al. 2412.09278 link
2024-12-12 Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective Minh Le et.al. 2412.08285 null
2024-12-11 Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification Xuanze Chen et.al. 2412.08193 null
2024-12-10 MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems Yao Fu et.al. 2412.07067 null
2024-12-07 Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts Arturo Rodriguez et.al. 2412.06842 null
2024-12-09 Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset Xiao Wang et.al. 2412.06647 link
2024-12-09 UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts Zhen Wan et.al. 2412.06340 null
2024-12-08 Hallucination-aware Optimization for Large Language Model-empowered Communications Yinqiu Liu et.al. 2412.06007 link
2024-12-10 An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism Qing Zhang et.al. 2412.05821 null
2024-12-10 RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts Xu Liu et.al. 2412.05679 link
2024-12-07 SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts Gengze Zhou et.al. 2412.05552 link
2024-12-07 Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers Boxun Xu et.al. 2412.05540 null
2024-12-06 Steps are all you need: Rethinking STEM Education with Prompt Engineering Krishnasai Addala et.al. 2412.05023 null
2024-12-09 Monet: Mixture of Monosemantic Experts for Transformers Jungwoo Park et.al. 2412.04139 link
2024-12-05 Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks Zhaoyang Liu et.al. 2412.03850 null
2024-12-04 Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond Loukas Ilias et.al. 2412.03483 null
2024-12-05 MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption Siddhant Dutta et.al. 2412.01858 null
2024-12-05 Yi-Lightning Technical Report 01.AI et.al. 2412.01253 null
2024-11-30 Mixture of Experts for Node Classification Yu Shi et.al. 2412.00418 null
2024-11-30 HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting Shaohan Yu et.al. 2412.00316 null
2024-11-27 Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference Andrii Skliar et.al. 2412.00099 null
2024-11-29 LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References Shuguo Jiang et.al. 2411.19758 null
2024-11-28 On the effectiveness of discrete representations in sparse mixture of experts Giang Do et.al. 2411.19402 null
2024-11-28 Bayesian Cluster Weighted Gaussian Models Panagiotis Papastamoulis et.al. 2411.18957 link
2024-11-27 UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS Haomin Zhuang et.al. 2411.18797 null
2024-11-27 Complexity Experts are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir et.al. 2411.18466 null
2024-11-27 Mixture of Experts in Image Classification: What's the Sweet Spot? Mathurin Videau et.al. 2411.18322 null
2024-11-26 $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs Selim Furkan Tekin et.al. 2411.17792 link
2024-11-25 Staleness-Centric Optimizations for Efficient Diffusion MoE Inference Jiajun Luo et.al. 2411.16786 null
2024-11-29 MH-MoE: Multi-Head Mixture-of-Experts Shaohan Huang et.al. 2411.16205 null
2024-11-25 LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy Peng Cui et.al. 2411.16095 null
2024-11-24 Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution Haiquan Wang et.al. 2411.15871 null
2024-11-24 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Xiaoye Qu et.al. 2411.15708 link
2024-11-23 Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts Qizhou Chen et.al. 2411.15432 null
2024-11-23 Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation Fahao Chen et.al. 2411.15419 null
2024-11-20 MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification Yuxuan Chen et.al. 2411.13004 null
2024-11-23 KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning Ming Yin et.al. 2411.12950 null
2024-11-19 Ultra-Sparse Memory Network Zihao Huang et.al. 2411.12364 null
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-16 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Jinqiang Long et.al. 2411.10669 link
2024-11-15 Weakly-Supervised Multimodal Learning on MIMIC-CXR Andrea Agostini et.al. 2411.10356 link
2024-11-21 Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models Wei Wang et.al. 2411.10003 null
2024-11-13 Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection Vima Gupta et.al. 2411.08982 null
2024-11-13 Sparse Upcycling: Inference Inefficient Finetuning Sasha Doubov et.al. 2411.08968 null
2024-11-13 LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing Xiaonan Nie et.al. 2411.08446 null
2024-11-12 Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach Renzi Wang et.al. 2411.08232 null
2024-11-12 PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model Yilun Liu et.al. 2411.08212 null
2024-11-12 Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge Emmanuel Azuh Mensah et.al. 2411.07834 null
2024-11-11 Adaptive Conditional Expert Selection Network for Multi-domain Recommendation Kuiyao Dong et.al. 2411.06826 null
2024-11-11 WDMoE: Wireless Distributed Mixture of Experts for Large Language Models Nan Xue et.al. 2411.06681 null
2024-11-09 Learning Mixtures of Experts with EM Quentin Fruytier et.al. 2411.06056 null
2024-11-08 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts Yen-Ting Lin et.al. 2411.05945 null
2024-11-05 DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts Zelin Yao et.al. 2411.03025 link
2024-11-05 Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts Yuan Xie et.al. 2411.02787 null
2024-11-06 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Xingwu Sun et.al. 2411.02265 null
2024-11-04 FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation Ziwei Zhan et.al. 2411.02115 null
2024-11-03 RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering Hui Lin et.al. 2411.01595 null
2024-11-03 Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation Mingrui Liu et.al. 2411.01457 null
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-11-07 HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy Shuqing Luo et.al. 2411.01288 link
2024-11-02 PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment Dongxu Liu et.al. 2411.01245 null
2024-11-01 MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition Cheng Yang et.al. 2411.01016 null
2024-11-01 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Nam V. Nguyen et.al. 2411.00918 link
2024-11-01 MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization Jingming Guo et.al. 2411.00662 link
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-10-30 Efficient and Interpretable Grammatical Error Correction with Mixture of Experts Muhammad Reza Qorib et.al. 2410.23507 link
2024-10-30 Stealing User Prompts from Mixture of Experts Itay Yona et.al. 2410.22884 null
2024-10-30 MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning Xujia Wang et.al. 2410.22782 null
2024-10-29 ProMoE: Fast MoE-based LLM Serving using Proactive Caching Xiaoniu Song et.al. 2410.22134 null
2024-10-29 Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging Li Shen et.al. 2410.21804 null
2024-10-29 Neural Experts: Mixture of Experts for Implicit Neural Representations Yizhak Ben-Shabat et.al. 2410.21643 null
2024-10-28 FinTeamExperts: Role Specialized MOEs For Financial Analysis Yue Yu et.al. 2410.21338 null
2024-10-28 Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving Jiyao Wang et.al. 2410.21086 null
2024-10-27 Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation Maohao Shen et.al. 2410.20336 null
2024-10-27 GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields Yusuke Sekikawa et.al. 2410.20306 null
2024-10-25 DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction Zelin Zang et.al. 2410.19504 link
2024-10-25 Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis Weikai Li et.al. 2410.19225 link
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-24 Mixture of Parrots: Experts improve memorization more than reasoning Samy Jelassi et.al. 2410.19034 null
2024-10-24 MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases Zhisheng Lin et.al. 2410.18406 null
2024-10-23 Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches Kexin Feng et.al. 2410.18298 null
2024-10-23 MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning Jingfan Zhang et.al. 2410.18035 null
2024-10-23 ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference Xin He et.al. 2410.17954 null
2024-10-23 Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition Artem Basharin et.al. 2410.17765 null
2024-10-22 Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling Jialong Li et.al. 2410.17043 null
2024-10-21 LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset Ruikun Zhang et.al. 2410.16095 link
2024-10-22 CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts Zhenpeng Su et.al. 2410.16077 link
2024-10-21 Generalizing Motion Planners with Mixture of Experts for Autonomous Driving Qiao Sun et.al. 2410.15774 link
2024-10-21 ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts Xumeng Han et.al. 2410.15732 null
2024-10-20 Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs Xin Zhou et.al. 2410.15438 null
2024-10-20 LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration Yuang Ai et.al. 2410.15385 link
2024-10-19 MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning Suning Huang et.al. 2410.14972 null
2024-10-18 MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts Rachel S. Y. Teo et.al. 2410.14574 link
2024-10-18 ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction Haoyu He et.al. 2410.14099 link
2024-10-17 Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks Jinze Zhao et.al. 2410.13964 null
2024-10-16 On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs Herun Wan et.al. 2410.12600 null
2024-10-16 Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts Fanqi Yan et.al. 2410.12258 null
2024-10-16 EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference Yulei Qian et.al. 2410.12247 null
2024-10-15 MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router Yanyue Xie et.al. 2410.12013 null
2024-10-15 MoH: Multi-Head Attention as Mixture-of-Head Attention Peng Jin et.al. 2410.11842 link
2024-10-15 GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation Fei Tang et.al. 2410.11841 link
2024-10-15 Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models James Vo et.al. 2410.11654 null
2024-10-16 Quadratic Gating Functions in Mixture of Experts: A Statistical Insight Pedram Akbarian et.al. 2410.11222 null
2024-10-16 Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Ziyue Li et.al. 2410.10814 link
2024-10-14 Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Guorui Zheng et.al. 2410.10626 link
2024-10-14 Learning to Ground VLMs without Forgetting Aritra Bhowmik et.al. 2410.10491 null
2024-10-14 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts Xu Liu et.al. 2410.10469 null
2024-10-15 Ada-K Routing: Boosting the Efficiency of MoE-based LLMs Tongtian Yue et.al. 2410.10456 null
2024-10-14 Tighter Risk Bounds for Mixtures of Experts Wissam Akretche et.al. 2410.10397 null
2024-10-14 Scalable Multi-Domain Adaptation of Language Models using Modular Experts Peter Schafhalter et.al. 2410.10181 null
2024-10-14 Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models Jun Luo et.al. 2410.10114 null
2024-10-14 AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality Peijun Qing et.al. 2410.10054 link
2024-10-13 ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL Zhanqiu Guo et.al. 2410.09781 null
2024-10-11 Semi-Supervised Learning of Noisy Mixture of Experts Models Oh-Ran Kwon et.al. 2410.09039 null
2024-10-11 Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering I-Chun Chen et.al. 2410.08589 link
2024-10-10 Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts Sukwon Yun et.al. 2410.08245 link
2024-10-10 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo et.al. 2410.08202 null
2024-10-10 Efficient Dictionary Learning with Switch Sparse Autoencoders Anish Mudide et.al. 2410.08201 link
2024-10-10 More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing Sagi Shaier et.al. 2410.08003 null
2024-10-10 SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture Jiayi Han et.al. 2410.07739 null
2024-10-10 Upcycling Large Language Models into Mixture of Experts Ethan He et.al. 2410.07524 null
2024-10-09 MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts Peng Jin et.al. 2410.07348 link
2024-10-09 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders David Noever et.al. 2410.06462 null
2024-10-09 Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs Ruijia Niu et.al. 2410.06431 null
2024-10-08 Probing the Robustness of Theory of Mind in Large Language Models Christian Nickel et.al. 2410.06271 null
2024-10-08 MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More Wei Huang et.al. 2410.06270 link
2024-10-08 Aria: An Open Multimodal Native Mixture-of-Experts Model Dongxu Li et.al. 2410.05993 link
2024-10-08 Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models Siqi Wang et.al. 2410.05661 null
2024-10-07 Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild Xinyu Zhao et.al. 2410.05357 link
2024-10-07 Multimodal Fusion Strategies for Mapping Biophysical Landscape Features Lucia Gordon et.al. 2410.04833 link
2024-10-06 Realizing Video Summarization from the Path of Language-based Semantic Understanding Kuan-Chen Mu et.al. 2410.04511 null
2024-10-09 Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding Wei Wu et.al. 2410.03553 null
2024-10-04 Exploring the Benefit of Activation Sparsity in Pre-training Zhengyan Zhang et.al. 2410.03440 link
2024-10-03 MLP-KAN: Unifying Deep Representation and Function Learning Yunhong He et.al. 2410.03027 link
2024-10-03 On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions Huy Nguyen et.al. 2410.02935 null
2024-10-03 Neutral residues: revisiting adapters for model extension Franck Signe Talla et.al. 2410.02744 null
2024-10-03 Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping Ziye Huang et.al. 2410.02475 null
2024-10-03 MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction Zhaojian Yu et.al. 2410.02241 null
2024-10-03 Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts Minh Le et.al. 2410.02200 null
2024-10-04 Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices Andres Potapczynski et.al. 2410.02117 link
2024-10-04 EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing Haotian Sun et.al. 2410.02098 null
2024-10-02 Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL Ghada Sokar et.al. 2410.01930 null
2024-10-02 Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models Shayekh Bin Islam et.al. 2410.01782 link
2024-10-02 Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging Tingfeng Hui et.al. 2410.01610 null
2024-10-02 The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs Hong Li et.al. 2410.01417 null
2024-10-01 MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards Sheng Wang et.al. 2410.00938 null
2024-10-01 UniAdapt: A Universal Adapter for Knowledge Calibration Tai D. Nguyen et.al. 2410.00454 null
2024-10-01 Robust Traffic Forecasting against Spatial Shift over Years Hongjun Wang et.al. 2410.00373 link
2024-09-29 IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method Chaohui Xu et.al. 2410.00059 null
2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang et.al. 2409.20566 null
2024-10-02 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Jihai Zhang et.al. 2409.19291 link
2024-09-27 SciDFM: A Large Language Model with Mixture-of-Experts for Science Liangtai Sun et.al. 2409.18412 null
2024-09-26 Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE Xun Zhu et.al. 2409.17508 link
2024-09-26 A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction Guangyu Wang et.al. 2409.17440 link
2024-09-24 Leveraging Mixture of Experts for Improved Speech Deepfake Detection Viola Negroni et.al. 2409.16077 null
2024-10-02 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Xiaoming Shi et.al. 2409.16040 link
2024-09-24 Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM Fengrun Zhang et.al. 2409.15905 null
2024-09-24 Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks Jiayi He et.al. 2409.15695 null
2024-09-23 A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts Hugo Inzirillo et.al. 2409.15161 link
2024-09-23 Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond Hong Chen et.al. 2409.14993 null
2024-09-21 Routing in Sparsely-gated Language Models responds to Context Stefan Arnold et.al. 2409.14107 null
2024-09-20 On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists Dongyang Fan et.al. 2409.13931 link
2024-09-20 Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning Annette Spooner et.al. 2409.13791 null
2024-09-19 Robust Audiovisual Speech Recognition Models with Mixture-of-Experts Yihan Wu et.al. 2409.12370 null
2024-09-18 GRIN: GRadient-INformed MoE Liyuan Liu et.al. 2409.12136 null
2024-09-18 Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 Zhiyong Wang et.al. 2409.11909 null
2024-09-17 LPT++: Efficient Training on Mixture of Long-tailed Experts Bowen Dong et.al. 2409.11323 null
2024-09-19 LOLA -- An Open-Source Massively Multilingual Large Language Model Nikit Srivastava et.al. 2409.11272 link
2024-09-16 Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression Yi-Hsin Li et.al. 2409.10101 null
2024-09-14 MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving Enming Zhang et.al. 2409.07267 link
2024-09-10 DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models Maryam Akhavan Aghdam et.al. 2409.06669 null
2024-09-10 STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning Jaeseong Lee et.al. 2409.06211 null
2024-09-10 VE: Modeling Multivariate Time Series Correlation with Variate Embedding Shangjiong Wang et.al. 2409.06169 link
2024-09-09 Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models Hongyang Lei et.al. 2409.05929 null
2024-09-09 Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks Bo Xu et.al. 2409.05726 null
2024-09-09 Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection Tianwu Lei et.al. 2409.05611 null
2024-09-05 Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions Zemian Ke et.al. 2409.03282 null
2024-09-05 ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding Zhengzhuo Xu et.al. 2409.03277 null
2024-09-05 xLAM: A Family of Large Action Models to Empower AI Agent Systems Jianguo Zhang et.al. 2409.03215 link
2024-09-04 Configurable Foundation Models: Building LLMs from a Modular Perspective Chaojun Xiao et.al. 2409.02877 null
2024-09-04 Pluralistic Salient Object Detection Xuelu Feng et.al. 2409.02368 null
2024-09-03 OLMoE: Open Mixture-of-Experts Language Models Niklas Muennighoff et.al. 2409.02060 link
2024-09-05 Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model Hukai Huang et.al. 2409.02050 null
2024-09-02 Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning Soumajyoti Sarkar et.al. 2409.01483 null
2024-09-02 Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching Sungmin Yun et.al. 2409.01141 null
2024-09-04 Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack Guanzhong Chen et.al. 2409.00960 link
2024-09-02 Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts Youngseog Chung et.al. 2409.00879 null
2024-08-29 Gradient-free variational learning with conditional mixture networks Conor Heins et.al. 2408.16429 link
2024-08-28 Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Yuncheng Yang et.al. 2408.15915 link
2024-08-28 Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts Nikolas Gritsch et.al. 2408.15901 null
2024-08-28 LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Fangxun Shu et.al. 2408.15881 link
2024-08-28 Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Lean Wang et.al. 2408.15664 null
2024-08-27 Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis Sakhinana Sagar Srinivas et.al. 2408.15305 null
2024-08-27 MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce Hao Jiang et.al. 2408.14968 null
2024-08-24 Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings Sagar Srinivas Sakhinana et.al. 2408.13622 null
2024-08-23 The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities Venkatesh Balavadhani Parthasarathy et.al. 2408.13296 null
2024-08-23 Guiding IoT-Based Healthcare Alert Systems with Large Language Models Yulan Gao et.al. 2408.13071 null
2024-08-23 DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation Xiaowei Mao et.al. 2408.12809 null
2024-08-23 Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth Yuxiang Wei et.al. 2408.12803 null
2024-08-23 La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection Hang Zou et.al. 2408.12793 null
2024-08-22 SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging Mohammadreza Pourreza et.al. 2408.12733 null
2024-08-22 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Jamba Team et.al. 2408.12570 null
2024-08-22 Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators Dingkang Yang et.al. 2408.12325 link
2024-08-21 MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing Hao Zhou et.al. 2408.11396 link
2024-08-21 KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? Xiao Han et.al. 2408.11306 link
2024-08-21 FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts Hanzi Mei et.al. 2408.11304 null
2024-08-20 Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data Atmika Gorti et.al. 2408.11247 null
2024-08-20 Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting Jianxiang Zhou et.al. 2408.10822 link
2024-08-20 AnyGraph: Graph Foundation Model in the Wild Lianghao Xia et.al. 2408.10700 link
2024-08-20 HMoE: Heterogeneous Mixture of Experts for Language Modeling An Wang et.al. 2408.10681 null
2024-08-19 AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference Shuzhang Zhong et.al. 2408.10284 link
2024-08-17 FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models Xiaochen Wang et.al. 2408.10276 link
2024-08-19 Customizing Language Models with Instance-wise LoRA for Sequential Recommendation Xiaoyu Kong et.al. 2408.10159 link
2024-08-19 A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method Hang Zou et.al. 2408.09752 null
2024-08-16 Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection Haohao Zhu et.al. 2408.08551 null
2024-08-17 BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts Qizhen Zhang et.al. 2408.08274 null
2024-08-14 Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation CanYi Liu et.al. 2408.07427 null
2024-08-13 A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning Prateek Yadav et.al. 2408.07057 null
2024-08-13 Layerwise Recurrent Router for Mixture-of-Experts Zihan Qiu et.al. 2408.06793 link
2024-08-13 AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies Bo-Wen Zhang et.al. 2408.06567 null
2024-08-10 HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou Xu Wang et.al. 2408.05430 null
2024-08-08 Understanding the Performance and Estimating the Cost of LLM Fine-Tuning Yuchen Xia et.al. 2408.04693 link
2024-08-08 Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training Weilin Cai et.al. 2408.04307 null
2024-08-08 LaDiMo: Layer-wise Distillation Inspired MoEfier Sungyoon Kim et.al. 2408.04278 null
2024-08-07 MoExtend: Tuning New Experts for Modality and Task Extension Shanshan Zhong et.al. 2408.03511 link
2024-08-05 Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization Changtao Miao et.al. 2408.02306 null
2024-08-02 HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction Xingyu Lou et.al. 2408.01332 null
2024-08-01 Multimodal Fusion and Coherence Modeling for Video Topic Segmentation Hai Yu et.al. 2408.00365 null
2024-08-12 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Xi Victoria Lin et.al. 2407.21770 null
2024-07-31 PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning Min Jae Jung et.al. 2407.21571 null
2024-07-30 Distribution Learning for Molecular Regression Nima Shoghi et.al. 2407.20475 null
2024-07-29 Time series forecasting with high stakes: A field study of the air cargo industry Abhinav Garg et.al. 2407.20192 null
2024-07-30 Mixture of Nested Experts: Adaptive Processing of Visual Tokens Gagan Jain et.al. 2407.19985 null
2024-07-28 Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models Mohammed Al-Maamari et.al. 2407.19610 link
2024-07-26 Wolf: Captioning Everything with a World Summarization Framework Boyi Li et.al. 2407.18908 null
2024-07-26 MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition Chang Liu et.al. 2407.18616 link
2024-07-26 Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition Hukai Huang et.al. 2407.18581 link
2024-07-25 How Lightweight Can A Vision Transformer Be Jen Hong Tan et.al. 2407.17783 null
2024-07-24 Exploring Domain Robust Lightweight Reward Models based on Router Mechanism Hyuk Namgoong et.al. 2407.17546 null
2024-07-24 M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis Junyu Li et.al. 2407.17267 link
2024-07-25 Cheems: Wonderful Matrices More Efficient and More Effective Architecture Jingze Shi et.al. 2407.16958 null
2024-07-22 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget Vikash Sehwag et.al. 2407.15811 link
2024-07-22 Norface: Improving Facial Expression Analysis by Identity Normalization Hanwei Liu et.al. 2407.15617 link
2024-07-19 Mixture of Experts with Mixture of Precisions for Tuning Quality of Service HamidReza Imani et.al. 2407.14417 null
2024-07-19 EVLM: An Efficient Vision-Language Model for Visual Understanding Kaibing Chen et.al. 2407.14177 null
2024-07-19 Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models Qiong Wu et.al. 2407.14093 null
2024-07-18 Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts Francesco Folino et.al. 2407.13526 null
2024-07-18 Mixture of Experts based Multi-task Supervise Learning from Crowds Tao Han et.al. 2407.13268 null
2024-07-15 MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration Yulin Ren et.al. 2407.10833 null
2024-07-18 Qwen2 Technical Report An Yang et.al. 2407.10671 link
2024-07-15 Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering Francesco Di Sario et.al. 2407.10389 null
2024-07-13 Low-Rank Interconnected Adaptation Across Layers Yibo Zhong et.al. 2407.09946 link
2024-07-13 MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts Zhenpeng Su et.al. 2407.09816 link
2024-07-12 Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts Zeliang Zhang et.al. 2407.09590 null
2024-07-11 An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio Siding Zeng et.al. 2407.08239 null
2024-07-10 MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations Vignesh Prasad et.al. 2407.07636 link
2024-07-10 Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation Szymon Płotka et.al. 2407.07514 link
2024-07-09 A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts Atilla Özgür et.al. 2407.06718 null
2024-07-06 SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation Guoan Wang et.al. 2407.04938 null
2024-07-06 Completed Feature Disentanglement Learning for Multimodal MRIs Analysis Tianling Liu et.al. 2407.04916 null
2024-07-05 YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation Sungkyun Chang et.al. 2407.04822 link
2024-07-05 Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement Yongji Wu et.al. 2407.04656 null
2024-07-05 MobileFlow: A Multimodal LLM For Mobile GUI Agent Songqin Nong et.al. 2407.04346 null
2024-07-04 Mixture of A Million Experts Xu Owen He et.al. 2407.04153 null
2024-07-02 Terminating Differentiable Tree Experts Jonathan Thomm et.al. 2407.02060 null
2024-07-05 Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Zihan Wang et.al. 2407.01906 link
2024-07-01 Uncertainty Quantification in Table Structure Recognition Kehinde Ajayi et.al. 2407.01731 link
2024-07-01 Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning Yixiao Wang et.al. 2407.01531 null
2024-07-01 Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation Nadezhda Chirkova et.al. 2407.01126 null
2024-07-01 Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs Enshu Liu et.al. 2407.00945 link
2024-07-03 Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules Xinglin Pan et.al. 2407.00599 link
2024-06-28 One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts Ruochen Wang et.al. 2407.00256 link
2024-06-28 LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models Renzhi Wang et.al. 2406.20030 null
2024-06-28 Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model Longrong Yang et.al. 2406.19905 link
2024-06-28 SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR Qiuming Zhao et.al. 2406.19706 link
2024-06-27 A Teacher Is Worth A Million Instructions Nikhil Kothari et.al. 2406.19112 null
2024-06-27 Towards Personalized Federated Multi-scenario Multi-task Recommendation Yue Ding et.al. 2406.18938 null
2024-06-26 Mixture of Experts in a Mixture of RL settings Timon Willi et.al. 2406.18420 null
2024-06-26 A Closer Look into Mixture-of-Experts in Large Language Models Ka Man Lo et.al. 2406.18219 link
2024-06-26 SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR Shuaishuai Ye et.al. 2406.18021 null
2024-06-24 Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction Bruce Rushing et.al. 2406.17150 link
2024-06-24 LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training Tong Zhu et.al. 2406.16554 link
2024-06-25 OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser Jingze Shi et.al. 2406.16495 link
2024-06-24 Theory on Mixture-of-Experts in Continual Learning Hongbo Li et.al. 2406.16437 null
2024-06-22 SimSMoE: Solving Representational Collapse via Similarity Measure Giang Do et.al. 2406.15883 null
2024-06-20 Voice Disorder Analysis: a Transformer-based Approach Alkis Koudounas et.al. 2406.14693 link
2024-06-19 Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation Qian Chen et.al. 2406.13583 null
2024-06-19 AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models Zihao Zeng et.al. 2406.13233 link
2024-06-18 Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts Haoxiang Wang et.al. 2406.12845 link
2024-06-18 P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts Yuhao Dan et.al. 2406.12548 null
2024-06-18 Variational Distillation of Diffusion Policies into Mixture of Experts Hongyi Zhou et.al. 2406.12538 null
2024-06-18 GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory Haoze Wu et.al. 2406.12375 link
2024-06-17 Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding Ukyo Honda et.al. 2406.12060 link
2024-06-17 DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence DeepSeek-AI et.al. 2406.11931 link
2024-06-17 Graph Knowledge Distillation to Mixture of Experts Pavel Rumiantsev et.al. 2406.11919 link
2024-06-17 MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts Guanjie Chen et.al. 2406.11353 link
2024-06-17 Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Tong Zhu et.al. 2406.11256 link
2024-06-14 Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion Anke Tang et.al. 2406.09770 link
2024-06-13 DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts Joel Ong et.al. 2406.08742 link
2024-06-12 Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark Pingzhi Li et.al. 2406.08155 link
2024-06-11 Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Yixin Song et.al. 2406.05955 null
2024-06-08 Flexible and Adaptable Summarization via Expertise Separation Xiuying Chen et.al. 2406.05360 link
2024-06-07 MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter Jitai Hao et.al. 2406.04984 link
2024-06-07 MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks Xingkui Zhu et.al. 2406.04801 link
2024-06-05 Style Mixture of Experts for Expressive Text-To-Speech Synthesis Ahad Jawaid et.al. 2406.03637 null
2024-06-05 Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach Haoyu Han et.al. 2406.03464 null
2024-06-05 Continual Traffic Forecasting via Mixture of Experts Sanghyun Lee et.al. 2406.03140 null
2024-06-05 Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models Raeid Saqur et.al. 2406.02969 null
2024-06-04 Parrot: Multilingual Visual Instruction Tuning Hai-Long Sun et.al. 2406.02539 link
2024-06-04 Demystifying the Compression of Mixture-of-Experts Through a Unified Framework Shwai He et.al. 2406.02500 link
2024-06-02 Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model Clement Etienam et.al. 2406.00889 link
2024-06-01 A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers Daniel Waxman et.al. 2406.00570 link
2024-06-01 Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks Jiacheng Wang et.al. 2406.00408 null
2024-05-30 Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach Reza Arabpour et.al. 2405.20094 null
2024-06-02 MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors Renzhi Wang et.al. 2405.19086 null
2024-06-02 Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design Markus J. Buehler et.al. 2405.19076 link
2024-05-29 Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization Shengcai Liu et.al. 2405.18884 link
2024-05-29 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models Taehyun Kim et.al. 2405.18832 null
2024-05-29 Yuan 2.0-M32: Mixture of Experts with Attention Router Shaohua Wu et.al. 2405.17976 link
2024-05-28 LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design Rui Kong et.al. 2405.17741 null
2024-05-27 Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node Andreas Charalampopoulos et.al. 2405.16836 link
2024-05-26 Mixture of Experts Using Tensor Products Zhan Su et.al. 2405.16671 link
2024-05-30 A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts Mohammed Nowaz Rabbani Chowdhury et.al. 2405.16646 null
2024-05-26 Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation Rongyu Zhang et.al. 2405.16486 link
2024-05-25 MoEUT: Mixture-of-Experts Universal Transformers Róbert Csordás et.al. 2405.16039 link
2024-05-23 Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training Xianzhi Du et.al. 2405.15052 link
2024-05-23 Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast Chufan Shi et.al. 2405.14507 link
2024-05-23 Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models Yongxin Guo et.al. 2405.14297 link
2024-05-23 Graph Sparsification via Mixture of Graphs Guibin Zhang et.al. 2405.14260 link
2024-05-23 Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts Huy Nguyen et.al. 2405.14131 null
2024-05-23 Mixture of Experts Meets Prompt-Based Continual Learning Minh Le et.al. 2405.14124 link
2024-05-22 Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts Huy Nguyen et.al. 2405.13997 null
2024-05-22 xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token Xin Cheng et.al. 2405.13792 link
2024-05-24 MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models Jingwei Xu et.al. 2405.13053 link
2024-05-21 Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts Ruichen Zhang et.al. 2405.12472 null
2024-05-21 Ensemble and Mixture-of-Experts DeepONets For Operator Learning Ramansh Sharma et.al. 2405.11907 null
2024-05-19 Learning More Generalized Experts by Merging Experts in Mixture-of-Experts Sejik Park et.al. 2405.11530 null
2024-05-18 Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Yunxin Li et.al. 2405.11273 link
2024-05-16 Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts Ruolin Su et.al. 2405.09744 null
2024-05-15 M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts Yufeng Jiang et.al. 2405.09446 link
2024-05-13 Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition Zhiyong Yang et.al. 2405.07780 link
2024-05-07 SUTRA: Scalable Multilingual Language Model Architecture Abhijit Bendale et.al. 2405.06694 null
2024-05-09 A Mixture of Experts Approach to 3D Human Motion Prediction Edmund Shieh et.al. 2405.06088 link
2024-05-09 A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds Christopher Z. Cui et.al. 2405.06059 null
2024-05-09 EWMoE: An effective model for global weather forecasting with mixture-of-experts Lihao Gan et.al. 2405.06004 link
2024-05-09 CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Jiachen Li et.al. 2405.05949 link
2024-05-16 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model DeepSeek-AI et.al. 2405.04434 link
2024-05-07 Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts Changyuan Zhao et.al. 2405.04198 null
2024-05-06 Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training Zexuan Zhong et.al. 2405.03133 null
2024-05-06 WDMoE: Wireless Distributed Large Language Models with Mixture of Experts Nan Xue et.al. 2405.03131 null

