Pulse · sgl-project/sglang

March 1, 2025 – March 8, 2025

150 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

model: Support Janus-pro
#3203 commented on Mar 8, 2025 • 22 new comments
Minicpmo
#3023 commented on Mar 8, 2025 • 4 new comments
Add device detection and count functions to utils.
#3962 commented on Mar 8, 2025 • 4 new comments
[ROCm] add amd ocp fp8 E3M4FUZ constant
#3959 commented on Mar 8, 2025 • 4 new comments
Apply sgl w8a8 fp8 kernel
#3148 commented on Mar 8, 2025 • 3 new comments
[ROCm] Enable per token group quant fp8 in amd
#3702 commented on Mar 8, 2025 • 2 new comments
IPv6 support
#3949 commented on Mar 8, 2025 • 2 new comments
[Bug fixed] fixed the crash when enable the dp-attention on the single card
#3958 commented on Mar 8, 2025 • 2 new comments
Feat/support code completion
#3612 commented on Mar 8, 2025 • 1 new comment
fix: second_per_grid_ts should be used to get mrop position
#3682 commented on Mar 8, 2025 • 1 new comment
fix: Fix deprecated max_tokens param in openai ChatCompletionRequest
#3122 commented on Mar 8, 2025 • 1 new comment
model: Intern vl 2.5
#3351 commented on Mar 8, 2025 • 1 new comment
[Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.49.0
#3984 commented on Mar 8, 2025 • 1 new comment
[Frontend] Support optional auto truncation
#3721 commented on Mar 8, 2025 • 0 new comments
Support building Dockerfile and Dockerfile.rocm using local repo.
#3734 commented on Mar 8, 2025 • 0 new comments
add bench for OpenAI Chat Completions
#3738 commented on Mar 8, 2025 • 0 new comments
feat: log input text for OpenAI format API
#3699 commented on Mar 8, 2025 • 0 new comments
[enh] add '\n' for non-streaming case to keep connection alive
#3689 commented on Mar 8, 2025 • 0 new comments
split ReplicatedLinear used in MLA prefill computing along hidden_states[0] to save duplicated computing on all devices
#3688 commented on Mar 8, 2025 • 0 new comments
add switch to disable open api doc
#3744 commented on Mar 8, 2025 • 0 new comments
Make Marlin incompatible AWQ models work
#3759 commented on Mar 8, 2025 • 0 new comments
[BugFix] Illegal memory access for MoE On H20
#3779 commented on Mar 8, 2025 • 0 new comments
[moe] fix: correct the cache size in the last chunk
#3679 commented on Mar 8, 2025 • 0 new comments
[WIP][Feautre, Hardware] add initial suport for ascend npu
#3782 commented on Mar 8, 2025 • 0 new comments
[Fix]: DeepSeek support cpu_offload_gb option
#3675 commented on Mar 8, 2025 • 0 new comments
[ROCm] Enable MTP (NextN) on AMD GPU
#3670 commented on Mar 8, 2025 • 0 new comments
RPD integration into SGLANG with --profile-req support
#3653 commented on Mar 8, 2025 • 0 new comments
[docker] Distributed Serving with k8s Statefulset ( good example for DeepSeek-R1)
#3631 commented on Mar 8, 2025 • 0 new comments
Update decode kernel benchmark with new Triton backend interface
#3618 commented on Mar 8, 2025 • 0 new comments
solve the mrope position bugs for qwen2-vl
#3605 commented on Mar 8, 2025 • 0 new comments
[DRAFT] change block size&wave number to increase wave inflight
#3565 commented on Mar 8, 2025 • 0 new comments
Add README for benchmarking SGLang using GenAI-Perf.
#3552 commented on Mar 8, 2025 • 0 new comments
chore: use sglang as alias of sglang.launch_server
#3546 commented on Mar 8, 2025 • 0 new comments
Add integration with lws for multi-node inference
#3540 commented on Mar 8, 2025 • 0 new comments
Make decord package optional.
#3528 commented on Mar 8, 2025 • 0 new comments
[optimize][scheduler] implement shortest-queue method
#3526 commented on Mar 8, 2025 • 0 new comments
[ROCm] accelerate rocm image build
#3481 commented on Mar 8, 2025 • 0 new comments
fix adjust_max_prefix_ids bug
#3478 commented on Mar 8, 2025 • 0 new comments
Support MLA in Torch Native Attention Backend
#3475 commented on Mar 8, 2025 • 0 new comments
Check for Already Terminated Process before kill
#3474 commented on Mar 8, 2025 • 0 new comments
Support `n` in OpenAI API completions
#3446 commented on Mar 8, 2025 • 0 new comments
run eagle speculative decodeing error!
#3362 commented on Mar 2, 2025 • 0 new comments
HotFix: json serialization error when using OAI v1/batches endpoint with logprobs
#3896 commented on Mar 8, 2025 • 0 new comments
Support FP4 gemm (1/2)
#3899 commented on Mar 8, 2025 • 0 new comments
Add block_wise INT8 support MTP NextN function
#3911 commented on Mar 8, 2025 • 0 new comments
Docs; Fix link in offline enine
#3919 commented on Mar 8, 2025 • 0 new comments
[Feature] Support for qwen2vl radix cache
#3924 commented on Mar 8, 2025 • 0 new comments
[bugfix]fix finish reason in response
#3926 commented on Mar 8, 2025 • 0 new comments
[Feature] Support Efficient Sparse HiP Attention (InfiniteHiP) with Long-Context Generalization and KV Offloading Capabilties
#3930 commented on Mar 8, 2025 • 0 new comments
Update vLLM version for ROCm
#3937 commented on Mar 8, 2025 • 0 new comments
triton python backend
#3939 commented on Mar 8, 2025 • 0 new comments
[Feature] Devin/1740716491 Add a hash for each new release
#3947 commented on Mar 8, 2025 • 0 new comments
add deepgemm into sgl-kernel
#3957 commented on Mar 8, 2025 • 0 new comments
Update function_call_parser.py
#3960 commented on Mar 8, 2025 • 0 new comments
example: add async offline inference demo
#3961 commented on Mar 8, 2025 • 0 new comments
feat(remote_model): support variable remote backend for model loader
#3964 commented on Mar 8, 2025 • 0 new comments
Blackwell fp8 blockscale
#3968 commented on Mar 8, 2025 • 0 new comments
[CI] Remove unused imports with Ruff to pre-commit config
#3969 commented on Mar 8, 2025 • 0 new comments
FP4 weight loading and inference (2/2)
#3972 commented on Mar 8, 2025 • 0 new comments
Move `aiohttp` into public dependencies
#3980 commented on Mar 8, 2025 • 0 new comments
[ROCM MOE] Enable ROCM AITER Block MOE For DeepSeek R1/V3
#3788 commented on Mar 8, 2025 • 0 new comments
Workaround for transformers 4.49
#3792 commented on Mar 8, 2025 • 0 new comments
[Bug] Fix chat completion empty input
#3793 commented on Mar 8, 2025 • 0 new comments
Add model name in EntryClass for tracking doc update to date
#3800 commented on Mar 8, 2025 • 0 new comments
use torch api to get distributed backend
#3804 commented on Mar 8, 2025 • 0 new comments
Ensure Usage Data in Streaming Responses Aligns with vLLM’s Implementation
#3814 commented on Mar 8, 2025 • 0 new comments
[BugFix]: Add type check before compare for max_new_tokens
#3832 commented on Mar 8, 2025 • 0 new comments
Resolve LinearBase circular import
#3833 commented on Mar 8, 2025 • 0 new comments
[WIP] [Feature] Support DeepSeek-v3 gptq
#3834 commented on Mar 8, 2025 • 0 new comments
[Doc] Fix typo in backend/sampling_params
#3835 commented on Mar 8, 2025 • 0 new comments
[docs] Update outdated description about `torch.compile`
#3844 commented on Mar 8, 2025 • 0 new comments
Update FP8 kernel configuration for 4xGPU support on AMD
#3850 commented on Mar 8, 2025 • 0 new comments
[Feature] Support for Ascend NPU backend
#3853 commented on Mar 8, 2025 • 0 new comments
Fix block_shape in benchmark_vllm_vs_sglang_fused_moe_triton.py
#3858 commented on Mar 8, 2025 • 0 new comments
Feat: Supports returning expected error responses to wrong requests
#3875 commented on Mar 8, 2025 • 0 new comments
Update router.rs
#3877 commented on Mar 8, 2025 • 0 new comments
[Quantization] support VPTQ
#3879 commented on Mar 8, 2025 • 0 new comments
Add examples for token in token out for LLM
#3886 commented on Mar 8, 2025 • 0 new comments
[Feature] Prefill assistant response
#3971 commented on Mar 5, 2025 • 0 new comments
[Bug] RecursionError: maximum recursion depth exceeded while calling a Python object
#3953 commented on Mar 5, 2025 • 0 new comments
[Bug] [AMD] ncclAllReduce Error under tp > 1 + multi nodes + enable cuda graph when running DeepSeek R1 on MI300X
#3622 commented on Mar 5, 2025 • 0 new comments
[Bug] ImportError: cannot import name 'is_valid_list_of_images' from transformers.models.mllama.image_processing_mllama
#3878 commented on Mar 5, 2025 • 0 new comments
[Bug] Qwen 2.5 VL
#3321 commented on Mar 5, 2025 • 0 new comments
[Track] long context performance sglang vs vllm
#3471 commented on Mar 6, 2025 • 0 new comments
[Feature] Lora Development Roadmap
#2929 commented on Mar 6, 2025 • 0 new comments
[Bug] 8xMI300X DeepSeek v3/R1 takes 45 mins to download and 1 hour to load shards
#3825 commented on Mar 6, 2025 • 0 new comments
[Feature] Proposal for adding PD-Disaggregation Feature to SGLang
#3554 commented on Mar 6, 2025 • 0 new comments
[Bug] SGLang Tool Calling for Qwen2.5 models returns empty ChatCompletionMessage content
#3797 commented on Mar 6, 2025 • 0 new comments
[Bug] Parameter/message body error directly returns 500
#3805 commented on Mar 7, 2025 • 0 new comments
[Feature] remove vllm _custom_ops
#2965 commented on Mar 7, 2025 • 0 new comments
[Bug] fused_moe OOM when run deepseek-r1 with --speculative-algo NEXTN
#3633 commented on Mar 7, 2025 • 0 new comments
[Feature] Instructions for running Sglang on AMD RX 7900 XTX (gfx1100) ROCm 6.2.4
#3243 commented on Mar 7, 2025 • 0 new comments
[DeepseekR1]How ragged prefill manage kv_cache?
#3849 commented on Mar 8, 2025 • 0 new comments
[Bug] --dp-size issue with AMD 8xMI300X and Llama 3.1 70B
#3890 commented on Mar 8, 2025 • 0 new comments
Extensive benchmarking of reasoning models including variance
#3725 commented on Mar 8, 2025 • 0 new comments
[Feature] A800 kernel config file to support deepseek r1 bf16
#3748 commented on Mar 9, 2025 • 0 new comments
[Bug] GGUF tokenizations issues
#3427 commented on Mar 2, 2025 • 0 new comments
[Bug] HuggingFace and SGLang inference don't match
#2671 commented on Mar 3, 2025 • 0 new comments
[Bug] torch.distributed.all_reduce raised Segmentation fault on 2 * 8 * H800
#3745 commented on Mar 3, 2025 • 0 new comments
[Feature] Support Deepseek's DeepGemm MoE
#3881 commented on Mar 3, 2025 • 0 new comments
[Bug] Dimension mismatched error when capturing cuda graph while enabling NEXTN
#3891 commented on Mar 3, 2025 • 0 new comments
[Bug] tensor_model_parallel_all_reduce' is not defined
#2931 commented on Mar 3, 2025 • 0 new comments
[Bug] ERROR: No matching distribution found for vllm==0.6.3.post2.dev1; extra == "srt-hip"
#3189 commented on Mar 3, 2025 • 0 new comments
[Bug] AttributeError: module 'vllm._custom_ops' has no attribute 'silu_and_mul'
#3392 commented on Mar 3, 2025 • 0 new comments
[Bug] Decode Throughput Inconsistency Between bench_serving and Engine Logs
#3050 commented on Mar 3, 2025 • 0 new comments
[Feature] attention dp + attention tp for deepseek v3
#3750 commented on Mar 3, 2025 • 0 new comments
[Bug] KeyError: 'model.layers.0.self_attn.k_scale'
#3936 commented on Mar 4, 2025 • 0 new comments
[Track] progress in removing vLLM dependencies
#2245 commented on Mar 4, 2025 • 0 new comments
[Question] Why is the performance worse after --enable-flashinfer-mla on H20
#3917 commented on Mar 4, 2025 • 0 new comments
[Feature] Support mistralai/Pixtral
#2351 commented on Mar 4, 2025 • 0 new comments
[Bug] deploy the DeepSeek-R1-awq get jumbled or nonsensical answers
#3580 commented on Mar 4, 2025 • 0 new comments
The number of image token (3) should be the same as in the number of provided images (1)
#3819 commented on Mar 4, 2025 • 0 new comments
Question sgl_kernel on amd paltforms
#3965 commented on Mar 4, 2025 • 0 new comments
[Feature] Use xgrammar as default grammar backend to aviod I/O errors while using Outlines in a multi-node setting
#3383 commented on Mar 5, 2025 • 0 new comments
[WIP] Integration of TurboMind AWQ
#2900 commented on Mar 8, 2025 • 0 new comments
[Core] Optimize the delay scheduling of in batch prefix caching
#2962 commented on Mar 8, 2025 • 0 new comments
Integrate turbomind into sgl-kernel
#2999 commented on Mar 8, 2025 • 0 new comments
support telechat2 model
#3000 commented on Mar 8, 2025 • 0 new comments
Support int8 kvcahe
#3034 commented on Mar 8, 2025 • 0 new comments
Modify the kernel test path & add it to the CI process.
#3044 commented on Mar 8, 2025 • 0 new comments
[Feature] Beam Search
#3066 commented on Mar 8, 2025 • 0 new comments
[Feature] Rewrite Sampling Parameter #3165
#3185 commented on Mar 8, 2025 • 0 new comments
Add logit bias into the SGLang interface.
#3187 commented on Mar 8, 2025 • 0 new comments
Add deepseek_v3 fused gate
#3191 commented on Mar 8, 2025 • 0 new comments
add date_string to the chat template
#3297 commented on Mar 8, 2025 • 0 new comments
FEA compat with ipv6
#3301 commented on Mar 8, 2025 • 0 new comments
fix(mig): fallback gpu_memory_total value
#3353 commented on Mar 8, 2025 • 0 new comments
Better support of tp checkpoint loading
#3367 commented on Mar 8, 2025 • 0 new comments
Update Intel XPU install instruction
#3390 commented on Mar 8, 2025 • 0 new comments
fix: return null instead of "" for finish_reason when unfinished
#3391 commented on Mar 8, 2025 • 0 new comments
Modify metrics service endpoint
#3443 commented on Mar 8, 2025 • 0 new comments
Add torch profiler activity for HPU
#3445 commented on Mar 8, 2025 • 0 new comments
How to return reasoning_content from sglang server response?
#3428 commented on Mar 9, 2025 • 0 new comments
DeepSeek-R1 Optimization Option Ablations
#3956 commented on Mar 9, 2025 • 0 new comments
[Feature] Add initial support for sequence parallelism
#1436 commented on Mar 8, 2025 • 0 new comments
Surpport kv cache int8/int4 for triton backend
#1644 commented on Mar 8, 2025 • 0 new comments
feat: use cascade attention kernel (single level)
#2101 commented on Mar 8, 2025 • 0 new comments
Add support for Phi3V
#2383 commented on Mar 8, 2025 • 0 new comments
Add InfiniteBench for long context benchmarking
#2421 commented on Mar 8, 2025 • 0 new comments
[Experimental] Add a gRPC server for completion request
#2478 commented on Mar 8, 2025 • 0 new comments
Refactor Scheduler to improve code organization
#2593 commented on Mar 8, 2025 • 0 new comments
[Feature] support compute-communication overlap with TransformerEngine
#2627 commented on Mar 8, 2025 • 0 new comments
Support InternVL2 Series
#2629 commented on Mar 8, 2025 • 0 new comments
[Feature] Support regex as a stopping condition
#2699 commented on Mar 8, 2025 • 0 new comments
Speculative decoding with lookahead
#2790 commented on Mar 8, 2025 • 0 new comments
Add endpoint for file support, purely to speed up processing of input_embeds.
#2797 commented on Mar 8, 2025 • 0 new comments
[Feature] Support Deepseek-VL2
#2798 commented on Mar 8, 2025 • 0 new comments
Improve the mixed chunk prefill by lanuch two kernels
#2811 commented on Mar 8, 2025 • 0 new comments
Use CUDA_VISIBLE_DEVICES instead of gpu_id variables everywhere.
#2824 commented on Mar 8, 2025 • 0 new comments
[Feature] Support dynamic loading and unloading of Lora adapters
#2891 commented on Mar 8, 2025 • 0 new comments

Overview

117 Pull requests merged by 45 people

35 Pull requests opened by 29 people

47 Issues closed by 20 people

52 Issues opened by 46 people
