Insights: InternLM/lmdeploy
Overview
10 Pull requests merged by 6 people
- Fall back to the PyTorch engine when the model is quantized by SmoothQuant (#2953, merged Dec 26, 2024)
- Fix exception handler for proxy server (#2901, merged Dec 26, 2024)
- Support torch_dtype modification and update FAQs for AWQ quantization (#2898, merged Dec 25, 2024)
- Fix mllama inference without an image (#2947, merged Dec 25, 2024)
- Support unaligned QKV heads (#2930, merged Dec 23, 2024)
- Fix torch_dtype (#2933, merged Dec 23, 2024)
- [side effect] Fix failed VLM quantization (#2914, merged Dec 22, 2024)
- [dlinfer] Fix MoE op for dlinfer (#2917, merged Dec 20, 2024)
- Fix LoRA name and rearrange wqkv for internlm2 (#2912, merged Dec 20, 2024)
- Fix incorrect stats size in the throughput benchmark when concurrency > num_prompts (#2928, merged Dec 19, 2024)
5 Pull requests opened by 4 people
- [ci] Add w8a8 and internvl2.5 models to test cases (#2949, opened Dec 24, 2024)
- [dlinfer] feat: add DlinferFlashAttention to support Qwen-VL (#2952, opened Dec 25, 2024)
- [side-effect] Bring back quantization of qwen2-vl, glm4v, etc. (#2954, opened Dec 25, 2024)
- Bump version to v0.6.5 (#2955, opened Dec 25, 2024)
- Fix torch_dtype in lite (#2956, opened Dec 25, 2024)
16 Issues closed by 10 people
- [Bug] Does the VLM chat template only extract the last text segment for a given role? (#2911, closed Dec 26, 2024)
- [Bug] Same code and environment: A800 succeeds but A10 fails (#2903, closed Dec 26, 2024)
- [Question] Support for torch 2.5 (#2946, closed Dec 25, 2024)
- [Bug] lmdeploy[432]: OSError: image file is truncated (#2869, closed Dec 25, 2024)
- [Bug] Mixtral MoE fp16 greedy decode output differs across requests (#2890, closed Dec 25, 2024)
- [Bug] AWQ 4-bit quantization of Qwen2-VL-72B-Instruct errors out (#2935, closed Dec 24, 2024)
- [Question] How to run a specific version of a model from Hugging Face (#2936, closed Dec 24, 2024)
- [Bug] Chat with gemma-2-27b-it returns an empty response (#2938, closed Dec 24, 2024)
- [Docs] Questions about vLLM performance testing (#2838, closed Dec 23, 2024)
- Poor performance of Molmo pointing function (#2856, closed Dec 23, 2024)
- [Bug] AWQ 4-bit quantization of Qwen/Qwen2-VL-72B-Instruct runs out of CUDA memory (#2934, closed Dec 22, 2024)
- [Bug] lmdeploy reports an error when loading a LoRA fine-tuned model (#2762, closed Dec 22, 2024)
- [Bug] InternVL2_5-78B quantization with tp=4 reports an error (#2929, closed Dec 20, 2024)
- [Docs] AWS Inferentia setup (#2921, closed Dec 20, 2024)
- [Bug] Incorrect stats size during inference of throughput benchmark (#2927, closed Dec 19, 2024)
13 Issues opened by 11 people
- [Feature] I have a piece of code and don't know how to accelerate it with LMDeploy (#2958, opened Dec 26, 2024)
- [Bug] Numerical error in Flash Attention (#2957, opened Dec 25, 2024)
- [Bug] lmdeploy 0.6.4 cannot quantize Llama3.1-8B? (#2951, opened Dec 25, 2024)
- [Bug] Generation profile hangs on Mixtral-8x7B-Instruct-v0.1 with the PyTorch backend (#2948, opened Dec 24, 2024)
- Failed to start the service on 910B (#2945, opened Dec 24, 2024)
- [Feature] Control over prefix cache capacity and guaranteed caching (#2942, opened Dec 23, 2024)
- [Bug] How to output hidden states during inference (#2937, opened Dec 22, 2024)
- [Bug] Deploying InternVL2.5 8B: Aborted (core dumped) (#2932, opened Dec 20, 2024)
- [Bug] How can I use my own dataset for lmdeploy AWQ quantization? Does the calibration dataset only support text Q&A data? (#2931, opened Dec 20, 2024)
- [Feature] pipeline.get_logits() should accept a tensor on GPU as input rather than a list (#2926, opened Dec 19, 2024)
- [Feature] Support deploying a model on a specific device id (#2925, opened Dec 19, 2024)
13 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Torchrun launching multiple api_server (#2402, commented on Dec 26, 2024 • 6 new comments)
- Qwen2-VL-72B-Instruct-AWQ abnormal inference results (#2863, commented on Dec 20, 2024 • 0 new comments)
- How to avoid VLM OOM issues? (#2887, commented on Dec 20, 2024 • 0 new comments)
- [Feature] How to apply a custom calibration dataset during AWQ quantization? (#2923, commented on Dec 23, 2024 • 0 new comments)
- [Feature] qwen2 vl support for the turbomind engine (#2774, commented on Dec 23, 2024 • 0 new comments)
- Service suddenly hangs after more than 200 consecutive requests (#2231, commented on Dec 23, 2024 • 0 new comments)
- [Bug] InternLM2.5-20b-chat long-context inference fails at startup with `Illegal instruction` (#2900, commented on Dec 24, 2024 • 0 new comments)
- [Bug] docker+lmdeploy deployment of a multimodal large model reports an error: AssertionError: failed to match chat template, please explicit set chat_template_config (#2805, commented on Dec 25, 2024 • 0 new comments)
- [Bug] Llama3.2-11B batch inference of text-only items returns garbled output (#2878, commented on Dec 26, 2024 • 0 new comments)
- [Feature] Support llava onevision (#2783, commented on Dec 26, 2024 • 0 new comments)
- [maca] Add CUDA graph support on the maca backend (#2834, commented on Dec 25, 2024 • 0 new comments)
- Support Medusa speculative decoding (#2859, commented on Dec 23, 2024 • 0 new comments)
- Remove threadsafe (#2907, commented on Dec 25, 2024 • 0 new comments)