Insights: InternLM/lmdeploy
Overview
10 Pull requests merged by 6 people
- Fall back to the PyTorch engine when the model is quantized by SmoothQuant (#2953, merged Dec 26, 2024)
- Fix exception handler for proxy server (#2901, merged Dec 26, 2024)
- Support torch_dtype modification and update FAQs for AWQ quantization (#2898, merged Dec 25, 2024)
- Fix mllama inference without an image (#2947, merged Dec 25, 2024)
- Support unaligned QKV heads (#2930, merged Dec 23, 2024)
- Fix torch_dtype (#2933, merged Dec 23, 2024)
- [side effect] Fix failed VLM quantization (#2914, merged Dec 22, 2024)
- [dlinfer] Fix MoE op for dlinfer (#2917, merged Dec 20, 2024)
- Fix LoRA name and rearrange wqkv for internlm2 (#2912, merged Dec 20, 2024)
- Fix incorrect stats size in the throughput benchmark when concurrency > num_prompts (#2928, merged Dec 19, 2024)
5 Pull requests opened by 4 people
- [ci] Add w8a8 and internvl2.5 models to test cases (#2949, opened Dec 24, 2024)
- [dlinfer] feat: add DlinferFlashAttention to support Qwen-VL (#2952, opened Dec 25, 2024)
- [side-effect] Bring back quantization of qwen2-vl, glm4v, etc. (#2954, opened Dec 25, 2024)
- Bump version to v0.6.5 (#2955, opened Dec 25, 2024)
- Fix torch_dtype in lite (#2956, opened Dec 25, 2024)
16 Issues closed by 10 people
- [Bug] Does the VLM chat template only extract the last text segment for a given role? (#2911, closed Dec 26, 2024)
- [Bug] Same code and environment: A800 succeeds but A10 fails (#2903, closed Dec 26, 2024)
- [Question] Support for torch 2.5 (#2946, closed Dec 25, 2024)
- [Bug] lmdeploy[432]: OSError: image file is truncated (#2869, closed Dec 25, 2024)
- [Bug] Mixtral MoE fp16 greedy decode output differs across requests (#2890, closed Dec 25, 2024)
- [Bug] AWQ 4-bit quantization of Qwen2-VL-72B-Instruct errors out (#2935, closed Dec 24, 2024)
- [Question] How to run a specific version of a model from Hugging Face (#2936, closed Dec 24, 2024)
- [Bug] Chat with gemma-2-27b-it returns an empty response (#2938, closed Dec 24, 2024)
- [Docs] Questions about vLLM performance testing (#2838, closed Dec 23, 2024)
- Poor performance of Molmo pointing function (#2856, closed Dec 23, 2024)
- [Bug] AWQ 4-bit quantization of Qwen/Qwen2-VL-72B-Instruct runs out of CUDA memory (#2934, closed Dec 22, 2024)
- [Bug] lmdeploy reports an error when loading a LoRA fine-tuned model (#2762, closed Dec 22, 2024)
- [Bug] InternVL2_5-78B quantization with tp=4 reports an error (#2929, closed Dec 20, 2024)
- [Docs] AWS Inferentia setup (#2921, closed Dec 20, 2024)
- [Bug] Incorrect stats size during inference of throughput benchmark (#2927, closed Dec 19, 2024)
13 Issues opened by 11 people
- [Feature] I have a piece of code and don't know how to accelerate it with LMDeploy (#2958, opened Dec 26, 2024)
- [Bug] Numerical error in Flash Attention (#2957, opened Dec 25, 2024)
- [Bug] lmdeploy 0.6.4 cannot quantize Llama3.1-8B? (#2951, opened Dec 25, 2024)
- [Bug] Generation profile hangs on Mixtral-8x7B-Instruct-v0.1 with the PyTorch backend (#2948, opened Dec 24, 2024)
- Failed to start the service on 910B (#2945, opened Dec 24, 2024)
- [Feature] Control over prefix cache capacity and guaranteed caching (#2942, opened Dec 23, 2024)
- [Bug] How to output hidden states during inference (#2937, opened Dec 22, 2024)
- [Bug] Deploying InternVL2.5 8B: Aborted (core dumped) (#2932, opened Dec 20, 2024)
- [Bug] How can I use my own dataset for lmdeploy AWQ quantization? Does the calibration dataset only support text Q&A data? (#2931, opened Dec 20, 2024)
- [Feature] pipeline.get_logits() should accept a tensor on GPU as input rather than a list (#2926, opened Dec 19, 2024)
- [Feature] Support deploying a model on a specific device id (#2925, opened Dec 19, 2024)
13 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Torchrun launching multiple api_server (#2402, commented on Dec 26, 2024 • 6 new comments)
- Qwen2-VL-72B-Instruct-AWQ abnormal inference results (#2863, commented on Dec 20, 2024 • 0 new comments)
- How to avoid VLM OOM issues? (#2887, commented on Dec 20, 2024 • 0 new comments)
- [Feature] How to apply a custom calibration dataset during AWQ quantization? (#2923, commented on Dec 23, 2024 • 0 new comments)
- [Feature] qwen2 vl support for the turbomind engine (#2774, commented on Dec 23, 2024 • 0 new comments)
- Service suddenly hangs after more than 200 consecutive requests (#2231, commented on Dec 23, 2024 • 0 new comments)
- [Bug] InternLM2.5-20b-chat long-context inference fails at startup with `Illegal instruction` (#2900, commented on Dec 24, 2024 • 0 new comments)
- [Bug] docker+lmdeploy deployment of a multimodal large model reports an error: AssertionError: failed to match chat template, please explicit set chat_template_config (#2805, commented on Dec 25, 2024 • 0 new comments)
- [Bug] Llama3.2-11B batch inference of text-only items returns garbled output (#2878, commented on Dec 26, 2024 • 0 new comments)
- [Feature] Support llava onevision (#2783, commented on Dec 26, 2024 • 0 new comments)
- [maca] Add CUDA graph support on the maca backend (#2834, commented on Dec 25, 2024 • 0 new comments)
- Support Medusa speculative decoding (#2859, commented on Dec 23, 2024 • 0 new comments)
- Remove threadsafe (#2907, commented on Dec 25, 2024 • 0 new comments)