# Deploying and Invoking MiniCPM-2B-chat with transformers

## Introduction to MiniCPM-2B-chat

MiniCPM is a series of edge-side large language models jointly open-sourced by ModelBest (面壁智能) and the Natural Language Processing Lab of Tsinghua University. Its main language model, MiniCPM-2B, has only 2.4B non-embedding parameters.

After SFT, MiniCPM performs on par with Mistral-7B on public comprehensive benchmarks (with stronger Chinese, math, and coding abilities), and overall outperforms models such as Llama2-13B, MPT-30B, and Falcon-40B.
After DPO, MiniCPM-2B also surpasses many representative open-source models such as Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha on MTBench, currently the benchmark closest to real user experience.
MiniCPM-V, an edge-side multimodal model built on MiniCPM-2B, achieves the best overall performance among models of the same scale, surpassing existing multimodal models built on Phi-2 and matching or even exceeding the 9.6B Qwen-VL-Chat on some benchmarks.
After Int4 quantization, MiniCPM can be deployed for inference on mobile phones, with streaming output slightly faster than human speech; MiniCPM-V has also been successfully deployed on phones.
A single 1080/2080 GPU is enough for parameter-efficient fine-tuning, a single 3090/4090 for full-parameter fine-tuning, and a single machine for continuous training, so the cost of building on MiniCPM is low.
## Environment Setup

Rent a machine with a **24 GB GPU, such as a single RTX 3090**, on the AutoDL platform. For the image, select PyTorch --> 2.1.0 --> 3.10 (ubuntu22.04) --> 12.1, as shown in the figure below.

*(Figure: selecting the machine configuration)*

Next, open `JupyterLab` on the server you just rented, and open a terminal in it to set up the environment, download the model, and run the `demo`.

Change the pip source to speed up installation, then install the dependencies:
```shell
# Upgrade pip
python -m pip install --upgrade pip
# Switch to the Tsinghua PyPI mirror to speed up package installation
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install modelscope transformers sentencepiece accelerate langchain

MAX_JOBS=8 pip install flash-attn --no-build-isolation
```
> Note: flash-attn compiles from source, so installation is fairly slow; expect it to take ten to twenty minutes.
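Before moving on, it may help to verify the environment. This is a minimal, optional sanity check (not part of the original walkthrough); it only assumes the packages above installed successfully:

```python
# Minimal sanity check: the key packages import and the GPU is visible.
import torch
import transformers
import modelscope

print(torch.__version__)          # expect 2.1.x
print(torch.cuda.is_available())  # expect True on the rented 3090
```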
## Model Download

Use the `snapshot_download` function from `modelscope` to download the model: the first argument is the model name, and the `cache_dir` argument sets the download path.

Create a `download.py` file under `/root/autodl-tmp`, paste in the following code, and remember to save the file. Then run `python /root/autodl-tmp/download.py` to start the download. The model is about 10 GB, so downloading takes roughly 5 to 10 minutes.
```python
from modelscope import snapshot_download

# Download MiniCPM-2B-sft-fp32 from ModelScope into /root/autodl-tmp
model_dir = snapshot_download('OpenBMB/MiniCPM-2B-sft-fp32', cache_dir='/root/autodl-tmp', revision='master')
```
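Once the script finishes, the weights should sit under `cache_dir` joined with the model id. Below is a quick optional check; the exact path is an assumption derived from the arguments above:

```python
import os

# snapshot_download is expected to place the model at <cache_dir>/<org>/<model_name>
model_path = '/root/autodl-tmp/OpenBMB/MiniCPM-2B-sft-fp32'
print(os.path.exists(model_path))  # expect True after the download completes
print(os.listdir(model_path))      # config, tokenizer, and weight files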
## Code Preparation

To build LLM applications conveniently, we define a custom LLM class based on the locally deployed MiniCPM-2B-chat and plug it into the LangChain framework. Once this custom class is in place, LangChain's interfaces can be called in exactly the same way as for any other model, without worrying about the underlying model's calling conventions.

Writing a custom LLM class on top of the locally deployed MiniCPM-2B-chat is not complicated: we only need to subclass `langchain.llms.base.LLM` and override the constructor and the `_call` method:
```python
from langchain.llms.base import LLM
from typing import Any, List, Optional
from langchain.callbacks.manager import CallbackManagerForLLMRun
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

class MiniCPM_LLM(LLM):
    # Custom LLM class wrapping the locally deployed MiniCPM
    tokenizer: AutoTokenizer = None
    model: AutoModelForCausalLM = None

    def __init__(self, model_path: str):
        # model_path: path to the local MiniCPM model
        # Initialize the model from the local checkpoint
        super().__init__()
        print("Loading model from local path...")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
        self.model = self.model.eval()
        print("Finished loading the local model")

    def _call(self, prompt: str, stop: Optional[List[str]] = None,
              run_manager: Optional[CallbackManagerForLLMRun] = None,
              **kwargs: Any) -> str:
        # Generate a response through the model's chat interface
        response, history = self.model.chat(self.tokenizer, prompt, temperature=0.5, top_p=0.8, repetition_penalty=1.02)
        return response

    @property
    def _llm_type(self) -> str:
        return "MiniCPM_LLM"
```
## Invocation

The model can then be used like any other LangChain LLM.
```python
llm = MiniCPM_LLM('/root/autodl-tmp/OpenBMB/MiniCPM-2B-sft-fp32')

llm('你好')
```
The model's reply is shown in the figure below:

*(Figure: example response from MiniCPM)*