forked from THUDM/ChatGLM3
Commit
Merge pull request THUDM#918 from zRzRzRzRzRzRzR/main
Multiple updates
Showing 10 changed files with 430 additions and 48 deletions.
@@ -0,0 +1,14 @@
# Intel Device Demo

This folder helps developers accelerate the deployment of the ChatGLM3-6B model on Intel devices.

## 1. Hardware Requirements
The devices supported by the demos in this folder include:
- Intel CPUs, including consumer CPUs and server / workstation CPUs
- Intel Arc discrete GPUs, such as the Arc A770
- Intel integrated GPUs (CPU graphics)
- Other Intel hardware that, in principle, supports OpenVINO acceleration

## 2. Directory Layout
- OpenVINO_demo: an example of accelerated model deployment using the Intel OpenVINO inference framework.
- Pytorch_demo (not yet released): development in a PyTorch environment using Intel Extension for PyTorch (for Intel Arc GPUs).
@@ -0,0 +1,94 @@
# Deploying the ChatGLM3-6B Model with OpenVINO

[OpenVINO](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html) is Intel's open-source toolkit for deep learning inference. It helps developers optimize models, improve inference performance, and reduce memory footprint. This example shows how to deploy ChatGLM3 with OpenVINO.

Clone this repository, then follow the steps below to convert the model into an OpenVINO IR model and run inference with it.

## 1. Environment Setup

First, clone the OpenVINO ChatGLM3 inference repository and install its dependencies.

```bash
git clone https://github.com/OpenVINO-dev-contest/chatglm3.openvino.git
cd chatglm3.openvino
```

Next, we recommend creating a new virtual environment and installing the dependencies as follows.

```
python3 -m venv openvino_env
source openvino_env/bin/activate
python3 -m pip install --upgrade pip
pip install wheel setuptools
pip install -r requirements.txt
```
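
To confirm that the OpenVINO runtime is available in the environment (a quick, optional check; it assumes the `openvino` package is pulled in by `requirements.txt`), you can print its version:

```bash
python3 -c "from openvino.runtime import get_version; print(get_version())"
```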

## 2. Convert the Model

The Hugging Face model needs to be converted into an OpenVINO IR model, so download the model and convert it:

```
python3 convert.py --model_id THUDM/chatglm3-6b --output {your_path}/chatglm3-6b
```

### Optional arguments

* `--model_id` - the Hugging Face model ID, or the (absolute) path to the directory containing the model.
* `--output` - the path where the converted model is saved.
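
For example, to convert a copy of the model that has already been downloaded locally, pass the absolute path instead of the Hugging Face ID (the paths below are placeholders; substitute your own):

```bash
python3 convert.py --model_id /path/to/chatglm3-6b --output /path/to/chatglm3-6b-ov
```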

## 3. Quantize the Model (optional)

```
python3 quantize.py --model_path {your_path}/chatglm3-6b --precision int4 --output {your_path}/chatglm3-6b-int4
```

### Optional arguments

* `--model_path` - the path to the directory containing the OpenVINO IR model.
* `--precision` - quantization precision: int8 or int4.
* `--output` - the path where the quantized model is saved.
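
For instance, if you prefer the higher-precision int8 weights instead of int4 (the output path is a placeholder):

```bash
python3 quantize.py --model_path {your_path}/chatglm3-6b --precision int8 --output {your_path}/chatglm3-6b-int8
```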

## 4. Run the ChatGLM3 Chatbot

```
python3 chat.py --model_path {your_path}/chatglm3-6b --max_sequence_length 4096 --device CPU
```

### Optional arguments

* `--model_path` - the path to the directory containing the OpenVINO IR model.
* `--max_sequence_length` - the maximum number of output tokens.
* `--device` - the device to run inference on.
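
For example, to run the int4-quantized model from step 3 on an Intel Arc GPU (this assumes an Intel GPU with working drivers; the OpenVINO device name for it is `GPU`):

```bash
python3 chat.py --model_path {your_path}/chatglm3-6b-int4 --max_sequence_length 4096 --device GPU
```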

## Example

```
User: Hello
ChatGLM3-6B-OpenVINO: Hello! How can I help you?
User: Who are you?
ChatGLM3-6B-OpenVINO: I am an AI assistant named ChatGLM3-6B, developed from a language model jointly trained in 2023 by Tsinghua University's KEG Lab and Zhipu AI. My job is to provide appropriate answers and support for users' questions and requests.
User: Please tell me an interesting story
ChatGLM3-6B-OpenVINO: Once upon a time, there was a little boy named Xiao Ming who loved animals very much. One day, while walking in the forest, he found a very beautiful little bird. The bird was injured and could not fly. Xiao Ming felt very sorry for it and decided to take care of it. He took the bird home, built a small nest for it, and gathered some soft grass and food. Every day he fed the bird and changed its water. Gradually the bird recovered and began flying around Xiao Ming's home, and they became very good friends. One day, however, Xiao Ming's parents told him they had to send the bird back to the forest. Xiao Ming was very sad, because he and the bird had become good friends. But his parents told him the bird would be freer in the forest, and that he could still watch it living there. So Xiao Ming and his parents took the bird to the forest. The bird was very happy, because it could fly again and had many other little animal friends. Xiao Ming was happy too, because he knew that even if he could not keep the bird, he could still admire its beauty in nature. From then on, Xiao Ming often went to the forest to look for the bird.
User: Please give this story a title
ChatGLM3-6B-OpenVINO: "The Power of Friendship: Xiao Ming and the Little Bird's Forest Adventure"
```

## FAQ
1. Why do I still get a Hugging Face connection error after pointing to a local model?
   - Downgrade the transformers library to version 4.37.2 (see the command after this list).

2. Do I need to install the OpenVINO C++ inference engine?
   - No.

3. Do I have to use Intel hardware?
   - We have only tried this on Intel devices, and we recommend x86 Intel hardware, including but not limited to:
     - Intel CPUs, both consumer (PC) and server CPUs.
     - Intel discrete GPUs, e.g. the Arc A770.
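
For question 1 above, the downgrade is a single pip install of the pinned version:

```bash
pip install transformers==4.37.2
```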
@@ -0,0 +1,215 @@
import argparse

from optimum.utils import NormalizedTextConfig, NormalizedConfigManager
from optimum.intel.openvino import OVModelForCausalLM
from optimum.intel.openvino.utils import OV_XML_FILE_NAME

from transformers import (PretrainedConfig, AutoTokenizer, AutoConfig,
                          TextIteratorStreamer, StoppingCriteriaList, StoppingCriteria)

from typing import Optional, Union, Dict, List, Tuple
from pathlib import Path
from threading import Thread
import torch


class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the last generated token is one of the given stop token ids."""

    def __init__(self, token_ids):
        self.token_ids = token_ids

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        for stop_id in self.token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False


class OVCHATGLMModel(OVModelForCausalLM):
    """
    Optimum Intel compatible model wrapper for ChatGLM
    """

    def __init__(
        self,
        model: "Model",
        config: "PretrainedConfig" = None,
        device: str = "CPU",
        dynamic_shapes: bool = True,
        ov_config: Optional[Dict[str, str]] = None,
        model_save_dir: Optional[Union[str, Path]] = None,
        **kwargs,
    ):
        # Register a normalized config for the "chatglm" architecture so optimum-intel
        # knows how to read its layer / attention-head / hidden-size fields.
        NormalizedConfigManager._conf["chatglm"] = NormalizedTextConfig.with_args(
            num_layers="num_hidden_layers",
            num_attention_heads="num_attention_heads",
            hidden_size="hidden_size",
        )
        super().__init__(
            model, config, device, dynamic_shapes, ov_config, model_save_dir, **kwargs
        )

    def _reshape(
        self,
        model: "Model",
        *args, **kwargs
    ):
        # Mark batch and sequence dimensions as dynamic (-1) for all inputs except
        # beam_idx; past_key_values inputs keep their fixed dimension of 2.
        shapes = {}
        for inputs in model.inputs:
            shapes[inputs] = inputs.get_partial_shape()
            shapes[inputs][0] = -1
            input_name = inputs.get_any_name()
            if input_name.startswith('beam_idx'):
                continue
            if input_name.startswith('past_key_values'):
                shapes[inputs][1] = -1
                shapes[inputs][2] = 2
            elif shapes[inputs].rank.get_length() > 1:
                shapes[inputs][1] = -1
        model.reshape(shapes)
        return model

    @classmethod
    def _from_pretrained(
        cls,
        model_id: Union[str, Path],
        config: PretrainedConfig,
        use_auth_token: Optional[Union[bool, str, None]] = None,
        revision: Optional[Union[str, None]] = None,
        force_download: bool = False,
        cache_dir: Optional[str] = None,
        file_name: Optional[str] = None,
        subfolder: str = "",
        from_onnx: bool = False,
        local_files_only: bool = False,
        load_in_8bit: bool = False,
        **kwargs,
    ):
        model_path = Path(model_id)
        default_file_name = OV_XML_FILE_NAME
        file_name = file_name or default_file_name

        # Locate the OpenVINO IR (.xml) file, downloading it if necessary.
        model_cache_path = cls._cached_file(
            model_path=model_path,
            use_auth_token=use_auth_token,
            revision=revision,
            force_download=force_download,
            cache_dir=cache_dir,
            file_name=file_name,
            subfolder=subfolder,
            local_files_only=local_files_only,
        )

        model = cls.load_model(model_cache_path)
        init_cls = OVCHATGLMModel

        return init_cls(
            model=model, config=config, model_save_dir=model_cache_path.parent, **kwargs
        )


if __name__ == "__main__":
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-h',
                        '--help',
                        action='help',
                        help='Show this help message and exit.')
    parser.add_argument('-m',
                        '--model_path',
                        required=True,
                        type=str,
                        help='Required. Path to the OpenVINO IR model directory.')
    parser.add_argument('-l',
                        '--max_sequence_length',
                        default=256,
                        required=False,
                        type=int,
                        help='Optional. Maximum number of output tokens.')
    parser.add_argument('-d',
                        '--device',
                        default='CPU',
                        required=False,
                        type=str,
                        help='Optional. Device for inference.')
    args = parser.parse_args()

    # Latency-oriented OpenVINO configuration: a single inference stream, no model cache.
    ov_config = {"PERFORMANCE_HINT": "LATENCY",
                 "NUM_STREAMS": "1", "CACHE_DIR": ""}

    tokenizer = AutoTokenizer.from_pretrained(
        args.model_path, trust_remote_code=True)
    model_dir = args.model_path

    print("====Compiling model====")
    ov_model = OVCHATGLMModel.from_pretrained(
        model_dir,
        device=args.device,
        ov_config=ov_config,
        config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
        trust_remote_code=True,
    )

    # Stream generated tokens back to the console; token ids 0 and 2 stop generation.
    streamer = TextIteratorStreamer(
        tokenizer, timeout=30.0, skip_prompt=True, skip_special_tokens=True
    )
    stop_tokens = [0, 2]
    stop_tokens = [StopOnTokens(stop_tokens)]

    def convert_history_to_token(history: List[Tuple[str, str]]):
        # Rebuild the message list from (user, assistant) pairs and tokenize it with
        # the model's chat template, appending the generation prompt.
        messages = []
        for idx, (user_msg, model_msg) in enumerate(history):
            if idx == len(history) - 1 and not model_msg:
                messages.append({"role": "user", "content": user_msg})
                break
            if user_msg:
                messages.append({"role": "user", "content": user_msg})
            if model_msg:
                messages.append({"role": "assistant", "content": model_msg})

        model_inputs = tokenizer.apply_chat_template(messages,
                                                     add_generation_prompt=True,
                                                     tokenize=True,
                                                     return_tensors="pt")
        return model_inputs

    history = []
    print("====Starting conversation====")
    while True:
        input_text = input("用户: ")
        if input_text.lower() == 'stop':
            break

        if input_text.lower() == 'clear':
            history = []
            print("AI助手: 对话历史已清空")  # "AI assistant: conversation history cleared"
            continue

        print("ChatGLM3-6B-OpenVINO:", end=" ")
        history = history + [[input_text, ""]]
        model_inputs = convert_history_to_token(history)
        generate_kwargs = dict(
            input_ids=model_inputs,
            max_new_tokens=args.max_sequence_length,
            temperature=0.1,
            do_sample=True,
            top_p=1.0,
            top_k=50,
            repetition_penalty=1.1,
            streamer=streamer,
            stopping_criteria=StoppingCriteriaList(stop_tokens)
        )

        # Run generation in a background thread so tokens can be printed as they stream in.
        t1 = Thread(target=ov_model.generate, kwargs=generate_kwargs)
        t1.start()

        partial_text = ""
        for new_text in streamer:
            print(new_text, end="", flush=True)
            partial_text += new_text
        print("\n")
        history[-1][1] = partial_text