update command line run and data process
mst272 committed Aug 8, 2024
1 parent 72e4d44 commit 872dee8
Showing 7 changed files with 91 additions and 25 deletions.
21 changes: 18 additions & 3 deletions README.md
@@ -26,6 +26,7 @@ Tips: the images are entirely AI-generated
- [Acknowledgements](#-致谢)

## 📖 Latest News
- [2024-08-08] 🤓 Support for launching either by editing the config files directly or via command-line arguments; added a data-processing script for adapting data to the framework.
- [2024-08-04] 🤓 Support for adaptive single-turn and multi-turn dialogue: no need to specify which, training infers it from the data. A custom system prompt can also be set. See [Training Data Format](#训练数据格式说明).
- [2024-07-19] 🤓 Added CPO, SimPO, and their combination CPO-SimPO to the RLHF framework.
- [2024-07-16] 🤓 Finished updating the RLHF framework: it supports deepspeed single-/multi-GPU reinforcement learning with LoRA, QLoRA, and other training modes. See [RLHF](./rlhf/README.md) for details.
@@ -73,10 +74,13 @@ An RLHF training framework that supports, and keeps adding, Reward training, PPO, DPO, RLOO, SimPO
- [Chat Template Summary](./chat_template/README.md)

### Technical Articles
<details> <summary>More news...</summary>

- [Deepspeed Configuration and Usage Guide](https://zhuanlan.zhihu.com/p/698631348)
- [Building an MoE from Scratch](https://zhuanlan.zhihu.com/p/701777558)
- [Implementing Transformer Code Step by Step](https://medium.com/@sdwzh2725/transformer-code-step-by-step-understandingtransformer-d2ea773f15fa)
- [Training QWEN2 with DPO and a Modified DPO Implementation](https://zhuanlan.zhihu.com/p/702569978)
</details>

## 😮 Training Data Format
The SFT data used by this framework is in ***jsonl*** format for both single-turn and multi-turn dialogue. There is no need to specify which; training infers it from the data.
@@ -97,6 +101,13 @@

For DPO data, see the example file ```data/dpo_multi_data.jsonl```.

### Adapting Data to the Framework
Since the data format required by the framework may differ from common formats, you can convert your data with ```generate_data.py```. The input should be a jsonl file with the usual instruction and output fields, as follows:
```json lines
{"instruction":"将这个句子改写成将来时态:“太阳将会照耀明亮。”","output":"太阳将会散发温暖的光芒。"}
```
After running it, you get the framework's required user/assistant format (with no system message), as sketched below.
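As a rough illustration (not the output of an actual run), the input line above should come out of ```generate_data.py``` looking roughly like this:
```json lines
{"message":[{"role":"user","content":"将这个句子改写成将来时态:“太阳将会照耀明亮。”"},{"role":"assistant","content":"太阳将会散发温暖的光芒。"}]}
```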

## 🤓Quick Start
Covers both SFT and DPO.
@@ -107,17 +118,21 @@ An RLHF training framework that supports, and keeps adding, Reward training, PPO, DPO, RLOO, SimPO

### SFT Fine-tuning (FineTune)

**1. Supports launching via command-line arguments; see ```run_example.sh``` for an example.**

**2. Also supports editing the default values directly in the parameter files, as follows:**

#### Step1 Configure args.py
Different fine-tuning methods have different configurations, but they are broadly similar. The defaults are mostly fine; you only need to change the model path, output path, and so on.

Common parameters are in args.py under utils.

Among them:
> train_args_path: the train_args path to be configured in Step2
> train_args_path: the parameters to be configured in Step2; the options are sft_args and dpo_args, both under the train_args folder
#### Step2 Configure the corresponding file under the train_args folder
The relevant training parameters are in the corresponding files under the train_args folder. In general, just use ```base.py```.
Parameters are configured as dataclasses; edit the values in default directly, so there is no need to pass them on the command line (this can be added if anyone needs it).
The relevant training parameters are in the corresponding files under the train_args folder, split into SFT and DPO.
Parameters are configured as dataclasses; edit the values in default directly, as in the sketch below.
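For example, switching the defaults over to a DPO run might look like this (a minimal sketch mirroring the fields of utils/args.py shown later in this commit; the real class has more options):

```python
from dataclasses import dataclass, field


# Minimal sketch in the spirit of CommonArgs in utils/args.py: change the defaults
# here instead of passing command-line flags.
@dataclass
class CommonArgsSketch:
    train_args_path: str = field(default='dpo_args',        # 'sft_args' or 'dpo_args'
                                 metadata={"help": "which train_args set to use"})
    train_mode: str = field(default='lora',                  # 'qlora', 'lora' or 'full'
                            metadata={"help": "fine-tuning method"})
    train_data_path: str = field(default='./data/dpo_multi_data.jsonl',
                                 metadata={"help": "training data path"})
    model_name_or_path: str = field(default='./',            # change to your model path
                                    metadata={"help": "path to the downloaded model"})
```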

#### Step3 Start training

8 changes: 6 additions & 2 deletions data/sft_data.jsonl
@@ -1,2 +1,6 @@
{"message": [{"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},{"role": "user", "content": "How many helicopters can a human eat in one sitting"},{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},{"role": "user", "content": "How many hel"},{"role": "assistant", "content": "Snogether"}]}
{"message": [{"role": "user", "content": "How many helicopters can a human eat in one sitting"},{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},{"role": "user", "content": "你是谁"},{"role": "assistant", "content": "我是一个智能机器人 "}]}
{"message": [{"role": "user", "content": "Create a function to calculate the sum of a sequence of integers."}, {"role": "assistant", "content": "# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum"}]}
{"message": [{"role": "user", "content": "Generate a Python code for crawling a website for a specific type of data."}, {"role": "assistant", "content": "import requests import re def crawl_website_for_phone_numbers(website): response = requests.get(website) phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text) return phone_numbers if __name__ == '__main__': print(crawl_website_for_phone_numbers('www.example.com'))"}]}
{"message": [{"role": "user", "content": "Create a nested loop to print every combination of numbers between 0-9, excluding any combination that contains the number 5. Additionally, exclude any combination that contains a repeating digit. Implement the solution without using any built-in functions or libraries to check for repeating digits."}, {"role": "assistant", "content": "Here is an example of a nested loop in Python to print every combination of numbers between 0-9, excluding any combination that contains the number 5 or repeating digits: ```python for i in range(10): # First digit for j in range(10): # Second digit for k in range(10): # Third digit # Checking for the conditions if i != 5 and j != 5 and k != 5 and i != j and i != k and j != k: print(i, j, k) ``` This code will generate and print every combination of three digits between 0-9 that do not contain the number 5 and do not have any repeating digits."}]}
{"message": [{"role": "user", "content": "Generate a Python program that adds two then doubles the result."}, {"role": "assistant", "content": "def double_add(num1, num2): return (num1 + num2) * 2"}]}
{"message": [{"role": "user", "content": "Develop a function in Python that prints out the Pascal's triangle for a given number of rows."}, {"role": "assistant", "content": "def pascal_triangle(n): trow = [1] y = [0] for x in range(max(n, 0)): print(trow) trow=[l+r for l,r in zip(trow + y, y + trow)] return n>=1 pascal_triangle(5)"}]}
{"message": [{"role": "user", "content": "Write a Python program to find all the numbers divisible by a certain number within a certain range."}, {"role": "assistant", "content": "def find_numbers_divisible_by(m, n, divisor): divisible_numbers = [] for i in range(m, n+1): if i % divisor == 0: divisible_numbers.append(i) return divisible_numbers"}]}
20 changes: 20 additions & 0 deletions generate_data.py
@@ -0,0 +1,20 @@
import pandas as pd

ORIGINAL_DATA_PATH = './1.jsonl'  # path to the original data
OUT_DATA_PATH = './out_data.jsonl'  # output path for the converted data in the role/content format the framework expects


data1 = pd.read_json(ORIGINAL_DATA_PATH, lines=True)
# create an empty list to hold the processed data
processed_data = []
# iterate over each row of the original data
for index, row in data1.iterrows():
message = [
{"role": "user", "content": row['instruction']},
{"role": "assistant", "content": row['output']}
]
processed_data.append({"message": message})
# convert the processed data to a DataFrame
processed_df = pd.DataFrame(processed_data)
# save as jsonl
processed_df.to_json(OUT_DATA_PATH, orient='records', lines=True, force_ascii=False)
37 changes: 31 additions & 6 deletions run_example.sh
@@ -1,10 +1,35 @@

DATA_PATH=''
OUTPUT_PATH=""
MODEL_PATH=""

# launch with deepspeed
deepspeed --include localhost:0,1 main_train.py\
--train_data_path <dataset-path>\
--model_name_or_path <model-path>\
--task_type sft\
--train_mode qlora\
--output_dir <output-path>
--train_data_path "$DATA_PATH" \
--model_name_or_path "$MODEL_PATH" \
--max_len 1024 \
--num_train_epochs 1 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--task_type "sft" \
--train_mode "qlora" \
--output_dir "$OUTPUT_PATH" \
--save_strategy "steps" \
--save_steps 500 \
--save_total_limit 5 \
--learning_rate 2e-4 \
--warmup_steps 10 \
--logging_steps 1 \
--lr_scheduler_type "cosine_with_min_lr" \
--gradient_checkpointing True \
--report_to "wandb" \
--deepspeed './train_args/deepspeed_config/ds_config_zero2.json' \
--bf16 True

# task_type: [pretrain, sft, dpo_multi, dpo_single]
# train_mode: [qlora, lora, full]



# python main_train.py --train_data_path <dataset-path> --model_name_or_path <model-path>
# python main_train.py --train_data_path <dataset-path> --model_name_or_path <model-path> ...... passing the same arguments as above
7 changes: 6 additions & 1 deletion train_args/dpo/README.md
@@ -26,6 +26,11 @@ DPO training supports both the framework's deepspeed and python launch modes; the corresponding


## DPO quick start

**1. Supports launching via command-line arguments; see ```LLM-Dojo/run_example.sh``` for an example.**

**2. Also supports editing the default values directly in the parameter files, as follows:**

### Step1 Configure args.py
Common parameters are in args.py under utils; the defaults are mostly fine, and you only need to change the model path, output path, task_type, template_name, train_data_path, train_args_path, train_mode, and so on.

@@ -37,7 +42,7 @@ DPO training supports both the framework's deepspeed and python launch modes; the corresponding
### Step2 Configure the corresponding file under the train_args folder
The relevant training parameters are in the corresponding files under the train_args folder. In general, just use ```dpo/dpo_config.py```.

Parameters are configured as dataclasses; edit the values in default directly, so there is no need to pass them on the command line (this can be added if anyone needs it).
Parameters are configured as dataclasses; edit the values in default directly, so there is no need to pass them on the command line.

Edit the max_len and max_prompt_length parameters here; the other things to set include whether to train in deepspeed mode, and so on (see the sketch after this paragraph).
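As a rough illustration (hypothetical class name and defaults; only the two field names above come from the README, and the real ```dpo/dpo_config.py``` contains more):

```python
from dataclasses import dataclass, field


# Hypothetical sketch of editing the DPO length limits directly in dpo/dpo_config.py.
@dataclass
class DpoConfigSketch:
    max_len: int = field(default=2048, metadata={"help": "maximum total sequence length"})
    max_prompt_length: int = field(default=512, metadata={"help": "maximum prompt length"})
```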

14 changes: 6 additions & 8 deletions train_args/sft/lora_qlora/base.py
@@ -20,15 +20,13 @@ class TrainArgument(TrainingArguments):
learning_rate: float = field(default=2e-4, metadata={"help": "learning rate"})
logging_steps: int = field(default=100, metadata={"help": "logging interval in steps"})
save_steps: int = field(default=500, metadata={"help": "save a checkpoint every this many steps"})
evaluation_strategy: Union[IntervalStrategy, str] = field(default="no", metadata={"help": "The evaluation "
"strategy to use."}, )
save_strategy: Union[IntervalStrategy, str] = field(default="epoch", metadata={"help": "The checkpoint save "
save_strategy: Union[IntervalStrategy, str] = field(default="steps", metadata={"help": "The checkpoint save "
"strategy to use."}, )
save_total_limit: Optional[int] = field(default=2, metadata={"help": "If a value is passed, will limit the total "
"amount of checkpoints. Deletes the older "
"checkpoints in"})
save_total_limit: Optional[int] = field(default=2, metadata={"help": "maximum number of checkpoints to keep"})
lr_scheduler_type: Union[SchedulerType, str] = field(default="cosine",
metadata={"help": "The scheduler type to use."})
lr_scheduler_kwargs: dict = field(default_factory=lambda: {},
metadata={"help": "lr_scheduler的额外参数,例如 {'num_cycles': 1}"})
warmup_steps: int = field(default=10, metadata={"help": "Linear warmup over warmup_steps."})
optim: Union[OptimizerNames, str] = field(default='adamw_torch', metadata={"help": "The optimizer to use."})
seed: int = field(default=42, metadata={"help": "Random seed that will be set at the beginning of training."})
Expand All @@ -44,5 +42,5 @@ class TrainArgument(TrainingArguments):
fp16: bool = field(default=False, metadata={"help": "Whether to use fp16 (mixed) precision instead of 32-bit"})

# Deepspeed-related training parameter; set default=None when not using Deepspeed
deepspeed: Optional[str] = field(default='./train_args/deepspeed_config/ds_config_zero2.json', metadata={"help": "config file required when Deepspeed is enabled"})

deepspeed: Optional[str] = field(default='./train_args/deepspeed_config/ds_config_zero2.json',
metadata={"help": "config file required when Deepspeed is enabled"})
9 changes: 4 additions & 5 deletions utils/args.py
@@ -1,5 +1,5 @@
from dataclasses import dataclass, field
from typing import Optional, Union
from typing import Optional
from enum import Enum


@@ -19,19 +19,18 @@ class CommonArgs:
"""
Some commonly used custom parameters
"""
# Deepspeed-related parameters
# Deepspeed-related parameters; if errors occur they can be commented out
local_rank: int = field(default=1, metadata={"help": "required by deepspeed; no change needed on a single machine, comment out if errors occur"})

train_args_path: TrainArgPath = field(default='sft_args',
metadata={"help": "training args for the current mode, split into sft and dpo"})
train_args_path: TrainArgPath = field(default='sft_args', metadata={"help": "training args, split into sft and dpo [sft_args, dpo_args]"})
max_len: int = field(default=1024, metadata={"help": "maximum input length; for dpo this is set in dpo_config"})
max_prompt_length: int = field(default=512, metadata={
"help": "for dpo, the maximum prompt length; applies to dpo_single, for dpo_multi this is set in dpo_config"})
train_data_path: Optional[str] = field(default='./', metadata={"help": "training data path"})
model_name_or_path: str = field(default='./', metadata={"help": "path to the downloaded model"})

# Fine-tuning method selection and configuration
train_mode: TrainMode = field(default=TrainMode.LORA.value,
train_mode: TrainMode = field(default='lora',
metadata={"help": "选择采用的训练方式:[qlora, lora, full]"})
use_dora: bool = field(default=False, metadata={"help": "仅在train_mode==lora时可以使用。是否使用Dora(一个基于lora的变体) "
"目前只支持linear and Conv2D layers."})
