Refactor of models and trainers with base class for common methods #306

Open · wants to merge 42 commits into base: main
Changes from 1 commit (of 42 commits)
2b8e301
Refactor models and trainers with base_class for common methods
PierpaoloSorbellini Mar 27, 2023
5e0ded8
Revert "Release ChatLLaMA 0.0.4"
PierpaoloSorbellini Mar 27, 2023
3fa5c53
Merge branch 'main' of https://github.com/nebuly-ai/nebullvm into main
PierpaoloSorbellini Mar 27, 2023
ab1f09e
Refactor of models and trainers with base class for common methods
PierpaoloSorbellini Mar 27, 2023
3d54d50
Fix comments and values in the config.yaml
PierpaoloSorbellini Mar 27, 2023
9f5eab4
Add load 8 bit from HF
PierpaoloSorbellini Mar 27, 2023
dc46ee4
Add check on load int 8
PierpaoloSorbellini Mar 27, 2023
c1d03d3
Add Reward and Critic support for LoRA PEFT
PierpaoloSorbellini Mar 28, 2023
36c350d
Add SelfInstruct Dataset from HF
PierpaoloSorbellini Mar 28, 2023
bb92ee7
Fix imports
Mar 28, 2023
6fc94d3
Add logging with proper class
Mar 29, 2023
dc2489f
Fix logs for deepspeed
Mar 30, 2023
0b0795d
Fix early logs with multi-GPUs
Mar 30, 2023
01be6dc
Fix MultiGPU for accelerate
Mar 30, 2023
13b1abd
Fix batch-size for accelerate
Mar 30, 2023
db8b3c2
Add multi-GPU training to readme.md
Mar 30, 2023
d771fb2
Fix fp16 training
Mar 31, 2023
e5f959c
Merge branch 'main' into refactor
PierpaoloSorbellini Mar 31, 2023
d5084e5
Fix Distributed training for RLHF
PierpaoloSorbellini Apr 3, 2023
2ec5eaa
Add new models
PierpaoloSorbellini Apr 3, 2023
33e97e2
Add decapoda models
PierpaoloSorbellini Apr 3, 2023
8332a26
Add unsupported model message
PierpaoloSorbellini Apr 3, 2023
32ddfa2
Change sign of KL div according to issue #298
PierpaoloSorbellini Apr 3, 2023
aa9881c
Fix imports order
PierpaoloSorbellini Apr 3, 2023
b10f1dc
Add cases for lora-peft model loading
PierpaoloSorbellini Apr 4, 2023
86a699b
Merge branch 'refactor' of https://github.com/nebuly-ai/nebullvm into…
PierpaoloSorbellini Apr 4, 2023
1f29ba4
Fix Actor 8bit training
PierpaoloSorbellini Apr 4, 2023
1836788
Adjust code comments to match the new changes
PierpaoloSorbellini Apr 4, 2023
966a19d
Fix device error when using vanilla PyTorch training
PierpaoloSorbellini Apr 4, 2023
feacb88
Fix RLHF with fp16
PierpaoloSorbellini Apr 5, 2023
f894494
Move grad scaler into base class
PierpaoloSorbellini Apr 5, 2023
b56185f
Add check on 8bit load and distributed training
PierpaoloSorbellini Apr 5, 2023
5699aaa
Add template to self-instruct dataset
PierpaoloSorbellini Apr 12, 2023
5c83927
Fix checkpoints name in actor training
PierpaoloSorbellini Apr 12, 2023
a205ee6
Fix slow loss computation
PierpaoloSorbellini Apr 12, 2023
bb386c4
Fix checkpoints also in reward models
PierpaoloSorbellini Apr 12, 2023
22a64af
Fix checkpoint for rl
PierpaoloSorbellini Apr 12, 2023
10211c6
Add n_checkpoints for all trainings, with removal of old checkpoints
PierpaoloSorbellini Apr 12, 2023
442b396
Improve dataset quality with reward model negative examples
PierpaoloSorbellini Apr 13, 2023
71a6c02
Merge branch 'main' of https://github.com/nebuly-ai/nebullvm into main
PierpaoloSorbellini Apr 14, 2023
1189787
Merge branch 'main' into refactor
PierpaoloSorbellini Apr 14, 2023
98b96c2
Fix merge issues
PierpaoloSorbellini Apr 14, 2023
Fix RLHF with fp16
PierpaoloSorbellini committed Apr 5, 2023
commit feacb88b9ee1f43a29f44d8eb7fa96e4323d560f
14 changes: 7 additions & 7 deletions apps/accelerate/chatllama/artifacts/config/config.yaml
@@ -14,12 +14,12 @@ trainer_config:
  # number of episodes and generation performed for each episode
  # in the train() method
  num_episodes: 100
- max_timesteps: 32
+ max_timesteps: 4
  # number of timesteps after which the learn() method is called
  # (to update the weights)
- update_timesteps: 32
+ update_timesteps: 4
  # number of example sampled at each timestep
- num_examples: 1
+ num_examples: 4
  # batch and epochs for the training
  batch_size: 1
  epochs: 1
@@ -33,7 +33,7 @@ trainer_config:
  accelerate_enable: False

  actor_config:
- model: "facebook/opt-125m"
+ model: "facebook/opt-1.3b"
  load_8bit: False
  model_folder: "./models"
  tokenizer_path: "path-to-tokenizer"
@@ -92,13 +92,13 @@ reward_config:
  epochs: 1
  iteration_per_print: 1
  # steps after which the checkpoint are saved
- checkpoint_steps: 200
+ checkpoint_steps: 10000
  # here specify the name of the reward checkpoint from which resume
  # during reward training. If null load the last one.
  checkpoint_name: null
  lr: 0.000009
  # deepspeed settings
- deepspeed_enable: True
+ deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False
@@ -117,5 +117,5 @@ critic_config:
  # here specify the name of the critic checkpoint from which resume
  # during critic training. If null load the last one.
  checkpoint_name: null
- peft_enable: True
+ peft_enable: False
  peft_config_path: "./artifacts/config/peft_config.yaml"
12 changes: 6 additions & 6 deletions apps/accelerate/chatllama/artifacts/download_dataset.py
@@ -2,9 +2,9 @@
  import os

  from chatllama.rlhf.dataset import (
- AnthropicRLHF,
- SelfInstruct,
- StanfordNLPSHP,
+ AnthropicRLHFDataset,
+ SelfInstructDataset,
+ StanfordNLPSHPDataset,
  )

@@ -44,17 +44,17 @@
  raise ValueError("Number of samples should be an integer")

  if args.dataset_name == "SHP":
- dataset = StanfordNLPSHP()
+ dataset = StanfordNLPSHPDataset()
  dataset.save_dataset(args.path, n_samples)

  elif args.dataset_name == "ARLHF":
- dataset = AnthropicRLHF()
+ dataset = AnthropicRLHFDataset()
  dataset.save_dataset(
  args.path,
  n_samples,
  )
  elif args.dataset_name == "SI":
- dataset = SelfInstruct()
+ dataset = SelfInstructDataset()
  dataset.save_dataset(
  args.path,
  n_samples,
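
For reference, the renamed dataset classes can also be called directly, mirroring the SHP branch of the script above; the output path and sample count below are illustrative:

```python
# Sketch of the programmatic equivalent of the SHP branch in download_dataset.py;
# save_dataset(path, n_samples) matches the call shown in the diff above.
from chatllama.rlhf.dataset import StanfordNLPSHPDataset

dataset = StanfordNLPSHPDataset()
dataset.save_dataset("./datasets", 1000)  # illustrative path and sample count
```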
3 changes: 2 additions & 1 deletion apps/accelerate/chatllama/chatllama/rlhf/actor.py
@@ -313,6 +313,7 @@ def train(
  return_tensors="pt",
  truncation=True,
  padding=True,
+ max_length=self.config.max_sequence_length,
  )
  else:
  input_tokenized = self.model.tokenizer(
@@ -344,7 +345,7 @@ def train(
  attention_mask = input_tokenized_mask[:, :-1]

  # move to device
- if self.config.load_8bit is False:
+ if not self.config.load_8bit:
  training_output = training_output.to(self.device)
  training_input = training_input.to(self.device)
  attention_mask = attention_mask.to(self.device)
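
The first hunk bounds tokenization: with truncation=True but no max_length, Hugging Face tokenizers fall back to the model's maximum length, so passing max_sequence_length makes the cap explicit. The second hunk is purely stylistic (not self.config.load_8bit). A standalone sketch of the truncation effect; the model name and length here are examples, not the trainer's values:

```python
# Standalone illustration (not chatllama code) of bounding tokenized length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
batch = tokenizer(
    ["a very long prompt " * 500],
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=1024,  # plays the role of config.max_sequence_length in the diff
)
print(batch["input_ids"].shape)  # the sequence dimension is capped at 1024
```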
5 changes: 4 additions & 1 deletion apps/accelerate/chatllama/chatllama/rlhf/base_model.py
@@ -499,6 +499,7 @@ def __init__(self, config: ConfigType) -> None:

  # clean the dataset
  if self.accelerate_enable or self.deepspeed_enable:
+ # TODO fix error for process group when using accelerate
  if dist.get_rank() == 0:
  BaseDataset.clean_dataset(config)
  else:
@@ -654,7 +655,9 @@ def setup_accelerate(
  )

  # assign device
- self.device = torch.device(f"cuda:{dist.get_rank()}")
+ # Fix error with process group not initialized when using
+ self.device = torch.device("cuda:0")
+ # self.device = torch.device(f"cuda:{dist.get_rank()}")

  my_logger.info("Training with Accelerate")

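
The device assignment is pinned to cuda:0 here because dist.get_rank() raises when the default process group has not been initialized, as the TODO notes. One possible guard, sketched below as an illustration rather than the PR's implementation, keeps the per-rank device whenever distributed training is actually initialized:

```python
# Illustrative guard only; the PR itself pins the device to cuda:0.
import torch
import torch.distributed as dist


def pick_device() -> torch.device:
    # Use the per-rank GPU only when the default process group exists.
    if dist.is_available() and dist.is_initialized():
        return torch.device(f"cuda:{dist.get_rank()}")
    if torch.cuda.is_available():
        return torch.device("cuda:0")
    return torch.device("cpu")
```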
3 changes: 3 additions & 0 deletions apps/accelerate/chatllama/chatllama/rlhf/config.py
@@ -57,6 +57,7 @@ class ConfigReward:
  is_reward (bool): True if the model is a reward model. Default to True.
  accelerate_enable (bool): Enable accelerate for the reward model
  debug (bool): enable prints for Debugging
+ device_type (str): Device type to be used for the reward model
  """

  device: torch.device
@@ -136,6 +137,7 @@ class ConfigActor:
  peft_enable (bool): Enable peft for the actor
  peft_config_path (str): Path to the peft config file.
  debug (bool): Enable prints for debugging
+ device_type (str): Device type to be used for the actor

  """

@@ -211,6 +213,7 @@ class ConfigTrainer:
  accelerate_enable (bool): Enable accelerate for rl training
  checkpoint_name (Optional[str]): Name of the checkpoint. Default to
  None.
+ device_type (str): Device type to be used for the rl training
  """

  actor_lr: int
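
These three hunks only add docstring entries for a device_type field on the config dataclasses. Assuming the field holds a string such as "cuda" or "cpu", a typical mapping to a torch.device would look like the sketch below; the diff itself does not show how the field is consumed:

```python
# Assumed usage of a device_type string; the diff only adds docstring entries,
# so this mapping is illustrative.
import torch


def resolve_device(device_type: str) -> torch.device:
    if device_type.startswith("cuda") and not torch.cuda.is_available():
        return torch.device("cpu")  # fall back when no GPU is present
    return torch.device(device_type)
```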
6 changes: 3 additions & 3 deletions apps/accelerate/chatllama/chatllama/rlhf/dataset.py
@@ -249,7 +249,7 @@ def clean_dataset(config: ConfigType):
  )


- class StanfordNLPSHP(BaseDataset):
+ class StanfordNLPSHPDataset(BaseDataset):
  """Class for Stanford NLP SHP dataset from HuggingFace"""

  def __init__(
@@ -344,7 +344,7 @@ def save_dataset(
  my_logger.success("Generation Completed")


- class AnthropicRLHF(BaseDataset):
+ class AnthropicRLHFDataset(BaseDataset):
  """Class for Anthropic RLHF dataset from HuggingFace"""

  def __init__(
@@ -438,7 +438,7 @@ def save_dataset(
  my_logger.success("Generation Completed")


- class SelfInstruct(BaseDataset):
+ class SelfInstructDataset(BaseDataset):
  """Class for SelfInstruct dataset from HuggingFace"""

  def __init__(