Refactor of models and trainers with base class for common methods #306

Open
wants to merge 42 commits into
base: main
Changes from 1 commit
Commits
2b8e301
Refactor models and trainers with base_class for common methods
PierpaoloSorbellini Mar 27, 2023
5e0ded8
Revert "Release ChatLLaMA 0.0.4"
PierpaoloSorbellini Mar 27, 2023
3fa5c53
Merge branch 'main' of https://github.com/nebuly-ai/nebullvm into main
PierpaoloSorbellini Mar 27, 2023
ab1f09e
Refactor of models and trainers with base class for common methods
PierpaoloSorbellini Mar 27, 2023
3d54d50
Fix comments and values in the config.yaml
PierpaoloSorbellini Mar 27, 2023
9f5eab4
Add load 8 bit from HF
PierpaoloSorbellini Mar 27, 2023
dc46ee4
Add check on load int 8
PierpaoloSorbellini Mar 27, 2023
c1d03d3
Add Reward and Critic support for LoRA PEFT
PierpaoloSorbellini Mar 28, 2023
36c350d
Add SelfInstruct Dataset from HF
PierpaoloSorbellini Mar 28, 2023
bb92ee7
Fix imports
Mar 28, 2023
6fc94d3
Add logging with proper class
Mar 29, 2023
dc2489f
Fix logs for deepspeed
Mar 30, 2023
0b0795d
Fix early logs with multi-GPUs
Mar 30, 2023
01be6dc
Fix MultiGPU for accelerate
Mar 30, 2023
13b1abd
Fix batch-size for accelerate
Mar 30, 2023
db8b3c2
Add multi gpu training to readme.md
Mar 30, 2023
d771fb2
Fix fp16 training
Mar 31, 2023
e5f959c
Merge branch 'main' into refactor
PierpaoloSorbellini Mar 31, 2023
d5084e5
Fix Distributed training for RLHF
PierpaoloSorbellini Apr 3, 2023
2ec5eaa
Add new models
PierpaoloSorbellini Apr 3, 2023
33e97e2
Add decapoda models
PierpaoloSorbellini Apr 3, 2023
8332a26
Add unsupported model message
PierpaoloSorbellini Apr 3, 2023
32ddfa2
Change sign of KL div according to issue #298
PierpaoloSorbellini Apr 3, 2023
aa9881c
Fix imports order
PierpaoloSorbellini Apr 3, 2023
b10f1dc
Add cases for lora-peft model loading
PierpaoloSorbellini Apr 4, 2023
86a699b
Merge branch 'refactor' of https://github.com/nebuly-ai/nebullvm into…
PierpaoloSorbellini Apr 4, 2023
1f29ba4
Fix Actor 8bit training
PierpaoloSorbellini Apr 4, 2023
1836788
Adjust code comments to match new adjustments
PierpaoloSorbellini Apr 4, 2023
966a19d
Fix device error when using vanilla pytorch training
PierpaoloSorbellini Apr 4, 2023
feacb88
Fix RLHF with fp16
PierpaoloSorbellini Apr 5, 2023
f894494
Move grad scaler into base class
PierpaoloSorbellini Apr 5, 2023
b56185f
Add check on 8bit load and distributed training
PierpaoloSorbellini Apr 5, 2023
5699aaa
Add template to self-instruct dataset
PierpaoloSorbellini Apr 12, 2023
5c83927
Fix checkpoints name in actor training
PierpaoloSorbellini Apr 12, 2023
a205ee6
Fix slow loss computation
PierpaoloSorbellini Apr 12, 2023
bb386c4
Fix checkpoints also in reward models
PierpaoloSorbellini Apr 12, 2023
22a64af
Fix checkpoint for rl
PierpaoloSorbellini Apr 12, 2023
10211c6
Add n_checkpoints for all the training with old checkpoints removal
PierpaoloSorbellini Apr 12, 2023
442b396
Improve datasets quality with reward model negative examples
PierpaoloSorbellini Apr 13, 2023
71a6c02
Merge branch 'main' of https://github.com/nebuly-ai/nebullvm into main
PierpaoloSorbellini Apr 14, 2023
1189787
Merge branch 'main' into refactor
PierpaoloSorbellini Apr 14, 2023
98b96c2
Fix merge issues
PierpaoloSorbellini Apr 14, 2023
Fix batch-size for accelerate
Ubuntu committed Mar 30, 2023
commit 13b1abd5d05286b5ec7d14786d386b3f56b430ce
6 changes: 5 additions & 1 deletion apps/accelerate/chatllama/chatllama/rlhf/actor.py
@@ -259,8 +259,12 @@ def train(
         my_logger.success("Start Actor Model Pretraining")

         # get config parameters
-        if self.config.deepspeed_enable:
+        if self.deepspeed_enable:
             batch_size = self.train_dataloader.batch_size
+        elif self.accelerate_enable:
+            batch_size = (
+                self.config.batch_size * self.accelerator.num_processes
+            )
         else:
             batch_size = self.config.batch_size
         epochs = self.config.epochs
6 changes: 5 additions & 1 deletion apps/accelerate/chatllama/chatllama/rlhf/reward.py
@@ -211,8 +211,12 @@ def train(
         my_logger.success("Start Training the Reward Model")

         # get config parameters
-        if self.config.deepspeed_enable:
+        if self.deepspeed_enable:
             batch_size = self.train_dataloader.batch_size
+        elif self.accelerate_enable:
+            batch_size = (
+                self.config.batch_size * self.accelerator.num_processes
+            )
         else:
             batch_size = self.config.batch_size

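The same three-way batch-size selection appears in both `actor.py` and `reward.py`. A minimal standalone sketch of that logic follows. The function name `effective_batch_size` and its parameters are hypothetical and not part of chatllama: `num_processes` stands in for `self.accelerator.num_processes`, and `dataloader_batch_size` stands in for `self.train_dataloader.batch_size`.

```python
from typing import Optional


def effective_batch_size(
    config_batch_size: int,
    deepspeed_enable: bool = False,
    accelerate_enable: bool = False,
    num_processes: int = 1,
    dataloader_batch_size: Optional[int] = None,
) -> int:
    """Mirror the branch order used in the commit (hypothetical helper).

    - DeepSpeed: take the batch size reported by the dataloader.
    - Accelerate: scale the configured batch size by the process count.
    - Otherwise: use the configured batch size unchanged.
    """
    if deepspeed_enable:
        if dataloader_batch_size is None:
            raise ValueError("dataloader_batch_size is required with DeepSpeed")
        return dataloader_batch_size
    if accelerate_enable:
        return config_batch_size * num_processes
    return config_batch_size


if __name__ == "__main__":
    # Assumed example values: batch size 8 per process, 4 processes.
    print(effective_batch_size(8, accelerate_enable=True, num_processes=4))
    print(effective_batch_size(8))
```

As in the diff, the DeepSpeed branch is checked first, so it takes precedence if both flags are set.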
1 change: 1 addition & 0 deletions apps/accelerate/chatllama/setup.py
@@ -3,6 +3,7 @@


 REQUIREMENTS = [
+    "loguru",
     "peft",
     "accelerate",
     "beartype",