deprecate RLHF (h2oai#592)
* deprecate RLHF
removal on train pipeline and UI components
Old experiments are still viewable

* add warning to logs

* rm RLHF trainer

* typo
pascal-pfeiffer authored Feb 1, 2024
1 parent 0205367 commit 340df38
Showing 9 changed files with 9 additions and 1,472 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -53,6 +53,7 @@ Using CLI for fine-tuning LLMs:

## What's New

+- [PR 592](https://github.com/h2oai/h2o-llmstudio/pull/592) Starting to deprecate RLHF in favor of DPO/IPO optimization. Training is disabled, but old experiments are still viewable. RLHF will be fully removed in a future release.
- [PR 530](https://github.com/h2oai/h2o-llmstudio/pull/530) Introduced a new problem type for DPO/IPO optimization. This optimization technique can be used as an alternative to RLHF.
- [PR 288](https://github.com/h2oai/h2o-llmstudio/pull/288) Introduced Deepspeed for sharded training allowing to train larger models on machines with multiple GPUs. Requires NVLink. This feature replaces FSDP and offers more flexibility. Deepspeed requires a system installation of cudatoolkit and we recommend using version 11.8. See [Recommended Install](#recommended-install).
- [PR 449](https://github.com/h2oai/h2o-llmstudio/pull/449) New problem type for Causal Classification Modeling allows to train binary and multiclass models using LLMs.
2 changes: 0 additions & 2 deletions documentation/docs/tooltips/experiments/_problem-type.mdx
@@ -4,8 +4,6 @@ Defines the problem type of the experiment, which also defines the settings H2O

- DPO Modeling: Used to fine-tune large language models using Direct Preference Optimization

-- Rlhf Language Modeling: Used to fine-tune RLHF language models
-
- Sequence To Sequence Modeling: Used to fine-tune large sequence to sequence models

- Causal Classification Modeling: Used to fine-tune causal classification models
1 change: 0 additions & 1 deletion llm_studio/app_utils/config.py
@@ -60,7 +60,6 @@ def get_size(x):
    "problem_types": [
        "text_causal_language_modeling_config",
        "text_dpo_modeling_config",
-       "text_rlhf_language_modeling_config",
        "text_sequence_to_sequence_modeling_config",
        "text_causal_classification_modeling_config",
    ],
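
The behavior described in the commit message (training disabled, a warning written to the logs, old experiments still viewable) can be pictured with a small sketch. This is not the code from the repository: `ACTIVE_PROBLEM_TYPES`, `DEPRECATED_PROBLEM_TYPES`, `can_train`, and `can_view` are hypothetical names introduced only for illustration. Only the config strings are taken from the diff above.

```python
import logging

logger = logging.getLogger(__name__)

# Problem types still offered for new experiments (strings taken from the diff).
ACTIVE_PROBLEM_TYPES = [
    "text_causal_language_modeling_config",
    "text_dpo_modeling_config",
    "text_sequence_to_sequence_modeling_config",
    "text_causal_classification_modeling_config",
]

# Hypothetical set: types that can no longer be trained but may still
# appear in previously saved experiments.
DEPRECATED_PROBLEM_TYPES = {"text_rlhf_language_modeling_config"}


def can_train(problem_type: str) -> bool:
    """Return True if a new training run may be started for this type."""
    if problem_type in DEPRECATED_PROBLEM_TYPES:
        # Assumed wording; the actual log message in the repository may differ.
        logger.warning(
            "%s is deprecated and training is disabled; consider DPO/IPO.",
            problem_type,
        )
        return False
    return problem_type in ACTIVE_PROBLEM_TYPES


def can_view(problem_type: str) -> bool:
    """Return True if an existing experiment of this type may be displayed."""
    return (
        problem_type in ACTIVE_PROBLEM_TYPES
        or problem_type in DEPRECATED_PROBLEM_TYPES
    )
```

Keeping the deprecated types in a separate set, rather than deleting every reference at once, is one way to let saved experiments keep loading while new runs are blocked.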