deprecate RLHF (h2oai#592)

* deprecate RLHF: removal on train pipeline and UI components; old experiments are still viewable
* add warning to logs
* rm RLHF trainer
* typo
rochemedia · Feb 1, 2024 · 340df38
1 parent 0205367
commit 340df38

Showing 9 changed files with 9 additions and 1,472 deletions.
diff --git a/README.md b/README.md
@@ -53,6 +53,7 @@ Using CLI for fine-tuning LLMs:
 
 ## What's New
 
+- [PR 592](https://github.com/h2oai/h2o-llmstudio/pull/592) Starting to deprecate RLHF in favor of DPO/IPO optimization. Training is disabled, but old experiments are still viewable. RLHF will be fully removed in a future release.
 - [PR 530](https://github.com/h2oai/h2o-llmstudio/pull/530) Introduced a new problem type for DPO/IPO optimization. This optimization technique can be used as an alternative to RLHF.
 - [PR 288](https://github.com/h2oai/h2o-llmstudio/pull/288) Introduced Deepspeed for sharded training allowing to train larger models on machines with multiple GPUs. Requires NVLink. This feature replaces FSDP and offers more flexibility. Deepspeed requires a system installation of cudatoolkit and we recommend using version 11.8. See [Recommended Install](#recommended-install).
 - [PR 449](https://github.com/h2oai/h2o-llmstudio/pull/449) New problem type for Causal Classification Modeling allows to train binary and multiclass models using LLMs.

diff --git a/documentation/docs/tooltips/experiments/_problem-type.mdx b/documentation/docs/tooltips/experiments/_problem-type.mdx
@@ -4,8 +4,6 @@ Defines the problem type of the experiment, which also defines the settings H2O
 
 - DPO Modeling: Used to fine-tune large language models using Direct Preference Optimization
 
-- Rlhf Language Modeling: Used to fine-tune RLHF language models
-
 - Sequence To Sequence Modeling: Used to fine-tune large sequence to sequence models
 
 - Causal Classification Modeling: Used to fine-tune causal classification models
diff --git a/llm_studio/app_utils/config.py b/llm_studio/app_utils/config.py
@@ -60,7 +60,6 @@ def get_size(x):
     "problem_types": [
         "text_causal_language_modeling_config",
         "text_dpo_modeling_config",
-        "text_rlhf_language_modeling_config",
         "text_sequence_to_sequence_modeling_config",
         "text_causal_classification_modeling_config",
     ],
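Note on the hunk above: dropping `text_rlhf_language_modeling_config` from `problem_types` removes RLHF from the selectable options, while the commit message says old experiments stay viewable and a warning is logged. A minimal sketch of that split, not the project's actual implementation: the helper names `can_view` and `can_train` and the two list constants are hypothetical, and only the config strings are taken from the diff.

```python
import logging

logger = logging.getLogger(__name__)

# Config names copied from the diff; the constant names are hypothetical.
ACTIVE_PROBLEM_TYPES = [
    "text_causal_language_modeling_config",
    "text_dpo_modeling_config",
    "text_sequence_to_sequence_modeling_config",
    "text_causal_classification_modeling_config",
]
DEPRECATED_PROBLEM_TYPES = ["text_rlhf_language_modeling_config"]


def can_view(problem_type: str) -> bool:
    """Old experiments of a deprecated type remain viewable."""
    return (
        problem_type in ACTIVE_PROBLEM_TYPES
        or problem_type in DEPRECATED_PROBLEM_TYPES
    )


def can_train(problem_type: str) -> bool:
    """Training is allowed only for active types; deprecated ones log a warning."""
    if problem_type in DEPRECATED_PROBLEM_TYPES:
        logger.warning(
            "%s is deprecated and can no longer be trained; "
            "consider DPO/IPO optimization instead.",
            problem_type,
        )
        return False
    return problem_type in ACTIVE_PROBLEM_TYPES
```

Keeping the deprecated names in a separate list, rather than deleting them outright, is one way to let the UI still resolve old experiment configs while new training runs are refused.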