Open-Sourcing as promised.

milangritta authored Jul 23, 2024
1 parent 0cc7512 commit 12a4da6
Showing 16 changed files with 3,367 additions and 0 deletions.
9 changes: 9 additions & 0 deletions NLP/HumanRankEval/LICENSE.md
@@ -0,0 +1,9 @@
MIT License

Copyright (c) 2023, Huawei Technologies Co., Ltd

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
119 changes: 119 additions & 0 deletions NLP/HumanRankEval/README.md
@@ -0,0 +1,119 @@
## HumanRankEval: Automatic Evaluation of Alignment with Human Preferences

#### This repository is based on the [EleutherAI LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), big thanks!

This project provides a framework for evaluating generative language models (Seq2seq models are also supported via AutoHF) on HumanRankEval (HRE).
If you find it helpful, please cite the **HumanRankEval** [paper](LINK_TO_BE_ADDED).

- Supported Topics: **python, java, unix, cpp, html, english, physics, latex, soft_eng, stats, cs_db, languages_sciences, apple_android, math**
- Supported Models: **AutoHF (single- and multi-GPU runs implemented, see below)**
- Supported DeepSpeed Inference: **Tensor Parallel, Kernel Injection and/or DS ZeRO3** (see the sketch below)
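
These options map onto DeepSpeed's inference and ZeRO engines. The sketch below is only a hedged illustration of how tensor parallelism and kernel injection are typically enabled through DeepSpeed's `init_inference` API; it is not this repository's exact code path, and the checkpoint name is a placeholder.

```python
import os

import deepspeed
import torch
from transformers import AutoModelForCausalLM

model_name = "facebook/opt-350m"  # placeholder checkpoint, not prescribed by this repo
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Shard the model across GPUs (tensor parallel) and, where supported, swap in
# DeepSpeed's fused inference kernels (kernel injection).
engine = deepspeed.init_inference(
    model,
    mp_size=int(os.getenv("WORLD_SIZE", "1")),  # tensor-parallel degree
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = engine.module  # then score/generate with it as a regular HuggingFace model
```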

### Installation (PyTorch)

Create an environment with conda or virtualenv and then run the following command:

```bash
pip install -r requirements.txt
```

### Installation (MindSpore)

You **additionally** need to install [MindSpore](https://www.mindspore.cn/install/en) and [MindNLP](https://github.com/mindspore-lab/mindnlp).
We provide an example in ```lm_eval.models.mindspore``` for OPT (facebook) models that can be extended to additional LLMs.

### Dataset

The HRE dataset is hosted on [HuggingFace Datasets](https://huggingface.co/datasets/huawei-noah/human_rank_eval).
It is downloaded and loaded automatically with: ```load_dataset("huawei-noah/human_rank_eval")```
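
For a quick, standalone look at the data (independent of the evaluation harness), here is a minimal sketch using the `datasets` library; no column names are assumed, they are simply printed:

```python
from datasets import load_dataset

# Fetches the HRE data from the HuggingFace Hub on first use and caches it locally.
data = load_dataset("huawei-noah/human_rank_eval")

# List the available topic splits with their sizes and schemas.
for name, split in data.items():
    print(name, len(split), split.column_names)
```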

### Running HumanRankEval

Set **MODEL_DIR=/your/path/to/models/** and **DATA_PATH=/your/path/to/HumanRankEvalData/**.

> 💡 Check out ```evaluate.sh``` for full details 💡
>
The following command runs Pythia-410M on HRE on a single GPU (device 2; see **evaluate.sh**):
```bash
deepspeed --include localhost:2 main.py \
--model auto_hf \
--tasks human_rank_eval_* \
--model_args pretrained=${MODEL_DIR}Pythia-410M \
--batch_size 8 \
--data_path ${DATA_PATH}
```

The output should look like this:

| Task | Metric |Value |
|----------------------------------|-------------|-----:|
|human_rank_eval_apple_android |pearson_corr |0.0860|
|human_rank_eval_cpp |pearson_corr |0.1351|
|human_rank_eval_cs_db |pearson_corr |0.0646|
|human_rank_eval_english |pearson_corr |0.1193|
|human_rank_eval_html |pearson_corr |0.1055|
|human_rank_eval_java |pearson_corr |0.1044|
|human_rank_eval_languages_sciences|pearson_corr |0.1201|
|human_rank_eval_latex |pearson_corr |0.1648|
|human_rank_eval_math |pearson_corr |0.1405|
|human_rank_eval_physics |pearson_corr |0.1118|
|human_rank_eval_python |pearson_corr |0.0778|
|human_rank_eval_soft_eng |pearson_corr |0.0769|
|human_rank_eval_stats |pearson_corr |0.1100|
|human_rank_eval_unix |pearson_corr |0.0967|
|=== HumanRankEval Score === |Micro Average|0.1081|
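
The final line is the overall HumanRankEval score. As a hedged illustration only (not the repository's exact implementation, and with made-up topic sizes), assuming the micro average weights each topic's Pearson correlation by its number of questions rather than averaging the per-topic scores directly:

```python
# Hypothetical per-topic results: (pearson_corr, number_of_questions).
# The question counts are invented for illustration; they are not the real HRE topic sizes.
results = {
    "python": (0.0778, 500),
    "math": (0.1405, 300),
    "latex": (0.1648, 200),
}

total = sum(n for _, n in results.values())
micro = sum(score * n for score, n in results.values()) / total     # question-weighted
macro = sum(score for score, _ in results.values()) / len(results)  # unweighted topic mean
print(f"micro average: {micro:.4f}  macro average: {macro:.4f}")
```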

The following command runs Vicuna-7B on HRE on all GPUs with tensor parallelism (the default).
```bash
deepspeed --num_gpus ${NUM_GPUs} main.py \
--model auto_hf \
--tasks human_rank_eval_* \
--model_args pretrained=${MODEL_DIR}Vicuna-7B \
--data_path ${DATA_PATH} \
--batch_size 4 \
--world_size ${NUM_GPUs}
```
The output should look like this:

| Task | Metric |Value |
|----------------------------------|-------------|-----:|
|human_rank_eval_apple_android |pearson_corr |0.1310|
|human_rank_eval_cpp |pearson_corr |0.1657|
|human_rank_eval_cs_db |pearson_corr |0.1043|
|human_rank_eval_english |pearson_corr |0.1468|
|human_rank_eval_html |pearson_corr |0.1430|
|human_rank_eval_java |pearson_corr |0.1670|
|human_rank_eval_languages_sciences|pearson_corr |0.1571|
|human_rank_eval_latex |pearson_corr |0.1743|
|human_rank_eval_math |pearson_corr |0.1257|
|human_rank_eval_physics |pearson_corr |0.1114|
|human_rank_eval_python |pearson_corr |0.1402|
|human_rank_eval_soft_eng |pearson_corr |0.0962|
|human_rank_eval_stats |pearson_corr |0.1629|
|human_rank_eval_unix |pearson_corr |0.1289|
|=== HumanRankEval Score === |Micro Average|0.1396|

Evaluating a MindSpore model on a single topic can be done as follows:

```bash
python main.py --model mindspore \
--tasks human_rank_eval_math \
--data_path ${DATA_PATH} \
--model_args pretrained=opt-350m \
--batch_size 4
```

You should see the following output:

| Task | Metric |Value|
|---------------------------|-------------|----:|
|human_rank_eval_math |pearson_corr |0.078|
|=== HumanRankEval Score ===|Micro Average|0.078|

## License

This project is released under the MIT License. Please see the [License](./LICENSE) file for more information.

Disclaimer: This open-source project is not an official Huawei product, and Huawei is not expected to provide support for it.
32 changes: 32 additions & 0 deletions NLP/HumanRankEval/THIRD_PARTY_OPEN_SOURCE_SOFTWARE_NOTICE.md
@@ -0,0 +1,32 @@
Please note we provide an open source software notice for the third party open source software
along with this software and/or this software component contributed by Huawei (in the following just “this SOFTWARE”).
The open source software licenses are granted by the respective right holders.

Warranty Disclaimer
THE OPEN SOURCE SOFTWARE IN THIS SOFTWARE IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL,
BUT WITHOUT ANY WARRANTY, WITHOUT EVEN THE IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. SEE THE APPLICABLE LICENSES FOR MORE DETAILS.

Copyright Notice and License Texts
Software: Language Model Evaluation Harness (https://github.com/EleutherAI/lm-evaluation-harness)
Copyright (c) 2020 EleutherAI

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
43 changes: 43 additions & 0 deletions NLP/HumanRankEval/evaluate.sh
@@ -0,0 +1,43 @@
#!/usr/bin/env bash

# Copyright (C) 2023. Huawei Technologies Co., Ltd. All rights reserved.
#
# Licensed under MIT License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://opensource.org/license/mit
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

NUM_GPUs=4
DATA_PATH="/path/to/HumanRankEvalData/"
MODEL_DIR="/path/to/models/"

#---------------------------------------------------------------

deepspeed --num_gpus ${NUM_GPUs} main.py \
--model auto_hf \
--tasks human_rank_eval_* \
--model_args pretrained=${MODEL_DIR}Vicuna-7B \
--data_path ${DATA_PATH} \
--batch_size 4 \
--world_size ${NUM_GPUs}

#deepspeed --include localhost:2 main.py \
# --model auto_hf \
# --tasks human_rank_eval_* \
# --model_args pretrained=${MODEL_DIR}Pythia-410M \
# --batch_size 8 \
# --data_path ${DATA_PATH}

#python main.py --model mindspore \
# --tasks human_rank_eval_math \
# --data_path ${DATA_PATH} \
# --model_args pretrained=opt-350m \
# --batch_size 4
Empty file.