From a256753221ad2a33ec9750b31f6284b581c1e1fd Mon Sep 17 00:00:00 2001
From: Fengzhe Zhou
Date: Mon, 22 Apr 2024 14:39:31 +0800
Subject: [PATCH] [Feature] Add LLaMA-3 Series Configs (#1065)

* add LLaMA-3 Series configs

* update readme
---
 README.md                                     | 10 +++----
 README_zh-CN.md                               | 10 +++----
 configs/models/hf_llama/hf_llama3_70b.py      | 21 ++++++++++++++
 .../models/hf_llama/hf_llama3_70b_instruct.py | 29 +++++++++++++++++++
 configs/models/hf_llama/hf_llama3_8b.py       | 21 ++++++++++++++
 .../models/hf_llama/hf_llama3_8b_instruct.py  | 29 +++++++++++++++++++
 docs/en/notes/news.md                         |  4 +++
 docs/zh_cn/notes/news.md                      |  4 +++
 8 files changed, 116 insertions(+), 12 deletions(-)
 create mode 100644 configs/models/hf_llama/hf_llama3_70b.py
 create mode 100644 configs/models/hf_llama/hf_llama3_70b_instruct.py
 create mode 100644 configs/models/hf_llama/hf_llama3_8b.py
 create mode 100644 configs/models/hf_llama/hf_llama3_8b_instruct.py

diff --git a/README.md b/README.md
index e84fd8e0..1bdd63a8 100644
--- a/README.md
+++ b/README.md
@@ -70,12 +70,9 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

 ## 🚀 What's New

-- **\[2024.02.29\]** We supported the MT-Bench, AlpacalEval and AlignBench, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html) 🔥🔥🔥.
-- **\[2024.01.30\]** We release OpenCompass 2.0. Click [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information ! 🔥🔥🔥.
-- **\[2024.01.17\]** We supported the evaluation of [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) and [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py), InternLM2 showed extremely strong performance in these tests, welcome to try! 🔥🔥🔥.
-- **\[2024.01.17\]** We supported the needle in a haystack test with multiple needles, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html#id8) 🔥🔥🔥.
-- **\[2023.12.28\]** We have enabled seamless evaluation of all models developed using [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), a powerful toolkit for comprehensive LLM development.
-- **\[2023.12.22\]** We have released [T-Eval](https://github.com/open-compass/T-Eval), a step-by-step evaluation benchmark to gauge your LLMs on tool utilization. Welcome to our [Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html) for more details!
+- **\[2024.04.22\]** We supported the evaluation of [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) and [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py), welcome to try! 🔥🔥🔥
+- **\[2024.02.29\]** We supported MT-Bench, AlpacaEval and AlignBench; more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)
+- **\[2024.01.30\]** We released OpenCompass 2.0. Click [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information!
 > [More](docs/en/notes/news.md)

@@ -458,6 +455,7 @@ Through the command line or configuration files, OpenCompass also supports evalu

 - [InternLM](https://github.com/InternLM/InternLM)
 - [LLaMA](https://github.com/facebookresearch/llama)
+- [LLaMA3](https://github.com/meta-llama/llama3)
 - [Vicuna](https://github.com/lm-sys/FastChat)
 - [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
 - [Baichuan](https://github.com/baichuan-inc)

diff --git a/README_zh-CN.md b/README_zh-CN.md
index 6d115243..f399ed65 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -69,12 +69,9 @@

 ## 🚀 最新进展

-- **\[2024.02.29\]** 我们支持了MT-Bench、AlpacalEval和AlignBench,更多信息可以在[这里](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)找到 🔥🔥🔥。
-- **\[2024.01.30\]** 我们发布了OpenCompass 2.0。更多信息,请访问[CompassKit](https://github.com/open-compass)、[CompassHub](https://hub.opencompass.org.cn/home)和[CompassRank](https://rank.opencompass.org.cn/home) 🔥🔥🔥。
-- **\[2024.01.17\]** 我们支持了 [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py) 和 [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py) 的相关评测,InternLM2 在这些测试中表现出非常强劲的性能,欢迎试用!🔥🔥🔥.
-- **\[2024.01.17\]** 我们支持了多根针版本的大海捞针测试,更多信息见[这里](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/needleinahaystack_eval.html#id8)🔥🔥🔥.
-- **\[2023.12.28\]** 我们支持了对使用[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory)(一款强大的LLM开发工具箱)开发的所有模型的无缝评估!
-- **\[2023.12.22\]** 我们开源了[T-Eval](https://github.com/open-compass/T-Eval)用于评测大语言模型工具调用能力。欢迎访问T-Eval的官方[Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html)获取更多信息!
+- **\[2024.04.22\]** 我们支持了 [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) 和 [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py) 的评测,欢迎试用!🔥🔥🔥
+- **\[2024.02.29\]** 我们支持了MT-Bench、AlpacaEval和AlignBench,更多信息可以在[这里](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)找到。
+- **\[2024.01.30\]** 我们发布了OpenCompass 2.0。更多信息,请访问[CompassKit](https://github.com/open-compass)、[CompassHub](https://hub.opencompass.org.cn/home)和[CompassRank](https://rank.opencompass.org.cn/home)。

 > [更多](docs/zh_cn/notes/news.md)

@@ -463,6 +460,7 @@ python run.py --datasets ceval_ppl mmlu_ppl \

 - [InternLM](https://github.com/InternLM/InternLM)
 - [LLaMA](https://github.com/facebookresearch/llama)
+- [LLaMA3](https://github.com/meta-llama/llama3)
 - [Vicuna](https://github.com/lm-sys/FastChat)
 - [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
 - [Baichuan](https://github.com/baichuan-inc)

diff --git a/configs/models/hf_llama/hf_llama3_70b.py b/configs/models/hf_llama/hf_llama3_70b.py
new file mode 100644
index 00000000..f35c18ad
--- /dev/null
+++ b/configs/models/hf_llama/hf_llama3_70b.py
@@ -0,0 +1,21 @@
+from opencompass.models import HuggingFaceCausalLM
+
+
+models = [
+    dict(
+        type=HuggingFaceCausalLM,
+        abbr="llama-3-70b-hf",
+        path="meta-llama/Meta-Llama-3-70B",
+        model_kwargs=dict(device_map="auto"),
+        tokenizer_kwargs=dict(
+            padding_side="left",
+            truncation_side="left",
+            use_fast=False,
+        ),
+        max_out_len=100,
+        max_seq_len=2048,
+        batch_size=8,
+        batch_padding=True,
+        run_cfg=dict(num_gpus=4, num_procs=1),
+    )
+]
diff --git a/configs/models/hf_llama/hf_llama3_70b_instruct.py b/configs/models/hf_llama/hf_llama3_70b_instruct.py
new file mode 100644
index 00000000..64883815
--- /dev/null
+++ b/configs/models/hf_llama/hf_llama3_70b_instruct.py
@@ -0,0 +1,29 @@
+from opencompass.models import HuggingFaceCausalLM
+
+# Llama-3 chat format: each turn is wrapped in
+# <|start_header_id|>{role}<|end_header_id|>\n\n ... <|eot_id|>
+_meta_template = dict(
+    round=[
+        dict(role="HUMAN", begin="<|start_header_id|>user<|end_header_id|>\n\n", end="<|eot_id|>"),
+        dict(role="BOT", begin="<|start_header_id|>assistant<|end_header_id|>\n\n", end="<|eot_id|>", generate=True),
+    ],
+)
+
+models = [
+    dict(
+        type=HuggingFaceCausalLM,
+        abbr="llama-3-70b-instruct-hf",
+        path="meta-llama/Meta-Llama-3-70B-Instruct",
+        model_kwargs=dict(device_map="auto"),
+        tokenizer_kwargs=dict(
+            padding_side="left",
+            truncation_side="left",
+            use_fast=False,
+        ),
+        meta_template=_meta_template,
+        max_out_len=100,
+        max_seq_len=2048,
+        batch_size=8,
+        run_cfg=dict(num_gpus=4, num_procs=1),
+        generation_kwargs={"eos_token_id": [128001, 128009]},
+        batch_padding=True,
+    )
+]
diff --git a/configs/models/hf_llama/hf_llama3_8b.py b/configs/models/hf_llama/hf_llama3_8b.py
new file mode 100644
index 00000000..cbf2a9da
--- /dev/null
+++ b/configs/models/hf_llama/hf_llama3_8b.py
@@ -0,0 +1,21 @@
+from opencompass.models import HuggingFaceCausalLM
+
+
+models = [
+    dict(
+        type=HuggingFaceCausalLM,
+        abbr="llama-3-8b-hf",
+        path="meta-llama/Meta-Llama-3-8B",
+        model_kwargs=dict(device_map="auto"),
+        tokenizer_kwargs=dict(
+            padding_side="left",
+            truncation_side="left",
+            use_fast=False,
+        ),
+        max_out_len=100,
+        max_seq_len=2048,
+        batch_size=8,
+        batch_padding=True,
+        run_cfg=dict(num_gpus=1, num_procs=1),
+    )
+]
diff --git a/configs/models/hf_llama/hf_llama3_8b_instruct.py b/configs/models/hf_llama/hf_llama3_8b_instruct.py
new file mode 100644
index 00000000..e3eb4812
--- /dev/null
+++ b/configs/models/hf_llama/hf_llama3_8b_instruct.py
@@ -0,0 +1,29 @@
+from opencompass.models import HuggingFaceCausalLM
+
+# Llama-3 chat format: each turn is wrapped in
+# <|start_header_id|>{role}<|end_header_id|>\n\n ... <|eot_id|>
+_meta_template = dict(
+    round=[
+        dict(role="HUMAN", begin="<|start_header_id|>user<|end_header_id|>\n\n", end="<|eot_id|>"),
+        dict(role="BOT", begin="<|start_header_id|>assistant<|end_header_id|>\n\n", end="<|eot_id|>", generate=True),
+    ],
+)
+
+models = [
+    dict(
+        type=HuggingFaceCausalLM,
+        abbr="llama-3-8b-instruct-hf",
+        path="meta-llama/Meta-Llama-3-8B-Instruct",
+        model_kwargs=dict(device_map="auto"),
+        tokenizer_kwargs=dict(
+            padding_side="left",
+            truncation_side="left",
+            use_fast=False,
+        ),
+        meta_template=_meta_template,
+        max_out_len=100,
+        max_seq_len=2048,
+        batch_size=8,
+        run_cfg=dict(num_gpus=1, num_procs=1),
+        generation_kwargs={"eos_token_id": [128001, 128009]},
+        batch_padding=True,
+    )
+]
diff --git a/docs/en/notes/news.md b/docs/en/notes/news.md
index 782106e0..b848f6bf 100644
--- a/docs/en/notes/news.md
+++ b/docs/en/notes/news.md
@@ -1,5 +1,9 @@
 # News

+- **\[2024.01.17\]** We supported the evaluation of [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) and [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py); InternLM2 showed extremely strong performance in these tests, welcome to try!
+- **\[2024.01.17\]** We supported the needle in a haystack test with multiple needles; more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html#id8).
+- **\[2023.12.28\]** We have enabled seamless evaluation of all models developed using [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), a powerful toolkit for comprehensive LLM development.
+- **\[2023.12.22\]** We have released [T-Eval](https://github.com/open-compass/T-Eval), a step-by-step evaluation benchmark to gauge your LLMs on tool utilization. Welcome to our [Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html) for more details!
 - **\[2023.12.10\]** We have released [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), a toolkit for evaluating vision-language models (VLMs), currently support 20+ VLMs and 7 multi-modal benchmarks (including MMBench series).
 - **\[2023.12.10\]** We have supported Mistral AI's MoE LLM: **Mixtral-8x7B-32K**. Welcome to [MixtralKit](https://github.com/open-compass/MixtralKit) for more details about inference and evaluation.
 - **\[2023.11.22\]** We have supported many API-based models, include **Baidu, ByteDance, Huawei, 360**. Welcome to [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details.
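For reference, the `_meta_template` in the instruct configs mirrors the official Llama-3 chat format: each turn is `<|start_header_id|>{role}<|end_header_id|>\n\n…<|eot_id|>`, with a single `<|begin_of_text|>` BOS token at the front, and the `eos_token_id` list in `generation_kwargs` covers both stop tokens (`128009` is `<|eot_id|>`, `128001` is `<|end_of_text|>`). A minimal sketch of the rendered prompt — the helper name is hypothetical, not part of OpenCompass:

```python
# Hypothetical helper illustrating the Llama-3 chat format the configs target.
def render_llama3_prompt(turns):
    """Render (role, content) pairs into a Llama-3 chat prompt string."""
    prompt = "<|begin_of_text|>"  # BOS token, emitted once per prompt
    for role, content in turns:
        # Each turn: header tokens, role name, blank line, content, end-of-turn
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    # Leave an open assistant header so the model generates the reply;
    # decoding stops at <|eot_id|> (id 128009) or <|end_of_text|> (id 128001).
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```

This is also why the instruct configs list two `eos_token_id` values: instruct checkpoints end replies with `<|eot_id|>` rather than `<|end_of_text|>`, so both ids must terminate generation.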
diff --git a/docs/zh_cn/notes/news.md b/docs/zh_cn/notes/news.md
index e0def1cb..776f2f50 100644
--- a/docs/zh_cn/notes/news.md
+++ b/docs/zh_cn/notes/news.md
@@ -1,5 +1,9 @@
 # 新闻

+- **\[2024.01.17\]** 我们支持了 [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) 和 [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py) 的相关评测,InternLM2 在这些测试中表现出非常强劲的性能,欢迎试用!
+- **\[2024.01.17\]** 我们支持了多根针版本的大海捞针测试,更多信息见[这里](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/needleinahaystack_eval.html#id8)。
+- **\[2023.12.28\]** 我们支持了对使用[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory)(一款强大的LLM开发工具箱)开发的所有模型的无缝评估!
+- **\[2023.12.22\]** 我们开源了[T-Eval](https://github.com/open-compass/T-Eval)用于评测大语言模型工具调用能力。欢迎访问T-Eval的官方[Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html)获取更多信息!
 - **\[2023.12.10\]** 我们开源了多模评测框架 [VLMEvalKit](https://github.com/open-compass/VLMEvalKit),目前已支持 20+ 个多模态大模型与包括 MMBench 系列在内的 7 个多模态评测集.
 - **\[2023.12.10\]** 我们已经支持了Mistral AI的MoE模型 **Mixtral-8x7B-32K**。欢迎查阅[MixtralKit](https://github.com/open-compass/MixtralKit)以获取更多关于推理和评测的详细信息.
 - **\[2023.11.22\]** 我们已经支持了多个于API的模型,包括**百度、字节跳动、华为、360**。欢迎查阅[模型](https://opencompass.readthedocs.io/en/latest/user_guides/models.html)部分以获取更多详细信息。
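Once merged, the new configs can be exercised through OpenCompass's standard `run.py` entry point from the repo root (the dataset name below is taken from the README example in this patch; the model name matching the config filename is an assumption about how the config is resolved). The gated `meta-llama/Meta-Llama-3-*` checkpoints may require a Hugging Face access token:

```shell
# Sketch: evaluate the new Llama-3 8B base config on MMLU (ppl mode).
# Run from the OpenCompass repo root; log in first if the checkpoint is gated:
#   huggingface-cli login
python run.py --models hf_llama3_8b --datasets mmlu_ppl
```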