From a256753221ad2a33ec9750b31f6284b581c1e1fd Mon Sep 17 00:00:00 2001
From: Fengzhe Zhou
Date: Mon, 22 Apr 2024 14:39:31 +0800
Subject: [PATCH] [Feature] Add LLaMA-3 Series Configs (#1065)

* add LLaMA-3 Series configs

* update readme
---
 README.md                                     | 10 +++----
 README_zh-CN.md                               | 10 +++----
 configs/models/hf_llama/hf_llama3_70b.py      | 21 ++++++++++++++
 .../models/hf_llama/hf_llama3_70b_instruct.py | 29 +++++++++++++++++++
 configs/models/hf_llama/hf_llama3_8b.py       | 21 ++++++++++++++
 .../models/hf_llama/hf_llama3_8b_instruct.py  | 29 +++++++++++++++++++
 docs/en/notes/news.md                         |  4 +++
 docs/zh_cn/notes/news.md                      |  4 +++
 8 files changed, 116 insertions(+), 12 deletions(-)
 create mode 100644 configs/models/hf_llama/hf_llama3_70b.py
 create mode 100644 configs/models/hf_llama/hf_llama3_70b_instruct.py
 create mode 100644 configs/models/hf_llama/hf_llama3_8b.py
 create mode 100644 configs/models/hf_llama/hf_llama3_8b_instruct.py

diff --git a/README.md b/README.md
index e84fd8e0..1bdd63a8 100644
--- a/README.md
+++ b/README.md
@@ -70,12 +70,9 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

 ## 🚀 What's New

-- **\[2024.02.29\]** We supported the MT-Bench, AlpacalEval and AlignBench, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html) 🔥🔥🔥.
-- **\[2024.01.30\]** We release OpenCompass 2.0. Click [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information ! 🔥🔥🔥.
-- **\[2024.01.17\]** We supported the evaluation of [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) and [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py), InternLM2 showed extremely strong performance in these tests, welcome to try! 🔥🔥🔥.
-- **\[2024.01.17\]** We supported the needle in a haystack test with multiple needles, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html#id8) 🔥🔥🔥.
-- **\[2023.12.28\]** We have enabled seamless evaluation of all models developed using [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), a powerful toolkit for comprehensive LLM development.
-- **\[2023.12.22\]** We have released [T-Eval](https://github.com/open-compass/T-Eval), a step-by-step evaluation benchmark to gauge your LLMs on tool utilization. Welcome to our [Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html) for more details!
+- **\[2024.04.22\]** We supported the evaluation of [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) and [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py), welcome to try! 🔥🔥🔥
+- **\[2024.02.29\]** We supported MT-Bench, AlpacaEval and AlignBench; more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)
+- **\[2024.01.30\]** We released OpenCompass 2.0. Click [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information!
 > [More](docs/en/notes/news.md)

@@ -458,6 +455,7 @@ Through the command line or configuration files, OpenCompass also supports evalu

 - [InternLM](https://github.com/InternLM/InternLM)
 - [LLaMA](https://github.com/facebookresearch/llama)
+- [LLaMA3](https://github.com/meta-llama/llama3)
 - [Vicuna](https://github.com/lm-sys/FastChat)
 - [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
 - [Baichuan](https://github.com/baichuan-inc)

diff --git a/README_zh-CN.md b/README_zh-CN.md
index 6d115243..f399ed65 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -69,12 +69,9 @@

 ## 🚀 最新进展

-- **\[2024.02.29\]** 我们支持了MT-Bench、AlpacalEval和AlignBench,更多信息可以在[这里](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)找到 🔥🔥🔥。
-- **\[2024.01.30\]** 我们发布了OpenCompass 2.0。更多信息,请访问[CompassKit](https://github.com/open-compass)、[CompassHub](https://hub.opencompass.org.cn/home)和[CompassRank](https://rank.opencompass.org.cn/home) 🔥🔥🔥。
-- **\[2024.01.17\]** 我们支持了 [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py) 和 [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py) 的相关评测,InternLM2 在这些测试中表现出非常强劲的性能,欢迎试用!🔥🔥🔥.
-- **\[2024.01.17\]** 我们支持了多根针版本的大海捞针测试,更多信息见[这里](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/needleinahaystack_eval.html#id8)🔥🔥🔥.
-- **\[2023.12.28\]** 我们支持了对使用[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory)(一款强大的LLM开发工具箱)开发的所有模型的无缝评估!
-- **\[2023.12.22\]** 我们开源了[T-Eval](https://github.com/open-compass/T-Eval)用于评测大语言模型工具调用能力。欢迎访问T-Eval的官方[Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html)获取更多信息!
+- **\[2024.04.22\]** 我们支持了 [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) 和 [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py) 的评测,欢迎试用!🔥🔥🔥
+- **\[2024.02.29\]** 我们支持了MT-Bench、AlpacaEval和AlignBench,更多信息可以在[这里](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)找到。
+- **\[2024.01.30\]** 我们发布了OpenCompass 2.0。更多信息,请访问[CompassKit](https://github.com/open-compass)、[CompassHub](https://hub.opencompass.org.cn/home)和[CompassRank](https://rank.opencompass.org.cn/home)。

 > [更多](docs/zh_cn/notes/news.md)

@@ -463,6 +460,7 @@ python run.py --datasets ceval_ppl mmlu_ppl \

 - [InternLM](https://github.com/InternLM/InternLM)
 - [LLaMA](https://github.com/facebookresearch/llama)
+- [LLaMA3](https://github.com/meta-llama/llama3)
 - [Vicuna](https://github.com/lm-sys/FastChat)
 - [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
 - [Baichuan](https://github.com/baichuan-inc)

diff --git a/configs/models/hf_llama/hf_llama3_70b.py b/configs/models/hf_llama/hf_llama3_70b.py
new file mode 100644
index 00000000..f35c18ad
--- /dev/null
+++ b/configs/models/hf_llama/hf_llama3_70b.py
@@ -0,0 +1,21 @@
+from opencompass.models import HuggingFaceCausalLM
+
+
+models = [
+    dict(
+        type=HuggingFaceCausalLM,
+        abbr="llama-3-70b-hf",
+        path="meta-llama/Meta-Llama-3-70B",
+        model_kwargs=dict(device_map="auto"),
+        tokenizer_kwargs=dict(
+            padding_side="left",
+            truncation_side="left",
+            use_fast=False,
+        ),
+        max_out_len=100,
+        max_seq_len=2048,
+        batch_size=8,
+        batch_padding=True,
+        run_cfg=dict(num_gpus=4, num_procs=1),
+    )
+]
diff --git a/configs/models/hf_llama/hf_llama3_70b_instruct.py b/configs/models/hf_llama/hf_llama3_70b_instruct.py
new file mode 100644
index 00000000..64883815
--- /dev/null
+++ b/configs/models/hf_llama/hf_llama3_70b_instruct.py
@@ -0,0 +1,29 @@
+from opencompass.models import HuggingFaceCausalLM
+
+# Llama-3 chat format: each turn is wrapped in
+# <|start_header_id|>{role}<|end_header_id|>\n\n ... <|eot_id|>
+_meta_template = dict(
+    round=[
+        dict(role="HUMAN", begin="<|start_header_id|>user<|end_header_id|>\n\n", end="<|eot_id|>"),
+        dict(role="BOT", begin="<|start_header_id|>assistant<|end_header_id|>\n\n", end="<|eot_id|>", generate=True),
+    ],
+)
+
+models = [
+    dict(
+        type=HuggingFaceCausalLM,
+        abbr="llama-3-70b-instruct-hf",
+        path="meta-llama/Meta-Llama-3-70B-Instruct",
+        model_kwargs=dict(device_map="auto"),
+        tokenizer_kwargs=dict(
+            padding_side="left",
+            truncation_side="left",
+            use_fast=False,
+        ),
+        meta_template=_meta_template,
+        max_out_len=100,
+        max_seq_len=2048,
+        batch_size=8,
+        run_cfg=dict(num_gpus=4, num_procs=1),
+        generation_kwargs={"eos_token_id": [128001, 128009]},
+        batch_padding=True,
+    )
+]
diff --git a/configs/models/hf_llama/hf_llama3_8b.py b/configs/models/hf_llama/hf_llama3_8b.py
new file mode 100644
index 00000000..cbf2a9da
--- /dev/null
+++ b/configs/models/hf_llama/hf_llama3_8b.py
@@ -0,0 +1,21 @@
+from opencompass.models import HuggingFaceCausalLM
+
+
+models = [
+    dict(
+        type=HuggingFaceCausalLM,
+        abbr="llama-3-8b-hf",
+        path="meta-llama/Meta-Llama-3-8B",
+        model_kwargs=dict(device_map="auto"),
+        tokenizer_kwargs=dict(
+            padding_side="left",
+            truncation_side="left",
+            use_fast=False,
+        ),
+        max_out_len=100,
+        max_seq_len=2048,
+        batch_size=8,
+        batch_padding=True,
+        run_cfg=dict(num_gpus=1, num_procs=1),
+    )
+]
diff --git a/configs/models/hf_llama/hf_llama3_8b_instruct.py b/configs/models/hf_llama/hf_llama3_8b_instruct.py
new file mode 100644
index 00000000..e3eb4812
--- /dev/null
+++ b/configs/models/hf_llama/hf_llama3_8b_instruct.py
@@ -0,0 +1,29 @@
+from opencompass.models import HuggingFaceCausalLM
+
+# Llama-3 chat format: each turn is wrapped in
+# <|start_header_id|>{role}<|end_header_id|>\n\n ... <|eot_id|>
+_meta_template = dict(
+    round=[
+        dict(role="HUMAN", begin="<|start_header_id|>user<|end_header_id|>\n\n", end="<|eot_id|>"),
+        dict(role="BOT", begin="<|start_header_id|>assistant<|end_header_id|>\n\n", end="<|eot_id|>", generate=True),
+    ],
+)
+
+models = [
+    dict(
+        type=HuggingFaceCausalLM,
+        abbr="llama-3-8b-instruct-hf",
+        path="meta-llama/Meta-Llama-3-8B-Instruct",
+        model_kwargs=dict(device_map="auto"),
+        tokenizer_kwargs=dict(
+            padding_side="left",
+            truncation_side="left",
+            use_fast=False,
+        ),
+        meta_template=_meta_template,
+        max_out_len=100,
+        max_seq_len=2048,
+        batch_size=8,
+        run_cfg=dict(num_gpus=1, num_procs=1),
+        generation_kwargs={"eos_token_id": [128001, 128009]},
+        batch_padding=True,
+    )
+]
diff --git a/docs/en/notes/news.md b/docs/en/notes/news.md
index 782106e0..b848f6bf 100644
--- a/docs/en/notes/news.md
+++ b/docs/en/notes/news.md
@@ -1,5 +1,9 @@
 # News

+- **\[2024.01.17\]** We supported the evaluation of [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) and [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py); InternLM2 showed extremely strong performance in these tests, welcome to try!
+- **\[2024.01.17\]** We supported the needle in a haystack test with multiple needles; more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html#id8).
+- **\[2023.12.28\]** We have enabled seamless evaluation of all models developed using [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), a powerful toolkit for comprehensive LLM development.
+- **\[2023.12.22\]** We have released [T-Eval](https://github.com/open-compass/T-Eval), a step-by-step evaluation benchmark to gauge your LLMs on tool utilization. Welcome to our [Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html) for more details!
 - **\[2023.12.10\]** We have released [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), a toolkit for evaluating vision-language models (VLMs), currently support 20+ VLMs and 7 multi-modal benchmarks (including MMBench series).
 - **\[2023.12.10\]** We have supported Mistral AI's MoE LLM: **Mixtral-8x7B-32K**. Welcome to [MixtralKit](https://github.com/open-compass/MixtralKit) for more details about inference and evaluation.
 - **\[2023.11.22\]** We have supported many API-based models, include **Baidu, ByteDance, Huawei, 360**. Welcome to [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details.
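For reference, the `_meta_template` in the instruct configs mirrors the official Llama-3 chat format: each turn is `<|start_header_id|>{role}<|end_header_id|>\n\n…<|eot_id|>`, with a single `<|begin_of_text|>` BOS token at the front, and the `eos_token_id` list in `generation_kwargs` covers both stop tokens (`128009` is `<|eot_id|>`, `128001` is `<|end_of_text|>`). A minimal sketch of the rendered prompt — the helper name is hypothetical, not part of OpenCompass:

```python
# Hypothetical helper illustrating the Llama-3 chat format the configs target.
def render_llama3_prompt(turns):
    """Render (role, content) pairs into a Llama-3 chat prompt string."""
    prompt = "<|begin_of_text|>"  # BOS token, emitted once per prompt
    for role, content in turns:
        # Each turn: header tokens, role name, blank line, content, end-of-turn
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    # Leave an open assistant header so the model generates the reply;
    # decoding stops at <|eot_id|> (id 128009) or <|end_of_text|> (id 128001).
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```

This is also why the instruct configs list two `eos_token_id` values: instruct checkpoints end replies with `<|eot_id|>` rather than `<|end_of_text|>`, so both ids must terminate generation.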
diff --git a/docs/zh_cn/notes/news.md b/docs/zh_cn/notes/news.md
index e0def1cb..776f2f50 100644
--- a/docs/zh_cn/notes/news.md
+++ b/docs/zh_cn/notes/news.md
@@ -1,5 +1,9 @@
 # 新闻

+- **\[2024.01.17\]** 我们支持了 [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) 和 [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py) 的相关评测,InternLM2 在这些测试中表现出非常强劲的性能,欢迎试用!
+- **\[2024.01.17\]** 我们支持了多根针版本的大海捞针测试,更多信息见[这里](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/needleinahaystack_eval.html#id8)。
+- **\[2023.12.28\]** 我们支持了对使用[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory)(一款强大的LLM开发工具箱)开发的所有模型的无缝评估!
+- **\[2023.12.22\]** 我们开源了[T-Eval](https://github.com/open-compass/T-Eval)用于评测大语言模型工具调用能力。欢迎访问T-Eval的官方[Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html)获取更多信息!
 - **\[2023.12.10\]** 我们开源了多模评测框架 [VLMEvalKit](https://github.com/open-compass/VLMEvalKit),目前已支持 20+ 个多模态大模型与包括 MMBench 系列在内的 7 个多模态评测集.
 - **\[2023.12.10\]** 我们已经支持了Mistral AI的MoE模型 **Mixtral-8x7B-32K**。欢迎查阅[MixtralKit](https://github.com/open-compass/MixtralKit)以获取更多关于推理和评测的详细信息.
 - **\[2023.11.22\]** 我们已经支持了多个于API的模型,包括**百度、字节跳动、华为、360**。欢迎查阅[模型](https://opencompass.readthedocs.io/en/latest/user_guides/models.html)部分以获取更多详细信息。
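Once merged, the new configs can be exercised through OpenCompass's standard `run.py` entry point from the repo root (the dataset name below is taken from the README example in this patch; the model name matching the config filename is an assumption about how the config is resolved). The gated `meta-llama/Meta-Llama-3-*` checkpoints may require a Hugging Face access token:

```shell
# Sketch: evaluate the new Llama-3 8B base config on MMLU (ppl mode).
# Run from the OpenCompass repo root; log in first if the checkpoint is gated:
#   huggingface-cli login
python run.py --models hf_llama3_8b --datasets mmlu_ppl
```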