From 5329724b65ea1b33adb4ef31baa4c81dc6ac0b00 Mon Sep 17 00:00:00 2001
From: Songyang Zhang
Date: Wed, 22 Nov 2023 19:16:54 +0800
Subject: [PATCH] [Doc] Update README and requirements. (#622)

* update readme
* update doc
---
 README.md                              | 71 ++++++++++++++++++--------
 README_zh-CN.md                        | 69 +++++++++++++++++--------
 docs/en/get_started/installation.md    | 17 ++++++
 docs/en/notes/news.md                  |  2 +
 docs/zh_cn/get_started/installation.md | 18 +++++++
 docs/zh_cn/notes/news.md               |  2 +
 requirements/runtime.txt               |  2 +-
 7 files changed, 139 insertions(+), 42 deletions(-)

diff --git a/README.md b/README.md
index 87ce948b..856671b2 100644
--- a/README.md
+++ b/README.md
@@ -38,15 +38,15 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

 ## 🚀 What's New

+- **\[2023.11.22\]** We have supported many API-based models, include **Baidu, ByteDance, Huawei, 360**. Welcome to [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
+- **\[2023.11.20\]** Thanks [helloyongyang](https://github.com/helloyongyang) for supporting the evaluation with [LightLLM](https://github.com/ModelTC/lightllm) as backent. Welcome to [Evaluation With LightLLM](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lightllm.html) for more details. 🔥🔥🔥.
 - **\[2023.11.13\]** We are delighted to announce the release of OpenCompass v0.1.8. This version enables local loading of evaluation benchmarks, thereby eliminating the need for an internet connection. Please note that with this update, **you must re-download all evaluation datasets** to ensure accurate and up-to-date results.🔥🔥🔥.
-- **\[2023.11.06\]** We have supported several API-based models, include ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax and Xunfei. Welcome to [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
-- **\[2023.10.24\]** We release a new benchmark for evaluating LLMs’ capabilities of having multi-turn dialogues. Welcome to [BotChat](https://github.com/open-compass/BotChat) for more details. 🔥🔥🔥.
-- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available, welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
+- **\[2023.11.06\]** We have supported several API-based models, include **ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax and Xunfei**. Welcome to [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
+- **\[2023.10.24\]** We release a new benchmark for evaluating LLMs’ capabilities of having multi-turn dialogues. Welcome to [BotChat](https://github.com/open-compass/BotChat) for more details.
+- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available, welcome to our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.20\]** We update the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM), welcome to our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.19\]** We update the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B, welcome to our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.18\]** We have released [long context evaluation guidance](docs/en/advanced_guides/longeval.md).
-- **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5, welcome to our [homepage](https://opencompass.org.cn) for more details.
-- **\[2023.09.06\]** [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adpots OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.

 > [More](docs/en/notes/news.md)

@@ -76,12 +76,32 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for the co

 Below are the steps for quick installation and datasets preparation.

-```Python
+### 💻 Environment Setup
+
+#### Open-source Models with GPU
+
+```bash
 conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
 conda activate opencompass
 git clone https://github.com/open-compass/opencompass opencompass
 cd opencompass
 pip install -e .
+```
+
+#### API Models with CPU-only
+
+```bash
+conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
+conda activate opencompass
+git clone https://github.com/open-compass/opencompass opencompass
+cd opencompass
+pip install -e .
+# also please install requiresments packages via `pip install -r requirements/api.txt` for API models if needed.
+```
+
+### 📂 Data Preparation
+
+```bash
 # Download dataset to data/ folder
 wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
 unzip OpenCompassData-core-20231110.zip
@@ -411,16 +431,17 @@ Through the command line or configuration files, OpenCompass also supports evalu
-- InternLM
-- LLaMA
-- Vicuna
-- Alpaca
-- Baichuan
-- WizardLM
-- ChatGLM2
-- Falcon
-- TigerBot
-- Qwen
+- [InternLM](https://github.com/InternLM/InternLM)
+- [LLaMA](https://github.com/facebookresearch/llama)
+- [Vicuna](https://github.com/lm-sys/FastChat)
+- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
+- [Baichuan](https://github.com/baichuan-inc)
+- [WizardLM](https://github.com/nlpxucan/WizardLM)
+- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)
+- [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B)
+- [TigerBot](https://github.com/TigerResearch/TigerBot)
+- [Qwen](https://github.com/QwenLM/Qwen)
+- [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
 - ...
@@ -428,7 +449,15 @@ Through the command line or configuration files, OpenCompass also supports evalu
 - OpenAI
 - Claude
-- PaLM (coming soon)
+- ZhipuAI(ChatGLM)
+- Baichuan
+- ByteDance(YunQue)
+- Huawei(PanGu)
+- 360
+- Baidu(ERNIEBot)
+- MiniMax(ABAB-Chat)
+- SenseTime(nova)
+- Xunfei(Spark)
 - ……
@@ -444,17 +473,17 @@ Through the command line or configuration files, OpenCompass also supports evalu
 - [ ] Subjective Evaluation
   - [ ] Release CompassAreana
   - [ ] Subjective evaluation dataset.
-- [ ] Long-context
+- [x] Long-context
   - [ ] Long-context evaluation with extensive datasets.
   - [ ] Long-context leaderboard.
 - [ ] Coding
   - [ ] Coding evaluation leaderboard.
-  - [ ] Non-python language evaluation service.
+  - [x] Non-python language evaluation service.
 - [ ] Agent
   - [ ] Support various agenet framework.
   - [ ] Evaluation of tool use of the LLMs.
-- [ ] Robustness
-  - [ ] Support various attack method
+- [x] Robustness
+  - [x] Support various attack method

 ## 👷‍♂️ Contributing

diff --git a/README_zh-CN.md b/README_zh-CN.md
index 6ab76bb4..ade418a3 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -38,15 +38,15 @@

 ## 🚀 最新进展

+- **\[2023.11.22\]** 我们已经支持了多个于API的模型,包括**百度、字节跳动、华为、360**。欢迎查阅[模型](https://opencompass.readthedocs.io/en/latest/user_guides/models.html)部分以获取更多详细信息。🔥🔥🔥。
+- **\[2023.11.20\]** 感谢[helloyongyang](https://github.com/helloyongyang)支持使用[LightLLM](https://github.com/ModelTC/lightllm)作为后端进行评估。欢迎查阅[使用LightLLM进行评估](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lightllm.html)以获取更多详细信息。🔥🔥🔥。
 - **\[2023.11.13\]** 我们很高兴地宣布发布 OpenCompass v0.1.8 版本。此版本支持本地加载评估基准,从而无需连接互联网。请注意,随着此更新的发布,**您需要重新下载所有评估数据集**,以确保结果准确且最新。🔥🔥🔥。
 - **\[2023.11.06\]** 我们已经支持了多个基于 API 的模型,包括ChatGLM Pro@智谱清言、ABAB-Chat@MiniMax 和讯飞。欢迎查看 [模型](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) 部分以获取更多详细信息。🔥🔥🔥。
-- **\[2023.10.24\]** 我们发布了一个全新的评测集,BotChat,用于评估大语言模型的多轮对话能力,欢迎查看 [BotChat](https://github.com/open-compass/BotChat) 获取更多信息. 🔥🔥🔥.
-- **\[2023.09.26\]** 我们在评测榜单上更新了[Qwen](https://github.com/QwenLM/Qwen), 这是目前表现最好的开源模型之一, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情.🔥🔥🔥.
+- **\[2023.10.24\]** 我们发布了一个全新的评测集,BotChat,用于评估大语言模型的多轮对话能力,欢迎查看 [BotChat](https://github.com/open-compass/BotChat) 获取更多信息.
+- **\[2023.09.26\]** 我们在评测榜单上更新了[Qwen](https://github.com/QwenLM/Qwen), 这是目前表现最好的开源模型之一, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情.
 - **\[2023.09.20\]** 我们在评测榜单上更新了[InternLM-20B](https://github.com/InternLM/InternLM), 欢迎访问[官方网站](https://opencompass.org.cn)获取详情.
 - **\[2023.09.19\]** 我们在评测榜单上更新了WeMix-LLaMA2-70B/Phi-1.5-1.3B, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情.
 - **\[2023.09.18\]** 我们发布了[长文本评测指引](docs/zh_cn/advanced_guides/longeval.md).
-- **\[2023.09.08\]** 我们在评测榜单上更新了Baichuan-2/Tigerbot-2/Vicuna-v1.5, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情。
-- **\[2023.09.06\]** 欢迎 [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) 团队采用OpenCompass对模型进行系统评估。我们非常感谢社区在提升LLM评估的透明度和可复现性上所做的努力。

 > [更多](docs/zh_cn/notes/news.md)

@@ -78,12 +78,32 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下

 下面展示了快速安装以及准备数据集的步骤。

-```Python
+### 💻 环境配置
+
+#### 面向开源模型的GPU环境
+
+```bash
 conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
 conda activate opencompass
 git clone https://github.com/open-compass/opencompass opencompass
 cd opencompass
 pip install -e .
+```
+
+#### 面向API模型测试的CPU环境
+
+```bash
+conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
+conda activate opencompass
+git clone https://github.com/open-compass/opencompass opencompass
+cd opencompass
+pip install -e .
+# 如果需要使用各个API模型,请 `pip install -r requirements/api.txt` 安装API模型的相关依赖
+```
+
+### 📂 数据准备
+
+```bash
 # 下载数据集到 data/ 处
 wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
 unzip OpenCompassData-core-20231110.zip
@@ -413,16 +433,17 @@ python run.py --datasets ceval_ppl mmlu_ppl \
-- InternLM
-- LLaMA
-- Vicuna
-- Alpaca
-- Baichuan
-- WizardLM
-- ChatGLM2
-- Falcon
-- TigerBot
-- Qwen
+- [InternLM](https://github.com/InternLM/InternLM)
+- [LLaMA](https://github.com/facebookresearch/llama)
+- [Vicuna](https://github.com/lm-sys/FastChat)
+- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
+- [Baichuan](https://github.com/baichuan-inc)
+- [WizardLM](https://github.com/nlpxucan/WizardLM)
+- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)
+- [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B)
+- [TigerBot](https://github.com/TigerResearch/TigerBot)
+- [Qwen](https://github.com/QwenLM/Qwen)
+- [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
 - ……
@@ -430,7 +451,15 @@ python run.py --datasets ceval_ppl mmlu_ppl \
 - OpenAI
 - Claude
-- PaLM (即将推出)
+- ZhipuAI(ChatGLM)
+- Baichuan
+- ByteDance(YunQue)
+- Huawei(PanGu)
+- 360
+- Baidu(ERNIEBot)
+- MiniMax(ABAB-Chat)
+- SenseTime(nova)
+- Xunfei(Spark)
 - ……
@@ -446,17 +475,17 @@ python run.py --datasets ceval_ppl mmlu_ppl \
 - [ ] 主观评测
   - [ ] 发布主观评测榜单
   - [ ] 发布主观评测数据集
-- [ ] 长文本
+- [x] 长文本
   - [ ] 支持广泛的长文本评测集
   - [ ] 发布长文本评测榜单
 - [ ] 代码能力
   - [ ] 发布代码能力评测榜单
-  - [ ] 提供非Python语言的评测服务
+  - [x] 提供非Python语言的评测服务
 - [ ] 智能体
   - [ ] 支持丰富的智能体方案
   - [ ] 提供智能体评测榜单
-- [ ] 鲁棒性
-  - [ ] 支持各类攻击方法
+- [x] 鲁棒性
+  - [x] 支持各类攻击方法

 ## 👷‍♂️ 贡献

diff --git a/docs/en/get_started/installation.md b/docs/en/get_started/installation.md
index c94fd3e4..6ee79547 100644
--- a/docs/en/get_started/installation.md
+++ b/docs/en/get_started/installation.md
@@ -2,6 +2,9 @@

 1. Set up the OpenCompass environment:

+`````{tabs}
+````{tab} Open-source Models with GPU
+
    ```bash
    conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
    conda activate opencompass
@@ -9,6 +12,20 @@

    If you want to customize the PyTorch version or related CUDA version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.

+````
+````{tab} API Models with CPU-only
+
+   ```bash
+   conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
+   conda activate opencompass
+   # also please install requiresments packages via `pip install -r requirements/api.txt` for API models if needed.
+   ```
+
+   If you want to customize the PyTorch version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
+
+````
+`````
+
 2. Install OpenCompass:

    ```bash
diff --git a/docs/en/notes/news.md b/docs/en/notes/news.md
index d2fc8b79..c7c5d55a 100644
--- a/docs/en/notes/news.md
+++ b/docs/en/notes/news.md
@@ -1,5 +1,7 @@
 # News

+- **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5, welcome to our [homepage](https://opencompass.org.cn) for more details.
+- **\[2023.09.06\]** [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adpots OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
 - **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
 - **\[2023.08.25\]** [**TigerBot**](https://github.com/TigerResearch/TigerBot) team adpots OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
 - **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been released, which is a lightweight framework for building LLM-based agents. We are working with Lagent team to support the evaluation of general tool-use capability, stay tuned!
diff --git a/docs/zh_cn/get_started/installation.md b/docs/zh_cn/get_started/installation.md
index 360fdb81..e38d29b3 100644
--- a/docs/zh_cn/get_started/installation.md
+++ b/docs/zh_cn/get_started/installation.md
@@ -2,6 +2,9 @@

 1. 准备 OpenCompass 运行环境:

+`````{tabs}
+````{tab} 面向开源模型的GPU环境
+
    ```bash
    conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
    conda activate opencompass
@@ -9,6 +12,21 @@

    如果你希望自定义 PyTorch 版本或相关的 CUDA 版本,请参考 [官方文档](https://pytorch.org/get-started/locally/) 准备 PyTorch 环境。需要注意的是,OpenCompass 要求 `pytorch>=1.13`。

+````
+
+````{tab} 面向API模型测试的CPU环境
+
+   ```bash
+   conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
+   conda activate opencompass
+   # 如果需要使用各个API模型,请 `pip install -r requirements/api.txt` 安装API模型的相关依赖
+   ```
+
+   如果你希望自定义 PyTorch 版本,请参考 [官方文档](https://pytorch.org/get-started/locally/) 准备 PyTorch 环境。需要注意的是,OpenCompass 要求 `pytorch>=1.13`。
+
+````
+`````
+
 2. 安装 OpenCompass:

    ```bash
diff --git a/docs/zh_cn/notes/news.md b/docs/zh_cn/notes/news.md
index e2d49438..9b344822 100644
--- a/docs/zh_cn/notes/news.md
+++ b/docs/zh_cn/notes/news.md
@@ -1,5 +1,7 @@
 # 新闻

+- **\[2023.09.08\]** 我们在评测榜单上更新了Baichuan-2/Tigerbot-2/Vicuna-v1.5, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情。
+- **\[2023.09.06\]** 欢迎 [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) 团队采用OpenCompass对模型进行系统评估。我们非常感谢社区在提升LLM评估的透明度和可复现性上所做的努力。
 - **\[2023.09.02\]** 我们加入了[Qwen-VL](https://github.com/QwenLM/Qwen-VL)的评测支持。
 - **\[2023.08.25\]** 欢迎 [**TigerBot**](https://github.com/TigerResearch/TigerBot) 团队采用OpenCompass对模型进行系统评估。我们非常感谢社区在提升LLM评估的透明度和可复现性上所做的努力。
 - **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) 正式发布,它是一个轻量级、开源的基于大语言模型的智能体(agent)框架。我们正与Lagent团队紧密合作,推进支持基于Lagent的大模型工具能力评测 !
diff --git a/requirements/runtime.txt b/requirements/runtime.txt
index 3e905ab6..4681af1c 100644
--- a/requirements/runtime.txt
+++ b/requirements/runtime.txt
@@ -11,7 +11,7 @@ fairscale
 fuzzywuzzy
 jieba
 ltp
-mmengine>=0.8.2
+mmengine-lite
 nltk==3.8
 numpy==1.23.4
 openai