[Doc] Update README and requirements. (#622)

* update readme
* update doc
2025-05-30 16:03:24 +08:00 · 2023-11-22 19:16:54 +08:00 · 2023-11-22 19:16:54 +08:00 · 5329724b65
commit 5329724b65
parent c0785e53d8
7 changed files with 139 additions and 42 deletions
--- a/README.md
+++ b/README.md
@@ -38,15 +38,15 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
 - **\[2023.11.22\]** We now support many API-based models, including **Baidu, ByteDance, Huawei, and 360**. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
 - **\[2023.11.20\]** Thanks to [helloyongyang](https://github.com/helloyongyang) for supporting evaluation with [LightLLM](https://github.com/ModelTC/lightllm) as the backend. See [Evaluation With LightLLM](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lightllm.html) for more details. 🔥🔥🔥.
 - **\[2023.11.13\]** We are delighted to announce the release of OpenCompass v0.1.8. This version enables local loading of evaluation benchmarks, thereby eliminating the need for an internet connection. Please note that with this update, **you must re-download all evaluation datasets** to ensure accurate and up-to-date results.🔥🔥🔥.
- **\[2023.11.06\]** We now support several API-based models, including ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax, and Xunfei. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
+- **\[2023.11.06\]** We now support several API-based models, including **ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax, and Xunfei**. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
- **\[2023.10.24\]** We released a new benchmark for evaluating the multi-turn dialogue capabilities of LLMs. See [BotChat](https://github.com/open-compass/BotChat) for more details. 🔥🔥🔥.
+- **\[2023.10.24\]** We released a new benchmark for evaluating the multi-turn dialogue capabilities of LLMs. See [BotChat](https://github.com/open-compass/BotChat) for more details.
- **\[2023.09.26\]** We updated the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Visit our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
+- **\[2023.09.26\]** We updated the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Visit our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.20\]** We updated the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Visit our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.19\]** We updated the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Visit our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.18\]** We have released [long context evaluation guidance](docs/en/advanced_guides/longeval.md).
 - **\[2023.09.08\]** We updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Visit our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
 > [More](docs/en/notes/news.md)
@@ -76,12 +76,32 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the co
 Below are the steps for quick installation and datasets preparation.
-```Python
+### 💻 Environment Setup
 #### Open-source Models with GPU
 ```bash
 conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
 conda activate opencompass
 git clone https://github.com/open-compass/opencompass opencompass
 cd opencompass
 pip install -e .
 ```
 #### API Models with CPU-only
 ```bash
 conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
 conda activate opencompass
 git clone https://github.com/open-compass/opencompass opencompass
 cd opencompass
 pip install -e .
 # If you need API models, also install the required packages via `pip install -r requirements/api.txt`.
 ```
 ### 📂 Data Preparation
 ```bash
 # Download dataset to data/ folder
 wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
 unzip OpenCompassData-core-20231110.zip
@@ -411,16 +431,17 @@ Through the command line or configuration files, OpenCompass also supports evalu
    <tr valign="top">
      <td>
- InternLM
+- [InternLM](https://github.com/InternLM/InternLM)
- LLaMA
+- [LLaMA](https://github.com/facebookresearch/llama)
- Vicuna
+- [Vicuna](https://github.com/lm-sys/FastChat)
- Alpaca
+- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- Baichuan
+- [Baichuan](https://github.com/baichuan-inc)
- WizardLM
+- [WizardLM](https://github.com/nlpxucan/WizardLM)
- ChatGLM2
+- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)
- Falcon
+- [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B)
- TigerBot
+- [TigerBot](https://github.com/TigerResearch/TigerBot)
- Qwen
+- [Qwen](https://github.com/QwenLM/Qwen)
 - [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
 - ...
 </td>
@@ -428,7 +449,15 @@ Through the command line or configuration files, OpenCompass also supports evalu
 - OpenAI
 - Claude
- PaLM (coming soon)
+- ZhipuAI(ChatGLM)
 - Baichuan
 - ByteDance(YunQue)
 - Huawei(PanGu)
 - 360
 - Baidu(ERNIEBot)
 - MiniMax(ABAB-Chat)
 - SenseTime(nova)
 - Xunfei(Spark)
 - ……
 </td>
@@ -444,17 +473,17 @@ Through the command line or configuration files, OpenCompass also supports evalu
 - [ ] Subjective Evaluation
  - [ ] Release CompassArena
  - [ ] Subjective evaluation dataset.
- [ ] Long-context
+- [x] Long-context
  - [ ] Long-context evaluation with extensive datasets.
  - [ ] Long-context leaderboard.
 - [ ] Coding
  - [ ] Coding evaluation leaderboard.
-  - [ ] Non-python language evaluation service.
+  - [x] Non-python language evaluation service.
 - [ ] Agent
  - [ ] Support various agent frameworks.
  - [ ] Evaluation of tool use of the LLMs.
- [ ] Robustness
+- [x] Robustness
-  - [ ] Support various attack methods
+  - [x] Support various attack methods
 ## 👷‍♂️ Contributing
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
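@@ -38,15 +38,15 @@
 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
 - **\[2023.11.22\]** We now support many API-based models, including **Baidu, ByteDance, Huawei, and 360**. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
 - **\[2023.11.20\]** Thanks to [helloyongyang](https://github.com/helloyongyang) for supporting evaluation with [LightLLM](https://github.com/ModelTC/lightllm) as the backend. See [Evaluation With LightLLM](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lightllm.html) for more details. 🔥🔥🔥.
 - **\[2023.11.13\]** We are delighted to announce the release of OpenCompass v0.1.8. This version supports local loading of evaluation benchmarks, so no internet connection is needed. Please note that with this update, **you must re-download all evaluation datasets** to ensure accurate and up-to-date results. 🔥🔥🔥.
 - **\[2023.11.06\]** We now support several API-based models, including ChatGLM Pro@Zhipu Qingyan, ABAB-Chat@MiniMax, and Xunfei. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
- **\[2023.10.24\]** We released a brand-new benchmark, BotChat, for evaluating the multi-turn dialogue capabilities of large language models. See [BotChat](https://github.com/open-compass/BotChat) for more information. 🔥🔥🔥.
+- **\[2023.10.24\]** We released a brand-new benchmark, BotChat, for evaluating the multi-turn dialogue capabilities of large language models. See [BotChat](https://github.com/open-compass/BotChat) for more information.
- **\[2023.09.26\]** We updated the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Visit the [official website](https://opencompass.org.cn) for details. 🔥🔥🔥.
+- **\[2023.09.26\]** We updated the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Visit the [official website](https://opencompass.org.cn) for details.
 - **\[2023.09.20\]** We updated the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Visit the [official website](https://opencompass.org.cn) for details.
 - **\[2023.09.19\]** We updated the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Visit the [official website](https://opencompass.org.cn) for details.
 - **\[2023.09.18\]** We released the [long-context evaluation guide](docs/zh_cn/advanced_guides/longeval.md).
 - **\[2023.09.08\]** We updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Visit the [official website](https://opencompass.org.cn) for details.
 - **\[2023.09.06\]** We welcome the [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team, who adopted OpenCompass to evaluate their models systematically. We deeply appreciate the community's efforts to improve the transparency and reproducibility of LLM evaluation.
 > [More](docs/zh_cn/notes/news.md)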
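@@ -78,12 +78,32 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows
 Below are the steps for quick installation and dataset preparation.
-```Python
+### 💻 Environment Setup
 #### Open-source Models with GPU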
 ```bash
 conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
 conda activate opencompass
 git clone https://github.com/open-compass/opencompass opencompass
 cd opencompass
 pip install -e .
 ```
 #### API Models with CPU-only
 ```bash
 conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
 conda activate opencompass
 git clone https://github.com/open-compass/opencompass opencompass
 cd opencompass
 pip install -e .
 # If you need API models, install their dependencies via `pip install -r requirements/api.txt`.
 ```
 ### 📂 Data Preparation
 ```bash
 # Download dataset to data/ folder
 wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
 unzip OpenCompassData-core-20231110.zip
@@ -413,16 +433,17 @@ python run.py --datasets ceval_ppl mmlu_ppl \
    <tr valign="top">
      <td>
- InternLM
+- [InternLM](https://github.com/InternLM/InternLM)
- LLaMA
+- [LLaMA](https://github.com/facebookresearch/llama)
- Vicuna
+- [Vicuna](https://github.com/lm-sys/FastChat)
- Alpaca
+- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- Baichuan
+- [Baichuan](https://github.com/baichuan-inc)
- WizardLM
+- [WizardLM](https://github.com/nlpxucan/WizardLM)
- ChatGLM2
+- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)
- Falcon
+- [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B)
- TigerBot
+- [TigerBot](https://github.com/TigerResearch/TigerBot)
- Qwen
+- [Qwen](https://github.com/QwenLM/Qwen)
 - [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
 - ……
 </td>
@@ -430,7 +451,15 @@ python run.py --datasets ceval_ppl mmlu_ppl \
 - OpenAI
 - Claude
- PaLM (coming soon)
+- ZhipuAI(ChatGLM)
 - Baichuan
 - ByteDance(YunQue)
 - Huawei(PanGu)
 - 360
 - Baidu(ERNIEBot)
 - MiniMax(ABAB-Chat)
 - SenseTime(nova)
 - Xunfei(Spark)
 - ……
 </td>
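@@ -446,17 +475,17 @@ python run.py --datasets ceval_ppl mmlu_ppl \
 - [ ] Subjective Evaluation
  - [ ] Release the subjective evaluation leaderboard
  - [ ] Release the subjective evaluation dataset
- [ ] Long-context
+- [x] Long-context
  - [ ] Support a wide range of long-context evaluation datasets
  - [ ] Release the long-context evaluation leaderboard
 - [ ] Coding
  - [ ] Release the coding evaluation leaderboard
-  - [ ] Provide an evaluation service for non-Python languages
+  - [x] Provide an evaluation service for non-Python languages
 - [ ] Agent
  - [ ] Support a rich variety of agent frameworks
  - [ ] Provide an agent evaluation leaderboard
- [ ] Robustness
+- [x] Robustness
-  - [ ] Support various attack methods
+  - [x] Support various attack methods
 ## 👷‍♂️ Contributing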
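The data preparation step shown in both READMEs (`wget` followed by `unzip`) can also be scripted. The following is a minimal sketch using only the Python standard library. `download_archive` and `extract_archive` are hypothetical helper names, not part of OpenCompass, and the URL is the one quoted in the README hunk above.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL copied from the README's data preparation step.
DATASET_URL = (
    "https://github.com/open-compass/opencompass/releases/download/"
    "0.1.8.rc1/OpenCompassData-core-20231110.zip"
)


def download_archive(url: str, target: Path) -> Path:
    """Download `url` to `target`, skipping the download if the file exists.

    Hypothetical helper for illustration only.
    """
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def extract_archive(archive: Path, dest: Path) -> List[str]:
    """Unpack a zip archive into `dest` and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Example usage (performs a network download, so it is left commented out):
# archive = download_archive(DATASET_URL, Path("OpenCompassData-core-20231110.zip"))
# extract_archive(archive, Path("."))
```

Skipping the download when the archive is already present keeps repeated setup runs cheap. Whether the archive unpacks into a `data/` directory is an assumption based on the README comment.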
--- a/docs/en/get_started/installation.md
+++ b/docs/en/get_started/installation.md
@@ -2,6 +2,9 @@
 1. Set up the OpenCompass environment:
 `````{tabs}
 ````{tab} Open-source Models with GPU
   ```bash
   conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
   conda activate opencompass
@@ -9,6 +12,20 @@
   If you want to customize the PyTorch version or related CUDA version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
 ````
 ````{tab} API Models with CPU-only
   ```bash
   conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
   conda activate opencompass
   # If you need API models, also install the required packages via `pip install -r requirements/api.txt`.
   ```
   If you want to customize the PyTorch version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
 ````
 `````
 2. Install OpenCompass:
   ```bash
--- a/docs/en/notes/news.md
+++ b/docs/en/notes/news.md
@@ -1,5 +1,7 @@
 # News
 - **\[2023.09.08\]** We updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Visit our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
 - **\[2023.09.02\]** OpenCompass now supports the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL).
 - **\[2023.08.25\]** The [**TigerBot**](https://github.com/TigerResearch/TigerBot) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
 - **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been released. It is a lightweight framework for building LLM-based agents. We are working with the Lagent team to support the evaluation of general tool-use capability. Stay tuned!
--- a/docs/zh_cn/get_started/installation.md
+++ b/docs/zh_cn/get_started/installation.md
@@ -2,6 +2,9 @@
 1. Set up the OpenCompass environment:
 `````{tabs}
 ````{tab} Open-source Models with GPU
   ```bash
   conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
   conda activate opencompass
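@@ -9,6 +12,21 @@
   If you want to customize the PyTorch version or the related CUDA version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.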
 ````
 ````{tab} API Models with CPU-only
   ```bash
   conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
   conda activate opencompass
   # If you need API models, install their dependencies via `pip install -r requirements/api.txt`.
   ```
   If you want to customize the PyTorch version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
 ````
 `````
 2. Install OpenCompass:
   ```bash
--- a/docs/zh_cn/notes/news.md
+++ b/docs/zh_cn/notes/news.md
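@@ -1,5 +1,7 @@
 # News
 - **\[2023.09.08\]** We updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Visit the [official website](https://opencompass.org.cn) for details.
 - **\[2023.09.06\]** We welcome the [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team, who adopted OpenCompass to evaluate their models systematically. We deeply appreciate the community's efforts to improve the transparency and reproducibility of LLM evaluation.
 - **\[2023.09.02\]** We added evaluation support for [Qwen-VL](https://github.com/QwenLM/Qwen-VL).
 - **\[2023.08.25\]** We welcome the [**TigerBot**](https://github.com/TigerResearch/TigerBot) team, who adopted OpenCompass to evaluate their models systematically. We deeply appreciate the community's efforts to improve the transparency and reproducibility of LLM evaluation.
 - **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been officially released. It is a lightweight, open-source framework for building LLM-based agents. We are working closely with the Lagent team to support Lagent-based evaluation of LLM tool-use capabilities!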
--- a/requirements/runtime.txt
+++ b/requirements/runtime.txt
@@ -11,7 +11,7 @@ fairscale
 fuzzywuzzy
 jieba
 ltp
-mmengine>=0.8.2
+mmengine-lite
 nltk==3.8
 numpy==1.23.4
 openai
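
This hunk replaces the `mmengine>=0.8.2` requirement with `mmengine-lite`, and the installation docs state that OpenCompass requires `pytorch>=1.13`. Below is a rough sketch of how such simple constraints could be checked against an environment, using only the standard library. The helpers `parse_requirement`, `version_tuple`, `satisfies`, and `check_installed` are hypothetical names for illustration. The sketch handles only plain `name`, `name==X`, and `name>=X` lines with dotted numeric versions, not the full requirement grammar that pip accepts.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Matches simple lines such as "mmengine-lite", "nltk==3.8" or "mmengine>=0.8.2".
_REQ = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=)?\s*([0-9][0-9.]*)?\s*$")


def parse_requirement(line: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a simple requirement line into (name, operator, version)."""
    match = _REQ.match(line)
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    name, op, version = match.groups()
    if (op is None) != (version is None):
        raise ValueError(f"operator and version must appear together: {line!r}")
    return name, op, version


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn the leading numeric part of a version, e.g. "1.13.0+cu117", into (1, 13, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version prefix in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed: str, op: Optional[str], required: Optional[str]) -> bool:
    """Check an installed version against an optional "==" or ">=" constraint."""
    if op is None or required is None:
        return True  # unpinned requirement: any installed version is accepted
    have, want = version_tuple(installed), version_tuple(required)
    # Pad with zeros so that "3.8" and "3.8.0" compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have == want if op == "==" else have >= want


def check_installed(line: str) -> Optional[bool]:
    """Return None if the package is missing, else whether it satisfies `line`."""
    name, op, required = parse_requirement(line)
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
    return satisfies(installed, op, required)
```

For example, `check_installed("mmengine-lite")` reports whether the new dependency is present. For the PyTorch note, the pip distribution is named `torch`, so the equivalent check would be `check_installed("torch>=1.13")`.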