[Doc] Update dataset list (#437)

* add new dataset list
* add new dataset list
* add new dataset list
* update
* update
* update readme

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2025-05-30 16:03:24 +08:00 · 2023-09-27 15:02:09 +08:00 · 2023-09-27 15:02:09 +08:00 · d6261e109d
commit d6261e109d
parent dc1b82c346
2 changed files with 628 additions and 486 deletions
--- a/README.md
+++ b/README.md
@@ -34,9 +34,10 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available, welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
 - **\[2023.09.20\]** We update the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM), welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
-- **\[2023.09.19\]** We update the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B, welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
-- **\[2023.09.18\]** We have released [long context evaluation guidance](docs/en/advanced_guides/longeval.md). 🔥🔥🔥.
+- **\[2023.09.19\]** We update the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B, welcome to our [homepage](https://opencompass.org.cn) for more details.
+- **\[2023.09.18\]** We have released [long context evaluation guidance](docs/en/advanced_guides/longeval.md).
 - **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5, welcome to our [homepage](https://opencompass.org.cn) for more details.
 - **\[2023.09.06\]** [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
 - **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
@@ -51,7 +52,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

 OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features include:

-- **Comprehensive support for models and datasets**: Pre-support for 20+ HuggingFace and API models, a model evaluation scheme of 50+ datasets with about 300,000 questions, comprehensively evaluating the capabilities of the models in five dimensions.
+- **Comprehensive support for models and datasets**: Pre-support for 20+ HuggingFace and API models, a model evaluation scheme of 70+ datasets with about 400,000 questions, comprehensively evaluating the capabilities of the models in five dimensions.

 - **Efficient distributed evaluation**: One line command to implement task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours.

@@ -67,247 +68,6 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun

 <p align="right"><a href="#top">🔝Back to top</a></p>

-## 📖 Dataset Support
-
-<table align="center">
-  <tbody>
-    <tr align="center" valign="bottom">
-      <td>
-        <b>Language</b>
-      </td>
-      <td>
-        <b>Knowledge</b>
-      </td>
-      <td>
-        <b>Reasoning</b>
-      </td>
-      <td>
-        <b>Comprehensive Examination</b>
-      </td>
-      <td>
-        <b>Understanding</b>
-      </td>
-    </tr>
-    <tr valign="top">
-      <td>
-<details open>
-<summary><b>Word Definition</b></summary>
-
- WiC
- SummEdits
-
-</details>
-
-<details open>
-<summary><b>Idiom Learning</b></summary>
-
- CHID
-
-</details>
-
-<details open>
-<summary><b>Semantic Similarity</b></summary>
-
- AFQMC
- BUSTM
-
-</details>
-
-<details open>
-<summary><b>Coreference Resolution</b></summary>
-
- CLUEWSC
- WSC
- WinoGrande
-
-</details>
-
-<details open>
-<summary><b>Translation</b></summary>
-
- Flores
-
-</details>
-      </td>
-      <td>
-<details open>
-<summary><b>Knowledge Question Answering</b></summary>
-
- BoolQ
- CommonSenseQA
- NaturalQuestion
- TrivialQA
-
-</details>
-
-<details open>
-<summary><b>Multi-language Question Answering</b></summary>
-
- TyDi-QA
-
-</details>
-      </td>
-      <td>
-<details open>
-<summary><b>Textual Entailment</b></summary>
-
- CMNLI
- OCNLI
- OCNLI_FC
- AX-b
- AX-g
- CB
- RTE
-
-</details>
-
-<details open>
-<summary><b>Commonsense Reasoning</b></summary>
-
- StoryCloze
- StoryCloze-CN (coming soon)
- COPA
- ReCoRD
- HellaSwag
- PIQA
- SIQA
-
-</details>
-
-<details open>
-<summary><b>Mathematical Reasoning</b></summary>
-
- MATH
- GSM8K
-
-</details>
-
-<details open>
-<summary><b>Theorem Application</b></summary>
-
- TheoremQA
-
-</details>
-
-<details open>
-<summary><b>Code</b></summary>
-
- HumanEval
- MBPP
-
-</details>
-
-<details open>
-<summary><b>Comprehensive Reasoning</b></summary>
-
- BBH
-
-</details>
-      </td>
-      <td>
-<details open>
-<summary><b>Junior High, High School, University, Professional Examinations</b></summary>
-
- GAOKAO-2023
- CEval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC
-
-</details>
-      </td>
-      <td>
-<details open>
-<summary><b>Reading Comprehension</b></summary>
-
- C3
- CMRC
- DRCD
- MultiRC
- RACE
-
-</details>
-
-<details open>
-<summary><b>Content Summary</b></summary>
-
- CSL
- LCSTS
- XSum
-
-</details>
-
-<details open>
-<summary><b>Content Analysis</b></summary>
-
- EPRSTMT
- LAMBADA
- TNEWS
-
-</details>
-      </td>
-    </tr>
-</td>
-    </tr>
-  </tbody>
-</table>
-
-<p align="right"><a href="#top">🔝Back to top</a></p>
-
-## 📖 Model Support
-
-<table align="center">
-  <tbody>
-    <tr align="center" valign="bottom">
-      <td>
-        <b>Open-source Models</b>
-      </td>
-      <td>
-        <b>API Models</b>
-      </td>
-      <!-- <td>
-        <b>Custom Models</b>
-      </td> -->
-    </tr>
-    <tr valign="top">
-      <td>
-
- InternLM
- LLaMA
- Vicuna
- Alpaca
- Baichuan
- WizardLM
- ChatGLM-6B
- ChatGLM2-6B
- MPT
- Falcon
- TigerBot
- MOSS
- ...
-
-</td>
-<td>
-
- OpenAI
- Claude (coming soon)
- PaLM (coming soon)
- ……
-
-</td>
-
-<!--
- GLM
- ...
-
-</td> -->
-
-</tr>
-  </tbody>
-</table>
-
 ## 🛠️ Installation

 Below are the steps for quick installation and datasets preparation.
@@ -360,6 +120,316 @@ python run.py --datasets ceval_ppl mmlu_ppl \

 Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diversified evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html) to learn how to run an evaluation task.

+<p align="right"><a href="#top">🔝Back to top</a></p>
+
+## 📖 Dataset Support
+
+<table align="center">
+  <tbody>
+    <tr align="center" valign="bottom">
+      <td>
+        <b>Language</b>
+      </td>
+      <td>
+        <b>Knowledge</b>
+      </td>
+      <td>
+        <b>Reasoning</b>
+      </td>
+      <td>
+        <b>Examination</b>
+      </td>
+    </tr>
+    <tr valign="top">
+      <td>
+<details open>
+<summary><b>Word Definition</b></summary>
+
+- WiC
+- SummEdits
+
+</details>
+
+<details open>
+<summary><b>Idiom Learning</b></summary>
+
+- CHID
+
+</details>
+
+<details open>
+<summary><b>Semantic Similarity</b></summary>
+
+- AFQMC
+- BUSTM
+
+</details>
+
+<details open>
+<summary><b>Coreference Resolution</b></summary>
+
+- CLUEWSC
+- WSC
+- WinoGrande
+
+</details>
+
+<details open>
+<summary><b>Translation</b></summary>
+
+- Flores
+- IWSLT2017
+
+</details>
+
+<details open>
+<summary><b>Multi-language Question Answering</b></summary>
+
+- TyDi-QA
+- XCOPA
+
+</details>
+
+<details open>
+<summary><b>Multi-language Summary</b></summary>
+
+- XLSum
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>Knowledge Question Answering</b></summary>
+
+- BoolQ
+- CommonSenseQA
+- NaturalQuestions
+- TriviaQA
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>Textual Entailment</b></summary>
+
+- CMNLI
+- OCNLI
+- OCNLI_FC
+- AX-b
+- AX-g
+- CB
+- RTE
+- ANLI
+
+</details>
+
+<details open>
+<summary><b>Commonsense Reasoning</b></summary>
+
+- StoryCloze
+- COPA
+- ReCoRD
+- HellaSwag
+- PIQA
+- SIQA
+
+</details>
+
+<details open>
+<summary><b>Mathematical Reasoning</b></summary>
+
+- MATH
+- GSM8K
+
+</details>
+
+<details open>
+<summary><b>Theorem Application</b></summary>
+
+- TheoremQA
+- StrategyQA
+- SciBench
+
+</details>
+
+<details open>
+<summary><b>Comprehensive Reasoning</b></summary>
+
+- BBH
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>Junior High, High School, University, Professional Examinations</b></summary>
+
+- C-Eval
+- AGIEval
+- MMLU
+- GAOKAO-Bench
+- CMMLU
+- ARC
+- Xiezhi
+
+</details>
+
+<details open>
+<summary><b>Medical Examinations</b></summary>
+
+- CMB
+
+</details>
+      </td>
+    </tr>
+</td>
+    </tr>
+  </tbody>
+  <tbody>
+    <tr align="center" valign="bottom">
+      <td>
+        <b>Understanding</b>
+      </td>
+      <td>
+        <b>Long Context</b>
+      </td>
+      <td>
+        <b>Safety</b>
+      </td>
+      <td>
+        <b>Code</b>
+      </td>
+    </tr>
+    <tr valign="top">
+      <td>
+<details open>
+<summary><b>Reading Comprehension</b></summary>
+
+- C3
+- CMRC
+- DRCD
+- MultiRC
+- RACE
+- DROP
+- OpenBookQA
+- SQuAD2.0
+
+</details>
+
+<details open>
+<summary><b>Content Summary</b></summary>
+
+- CSL
+- LCSTS
+- XSum
+- SummScreen
+
+</details>
+
+<details open>
+<summary><b>Content Analysis</b></summary>
+
+- EPRSTMT
+- LAMBADA
+- TNEWS
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>Long Context Understanding</b></summary>
+
+- LEval
+- LongBench
+- GovReports
+- NarrativeQA
+- Qasper
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>Safety</b></summary>
+
+- CivilComments
+- CrowsPairs
+- CValues
+- JigsawMultilingual
+- TruthfulQA
+
+</details>
+<details open>
+<summary><b>Robustness</b></summary>
+
+- AdvGLUE
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>Code</b></summary>
+
+- HumanEval
+- HumanEvalX
+- MBPP
+- APPs
+- DS1000
+
+</details>
+      </td>
+    </tr>
+</td>
+    </tr>
+  </tbody>
+</table>
+
+<p align="right"><a href="#top">🔝Back to top</a></p>
+
+## 📖 Model Support
+
+<table align="center">
+  <tbody>
+    <tr align="center" valign="bottom">
+      <td>
+        <b>Open-source Models</b>
+      </td>
+      <td>
+        <b>API Models</b>
+      </td>
+      <!-- <td>
+        <b>Custom Models</b>
+      </td> -->
+    </tr>
+    <tr valign="top">
+      <td>
+
+- InternLM
+- LLaMA
+- Vicuna
+- Alpaca
+- Baichuan
+- WizardLM
+- ChatGLM2
+- Falcon
+- TigerBot
+- Qwen
+- ...
+
+</td>
+<td>
+
+- OpenAI
+- Claude
+- PaLM (coming soon)
+- ……
+
+</td>
+
+</tr>
+  </tbody>
+</table>
+
+<p align="right"><a href="#top">🔝Back to top</a></p>
+
 ## 🔜 Roadmap

 - [ ] Subjective Evaluation
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -34,9 +34,10 @@

 ## 🚀 最新进展 <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2023.09.26\]** 我们在评测榜单上更新了[Qwen](https://github.com/QwenLM/Qwen), 这是目前表现最好的开源模型之一, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情.🔥🔥🔥.
 - **\[2023.09.20\]** 我们在评测榜单上更新了[InternLM-20B](https://github.com/InternLM/InternLM), 欢迎访问[官方网站](https://opencompass.org.cn)获取详情.🔥🔥🔥.
- **\[2023.09.19\]** 我们在评测榜单上更新了WeMix-LLaMA2-70B/Phi-1.5-1.3B, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情.🔥🔥🔥.
- **\[2023.09.18\]** 我们发布了[长文本评测指引](docs/zh_cn/advanced_guides/longeval.md).🔥🔥🔥.
+- **\[2023.09.19\]** 我们在评测榜单上更新了WeMix-LLaMA2-70B/Phi-1.5-1.3B, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情.
+- **\[2023.09.18\]** 我们发布了[长文本评测指引](docs/zh_cn/advanced_guides/longeval.md).
 - **\[2023.09.08\]** 我们在评测榜单上更新了Baichuan-2/Tigerbot-2/Vicuna-v1.5, 欢迎访问[官方网站](https://opencompass.org.cn)获取详情。
 - **\[2023.09.06\]** 欢迎 [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) 团队采用OpenCompass对模型进行系统评估。我们非常感谢社区在提升LLM评估的透明度和可复现性上所做的努力。
 - **\[2023.09.02\]** 我们加入了[Qwen-VL](https://github.com/QwenLM/Qwen-VL)的评测支持。
@@ -53,7 +54,7 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下

 - **开源可复现**：提供公平、公开、可复现的大模型评测方案

- **全面的能力维度**：五大维度设计，提供 50+ 个数据集约 30 万题的的模型评测方案，全面评估模型能力
+- **全面的能力维度**：五大维度设计，提供 70+ 个数据集约 40 万题的的模型评测方案，全面评估模型能力

 - **丰富的模型支持**：已支持 20+ HuggingFace 及 API 模型

@@ -69,245 +70,6 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下

 <p align="right"><a href="#top">🔝返回顶部</a></p>

-## 📖 数据集支持
-
-<table align="center">
-  <tbody>
-    <tr align="center" valign="bottom">
-      <td>
-        <b>语言</b>
-      </td>
-      <td>
-        <b>知识</b>
-      </td>
-      <td>
-        <b>推理</b>
-      </td>
-      <td>
-        <b>学科</b>
-      </td>
-      <td>
-        <b>理解</b>
-      </td>
-    </tr>
-    <tr valign="top">
-      <td>
-<details open>
-<summary><b>字词释义</b></summary>
-
- WiC
- SummEdits
-
-</details>
-
-<details open>
-<summary><b>成语习语</b></summary>
-
- CHID
-
-</details>
-
-<details open>
-<summary><b>语义相似度</b></summary>
-
- AFQMC
- BUSTM
-
-</details>
-
-<details open>
-<summary><b>指代消解</b></summary>
-
- CLUEWSC
- WSC
- WinoGrande
-
-</details>
-
-<details open>
-<summary><b>翻译</b></summary>
-
- Flores
-
-</details>
-      </td>
-      <td>
-<details open>
-<summary><b>知识问答</b></summary>
-
- BoolQ
- CommonSenseQA
- NaturalQuestion
- TrivialQA
-
-</details>
-
-<details open>
-<summary><b>多语种问答</b></summary>
-
- TyDi-QA
-
-</details>
-      </td>
-      <td>
-<details open>
-<summary><b>文本蕴含</b></summary>
-
- CMNLI
- OCNLI
- OCNLI_FC
- AX-b
- AX-g
- CB
- RTE
-
-</details>
-
-<details open>
-<summary><b>常识推理</b></summary>
-
- StoryCloze
- StoryCloze-CN（即将上线）
- COPA
- ReCoRD
- HellaSwag
- PIQA
- SIQA
-
-</details>
-
-<details open>
-<summary><b>数学推理</b></summary>
-
- MATH
- GSM8K
-
-</details>
-
-<details open>
-<summary><b>定理应用</b></summary>
-
- TheoremQA
-
-</details>
-
-<details open>
-<summary><b>代码</b></summary>
-
- HumanEval
- MBPP
-
-</details>
-
-<details open>
-<summary><b>综合推理</b></summary>
-
- BBH
-
-</details>
-      </td>
-      <td>
-<details open>
-<summary><b>初中/高中/大学/职业考试</b></summary>
-
- GAOKAO-2023
- CEval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC
-
-</details>
-      </td>
-      <td>
-<details open>
-<summary><b>阅读理解</b></summary>
-
- C3
- CMRC
- DRCD
- MultiRC
- RACE
-
-</details>
-
-<details open>
-<summary><b>内容总结</b></summary>
-
- CSL
- LCSTS
- XSum
-
-</details>
-
-<details open>
-<summary><b>内容分析</b></summary>
-
- EPRSTMT
- LAMBADA
- TNEWS
-
-</details>
-      </td>
-    </tr>
-</td>
-    </tr>
-  </tbody>
-</table>
-
-<p align="right"><a href="#top">🔝返回顶部</a></p>
-
-## 📖 模型支持
-
-<table align="center">
-  <tbody>
-    <tr align="center" valign="bottom">
-      <td>
-        <b>开源模型</b>
-      </td>
-      <td>
-        <b>API 模型</b>
-      </td>
-      <!-- <td>
-        <b>自定义模型</b>
-      </td> -->
-    </tr>
-    <tr valign="top">
-      <td>
-
- LLaMA
- Vicuna
- Alpaca
- Baichuan
- WizardLM
- ChatGLM-6B
- ChatGLM2-6B
- MPT
- Falcon
- TigerBot
- MOSS
- ……
-
-</td>
-<td>
-
- OpenAI
- Claude (即将推出)
- PaLM (即将推出)
- ……
-
-</td>
-<!-- <td>
-
- GLM
- ……
-
-</td> -->
-</tr>
-  </tbody>
-</table>
-
 ## 🛠️ 安装

 下面展示了快速安装以及准备数据集的步骤。
@@ -362,6 +124,316 @@ python run.py --datasets ceval_ppl mmlu_ppl \

 更多教程请查看我们的[文档](https://opencompass.readthedocs.io/zh_CN/latest/index.html)。

+<p align="right"><a href="#top">🔝返回顶部</a></p>
+
+## 📖 数据集支持
+
+<table align="center">
+  <tbody>
+    <tr align="center" valign="bottom">
+      <td>
+        <b>语言</b>
+      </td>
+      <td>
+        <b>知识</b>
+      </td>
+      <td>
+        <b>推理</b>
+      </td>
+      <td>
+        <b>考试</b>
+      </td>
+    </tr>
+    <tr valign="top">
+      <td>
+<details open>
+<summary><b>字词释义</b></summary>
+
+- WiC
+- SummEdits
+
+</details>
+
+<details open>
+<summary><b>成语习语</b></summary>
+
+- CHID
+
+</details>
+
+<details open>
+<summary><b>语义相似度</b></summary>
+
+- AFQMC
+- BUSTM
+
+</details>
+
+<details open>
+<summary><b>指代消解</b></summary>
+
+- CLUEWSC
+- WSC
+- WinoGrande
+
+</details>
+
+<details open>
+<summary><b>翻译</b></summary>
+
+- Flores
+- IWSLT2017
+
+</details>
+
+<details open>
+<summary><b>多语种问答</b></summary>
+
+- TyDi-QA
+- XCOPA
+
+</details>
+
+<details open>
+<summary><b>多语种总结</b></summary>
+
+- XLSum
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>知识问答</b></summary>
+
+- BoolQ
+- CommonSenseQA
+- NaturalQuestions
+- TriviaQA
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>文本蕴含</b></summary>
+
+- CMNLI
+- OCNLI
+- OCNLI_FC
+- AX-b
+- AX-g
+- CB
+- RTE
+- ANLI
+
+</details>
+
+<details open>
+<summary><b>常识推理</b></summary>
+
+- StoryCloze
+- COPA
+- ReCoRD
+- HellaSwag
+- PIQA
+- SIQA
+
+</details>
+
+<details open>
+<summary><b>数学推理</b></summary>
+
+- MATH
+- GSM8K
+
+</details>
+
+<details open>
+<summary><b>定理应用</b></summary>
+
+- TheoremQA
+- StrategyQA
+- SciBench
+
+</details>
+
+<details open>
+<summary><b>综合推理</b></summary>
+
+- BBH
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>初中/高中/大学/职业考试</b></summary>
+
+- C-Eval
+- AGIEval
+- MMLU
+- GAOKAO-Bench
+- CMMLU
+- ARC
+- Xiezhi
+
+</details>
+
+<details open>
+<summary><b>医学考试</b></summary>
+
+- CMB
+
+</details>
+      </td>
+    </tr>
+</td>
+    </tr>
+  </tbody>
+  <tbody>
+    <tr align="center" valign="bottom">
+      <td>
+        <b>理解</b>
+      </td>
+      <td>
+        <b>长文本</b>
+      </td>
+      <td>
+        <b>安全</b>
+      </td>
+      <td>
+        <b>代码</b>
+      </td>
+    </tr>
+    <tr valign="top">
+      <td>
+<details open>
+<summary><b>阅读理解</b></summary>
+
+- C3
+- CMRC
+- DRCD
+- MultiRC
+- RACE
+- DROP
+- OpenBookQA
+- SQuAD2.0
+
+</details>
+
+<details open>
+<summary><b>内容总结</b></summary>
+
+- CSL
+- LCSTS
+- XSum
+- SummScreen
+
+</details>
+
+<details open>
+<summary><b>内容分析</b></summary>
+
+- EPRSTMT
+- LAMBADA
+- TNEWS
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>长文本理解</b></summary>
+
+- LEval
+- LongBench
+- GovReports
+- NarrativeQA
+- Qasper
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>安全</b></summary>
+
+- CivilComments
+- CrowsPairs
+- CValues
+- JigsawMultilingual
+- TruthfulQA
+
+</details>
+<details open>
+<summary><b>健壮性</b></summary>
+
+- AdvGLUE
+
+</details>
+      </td>
+      <td>
+<details open>
+<summary><b>代码</b></summary>
+
+- HumanEval
+- HumanEvalX
+- MBPP
+- APPs
+- DS1000
+
+</details>
+      </td>
+    </tr>
+</td>
+    </tr>
+  </tbody>
+</table>
+
+<p align="right"><a href="#top">🔝返回顶部</a></p>
+
+## 📖 模型支持
+
+<table align="center">
+  <tbody>
+    <tr align="center" valign="bottom">
+      <td>
+        <b>开源模型</b>
+      </td>
+      <td>
+        <b>API 模型</b>
+      </td>
+      <!-- <td>
+        <b>自定义模型</b>
+      </td> -->
+    </tr>
+    <tr valign="top">
+      <td>
+
+- InternLM
+- LLaMA
+- Vicuna
+- Alpaca
+- Baichuan
+- WizardLM
+- ChatGLM2
+- Falcon
+- TigerBot
+- Qwen
+- ……
+
+</td>
+<td>
+
+- OpenAI
+- Claude
+- PaLM (即将推出)
+- ……
+
+</td>
+
+</tr>
+  </tbody>
+</table>
+
+<p align="right"><a href="#top">🔝返回顶部</a></p>
+
 ## 🔜 路线图

 - [ ] 主观评测