
<div align="center">
  <img src="docs/zh_cn/_static/image/logo.svg" width="500px"/>
  <br />
  <br />

[![][github-release-shield]][github-release-link]
[![][github-releasedate-shield]][github-releasedate-link]
[![][github-contributors-shield]][github-contributors-link]<br>
[![][github-forks-shield]][github-forks-link]
[![][github-stars-shield]][github-stars-link]
[![][github-issues-shield]][github-issues-link]
[![][github-license-shield]][github-license-link]

<!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->

[🌐Website](https://opencompass.org.cn/) |
[📖Dataset Community](https://hub.opencompass.org.cn/home) |
[📊Leaderboard](https://rank.opencompass.org.cn/home) |
[📘Documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html) |
[🛠️Installation](https://opencompass.readthedocs.io/zh_CN/latest/get_started/installation.html) |
[🤔Report an Issue](https://github.com/open-compass/opencompass/issues/new/choose)

[English](/README.md) | 简体中文

[![][github-trending-shield]][github-trending-url]

</div>

<p align="center">
    👋 Join us on <a href="https://discord.gg/KKwfEbFj7U" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=opencompass" target="_blank">WeChat</a>
</p>

> \[!IMPORTANT\]
>
> **Star this project** to get the latest OpenCompass updates as soon as they are released ⭐️

## 📣 OpenCompass 2.0

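We are pleased to release OpenCompass (司南) 2.0, a large-model evaluation system built from three core modules: [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home).

**CompassRank** has been significantly upgraded into a leaderboard system that covers both open-source and proprietary benchmarks. This upgrade makes it possible to evaluate models across the industry more comprehensively and in greater depth.

**CompassHub** is a benchmark navigation platform designed to make it simpler and faster for researchers and practitioners to search for and use the wide variety of available benchmarks. To help distinctive benchmarks reach a wider audience, we welcome contributions of your own benchmark data to CompassHub. You can start the submission process [here](https://hub.opencompass.org.cn/dataset-submit).

**CompassKit** is a collection of evaluation toolkits built for large language models and large vision-language models. It provides a comprehensive set of tools for measuring and evaluating the capabilities of these models. We invite you to try the toolkits in your academic research or product development.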

<details>
  <summary><kbd>Star History</kbd></summary>
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&theme=dark&type=Date">
    <img width="100%" src="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&type=Date">
  </picture>
</details>

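## 🧭 Welcome

to **OpenCompass**!

Just as a compass guides us on a journey, we hope OpenCompass can guide you through the fog of evaluating large language models. OpenCompass provides a rich set of algorithms and features, and we hope it helps the community evaluate NLP models more conveniently, fairly, and comprehensively.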

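🚩🚩🚩 Join OpenCompass! We are currently **hiring full-time researchers/engineers and interns**. If you are passionate about LLMs and OpenCompass, feel free to contact us by [email](mailto:zhangsongyang@pjlab.org.cn). We look forward to talking with you!

🔥🔥🔥 We are delighted that **OpenCompass has been officially recommended by Meta AI as a standard evaluation tool for large models**. See Llama's [Getting Started guide](https://ai.meta.com/llama/get-started/#validation) for more information.

> **Attention**<br />
> We have launched the OpenCompass collaboration program and invite the community to contribute more representative and trustworthy objective evaluation datasets to OpenCompass!
> See this [Issue](https://github.com/open-compass/opencompass/issues/248) for more datasets.
> Let's work together to build a powerful and easy-to-use large-model evaluation platform!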

## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2024.07.23\]** We now support the [Gemma2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models. Welcome to try them! 🔥🔥🔥
- **\[2024.07.23\]** We now support [ModelScope](https://www.modelscope.cn) datasets, which can be loaded on demand without first downloading all the data locally. Welcome to try it! 🔥🔥🔥
- **\[2024.07.17\]** We released the sample data and evaluation rules for the CompassBench-202408 leaderboard. See [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) for more information. 🔥🔥🔥
- **\[2024.07.17\]** We officially released the NeedleBench [technical report](http://arxiv.org/abs/2407.11963). See our [documentation](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/needleinahaystack_eval.html) to run the evaluation. 🔥🔥🔥
- **\[2024.07.04\]** OpenCompass now supports InternLM2.5, which offers strong reasoning performance, effective support for million-character-scale long contexts, and an overall upgrade in tool-calling capability. See the [OpenCompass Config](https://github.com/open-compass/opencompass/tree/main/configs/models/hf_internlm) and [InternLM](https://github.com/InternLM/InternLM). 🔥🔥🔥
- **\[2024.06.20\]** OpenCompass now supports one-click switching between inference acceleration backends, making evaluation more efficient. In addition to the default HuggingFace inference backend, it supports the widely used [LMDeploy](https://github.com/InternLM/lmdeploy) and [vLLM](https://github.com/vllm-project/vllm). Switching is available in two ways: through a command-line option or by deploying an accelerated API service. See the [documentation](docs/zh_cn/advanced_guides/accelerator_intro.md) for detailed usage.
  Welcome to try it! 🔥🔥🔥
- **\[2024.05.08\]** We added evaluation config files for the following four MoE models: [Mixtral-8x22B-v0.1](configs/models/mixtral/hf_mixtral_8x22b_v0_1.py), [Mixtral-8x22B-Instruct-v0.1](configs/models/mixtral/hf_mixtral_8x22b_instruct_v0_1.py), [Qwen1.5-MoE-A2.7B](configs/models/qwen/hf_qwen1_5_moe_a2_7b.py), [Qwen1.5-MoE-A2.7B-Chat](configs/models/qwen/hf_qwen1_5_moe_a2_7b_chat.py). Welcome to try them!
- **\[2024.04.30\]** We added support for evaluating a model's compression rate (Bits per Character) on the given [datasets](configs/datasets/llm_compression/README.md) ([official reference](https://github.com/hkust-nlp/llm-compression-intelligence)). Welcome to try the [llm-compression](configs/eval_llm_compression.py) evaluation suite! 🔥🔥🔥
- **\[2024.04.26\]** We reported the performance of typical LLMs on commonly used benchmarks. See the [documentation](https://opencompass.readthedocs.io/zh-cn/latest/user_guides/corebench.html) for more information! 🔥🔥🔥
- **\[2024.04.26\]** We deprecated multi-modal large-model evaluation in OpenCompass. The related functionality has moved to [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), which we recommend using instead! 🔥🔥🔥
- **\[2024.04.26\]** We now support the [ArenaHard evaluation](configs/eval_subjective_arena_hard.py). Welcome to try it! 🔥🔥🔥
- **\[2024.04.22\]** We now support evaluating [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) and [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py). Welcome to try them! 🔥🔥🔥
- **\[2024.02.29\]** We now support MT-Bench, AlpacaEval, and AlignBench. More information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html).
- **\[2024.01.30\]** We released OpenCompass 2.0. For more information, visit [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home).

> [More](docs/zh_cn/notes/news.md)

## ✨ Introduction

![image](https://github.com/open-compass/opencompass/assets/22607038/30bcb2e2-3969-4ac5-9f29-ad3f4abb4f3b)

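OpenCompass is a one-stop platform for large-model evaluation. Its main features are:

- **Open-source and reproducible**: a fair, open, and reproducible evaluation scheme for large models

- **Comprehensive capability dimensions**: five capability dimensions, 70+ datasets, and about 400,000 questions for evaluating model capabilities comprehensively

- **Extensive model support**: 20+ HuggingFace and API models are already supported

- **Efficient distributed evaluation**: a single command handles task partitioning and distributed evaluation, so a full evaluation of a model with hundreds of billions of parameters completes in a few hours

- **Diverse evaluation paradigms**: zero-shot, few-shot, and chain-of-thought evaluation are supported and can be combined with standard or dialogue-style prompt templates to elicit the best performance from a variety of models

- **Flexible extension**: Want to add new models or datasets? Want to customize a more advanced task partitioning strategy, or even plug in a new cluster management system? Everything in OpenCompass can be extended easily!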

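## 📊 Leaderboard

We will progressively provide detailed leaderboards for open-source models and API models; see the [OpenCompass Leaderboard](https://rank.opencompass.org.cn/home). To have a model evaluated, send its repository URL or a standard API interface to `opencompass@pjlab.org.cn`.

<p align="right"><a href="#top">🔝Back to top</a></p>

## 🛠️ Installation

The steps below show how to quickly install OpenCompass and prepare the datasets.

### 💻 Environment Setup

#### GPU environment for open-source models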

```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
```

#### CPU environment for API model testing

```bash
conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# To use the API models, run `pip install -r requirements/api.txt` to install their dependencies
```

### 📂 Data Preparation

#### Offline download in advance

OpenCompass supports evaluation with local datasets. Download and extract them with the following commands:

```bash
# Download the datasets into data/
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
```
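
For readers who prefer to script the extraction step in Python rather than call `unzip`, the sketch below shows one way to do it with the standard library. It assumes the archive has already been downloaded; `extract_dataset` is a hypothetical helper written for this example and is not part of OpenCompass.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: Path, dest: Path = Path(".")) -> list[str]:
    """Extract a dataset archive into `dest` and return its top-level entries.

    Hypothetical helper for illustration only; OpenCompass does not ship it.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        # Member names inside a zip always use "/" as the separator, so the
        # first path component tells us which top-level folders were created.
        top_level = sorted({name.split("/", 1)[0] for name in zf.namelist() if name})
    return top_level
```

Calling `extract_dataset(Path("OpenCompassData-core-20240207.zip"))` from the repository root would mirror the `unzip` command above and return the names of the top-level entries it wrote, which is a quick way to confirm where the data landed.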

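#### Automatic download from OpenCompass

Datasets can now be downloaded automatically from the OpenCompass storage server. Run an evaluation with the additional `--dry-run` argument to download them.
The list of currently supported datasets is [here](https://github.com/open-compass/opencompass/blob/main/opencompass/utils/datasets_info.py#L259). More datasets will be uploaded soon.

#### (Optional) Automatic download with ModelScope

You can also load datasets through [ModelScope](https://www.modelscope.cn).
Environment setup: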

```bash
pip install modelscope
export DATASET_SOURCE=ModelScope
```

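Once the environment is configured, you can submit evaluation tasks directly without downloading all the data. The currently supported datasets are: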

```bash
humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ceval, math, LCSTS, Xsum, winogrande, openbookqa, AGIEval, gsm8k, nq, race, siqa, mbpp, mmlu, hellaswag, ARC, BBH, xstory_cloze, summedits, GAOKAO-BENCH, OCNLI, cmnli
```
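
As a rough illustration of the setup above, the following sketch prepares an environment with `DATASET_SOURCE=ModelScope` for a child process and checks dataset names against the list above. The helpers `modelscope_env` and `unsupported_datasets`, as well as the case-insensitive matching, are assumptions made for this example and are not OpenCompass APIs.

```python
import os

# Dataset names copied from the list above.
MODELSCOPE_DATASETS = [
    "humaneval", "triviaqa", "commonsenseqa", "tydiqa", "strategyqa", "cmmlu",
    "lambada", "piqa", "ceval", "math", "LCSTS", "Xsum", "winogrande",
    "openbookqa", "AGIEval", "gsm8k", "nq", "race", "siqa", "mbpp", "mmlu",
    "hellaswag", "ARC", "BBH", "xstory_cloze", "summedits", "GAOKAO-BENCH",
    "OCNLI", "cmnli",
]


def modelscope_env(base=None):
    """Return a copy of the environment with DATASET_SOURCE set to ModelScope."""
    env = dict(os.environ if base is None else base)
    env["DATASET_SOURCE"] = "ModelScope"
    return env


def unsupported_datasets(names):
    """Return the names that are not in the list above (case-insensitive; an assumption)."""
    supported = {name.lower() for name in MODELSCOPE_DATASETS}
    return [name for name in names if name.lower() not in supported]
```

The dictionary returned by `modelscope_env()` can be passed as the `env` argument of `subprocess.run` when launching an evaluation, which has the same effect as the `export` line above without changing the parent shell.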

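Some third-party features, such as HumanEval and Llama, may require additional steps to work properly. See the [Installation Guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started/installation.html) for details.

<p align="right"><a href="#top">🔝Back to top</a></p>

## 🏗️ Evaluation

After installing OpenCompass and preparing the datasets as described above, you can evaluate the LLaMA-7b model on the MMLU and C-Eval datasets with the following command: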

```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```

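To accelerate evaluation with an inference backend other than HuggingFace, such as LMDeploy or vLLM, use the command below. Before doing so, make sure the package for the chosen backend is installed and that your model supports accelerated inference with it. See the inference acceleration backend [documentation](docs/zh_cn/advanced_guides/accelerator_intro.md) for details. The example below uses LMDeploy: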

```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl -a lmdeploy
```
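
When the same evaluation has to be launched repeatedly, it can help to build the command programmatically. The sketch below assembles the `run.py` invocation shown above from Python. `build_eval_command` is a hypothetical helper, and it uses only the `--models`, `--datasets`, and `-a` options that appear in this README.

```python
import shlex


def build_eval_command(models, datasets, accelerator=None):
    """Return the argv list for a `python run.py` evaluation (hypothetical helper)."""
    cmd = ["python", "run.py", "--models", *models, "--datasets", *datasets]
    if accelerator is not None:
        # `-a lmdeploy` is the form shown in this README; check other backend
        # names against the accelerator documentation before using them.
        cmd += ["-a", accelerator]
    return cmd


cmd = build_eval_command(["hf_llama_7b"], ["mmlu_ppl", "ceval_ppl"], accelerator="lmdeploy")
# Prints: python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl -a lmdeploy
print(shlex.join(cmd))
```

To actually start the evaluation, pass `cmd` to `subprocess.run(cmd, check=True)` from the repository root. Keeping the command as a list avoids shell quoting problems if a model or dataset name ever contains unusual characters.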

OpenCompass predefines configurations for many models and datasets. You can list all available model and dataset configurations with the [tools](./docs/zh_cn/tools.md#ListConfigs).

```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```

You can also evaluate other HuggingFace models from the command line. Again taking LLaMA-7b as an example:

```bash
python run.py --datasets ceval_ppl mmlu_ppl --hf-type base --hf-path huggyllama/llama-7b
```

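Through the command line or configuration files, OpenCompass also supports evaluating API models or custom models, as well as more diverse evaluation strategies. Read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started/quick_start.html) to learn how to run an evaluation task.

For more tutorials, see our [documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html).

<p align="right"><a href="#top">🔝Back to top</a></p>

## 📖 Dataset Support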

<table align="center">
  <tbody>
    <tr align="center" valign="bottom">
      <td>
        <b>Language</b>
      </td>
      <td>
        <b>Knowledge</b>
      </td>
      <td>
        <b>Reasoning</b>
      </td>
      <td>
        <b>Examination</b>
      </td>
    </tr>
    <tr valign="top">
      <td>
<details open>
<summary><b>Word Definition</b></summary>

- WiC
- SummEdits

</details>

<details open>
<summary><b>Idioms</b></summary>

- CHID

</details>

<details open>
<summary><b>Semantic Similarity</b></summary>

- AFQMC
- BUSTM

</details>

<details open>
<summary><b>Coreference Resolution</b></summary>

- CLUEWSC
- WSC
- WinoGrande

</details>

<details open>
<summary><b>Translation</b></summary>

- Flores
- IWSLT2017

</details>

<details open>
<summary><b>Multilingual Question Answering</b></summary>

- TyDi-QA
- XCOPA

</details>

<details open>
<summary><b>Multilingual Summarization</b></summary>

- XLSum

</details>
      </td>
      <td>
<details open>
<summary><b>Knowledge Question Answering</b></summary>

- BoolQ
- CommonSenseQA
- NaturalQuestions
- TriviaQA

</details>
      </td>
      <td>
<details open>
<summary><b>Textual Entailment</b></summary>

- CMNLI
- OCNLI
- OCNLI_FC
- AX-b
- AX-g
- CB
- RTE
- ANLI

</details>

<details open>
<summary><b>Commonsense Reasoning</b></summary>

- StoryCloze
- COPA
- ReCoRD
- HellaSwag
- PIQA
- SIQA

</details>

<details open>
<summary><b>Mathematical Reasoning</b></summary>

- MATH
- GSM8K

</details>

<details open>
<summary><b>Theorem Application</b></summary>

- TheoremQA
- StrategyQA
- SciBench

</details>

<details open>
<summary><b>Comprehensive Reasoning</b></summary>

- BBH

</details>
      </td>
      <td>
<details open>
<summary><b>Junior High, High School, University, and Professional Examinations</b></summary>

- C-Eval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC
- Xiezhi

</details>

<details open>
<summary><b>Medical Examinations</b></summary>

- CMB

</details>
      </td>
    </tr>
  </tbody>
  <tbody>
    <tr align="center" valign="bottom">
      <td>
        <b>Understanding</b>
      </td>
      <td>
        <b>Long Context</b>
      </td>
      <td>
        <b>Safety</b>
      </td>
      <td>
        <b>Code</b>
      </td>
    </tr>
    <tr valign="top">
      <td>
<details open>
<summary><b>Reading Comprehension</b></summary>

- C3
- CMRC
- DRCD
- MultiRC
- RACE
- DROP
- OpenBookQA
- SQuAD2.0

</details>

<details open>
<summary><b>Content Summarization</b></summary>

- CSL
- LCSTS
- XSum
- SummScreen

</details>

<details open>
<summary><b>Content Analysis</b></summary>

- EPRSTMT
- LAMBADA
- TNEWS

</details>
      </td>
      <td>
<details open>
<summary><b>Long Context Understanding</b></summary>

- LEval
- LongBench
- GovReports
- NarrativeQA
- Qasper

</details>
      </td>
      <td>
<details open>
<summary><b>Safety</b></summary>

- CivilComments
- CrowsPairs
- CValues
- JigsawMultilingual
- TruthfulQA

</details>
<details open>
<summary><b>Robustness</b></summary>

- AdvGLUE

</details>
      </td>
      <td>
<details open>
<summary><b>Code</b></summary>

- HumanEval
- HumanEvalX
- MBPP
- APPs
- DS1000

</details>
      </td>
    </tr>
  </tbody>
</table>

<p align="right"><a href="#top">🔝Back to top</a></p>

## 📖 Model Support

<table align="center">
  <tbody>
    <tr align="center" valign="bottom">
      <td>
        <b>Open-source Models</b>
      </td>
      <td>
        <b>API Models</b>
      </td>
      <!-- <td>
        <b>Custom Models</b>
      </td> -->
    </tr>
    <tr valign="top">
      <td>

- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [Baichuan](https://github.com/baichuan-inc)
- [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)
- [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B)
- [Gemma](https://huggingface.co/google/gemma-7b)
- [InternLM](https://github.com/InternLM/InternLM)
- [LLaMA](https://github.com/facebookresearch/llama)
- [LLaMA3](https://github.com/meta-llama/llama3)
- [Qwen](https://github.com/QwenLM/Qwen)
- [TigerBot](https://github.com/TigerResearch/TigerBot)
- [Vicuna](https://github.com/lm-sys/FastChat)
- [WizardLM](https://github.com/nlpxucan/WizardLM)
- [Yi](https://github.com/01-ai/Yi)
- ……

</td>
<td>

- OpenAI
- Gemini
- Claude
- ZhipuAI(ChatGLM)
- Baichuan
- ByteDance(YunQue)
- Huawei(PanGu)
- 360
- Baidu(ERNIEBot)
- MiniMax(ABAB-Chat)
- SenseTime(nova)
- Xunfei(Spark)
- ……

</td>

</tr>
  </tbody>
</table>

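<p align="right"><a href="#top">🔝Back to top</a></p>

## 🔜 Roadmap

- [x] Subjective evaluation
  - [x] Release a subjective evaluation leaderboard
  - [ ] Release subjective evaluation datasets
- [x] Long context
  - [x] Support a wide range of long-context evaluation sets
  - [ ] Release a long-context leaderboard
- [x] Coding ability
  - [ ] Release a coding evaluation leaderboard
  - [x] Provide evaluation services for non-Python languages
- [x] Agents
  - [ ] Support a rich set of agent frameworks
  - [x] Provide an agent evaluation leaderboard
- [x] Robustness
  - [x] Support various attack methods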

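## 👷‍♂️ Contributing

We thank all contributors for their efforts to improve and enhance OpenCompass. See the [contributing guidelines](https://opencompass.readthedocs.io/zh_CN/latest/notes/contribution_guide.html) for guidance on contributing to the project.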

<a href="https://github.com/open-compass/opencompass/graphs/contributors" target="_blank">
  <table>
    <tr>
      <th colspan="2">
        <br><img src="https://contrib.rocks/image?repo=open-compass/opencompass"><br><br>
      </th>
    </tr>
  </table>
</a>

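## 🤝 Acknowledgements

Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL).

Some datasets and prompt implementations in this project are modified from [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub) and [instruct-eval](https://github.com/declare-lab/instruct-eval).

## 🖊️ Citation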

```bibtex
@misc{2023opencompass,
    title={OpenCompass: A Universal Evaluation Platform for Foundation Models},
    author={OpenCompass Contributors},
    howpublished = {\url{https://github.com/open-compass/opencompass}},
    year={2023}
}
```

<p align="right"><a href="#top">🔝Back to top</a></p>

[github-contributors-link]: https://github.com/open-compass/opencompass/graphs/contributors
[github-contributors-shield]: https://img.shields.io/github/contributors/open-compass/opencompass?color=c4f042&labelColor=black&style=flat-square
[github-forks-link]: https://github.com/open-compass/opencompass/network/members
[github-forks-shield]: https://img.shields.io/github/forks/open-compass/opencompass?color=8ae8ff&labelColor=black&style=flat-square
[github-issues-link]: https://github.com/open-compass/opencompass/issues
[github-issues-shield]: https://img.shields.io/github/issues/open-compass/opencompass?color=ff80eb&labelColor=black&style=flat-square
[github-license-link]: https://github.com/open-compass/opencompass/blob/main/LICENSE
[github-license-shield]: https://img.shields.io/github/license/open-compass/opencompass?color=white&labelColor=black&style=flat-square
[github-release-link]: https://github.com/open-compass/opencompass/releases
[github-release-shield]: https://img.shields.io/github/v/release/open-compass/opencompass?color=369eff&labelColor=black&logo=github&style=flat-square
[github-releasedate-link]: https://github.com/open-compass/opencompass/releases
[github-releasedate-shield]: https://img.shields.io/github/release-date/open-compass/opencompass?labelColor=black&style=flat-square
[github-stars-link]: https://github.com/open-compass/opencompass/stargazers
[github-stars-shield]: https://img.shields.io/github/stars/open-compass/opencompass?color=ffcb47&labelColor=black&style=flat-square
[github-trending-shield]: https://trendshift.io/api/badge/repositories/6630
[github-trending-url]: https://trendshift.io/repositories/6630
-												initial commit

											
										
										
											2023-07-04 21:34:55 +08:00
+								<div align="center">
-												Update readme (#6)

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* Update README.md

Add description for name

* Update README_zh-CN.md

Update introduction

* Update README_zh-CN.md

* Update README_zh-CN.md

* update readme

* Update README.md

Add Leaderboard

* Update README.md

* Update README_zh-CN.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
											
										
										
											2023-07-06 12:14:23 +08:00
+								  <img src="docs/zh_cn/_static/image/logo.svg" width="500px"/>
 								  <br />
 								  <br />
-												initial commit

											
										
										
											2023-07-04 21:34:55 +08:00
-												[Doc] Update README (#1053)

* [Update] Update readme

* [Update] Update readme

* [Update] Update readme
											
										
										
											2024-04-16 19:54:12 +08:00
+								[![][github-release-shield]][github-release-link]
 								[![][github-releasedate-shield]][github-releasedate-link]
 								[![][github-contributors-shield]][github-contributors-link]<br>
 								[![][github-forks-shield]][github-forks-link]
 								[![][github-stars-shield]][github-stars-link]
 								[![][github-issues-shield]][github-issues-link]
 								[![][github-license-shield]][github-license-link]
-												[Docs] Update readme (#34)


											
										
										
											2023-07-08 10:42:30 +08:00
-												[Fix] fix readme (#31)


											
										
										
											2023-07-07 17:08:33 +08:00
+								<!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->
-												initial commit

											
										
										
											2023-07-04 21:34:55 +08:00
-												[Docs] Update README (#956)

* [Docs] Update README

* Update README.md

* [Docs] Update README
											
										
										
											2024-03-12 11:40:34 +08:00
+								[🌐官方网站](https://opencompass.org.cn/) |
 								[📖数据集社区](https://hub.opencompass.org.cn/home) |
 								[📊性能榜单](https://rank.opencompass.org.cn/home) |
 								[📘文档教程](https://opencompass.readthedocs.io/zh_CN/latest/index.html) |
 								[🛠️安装](https://opencompass.readthedocs.io/zh_CN/latest/get_started/installation.html) |
 								[🤔报告问题](https://github.com/open-compass/opencompass/issues/new/choose)
-												initial commit

											
										
										
											2023-07-04 21:34:55 +08:00
 								[English](/README.md) | 简体中文
-												[Doc] Update README (#1053)

* [Update] Update readme

* [Update] Update readme

* [Update] Update readme
											
										
										
											2024-04-16 19:54:12 +08:00
+								[![][github-trending-shield]][github-trending-url]
-												initial commit

											
										
										
											2023-07-04 21:34:55 +08:00
+								</div>
-												[Doc] add discord and wechat link (#64)


											
										
										
											2023-07-14 15:33:43 +08:00
+								<p align="center">
-												[Docs] Update wechat and discord (#328)


											
										
										
											2023-08-29 23:30:39 +08:00
+								    👋 加入我们的 <a href="https://discord.gg/KKwfEbFj7U" target="_blank">Discord</a> 和 <a href="https://r.vansin.top/?r=opencompass" target="_blank">微信社区</a>
-												[Doc] add discord and wechat link (#64)


											
										
										
											2023-07-14 15:33:43 +08:00
+								</p>
-												[Doc] Update README (#1053)

* [Update] Update readme

* [Update] Update readme

* [Update] Update readme
											
										
										
											2024-04-16 19:54:12 +08:00
+								> \[!IMPORTANT\]
 								>
 								> **收藏项目**，你将能第一时间获取 OpenCompass 的最新动态～⭐️
-												[Docs] Update README (#956)

* [Docs] Update README

* Update README.md

* [Docs] Update README
											
										
										
											2024-03-12 11:40:34 +08:00
+								## 📣 OpenCompass 2.0
-												[Doc] Update README (#629)

* update readme

* fix typo

* Update README.md

---------

Co-authored-by: liushz <qq1791167085@163.com>
											
										
										
											2023-11-24 11:24:00 +08:00
-												[Docs] Update README (#956)

* [Docs] Update README

* Update README.md

* [Docs] Update README
											
										
										
											2024-03-12 11:40:34 +08:00
+								我们很高兴发布 OpenCompass 司南 2.0 大模型评测体系，它主要由三大核心模块构建而成：[CompassKit](https://github.com/open-compass)、[CompassHub](https://hub.opencompass.org.cn/home)以及[CompassRank](https://rank.opencompass.org.cn/home)。
-												[Doc] Update README (#629)

* update readme

* fix typo

* Update README.md

---------

Co-authored-by: liushz <qq1791167085@163.com>
											
										
										
											2023-11-24 11:24:00 +08:00
-												[Docs] Update README (#956)

* [Docs] Update README

* Update README.md

* [Docs] Update README
											
										
										
											2024-03-12 11:40:34 +08:00
+								**CompassRank** 系统进行了重大革新与提升，现已成为一个兼容并蓄的排行榜体系，不仅囊括了开源基准测试项目，还包含了私有基准测试。此番升级极大地拓宽了对行业内各类模型进行全面而深入测评的可能性。
-												[Doc] Update README (#629)

* update readme

* fix typo

* Update README.md

---------

Co-authored-by: liushz <qq1791167085@163.com>
											
										
										
											2023-11-24 11:24:00 +08:00
-												[Docs] Update README (#956)

* [Docs] Update README

* Update README.md

* [Docs] Update README
											
										
										
											2024-03-12 11:40:34 +08:00
+								**CompassHub** 创新性地推出了一个基准测试资源导航平台，其设计初衷旨在简化和加快研究人员及行业从业者在多样化的基准测试库中进行搜索与利用的过程。为了让更多独具特色的基准测试成果得以在业内广泛传播和应用，我们热忱欢迎各位将自定义的基准数据贡献至CompassHub平台。只需轻点鼠标，通过访问[这里](https://hub.opencompass.org.cn/dataset-submit)，即可启动提交流程。
-												[Doc] Update README (#629)

* update readme

* fix typo

* Update README.md

---------

Co-authored-by: liushz <qq1791167085@163.com>
											
										
										
											2023-11-24 11:24:00 +08:00
-												[Docs] Update README (#956)

* [Docs] Update README

* Update README.md

* [Docs] Update README
											
										
										
											2024-03-12 11:40:34 +08:00
+								**CompassKit** 是一系列专为大型语言模型和大型视觉-语言模型打造的强大评估工具合集，它所提供的全面评测工具集能够有效地对这些复杂模型的功能性能进行精准测量和科学评估。在此，我们诚挚邀请您在学术研究或产品研发过程中积极尝试运用我们的工具包，以助您取得更加丰硕的研究成果和产品优化效果。
-												[Doc] Update README (#629)

* update readme

* fix typo

* Update README.md

---------

Co-authored-by: liushz <qq1791167085@163.com>
											
										
										
											2023-11-24 11:24:00 +08:00
-												[Doc] Update README (#1053)

* [Update] Update readme

* [Update] Update readme

* [Update] Update readme
											
										
										
											2024-04-16 19:54:12 +08:00
+								<details>
 								  <summary><kbd>Star History</kbd></summary>
 								  <picture>
 								    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&theme=dark&type=Date">
 								    <img width="100%" src="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&type=Date">
 								  </picture>
 								</details>
-												[Docs] update readme (#165)


											
										
										
											2023-08-08 12:49:04 +08:00
+								## 🧭	欢迎
 								来到**OpenCompass**！
-												Update readme (#6)

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* Update README.md

Add description for name

* Update README_zh-CN.md

Update introduction

* Update README_zh-CN.md

* Update README_zh-CN.md

* update readme

* Update README.md

Add Leaderboard

* Update README.md

* Update README_zh-CN.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
											
										
										
											2023-07-06 12:14:23 +08:00
 								就像指南针在我们的旅程中为我们导航一样，我们希望OpenCompass能够帮助你穿越评估大型语言模型的重重迷雾。OpenCompass提供丰富的算法和功能支持，期待OpenCompass能够帮助社区更便捷地对NLP模型的性能进行公平全面的评估。
-												update readme (#540)


											
										
										
											2023-11-06 16:40:09 +08:00
+								🚩🚩🚩 欢迎加入 OpenCompass！我们目前**招聘全职研究人员/工程师和实习生**。如果您对 LLM 和 OpenCompass 充满热情，请随时通过[电子邮件](mailto:zhangsongyang@pjlab.org.cn)与我们联系。我们非常期待与您交流！
-												[Deperecate] Remove multi-modal related stuff (#1072)

* Remove MultiModal

* update index.rst

* update README

* remove mmbench codes

* update news

---------

Co-authored-by: Leymore <zfz-960727@163.com>
											
										
										
											2024-04-26 21:20:14 +08:00
+								🔥🔥🔥 祝贺 **OpenCompass 作为大模型标准测试工具被Meta AI官方推荐**, 点击 Llama 的 [入门文档](https://ai.meta.com/llama/get-started/#validation) 获取更多信息。
-												[Doc] Update README and FAQ (#535)

* update readme

* update readme and faq
											
										
										
											2023-11-02 15:16:37 +08:00
 								> **注意**<br />
-												Update README.md (#262)

* Update README.md

* update news and readme

* update

* update

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
											
										
										
											2023-08-25 18:53:35 +08:00
+								> 我们正式启动 OpenCompass 共建计划，诚邀社区用户为 OpenCompass 提供更具代表性和可信度的客观评测数据集!
-												[Feat] Update URL (#368)


											
										
										
											2023-09-07 17:29:50 +08:00
+								> 点击 [Issue](https://github.com/open-compass/opencompass/issues/248) 获取更多数据集.
-												Update README.md (#262)

* Update README.md

* update news and readme

* update

* update

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
											
										
										
											2023-08-25 18:53:35 +08:00
+								> 让我们携手共进，打造功能强大易用的大模型评测平台！
-												[Docs] update readme (#165)


											
										
										
											2023-08-08 12:49:04 +08:00
+								## 🚀 最新进展 <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
-												[Feature] Add llama-2 models (#81)

* add llama-2 models

* update docs

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
											
										
										
											2023-07-19 19:51:29 +08:00
-												[Feature] Support OpenAI ChatCompletion (#1389)

* [Feature] Support import configs/models/summarizers from whl

* Update

* Update openai sdk

* Update

* Update gemma
											
										
										
											2024-08-01 19:10:13 +08:00
+								- **\[2024.07.23\]** 我们支持了[Gemma2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)模型，欢迎试用！🔥🔥🔥
-												[Feature] Support ModelScope datasets (#1289)

* add ceval, gsm8k modelscope surpport

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* udpate dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* udpate dataset for modelscope support

* update readme

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* update readme

* remove tydiqa japanese subset

* add ceval, gsm8k modelscope surpport

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* udpate dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update readme

* udpate dataset for modelscope support

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* remove tydiqa japanese subset

* update util

* remove .DS_Store

* fix md format

* move util into package

* update docs/get_started.md

* restore eval_api_zhipu_v2.py, add environment setting

* Update dataset

* Update

* Update

* Update

* Update

---------

Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
											
										
										
											2024-07-29 13:48:32 +08:00
+								- **\[2024.07.23\]** 我们支持了[ModelScope](www.modelscope.cn)数据集，您可以按需加载，无需事先下载全部数据到本地，欢迎试用！🔥🔥🔥
-												[Feature] CompassBench v1_3 subjective evaluation (#1341)

* stash files

* compassbench subjective evaluation added

* evaluation update

* remove unneeded content

* fix lint

* update docs

* Update lint

* Update

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
											
										
										
											2024-07-19 23:12:23 +08:00
+								- **\[2024.07.17\]** 我们发布了CompassBench-202408榜单的示例数据和评测规则，敬请访问 [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) 获取更多信息。 🔥🔥🔥
-												[Doc] Update NeedleBench Docs (#1330)

* update needlebench docs

* update model_name_mapping dict

* update README

* Update README_zh-CN.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
											
										
										
											2024-07-18 13:16:19 +08:00
+								- **\[2024.07.17\]** 我们正式发布 NeedleBench 的[技术报告](http://arxiv.org/abs/2407.11963)。诚邀您访问我们的[帮助文档](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/needleinahaystack_eval.html)进行评估。🔥🔥🔥
-												[Feature] Add InternLM2.5 (#1286)

* [Feature] Add InternLM2.5

* Update

* update readme

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
											
										
										
											2024-07-04 20:10:31 +08:00
+								- **\[2024.07.04\]** OpenCompass 现已支持 InternLM2.5， 它拥有卓越的推理性能、有效支持百万字超长上下文以及工具调用能力整体升级，欢迎访问[OpenCompass Config](https://github.com/open-compass/opencompass/tree/main/configs/models/hf_internlm) 和 [InternLM](https://github.com/InternLM/InternLM) .🔥🔥🔥.
-												Add doc for accelerator function (#1252)

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update acclerator

* Update MathBench

* Update accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
											
										
										
											2024-06-24 14:53:51 +08:00
+								- **\[2024.06.20\]** OpenCompass 现已支持一键切换推理加速后端，助力评测过程更加高效。除了默认的HuggingFace推理后端外，还支持了常用的 [LMDeploy](https://github.com/InternLM/lmdeploy) 和 [vLLM](https://github.com/vllm-project/vllm) ，支持命令行一键切换和部署 API 加速服务两种方式，详细使用方法见[文档](docs/zh_cn/advanced_guides/accelerator_intro.md)。
 								  欢迎试用！🔥🔥🔥.
-												[Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs (#1123)

* added qwen moe and mixtral 8x22 model configs

* updated README files news section
											
										
										
											2024-05-09 11:04:26 +08:00
- **\[2024.05.08\]** We added evaluation configs for four MoE models: [Mixtral-8x22B-v0.1](configs/models/mixtral/hf_mixtral_8x22b_v0_1.py), [Mixtral-8x22B-Instruct-v0.1](configs/models/mixtral/hf_mixtral_8x22b_instruct_v0_1.py), [Qwen1.5-MoE-A2.7B](configs/models/qwen/hf_qwen1_5_moe_a2_7b.py), and [Qwen1.5-MoE-A2.7B-Chat](configs/models/qwen/hf_qwen1_5_moe_a2_7b_chat.py). Give them a try!
- **\[2024.04.30\]** We added an evaluation method that computes a model's compression rate (Bits per Character) on given [datasets](configs/datasets/llm_compression/README.md) ([official reference](https://github.com/hkust-nlp/llm-compression-intelligence)). Try the [llm-compression](configs/eval_llm_compression.py) evaluation set! 🔥🔥🔥
- **\[2024.04.26\]** We reported the performance of typical LLMs on commonly used benchmarks; see the [documentation](https://opencompass.readthedocs.io/zh-cn/latest/user_guides/corebench.html) for more information! 🔥🔥🔥
- **\[2024.04.26\]** We deprecated multimodal model evaluation in OpenCompass. The related functionality has moved to [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), which we recommend using instead! 🔥🔥🔥
- **\[2024.04.26\]** We added support for [ArenaHard evaluation](configs/eval_subjective_arena_hard.py). Give it a try! 🔥🔥🔥
- **\[2024.04.22\]** We added evaluation support for [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) and [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py). Give them a try! 🔥🔥🔥
- **\[2024.02.29\]** We added support for MT-Bench, AlpacaEval, and AlignBench. More information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html).
- **\[2024.01.30\]** We released OpenCompass 2.0. For more information, visit [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home).

> [More](docs/zh_cn/notes/news.md)
## ✨ Introduction

![image](https://github.com/open-compass/opencompass/assets/22607038/30bcb2e2-3969-4ac5-9f29-ad3f4abb4f3b)
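
OpenCompass is a one-stop platform for large model evaluation. Its main features are:

- **Open-source and reproducible**: a fair, open, and reproducible evaluation scheme for large models.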
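- **Comprehensive capability dimensions**: five capability dimensions covering 70+ datasets and about 400,000 questions, for a comprehensive assessment of model abilities.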
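- **Extensive model support**: 20+ HuggingFace and API models are already supported.
- **Efficient distributed evaluation**: a single command handles task partitioning and distributed evaluation, so a full evaluation of a hundred-billion-parameter model can finish within hours.
- **Diverse evaluation paradigms**: zero-shot, few-shot, and chain-of-thought evaluation are supported and can be combined with standard or dialogue-style prompt templates to elicit each model's best performance.
- **Flexible extensibility**: Want to add a new model or dataset, customize a more advanced task partitioning strategy, or even plug in a new cluster management system? Everything in OpenCompass is easy to extend!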

## 📊 Leaderboard
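
We will progressively publish detailed leaderboards for open-source and API models; see the [OpenCompass Leaderboard](https://rank.opencompass.org.cn/home). To have your model evaluated, please send the model repository URL or a standard API interface to `opencompass@pjlab.org.cn`.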

<p align="right"><a href="#top">🔝Back to top</a></p>

## 🛠️ Installation

The steps below show how to install OpenCompass quickly and prepare the datasets.

### 💻 Environment Setup

#### GPU environment for open-source models

```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
```

#### CPU environment for API model testing

```bash
conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# To use the API models, run `pip install -r requirements/api.txt` to install their dependencies
```
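
After either setup, a quick sanity check can confirm that the editable install is visible to the active interpreter. The sketch below is not part of OpenCompass: `is_importable` is a hypothetical helper built only on the Python standard library, and it assumes the package is importable under the name `opencompass`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "opencompass" is the assumed import name after `pip install -e .`.
    status = "found" if is_importable("opencompass") else "not found"
    print(f"opencompass: {status}")
```

Run it inside the activated `opencompass` conda environment; a "not found" result usually means the environment was not activated or `pip install -e .` was run elsewhere.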

### 📂 Data Preparation

#### Offline download in advance

OpenCompass supports evaluation with local datasets. Download and extract them with the following commands:

```bash
# Download the datasets into data/
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
```
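
To confirm that the archive extracted where evaluation expects it, you can list what ended up under `data/`. This is a minimal sketch, assuming the archive unpacks into a `data/` directory in the current working directory (as the comment in the command above suggests); `list_dataset_dirs` is a hypothetical helper, not an OpenCompass function.

```python
from pathlib import Path


def list_dataset_dirs(data_root: str = "data") -> list[str]:
    """Return the sorted names of sub-directories under `data_root`.

    An empty list means the directory is missing or contains no folders.
    """
    root = Path(data_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    names = list_dataset_dirs()
    if names:
        print(f"{len(names)} dataset folders under data/: {', '.join(names)}")
    else:
        print("No dataset folders found; check that the archive was extracted here.")
```

Run it from the repository root, the same directory in which `unzip` was executed.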
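
#### Automatic download from OpenCompass

Datasets can now be downloaded automatically from the OpenCompass storage server. Run an evaluation with the additional `--dry-run` flag to download them.
The currently supported datasets are listed [here](https://github.com/open-compass/opencompass/blob/main/opencompass/utils/datasets_info.py#L259). More datasets will be uploaded soon.

#### (Optional) Automatic download with ModelScope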

You can also load datasets through [ModelScope](https://www.modelscope.cn).

Environment setup:

```bash
pip install modelscope
export DATASET_SOURCE=ModelScope
```

Once the environment is configured, there is no need to download all the data; simply submit the evaluation task. The currently supported datasets are:

```bash
humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ceval, math, LCSTS, Xsum, winogrande, openbookqa, AGIEval, gsm8k, nq, race, siqa, mbpp, mmlu, hellaswag, ARC, BBH, xstory_cloze, summedits, GAOKAO-BENCH, OCNLI, cmnli
```
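
Before relying on ModelScope for a given benchmark, it can help to check the name against the list above. The sketch below is illustrative only: `MODELSCOPE_DATASETS` is copied by hand from this README, and `is_modelscope_supported` and `with_modelscope_source` are hypothetical helpers, not OpenCompass APIs. Note that these are dataset names as listed here, which may differ from config names such as `mmlu_ppl`.

```python
# Dataset names copied from the list above (an assumption that it is current).
MODELSCOPE_DATASETS = [
    "humaneval", "triviaqa", "commonsenseqa", "tydiqa", "strategyqa", "cmmlu",
    "lambada", "piqa", "ceval", "math", "LCSTS", "Xsum", "winogrande",
    "openbookqa", "AGIEval", "gsm8k", "nq", "race", "siqa", "mbpp", "mmlu",
    "hellaswag", "ARC", "BBH", "xstory_cloze", "summedits", "GAOKAO-BENCH",
    "OCNLI", "cmnli",
]

_LOWERCASED = {name.lower() for name in MODELSCOPE_DATASETS}


def is_modelscope_supported(dataset: str) -> bool:
    """Case-insensitive membership test against the list in this README."""
    return dataset.lower() in _LOWERCASED


def with_modelscope_source(env: dict) -> dict:
    """Return a copy of `env` with DATASET_SOURCE set, mirroring the `export` above."""
    new_env = dict(env)
    new_env["DATASET_SOURCE"] = "ModelScope"
    return new_env
```

For example, `with_modelscope_source(os.environ)` can be passed as the `env` argument of `subprocess.run` when launching an evaluation from a script, which has the same effect as the `export` line for that one process.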

Some third-party features, such as Humaneval and Llama, may require additional steps to work properly. See the [installation guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started/installation.html) for details.

<p align="right"><a href="#top">🔝Back to top</a></p>

## 🏗️ Evaluation

After installing OpenCompass and preparing the datasets as described above, you can evaluate the LLaMA-7b model on the MMLU and C-Eval datasets with the following command:

```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```
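
To accelerate evaluation with an inference backend other than HuggingFace, such as LMDeploy or vLLM, use the command below. Before running it, make sure the corresponding backend package is installed and that your model supports accelerated inference with that backend. See the inference acceleration backend [documentation](docs/zh_cn/advanced_guides/accelerator_intro.md) for more. The example uses LMDeploy:

```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl -a lmdeploy
```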
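
When evaluations are launched from a script, it can be convenient to assemble these command lines programmatically. The sketch below is a hypothetical helper (`build_run_command` is not part of OpenCompass) that uses only flags appearing in this README: `--models`, `--datasets`, `-a`, and the `--dry-run` flag mentioned in the data preparation section. Whether every combination of flags is accepted by `run.py` is not verified here.

```python
import shlex
from typing import Optional, Sequence


def build_run_command(
    models: Sequence[str],
    datasets: Sequence[str],
    accelerator: Optional[str] = None,
    dry_run: bool = False,
) -> list[str]:
    """Assemble the argv list for `python run.py` from flags shown in this README."""
    cmd = ["python", "run.py", "--models", *models, "--datasets", *datasets]
    if accelerator is not None:
        cmd += ["-a", accelerator]  # e.g. "lmdeploy", as in the example above
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    cmd = build_run_command(
        ["hf_llama_7b"], ["mmlu_ppl", "ceval_ppl"], accelerator="lmdeploy"
    )
    # Prints: python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl -a lmdeploy
    print(shlex.join(cmd))
```

Returning a list rather than a string lets the result go straight to `subprocess.run(cmd)` without shell quoting concerns.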
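
OpenCompass ships with predefined configurations for many models and datasets. You can list all available model and dataset configurations with the [tools](./docs/zh_cn/tools.md#ListConfigs).

```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```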
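
The keyword form of the listing command can be pictured as a simple name filter. The sketch below is an illustration of that idea and not the tool's actual implementation: it assumes a keyword matches any config name that contains it and that several keywords are combined with OR. `filter_configs` and the sample names are hypothetical; check `tools/list_configs.py` for the real matching rules.

```python
from fnmatch import fnmatchcase


def filter_configs(names: list[str], keywords: list[str]) -> list[str]:
    """Keep names containing at least one keyword; no keywords means keep all."""
    if not keywords:
        return sorted(names)
    return sorted(
        name
        for name in names
        if any(fnmatchcase(name, f"*{keyword}*") for keyword in keywords)
    )


if __name__ == "__main__":
    # Sample names taken from the commands in this README.
    sample = ["hf_llama_7b", "mmlu_ppl", "ceval_ppl"]
    print(filter_configs(sample, ["llama", "mmlu"]))
```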

You can also evaluate other HuggingFace models from the command line. Again taking LLaMA-7b as an example:

```bash
python run.py --datasets ceval_ppl mmlu_ppl --hf-type base --hf-path huggyllama/llama-7b
```
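
Through the command line or configuration files, OpenCompass also supports evaluating API-based or custom models, as well as more diverse evaluation strategies. Read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started/quick_start.html) to learn how to run an evaluation task.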

For more tutorials, see our [documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html).

<p align="right"><a href="#top">🔝Back to top</a></p>

## 📖 Dataset Support

<table align="center">
  <tbody>
    <tr align="center" valign="bottom">
      <td>
        <b>Language</b>
      </td>
      <td>
        <b>Knowledge</b>
      </td>
      <td>
        <b>Reasoning</b>
      </td>
      <td>
        <b>Examination</b>
      </td>
    </tr>
    <tr valign="top">
      <td>
<details open>
<summary><b>Word Definition</b></summary>

- WiC
- SummEdits

</details>

<details open>
<summary><b>Idiom Learning</b></summary>

- CHID

</details>

<details open>
<summary><b>Semantic Similarity</b></summary>

- AFQMC
- BUSTM

</details>

<details open>
<summary><b>Coreference Resolution</b></summary>

- CLUEWSC
- WSC
- WinoGrande

</details>

<details open>
<summary><b>Translation</b></summary>

- Flores
- IWSLT2017

</details>

<details open>
<summary><b>Multi-language Question Answering</b></summary>

- TyDi-QA
- XCOPA

</details>

<details open>
<summary><b>Multi-language Summary</b></summary>

- XLSum

</details>
      </td>
      <td>
<details open>
<summary><b>Knowledge Question Answering</b></summary>

- BoolQ
- CommonSenseQA
- NaturalQuestions
- TriviaQA

</details>
      </td>
      <td>
<details open>
<summary><b>Textual Entailment</b></summary>

- CMNLI
- OCNLI
- OCNLI_FC
- AX-b
- AX-g
- CB
- RTE
- ANLI

</details>

<details open>
<summary><b>Commonsense Reasoning</b></summary>

- StoryCloze
- COPA
- ReCoRD
- HellaSwag
- PIQA
- SIQA

</details>

<details open>
<summary><b>Mathematical Reasoning</b></summary>

- MATH
- GSM8K

</details>

<details open>
<summary><b>Theorem Application</b></summary>

- TheoremQA
- StrategyQA
- SciBench

</details>

<details open>
<summary><b>Comprehensive Reasoning</b></summary>

- BBH

</details>
      </td>
      <td>
<details open>
<summary><b>Junior High, High School, University, and Professional Examinations</b></summary>

- C-Eval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC
- Xiezhi

</details>

<details open>
<summary><b>Medical Examinations</b></summary>

- CMB

</details>
      </td>
    </tr>
</td>
    </tr>
  </tbody>
  <tbody>
    <tr align="center" valign="bottom">
      <td>
        <b>Understanding</b>
      </td>
      <td>
        <b>Long Context</b>
      </td>
      <td>
        <b>Safety</b>
      </td>
      <td>
        <b>Code</b>
      </td>
    </tr>
    <tr valign="top">
      <td>
<details open>
<summary><b>Reading Comprehension</b></summary>

- C3
- CMRC
- DRCD
- MultiRC
- RACE
- DROP
- OpenBookQA
- SQuAD2.0

</details>

<details open>
<summary><b>Content Summary</b></summary>

- CSL
- LCSTS
- XSum
- SummScreen

</details>

<details open>
<summary><b>Content Analysis</b></summary>

- EPRSTMT
- LAMBADA
- TNEWS

</details>
      </td>
      <td>
<details open>
<summary><b>Long Context Understanding</b></summary>

- LEval
- LongBench
- GovReports
- NarrativeQA
- Qasper

</details>
      </td>
      <td>
<details open>
<summary><b>Safety</b></summary>

- CivilComments
- CrowsPairs
- CValues
- JigsawMultilingual
- TruthfulQA

</details>

<details open>
<summary><b>Robustness</b></summary>

- AdvGLUE

</details>
      </td>
      <td>
<details open>
<summary><b>Code</b></summary>

- HumanEval
- HumanEvalX
- MBPP
- APPs
- DS1000

</details>
      </td>
    </tr>
</td>
    </tr>
  </tbody>
</table>

<p align="right"><a href="#top">🔝Back to top</a></p>

## 📖 Model Support

<table align="center">
  <tbody>
    <tr align="center" valign="bottom">
      <td>
        <b>Open-source Models</b>
      </td>
      <td>
        <b>API Models</b>
      </td>
      <!-- <td>
        <b>Custom Models</b>
      </td> -->
    </tr>
    <tr valign="top">
      <td>

- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [Baichuan](https://github.com/baichuan-inc)
- [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)
- [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B)
- [Gemma](https://huggingface.co/google/gemma-7b)
- [InternLM](https://github.com/InternLM/InternLM)
- [LLaMA](https://github.com/facebookresearch/llama)
- [LLaMA3](https://github.com/meta-llama/llama3)
- [Qwen](https://github.com/QwenLM/Qwen)
- [TigerBot](https://github.com/TigerResearch/TigerBot)
- [Vicuna](https://github.com/lm-sys/FastChat)
- [WizardLM](https://github.com/nlpxucan/WizardLM)
- [Yi](https://github.com/01-ai/Yi)
- ……

</td>
<td>

- OpenAI
- Gemini
- Claude
- ZhipuAI(ChatGLM)
- Baichuan
- ByteDance(YunQue)
- Huawei(PanGu)
- 360
- Baidu(ERNIEBot)
- MiniMax(ABAB-Chat)
- SenseTime(nova)
- Xunfei(Spark)
- ……

</td>
</tr>
  </tbody>
</table>
<p align="right"><a href="#top">🔝Back to top</a></p>

## 🔜 Roadmap

- [x] Subjective Evaluation
  - [x] Release a subjective evaluation leaderboard
  - [ ] Release subjective evaluation datasets
- [x] Long Context
  - [x] Support a wide range of long-context evaluation datasets
  - [ ] Release a long-context evaluation leaderboard
- [x] Coding
  - [ ] Release a coding evaluation leaderboard
  - [x] Provide evaluation services for non-Python languages
- [x] Agents
  - [ ] Support a rich set of agent frameworks
  - [x] Provide an agent evaluation leaderboard
- [x] Robustness
  - [x] Support various attack methods
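
## 👷‍♂️ Contributing

We appreciate all contributors for their efforts to improve and enhance OpenCompass. Please refer to the [contribution guide](https://opencompass.readthedocs.io/zh_CN/latest/notes/contribution_guide.html) for guidance on contributing to the project.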

<a href="https://github.com/open-compass/opencompass/graphs/contributors" target="_blank">
  <table>
    <tr>
      <th colspan="2">
        <br><img src="https://contrib.rocks/image?repo=open-compass/opencompass"><br><br>
      </th>
    </tr>
  </table>
</a>

## 🤝 Acknowledgements

Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL).

Some datasets and prompt implementations in this project are modified from [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub) and [instruct-eval](https://github.com/declare-lab/instruct-eval).

## 🖊️ Citation

```bibtex
@misc{2023opencompass,
    title={OpenCompass: A Universal Evaluation Platform for Foundation Models},
    author={OpenCompass Contributors},
    howpublished = {\url{https://github.com/open-compass/opencompass}},
    year={2023}
}
```

<p align="right"><a href="#top">🔝Back to top</a></p>

[github-contributors-link]: https://github.com/open-compass/opencompass/graphs/contributors
[github-contributors-shield]: https://img.shields.io/github/contributors/open-compass/opencompass?color=c4f042&labelColor=black&style=flat-square
[github-forks-link]: https://github.com/open-compass/opencompass/network/members
[github-forks-shield]: https://img.shields.io/github/forks/open-compass/opencompass?color=8ae8ff&labelColor=black&style=flat-square
[github-issues-link]: https://github.com/open-compass/opencompass/issues
[github-issues-shield]: https://img.shields.io/github/issues/open-compass/opencompass?color=ff80eb&labelColor=black&style=flat-square
[github-license-link]: https://github.com/open-compass/opencompass/blob/main/LICENSE
[github-license-shield]: https://img.shields.io/github/license/open-compass/opencompass?color=white&labelColor=black&style=flat-square
[github-release-link]: https://github.com/open-compass/opencompass/releases
[github-release-shield]: https://img.shields.io/github/v/release/open-compass/opencompass?color=369eff&labelColor=black&logo=github&style=flat-square
[github-releasedate-link]: https://github.com/open-compass/opencompass/releases
[github-releasedate-shield]: https://img.shields.io/github/release-date/open-compass/opencompass?labelColor=black&style=flat-square
[github-stars-link]: https://github.com/open-compass/opencompass/stargazers
[github-stars-shield]: https://img.shields.io/github/stars/open-compass/opencompass?color=ffcb47&labelColor=black&style=flat-square
[github-trending-shield]: https://trendshift.io/api/badge/repositories/6630
[github-trending-url]: https://trendshift.io/repositories/6630