[Docs] add en docs (#15)

* add en docs

* update

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
Hubert 2023-07-06 12:58:44 +08:00 committed by GitHub
parent 07dfe8c5fc
commit 7f8eee4725
15 changed files with 193 additions and 91 deletions

View File

@ -200,4 +200,4 @@ Copyright 2020 OpenCompass Authors. All rights reserved.
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -39,8 +39,6 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for commun
[![image](https://github.com/InternLM/OpenCompass/assets/7881589/6b56c297-77c0-4e1a-9acc-24a45c5a734a)](https://opencompass.org.cn/rank)
## Dataset Support
<table align="center">
@ -245,7 +243,7 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for commun
</tr>
<tr valign="top">
<td>
- InternLM
- LLaMA
- Vicuna

View File

@ -40,10 +40,8 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows
We will continue to release performance leaderboards for open-source models and API models; see the [OpenCompass Leaderboard](https://opencompass.org.cn/rank). To join the evaluation, please send your model repository URL or a standard API interface to `opencompass@pjlab.org.cn`.
![image](https://github.com/InternLM/OpenCompass/assets/7881589/fddc8ab4-d2bd-429d-89f0-4ca90606599a)
## Dataset Support
<table align="center">

View File

@ -1,3 +1,57 @@
# New Dataset
# Add a dataset
Coming soon.
Although OpenCompass already includes most commonly used datasets, users who want to support a new dataset can follow the steps below:
1. Add a dataset script `mydataset.py` to the `opencompass/datasets` folder. This script should include:
- The dataset and its loading method. Define a `MyDataset` class that implements the data loading method `load` as a static method. This method should return data of type `datasets.Dataset`. We use the Hugging Face dataset as the unified interface for datasets to avoid introducing additional logic. Here's an example:
```python
import datasets
from .base import BaseDataset
class MyDataset(BaseDataset):
@staticmethod
def load(**kwargs) -> datasets.Dataset:
pass
```
- (Optional) If the existing evaluators in OpenCompass do not meet your needs, you need to define a `MyDatasetEvaluator` class that implements the scoring method `score`. This method should take `predictions` and `references` as input and return the desired dictionary. Since a dataset may have multiple metrics, the method should return a dictionary containing the metrics and their corresponding scores. Here's an example:
```python
from typing import List

from opencompass.openicl.icl_evaluator import BaseEvaluator

class MyDatasetEvaluator(BaseEvaluator):
    def score(self, predictions: List, references: List) -> dict:
        pass
```
- (Optional) If the existing postprocessors in OpenCompass do not meet your needs, you need to define the `mydataset_postprocess` method. This method takes an input string and returns the corresponding postprocessed result string. Here's an example:
```python
def mydataset_postprocess(text: str) -> str:
pass
```
2. After defining the dataset loading, data postprocessing, and evaluator methods, you need to add the following configurations to the configuration file:
```python
from opencompass.datasets import MyDataset, MyDatasetEvaluator, mydataset_postprocess
mydataset_eval_cfg = dict(
evaluator=dict(type=MyDatasetEvaluator),
pred_postprocessor=dict(type=mydataset_postprocess))
mydataset_datasets = [
dict(
type=MyDataset,
...,
reader_cfg=...,
infer_cfg=...,
eval_cfg=mydataset_eval_cfg)
]
```
Once the dataset is configured, you can refer to the instructions on [Get started](../get_started.md) for other requirements.
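To make the stubs above concrete, here is a hedged, self-contained sketch of an evaluator and a postprocessor. The accuracy metric and the first-line heuristic are illustrative choices, not OpenCompass requirements, and a real evaluator would inherit `BaseEvaluator` rather than being a plain class:

```python
from typing import List

class MyDatasetEvaluator:
    # Illustrative stand-in; a real implementation would inherit
    # opencompass.openicl.icl_evaluator.BaseEvaluator.
    def score(self, predictions: List[str], references: List[str]) -> dict:
        # Return a dict mapping each metric name to its score, as the
        # steps above require; here a single exact-match accuracy.
        correct = sum(p == r for p, r in zip(predictions, references))
        return {'accuracy': 100.0 * correct / len(references)}

def mydataset_postprocess(text: str) -> str:
    # Illustrative heuristic: keep only the first non-empty line of the
    # model output before scoring.
    for line in text.strip().split('\n'):
        if line.strip():
            return line.strip()
    return ''
```

Both pieces would then be referenced from the configuration file exactly as in step 2 above.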

View File

@ -1,3 +1,73 @@
# New A Model
# Add a Model
Coming soon.
Currently, we support HF models, some model APIs, and some third-party models.
## Adding API Models
To add a new API-based model, you need to create a new file named `mymodel_api.py` under the `opencompass/models` directory. In this file, you should inherit from `BaseAPIModel` and implement the `generate` method for inference and the `get_token_len` method to calculate the length of tokens. Once you have defined the model, you can modify the corresponding configuration file.
```python
from typing import Dict, List, Optional

from ..base_api import BaseAPIModel

class MyModelAPI(BaseAPIModel):
    is_api: bool = True

    def __init__(self,
                 path: str,
                 max_seq_len: int = 2048,
                 query_per_second: int = 1,
                 retry: int = 2,
                 meta_template: Optional[Dict] = None,
                 **kwargs):
        super().__init__(path=path,
                         max_seq_len=max_seq_len,
                         meta_template=meta_template,
                         query_per_second=query_per_second,
                         retry=retry)
        ...

    def generate(
        self,
        inputs,
        max_out_len: int = 512,
        temperature: float = 0.7,
    ) -> List[str]:
        """Generate results given a list of inputs."""
        pass

    def get_token_len(self, prompt: str) -> int:
        """Get lengths of the tokenized string."""
        pass
```
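For intuition only, the sketch below fills in the two required methods with placeholder logic: a fake `_call_api` standing in for the real HTTP request, and a crude whitespace token count. The base class is omitted so the example is self-contained; none of this reflects how any particular API service actually behaves.

```python
from typing import List

class FakeAPIModel:
    # Plain class standing in for OpenCompass's BaseAPIModel so the
    # sketch runs on its own.
    def _call_api(self, prompt: str, max_out_len: int) -> str:
        # Placeholder for the real request to the model service; a real
        # implementation would also honor query_per_second and retry.
        return prompt.upper()[:max_out_len]

    def generate(self, inputs: List[str], max_out_len: int = 512) -> List[str]:
        # Query the service once per input prompt.
        return [self._call_api(x, max_out_len) for x in inputs]

    def get_token_len(self, prompt: str) -> int:
        # Crude whitespace count; real APIs expose a tokenizer or a
        # token-counting endpoint.
        return len(prompt.split())
```

A real subclass would forward `path`, `max_seq_len`, and the other constructor arguments to `BaseAPIModel.__init__` as shown in the template above.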
## Adding Third-Party Models
To add a new third-party model, you need to create a new file named `mymodel.py` under the `opencompass/models` directory. In this file, you should inherit from `BaseModel` and implement the `generate` method for generative inference, the `get_ppl` method for discriminative inference, and the `get_token_len` method to calculate the length of tokens. Once you have defined the model, you can modify the corresponding configuration file.
```python
from typing import Dict, List, Optional

from ..base import BaseModel

class MyModel(BaseModel):
    def __init__(self,
                 pkg_root: str,
                 ckpt_path: str,
                 tokenizer_only: bool = False,
                 meta_template: Optional[Dict] = None,
                 **kwargs):
        ...

    def get_token_len(self, prompt: str) -> int:
        """Get lengths of the tokenized strings."""
        pass

    def generate(self, inputs: List[str], max_out_len: int) -> List[str]:
        """Generate results given a list of inputs."""
        pass

    def get_ppl(self,
                inputs: List[str],
                mask_length: Optional[List[int]] = None) -> List[float]:
        """Get perplexity scores given a list of inputs."""
        pass
```
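The `get_ppl` contract can be illustrated with a toy function that pretends every token has probability 0.5; the numbers are fabricated, but the shape of the computation (average negative log-likelihood over the unmasked tokens, exponentiated) matches what discriminative `ppl` evaluation expects. This is a sketch, not OpenCompass's implementation:

```python
import math
from typing import List, Optional

def toy_get_ppl(inputs: List[str],
                mask_length: Optional[List[int]] = None) -> List[float]:
    ppls = []
    for i, text in enumerate(inputs):
        tokens = text.split()  # a real model uses the LM's tokenizer
        # When a mask is given, skip the first mask_length[i] tokens
        # (e.g. a shared context that should not be scored).
        start = mask_length[i] if mask_length else 0
        scored = tokens[start:]
        # A fixed per-token probability of 0.5 stands in for the model's
        # actual per-token log-probabilities.
        nll = sum(-math.log(0.5) for _ in scored) / max(len(scored), 1)
        ppls.append(math.exp(nll))
    return ppls
```

With the fixed 0.5 probability, every unmasked sequence gets a perplexity of 2, which makes the masking behavior easy to check by hand.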

View File

@ -107,7 +107,7 @@ models = [llama_7b]
</details>
<details>
<summary>Lauch Evalution</summary>
<summary>Launch Evaluation</summary>
First, we can start the task in **debug mode** to check for any exceptions in model loading, dataset reading, or incorrect cache usage.

View File

@ -79,4 +79,4 @@ Indexes & Tables
==================
* :ref:`genindex`
* :ref:`search`

View File

@ -8,34 +8,26 @@ First, let's introduce the structure under the `configs/datasets` directory in O
```
configs/datasets/
├── ChineseUniversal # Ability dimension
│ ├── CLUE_afqmc # Dataset under this dimension
│ │ ├── CLUE_afqmc_gen_db509b.py # Different configuration files for this dataset
│ │ ├── CLUE_afqmc_gen.py
│ │ ├── CLUE_afqmc_ppl_00b348.py
│ │ ├── CLUE_afqmc_ppl_2313cf.py
│ │ └── CLUE_afqmc_ppl.py
│ ├── CLUE_C3
│ │ ├── ...
│ ├── ...
├── Coding
├── collections
├── Completion
├── EnglishUniversal
├── Exam
├── glm
├── LongText
├── MISC
├── NLG
├── QA
├── Reasoning
├── Security
└── Translation
├── agieval
├── apps
├── ARC_c
├── ...
├── CLUE_afqmc # dataset
│   ├── CLUE_afqmc_gen_901306.py # different version of config
│   ├── CLUE_afqmc_gen.py
│   ├── CLUE_afqmc_ppl_378c5b.py
│   ├── CLUE_afqmc_ppl_6507d7.py
│   ├── CLUE_afqmc_ppl_7b0c1e.py
│   └── CLUE_afqmc_ppl.py
├── ...
├── XLSum
├── Xsum
└── z_bench
```
In the `configs/datasets` directory structure, we have divided the datasets into over ten dimensions based on ability dimensions, such as: Chinese and English Universal, Exam, QA, Reasoning, Security, etc. Each dimension contains a series of datasets, and there are multiple dataset configurations in the corresponding folder of each dataset.
In the `configs/datasets` directory structure, we flatten all datasets directly, and there are multiple dataset configurations within the corresponding folders for each dataset.
The naming of the dataset configuration file is made up of `{dataset name}_{evaluation method}_{prompt version number}.py`. For example, `ChineseUniversal/CLUE_afqmc/CLUE_afqmc_gen_db509b.py`, this configuration file is the `CLUE_afqmc` dataset under the Chinese universal ability, the corresponding evaluation method is `gen`, i.e., generative evaluation, and the corresponding prompt version number is `db509b`; similarly, `CLUE_afqmc_ppl_00b348.py` indicates that the evaluation method is `ppl`, i.e., discriminative evaluation, and the prompt version number is `00b348`.
The dataset configuration filename follows `{dataset name}_{evaluation method}_{prompt version number}.py`. Taking `CLUE_afqmc/CLUE_afqmc_gen_db509b.py` as an example, this configuration file is for the `CLUE_afqmc` dataset; the evaluation method is `gen`, i.e. generative evaluation, and the prompt version is `db509b`. Similarly, `CLUE_afqmc_ppl_00b348.py` uses the `ppl` evaluation method, i.e. discriminative evaluation, with prompt version `00b348`.
In addition, files without a version number, such as: `CLUE_afqmc_gen.py`, point to the latest prompt configuration file of that evaluation method, which is usually the most accurate prompt.
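The naming convention above can be checked mechanically. The helper below is an illustrative sketch (not part of OpenCompass) that splits a config filename into dataset name, evaluation method, and optional prompt version:

```python
import re

def parse_config_name(filename: str):
    # {dataset name}_{evaluation method}_{prompt version}.py; the version
    # suffix is optional, since versionless files point to the latest
    # prompt for that evaluation method.
    m = re.match(r'(.+)_(gen|ppl)(?:_([0-9a-f]+))?\.py$', filename)
    if not m:
        raise ValueError(f'unrecognized config name: {filename}')
    return m.groups()  # (name, method, version or None)
```

For example, `CLUE_afqmc_gen_db509b.py` splits into the dataset `CLUE_afqmc`, the method `gen`, and the version `db509b`.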
@ -49,13 +41,13 @@ The datasets supported by OpenCompass mainly include two parts:
2. OpenCompass Self-built Datasets
In addition to supporting Huggingface's existing datasets, OpenCompass also provides some self-built Chinese datasets. In the future, a dataset-related Repo will be provided for users to download and use. Place the datasets under the `./data` directory as instructed in the documentation to complete dataset preparation.
In addition to supporting Huggingface's existing datasets, OpenCompass also provides some self-built Chinese datasets. In the future, a dataset-related link will be provided for users to download and use. Place the datasets under the `./data` directory as instructed in the documentation to complete dataset preparation.
It is important to note that the Repo not only contains self-built datasets, but also includes some HF-supported datasets for testing convenience.
## Dataset Selection
In each dataset configuration file, the dataset will be defined in the `{}_datasets` variable, such as `afqmc_datasets` in `ChineseUniversal/CLUE_afqmc/CLUE_afqmc_gen_db509b.py`.
In each dataset configuration file, the dataset will be defined in the `{}_datasets` variable, such as `afqmc_datasets` in `CLUE_afqmc/CLUE_afqmc_gen_db509b.py`.
```python
afqmc_datasets = [
@ -70,7 +62,7 @@ afqmc_datasets = [
]
```
And `afqmc_datasets` in `ChineseUniversal/CLUE_cmnli/CLUE_cmnli_ppl_b78ad4.py`.
And `cmnli_datasets` in `CLUE_cmnli/CLUE_cmnli_ppl_b78ad4.py`.
```python
cmnli_datasets = [

View File

@ -4,7 +4,7 @@
1. Add a dataset script `mydataset.py` to the `opencompass/datasets` folder. This script should include:
- The dataset and its loading method. Define a `MyDataset` class that implements the data loading method `load` as a static method, returning data of type `datasets.Dataset`. We use the Hugging Face dataset as the unified dataset interface to avoid introducing extra logic. Example:
```python
import datasets
@ -17,10 +17,9 @@
pass
```
- (Optional) If OpenCompass's existing evaluators do not meet your needs, define a `MyDatasetlEvaluator` class implementing the scoring method `score`, which takes the `predictions` and `references` lists as input and returns the required dictionary. Since a dataset may have multiple metrics, return a dictionary of metrics and their corresponding scores. Example:
```python
from opencompass.openicl.icl_evaluator import BaseEvaluator
class MyDatasetlEvaluator(BaseEvaluator):
@ -30,14 +29,14 @@
```
- (Optional) If OpenCompass's existing postprocessors do not meet your needs, define a `mydataset_postprocess` method that derives the postprocessed result from the input string. Example:
```python
def mydataset_postprocess(text: str) -> str:
pass
```
2. After defining the dataset loading, evaluation, and data postprocessing methods, add the following configuration to the config file:
```python
from opencompass.datasets import MyDataset, MyDatasetlEvaluator, mydataset_postprocess
@ -56,5 +55,4 @@
]
```
After configuring the dataset, refer to the [Get started](../get_started.md) tutorial for the other required configuration files.

View File

@ -1,6 +1,6 @@
# Adding a New Model
The models we currently support include HF models, some model APIs, self-built models, and some third-party models.
The models we currently support include HF models, some model APIs, and some third-party models.
## Adding API Models

View File

@ -79,4 +79,4 @@ OpenCompass getting started roadmap
==================
* :ref:`genindex`
* :ref:`search`

View File

@ -1,3 +1,3 @@
# Prompt Templates
Coming soon.

View File

@ -8,34 +8,26 @@
```
configs/datasets/
├── ChineseUniversal # Ability dimension
│   ├── CLUE_afqmc # Dataset under this dimension
│   │   ├── CLUE_afqmc_gen_db509b.py # Different configuration files for this dataset
│   │   ├── CLUE_afqmc_gen.py
│   │   ├── CLUE_afqmc_ppl_00b348.py
│   │   ├── CLUE_afqmc_ppl_2313cf.py
│   │   └── CLUE_afqmc_ppl.py
│   ├── CLUE_C3
│   │   ├── ...
│   ├── ...
├── Coding
├── collections
├── Completion
├── EnglishUniversal
├── Exam
├── glm
├── LongText
├── MISC
├── NLG
├── QA
├── Reasoning
├── Security
└── Translation
├── agieval
├── apps
├── ARC_c
├── ...
├── CLUE_afqmc # dataset
│   ├── CLUE_afqmc_gen_901306.py # different version of config
│   ├── CLUE_afqmc_gen.py
│   ├── CLUE_afqmc_ppl_378c5b.py
│   ├── CLUE_afqmc_ppl_6507d7.py
│   ├── CLUE_afqmc_ppl_7b0c1e.py
│   └── CLUE_afqmc_ppl.py
├── ...
├── XLSum
├── Xsum
└── z_bench
```
Under the `configs/datasets` directory structure, datasets are divided into more than ten ability dimensions, such as Chinese and English universal, exam, QA, reasoning, security, and so on. Each dimension contains a series of datasets, and each dataset's folder holds multiple dataset configurations.
Under the `configs/datasets` directory structure, all datasets are laid out flat, and each dataset's folder holds multiple dataset configurations.
The dataset configuration filename follows `{dataset name}_{evaluation method}_{prompt version number}.py`. Taking `ChineseUniversal/CLUE_afqmc/CLUE_afqmc_gen_db509b.py` as an example, this configuration file is for the `CLUE_afqmc` dataset under the Chinese universal ability; the evaluation method is `gen`, i.e. generative evaluation, and the prompt version is `db509b`. Similarly, `CLUE_afqmc_ppl_00b348.py` uses the `ppl` evaluation method, i.e. discriminative evaluation, with prompt version `00b348`.
The dataset configuration filename follows `{dataset name}_{evaluation method}_{prompt version number}.py`. Taking `CLUE_afqmc/CLUE_afqmc_gen_db509b.py` as an example, this configuration file is for the `CLUE_afqmc` dataset under the Chinese universal ability; the evaluation method is `gen`, i.e. generative evaluation, and the prompt version is `db509b`. Similarly, `CLUE_afqmc_ppl_00b348.py` uses the `ppl` evaluation method, i.e. discriminative evaluation, with prompt version `00b348`.
In addition, files without a version number, such as `CLUE_afqmc_gen.py`, point to the latest prompt configuration for that evaluation method, usually the prompt with the highest accuracy.
@ -49,13 +41,13 @@ The datasets supported by OpenCompass mainly consist of two parts:
2. OpenCompass self-built datasets
In addition to supporting existing Huggingface datasets, OpenCompass also provides some self-built Chinese datasets. In the future, a dataset-related Repo will be provided for users to download and use. Place the datasets under the `./data` directory as instructed in the documentation to complete dataset preparation.
In addition to supporting existing Huggingface datasets, OpenCompass also provides some self-built Chinese datasets. In the future, a dataset-related link will be provided for users to download and use. Place the datasets under the `./data` directory as instructed in the documentation to complete dataset preparation.
Note that the Repo contains not only self-built datasets but also, for convenience, some datasets already supported by HF to ease testing.
## Dataset Selection
In each dataset configuration file, the dataset is defined in the `{}_datasets` variable, e.g. `afqmc_datasets` in `ChineseUniversal/CLUE_afqmc/CLUE_afqmc_gen_db509b.py` below.
In each dataset configuration file, the dataset is defined in the `{}_datasets` variable, e.g. `afqmc_datasets` in `CLUE_afqmc/CLUE_afqmc_gen_db509b.py` below.
```python
afqmc_datasets = [
@ -70,7 +62,7 @@ afqmc_datasets = [
]
```
And `afqmc_datasets` in `ChineseUniversal/CLUE_cmnli/CLUE_cmnli_ppl_b78ad4.py`.
And `cmnli_datasets` in `CLUE_cmnli/CLUE_cmnli_ppl_b78ad4.py`.
```python
cmnli_datasets = [

View File

@ -39,27 +39,27 @@ run.py {--slurm | --dlc | None} $Config [-p PARTITION] [-q QUOTATYPE] [--debug]
1. Open the `configs/lark.py` file and add the following line:
   ```python
   lark_bot_url = 'YOUR_WEBHOOK_URL'
   ```
   Typically, the webhook URL looks like https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxxxxxxxxx .
2. Inherit this file in the complete evaluation configuration:
   ```python
   from mmengine.config import read_base

   with read_base():
       from .lark import lark_bot_url
   ```
3. To keep the bot from spamming with frequent messages, runtime status is not reported automatically by default. When needed, status reporting can be enabled with `-l` or `--lark`:
   ```bash
   python run.py configs/eval_demo.py -p {PARTITION} -l
   ```
## Introduction to Summarizer

View File

@ -1,6 +1,6 @@
from .abbr import * # noqa
from .build import * # noqa
from .collect_env import * #noqa
from .collect_env import * # noqa
from .fileio import * # noqa
from .git import * # noqa
from .lark import * # noqa