[Docs] Polish docs (#43)

* [Docs] Polish docs

* apply suggestions

* apply suggestions
Tong Gao 2023-07-13 09:07:53 +08:00 committed by GitHub
parent 7ee5a86fee
commit fd57786954
6 changed files with 121 additions and 145 deletions


@ -43,7 +43,7 @@ The datasets supported by OpenCompass mainly include two parts:
1. Huggingface datasets: The [Huggingface Datasets](https://huggingface.co/datasets) provide a large number of datasets, which will **automatically download** when running with this option.
2. Custom dataset: OpenCompass also provides some Chinese custom **self-built** datasets. Please run the following command to **manually download and extract** them.
Run the following commands to download and place the datasets in the '${OpenCompass}/data' directory can complete dataset preparation.
Run the following commands to download and place the datasets in the `${OpenCompass}/data` directory to complete the dataset preparation.
```bash
# Run in the OpenCompass directory
@ -63,41 +63,53 @@ We will demonstrate some basic features of OpenCompass through evaluating pretra
Before running this experiment, please make sure you have installed OpenCompass locally; the experiment should run successfully on a single _GTX-1660-6G_ GPU.
For models with more parameters, such as Llama-7B, refer to the other examples provided in the [configs directory](https://github.com/InternLM/opencompass/tree/main/configs).
To start the evaluation task, use the following command:
Since OpenCompass launches evaluation processes in parallel by default, we can start the first run in debug mode to check whether there is any problem. In debug mode, the tasks will be executed sequentially and their status will be printed in real time.
```bash
python run.py configs/eval_demo.py --debug
python run.py configs/eval_demo.py -w outputs/demo --debug
```
While running the demo, let's go over the details of the configuration content and launch options used in this case.
If everything is fine, you should see "Starting inference process" on screen:
## Step by step
```bash
[2023-07-12 18:23:55,076] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
```
<details>
<summary><b>Learn about `datasets`</b></summary>
You can then press `ctrl+c` to interrupt the program and run the following command to start the parallel evaluation:
```bash
python run.py configs/eval_demo.py -w outputs/demo
```
Now let's go over the configuration file and the launch options used in this case.
## Explanations
### Dataset list - `datasets`
Below is the configuration snippet related to datasets in `configs/eval_demo.py`:
```python
from mmengine.config import read_base
from mmengine.config import read_base # Use mmengine.read_base() to load base configs
with read_base():
# Read the required dataset configurations directly from the preset dataset configurations
from .datasets.winograd.winograd_ppl import winograd_datasets # ppl inference
from .datasets.siqa.siqa_gen import siqa_datasets # gen inference
from .datasets.winograd.winograd_ppl import winograd_datasets # Load Winograd's configuration, which uses perplexity-based inference
from .datasets.siqa.siqa_gen import siqa_datasets # Load SIQA's configuration, which uses generation-based inference
datasets = [*siqa_datasets, *winograd_datasets] # Concatenate the datasets to be evaluated into the datasets field
```
Various dataset configurations are available in [configs/datasets](https://github.com/InternLM/OpenCompass/blob/main/configs/datasets).
Some datasets have two types of configuration files within their folders named `'ppl'` and `'gen'`, representing different evaluation methods. Specifically, `'ppl'` represents discriminative evaluation, while `'gen'` stands for generative evaluation.
Some datasets have two types of configuration files within their folders named `ppl` and `gen`, representing different evaluation methods. Specifically, `ppl` represents discriminative evaluation, while `gen` stands for generative evaluation.
[configs/datasets/collections](https://github.com/InternLM/OpenCompass/blob/main/configs/datasets/collections) contains various collections of datasets for comprehensive evaluation purposes.
</details>
You can find more information from [Dataset Preparation](./user_guides/dataset_prepare.md).
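A whole collection can be pulled in the same way as individual datasets. The snippet below is only a sketch, and the collection module name is an illustrative placeholder; check [configs/datasets/collections](https://github.com/InternLM/OpenCompass/blob/main/configs/datasets/collections) for the files actually shipped:

```python
from mmengine.config import read_base

with read_base():
    # Illustrative placeholder: import one of the predefined collections,
    # which already defines a `datasets` list covering many benchmarks.
    from .datasets.collections.example_collection import datasets
```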
<details>
<summary><b>Learn about `models`</b></summary>
### Model list - `models`
The pretrained models 'facebook/opt-350m' and 'facebook/opt-125m' from HuggingFace supports automatic downloading.
OpenCompass supports directly specifying the list of models to be tested in the configuration. For HuggingFace models, users usually do not need to modify the code. The following is the relevant configuration snippet:
```python
# Evaluate models supported by HuggingFace's `AutoModelForCausalLM` using `HuggingFaceCausalLM`
@ -115,9 +127,9 @@ opt350m = dict(
proxies=None,
trust_remote_code=True),
model_kwargs=dict(device_map='auto'),
max_seq_len=2048,
# Common parameters for all models, not specific to HuggingFaceCausalLM's initialization parameters
# Below are common parameters for all models, not specific to HuggingFaceCausalLM
abbr='opt350m', # Model abbreviation for result display
max_seq_len=2048, # The maximum length of the entire sequence
max_out_len=100, # Maximum number of generated tokens
batch_size=64, # Batch size used during inference
run_cfg=dict(num_gpus=1), # Run configuration for specifying resource requirements
@ -135,9 +147,9 @@ opt125m = dict(
proxies=None,
trust_remote_code=True),
model_kwargs=dict(device_map='auto'),
max_seq_len=2048,
# Common parameters for all models, not specific to HuggingFaceCausalLM's initialization parameters
# Below are common parameters for all models, not specific to HuggingFaceCausalLM
abbr='opt125m', # Model abbreviation for result display
max_seq_len=2048, # The maximum length of the entire sequence
max_out_len=100, # Maximum number of generated tokens
batch_size=128, # Batch size used during inference
run_cfg=dict(num_gpus=1), # Run configuration for specifying resource requirements
@ -146,12 +158,13 @@ opt125m = dict(
models = [opt350m, opt125m]
```
</details>
The pretrained models 'facebook/opt-350m' and 'facebook/opt-125m' will be automatically downloaded from HuggingFace during the first run.
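If automatic downloading is not an option (for example, on an offline cluster), the same fields can point to a local checkpoint directory instead. The snippet below is a hedged sketch, and `/path/to/opt-350m` is just a placeholder for a directory saved with HuggingFace's `save_pretrained`:

```python
# Sketch: reuse the opt350m config defined above, but load weights from a
# local directory instead of downloading them from HuggingFace.
opt350m_local = dict(
    opt350m,                             # copy the existing config dict
    abbr='opt350m-local',                # new abbreviation for result display
    path='/path/to/opt-350m',            # placeholder local model directory
    tokenizer_path='/path/to/opt-350m',  # placeholder local tokenizer directory
)
```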
<details>
<summary><b>Launch Evaluation</b></summary>
More information about model configuration can be found in [Prepare Models](./user_guides/models.md).
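Putting the pieces together, the essential content of `configs/eval_demo.py` boils down to something like the condensed sketch below (the HuggingFace tokenizer and model kwargs are omitted for brevity; see the full snippets above):

```python
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    from .datasets.siqa.siqa_gen import siqa_datasets              # gen inference
    from .datasets.winograd.winograd_ppl import winograd_datasets  # ppl inference

datasets = [*siqa_datasets, *winograd_datasets]

opt350m = dict(type=HuggingFaceCausalLM, abbr='opt350m',
               path='facebook/opt-350m', tokenizer_path='facebook/opt-350m',
               max_seq_len=2048, max_out_len=100, batch_size=64,
               run_cfg=dict(num_gpus=1))
opt125m = dict(type=HuggingFaceCausalLM, abbr='opt125m',
               path='facebook/opt-125m', tokenizer_path='facebook/opt-125m',
               max_seq_len=2048, max_out_len=100, batch_size=128,
               run_cfg=dict(num_gpus=1))

models = [opt350m, opt125m]
```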
First, we can start the task in **debug mode** to check for any exceptions in model loading, dataset reading, or incorrect cache usage.
### Launch Evaluation
When the config file is ready, we can start the task in **debug mode** to check for any exceptions in model loading, dataset reading, or incorrect cache usage.
```shell
python run.py configs/eval_demo.py -w outputs/demo --debug
@ -186,8 +199,6 @@ If you are not performing the evaluation on your local machine but using a Slurm
The entry also supports submitting tasks to Alibaba Deep Learning Center (DLC), and more customized evaluation strategies. Please refer to [Launching an Evaluation Task](./user_guides/experimentation.md#launching-an-evaluation-task) for details.
```
</details>
## Obtaining Evaluation Results
After the evaluation is complete, the evaluation results table will be printed as follows:
@ -199,33 +210,28 @@ siqa e78df3 accuracy gen 21.55 12.44
winograd b6c7ed accuracy ppl 51.23 49.82
```
All run outputs will default to `outputs/default/` directory with following structure:
All run outputs will be directed to the `outputs/demo/` directory, with the following structure:
```text
outputs/default/
├── 20200220_120000
├── 20230220_183030 # one folder per experiment
│ ├── configs # replicable config files
│   ├── configs # Dumped config files, kept for record; multiple configs may be present if different experiments were re-run in the same folder
│ ├── logs # log files for both inference and evaluation stages
│ │ ├── eval
│ │ └── infer
│ ├── predictions # json format of per data point inference result
│ └── results # numerical conclusions of each evaluation session
│   ├── predictions # Prediction results for each task
│   ├── results # Evaluation results for each task
│   └── summary # Summarized evaluation results for a single experiment
├── ...
```
Each timestamp folder represents one experiment with the following contents:
- `configs`: configuration file storage;
- `logs`: log file storage for both **inference** and **evaluation** stages;
- `predictions`: json format output of inference result per data points;
- `results`: json format output of numerical conclusion on each evaluation session.
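As an illustration, a small standalone script (not an OpenCompass API) could locate the newest experiment folder and list its per-task result files, assuming the layout shown above:

```python
from pathlib import Path

out_root = Path('outputs/demo')  # the working directory passed via -w
# Timestamp folder names sort chronologically, so the last one is the newest.
latest = sorted(p for p in out_root.iterdir() if p.is_dir())[-1]

for result_file in sorted((latest / 'results').rglob('*.json')):
    print(result_file.relative_to(latest))  # per-task evaluation results
```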
## Additional Tutorials
To learn more about using OpenCompass, explore the following tutorials:
- [Preparing Datasets](./user\_guides/dataset\_prepare.md)
- [Customizing Models](./user\_guides/models.md)
- [Exploring Experimentation Workflows](./user\_guides/experimentation.md)
- [Understanding Prompts](./prompt/overview.md)
- [Prepare Datasets](./user_guides/dataset_prepare.md)
- [Prepare Models](./user_guides/models.md)
- [Task Execution and Monitoring](./user_guides/experimentation.md)
- [Understand Prompts](./prompt/overview.md)
- [Learn about Config](./user_guides/config.md)


@ -1,23 +1,22 @@
Welcome to OpenCompass' documentation!
==========================================
Hands-on Roadmap of OpenCompass
Getting started with OpenCompass
-------------------------------
To help users quickly utilize OpenCompass, we recommend following the hands-on
roadmap we have created for the library:
To help you quickly get familiar with OpenCompass, we recommend walking through the following documents in order:
- For users who want to use OpenCompass, we recommend reading the GetStarted_ section first to set up the environment.
- First, read the GetStarted_ section to set up the environment and run a mini experiment.
- For some basic usage, we suggest users read the UserGuides_.
- Then learn its basic usage through the UserGuides_.
- If you want to adjust the prompts, you can browse the Prompt_.
- If you want to tune the prompts, refer to the Prompt_.
- If you want to customize the algorithm, we have provided the AdvancedGuides_.
- If you want to customize some modules, like adding a new dataset or model, we have provided the AdvancedGuides_.
- We also offer the Tools_.
- There are more handy tools, such as the prompt viewer and the Lark bot reporter, all presented in Tools_.
We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
We always welcome *PRs* and *Issues* for the betterment of OpenCompass.
.. _GetStarted:
.. toctree::
@ -32,7 +31,7 @@ We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
:caption: User Guides
user_guides/config.md
user_guides/dataset_prepare.md
user_guides/datasets.md
user_guides/models.md
user_guides/evaluation.md
user_guides/experimentation.md
@ -68,13 +67,6 @@ We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
notes/contribution_guide.md
.. toctree::
:caption: switch language
English <https://OpenCompass.readthedocs.io/en/latest/>
简体中文 <https://OpenCompass.readthedocs.io/zh_CN/latest/>
Indexes & Tables
==================


@ -1,6 +1,6 @@
# Configure DataSets
# Configure Datasets
This section of the tutorial mainly focuses on how to prepare the datasets supported by OpenCompass and build configuration files to complete dataset selection.
This tutorial mainly focuses on selecting datasets supported by OpenCompass and preparing their config files. Please make sure you have downloaded the datasets following the steps in [Dataset Preparation](../get_started.md#dataset-preparation).
## Directory Structure of Dataset Configuration Files
@ -31,26 +31,6 @@ The naming of the dataset configuration file is made up of `{dataset name}_{eval
In addition, files without a version number, such as `CLUE_afqmc_gen.py`, point to the latest prompt configuration file of that evaluation method, which usually contains the most accurate prompt.
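For example, a tiny helper script (not part of OpenCompass) can list the prompt versions available for one dataset and evaluation method, assuming the default repository layout:

```python
from pathlib import Path

# List every generative-evaluation config of CLUE_afqmc, e.g. the unversioned
# CLUE_afqmc_gen.py plus versioned files such as CLUE_afqmc_gen_db509b.py.
for cfg in sorted(Path('configs/datasets/CLUE_afqmc').glob('CLUE_afqmc_gen*.py')):
    print(cfg.name)
```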
## Dataset Preparation
The datasets supported by OpenCompass mainly include two parts:
1. Huggingface Dataset
[Huggingface Dataset](https://huggingface.co/datasets) provides a large number of datasets. OpenCompass has supported most of the datasets commonly used for performance comparison, please refer to `configs/dataset` for the specific list of supported datasets.
2. Third-party Datasets
In addition to supporting Huggingface's existing datasets, OpenCompass also provides some third-party and self-built datasets. Run the following commands to download and place the datasets in the `./data` directory can complete dataset preparation.
```bash
# Run in the OpenCompass directory
wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
unzip OpenCompassData.zip
```
Note that the Repo not only contains self-built datasets, but also includes some HF-supported datasets for testing convenience.
## Dataset Selection
In each dataset configuration file, the dataset will be defined in the `{}_datasets` variable, such as `afqmc_datasets` in `CLUE_afqmc/CLUE_afqmc_gen_db509b.py`.
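Following the same pattern as the demo config, selecting this dataset in your own config would look roughly like the sketch below:

```python
from mmengine.config import read_base

with read_base():
    # Pick the versioned generative config of CLUE_afqmc mentioned above
    from .datasets.CLUE_afqmc.CLUE_afqmc_gen_db509b import afqmc_datasets

datasets = [*afqmc_datasets]
```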


@ -44,7 +44,7 @@ OpenCompass 支持的数据集主要包括两个部分:
2. 自建以及第三方数据集OpenCompass 还提供了一些第三方数据集及自建**中文**数据集。运行以下命令**手动下载解压**。
在 OpenCompass 项目根目录下运行下面命令,将数据集准备至 '${OpenCompass}/data' 目录下:
在 OpenCompass 项目根目录下运行下面命令,将数据集准备至 `${OpenCompass}/data` 目录下:
```bash
wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
@ -63,50 +63,63 @@ OpenCompass 的评测以配置文件为中心,必须包含 `datasets` 和 `mod
运行前确保已经安装了 OpenCompass本实验可以在单张 _GTX-1660-6G_ 显卡上成功运行。
更大参数的模型,如 Llama-7B, 可参考 [configs](https://github.com/InternLM/opencompass/tree/main/configs) 中其他例子。
使用以下命令在本地启动评测任务(运行中需要联网自动下载数据集和模型,模型下载较慢)
由于 OpenCompass 默认使用并行的方式进行评测,为了便于及时发现问题,我们可以在首次启动时使用 debug 模式运行,该模式会将任务串行执行,并会实时输出任务的执行进度。
```bash
python run.py configs/eval_demo.py
python run.py configs/eval_demo.py -w outputs/demo --debug
```
运行 demo 期间,我们来仔细了解本案例中的配置内容以及启动选项。
如果一切正常,屏幕上会出现 "Starting inference process"
```bash
Loading cached processed dataset at .cache/huggingface/datasets/social_i_qa/default/0.1.0/674d85e42ac7430d3dcd4de7007feaffcb1527c535121e09bab2803fbcc925f8/cache-742512eab30e8c9c.arrow
[2023-07-12 18:23:55,076] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
```
此时可以使用 `ctrl+c` 中断 debug 模式的执行,并运行以下命令进行并行评测:
```bash
python run.py configs/eval_demo.py -w outputs/demo
```
运行 demo 期间,我们来介绍一下本案例中的配置内容以及启动选项。
## 步骤详解
<details>
<summary><b>数据集列表`datasets`</b></summary>
### 数据集列表 `datasets`
以下为 `configs/eval_demo.py` 中与数据集相关的配置片段:
```python
from mmengine.config import read_base # 使用 mmengine 的 config 机制
from mmengine.config import read_base # 使用 mmengine.read_base() 读取基础配置
with read_base():
# 直接从预设数据集配置中读取需要的数据集配置
from .datasets.winograd.winograd_ppl import winograd_datasets
from .datasets.siqa.siqa_gen import siqa_datasets
from .datasets.winograd.winograd_ppl import winograd_datasets # 读取 Winograd 的配置,基于 PPL (perplexity) 进行评测
from .datasets.siqa.siqa_gen import siqa_datasets # 读取 SIQA 的配置,基于生成式进行评测
datasets = [*siqa_datasets, *winograd_datasets] # 最后 config 需要包含所需的评测数据集列表 datasets
```
[configs/datasets](https://github.com/InternLM/OpenCompass/blob/main/configs/datasets) 包含各种数据集预先定义好的配置文件;
部分数据集文件夹下有 `ppl` 和 `gen` 两类配置文件,表示使用的评估方式,其中 `ppl` 表示使用判别式评测,`gen` 表示使用生成式评测。
[configs/datasets/collections](https://github.com/InternLM/OpenCompass/blob/main/configs/datasets/collections) 存放了各类数据集集合,方便做综合评测。
更多信息可查看 [配置数据集](./user_guides/dataset_prepare.md)
更多介绍可查看 [数据集配置](./user_guides/dataset_prepare.md)。
</details>
### 模型列表 `models`
<details>
<summary><b>模型列表`models`</b></summary>
HuggingFace 中的 'facebook/opt-350m' 以及 'facebook/opt-125m' 支持自动下载权重,所以不需要额外下载权重:
OpenCompass 支持直接在配置中指定待测试的模型列表,对于 HuggingFace 模型来说,用户通常无需添加代码。下面为相关的配置片段:
```python
from opencompass.models import HuggingFaceCausalLM # 提供直接使用 HuggingFaceCausalLM 模型的接口
# 提供直接使用 HuggingFaceCausalLM 模型的接口
from opencompass.models import HuggingFaceCausalLM
# OPT-350M
opt350m = dict(
type=HuggingFaceCausalLM,
# 以下参数为 HuggingFaceCausalLM 的初始化参数
# 以下参数为 HuggingFaceCausalLM 相关的初始化参数
path='facebook/opt-350m',
tokenizer_path='facebook/opt-350m',
tokenizer_kwargs=dict(
@ -115,9 +128,9 @@ opt350m = dict(
proxies=None,
trust_remote_code=True),
model_kwargs=dict(device_map='auto'),
max_seq_len=2048,
# 下参数为各类模型都有的参数,非 HuggingFaceCausalLM 的初始化参数
# 下列参数为所有模型均需设定的初始化参数,非 HuggingFaceCausalLM 独有
abbr='opt350m', # 模型简称,用于结果展示
max_seq_len=2048, # 模型能接受的最大序列长度
max_out_len=100, # 最长生成 token 数
batch_size=64, # 批次大小
run_cfg=dict(num_gpus=1), # 运行配置,用于指定资源需求
@ -135,9 +148,9 @@ opt125m = dict(
proxies=None,
trust_remote_code=True),
model_kwargs=dict(device_map='auto'),
max_seq_len=2048,
# 下参数为各类模型都有的参数,非 HuggingFaceCausalLM 的初始化参数
# 下列参数为所有模型均需设定的初始化参数,非 HuggingFaceCausalLM 独有
abbr='opt125m', # 模型简称,用于结果展示
max_seq_len=2048, # 模型能接受的最大序列长度
max_out_len=100, # 最长生成 token 数
batch_size=128, # 批次大小
run_cfg=dict(num_gpus=1), # 运行配置,用于指定资源需求
@ -146,12 +159,13 @@ opt125m = dict(
models = [opt350m, opt125m]
```
</details>
HuggingFace 中的 'facebook/opt-350m' 以及 'facebook/opt-125m' 权重会在运行时自动下载。
<details>
<summary><b>启动评测</b></summary>
关于模型配置的更多介绍可阅读 [准备模型](./user_guides/models.md)。
首先,我们可以使用 debug 模式启动任务,以检查模型加载、数据集读取是否出现异常,如未正确读取缓存等。
### 启动评测
配置文件准备完毕后,我们可以使用 debug 模式启动任务,以检查模型加载、数据集读取是否出现异常,如未正确读取缓存等。
```shell
python run.py configs/eval_demo.py -w outputs/demo --debug
@ -165,7 +179,7 @@ python run.py configs/eval_demo.py -w outputs/demo
以下是一些与评测相关的参数,可以帮助你根据自己的环境情况配置更高效的推理任务。
- `-w outputs/demo`: 评测日志及结果保存目录
- `-w outputs/demo`: 评测日志及结果保存目录。若不指定,则默认为 `outputs/default`
- `-r`: 重启上一次(中断的)评测
- `--mode all`: 指定进行某一阶段的任务
- all: 进行全阶段评测,包括推理和评估
@ -181,14 +195,10 @@ python run.py configs/eval_demo.py -w outputs/demo
- `--partition(-p) my_part`: slurm 集群分区
- `--retry 2`: 任务出错重试次数
The entry also supports submitting tasks to Alibaba Deep Learning Center (DLC), and more customized evaluation strategies. Please refer to [Launching an Evaluation Task](./user_guides/experimentation.md#launching-an-evaluation-task) for details.
```{tip}
这个脚本同样支持将任务提交到阿里云深度学习中心DLC上运行以及更多定制化的评测策略。请参考 [评测任务发起](./user_guides/experimentation.md#评测任务发起) 了解更多细节。
```
</details>
## 评测结果
评测完成后,会打印评测结果表格如下:
@ -200,33 +210,28 @@ siqa e78df3 accuracy gen 21.55 12.44
winograd b6c7ed accuracy ppl 51.23 49.82
```
所有过程的日志,预测,以及最终结果会默认放在`outputs/default/`目录下。目录结构如下所示:
所有过程的日志,预测,以及最终结果会放在 `outputs/demo/` 目录下。目录结构如下所示:
```text
outputs/default/
├── 20200220_120000
├── 20230220_183030 # 一次实验
│   ├── configs # 可复现 config
│   ├── logs # 日志
│   ├── configs # 每次实验都会在此处存下用于追溯的 config
│   ├── logs # 运行日志
│   │   ├── eval
│   │   └── infer
│   ├── predictions # 推理结果,每一条数据推理结果
│   └── results # 评估结论,一个评估实验的数值结论
│   ├── predictions # 储存了每个任务的推理结果
│   ├── results # 储存了每个任务的评测结果
│   └── summary # 汇总每次实验的所有评测结果
├── ...
```
其中,每一个时间戳文件夹代表一次实验中存在以下内容:
- 'configs':用于存放可复现配置文件;
- 'logs':用于存放**推理**和**评测**两个阶段的日志文件
- 'predicitions':用于存放推理结果格式为json
- 'results': 用于存放评测最终结果总结。
## 更多教程
想要更多了解 OpenCompass, 可以点击下列链接学习。
- [如何配置数据集](./user_guides/dataset_prepare.md)
- [如何定制模型](./user_guides/models.md)
- [深入了解启动实验](./user_guides/experimentation.md)
- [数据集配置](./user_guides/dataset_prepare.md)
- [准备模型](./user_guides/models.md)
- [任务运行和监控](./user_guides/experimentation.md)
- [如何调Prompt](./prompt/overview.md)
- [学习配置文件](./user_guides/config.md)


@ -6,15 +6,15 @@ OpenCompass 上手路线
为了用户能够快速上手,我们推荐以下流程:
- 对于想要使用 OpenCompass 的用户,我们推荐先阅读 开始你的第一步_ 部分来设置环境。
- 对于想要使用 OpenCompass 的用户,我们推荐先阅读 开始你的第一步_ 部分来设置环境,并启动一个迷你实验熟悉流程
- 如果您想调整提示语,您可以浏览 提示语_ 。
- 对于一些基础使用,我们建议用户阅读 教程_ 。
- 对于一些基础使用,我们建议用户阅读 教程_ 。
- 如果您想调整提示词prompt您可以浏览 提示词_ 。
- 若您想进行算法的自定义,我们提供了 进阶教程_ 。
- 若您想进行更多模块的自定义,例如增加数据集和模型,我们提供了 进阶教程_ 。
- 我们同样提供了 工具_ 。
- 还有更多实用的工具,如提示词预览、飞书机器人上报等功能,我们同样提供了 工具_ 教程
我们始终非常欢迎用户的 PRs 和 Issues 来完善 OpenCompass
@ -32,15 +32,15 @@ OpenCompass 上手路线
:caption: 教程
user_guides/config.md
user_guides/dataset_prepare.md
user_guides/datasets.md
user_guides/models.md
user_guides/evaluation.md
user_guides/experimentation.md
.. _提示语:
.. _提示词:
.. toctree::
:maxdepth: 1
:caption: 提示语
:caption: 提示词
prompt/few_shot.md
prompt/prompt_template.md
@ -68,13 +68,6 @@ OpenCompass 上手路线
notes/contribution_guide.md
.. toctree::
:caption: 切换语言
English <https://OpenCompass.readthedocs.io/en/latest/>
简体中文 <https://OpenCompass.readthedocs.io/zh_CN/latest/>
索引与表格
==================


@ -1,6 +1,6 @@
# 数据集配置
# 配置数据集
本节教程主要关注如何构建需要的配置文件完成数据集的选择
本节教程主要关注如何选择和配置所需要的数据集。请确保你已按照[数据集准备](../get_started.md#数据集准备)中的步骤下载好数据集
## 数据集配置文件目录结构