update docs (#14)

* update docs

* update docs

* update docs
Ezra-Yu 2023-07-06 12:41:17 +08:00 committed by GitHub
parent 86d5ec3d0f
commit 83dac269bd
14 changed files with 209 additions and 61 deletions

configs/eval_llama_7b.py Normal file

@ -0,0 +1,24 @@
from mmengine.config import read_base
with read_base():
from .datasets.piqa.piqa_ppl import piqa_datasets
from .datasets.siqa.siqa_gen import siqa_datasets
datasets = [*piqa_datasets, *siqa_datasets]
from opencompass.models import HuggingFaceCausalLM
models = [
dict(
type=HuggingFaceCausalLM,
path='huggyllama/llama-7b',
tokenizer_path='huggyllama/llama-7b',
tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
max_seq_len=2048,
abbr='llama-7b',
max_out_len=100,
batch_size=16,
run_cfg=dict(num_gpus=1),
)
]


@ -1 +1,3 @@
# New Dataset
Coming soon.


@ -1 +1,3 @@
# New Model
Coming soon.


@ -21,7 +21,7 @@ pip install -e .
If you want to perform evaluations on the humaneval dataset, follow these steps.
```
```bash
git clone https://github.com/openai/human-eval.git
cd human-eval
pip install -r requirements.txt
@ -39,11 +39,30 @@ resources that meet the minimum requirements for LLaMA-7B.
## Prepare the Dataset
Create a `data` folder in the repository directory and place the dataset files in the `data` folder.
To start a simple evaluation task using OpenCompass, you generally need to follow three steps:
## Prepare the Evaluation Configuration File
1. **Prepare dataset configurations** - [`configs/datasets`](https://github.com/open-mmlab/OpenCompass/tree/main/configs/datasets) provides over 50 datasets supported by OpenCompass.
2. **Prepare model configurations** - [`configs/models`](https://github.com/open-mmlab/OpenCompass/tree/main/configs/models) contains sample configuration files for already supported large models, including HuggingFace-based models and API-based models such as ChatGPT.
3. **Use the `run` script to launch** - A single command can launch an evaluation locally or on Slurm, and multiple datasets and models can be tested at once.
Create the following configuration file `configs/llama.py`:
In this example, we will demonstrate how to evaluate the LLaMA-7B pre-trained base model on two benchmark tasks, SIQA and PIQA. Before proceeding, ensure that you have installed OpenCompass and have access to sufficient GPU computing resources that meet the minimum requirements for LLaMA-7B.
To initiate the evaluation task on your local machine, use the following command:
```bash
python run.py configs/eval_llama_7b.py --debug
```
Here's a detailed step-by-step explanation of this case study:
## Step by step
<details>
<summary>Prepare datasets</summary>
The siqa and piqa benchmarks can be downloaded automatically from Hugging Face ([siqa](https://huggingface.co/datasets/siqa), [piqa](https://huggingface.co/datasets/piqa)), so no manual downloading is required here. However, some other datasets may require manual downloads; please refer to the documentation [Prepare Datasets](docs/zh_cn/user_guides/dataset_prepare.md) for more information.
Create a `.py` configuration file and add the following content:
```python
from mmengine.config import read_base
@ -55,12 +74,20 @@ with read_base():
# Concatenate the datasets to be evaluated into the datasets field
datasets = [*piqa_datasets, *siqa_datasets]
```
</details>
<details>
<summary>Prepare models</summary>
The pretrained model 'huggyllama/llama-7b' from HuggingFace supports automatic downloading. Add the following to your configuration file:
```python
# Evaluate models supported by HuggingFace's `AutoModelForCausalLM` using `HuggingFaceCausalLM`
from opencompass.models import HuggingFaceCausalLM
models = [
dict(
llama_7b = dict(
type=HuggingFaceCausalLM,
# Initialization parameters for `HuggingFaceCausalLM`
path='huggyllama/llama-7b',
@ -73,10 +100,14 @@ models = [
batch_size=16,
run_cfg=dict(num_gpus=1), # Run configuration for specifying resource requirements
)
]
models = [llama_7b]
```
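The `models` list can hold more than one entry, which is how several models are evaluated in a single run. As a hedged illustration only (the 13B checkpoint, batch size, and GPU count below are assumptions for the sketch, not part of this walkthrough), a second model could be added to the same configuration like this:
```python
from opencompass.models import HuggingFaceCausalLM

# Hypothetical second entry; adjust path, batch size and GPU count to your setup.
llama_13b = dict(
    type=HuggingFaceCausalLM,
    path='huggyllama/llama-13b',
    tokenizer_path='huggyllama/llama-13b',
    tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
    max_seq_len=2048,
    abbr='llama-13b',
    max_out_len=100,
    batch_size=8,               # smaller batch for the larger model (assumption)
    run_cfg=dict(num_gpus=2),   # assumed resource requirement
)

models = [llama_7b, llama_13b]  # both models are evaluated in one run
```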
## Start the Evaluation
</details>
<details>
<summary>Launch the evaluation</summary>
First, we can start the task in **debug mode** to check for any exceptions in model loading, dataset reading, or incorrect cache usage.
@ -109,6 +140,8 @@ If you are not performing the evaluation on your local machine but using a Slurm
- `--partition my_part`: Slurm cluster partition.
- `--retry 2`: Number of retries for failed tasks.
</details>
## Obtaining Evaluation Results
After the evaluation is complete, the evaluation results table will be printed as follows:
@ -120,4 +153,26 @@ piqa 1cf9f0 accuracy ppl 77.75
siqa e78df3 accuracy gen 36.08
```
Additionally, the text and CSV format result files will be saved in the `summary` folder of the result directory.
All run outputs are saved to the `outputs/default/` directory by default, with the following structure:
```text
outputs/default/
├── 20200220_120000
├── ...
├── 20230220_183030
│ ├── configs
│ ├── logs
│ │ ├── eval
│ │ └── infer
│ ├── predictions
│ │ └── MODEL1
│ └── results
│ └── MODEL1
```
Each timestamp folder contains the following items (a small helper sketch for browsing them follows this list):
- `configs`: the configuration files corresponding to every run that used this timestamped output directory;
- `logs`: the inference and evaluation log files of the different models;
- `predictions`: the inference results in JSON format, grouped by model;
- `results`: the evaluation results in JSON format, grouped by model.
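As a convenience, here is a minimal sketch (not part of OpenCompass itself) for browsing the per-model JSON files in the `results` folder of one run; the timestamp is a placeholder taken from the tree above, and the `results/<model>/<dataset>.json` layout is assumed from the folder description:
```python
import json
from pathlib import Path

# Placeholder timestamp; pick an actual folder under outputs/default/.
run_dir = Path('outputs/default/20230220_183030')

for result_file in sorted((run_dir / 'results').glob('*/*.json')):
    with open(result_file) as f:
        scores = json.load(f)  # evaluation results for one model on one dataset
    print(result_file.relative_to(run_dir), scores)
```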


@ -7,15 +7,15 @@ Hands-on Roadmap of OpenCompass
To help users quickly utilize OpenCompass, we recommend following the hands-on
roadmap we have created for the library:
- For users who want to use OpenCompass, we recommend reading the GetStarted_ section first to set up the environment.
- For users who want to use OpenCompass, we recommend reading the GetStarted_ section first to set up the environment.
- For some basic usage, we suggest users read the UserGuides_.
- For some basic usage, we suggest users read the UserGuides_.
- If you want to customize the algorithm, we have provided the AdvancedGuides_.
- If you want to adjust the prompts, you can browse the Prompt_.
- If you want to adjust the prompts, you can browse the Prompt_.
- If you want to customize the algorithm, we have provided the AdvancedGuides_.
- We also offer the Tools_.
- We also offer the Tools_.
We always welcome *PRs* and *Issues* for the betterment of OpenCompass.
@ -31,13 +31,11 @@ We always welcome *PRs* and *Issues* for the betterment of OpenCompass.
:maxdepth: 1
:caption: UserGuides
user_guides/framework_overview.md
user_guides/config.md
user_guides/dataset_prepare.md
user_guides/models.md
user_guides/evaluation.md
user_guides/experimentation.md
user_guides/metrics.md
.. _AdvancedGuides:
.. toctree::
@ -52,7 +50,6 @@ We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
:maxdepth: 1
:caption: Prompt
prompt/overview.md
prompt/few_shot.md
prompt/prompt_template.md
prompt/meta_template.md


@ -1 +1,3 @@
# In-context Learning
# In-context Learning
Coming soon.


@ -1 +1,3 @@
# Meta-Prompt
# Meta-Prompt
Coming soon.


@ -1 +1,3 @@
# Prompt Template
# Prompt Template
Coming soon.


@ -19,9 +19,9 @@ pip install -e .
3. Install humaneval (optional)
If you want to evaluate on the humaneval dataset, perform this step.
If you need to evaluate on the humaneval dataset, perform this step; otherwise, skip it.
```
```bash
git clone https://github.com/openai/human-eval.git
cd human-eval
pip install -r requirements.txt
@ -33,33 +33,58 @@ cd ..
# Quick Start
In this section, we take evaluating the performance of LLaMA-7B on SIQA and PIQA as an example to walk you through some basic features of OpenCompass. Before running,
Launching a simple evaluation task generally takes three steps:
1. **Prepare the datasets and their configurations** - [`configs/datasets`](https://github.com/open-mmlab/OpenCompass/tree/main/configs/datasets) provides more than 50 datasets already supported by OpenCompass.
2. **Prepare the model configurations** - [`configs/models`](https://github.com/open-mmlab/OpenCompass/tree/main/configs/models) provides sample configurations of supported large models, including HuggingFace-based models and API models such as ChatGPT.
3. **Launch with the `run` script** - A single command can launch an evaluation locally or on Slurm, and multiple datasets and models can be tested at once.
We take evaluating the LLaMA-7B pre-trained base model on SIQA and PIQA as an example to walk you through some basic features of OpenCompass. Before running,
please make sure you have installed OpenCompass and that your local machine or cluster has GPU resources that meet the minimum requirements of LLaMA-7B.
## Prepare the Dataset
Launch the evaluation task locally with the following command (an Internet connection is required to download the datasets and models automatically; the model download can be slow):
Create a `data` folder in the repository directory and place the dataset files in it.
```bash
python run.py configs/eval_llama_7b.py --debug
```
## Prepare the Evaluation Configuration File
A detailed step-by-step explanation of this example follows.
Create the following configuration file `configs/llama.py`:
## Step by Step
<details>
<summary>Prepare the datasets and their configurations</summary>
Since [siqa](https://huggingface.co/datasets/siqa) and [piqa](https://huggingface.co/datasets/piqa) support automatic downloading, there is no need to download the datasets manually here; however, some datasets may require manual downloading. See the documentation [Prepare Datasets](docs/zh_cn/user_guides/dataset_prepare.md) for details.
Create a `.py` configuration file and add the following content:
```python
from mmengine.config import read_base
from mmengine.config import read_base  # uses mmengine's config mechanism
with read_base():
# read the required dataset configurations directly from the preset dataset configurations
from .datasets.piqa.piqa_ppl import piqa_datasets
from .datasets.siqa.siqa_gen import siqa_datasets
datasets = [*piqa_datasets, *siqa_datasets]  # in the end, the config must contain the list of datasets to evaluate
```
# concatenate the datasets to be evaluated into the datasets field
datasets = [*piqa_datasets, *siqa_datasets]
[configs/datasets](https://github.com/InternLM/OpenCompass/blob/main/configs/datasets) contains predefined configuration files for various datasets. For example, the [piqa](https://github.com/InternLM/OpenCompass/blob/main/configs/) folder holds piqa definitions with different prompt versions, where `ppl` denotes discriminative (perplexity-based) evaluation and `gen` denotes generative evaluation. [configs/datasets/collections](https://github.com/InternLM/OpenCompass/blob/main/configs/datasets/collections) stores collections of datasets for convenient comprehensive evaluation.
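For example, to evaluate piqa generatively instead of discriminatively, you would import the corresponding `gen` configuration. A hedged sketch follows; the module name `piqa_gen` is an assumption, so check the files under `configs/datasets/piqa/` for the variants that actually exist:
```python
from mmengine.config import read_base

with read_base():
    # assumed generative variant of the piqa config; verify the exact file name
    from .datasets.piqa.piqa_gen import piqa_datasets
    from .datasets.siqa.siqa_gen import siqa_datasets

datasets = [*piqa_datasets, *siqa_datasets]
```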
# use HuggingFaceCausalLM to evaluate models supported by HuggingFace's AutoModelForCausalLM
from opencompass.models import HuggingFaceCausalLM
</details>
models = [
dict(
<details>
<summary>Prepare models</summary>
[configs/models](https://github.com/InternLM/OpenCompass/blob/main/configs/models) contains examples of models that are already supported, such as gpt3.5 and hf_llama.
The HuggingFace model 'huggyllama/llama-7b' supports automatic downloading; add the following to the configuration file:
```python
from opencompass.models import HuggingFaceCausalLM  # provides an interface for using HuggingFaceCausalLM models directly
llama_7b = dict(
type=HuggingFaceCausalLM,
# the following are initialization parameters for HuggingFaceCausalLM
path='huggyllama/llama-7b',
@ -72,21 +97,25 @@ models = [
batch_size=16,  # batch size
run_cfg=dict(num_gpus=1),  # run configuration for specifying resource requirements
)
]
models = [llama_7b]  # in the end, the config must contain the list of models to evaluate
```
## Launch the Evaluation
</details>
<details>
<summary>Launch the evaluation</summary>
First, we can launch the task in debug mode to check for problems with model loading or dataset reading, such as a cache not being read correctly.
```shell
python run.py configs/llama.py -w outputs/llama --debug
python run.py configs/eval_llama_7b.py -w outputs/llama --debug
```
In `--debug` mode, tasks are executed sequentially one by one; after confirming there are no errors, you can turn off `--debug` mode so that the program makes full use of multiple GPUs:
```shell
python run.py configs/llama.py -w outputs/llama
python run.py configs/eval_llama_7b.py -w outputs/llama
```
Below are some evaluation-related parameters that can help you configure more efficient inference tasks for your environment.
@ -104,10 +133,12 @@ python run.py configs/llama.py -w outputs/llama
If you are evaluating on a Slurm cluster rather than on your local machine, you can specify the following parameters:
- `--slurm`: submit tasks to the cluster via Slurm
- `--partition my_part`: Slurm cluster partition
- `--partition(-p) my_part`: Slurm cluster partition
- `--retry 2`: number of retries for failed tasks
## Obtain the Evaluation Results
</details>
## Evaluation Results
After the evaluation completes, the results table is printed as follows:
@ -118,4 +149,26 @@ piqa 1cf9f0 accuracy ppl 77.75
siqa e78df3 accuracy gen 36.08
```
In addition, result files in txt and csv formats are saved in the `summary` folder of the output directory.
All run outputs are placed in the `outputs/default/` directory by default, with the following structure:
```text
outputs/default/
├── 20200220_120000
├── ...
├── 20230220_183030
│   ├── configs
│   ├── logs
│   │   ├── eval
│   │   └── infer
│   ├── predictions
│   │   └── MODEL1
│   └── results
│       └── MODEL1
```
Each timestamp folder contains the following:
- `configs`: the configuration files of every run that used this timestamp as its output directory;
- `logs`: the log files of the inference and evaluation stages, with per-model subfolders inside each;
- `predictions`: the inference results in JSON format, with per-model subfolders;
- `results`: the evaluation results in JSON format, with per-model subfolders.


@ -8,12 +8,12 @@ OpenCompass Hands-on Roadmap
- For users who want to use OpenCompass, we recommend reading the 开始你的第一步_ section first to set up the environment.
- If you want to adjust the prompts, you can browse 提示语_ .
- For some basic usage, we suggest users read 教程_ .
- If you want to customize an algorithm, we provide 进阶教程_ .
- If you want to adjust the prompts, you can browse 提示语_ .
- We also offer 工具_ .
@ -31,13 +31,20 @@ OpenCompass Hands-on Roadmap
:maxdepth: 1
:caption: 教程
user_guides/framework_overview.md
user_guides/config.md
user_guides/dataset_prepare.md
user_guides/models.md
user_guides/evaluation.md
user_guides/experimentation.md
user_guides/metrics.md
.. _提示语:
.. toctree::
:maxdepth: 1
:caption: 提示语
prompt/few_shot.md
prompt/prompt_template.md
prompt/meta_template.md
.. _进阶教程:
.. toctree::
@ -47,16 +54,6 @@ OpenCompass 上手路线
advanced_guides/new_dataset.md
advanced_guides/new_model.md
.. _提示语:
.. toctree::
:maxdepth: 1
:caption: 提示语
prompt/overview.md
prompt/few_shot.md
prompt/prompt_template.md
prompt/meta_template.md
.. _工具:
.. toctree::
:maxdepth: 1


@ -1 +1,3 @@
# Few-shot
# Few-shot
Coming soon.


@ -1 +1,3 @@
# Prompt Template
# Prompt Template
Coming soon.


@ -5,9 +5,16 @@
The program entry point for evaluation tasks is `run.py`, used as follows:
```shell
run.py [-p PARTITION] [-q QUOTATYPE] [--debug] [-m MODE] [-r [REUSE]] [-w WORKDIR] [-l LARK] config
run.py {--slurm | --dlc | None} $Config [-p PARTITION] [-q QUOTATYPE] [--debug] [-m MODE] [-r [REUSE]] [-w WORKDIR] [-l LARK]
```
Launch modes:
- Run on the local machine: `run.py $Config`, where `$Config` contains neither the `eval` nor the `infer` field.
- Run with srun: `run.py $Config --slurm -p $PARTITION_name`
- Run with dlc: `run.py $Config --dlc --aliyun-cfg $AliYun_Cfg` (a tutorial will follow later)
- Customized launch: `run.py $Config`, where `$Config` contains the `eval` and `infer` fields; see the [evaluation documentation](./evaluation.md).
The parameters are explained as follows:
- -p: specify the Slurm partition;
@ -18,14 +25,12 @@ run.py [-p PARTITION] [-q QUOTATYPE] [--debug] [-m MODE] [-r [REUSE]] [-w WORKDI
- -w: specify the working directory, defaults to ./outputs/default;
- -l: enable status reporting via the Lark (Feishu) bot.
Taking run mode `-m all` as an example, the overall workflow is as follows:
1. Read the configuration file and parse out the model, dataset, evaluator, and other configuration information.
2. The evaluation task is split into three stages: inference (infer), evaluation (eval), and visualization (viz). Inference and evaluation are partitioned into sub-tasks by the Partitioner and then executed in parallel by the Runner; a single inference or evaluation task is abstracted as an OpenICLInferTask or OpenICLEvalTask (a hedged config sketch follows this list).
3. After both stages finish, the visualization stage reads the evaluation results in `results` and generates a visualization report.
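For the customized launch mentioned above, the config itself carries `infer`/`eval` sections that describe the partitioner, runner, and task. The sketch below is only illustrative: the module paths (`opencompass.partitioners.SizePartitioner`, `opencompass.runners.LocalRunner`, `opencompass.tasks.OpenICLInferTask`) and field names are assumptions to be checked against the [evaluation documentation](./evaluation.md):
```python
# Hedged sketch of an `infer` section; names and fields are assumptions.
from opencompass.partitioners import SizePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

infer = dict(
    # split inference into bounded-size sub-tasks before dispatching
    partitioner=dict(type=SizePartitioner),
    # run the partitioned sub-tasks in parallel on the local machine
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,               # assumed concurrency limit
        task=dict(type=OpenICLInferTask),
    ),
)
```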
## Task Monitoring: Lark (Feishu) Bot
Users can monitor task status in real time by configuring a Lark (Feishu) bot; for the bot setup documentation, [see here](https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN?lang=zh-CN#7a28964d).
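A hedged sketch of how the bot might be wired into a config, assuming the webhook is exposed through a `lark_bot_url` field (this field name is an assumption; follow the linked setup document for the authoritative steps) and that reporting is enabled with `-l` at launch:
```python
# Assumed config field holding the Lark/Feishu webhook; the URL is a placeholder.
lark_bot_url = 'https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxx'
```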
@ -64,7 +69,7 @@ run.py [-p PARTITION] [-q QUOTATYPE] [--debug] [-m MODE] [-r [REUSE]] [-w WORKDI
All run outputs are placed in the `outputs/default/` directory by default, with the following structure:
```
```text
outputs/default/
├── 20200220_120000
├── ...
@ -80,6 +85,7 @@ outputs/default/
```
Each timestamp folder contains the following:
- `configs`: the configuration files of every run that used this timestamp as its output directory;
- `logs`: the log files of the inference and evaluation stages, with per-model subfolders inside each;
- `predictions`: the inference results in JSON format, with per-model subfolders;


@ -1 +1,3 @@
# Evaluation Metrics
Coming soon.