Mirror of https://github.com/open-compass/opencompass.git, synced 2025-05-30 16:03:24 +08:00
[Docs] Results persistance (#1908)
* feat persistance.md * doc * doc * lint * doc * fix * doc
This commit is contained in:
parent
fff2d51440
commit
54324657f0
65 docs/en/advanced_guides/persistence.md Normal file
@ -0,0 +1,65 @@
# Evaluation Results Persistence

## Introduction

Normally, OpenCompass saves evaluation results to your work directory. In some cases, however, users may need to share results with each other or quickly browse existing public evaluation results. We therefore provide an interface that quickly transfers evaluation results to an external public data station and, on top of it, supports uploading, overwriting, and reading results.
## Quick Start

### Uploading

You can store evaluation results in a path you specify either by adding an argument to the evaluation command or by adding a configuration entry to the Eval script. Examples:
(Approach 1) Add the `-sp` option to the command and specify your public path address.

```bash
opencompass ... -sp '/your_path'
```
(Approach 2) Add the configuration in the Eval script.

```python
station_path = '/your_path'
```
### Overwriting

Before uploading, the storage method above checks whether results for the same task already exist in the data station, based on the `abbr` attribute in the model and dataset configurations. If results already exist, the upload is skipped. To overwrite those results, add the `--station-overwrite` option to the command, for example:

```bash
opencompass ... -sp '/your_path' --station-overwrite
```
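The pre-upload check can be pictured as the following sketch. This is an illustration only, not OpenCompass's actual implementation: the function name `should_upload` and the arguments `dataset_abbr` / `model_abbr` are assumed names, and the path layout follows the storage format described later in this document.

```python
import os

def should_upload(station_path: str, dataset_abbr: str, model_abbr: str,
                  overwrite: bool = False) -> bool:
    """Sketch of the pre-upload check: skip when a result for the same
    model-dataset pair already exists, unless overwriting is requested."""
    result_file = os.path.join(station_path, dataset_abbr, f'{model_abbr}.json')
    if os.path.exists(result_file) and not overwrite:
        return False  # same task result already in the data station
    return True
```

Passing `overwrite=True` here corresponds to adding `--station-overwrite` on the command line.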
### Reading

You can read existing results directly from the data station to avoid running duplicate evaluation tasks. The retrieved results feed directly into the `summarize` step. With this configuration, only tasks whose results are not stored in the data station are launched. Here is an example:

```bash
opencompass ... -sp '/your_path' --read-from-station
```
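With `--read-from-station`, task selection behaves roughly like this sketch; the function name and the `(dataset_abbr, model_abbr)` tuple shape are illustrative assumptions, not the real internals:

```python
import os
from typing import List, Tuple

def tasks_to_launch(station_path: str,
                    tasks: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (dataset_abbr, model_abbr) pairs with no stored result;
    pairs already in the data station are read for the summarize step."""
    pending = []
    for dataset_abbr, model_abbr in tasks:
        result_file = os.path.join(station_path, dataset_abbr,
                                   f'{model_abbr}.json')
        if not os.path.exists(result_file):
            pending.append((dataset_abbr, model_abbr))
    return pending
```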
### Command Combination

1. Upload only the results under your latest working directory to the data station, without re-running tasks whose results are missing:

```bash
opencompass ... -sp '/your_path' -r latest -m viz
```
## Storage Format of the Data Station

In the data station, evaluation results are stored as one `json` file per `model-dataset` pair. The directory layout is `/your_path/dataset_name/model_name.json`. Each `json` file stores a dictionary with the corresponding results, containing the three keys `predictions`, `results`, and `cfg`:

```python
Result = {
    'predictions': List[Dict],
    'results': Dict,
    'cfg': Dict = {
        'models': Dict,
        'datasets': Dict,
        # only for subjective datasets
        'judge_models': Dict,
    }
}
```

Among these three keys, `predictions` records the model's prediction for each item in the dataset, `results` records the model's overall score on the dataset, and `cfg` records the detailed configurations of the model and dataset for this evaluation task.
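As a concrete illustration of this layout, the sketch below writes one such file and reads it back. The helper `save_result` and all field contents (`origin_prompt`, `accuracy`, the abbreviations) are dummy assumptions for the example; real files are produced by OpenCompass itself.

```python
import json
import os

def save_result(station_path: str, dataset_abbr: str, model_abbr: str,
                result: dict) -> str:
    """Write one model-dataset result to station/dataset/model.json."""
    dataset_dir = os.path.join(station_path, dataset_abbr)
    os.makedirs(dataset_dir, exist_ok=True)
    path = os.path.join(dataset_dir, f'{model_abbr}.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    return path

# Dummy record following the three-key schema above.
record = {
    'predictions': [{'origin_prompt': '1+1=?', 'prediction': '2'}],
    'results': {'accuracy': 100.0},
    'cfg': {'models': {'abbr': 'demo_model'},
            'datasets': {'abbr': 'demo_dataset'}},
}
```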
@ -67,6 +67,7 @@ We always welcome *PRs* and *Issues* for the betterment of OpenCompass.
advanced_guides/code_eval.md
advanced_guides/code_eval_service.md
advanced_guides/subjective_evaluation.md
advanced_guides/persistence.md

.. _Tools:
.. toctree::
65 docs/zh_cn/advanced_guides/persistence.md Normal file
@ -0,0 +1,65 @@
# Evaluation Results Persistence

## Introduction

Normally, OpenCompass saves evaluation results to the work directory. In some cases, however, users may need to share data with each other or quickly browse existing public evaluation results. We therefore provide an interface that quickly transfers evaluation results to an external public data station and, on top of it, supports uploading, updating, and reading results.
## Quick Start

### Storing Data to the Data Station

You can store the results of an evaluation in a path you specify either by adding an argument to the CLI evaluation command or by adding a configuration entry to the Eval script. Examples:
(Approach 1) Add the `-sp` option to the command and specify your public path address.

```bash
opencompass ... -sp '/your_path'
```
(Approach 2) Add the configuration in the Eval script.

```python
station_path = '/your_path'
```
### Updating Data in the Data Station

Before uploading, the storage method above checks whether results for the same task already exist in the data station, based on the `abbr` attribute in the model and dataset configurations. If results already exist, the upload is skipped. To update those results, add the `--station-overwrite` option to the command, for example:

```bash
opencompass ... -sp '/your_path' --station-overwrite
```
### Reading Existing Results from the Data Station

You can read existing results directly from the data station to avoid running duplicate evaluation tasks. The retrieved results feed directly into the `summarize` step. With this configuration, only tasks whose results are not stored in the data station are launched. Example:

```bash
opencompass ... -sp '/your_path' --read-from-station
```
### Command Combination

1. Upload only the results under the latest working directory to the data station, without re-running tasks whose results are missing:

```bash
opencompass ... -sp '/your_path' -r latest -m viz
```
## Storage Format of the Data Station

In the data station, evaluation results are stored as one `json` file per `model-dataset` pair. The directory layout is `/your_path/dataset_name/model_name.json`. Each `json` file stores a dictionary with the corresponding results, containing the three keys `predictions`, `results`, and `cfg`:

```python
Result = {
    'predictions': List[Dict],
    'results': Dict,
    'cfg': Dict = {
        'models': Dict,
        'datasets': Dict,
        # only for subjective datasets
        'judge_models': Dict,
    }
}
```

Among these three keys, `predictions` records the model's prediction for each item in the dataset, `results` records the model's score on the dataset, and `cfg` records the detailed configurations of the model and dataset for this evaluation task.
@ -67,6 +67,7 @@ OpenCompass 上手路线
advanced_guides/code_eval.md
advanced_guides/code_eval_service.md
advanced_guides/subjective_evaluation.md
advanced_guides/persistence.md

.. _工具:
.. toctree::