Merge branch 'open-compass:main' into main

This commit is contained in:
bittersweet1999, 2024-07-29 13:52:21 +08:00, committed by GitHub
commit d72ca83102
GPG Key ID: B5690EEEBB952194 (no known key found for this signature in the database)
448 changed files with 4015 additions and 1059 deletions

.gitignore vendored

@ -1,4 +1,4 @@
.DS_Store
output_*/
outputs/
scripts/


@ -1,6 +1,7 @@
exclude: |
    (?x)^(
        tests/data/|
+       tests/dataset/|
        opencompass/models/internal/|
        opencompass/utils/internal/|
        opencompass/openicl/icl_evaluator/hf_metrics/|


@ -70,6 +70,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2024.07.23\]** We now support [ModelScope](https://www.modelscope.cn) datasets; you can load them on demand without downloading all the data to your local disk. Welcome to try! 🔥🔥🔥
- **\[2024.07.17\]** We have released the example data and configuration for CompassBench-202408; see [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) for more details. 🔥🔥🔥
- **\[2024.07.17\]** We are excited to announce the release of NeedleBench's [technical report](http://arxiv.org/abs/2407.11963). We invite you to visit our [support documentation](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html) for detailed evaluation guidelines. 🔥🔥🔥
- **\[2024.07.04\]** OpenCompass now supports InternLM2.5, which has **outstanding reasoning capability**, a **1M context window**, and **stronger tool use**; you can try the models in [OpenCompass Config](https://github.com/open-compass/opencompass/tree/main/configs/models/hf_internlm) and [InternLM](https://github.com/InternLM/InternLM). 🔥🔥🔥
@ -136,12 +137,29 @@ pip install -e .
### 📂 Data Preparation
You can download and extract the datasets with the following commands:
```bash
# Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
```
Alternatively, use [ModelScope](https://www.modelscope.cn) to load the datasets on demand.
Installation:
```bash
pip install modelscope
export DATASET_SOURCE=ModelScope
```
Then submit the evaluation task without downloading all the data to your local disk. Available datasets include:
```bash
humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ceval, math, LCSTS, Xsum, winogrande, openbookqa, AGIEval, gsm8k, nq, race, siqa, mbpp, mmlu, hellaswag, ARC, BBH, xstory_cloze, summedits, GAOKAO-BENCH, OCNLI, cmnli
```
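The `DATASET_SOURCE` switch is an ordinary environment variable, so it can also be set from Python before OpenCompass starts; a minimal sketch (the on-demand loading behavior itself is assumed from the description above):

```python
import os

# Equivalent to `export DATASET_SOURCE=ModelScope` in the shell:
# OpenCompass dataset loaders are assumed to check this variable and
# fetch from ModelScope on demand instead of reading the local data/ folder.
os.environ['DATASET_SOURCE'] = 'ModelScope'

print(os.environ['DATASET_SOURCE'])  # prints "ModelScope"
```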
Some third-party features, like Humaneval and Llama, may require additional steps to work properly; for detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started/installation.html).
<p align="right"><a href="#top">🔝Back to top</a></p>


@ -69,6 +69,7 @@
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2024.07.23\]** We now support [ModelScope](https://www.modelscope.cn) datasets; load them on demand without downloading all the data locally first. Welcome to try! 🔥🔥🔥
- **\[2024.07.17\]** We have released the example data and evaluation rules for the CompassBench-202408 leaderboard; see [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) for more details. 🔥🔥🔥
- **\[2024.07.17\]** We have officially released NeedleBench's [technical report](http://arxiv.org/abs/2407.11963). You are invited to visit our [documentation](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/needleinahaystack_eval.html) for evaluation guidelines. 🔥🔥🔥
- **\[2024.07.04\]** OpenCompass now supports InternLM2.5, which offers outstanding reasoning performance, effective support for a million-character ultra-long context, and upgraded tool-calling; see [OpenCompass Config](https://github.com/open-compass/opencompass/tree/main/configs/models/hf_internlm) and [InternLM](https://github.com/InternLM/InternLM). 🔥🔥🔥
@ -138,12 +139,28 @@ pip install -e .
### 📂 Data Preparation
OpenCompass supports evaluation with local datasets, which can be downloaded and extracted with the following commands:
```bash
# Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
```
You can also use [ModelScope](https://www.modelscope.cn) to load the datasets:
Environment setup:
```bash
pip install modelscope
export DATASET_SOURCE=ModelScope
```
Once the environment is configured, you can submit the evaluation task directly, without downloading all the data first. Currently supported datasets include:
```bash
humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ceval, math, LCSTS, Xsum, winogrande, openbookqa, AGIEval, gsm8k, nq, race, siqa, mbpp, mmlu, hellaswag, ARC, BBH, xstory_cloze, summedits, GAOKAO-BENCH, OCNLI, cmnli
```
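A minimal sketch of how a launch script might branch on this switch (the variable name comes from the snippet above; the fallback behavior is an assumption for illustration):

```python
import os

def dataset_source() -> str:
    # DATASET_SOURCE is the switch exported above; fall back to local files
    return os.environ.get('DATASET_SOURCE', 'Local')

if dataset_source() == 'ModelScope':
    print('loading datasets on demand from ModelScope')
else:
    print('loading datasets from the local data/ folder')
```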
Some third-party features, such as Humaneval and Llama, may require additional steps to work properly; for details, please refer to the [Installation Guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started/installation.html).
<p align="right"><a href="#top">🔝Back to top</a></p>


@ -47,7 +47,8 @@ ARC_c_datasets = [
dict(
    type=ARCDataset,
    abbr='ARC-c-test',
-   path='./data/ARC/ARC-c/ARC-Challenge-Test.jsonl',
+   path='opencompass/ai2_arc-test',
+   name='ARC-Challenge',
    reader_cfg=ARC_c_reader_cfg,
    infer_cfg=ARC_c_infer_cfg,
    eval_cfg=ARC_c_eval_cfg)


@ -35,7 +35,8 @@ ARC_c_datasets = [
dict(
    abbr='ARC-c',
    type=ARCDataset,
-   path='./data/ARC/ARC-c/ARC-Challenge-Dev.jsonl',
+   path='opencompass/ai2_arc-dev',
+   name='ARC-Challenge',
    reader_cfg=ARC_c_reader_cfg,
    infer_cfg=ARC_c_infer_cfg,
    eval_cfg=ARC_c_eval_cfg,


@ -29,7 +29,8 @@ ARC_c_datasets = [
dict(
    type=ARCDataset,
    abbr='ARC-c',
-   path='./data/ARC/ARC-c/ARC-Challenge-Dev.jsonl',
+   path='opencompass/ai2_arc-dev',
+   name='ARC-Challenge',
    reader_cfg=ARC_c_reader_cfg,
    infer_cfg=ARC_c_infer_cfg,
    eval_cfg=ARC_c_eval_cfg)


@ -46,7 +46,8 @@ ARC_c_datasets = [
dict(
    type=ARCDataset,
    abbr='ARC-c',
-   path='./data/ARC/ARC-c/ARC-Challenge-Dev.jsonl',
+   path='opencompass/ai2_arc-dev',
+   name='ARC-Challenge',
    reader_cfg=ARC_c_reader_cfg,
    infer_cfg=ARC_c_infer_cfg,
    eval_cfg=ARC_c_eval_cfg)


@ -1,3 +1,5 @@
+from mmengine.config import read_base
+# with read_base():
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
@ -26,7 +28,8 @@ ARC_c_datasets = [
dict(
    type=ARCDataset,
    abbr='ARC-c',
-   path='./data/ARC/ARC-c/ARC-Challenge-Dev.jsonl',
+   path='opencompass/ai2_arc-dev',
+   name='ARC-Challenge',
    reader_cfg=ARC_c_reader_cfg,
    infer_cfg=ARC_c_infer_cfg,
    eval_cfg=ARC_c_eval_cfg)


@ -35,7 +35,8 @@ ARC_e_datasets = [
dict(
    abbr='ARC-e',
    type=ARCDataset,
-   path='./data/ARC/ARC-e/ARC-Easy-Dev.jsonl',
+   path='opencompass/ai2_arc-easy-dev',
+   name='ARC-Easy',
    reader_cfg=ARC_e_reader_cfg,
    infer_cfg=ARC_e_infer_cfg,
    eval_cfg=ARC_e_eval_cfg,


@ -29,7 +29,8 @@ ARC_e_datasets = [
dict(
    type=ARCDataset,
    abbr='ARC-e',
-   path='./data/ARC/ARC-e/ARC-Easy-Dev.jsonl',
+   path='opencompass/ai2_arc-easy-dev',
+   name='ARC-Easy',
    reader_cfg=ARC_e_reader_cfg,
    infer_cfg=ARC_e_infer_cfg,
    eval_cfg=ARC_e_eval_cfg)


@ -46,7 +46,8 @@ ARC_e_datasets = [
dict(
    type=ARCDataset,
    abbr='ARC-e',
-   path='./data/ARC/ARC-e/ARC-Easy-Dev.jsonl',
+   path='opencompass/ai2_arc-easy-dev',
+   name='ARC-Easy',
    reader_cfg=ARC_e_reader_cfg,
    infer_cfg=ARC_e_infer_cfg,
    eval_cfg=ARC_e_eval_cfg)


@ -26,7 +26,8 @@ ARC_e_datasets = [
dict(
    type=ARCDataset,
    abbr='ARC-e',
-   path='./data/ARC/ARC-e/ARC-Easy-Dev.jsonl',
+   path='opencompass/ai2_arc-easy-dev',
+   name='ARC-Easy',
    reader_cfg=ARC_e_reader_cfg,
    infer_cfg=ARC_e_infer_cfg,
    eval_cfg=ARC_e_eval_cfg)


@ -86,15 +86,69 @@ Below are the steps for quickly downloading CHARM and using OpenCompass for eval
### 1. Download CHARM
```bash
git clone https://github.com/opendatalab/CHARM ${path_to_CHARM_repo}
-cd ${path_to_opencompass}
-mkdir data
-ln -snf ${path_to_CHARM_repo}/data/CHARM ./data/CHARM
```
### 2. Run Inference and Evaluation
```bash
cd ${path_to_opencompass}
+mkdir -p data
+ln -snf ${path_to_CHARM_repo}/data/CHARM ./data/CHARM
-# Infering and evaluating CHARM with hf_llama3_8b_instruct model
-python run.py --models hf_llama3_8b_instruct --datasets charm_gen
+# modify config file `configs/eval_charm_rea.py`: uncomment or add models you want to evaluate
+python run.py configs/eval_charm_rea.py -r --dump-eval-details
+# modify config file `configs/eval_charm_mem.py`: uncomment or add models you want to evaluate
+python run.py configs/eval_charm_mem.py -r --dump-eval-details
```
The inference and evaluation results will be placed under `${path_to_opencompass}/outputs`, like this:
```bash
outputs
├── CHARM_mem
│ └── chat
│ └── 20240605_151442
│ ├── predictions
│ │ ├── internlm2-chat-1.8b-turbomind
│ │ ├── llama-3-8b-instruct-lmdeploy
│ │ └── qwen1.5-1.8b-chat-hf
│ ├── results
│ │ ├── internlm2-chat-1.8b-turbomind_judged-by--GPT-3.5-turbo-0125
│ │ ├── llama-3-8b-instruct-lmdeploy_judged-by--GPT-3.5-turbo-0125
│ │ └── qwen1.5-1.8b-chat-hf_judged-by--GPT-3.5-turbo-0125
│   └── summary
│   └── 20240605_205020 # MEMORY_SUMMARY_DIR
│   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Anachronisms_Judgment
│   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Movie_and_Music_Recommendation
│   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Sport_Understanding
│   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Time_Understanding
│   └── judged-by--GPT-3.5-turbo-0125.csv # MEMORY_SUMMARY_CSV
└── CHARM_rea
└── chat
└── 20240605_152359
├── predictions
│ ├── internlm2-chat-1.8b-turbomind
│ ├── llama-3-8b-instruct-lmdeploy
│ └── qwen1.5-1.8b-chat-hf
├── results # REASON_RESULTS_DIR
│ ├── internlm2-chat-1.8b-turbomind
│ ├── llama-3-8b-instruct-lmdeploy
│ └── qwen1.5-1.8b-chat-hf
└── summary
├── summary_20240605_205328.csv # REASON_SUMMARY_CSV
└── summary_20240605_205328.txt
```
### 3. Generate Analysis Results
```bash
cd ${path_to_CHARM_repo}
# generate Table5, Table6, Table9 and Table10 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}
# generate Figure3 and Figure9 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}
# generate Table7, Table12, Table13 and Figure11 in https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
```
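The `# REASON_SUMMARY_CSV` and related markers in the tree above are just the newest files under each `summary/` directory; a small helper to locate the latest reasoning summary automatically might look like this (a sketch assuming the directory layout shown above; `latest_reason_summary` is a hypothetical helper, not part of OpenCompass):

```python
from pathlib import Path

def latest_reason_summary(outputs: str = 'outputs'):
    """Return the newest CHARM_rea summary CSV, following the layout above."""
    # Timestamped names sort chronologically, so lexicographic order works.
    csvs = sorted(Path(outputs).glob('CHARM_rea/chat/*/summary/summary_*.csv'))
    return csvs[-1] if csvs else None
```

Its return value would then be supplied as `${REASON_SUMMARY_CSV}` to `tools/summarize_reasoning.py` in step 3.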
## 🖊️ Citation


@ -84,15 +84,69 @@
### 1. Download CHARM
```bash
git clone https://github.com/opendatalab/CHARM ${path_to_CHARM_repo}
-cd ${path_to_opencompass}
-mkdir data
-ln -snf ${path_to_CHARM_repo}/data/CHARM ./data/CHARM
```
### 2. Run Inference and Evaluation
```bash
cd ${path_to_opencompass}
+mkdir -p data
+ln -snf ${path_to_CHARM_repo}/data/CHARM ./data/CHARM
-# Run inference and evaluation on CHARM with the hf_llama3_8b_instruct model
-python run.py --models hf_llama3_8b_instruct --datasets charm_gen
+# modify config file `configs/eval_charm_rea.py`: uncomment existing models or add the models you want to evaluate
+python run.py configs/eval_charm_rea.py -r --dump-eval-details
+# modify config file `configs/eval_charm_mem.py`: uncomment existing models or add the models you want to evaluate
+python run.py configs/eval_charm_mem.py -r --dump-eval-details
```
The inference and evaluation results are placed under `${path_to_opencompass}/outputs`, as shown below:
```bash
outputs
├── CHARM_mem
│ └── chat
│ └── 20240605_151442
│ ├── predictions
│ │ ├── internlm2-chat-1.8b-turbomind
│ │ ├── llama-3-8b-instruct-lmdeploy
│ │ └── qwen1.5-1.8b-chat-hf
│ ├── results
│ │ ├── internlm2-chat-1.8b-turbomind_judged-by--GPT-3.5-turbo-0125
│ │ ├── llama-3-8b-instruct-lmdeploy_judged-by--GPT-3.5-turbo-0125
│ │ └── qwen1.5-1.8b-chat-hf_judged-by--GPT-3.5-turbo-0125
│   └── summary
│   └── 20240605_205020 # MEMORY_SUMMARY_DIR
│   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Anachronisms_Judgment
│   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Movie_and_Music_Recommendation
│   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Sport_Understanding
│   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Time_Understanding
│   └── judged-by--GPT-3.5-turbo-0125.csv # MEMORY_SUMMARY_CSV
└── CHARM_rea
└── chat
└── 20240605_152359
├── predictions
│ ├── internlm2-chat-1.8b-turbomind
│ ├── llama-3-8b-instruct-lmdeploy
│ └── qwen1.5-1.8b-chat-hf
├── results # REASON_RESULTS_DIR
│ ├── internlm2-chat-1.8b-turbomind
│ ├── llama-3-8b-instruct-lmdeploy
│ └── qwen1.5-1.8b-chat-hf
└── summary
├── summary_20240605_205328.csv # REASON_SUMMARY_CSV
└── summary_20240605_205328.txt
```
### 3. Generate Analysis Results
```bash
cd ${path_to_CHARM_repo}
# generate Table5, Table6, Table9 and Table10 of the paper; see https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}
# generate Figure3 and Figure9 of the paper; see https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}
# generate Table7, Table12, Table13 and Figure11 of the paper; see https://arxiv.org/abs/2403.14112
PYTHONPATH=. python tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
```
## 🖊️ Citation


@ -0,0 +1,63 @@
import os
from mmengine.config import read_base
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.datasets import CharmDataset, CharmMemoryEvaluator, LMEvaluator
with read_base():
from .charm_memory_settings import charm_memory_tasks, judge_system_prompts, dataset_path
charm_memory_datasets = []
for _task in charm_memory_tasks:
charm_memory_reader_cfg = dict(input_columns=['input'],
output_column='target')
charm_memory_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template=dict(round=[
dict(role='HUMAN', prompt='请尽可能简短地回答下述问题。\n问题:{input}\n答:')
]),
),
retriever=dict(type=ZeroRetriever),
inferencer=dict(type=GenInferencer, max_out_len=512),
)
if _task == 'Chinese_Movie_and_Music_Recommendation':
charm_memory_eval_cfg = dict(
evaluator=dict(type=CharmMemoryEvaluator),
pred_role='BOT',
)
else:
judge_system_prompt = judge_system_prompts[_task]
charm_memory_eval_cfg = dict(
evaluator=dict(
type=LMEvaluator,
prompt_template=dict(
type=PromptTemplate,
template=dict(round=[
dict(
role='HUMAN',
prompt=judge_system_prompt +
"\n\n[Question]\n{input}\n[The Start of Reference Answer]\n{target}\n[The End of Reference Answer]\n\n[The Start of Assistant's Answer]\n{prediction}\n[The End of Assistant's Answer]" # noqa
),
]),
),
),
pred_role='BOT',
)
charm_memory_datasets.append(
dict(
type=CharmDataset,
path=dataset_path,
name=_task,
abbr='charm-memory-' + _task,
reader_cfg=charm_memory_reader_cfg,
infer_cfg=charm_memory_infer_cfg.copy(),
eval_cfg=charm_memory_eval_cfg.copy(),
))
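Note the `infer_cfg.copy()` / `eval_cfg.copy()` calls in the new config above: each appended dataset entry gets its own dict, a defensive pattern against later loop iterations mutating an entry that was already appended. A minimal sketch of the hazard being avoided (illustrative names only, not the actual config keys):

```python
# Without copy(), all entries would alias one shared dict and end up
# holding the values from the final loop iteration.
shared_cfg = dict(max_out_len=512)
entries = []
for task in ['task_a', 'task_b']:
    cfg = shared_cfg.copy()  # shallow copy: independent per-task dict
    cfg['name'] = task
    entries.append(cfg)

print(entries[0]['name'], entries[1]['name'])  # prints "task_a task_b"
```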


@ -0,0 +1,31 @@
import os
charm_memory_tasks = [
'Chinese_Anachronisms_Judgment',
'Chinese_Movie_and_Music_Recommendation',
'Chinese_Sport_Understanding',
'Chinese_Time_Understanding',
]
dataset_path = 'data/CHARM/memorization'
system_prompt_template = """Please act as an impartial judge, comparing the responses of the AI assistants to the reference answer and determining if the answers are correct.
You will receive the reference answer provided by a human and the responses of the AI assistants.
Your task is to judge whether the AI assistant's answer is correct.
{task_specific_prompt}
After providing your explanation, strictly output your final judgment in the following format: "[正确]" if the AI assistant's response is correct, "[错误]" if the AI assistant's response is incorrect.
"""
task_specific_prompts = {
'Chinese_Anachronisms_Judgment':
"If the provided reference answer is a list, the model's prediction is considered correct if it matches any item in the list.",
'Chinese_Time_Understanding':
"When evaluating the AI assistant's response regarding Chinese solar terms, as long as the AI assistant's response falls within the time frame provided in the reference answer, consider it correct.",
'Chinese_Sport_Understanding':
"If the provided reference answer is a list, the model's prediction is considered correct if it matches any item in the list."
}
judge_system_prompts = {
k: system_prompt_template.format(task_specific_prompt=v)
for k, v in task_specific_prompts.items()
}
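The dict comprehension above stamps each task's rule into one shared judge template; the same pattern in miniature (the template and rule strings here are illustrative, not the ones used above):

```python
# One shared template, formatted per task with its task-specific rule.
template = 'Judge the answer.\n{task_specific_prompt}\nOutput [正确] or [错误].'
rules = {
    'Chinese_Sport_Understanding':
        'A list-valued reference matches if any item matches.',
}
judge_prompts = {
    k: template.format(task_specific_prompt=v) for k, v in rules.items()
}
print(judge_prompts['Chinese_Sport_Understanding'])
```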


@ -28,7 +28,7 @@ CMRC_datasets = [
dict(
    type=CMRCDataset,
    abbr='CMRC_dev',
-   path='./data/CLUE/CMRC/dev.json',
+   path='opencompass/cmrc_dev',
    reader_cfg=CMRC_reader_cfg,
    infer_cfg=CMRC_infer_cfg,
    eval_cfg=CMRC_eval_cfg),


@ -26,7 +26,7 @@ CMRC_datasets = [
dict(
    type=CMRCDataset,
    abbr='CMRC_dev',
-   path='./data/CLUE/CMRC/dev.json',
+   path='opencompass/cmrc_dev',
    reader_cfg=CMRC_reader_cfg,
    infer_cfg=CMRC_infer_cfg,
    eval_cfg=CMRC_eval_cfg),


@ -20,7 +20,7 @@ CMRC_datasets = [
dict(
    type=CMRCDataset,
    abbr='CMRC_dev',
-   path='./data/CLUE/CMRC/dev.json',
+   path='opencompass/cmrc_dev',
    reader_cfg=CMRC_reader_cfg,
    infer_cfg=CMRC_infer_cfg,
    eval_cfg=CMRC_eval_cfg),


@ -27,7 +27,7 @@ CMRC_datasets = [
dict(
    type=CMRCDataset,
    abbr='CMRC_dev',
-   path='./data/CLUE/CMRC/dev.json',
+   path='opencompass/cmrc_dev',
    reader_cfg=CMRC_reader_cfg,
    infer_cfg=CMRC_infer_cfg,
    eval_cfg=CMRC_eval_cfg),


@ -29,7 +29,7 @@ DRCD_datasets = [
dict(
    type=DRCDDataset,
    abbr='DRCD_dev',
-   path='./data/CLUE/DRCD/dev.json',
+   path='opencompass/drcd_dev',
    reader_cfg=DRCD_reader_cfg,
    infer_cfg=DRCD_infer_cfg,
    eval_cfg=DRCD_eval_cfg),


@ -26,7 +26,7 @@ DRCD_datasets = [
dict(
    type=DRCDDataset,
    abbr='DRCD_dev',
-   path='./data/CLUE/DRCD/dev.json',
+   path='opencompass/drcd_dev',
    reader_cfg=DRCD_reader_cfg,
    infer_cfg=DRCD_infer_cfg,
    eval_cfg=DRCD_eval_cfg),


@ -20,7 +20,7 @@ DRCD_datasets = [
dict(
    type=DRCDDataset,
    abbr='DRCD_dev',
-   path='./data/CLUE/DRCD/dev.json',
+   path='opencompass/drcd_dev',
    reader_cfg=DRCD_reader_cfg,
    infer_cfg=DRCD_infer_cfg,
    eval_cfg=DRCD_eval_cfg),


@ -27,7 +27,7 @@ DRCD_datasets = [
dict(
    type=DRCDDataset,
    abbr='DRCD_dev',
-   path='./data/CLUE/DRCD/dev.json',
+   path='opencompass/drcd_dev',
    reader_cfg=DRCD_reader_cfg,
    infer_cfg=DRCD_infer_cfg,
    eval_cfg=DRCD_eval_cfg),


@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import AFQMCDataset_V2
+from opencompass.datasets import AFQMCDatasetV2
from opencompass.utils.text_postprocessors import first_capital_postprocess

afqmc_reader_cfg = dict(
@ -34,8 +34,8 @@ afqmc_eval_cfg = dict(
afqmc_datasets = [
    dict(
        abbr='afqmc-dev',
-       type=AFQMCDataset_V2,
-       path='./data/CLUE/AFQMC/dev.json',
+       type=AFQMCDatasetV2,
+       path='opencompass/afqmc-dev',
        reader_cfg=afqmc_reader_cfg,
        infer_cfg=afqmc_infer_cfg,
        eval_cfg=afqmc_eval_cfg,


@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import cmnliDataset_V2
+from opencompass.datasets import CMNLIDatasetV2
from opencompass.utils.text_postprocessors import first_capital_postprocess

cmnli_reader_cfg = dict(
@ -34,8 +34,8 @@ cmnli_eval_cfg = dict(
cmnli_datasets = [
    dict(
        abbr='cmnli',
-       type=cmnliDataset_V2,
-       path='./data/CLUE/cmnli/cmnli_public/dev.json',
+       type=CMNLIDatasetV2,
+       path='opencompass/cmnli-dev',
        reader_cfg=cmnli_reader_cfg,
        infer_cfg=cmnli_infer_cfg,
        eval_cfg=cmnli_eval_cfg,


@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import cmnliDataset_V2
+from opencompass.datasets import CMNLIDatasetV2
from opencompass.utils.text_postprocessors import first_capital_postprocess

cmnli_reader_cfg = dict(
@ -34,8 +34,8 @@ cmnli_eval_cfg = dict(
cmnli_datasets = [
    dict(
        abbr='cmnli',
-       type=cmnliDataset_V2,
-       path='./data/CLUE/cmnli/cmnli_public/dev.json',
+       type=CMNLIDatasetV2,
+       path='opencompass/cmnli-dev',
        reader_cfg=cmnli_reader_cfg,
        infer_cfg=cmnli_infer_cfg,
        eval_cfg=cmnli_eval_cfg,


@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import cmnliDataset
+from opencompass.datasets import CMNLIDataset

cmnli_reader_cfg = dict(
    input_columns=['sentence1', 'sentence2'],
@ -26,8 +26,8 @@ cmnli_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
cmnli_datasets = [
    dict(
        abbr='cmnli',
-       type=cmnliDataset,
-       path='./data/CLUE/cmnli/cmnli_public/dev.json',
+       type=CMNLIDataset,
+       path='opencompass/cmnli-dev',
        reader_cfg=cmnli_reader_cfg,
        infer_cfg=cmnli_infer_cfg,
        eval_cfg=cmnli_eval_cfg)


@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import cmnliDataset
+from opencompass.datasets import CMNLIDataset

cmnli_reader_cfg = dict(
    input_columns=['sentence1', 'sentence2'],
@ -42,8 +42,8 @@ cmnli_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
cmnli_datasets = [
    dict(
        abbr='cmnli',
-       type=cmnliDataset,
-       path='./data/CLUE/cmnli/cmnli_public/dev.json',
+       type=CMNLIDataset,
+       path='opencompass/cmnli-dev',
        reader_cfg=cmnli_reader_cfg,
        infer_cfg=cmnli_infer_cfg,
        eval_cfg=cmnli_eval_cfg)


@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import cmnliDataset
+from opencompass.datasets import CMNLIDataset

cmnli_reader_cfg = dict(
    input_columns=['sentence1', 'sentence2'],
@ -46,8 +46,8 @@ cmnli_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
cmnli_datasets = [
    dict(
        abbr='cmnli',
-       type=cmnliDataset,
-       path='./data/CLUE/cmnli/cmnli_public/dev.json',
+       type=CMNLIDataset,
+       path='opencompass/cmnli-dev',
        reader_cfg=cmnli_reader_cfg,
        infer_cfg=cmnli_infer_cfg,
        eval_cfg=cmnli_eval_cfg)


@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import cmnliDataset_V2
+from opencompass.datasets import CMNLIDatasetV2
from opencompass.utils.text_postprocessors import first_capital_postprocess

ocnli_reader_cfg = dict(
@ -35,8 +35,8 @@ ocnli_eval_cfg = dict(
ocnli_datasets = [
    dict(
        abbr='ocnli',
-       type=cmnliDataset_V2,  # ocnli share the same format with cmnli
-       path='./data/CLUE/OCNLI/dev.json',
+       type=CMNLIDatasetV2,  # ocnli share the same format with cmnli
+       path='opencompass/OCNLI-dev',
        reader_cfg=ocnli_reader_cfg,
        infer_cfg=ocnli_infer_cfg,
        eval_cfg=ocnli_eval_cfg,


@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import cmnliDataset_V2
+from opencompass.datasets import CMNLIDatasetV2
from opencompass.utils.text_postprocessors import first_capital_postprocess

ocnli_reader_cfg = dict(
@ -35,8 +35,8 @@ ocnli_eval_cfg = dict(
ocnli_datasets = [ ocnli_datasets = [
dict( dict(
abbr='ocnli', abbr='ocnli',
type=cmnliDataset_V2, # ocnli share the same format with cmnli type=CMNLIDatasetV2, # ocnli share the same format with cmnli
path='./data/CLUE/OCNLI/dev.json', path='opencompass/OCNLI-dev',
reader_cfg=ocnli_reader_cfg, reader_cfg=ocnli_reader_cfg,
infer_cfg=ocnli_infer_cfg, infer_cfg=ocnli_infer_cfg,
eval_cfg=ocnli_eval_cfg, eval_cfg=ocnli_eval_cfg,

View File

@@ -67,7 +67,7 @@ for _name in chembench_all_sets:
         dict(
             abbr=f'ChemBench_{_name}',
             type=ChemBenchDataset,
-            path='./data/ChemBench/',
+            path='opencompass/ChemBench',
            name=_name,
             reader_cfg=chembench_reader_cfg,
             infer_cfg=chembench_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import AFQMCDataset_V2
+from opencompass.datasets import AFQMCDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 bustm_reader_cfg = dict(
@@ -34,16 +34,18 @@ bustm_eval_cfg = dict(
 bustm_datasets = [
     dict(
         abbr='bustm-dev',
-        type=AFQMCDataset_V2, # bustm share the same format with AFQMC
+        type=AFQMCDatasetV2, # bustm share the same format with AFQMC
         path='./data/FewCLUE/bustm/dev_few_all.json',
+        local_mode=True,
         reader_cfg=bustm_reader_cfg,
         infer_cfg=bustm_infer_cfg,
         eval_cfg=bustm_eval_cfg,
     ),
     dict(
         abbr='bustm-test',
-        type=AFQMCDataset_V2, # bustm share the same format with AFQMC
+        type=AFQMCDatasetV2, # bustm share the same format with AFQMC
         path='./data/FewCLUE/bustm/test_public.json',
+        local_mode=True,
         reader_cfg=bustm_reader_cfg,
         infer_cfg=bustm_infer_cfg,
         eval_cfg=bustm_eval_cfg,
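The hunks above add a `local_mode=True` flag next to FewCLUE paths that stay on local disk, in a commit that otherwise moves datasets to on-demand remote loading. A minimal sketch of the behaviour such a flag implies is below; this is an assumption for illustration, not OpenCompass's actual loader, and the function name is made up:

```python
import json
import os


def load_jsonl_or_json(path, local_mode=False):
    """Hypothetical loader honouring a local_mode flag.

    When local_mode is True (or the file already exists on disk), the
    file is read directly; otherwise a remote fetch (e.g. from a dataset
    hub keyed by `path`) would be attempted first. The remote branch is
    stubbed out here.
    """
    if local_mode or os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            if path.endswith('.jsonl'):
                # one JSON object per non-empty line
                return [json.loads(line) for line in f if line.strip()]
            return json.load(f)
    # Placeholder for a hub download resolved from `path`.
    raise FileNotFoundError(f'{path} not found locally; remote fetch is stubbed')
```

Under this reading, `local_mode=True` simply pins a config to the checked-out data directory regardless of any hub-backed default.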

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import CHIDDataset_V2
+from opencompass.datasets import CHIDDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 chid_reader_cfg = dict(
@@ -34,7 +34,7 @@ chid_eval_cfg = dict(
 chid_datasets = [
     dict(
         abbr='chid-dev',
-        type=CHIDDataset_V2,
+        type=CHIDDatasetV2,
         path='./data/FewCLUE/chid/dev_few_all.json',
         reader_cfg=chid_reader_cfg,
         infer_cfg=chid_infer_cfg,
@@ -42,7 +42,7 @@ chid_datasets = [
     ),
     dict(
         abbr='chid-test',
-        type=CHIDDataset_V2,
+        type=CHIDDatasetV2,
         path='./data/FewCLUE/chid/test_public.json',
         reader_cfg=chid_reader_cfg,
         infer_cfg=chid_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import CluewscDataset_V2
+from opencompass.datasets import CluewscDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 cluewsc_reader_cfg = dict(
@@ -34,7 +34,7 @@ cluewsc_eval_cfg = dict(
 cluewsc_datasets = [
     dict(
         abbr='cluewsc-dev',
-        type=CluewscDataset_V2,
+        type=CluewscDatasetV2,
         path='./data/FewCLUE/cluewsc/dev_few_all.json',
         reader_cfg=cluewsc_reader_cfg,
         infer_cfg=cluewsc_infer_cfg,
@@ -42,7 +42,7 @@ cluewsc_datasets = [
     ),
     dict(
         abbr='cluewsc-test',
-        type=CluewscDataset_V2,
+        type=CluewscDatasetV2,
         path='./data/FewCLUE/cluewsc/test_public.json',
         reader_cfg=cluewsc_reader_cfg,
         infer_cfg=cluewsc_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import CslDataset_V2
+from opencompass.datasets import CslDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 csl_reader_cfg = dict(
@@ -34,7 +34,7 @@ csl_eval_cfg = dict(
 csl_datasets = [
     dict(
         abbr='csl_dev',
-        type=CslDataset_V2,
+        type=CslDatasetV2,
         path='./data/FewCLUE/csl/dev_few_all.json',
         reader_cfg=csl_reader_cfg,
         infer_cfg=csl_infer_cfg,
@@ -42,7 +42,7 @@ csl_datasets = [
     ),
     dict(
         abbr='csl_test',
-        type=CslDataset_V2,
+        type=CslDatasetV2,
         path='./data/FewCLUE/csl/test_public.json',
         reader_cfg=csl_reader_cfg,
         infer_cfg=csl_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import CslDataset_V2
+from opencompass.datasets import CslDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 csl_reader_cfg = dict(
@@ -34,7 +34,7 @@ csl_eval_cfg = dict(
 csl_datasets = [
     dict(
         abbr='csl_dev',
-        type=CslDataset_V2,
+        type=CslDatasetV2,
         path='./data/FewCLUE/csl/dev_few_all.json',
         reader_cfg=csl_reader_cfg,
         infer_cfg=csl_infer_cfg,
@@ -42,7 +42,7 @@ csl_datasets = [
     ),
     dict(
         abbr='csl_test',
-        type=CslDataset_V2,
+        type=CslDatasetV2,
         path='./data/FewCLUE/csl/test_public.json',
         reader_cfg=csl_reader_cfg,
         infer_cfg=csl_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import eprstmtDataset_V2
+from opencompass.datasets import EprstmtDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 eprstmt_reader_cfg = dict(
@@ -32,7 +32,7 @@ eprstmt_eval_cfg = dict(
 eprstmt_datasets = [
     dict(
         abbr='eprstmt-dev',
-        type=eprstmtDataset_V2,
+        type=EprstmtDatasetV2,
         path='./data/FewCLUE/eprstmt/dev_few_all.json',
         reader_cfg=eprstmt_reader_cfg,
         infer_cfg=eprstmt_infer_cfg,
@@ -40,7 +40,7 @@ eprstmt_datasets = [
     ),
     dict(
         abbr='eprstmt-test',
-        type=eprstmtDataset_V2,
+        type=EprstmtDatasetV2,
         path='./data/FewCLUE/eprstmt/test_public.json',
         reader_cfg=eprstmt_reader_cfg,
         infer_cfg=eprstmt_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import cmnliDataset_V2
+from opencompass.datasets import CMNLIDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 ocnli_fc_reader_cfg = dict(
@@ -33,16 +33,18 @@ ocnli_fc_eval_cfg = dict(
 ocnli_fc_datasets = [
     dict(
         abbr='ocnli_fc-dev',
-        type=cmnliDataset_V2, # ocnli_fc share the same format with cmnli
+        type=CMNLIDatasetV2, # ocnli_fc share the same format with cmnli
         path='./data/FewCLUE/ocnli/dev_few_all.json',
+        local_mode=True,
         reader_cfg=ocnli_fc_reader_cfg,
         infer_cfg=ocnli_fc_infer_cfg,
         eval_cfg=ocnli_fc_eval_cfg,
     ),
     dict(
         abbr='ocnli_fc-test',
-        type=cmnliDataset_V2, # ocnli_fc share the same format with cmnli
+        type=CMNLIDatasetV2, # ocnli_fc share the same format with cmnli
         path='./data/FewCLUE/ocnli/test_public.json',
+        local_mode=True,
         reader_cfg=ocnli_fc_reader_cfg,
         infer_cfg=ocnli_fc_infer_cfg,
         eval_cfg=ocnli_fc_eval_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import TNewsDataset_V2
+from opencompass.datasets import TNewsDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 tnews_reader_cfg = dict(
@@ -56,7 +56,7 @@ tnews_eval_cfg = dict(
 tnews_datasets = [
     dict(
         abbr='tnews-dev',
-        type=TNewsDataset_V2,
+        type=TNewsDatasetV2,
         path='./data/FewCLUE/tnews/dev_few_all.json',
         reader_cfg=tnews_reader_cfg,
         infer_cfg=tnews_infer_cfg,
@@ -64,7 +64,7 @@ tnews_datasets = [
     ),
     dict(
         abbr='tnews-test',
-        type=TNewsDataset_V2,
+        type=TNewsDatasetV2,
         path='./data/FewCLUE/tnews/test_public.json',
         reader_cfg=tnews_reader_cfg,
         infer_cfg=tnews_infer_cfg,

View File

@@ -3,6 +3,7 @@ from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.datasets import GaokaoBenchDataset

 _MCQ_prompts = [
     {
         'type': 'single_choice',
@@ -288,6 +289,7 @@ for _folder, _prompts in [
         'type': GaokaoBenchDataset,
         'abbr': 'GaokaoBench_' + _p['keyword'],
         'path': _base_path + '/' + _folder + '/' + _p['keyword'] + '.json',
+        'name': _p['keyword'],
         'reader_cfg': _reader_cfg,
         'infer_cfg': _infer_cfg,
         'eval_cfg': _eval_cfg,
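The GaokaoBench hunks above and below all make the same change: a `'name'` key is added next to `'path'`, with both derived from the prompt keyword. A hedged sketch of that construction pattern (the helper name is invented for illustration; the real configs build the dicts inline in a loop):

```python
# Illustrative only: each prompt keyword yields one dataset config whose
# 'path' and 'name' both derive from the keyword. The 'name' key is what
# this commit adds, presumably so the loader can select the right subset
# without re-parsing the path.
def build_gaokao_configs(base_path, folder, prompts):
    configs = []
    for p in prompts:
        configs.append({
            'abbr': 'GaokaoBench_' + p['keyword'],
            'path': f"{base_path}/{folder}/{p['keyword']}.json",
            'name': p['keyword'],  # the key added by this diff
        })
    return configs
```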

View File

@@ -2,7 +2,6 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer, PPLInferencer
 from opencompass.datasets import GaokaoBenchDataset
-
 _MCQ_prompts = [
     {
         'type': 'single_choice',
@@ -290,6 +289,7 @@ for _folder, _prompts in [
         'type': GaokaoBenchDataset,
         'abbr': 'GaokaoBench_' + _p['keyword'],
         'path': _base_path + '/' + _folder + '/' + _p['keyword'] + '.json',
+        'name': _p['keyword'],
         'reader_cfg': _reader_cfg,
         'infer_cfg': _infer_cfg,
         'eval_cfg': _eval_cfg,
@@ -340,6 +340,7 @@ for _p in _MCQ_prompts:
         'type': GaokaoBenchDataset,
         'abbr': 'GaokaoBench_' + _p['keyword'],
         'path': _base_path + '/' + _folder + '/' + _p['keyword'] + '.json',
+        'name': _p['keyword'],
         'reader_cfg': _reader_cfg,
         'infer_cfg': _infer_cfg,
         'eval_cfg': _eval_cfg,

View File

@@ -35,6 +35,7 @@ for folder, prompts in [
             'type': GaokaoBenchDataset,
             'abbr': 'GaokaoBench_' + p['keyword'],
             'path': os.path.join('data', 'GAOKAO-BENCH', 'data', folder, p['keyword'] + '.json'),
+            'name': p['keyword'],
             'reader_cfg': reader_cfg,
             'infer_cfg': infer_cfg,
             'eval_cfg': eval_cfg,

View File

@@ -34,6 +34,7 @@ for folder, prompts in [
             'type': GaokaoBenchDataset,
             'abbr': 'GaokaoBench_' + p['keyword'],
             'path': os.path.join('data', 'GAOKAO-BENCH', 'data', folder, p['keyword'] + '.json'),
+            'name': p['keyword'],
             'reader_cfg': reader_cfg,
             'infer_cfg': infer_cfg,
             'eval_cfg': eval_cfg,

View File

@@ -2,27 +2,27 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.datasets.NPHardEval import (
-    hard_GCP_Dataset, hard_GCP_Evaluator,
-    hard_TSP_Dataset, hard_TSP_Evaluator,
-    hard_MSP_Dataset, hard_MSP_Evaluator,
-    cmp_GCP_D_Dataset, cmp_GCP_D_Evaluator,
-    cmp_TSP_D_Dataset, cmp_TSP_D_Evaluator,
-    cmp_KSP_Dataset, cmp_KSP_Evaluator,
-    p_BSP_Dataset, p_BSP_Evaluator,
-    p_EDP_Dataset, p_EDP_Evaluator,
-    p_SPP_Dataset, p_SPP_Evaluator,
+    HardGCPDataset, HardGCPEvaluator,
+    Hard_TSP_Dataset, Hard_TSP_Evaluator,
+    Hard_MSP_Dataset, Hard_MSP_Evaluator,
+    CMP_GCP_D_Dataset, CMP_GCP_D_Evaluator,
+    CMP_TSP_D_Dataset, CMP_TSP_D_Evaluator,
+    CMP_KSP_Dataset, CMP_KSP_Evaluator,
+    P_BSP_Dataset, P_BSP_Evaluator,
+    P_EDP_Dataset, P_EDP_Evaluator,
+    P_SPP_Dataset, P_SPP_Evaluator,
 )

 NPHardEval_tasks = [
-    ['hard_GCP', 'GCP', hard_GCP_Dataset, hard_GCP_Evaluator],
-    ['hard_TSP', 'TSP', hard_TSP_Dataset, hard_TSP_Evaluator],
-    ['hard_MSP', 'MSP', hard_MSP_Dataset, hard_MSP_Evaluator],
-    ['cmp_GCP_D', 'GCP_Decision', cmp_GCP_D_Dataset, cmp_GCP_D_Evaluator],
-    ['cmp_TSP_D', 'TSP_Decision', cmp_TSP_D_Dataset, cmp_TSP_D_Evaluator],
-    ['cmp_KSP', 'KSP', cmp_KSP_Dataset, cmp_KSP_Evaluator],
-    ['p_BSP', 'BSP', p_BSP_Dataset, p_BSP_Evaluator],
-    ['p_EDP', 'EDP', p_EDP_Dataset, p_EDP_Evaluator],
-    ['p_SPP', 'SPP', p_SPP_Dataset, p_SPP_Evaluator],
+    ['hard_GCP', 'GCP', HardGCPDataset, HardGCPEvaluator],
+    ['hard_TSP', 'TSP', Hard_TSP_Dataset, Hard_TSP_Evaluator],
+    ['hard_MSP', 'MSP', Hard_MSP_Dataset, Hard_MSP_Evaluator],
+    ['cmp_GCP_D', 'GCP_Decision', CMP_GCP_D_Dataset, CMP_GCP_D_Evaluator],
+    ['cmp_TSP_D', 'TSP_Decision', CMP_TSP_D_Dataset, CMP_TSP_D_Evaluator],
+    ['cmp_KSP', 'KSP', CMP_KSP_Dataset, CMP_KSP_Evaluator],
+    ['p_BSP', 'BSP', P_BSP_Dataset, P_BSP_Evaluator],
+    ['p_EDP', 'EDP', P_EDP_Dataset, P_EDP_Evaluator],
+    ['p_SPP', 'SPP', P_SPP_Dataset, P_SPP_Evaluator],
 ]

 NPHardEval_datasets = []
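The renames in this commit (e.g. `hard_GCP_Dataset` → `HardGCPDataset`, `cmnliDataset_V2` → `CMNLIDatasetV2`) change public class names, which silently breaks any out-of-tree config that still imports the old ones. One common mitigation, sketched below purely as an illustration (this shim is not part of the commit, and `HardGCPDataset` here is a stand-in for the real class), is to keep the old names importable as deprecated aliases:

```python
import warnings


class HardGCPDataset:
    """Stand-in for the real renamed class."""
    pass


def _deprecated_alias(new_cls, old_name):
    """Return a subclass that warns when instantiated under the old name."""
    class _Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f'{old_name} is deprecated, use {new_cls.__name__} instead',
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)
    _Alias.__name__ = old_name
    return _Alias


# Old snake_case name keeps working, but warns on use.
hard_GCP_Dataset = _deprecated_alias(HardGCPDataset, 'hard_GCP_Dataset')
```

Whether OpenCompass actually ships such aliases is not visible in this diff; without them, downstream configs must be updated in lockstep, as every file in this commit is.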

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import AXDataset_V2
+from opencompass.datasets import AXDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 AX_b_reader_cfg = dict(
@@ -34,7 +34,7 @@ AX_b_eval_cfg = dict(
 AX_b_datasets = [
     dict(
         abbr='AX_b',
-        type=AXDataset_V2,
+        type=AXDatasetV2,
         path='./data/SuperGLUE/AX-b/AX-b.jsonl',
         reader_cfg=AX_b_reader_cfg,
         infer_cfg=AX_b_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import AXDataset_V2
+from opencompass.datasets import AXDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 AX_g_reader_cfg = dict(
@@ -34,7 +34,7 @@ AX_g_eval_cfg = dict(
 AX_g_datasets = [
     dict(
         abbr='AX_g',
-        type=AXDataset_V2,
+        type=AXDatasetV2,
         path='./data/SuperGLUE/AX-g/AX-g.jsonl',
         reader_cfg=AX_g_reader_cfg,
         infer_cfg=AX_g_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import BoolQDataset_V2
+from opencompass.datasets import BoolQDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 BoolQ_reader_cfg = dict(
@@ -32,7 +32,7 @@ BoolQ_eval_cfg = dict(
 BoolQ_datasets = [
     dict(
         abbr='BoolQ',
-        type=BoolQDataset_V2,
+        type=BoolQDatasetV2,
         path='./data/SuperGLUE/BoolQ/val.jsonl',
         reader_cfg=BoolQ_reader_cfg,
         infer_cfg=BoolQ_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import BoolQDataset_V3
+from opencompass.datasets import BoolQDatasetV3

 BoolQ_reader_cfg = dict(
     input_columns=['question', 'passage'],
@@ -34,7 +34,7 @@ BoolQ_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
 BoolQ_datasets = [
     dict(
         abbr='BoolQ',
-        type=BoolQDataset_V3,
+        type=BoolQDatasetV3,
         path='./data/SuperGLUE/BoolQ/val.jsonl',
         reader_cfg=BoolQ_reader_cfg,
         infer_cfg=BoolQ_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import CBDataset_V2
+from opencompass.datasets import CBDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 CB_reader_cfg = dict(
@@ -35,7 +35,7 @@ CB_eval_cfg = dict(
 CB_datasets = [
     dict(
         abbr='CB',
-        type=CBDataset_V2,
+        type=CBDatasetV2,
         path='./data/SuperGLUE/CB/val.jsonl',
         reader_cfg=CB_reader_cfg,
         infer_cfg=CB_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import COPADataset_V2
+from opencompass.datasets import COPADatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 COPA_reader_cfg = dict(
@@ -35,7 +35,7 @@ COPA_eval_cfg = dict(
 COPA_datasets = [
     dict(
         abbr='COPA',
-        type=COPADataset_V2,
+        type=COPADatasetV2,
         path='./data/SuperGLUE/COPA/val.jsonl',
         reader_cfg=COPA_reader_cfg,
         infer_cfg=COPA_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import MultiRCDataset_V2
+from opencompass.datasets import MultiRCDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 MultiRC_reader_cfg = dict(
@@ -34,7 +34,7 @@ MultiRC_eval_cfg = dict(
 MultiRC_datasets = [
     dict(
         abbr='MultiRC',
-        type=MultiRCDataset_V2,
+        type=MultiRCDatasetV2,
         path='./data/SuperGLUE/MultiRC/val.jsonl',
         reader_cfg=MultiRC_reader_cfg,
         infer_cfg=MultiRC_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import AXDataset_V2
+from opencompass.datasets import AXDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 RTE_reader_cfg = dict(
@@ -34,7 +34,7 @@ RTE_eval_cfg = dict(
 RTE_datasets = [
     dict(
         abbr='RTE',
-        type=AXDataset_V2, # rte share the same format with ax
+        type=AXDatasetV2, # rte share the same format with ax
         path='./data/SuperGLUE/RTE/val.jsonl',
         reader_cfg=RTE_reader_cfg,
         infer_cfg=RTE_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import EMEvaluator
-from opencompass.datasets import ReCoRDDataset_V2, ReCoRD_postprocess
+from opencompass.datasets import ReCoRDDatasetV2, ReCoRD_postprocess

 ReCoRD_reader_cfg = dict(
     input_columns=['question', 'text'], output_column='answers')
@@ -26,7 +26,7 @@ ReCoRD_eval_cfg = dict(
 ReCoRD_datasets = [
     dict(
-        type=ReCoRDDataset_V2,
+        type=ReCoRDDatasetV2,
         abbr='ReCoRD',
         path='./data/SuperGLUE/ReCoRD/val.jsonl',
         reader_cfg=ReCoRD_reader_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WSCDataset_V2
+from opencompass.datasets import WSCDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 WSC_reader_cfg = dict(
@@ -34,7 +34,7 @@ WSC_eval_cfg = dict(
 WSC_datasets = [
     dict(
         abbr='WSC',
-        type=WSCDataset_V2,
+        type=WSCDatasetV2,
         path='./data/SuperGLUE/WSC/val.jsonl',
         reader_cfg=WSC_reader_cfg,
         infer_cfg=WSC_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WSCDataset_V3
+from opencompass.datasets import WSCDatasetV3
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 WSC_reader_cfg = dict(
@@ -34,7 +34,7 @@ WSC_eval_cfg = dict(
 WSC_datasets = [
     dict(
         abbr='WSC',
-        type=WSCDataset_V3,
+        type=WSCDatasetV3,
         path='./data/SuperGLUE/WSC/val.jsonl',
         reader_cfg=WSC_reader_cfg,
         infer_cfg=WSC_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WSCDataset_V3
+from opencompass.datasets import WSCDatasetV3

 WSC_reader_cfg = dict(
     input_columns=['span1', 'span2', 'text'],
@@ -40,7 +40,7 @@ WSC_eval_cfg = dict(evaluator=dict(type=AccEvaluator), )
 WSC_datasets = [
     dict(
         abbr='WSC',
-        type=WSCDataset_V3,
+        type=WSCDatasetV3,
         path='./data/SuperGLUE/WSC/val.jsonl',
         reader_cfg=WSC_reader_cfg,
         infer_cfg=WSC_infer_cfg,

View File

@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WSCDataset_V2
+from opencompass.datasets import WSCDatasetV2

 WSC_reader_cfg = dict(
     input_columns=['span1', 'span2', 'text'],
@@ -42,7 +42,7 @@ WSC_eval_cfg = dict(evaluator=dict(type=AccEvaluator), )
 WSC_datasets = [
     dict(
         abbr='WSC',
-        type=WSCDataset_V2,
+        type=WSCDatasetV2,
         path='./data/SuperGLUE/WSC/val.jsonl',
         reader_cfg=WSC_reader_cfg,
         infer_cfg=WSC_infer_cfg,


@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WiCDataset_V2
+from opencompass.datasets import WiCDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess
 WiC_reader_cfg = dict(
@@ -38,7 +38,7 @@ WiC_eval_cfg = dict(
 WiC_datasets = [
     dict(
         abbr='WiC',
-        type=WiCDataset_V2,
+        type=WiCDatasetV2,
         path='./data/SuperGLUE/WiC/val.jsonl',
         reader_cfg=WiC_reader_cfg,
         infer_cfg=WiC_infer_cfg,


@@ -31,7 +31,7 @@ Xsum_datasets = [
     dict(
         type=XsumDataset,
         abbr='Xsum',
-        path='./data/Xsum/dev.jsonl',
+        path='opencompass/xsum',
        reader_cfg=Xsum_reader_cfg,
         infer_cfg=Xsum_infer_cfg,
         eval_cfg=Xsum_eval_cfg,


@@ -23,7 +23,7 @@ Xsum_datasets = [
     dict(
         type=XsumDataset,
         abbr='Xsum',
-        path='./data/Xsum/dev.jsonl',
+        path='opencompass/xsum',
         reader_cfg=Xsum_reader_cfg,
         infer_cfg=Xsum_infer_cfg,
         eval_cfg=Xsum_eval_cfg)


@@ -34,7 +34,7 @@ adv_mnli_datasets = [
     dict(
         abbr='adv_mnli',
         type=AdvMnliDataset,
-        path='./data/adv_glue/dev_ann.json',
+        path='opencompass/advglue-dev',
         reader_cfg=adv_mnli_reader_cfg,
         infer_cfg=adv_mnli_infer_cfg,
         eval_cfg=adv_mnli_eval_cfg,


@@ -34,7 +34,7 @@ adv_mnli_mm_datasets = [
     dict(
         abbr='adv_mnli_mm',
         type=AdvMnliMMDataset,
-        path='./data/adv_glue/dev_ann.json',
+        path='opencompass/advglue-dev',
         reader_cfg=adv_mnli_mm_reader_cfg,
         infer_cfg=adv_mnli_mm_infer_cfg,
         eval_cfg=adv_mnli_mm_eval_cfg,


@@ -34,7 +34,7 @@ adv_qnli_datasets = [
     dict(
         abbr='adv_qnli',
         type=AdvQnliDataset,
-        path='./data/adv_glue/dev_ann.json',
+        path='opencompass/advglue-dev',
         reader_cfg=adv_qnli_reader_cfg,
         infer_cfg=adv_qnli_infer_cfg,
         eval_cfg=adv_qnli_eval_cfg,


@@ -34,7 +34,7 @@ adv_qqp_datasets = [
     dict(
         abbr='adv_qqp',
         type=AdvQqpDataset,
-        path='./data/adv_glue/dev_ann.json',
+        path='opencompass/advglue-dev',
         reader_cfg=adv_qqp_reader_cfg,
         infer_cfg=adv_qqp_infer_cfg,
         eval_cfg=adv_qqp_eval_cfg,


@@ -34,7 +34,7 @@ adv_rte_datasets = [
     dict(
         abbr='adv_rte',
         type=AdvRteDataset,
-        path='./data/adv_glue/dev_ann.json',
+        path='opencompass/advglue-dev',
         reader_cfg=adv_rte_reader_cfg,
         infer_cfg=adv_rte_infer_cfg,
         eval_cfg=adv_rte_eval_cfg,


@@ -33,7 +33,7 @@ adv_sst2_datasets = [
     dict(
         abbr='adv_sst2',
         type=AdvSst2Dataset,
-        path='./data/adv_glue/dev_ann.json',
+        path='opencompass/advglue-dev',
         reader_cfg=adv_sst2_reader_cfg,
         infer_cfg=adv_sst2_infer_cfg,
         eval_cfg=adv_sst2_eval_cfg,


@@ -88,7 +88,7 @@ for _name in agieval_single_choice_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',
@@ -117,7 +117,7 @@ for _name in agieval_multiple_choices_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',
@@ -143,7 +143,7 @@ for _name in agieval_cloze_sets:
     agieval_datasets.append(
         dict(
            type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',


@@ -92,7 +92,7 @@ for _name in agieval_single_choice_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',
@@ -122,7 +122,7 @@ for _name in agieval_multiple_choices_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',
@@ -148,7 +148,7 @@ for _name in agieval_cloze_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',


@@ -90,7 +90,7 @@ for _name in agieval_single_choice_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',
@@ -120,7 +120,7 @@ for _name in agieval_multiple_choices_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',
@@ -146,7 +146,7 @@ for _name in agieval_cloze_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',


@@ -50,7 +50,7 @@ for name in agieval_single_choice_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=name,
             abbr='agieval-' + name,
             setting_name='zero-shot',
@@ -74,7 +74,7 @@ for name in agieval_multiple_choices_sets + agieval_cloze_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=name,
             abbr='agieval-' + name,
             setting_name='zero-shot',


@@ -93,7 +93,7 @@ for _name in agieval_single_choice_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',
@@ -124,7 +124,7 @@ for _name in agieval_multiple_choices_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',
@@ -151,7 +151,7 @@ for _name in agieval_cloze_sets:
     agieval_datasets.append(
         dict(
             type=AGIEvalDataset_v2,
-            path='./data/AGIEval/data/v1/',
+            path='opencompass/agieval',
             name=_name,
             abbr='agieval-' + _name,
             setting_name='zero-shot',


@@ -48,7 +48,7 @@ for name, test_type in settings:
     bbh_datasets.append(
         dict(
             type=BBHDataset,
-            path='./data/BBH/data',
+            path='opencompass/bbh',
             name=name,
             abbr='bbh-' + name,
             reader_cfg=bbh_reader_cfg.copy(),


@@ -64,7 +64,7 @@ for _name in bbh_multiple_choice_sets:
     bbh_datasets.append(
         dict(
            type=BBHDataset,
-            path=f'./data/BBH/data',
+            path='opencompass/bbh',
             name=_name,
             abbr='bbh-' + _name,
             reader_cfg=bbh_reader_cfg,
@@ -91,7 +91,7 @@ for _name in bbh_free_form_sets:
     bbh_datasets.append(
         dict(
             type=BBHDataset,
-            path=f'./data/BBH/data',
+            path='opencompass/bbh',
             name=_name,
             abbr='bbh-' + _name,
             reader_cfg=bbh_reader_cfg,


@@ -64,7 +64,7 @@ for _name in bbh_multiple_choice_sets:
     bbh_datasets.append(
         dict(
             type=BBHDataset,
-            path=f'./data/BBH/data',
+            path='opencompass/bbh',
             name=_name,
             abbr='bbh-' + _name,
             reader_cfg=bbh_reader_cfg,
@@ -91,7 +91,7 @@ for _name in bbh_free_form_sets:
     bbh_datasets.append(
         dict(
             type=BBHDataset,
-            path=f'./data/BBH/data',
+            path='opencompass/bbh',
             name=_name,
             abbr='bbh-' + _name,
             reader_cfg=bbh_reader_cfg,


@@ -59,7 +59,7 @@ for _name in bbh_multiple_choice_sets:
     bbh_datasets.append(
         dict(
             type=BBHDataset,
-            path=f'./data/BBH/data',
+            path='opencompass/bbh',
             name=_name,
             abbr='bbh-' + _name,
             reader_cfg=bbh_reader_cfg,
@@ -82,7 +82,7 @@ for _name in bbh_free_form_sets:
     bbh_datasets.append(
         dict(
             type=BBHDataset,
-            path=f'./data/BBH/data',
+            path='opencompass/bbh',
             name=_name,
             abbr='bbh-' + _name,
             reader_cfg=bbh_reader_cfg,


@@ -5,6 +5,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccContaminationEvaluator
 from opencompass.datasets import CEvalDatasetClean as CEvalDataset
+
 ceval_subject_mapping = {
     'computer_network': ['Computer Network', '计算机网络', 'STEM'],
     'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -92,7 +93,7 @@ for _split in ['val']:
     ceval_datasets.append(
         dict(
             type=CEvalDataset,
-            path='./data/ceval/formal_ceval',
+            path='opencompass/ceval-exam',
             name=_name,
             abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' + _name,
             reader_cfg=dict(


@@ -91,7 +91,7 @@ for _split in ['val', 'test']:
     ceval_datasets.append(
         dict(
             type=CEvalDataset,
-            path='./data/ceval/formal_ceval',
+            path='opencompass/ceval-exam',
             name=_name,
             abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
             _name,


@@ -5,6 +5,7 @@ from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset
 from opencompass.utils.text_postprocessors import first_capital_postprocess
+
 ceval_subject_mapping = {
     'computer_network': ['Computer Network', '计算机网络', 'STEM'],
     'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -91,7 +92,7 @@ for _split in ['val']:
     ceval_datasets.append(
         dict(
             type=CEvalDataset,
-            path='./data/ceval/formal_ceval',
+            path='opencompass/ceval-exam',
             name=_name,
             abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
             _name,


@@ -4,6 +4,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset
+
 ceval_subject_mapping = {
     'computer_network': ['Computer Network', '计算机网络', 'STEM'],
     'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -93,7 +94,7 @@ for _split in ['val', 'test']:
     ceval_datasets.append(
         dict(
             type=CEvalDataset,
-            path='./data/ceval_internal/formal_ceval',
+            path='opencompass/ceval-exam',
             name=_name,
             abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' + _name,
             reader_cfg=ceval_reader_cfg,


@@ -4,6 +4,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset
+
 ceval_subject_mapping = {
     'computer_network': ['Computer Network', '计算机网络', 'STEM'],
     'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -93,7 +94,7 @@ for _split in ['val', 'test']:
     ceval_datasets.append(
         dict(
             type=CEvalDataset,
-            path='./data/ceval/formal_ceval',
+            path='opencompass/ceval-exam',
             name=_name,
             abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' + _name,
             reader_cfg=ceval_reader_cfg,


@@ -4,6 +4,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset
+
 ceval_subject_mapping = {
     'computer_network': ['Computer Network', '计算机网络', 'STEM'],
     'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -91,7 +92,7 @@ for _split in ['val']:
     ceval_datasets.append(
         dict(
             type=CEvalDataset,
-            path='./data/ceval/formal_ceval',
+            path='opencompass/ceval-exam',
             name=_name,
             abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
             _name,


@@ -4,6 +4,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset
+
 ceval_subject_mapping = {
     'computer_network': ['Computer Network', '计算机网络', 'STEM'],
     'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -91,7 +92,7 @@ for _split in ['val', 'test']:
     ceval_datasets.append(
         dict(
             type=CEvalDataset,
-            path='./data/ceval/formal_ceval',
+            path='opencompass/ceval-exam',
             name=_name,
             abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
             _name,


@@ -5,6 +5,7 @@ from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset
 from opencompass.utils.text_postprocessors import first_option_postprocess
+
 ceval_subject_mapping = {
     'computer_network': ['Computer Network', '计算机网络', 'STEM'],
     'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -91,7 +92,7 @@ for _split in ['val']:
     ceval_datasets.append(
         dict(
             type=CEvalDataset,
-            path='./data/ceval/formal_ceval',
+            path='opencompass/ceval-exam',
             name=_name,
             abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
             _name,


@@ -33,8 +33,8 @@ maxmin_datasets = [
     dict(
         type=MaxminDataset,
         abbr=f'maxmin',
-        test_path=f'data/clozeTest-maxmin/python/clozeTest.json',
-        answer_path=f'data/clozeTest-maxmin/python/answers.txt',
+        test_path='opencompass/clozeTest_maxmin',
+        answer_path='opencompass/clozeTest_maxmin_answers',
         reader_cfg=maxmin_reader_cfg,
         infer_cfg=maxmin_infer_cfg,
         eval_cfg=maxmin_eval_cfg,


@@ -5,6 +5,7 @@ from opencompass.openicl.icl_evaluator import AccwithDetailsEvaluator
 from opencompass.datasets import CMMLUDataset
 from opencompass.utils.text_postprocessors import first_capital_postprocess
+
 cmmlu_subject_mapping = {
     'agronomy': '农学',
     'anatomy': '解剖学',
@@ -107,7 +108,7 @@ for _name in cmmlu_all_sets:
     cmmlu_datasets.append(
         dict(
             type=CMMLUDataset,
-            path='./data/cmmlu/',
+            path='opencompass/cmmlu',
             name=_name,
             abbr=f'cmmlu-{_name}',
             reader_cfg=dict(


@@ -102,7 +102,7 @@ for _name in cmmlu_all_sets:
     cmmlu_datasets.append(
         dict(
             type=CMMLUDataset,
-            path='./data/cmmlu/',
+            path='opencompass/cmmlu',
             name=_name,
             abbr=f'cmmlu-{_name}',
             reader_cfg=dict(


@@ -107,7 +107,7 @@ for _name in cmmlu_all_sets:
     cmmlu_datasets.append(
         dict(
             type=CMMLUDataset,
-            path='./data/cmmlu/',
+            path='opencompass/cmmlu',
             name=_name,
             abbr=f'cmmlu-{_name}',
             reader_cfg=dict(


@@ -45,7 +45,7 @@ commonsenseqa_datasets = [
     dict(
         abbr='commonsense_qa',
         type=commonsenseqaDataset,
-        path='./data/commonsenseqa',
+        path='opencompass/commonsense_qa',
         reader_cfg=commonsenseqa_reader_cfg,
         infer_cfg=commonsenseqa_infer_cfg,
         eval_cfg=commonsenseqa_eval_cfg,


@@ -52,7 +52,7 @@ commonsenseqa_datasets = [
     dict(
         abbr='commonsense_qa',
         type=commonsenseqaDataset,
-        path='./data/commonsenseqa',
+        path='opencompass/commonsense_qa',
         reader_cfg=commonsenseqa_reader_cfg,
         infer_cfg=commonsenseqa_infer_cfg,
         eval_cfg=commonsenseqa_eval_cfg,


@@ -47,7 +47,7 @@ commonsenseqa_datasets = [
     dict(
         abbr='commonsense_qa',
         type=commonsenseqaDataset,
-        path='./data/commonsenseqa',
+        path='opencompass/commonsense_qa',
         reader_cfg=commonsenseqa_reader_cfg,
         infer_cfg=commonsenseqa_infer_cfg,
         eval_cfg=commonsenseqa_eval_cfg)


@@ -42,7 +42,7 @@ commonsenseqa_datasets = [
     dict(
         abbr='commonsense_qa',
         type=commonsenseqaDataset,
-        path='./data/commonsenseqa',
+        path='opencompass/commonsense_qa',
         reader_cfg=commonsenseqa_reader_cfg,
         infer_cfg=commonsenseqa_infer_cfg,
         eval_cfg=commonsenseqa_eval_cfg)


@@ -38,7 +38,7 @@ commonsenseqa_datasets = [
     dict(
         abbr='commonsense_qa',
         type=commonsenseqaDataset,
-        path='./data/commonsenseqa',
+        path='opencompass/commonsense_qa',
         reader_cfg=commonsenseqa_reader_cfg,
         infer_cfg=commonsenseqa_infer_cfg,
         eval_cfg=commonsenseqa_eval_cfg)


@@ -34,7 +34,7 @@ commonsenseqa_datasets = [
     dict(
         abbr='commonsense_qa',
         type=commonsenseqaDataset,
-        path='./data/commonsenseqa',
+        path='opencompass/commonsense_qa',
         reader_cfg=commonsenseqa_reader_cfg,
         infer_cfg=commonsenseqa_infer_cfg,
         eval_cfg=commonsenseqa_eval_cfg)


@@ -35,7 +35,7 @@ commonsenseqa_datasets = [
     dict(
         abbr='commonsense_qa',
         type=commonsenseqaDataset,
-        path='./data/commonsenseqa',
+        path='opencompass/commonsense_qa',
         reader_cfg=commonsenseqa_reader_cfg,
         infer_cfg=commonsenseqa_infer_cfg,
         eval_cfg=commonsenseqa_eval_cfg)


@@ -92,7 +92,7 @@ for _split in list(compassbench_v1_knowledge_sets.keys()):
 )
-from opencompass.datasets import TriviaQADataset_V3, TriviaQAEvaluator
+from opencompass.datasets import TriviaQADatasetV3, TriviaQAEvaluator
 triviaqa_and_nq_reader_cfg = dict(input_columns=['question'], output_column='answer')
@@ -123,7 +123,7 @@ triviaqa_and_nq_eval_cfg = dict(evaluator=dict(type=TriviaQAEvaluator), pred_rol
 compassbench_v1_knowledge_datasets.append(
     dict(
-        type=TriviaQADataset_V3,
+        type=TriviaQADatasetV3,
         abbr='compassbench_v1_knowledge-mixed-cloze_en',
         path='data/compassbench_v1.1/knowledge/mixed/cloze_en.jsonl',
         reader_cfg=triviaqa_and_nq_reader_cfg,


@@ -92,7 +92,7 @@ for _split in list(compassbench_v1_knowledge_sets.keys()):
 )
-from opencompass.datasets import TriviaQADataset_V3, TriviaQAEvaluator
+from opencompass.datasets import TriviaQADatasetV3, TriviaQAEvaluator
 triviaqa_and_nq_reader_cfg = dict(input_columns=['question'], output_column='answer')
@@ -123,7 +123,7 @@ triviaqa_and_nq_eval_cfg = dict(evaluator=dict(type=TriviaQAEvaluator), pred_rol
 compassbench_v1_knowledge_datasets.append(
     dict(
-        type=TriviaQADataset_V3,
+        type=TriviaQADatasetV3,
        abbr='compassbench_v1_knowledge-mixed-cloze_en_public',
         path='data/compassbench_v1.1.public/knowledge/mixed/cloze_en.jsonl',
         reader_cfg=triviaqa_and_nq_reader_cfg,
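Across the hunks above the change is uniform: local paths such as `./data/ceval/formal_ceval` become hub-style dataset IDs such as `opencompass/ceval-exam`, so data can be fetched on demand (e.g. from ModelScope) instead of requiring a full local copy, while `_V2`/`_V3` class suffixes are normalized to `V2`/`V3`. A minimal sketch of the path-resolution idea follows; `resolve_dataset_path` is a hypothetical helper for illustration, not OpenCompass's actual loader API.

```python
import os


def resolve_dataset_path(path: str, cache_root: str = '~/.cache/datasets') -> dict:
    """Classify a config `path` as a local path or a hub-style dataset ID.

    Hypothetical sketch only; the real OpenCompass loader logic differs.
    """
    # Anything that exists locally, or is written like a filesystem path
    # ('./...', '/...', 'data/...'), is treated as local data.
    if path.startswith(('./', '/', 'data/')) or os.path.exists(path):
        return {'kind': 'local', 'location': path}
    # Otherwise treat 'namespace/name' as a hub ID to be downloaded on demand.
    namespace, _, name = path.partition('/')
    cache = os.path.join(os.path.expanduser(cache_root), namespace, name)
    return {'kind': 'hub', 'location': cache}
```

Under this rule the untouched `data/compassbench_v1.1/...` paths above remain local, while IDs like `opencompass/ceval-exam` resolve to a cache location for on-demand download.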

Some files were not shown because too many files have changed in this diff.