mirror of
https://github.com/open-compass/opencompass.git
synced 2025-05-30 16:03:24 +08:00
[Feature] Support ModelScope datasets (#1289)
* add ceval, gsm8k modelscope support
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* update dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update dataset for modelscope support
* update readme
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* update readme
* remove tydiqa japanese subset
* add ceval, gsm8k modelscope support
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* update dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update readme
* update dataset for modelscope support
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* remove tydiqa japanese subset
* update util
* remove .DS_Store
* fix md format
* move util into package
* update docs/get_started.md
* restore eval_api_zhipu_v2.py, add environment setting
* Update dataset
* Update
* Update
* Update
* Update

---------

Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
parent
12b84aeb3b
commit
edab1c07ba
2
.gitignore
vendored
@ -1,4 +1,4 @@
|
||||
|
||||
.DS_Store
|
||||
output_*/
|
||||
outputs/
|
||||
scripts/
|
||||
|
@ -1,6 +1,7 @@
|
||||
exclude: |
|
||||
(?x)^(
|
||||
tests/data/|
|
||||
tests/dataset/|
|
||||
opencompass/models/internal/|
|
||||
opencompass/utils/internal/|
|
||||
opencompass/openicl/icl_evaluator/hf_metrics/|
|
||||
|
18
README.md
@ -70,6 +70,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
|
||||
|
||||
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
|
||||
|
||||
- **\[2024.07.23\]** OpenCompass now supports [ModelScope](www.modelscope.cn) datasets. You can load them on demand without downloading all the data to your local disk. Give it a try! 🔥🔥🔥
|
||||
- **\[2024.07.17\]** We have released the example data and configuration for CompassBench-202408. See [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) for more details. 🔥🔥🔥
|
||||
- **\[2024.07.17\]** We are excited to announce the release of NeedleBench's [technical report](http://arxiv.org/abs/2407.11963). We invite you to visit our [support documentation](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html) for detailed evaluation guidelines. 🔥🔥🔥
|
||||
- **\[2024.07.04\]** OpenCompass now supports InternLM2.5, which has **outstanding reasoning capability**, a **1M context window**, and **stronger tool use**. You can try the models in [OpenCompass Config](https://github.com/open-compass/opencompass/tree/main/configs/models/hf_internlm) and [InternLM](https://github.com/InternLM/InternLM). 🔥🔥🔥
|
||||
@ -136,12 +137,29 @@ pip install -e .
|
||||
|
||||
### 📂 Data Preparation
|
||||
|
||||
You can download and extract the datasets with the following commands:
|
||||
|
||||
```bash
|
||||
# Download dataset to data/ folder
|
||||
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
|
||||
unzip OpenCompassData-core-20240207.zip
|
||||
```
|
||||
|
||||
Alternatively, you can use [ModelScope](www.modelscope.cn) to load the datasets on demand.
|
||||
|
||||
Installation:
|
||||
|
||||
```bash
|
||||
pip install modelscope
|
||||
export DATASET_SOURCE=ModelScope
|
||||
```
|
||||
|
||||
Then submit the evaluation task without downloading all the data to your local disk. Available datasets include:
|
||||
|
||||
```bash
|
||||
humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ceval, math, LCSTS, Xsum, winogrande, openbookqa, AGIEval, gsm8k, nq, race, siqa, mbpp, mmlu, hellaswag, ARC, BBH, xstory_cloze, summedits, GAOKAO-BENCH, OCNLI, cmnli
|
||||
```
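As a rough illustration of the switch described above, the sketch below shows how a loader could choose between local files and ModelScope based on the `DATASET_SOURCE` environment variable. The helper name `resolve_dataset_source`, the `'Local'` fallback, and the case-insensitive comparison are assumptions made for this example; they are not taken from the OpenCompass code base.

```python
import os


def resolve_dataset_source(env=None):
    """Return 'ModelScope' when DATASET_SOURCE requests it, otherwise 'Local'.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    source = env.get('DATASET_SOURCE', '')
    # Assumption: accept any capitalisation of the value.
    if source.strip().lower() == 'modelscope':
        return 'ModelScope'
    return 'Local'


if __name__ == '__main__':
    print(resolve_dataset_source({'DATASET_SOURCE': 'ModelScope'}))
    print(resolve_dataset_source({}))
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.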
|
||||
|
||||
Some third-party features, such as Humaneval and Llama, may require additional steps to work properly. For detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started/installation.html).
|
||||
|
||||
<p align="right"><a href="#top">🔝Back to top</a></p>
|
||||
|
@ -69,6 +69,7 @@
|
||||
|
||||
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
|
||||
|
||||
- **\[2024.07.23\]** We now support [ModelScope](www.modelscope.cn) datasets. You can load them on demand without first downloading all the data to your local disk. Give it a try! 🔥🔥🔥
|
||||
- **\[2024.07.17\]** We have released the example data and evaluation rules for the CompassBench-202408 leaderboard. Visit [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) for more information. 🔥🔥🔥
|
||||
- **\[2024.07.17\]** We have officially released the NeedleBench [technical report](http://arxiv.org/abs/2407.11963). You are welcome to consult our [support documentation](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/needleinahaystack_eval.html) to run the evaluation. 🔥🔥🔥
|
||||
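- **\[2024.07.04\]** OpenCompass now supports InternLM2.5, which offers outstanding reasoning performance, effective support for ultra-long contexts of up to one million characters, and an overall upgrade in tool-calling ability. See [OpenCompass Config](https://github.com/open-compass/opencompass/tree/main/configs/models/hf_internlm) and [InternLM](https://github.com/InternLM/InternLM). 🔥🔥🔥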
|
||||
@ -138,12 +139,28 @@ pip install -e .
|
||||
|
||||
### 📂 Data Preparation
|
||||
|
||||
OpenCompass supports evaluation with local datasets. You can download and extract the datasets with the following commands:
|
||||
|
||||
```bash
|
||||
# Download the dataset to the data/ folder
|
||||
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
|
||||
unzip OpenCompassData-core-20240207.zip
|
||||
```
|
||||
|
||||
Alternatively, you can use [ModelScope](www.modelscope.cn) to load the datasets:
|
||||
Environment setup:
|
||||
|
||||
```bash
|
||||
pip install modelscope
|
||||
export DATASET_SOURCE=ModelScope
|
||||
```
|
||||
|
||||
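Once the environment is configured, you can submit the evaluation task directly without downloading all the data. Currently supported datasets include: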
|
||||
|
||||
```bash
|
||||
humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ceval, math, LCSTS, Xsum, winogrande, openbookqa, AGIEval, gsm8k, nq, race, siqa, mbpp, mmlu, hellaswag, ARC, BBH, xstory_cloze, summedits, GAOKAO-BENCH, OCNLI, cmnli
|
||||
```
|
||||
|
||||
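Some third-party features, such as Humaneval and Llama, may require additional steps to work properly. For detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started/installation.html).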
|
||||
|
||||
<p align="right"><a href="#top">🔝Back to top</a></p>
|
||||
|
@ -47,7 +47,8 @@ ARC_c_datasets = [
|
||||
dict(
|
||||
type=ARCDataset,
|
||||
abbr='ARC-c-test',
|
||||
path='./data/ARC/ARC-c/ARC-Challenge-Test.jsonl',
|
||||
path='opencompass/ai2_arc-test',
|
||||
name='ARC-Challenge',
|
||||
reader_cfg=ARC_c_reader_cfg,
|
||||
infer_cfg=ARC_c_infer_cfg,
|
||||
eval_cfg=ARC_c_eval_cfg)
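The hunk above replaces a local file path with a dataset identifier and adds a `name` field. As a minimal sketch of how such an identifier could be mapped back to a local file when no remote source is in use, consider the lookup below. The table `LOCAL_PATHS` and the helper `to_local_path` are hypothetical names invented for this example; the single table entry simply pairs the old and new `path` values shown in the hunk.

```python
# Hypothetical lookup table: dataset identifier -> local file path.
# The one entry pairs the new and old `path` values from the hunk above.
LOCAL_PATHS = {
    'opencompass/ai2_arc-test': './data/ARC/ARC-c/ARC-Challenge-Test.jsonl',
}


def to_local_path(dataset_id, table=LOCAL_PATHS):
    """Map a dataset identifier to a local path; unknown values pass through."""
    return table.get(dataset_id, dataset_id)


# A reduced config entry that keeps only the plain-data fields from the hunk.
cfg = dict(abbr='ARC-c-test', path='opencompass/ai2_arc-test', name='ARC-Challenge')

# Build a copy whose path points at the local file, leaving cfg untouched.
local_cfg = dict(cfg, path=to_local_path(cfg['path']))
print(local_cfg['path'])
```

Returning unknown identifiers unchanged means an explicit `./data/...` path would still work as before under this scheme.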
|
||||
|
@ -35,7 +35,8 @@ ARC_c_datasets = [
|
||||
dict(
|
||||
abbr='ARC-c',
|
||||
type=ARCDataset,
|
||||
path='./data/ARC/ARC-c/ARC-Challenge-Dev.jsonl',
|
||||
path='opencompass/ai2_arc-dev',
|
||||
name='ARC-Challenge',
|
||||
reader_cfg=ARC_c_reader_cfg,
|
||||
infer_cfg=ARC_c_infer_cfg,
|
||||
eval_cfg=ARC_c_eval_cfg,
|
||||
|
@ -29,7 +29,8 @@ ARC_c_datasets = [
|
||||
dict(
|
||||
type=ARCDataset,
|
||||
abbr='ARC-c',
|
||||
path='./data/ARC/ARC-c/ARC-Challenge-Dev.jsonl',
|
||||
path='opencompass/ai2_arc-dev',
|
||||
name='ARC-Challenge',
|
||||
reader_cfg=ARC_c_reader_cfg,
|
||||
infer_cfg=ARC_c_infer_cfg,
|
||||
eval_cfg=ARC_c_eval_cfg)
|
||||
|
@ -46,7 +46,8 @@ ARC_c_datasets = [
|
||||
dict(
|
||||
type=ARCDataset,
|
||||
abbr='ARC-c',
|
||||
path='./data/ARC/ARC-c/ARC-Challenge-Dev.jsonl',
|
||||
path='opencompass/ai2_arc-dev',
|
||||
name='ARC-Challenge',
|
||||
reader_cfg=ARC_c_reader_cfg,
|
||||
infer_cfg=ARC_c_infer_cfg,
|
||||
eval_cfg=ARC_c_eval_cfg)
|
||||
|
@ -1,3 +1,5 @@
|
||||
from mmengine.config import read_base
|
||||
# with read_base():
|
||||
from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import PPLInferencer
|
||||
@ -26,7 +28,8 @@ ARC_c_datasets = [
|
||||
dict(
|
||||
type=ARCDataset,
|
||||
abbr='ARC-c',
|
||||
path='./data/ARC/ARC-c/ARC-Challenge-Dev.jsonl',
|
||||
path='opencompass/ai2_arc-dev',
|
||||
name='ARC-Challenge',
|
||||
reader_cfg=ARC_c_reader_cfg,
|
||||
infer_cfg=ARC_c_infer_cfg,
|
||||
eval_cfg=ARC_c_eval_cfg)
|
||||
|
@ -35,7 +35,8 @@ ARC_e_datasets = [
|
||||
dict(
|
||||
abbr='ARC-e',
|
||||
type=ARCDataset,
|
||||
path='./data/ARC/ARC-e/ARC-Easy-Dev.jsonl',
|
||||
path='opencompass/ai2_arc-easy-dev',
|
||||
name='ARC-Easy',
|
||||
reader_cfg=ARC_e_reader_cfg,
|
||||
infer_cfg=ARC_e_infer_cfg,
|
||||
eval_cfg=ARC_e_eval_cfg,
|
||||
|
@ -29,7 +29,8 @@ ARC_e_datasets = [
|
||||
dict(
|
||||
type=ARCDataset,
|
||||
abbr='ARC-e',
|
||||
path='./data/ARC/ARC-e/ARC-Easy-Dev.jsonl',
|
||||
path='opencompass/ai2_arc-easy-dev',
|
||||
name='ARC-Easy',
|
||||
reader_cfg=ARC_e_reader_cfg,
|
||||
infer_cfg=ARC_e_infer_cfg,
|
||||
eval_cfg=ARC_e_eval_cfg)
|
||||
|
@ -46,7 +46,8 @@ ARC_e_datasets = [
|
||||
dict(
|
||||
type=ARCDataset,
|
||||
abbr='ARC-e',
|
||||
path='./data/ARC/ARC-e/ARC-Easy-Dev.jsonl',
|
||||
path='opencompass/ai2_arc-easy-dev',
|
||||
name='ARC-Easy',
|
||||
reader_cfg=ARC_e_reader_cfg,
|
||||
infer_cfg=ARC_e_infer_cfg,
|
||||
eval_cfg=ARC_e_eval_cfg)
|
||||
|
@ -26,7 +26,8 @@ ARC_e_datasets = [
|
||||
dict(
|
||||
type=ARCDataset,
|
||||
abbr='ARC-e',
|
||||
path='./data/ARC/ARC-e/ARC-Easy-Dev.jsonl',
|
||||
path='opencompass/ai2_arc-easy-dev',
|
||||
name='ARC-Easy',
|
||||
reader_cfg=ARC_e_reader_cfg,
|
||||
infer_cfg=ARC_e_infer_cfg,
|
||||
eval_cfg=ARC_e_eval_cfg)
|
||||
|
@ -28,7 +28,7 @@ CMRC_datasets = [
|
||||
dict(
|
||||
type=CMRCDataset,
|
||||
abbr='CMRC_dev',
|
||||
path='./data/CLUE/CMRC/dev.json',
|
||||
path='opencompass/cmrc_dev',
|
||||
reader_cfg=CMRC_reader_cfg,
|
||||
infer_cfg=CMRC_infer_cfg,
|
||||
eval_cfg=CMRC_eval_cfg),
|
||||
|
@ -26,7 +26,7 @@ CMRC_datasets = [
|
||||
dict(
|
||||
type=CMRCDataset,
|
||||
abbr='CMRC_dev',
|
||||
path='./data/CLUE/CMRC/dev.json',
|
||||
path='opencompass/cmrc_dev',
|
||||
reader_cfg=CMRC_reader_cfg,
|
||||
infer_cfg=CMRC_infer_cfg,
|
||||
eval_cfg=CMRC_eval_cfg),
|
||||
|
@ -20,7 +20,7 @@ CMRC_datasets = [
|
||||
dict(
|
||||
type=CMRCDataset,
|
||||
abbr='CMRC_dev',
|
||||
path='./data/CLUE/CMRC/dev.json',
|
||||
path='opencompass/cmrc_dev',
|
||||
reader_cfg=CMRC_reader_cfg,
|
||||
infer_cfg=CMRC_infer_cfg,
|
||||
eval_cfg=CMRC_eval_cfg),
|
||||
|
@ -27,7 +27,7 @@ CMRC_datasets = [
|
||||
dict(
|
||||
type=CMRCDataset,
|
||||
abbr='CMRC_dev',
|
||||
path='./data/CLUE/CMRC/dev.json',
|
||||
path='opencompass/cmrc_dev',
|
||||
reader_cfg=CMRC_reader_cfg,
|
||||
infer_cfg=CMRC_infer_cfg,
|
||||
eval_cfg=CMRC_eval_cfg),
|
||||
|
@ -29,7 +29,7 @@ DRCD_datasets = [
|
||||
dict(
|
||||
type=DRCDDataset,
|
||||
abbr='DRCD_dev',
|
||||
path='./data/CLUE/DRCD/dev.json',
|
||||
path='opencompass/drcd_dev',
|
||||
reader_cfg=DRCD_reader_cfg,
|
||||
infer_cfg=DRCD_infer_cfg,
|
||||
eval_cfg=DRCD_eval_cfg),
|
||||
|
@ -26,7 +26,7 @@ DRCD_datasets = [
|
||||
dict(
|
||||
type=DRCDDataset,
|
||||
abbr='DRCD_dev',
|
||||
path='./data/CLUE/DRCD/dev.json',
|
||||
path='opencompass/drcd_dev',
|
||||
reader_cfg=DRCD_reader_cfg,
|
||||
infer_cfg=DRCD_infer_cfg,
|
||||
eval_cfg=DRCD_eval_cfg),
|
||||
|
@ -20,7 +20,7 @@ DRCD_datasets = [
|
||||
dict(
|
||||
type=DRCDDataset,
|
||||
abbr='DRCD_dev',
|
||||
path='./data/CLUE/DRCD/dev.json',
|
||||
path='opencompass/drcd_dev',
|
||||
reader_cfg=DRCD_reader_cfg,
|
||||
infer_cfg=DRCD_infer_cfg,
|
||||
eval_cfg=DRCD_eval_cfg),
|
||||
|
@ -27,7 +27,7 @@ DRCD_datasets = [
|
||||
dict(
|
||||
type=DRCDDataset,
|
||||
abbr='DRCD_dev',
|
||||
path='./data/CLUE/DRCD/dev.json',
|
||||
path='opencompass/drcd_dev',
|
||||
reader_cfg=DRCD_reader_cfg,
|
||||
infer_cfg=DRCD_infer_cfg,
|
||||
eval_cfg=DRCD_eval_cfg),
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import AFQMCDataset_V2
|
||||
from opencompass.datasets import AFQMCDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
afqmc_reader_cfg = dict(
|
||||
@ -34,8 +34,8 @@ afqmc_eval_cfg = dict(
|
||||
afqmc_datasets = [
|
||||
dict(
|
||||
abbr='afqmc-dev',
|
||||
type=AFQMCDataset_V2,
|
||||
path='./data/CLUE/AFQMC/dev.json',
|
||||
type=AFQMCDatasetV2,
|
||||
path='opencompass/afqmc-dev',
|
||||
reader_cfg=afqmc_reader_cfg,
|
||||
infer_cfg=afqmc_infer_cfg,
|
||||
eval_cfg=afqmc_eval_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import cmnliDataset_V2
|
||||
from opencompass.datasets import CMNLIDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
cmnli_reader_cfg = dict(
|
||||
@ -34,8 +34,8 @@ cmnli_eval_cfg = dict(
|
||||
cmnli_datasets = [
|
||||
dict(
|
||||
abbr='cmnli',
|
||||
type=cmnliDataset_V2,
|
||||
path='./data/CLUE/cmnli/cmnli_public/dev.json',
|
||||
type=CMNLIDatasetV2,
|
||||
path='opencompass/cmnli-dev',
|
||||
reader_cfg=cmnli_reader_cfg,
|
||||
infer_cfg=cmnli_infer_cfg,
|
||||
eval_cfg=cmnli_eval_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import cmnliDataset_V2
|
||||
from opencompass.datasets import CMNLIDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
cmnli_reader_cfg = dict(
|
||||
@ -34,8 +34,8 @@ cmnli_eval_cfg = dict(
|
||||
cmnli_datasets = [
|
||||
dict(
|
||||
abbr='cmnli',
|
||||
type=cmnliDataset_V2,
|
||||
path='./data/CLUE/cmnli/cmnli_public/dev.json',
|
||||
type=CMNLIDatasetV2,
|
||||
path='opencompass/cmnli-dev',
|
||||
reader_cfg=cmnli_reader_cfg,
|
||||
infer_cfg=cmnli_infer_cfg,
|
||||
eval_cfg=cmnli_eval_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import PPLInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import cmnliDataset
|
||||
from opencompass.datasets import CMNLIDataset
|
||||
|
||||
cmnli_reader_cfg = dict(
|
||||
input_columns=['sentence1', 'sentence2'],
|
||||
@ -26,8 +26,8 @@ cmnli_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
|
||||
cmnli_datasets = [
|
||||
dict(
|
||||
abbr='cmnli',
|
||||
type=cmnliDataset,
|
||||
path='./data/CLUE/cmnli/cmnli_public/dev.json',
|
||||
type=CMNLIDataset,
|
||||
path='opencompass/cmnli-dev',
|
||||
reader_cfg=cmnli_reader_cfg,
|
||||
infer_cfg=cmnli_infer_cfg,
|
||||
eval_cfg=cmnli_eval_cfg)
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import PPLInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import cmnliDataset
|
||||
from opencompass.datasets import CMNLIDataset
|
||||
|
||||
cmnli_reader_cfg = dict(
|
||||
input_columns=['sentence1', 'sentence2'],
|
||||
@ -42,8 +42,8 @@ cmnli_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
|
||||
cmnli_datasets = [
|
||||
dict(
|
||||
abbr='cmnli',
|
||||
type=cmnliDataset,
|
||||
path='./data/CLUE/cmnli/cmnli_public/dev.json',
|
||||
type=CMNLIDataset,
|
||||
path='opencompass/cmnli-dev',
|
||||
reader_cfg=cmnli_reader_cfg,
|
||||
infer_cfg=cmnli_infer_cfg,
|
||||
eval_cfg=cmnli_eval_cfg)
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import PPLInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import cmnliDataset
|
||||
from opencompass.datasets import CMNLIDataset
|
||||
|
||||
cmnli_reader_cfg = dict(
|
||||
input_columns=['sentence1', 'sentence2'],
|
||||
@ -46,8 +46,8 @@ cmnli_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
|
||||
cmnli_datasets = [
|
||||
dict(
|
||||
abbr='cmnli',
|
||||
type=cmnliDataset,
|
||||
path='./data/CLUE/cmnli/cmnli_public/dev.json',
|
||||
type=CMNLIDataset,
|
||||
path='opencompass/cmnli-dev',
|
||||
reader_cfg=cmnli_reader_cfg,
|
||||
infer_cfg=cmnli_infer_cfg,
|
||||
eval_cfg=cmnli_eval_cfg)
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import cmnliDataset_V2
|
||||
from opencompass.datasets import CMNLIDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
ocnli_reader_cfg = dict(
|
||||
@ -35,8 +35,8 @@ ocnli_eval_cfg = dict(
|
||||
ocnli_datasets = [
|
||||
dict(
|
||||
abbr='ocnli',
|
||||
type=cmnliDataset_V2,  # ocnli shares the same format as cmnli
|
||||
path='./data/CLUE/OCNLI/dev.json',
|
||||
type=CMNLIDatasetV2,  # ocnli shares the same format as cmnli
|
||||
path='opencompass/OCNLI-dev',
|
||||
reader_cfg=ocnli_reader_cfg,
|
||||
infer_cfg=ocnli_infer_cfg,
|
||||
eval_cfg=ocnli_eval_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import cmnliDataset_V2
|
||||
from opencompass.datasets import CMNLIDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
ocnli_reader_cfg = dict(
|
||||
@ -35,8 +35,8 @@ ocnli_eval_cfg = dict(
|
||||
ocnli_datasets = [
|
||||
dict(
|
||||
abbr='ocnli',
|
||||
type=cmnliDataset_V2,  # ocnli shares the same format as cmnli
|
||||
path='./data/CLUE/OCNLI/dev.json',
|
||||
type=CMNLIDatasetV2,  # ocnli shares the same format as cmnli
|
||||
path='opencompass/OCNLI-dev',
|
||||
reader_cfg=ocnli_reader_cfg,
|
||||
infer_cfg=ocnli_infer_cfg,
|
||||
eval_cfg=ocnli_eval_cfg,
|
||||
|
@ -67,7 +67,7 @@ for _name in chembench_all_sets:
|
||||
dict(
|
||||
abbr=f'ChemBench_{_name}',
|
||||
type=ChemBenchDataset,
|
||||
path='./data/ChemBench/',
|
||||
path='opencompass/ChemBench',
|
||||
name=_name,
|
||||
reader_cfg=chembench_reader_cfg,
|
||||
infer_cfg=chembench_infer_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import AFQMCDataset_V2
|
||||
from opencompass.datasets import AFQMCDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
bustm_reader_cfg = dict(
|
||||
@ -34,16 +34,18 @@ bustm_eval_cfg = dict(
|
||||
bustm_datasets = [
|
||||
dict(
|
||||
abbr='bustm-dev',
|
||||
type=AFQMCDataset_V2,  # bustm shares the same format as AFQMC
|
||||
type=AFQMCDatasetV2,  # bustm shares the same format as AFQMC
|
||||
path='./data/FewCLUE/bustm/dev_few_all.json',
|
||||
local_mode=True,
|
||||
reader_cfg=bustm_reader_cfg,
|
||||
infer_cfg=bustm_infer_cfg,
|
||||
eval_cfg=bustm_eval_cfg,
|
||||
),
|
||||
dict(
|
||||
abbr='bustm-test',
|
||||
type=AFQMCDataset_V2,  # bustm shares the same format as AFQMC
|
||||
type=AFQMCDatasetV2,  # bustm shares the same format as AFQMC
|
||||
path='./data/FewCLUE/bustm/test_public.json',
|
||||
local_mode=True,
|
||||
reader_cfg=bustm_reader_cfg,
|
||||
infer_cfg=bustm_infer_cfg,
|
||||
eval_cfg=bustm_eval_cfg,
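The bustm entries above keep their `./data/...` paths and gain `local_mode=True`. One plausible reading is that the flag tells the loader to use the path as given even when a remote source is selected. The sketch below illustrates that reading only; `pick_origin`, its parameters, and the returned labels are assumptions for this example and are not the OpenCompass implementation.

```python
def pick_origin(path, local_mode=False, source='Local'):
    """Return (origin, path) describing where a loader would read from.

    Hypothetical helper: `local_mode=True` forces local loading even if
    `source` names a remote hub.
    """
    if local_mode or source != 'ModelScope':
        return ('local', path)
    return ('modelscope', path)


# A dataset pinned to local files stays local even with a remote source selected.
print(pick_origin('./data/FewCLUE/bustm/dev_few_all.json',
                  local_mode=True, source='ModelScope'))
# A dataset identifier without the flag would go to the remote source.
print(pick_origin('opencompass/afqmc-dev', source='ModelScope'))
```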
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import CHIDDataset_V2
|
||||
from opencompass.datasets import CHIDDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
chid_reader_cfg = dict(
|
||||
@ -34,7 +34,7 @@ chid_eval_cfg = dict(
|
||||
chid_datasets = [
|
||||
dict(
|
||||
abbr='chid-dev',
|
||||
type=CHIDDataset_V2,
|
||||
type=CHIDDatasetV2,
|
||||
path='./data/FewCLUE/chid/dev_few_all.json',
|
||||
reader_cfg=chid_reader_cfg,
|
||||
infer_cfg=chid_infer_cfg,
|
||||
@ -42,7 +42,7 @@ chid_datasets = [
|
||||
),
|
||||
dict(
|
||||
abbr='chid-test',
|
||||
type=CHIDDataset_V2,
|
||||
type=CHIDDatasetV2,
|
||||
path='./data/FewCLUE/chid/test_public.json',
|
||||
reader_cfg=chid_reader_cfg,
|
||||
infer_cfg=chid_infer_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import CluewscDataset_V2
|
||||
from opencompass.datasets import CluewscDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
cluewsc_reader_cfg = dict(
|
||||
@ -34,7 +34,7 @@ cluewsc_eval_cfg = dict(
|
||||
cluewsc_datasets = [
|
||||
dict(
|
||||
abbr='cluewsc-dev',
|
||||
type=CluewscDataset_V2,
|
||||
type=CluewscDatasetV2,
|
||||
path='./data/FewCLUE/cluewsc/dev_few_all.json',
|
||||
reader_cfg=cluewsc_reader_cfg,
|
||||
infer_cfg=cluewsc_infer_cfg,
|
||||
@ -42,7 +42,7 @@ cluewsc_datasets = [
|
||||
),
|
||||
dict(
|
||||
abbr='cluewsc-test',
|
||||
type=CluewscDataset_V2,
|
||||
type=CluewscDatasetV2,
|
||||
path='./data/FewCLUE/cluewsc/test_public.json',
|
||||
reader_cfg=cluewsc_reader_cfg,
|
||||
infer_cfg=cluewsc_infer_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import CslDataset_V2
|
||||
from opencompass.datasets import CslDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
csl_reader_cfg = dict(
|
||||
@ -34,7 +34,7 @@ csl_eval_cfg = dict(
|
||||
csl_datasets = [
|
||||
dict(
|
||||
abbr='csl_dev',
|
||||
type=CslDataset_V2,
|
||||
type=CslDatasetV2,
|
||||
path='./data/FewCLUE/csl/dev_few_all.json',
|
||||
reader_cfg=csl_reader_cfg,
|
||||
infer_cfg=csl_infer_cfg,
|
||||
@ -42,7 +42,7 @@ csl_datasets = [
|
||||
),
|
||||
dict(
|
||||
abbr='csl_test',
|
||||
type=CslDataset_V2,
|
||||
type=CslDatasetV2,
|
||||
path='./data/FewCLUE/csl/test_public.json',
|
||||
reader_cfg=csl_reader_cfg,
|
||||
infer_cfg=csl_infer_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import CslDataset_V2
|
||||
from opencompass.datasets import CslDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
csl_reader_cfg = dict(
|
||||
@ -34,7 +34,7 @@ csl_eval_cfg = dict(
|
||||
csl_datasets = [
|
||||
dict(
|
||||
abbr='csl_dev',
|
||||
type=CslDataset_V2,
|
||||
type=CslDatasetV2,
|
||||
path='./data/FewCLUE/csl/dev_few_all.json',
|
||||
reader_cfg=csl_reader_cfg,
|
||||
infer_cfg=csl_infer_cfg,
|
||||
@ -42,7 +42,7 @@ csl_datasets = [
|
||||
),
|
||||
dict(
|
||||
abbr='csl_test',
|
||||
type=CslDataset_V2,
|
||||
type=CslDatasetV2,
|
||||
path='./data/FewCLUE/csl/test_public.json',
|
||||
reader_cfg=csl_reader_cfg,
|
||||
infer_cfg=csl_infer_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import eprstmtDataset_V2
|
||||
from opencompass.datasets import EprstmtDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
eprstmt_reader_cfg = dict(
|
||||
@ -32,7 +32,7 @@ eprstmt_eval_cfg = dict(
|
||||
eprstmt_datasets = [
|
||||
dict(
|
||||
abbr='eprstmt-dev',
|
||||
type=eprstmtDataset_V2,
|
||||
type=EprstmtDatasetV2,
|
||||
path='./data/FewCLUE/eprstmt/dev_few_all.json',
|
||||
reader_cfg=eprstmt_reader_cfg,
|
||||
infer_cfg=eprstmt_infer_cfg,
|
||||
@ -40,7 +40,7 @@ eprstmt_datasets = [
|
||||
),
|
||||
dict(
|
||||
abbr='eprstmt-test',
|
||||
type=eprstmtDataset_V2,
|
||||
type=EprstmtDatasetV2,
|
||||
path='./data/FewCLUE/eprstmt/test_public.json',
|
||||
reader_cfg=eprstmt_reader_cfg,
|
||||
infer_cfg=eprstmt_infer_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import cmnliDataset_V2
|
||||
from opencompass.datasets import CMNLIDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
ocnli_fc_reader_cfg = dict(
|
||||
@ -33,16 +33,18 @@ ocnli_fc_eval_cfg = dict(
|
||||
ocnli_fc_datasets = [
|
||||
dict(
|
||||
abbr='ocnli_fc-dev',
|
||||
type=cmnliDataset_V2,  # ocnli_fc shares the same format as cmnli
|
||||
type=CMNLIDatasetV2,  # ocnli_fc shares the same format as cmnli
|
||||
path='./data/FewCLUE/ocnli/dev_few_all.json',
|
||||
local_mode=True,
|
||||
reader_cfg=ocnli_fc_reader_cfg,
|
||||
infer_cfg=ocnli_fc_infer_cfg,
|
||||
eval_cfg=ocnli_fc_eval_cfg,
|
||||
),
|
||||
dict(
|
||||
abbr='ocnli_fc-test',
|
||||
type=cmnliDataset_V2,  # ocnli_fc shares the same format as cmnli
|
||||
type=CMNLIDatasetV2,  # ocnli_fc shares the same format as cmnli
|
||||
path='./data/FewCLUE/ocnli/test_public.json',
|
||||
local_mode=True,
|
||||
reader_cfg=ocnli_fc_reader_cfg,
|
||||
infer_cfg=ocnli_fc_infer_cfg,
|
||||
eval_cfg=ocnli_fc_eval_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
from opencompass.datasets import TNewsDataset_V2
|
||||
from opencompass.datasets import TNewsDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
tnews_reader_cfg = dict(
|
||||
@ -56,7 +56,7 @@ tnews_eval_cfg = dict(
|
||||
tnews_datasets = [
|
||||
dict(
|
||||
abbr='tnews-dev',
|
||||
type=TNewsDataset_V2,
|
||||
type=TNewsDatasetV2,
|
||||
path='./data/FewCLUE/tnews/dev_few_all.json',
|
||||
reader_cfg=tnews_reader_cfg,
|
||||
infer_cfg=tnews_infer_cfg,
|
||||
@ -64,7 +64,7 @@ tnews_datasets = [
|
||||
),
|
||||
dict(
|
||||
abbr='tnews-test',
|
||||
type=TNewsDataset_V2,
|
||||
type=TNewsDatasetV2,
|
||||
path='./data/FewCLUE/tnews/test_public.json',
|
||||
reader_cfg=tnews_reader_cfg,
|
||||
infer_cfg=tnews_infer_cfg,
|
||||
|
@ -3,6 +3,7 @@ from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.datasets import GaokaoBenchDataset
|
||||
|
||||
|
||||
_MCQ_prompts = [
|
||||
{
|
||||
'type': 'single_choice',
|
||||
@ -288,6 +289,7 @@ for _folder, _prompts in [
|
||||
'type': GaokaoBenchDataset,
|
||||
'abbr': 'GaokaoBench_' + _p['keyword'],
|
||||
'path': _base_path + '/' + _folder + '/' + _p['keyword'] + '.json',
|
||||
'name': _p['keyword'],
|
||||
'reader_cfg': _reader_cfg,
|
||||
'infer_cfg': _infer_cfg,
|
||||
'eval_cfg': _eval_cfg,
|
||||
|
@ -2,7 +2,6 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer, PPLInferencer
|
||||
from opencompass.datasets import GaokaoBenchDataset
|
||||
|
||||
_MCQ_prompts = [
|
||||
{
|
||||
'type': 'single_choice',
|
||||
@ -290,6 +289,7 @@ for _folder, _prompts in [
|
||||
'type': GaokaoBenchDataset,
|
||||
'abbr': 'GaokaoBench_' + _p['keyword'],
|
||||
'path': _base_path + '/' + _folder + '/' + _p['keyword'] + '.json',
|
||||
'name': _p['keyword'],
|
||||
'reader_cfg': _reader_cfg,
|
||||
'infer_cfg': _infer_cfg,
|
||||
'eval_cfg': _eval_cfg,
|
||||
@ -340,6 +340,7 @@ for _p in _MCQ_prompts:
|
||||
'type': GaokaoBenchDataset,
|
||||
'abbr': 'GaokaoBench_' + _p['keyword'],
|
||||
'path': _base_path + '/' + _folder + '/' + _p['keyword'] + '.json',
|
||||
'name': _p['keyword'],
|
||||
'reader_cfg': _reader_cfg,
|
||||
'infer_cfg': _infer_cfg,
|
||||
'eval_cfg': _eval_cfg,
|
||||
|
@ -35,6 +35,7 @@ for folder, prompts in [
|
||||
'type': GaokaoBenchDataset,
|
||||
'abbr': 'GaokaoBench_' + p['keyword'],
|
||||
'path': os.path.join('data', 'GAOKAO-BENCH', 'data', folder, p['keyword'] + '.json'),
|
||||
'name': p['keyword'],
|
||||
'reader_cfg': reader_cfg,
|
||||
'infer_cfg': infer_cfg,
|
||||
'eval_cfg': eval_cfg,
|
||||
|
@ -34,6 +34,7 @@ for folder, prompts in [
|
||||
'type': GaokaoBenchDataset,
|
||||
'abbr': 'GaokaoBench_' + p['keyword'],
|
||||
'path': os.path.join('data', 'GAOKAO-BENCH', 'data', folder, p['keyword'] + '.json'),
|
||||
'name': p['keyword'],
|
||||
'reader_cfg': reader_cfg,
|
||||
'infer_cfg': infer_cfg,
|
||||
'eval_cfg': eval_cfg,
|
||||
|
@@ -2,27 +2,27 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.datasets.NPHardEval import (
-hard_GCP_Dataset, hard_GCP_Evaluator,
-hard_TSP_Dataset, hard_TSP_Evaluator,
-hard_MSP_Dataset, hard_MSP_Evaluator,
-cmp_GCP_D_Dataset, cmp_GCP_D_Evaluator,
-cmp_TSP_D_Dataset, cmp_TSP_D_Evaluator,
-cmp_KSP_Dataset, cmp_KSP_Evaluator,
-p_BSP_Dataset, p_BSP_Evaluator,
-p_EDP_Dataset, p_EDP_Evaluator,
-p_SPP_Dataset, p_SPP_Evaluator,
+HardGCPDataset, HardGCPEvaluator,
+Hard_TSP_Dataset, Hard_TSP_Evaluator,
+Hard_MSP_Dataset, Hard_MSP_Evaluator,
+CMP_GCP_D_Dataset, CMP_GCP_D_Evaluator,
+CMP_TSP_D_Dataset, CMP_TSP_D_Evaluator,
+CMP_KSP_Dataset, CMP_KSP_Evaluator,
+P_BSP_Dataset, P_BSP_Evaluator,
+P_EDP_Dataset, P_EDP_Evaluator,
+P_SPP_Dataset, P_SPP_Evaluator,
 )

 NPHardEval_tasks = [
-['hard_GCP', 'GCP', hard_GCP_Dataset, hard_GCP_Evaluator],
-['hard_TSP', 'TSP', hard_TSP_Dataset, hard_TSP_Evaluator],
-['hard_MSP', 'MSP', hard_MSP_Dataset, hard_MSP_Evaluator],
-['cmp_GCP_D', 'GCP_Decision', cmp_GCP_D_Dataset, cmp_GCP_D_Evaluator],
-['cmp_TSP_D', 'TSP_Decision', cmp_TSP_D_Dataset, cmp_TSP_D_Evaluator],
-['cmp_KSP', 'KSP', cmp_KSP_Dataset, cmp_KSP_Evaluator],
-['p_BSP', 'BSP', p_BSP_Dataset, p_BSP_Evaluator],
-['p_EDP', 'EDP', p_EDP_Dataset, p_EDP_Evaluator],
-['p_SPP', 'SPP', p_SPP_Dataset, p_SPP_Evaluator],
+['hard_GCP', 'GCP', HardGCPDataset, HardGCPEvaluator],
+['hard_TSP', 'TSP', Hard_TSP_Dataset, Hard_TSP_Evaluator],
+['hard_MSP', 'MSP', Hard_MSP_Dataset, Hard_MSP_Evaluator],
+['cmp_GCP_D', 'GCP_Decision', CMP_GCP_D_Dataset, CMP_GCP_D_Evaluator],
+['cmp_TSP_D', 'TSP_Decision', CMP_TSP_D_Dataset, CMP_TSP_D_Evaluator],
+['cmp_KSP', 'KSP', CMP_KSP_Dataset, CMP_KSP_Evaluator],
+['p_BSP', 'BSP', P_BSP_Dataset, P_BSP_Evaluator],
+['p_EDP', 'EDP', P_EDP_Dataset, P_EDP_Evaluator],
+['p_SPP', 'SPP', P_SPP_Dataset, P_SPP_Evaluator],
 ]

 NPHardEval_datasets = []
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import AXDataset_V2
+from opencompass.datasets import AXDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 AX_b_reader_cfg = dict(
@@ -34,7 +34,7 @@ AX_b_eval_cfg = dict(
 AX_b_datasets = [
 dict(
 abbr='AX_b',
-type=AXDataset_V2,
+type=AXDatasetV2,
 path='./data/SuperGLUE/AX-b/AX-b.jsonl',
 reader_cfg=AX_b_reader_cfg,
 infer_cfg=AX_b_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import AXDataset_V2
+from opencompass.datasets import AXDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 AX_g_reader_cfg = dict(
@@ -34,7 +34,7 @@ AX_g_eval_cfg = dict(
 AX_g_datasets = [
 dict(
 abbr='AX_g',
-type=AXDataset_V2,
+type=AXDatasetV2,
 path='./data/SuperGLUE/AX-g/AX-g.jsonl',
 reader_cfg=AX_g_reader_cfg,
 infer_cfg=AX_g_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import BoolQDataset_V2
+from opencompass.datasets import BoolQDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 BoolQ_reader_cfg = dict(
@@ -32,7 +32,7 @@ BoolQ_eval_cfg = dict(
 BoolQ_datasets = [
 dict(
 abbr='BoolQ',
-type=BoolQDataset_V2,
+type=BoolQDatasetV2,
 path='./data/SuperGLUE/BoolQ/val.jsonl',
 reader_cfg=BoolQ_reader_cfg,
 infer_cfg=BoolQ_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import BoolQDataset_V3
+from opencompass.datasets import BoolQDatasetV3

 BoolQ_reader_cfg = dict(
 input_columns=['question', 'passage'],
@@ -34,7 +34,7 @@ BoolQ_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
 BoolQ_datasets = [
 dict(
 abbr='BoolQ',
-type=BoolQDataset_V3,
+type=BoolQDatasetV3,
 path='./data/SuperGLUE/BoolQ/val.jsonl',
 reader_cfg=BoolQ_reader_cfg,
 infer_cfg=BoolQ_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import CBDataset_V2
+from opencompass.datasets import CBDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 CB_reader_cfg = dict(
@@ -35,7 +35,7 @@ CB_eval_cfg = dict(
 CB_datasets = [
 dict(
 abbr='CB',
-type=CBDataset_V2,
+type=CBDatasetV2,
 path='./data/SuperGLUE/CB/val.jsonl',
 reader_cfg=CB_reader_cfg,
 infer_cfg=CB_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import COPADataset_V2
+from opencompass.datasets import COPADatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 COPA_reader_cfg = dict(
@@ -35,7 +35,7 @@ COPA_eval_cfg = dict(
 COPA_datasets = [
 dict(
 abbr='COPA',
-type=COPADataset_V2,
+type=COPADatasetV2,
 path='./data/SuperGLUE/COPA/val.jsonl',
 reader_cfg=COPA_reader_cfg,
 infer_cfg=COPA_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import MultiRCDataset_V2
+from opencompass.datasets import MultiRCDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 MultiRC_reader_cfg = dict(
@@ -34,7 +34,7 @@ MultiRC_eval_cfg = dict(
 MultiRC_datasets = [
 dict(
 abbr='MultiRC',
-type=MultiRCDataset_V2,
+type=MultiRCDatasetV2,
 path='./data/SuperGLUE/MultiRC/val.jsonl',
 reader_cfg=MultiRC_reader_cfg,
 infer_cfg=MultiRC_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import AXDataset_V2
+from opencompass.datasets import AXDatasetV2
 from opencompass.utils.text_postprocessors import first_option_postprocess

 RTE_reader_cfg = dict(
@@ -34,7 +34,7 @@ RTE_eval_cfg = dict(
 RTE_datasets = [
 dict(
 abbr='RTE',
-type=AXDataset_V2, # rte share the same format with ax
+type=AXDatasetV2, # rte share the same format with ax
 path='./data/SuperGLUE/RTE/val.jsonl',
 reader_cfg=RTE_reader_cfg,
 infer_cfg=RTE_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import EMEvaluator
-from opencompass.datasets import ReCoRDDataset_V2, ReCoRD_postprocess
+from opencompass.datasets import ReCoRDDatasetV2, ReCoRD_postprocess

 ReCoRD_reader_cfg = dict(
 input_columns=['question', 'text'], output_column='answers')
@@ -26,7 +26,7 @@ ReCoRD_eval_cfg = dict(

 ReCoRD_datasets = [
 dict(
-type=ReCoRDDataset_V2,
+type=ReCoRDDatasetV2,
 abbr='ReCoRD',
 path='./data/SuperGLUE/ReCoRD/val.jsonl',
 reader_cfg=ReCoRD_reader_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WSCDataset_V2
+from opencompass.datasets import WSCDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 WSC_reader_cfg = dict(
@@ -34,7 +34,7 @@ WSC_eval_cfg = dict(
 WSC_datasets = [
 dict(
 abbr='WSC',
-type=WSCDataset_V2,
+type=WSCDatasetV2,
 path='./data/SuperGLUE/WSC/val.jsonl',
 reader_cfg=WSC_reader_cfg,
 infer_cfg=WSC_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WSCDataset_V3
+from opencompass.datasets import WSCDatasetV3
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 WSC_reader_cfg = dict(
@@ -34,7 +34,7 @@ WSC_eval_cfg = dict(
 WSC_datasets = [
 dict(
 abbr='WSC',
-type=WSCDataset_V3,
+type=WSCDatasetV3,
 path='./data/SuperGLUE/WSC/val.jsonl',
 reader_cfg=WSC_reader_cfg,
 infer_cfg=WSC_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WSCDataset_V3
+from opencompass.datasets import WSCDatasetV3

 WSC_reader_cfg = dict(
 input_columns=['span1', 'span2', 'text'],
@@ -40,7 +40,7 @@ WSC_eval_cfg = dict(evaluator=dict(type=AccEvaluator), )
 WSC_datasets = [
 dict(
 abbr='WSC',
-type=WSCDataset_V3,
+type=WSCDatasetV3,
 path='./data/SuperGLUE/WSC/val.jsonl',
 reader_cfg=WSC_reader_cfg,
 infer_cfg=WSC_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WSCDataset_V2
+from opencompass.datasets import WSCDatasetV2

 WSC_reader_cfg = dict(
 input_columns=['span1', 'span2', 'text'],
@@ -42,7 +42,7 @@ WSC_eval_cfg = dict(evaluator=dict(type=AccEvaluator), )
 WSC_datasets = [
 dict(
 abbr='WSC',
-type=WSCDataset_V2,
+type=WSCDatasetV2,
 path='./data/SuperGLUE/WSC/val.jsonl',
 reader_cfg=WSC_reader_cfg,
 infer_cfg=WSC_infer_cfg,
@@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
 from opencompass.openicl.icl_retriever import ZeroRetriever
 from opencompass.openicl.icl_inferencer import GenInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import WiCDataset_V2
+from opencompass.datasets import WiCDatasetV2
 from opencompass.utils.text_postprocessors import first_capital_postprocess

 WiC_reader_cfg = dict(
@@ -38,7 +38,7 @@ WiC_eval_cfg = dict(
 WiC_datasets = [
 dict(
 abbr='WiC',
-type=WiCDataset_V2,
+type=WiCDatasetV2,
 path='./data/SuperGLUE/WiC/val.jsonl',
 reader_cfg=WiC_reader_cfg,
 infer_cfg=WiC_infer_cfg,
@@ -31,7 +31,7 @@ Xsum_datasets = [
 dict(
 type=XsumDataset,
 abbr='Xsum',
-path='./data/Xsum/dev.jsonl',
+path='opencompass/xsum',
 reader_cfg=Xsum_reader_cfg,
 infer_cfg=Xsum_infer_cfg,
 eval_cfg=Xsum_eval_cfg,
@@ -23,7 +23,7 @@ Xsum_datasets = [
 dict(
 type=XsumDataset,
 abbr='Xsum',
-path='./data/Xsum/dev.jsonl',
+path='opencompass/xsum',
 reader_cfg=Xsum_reader_cfg,
 infer_cfg=Xsum_infer_cfg,
 eval_cfg=Xsum_eval_cfg)
@@ -34,7 +34,7 @@ adv_mnli_datasets = [
 dict(
 abbr='adv_mnli',
 type=AdvMnliDataset,
-path='./data/adv_glue/dev_ann.json',
+path='opencompass/advglue-dev',
 reader_cfg=adv_mnli_reader_cfg,
 infer_cfg=adv_mnli_infer_cfg,
 eval_cfg=adv_mnli_eval_cfg,
@@ -34,7 +34,7 @@ adv_mnli_mm_datasets = [
 dict(
 abbr='adv_mnli_mm',
 type=AdvMnliMMDataset,
-path='./data/adv_glue/dev_ann.json',
+path='opencompass/advglue-dev',
 reader_cfg=adv_mnli_mm_reader_cfg,
 infer_cfg=adv_mnli_mm_infer_cfg,
 eval_cfg=adv_mnli_mm_eval_cfg,
@@ -34,7 +34,7 @@ adv_qnli_datasets = [
 dict(
 abbr='adv_qnli',
 type=AdvQnliDataset,
-path='./data/adv_glue/dev_ann.json',
+path='opencompass/advglue-dev',
 reader_cfg=adv_qnli_reader_cfg,
 infer_cfg=adv_qnli_infer_cfg,
 eval_cfg=adv_qnli_eval_cfg,
@@ -34,7 +34,7 @@ adv_qqp_datasets = [
 dict(
 abbr='adv_qqp',
 type=AdvQqpDataset,
-path='./data/adv_glue/dev_ann.json',
+path='opencompass/advglue-dev',
 reader_cfg=adv_qqp_reader_cfg,
 infer_cfg=adv_qqp_infer_cfg,
 eval_cfg=adv_qqp_eval_cfg,
@@ -34,7 +34,7 @@ adv_rte_datasets = [
 dict(
 abbr='adv_rte',
 type=AdvRteDataset,
-path='./data/adv_glue/dev_ann.json',
+path='opencompass/advglue-dev',
 reader_cfg=adv_rte_reader_cfg,
 infer_cfg=adv_rte_infer_cfg,
 eval_cfg=adv_rte_eval_cfg,
@@ -33,7 +33,7 @@ adv_sst2_datasets = [
 dict(
 abbr='adv_sst2',
 type=AdvSst2Dataset,
-path='./data/adv_glue/dev_ann.json',
+path='opencompass/advglue-dev',
 reader_cfg=adv_sst2_reader_cfg,
 infer_cfg=adv_sst2_infer_cfg,
 eval_cfg=adv_sst2_eval_cfg,
@@ -88,7 +88,7 @@ for _name in agieval_single_choice_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -117,7 +117,7 @@ for _name in agieval_multiple_choices_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -143,7 +143,7 @@ for _name in agieval_cloze_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -92,7 +92,7 @@ for _name in agieval_single_choice_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -122,7 +122,7 @@ for _name in agieval_multiple_choices_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -148,7 +148,7 @@ for _name in agieval_cloze_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -90,7 +90,7 @@ for _name in agieval_single_choice_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -120,7 +120,7 @@ for _name in agieval_multiple_choices_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -146,7 +146,7 @@ for _name in agieval_cloze_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -50,7 +50,7 @@ for name in agieval_single_choice_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=name,
 abbr='agieval-' + name,
 setting_name='zero-shot',
@@ -74,7 +74,7 @@ for name in agieval_multiple_choices_sets + agieval_cloze_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=name,
 abbr='agieval-' + name,
 setting_name='zero-shot',
@@ -93,7 +93,7 @@ for _name in agieval_single_choice_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -124,7 +124,7 @@ for _name in agieval_multiple_choices_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -151,7 +151,7 @@ for _name in agieval_cloze_sets:
 agieval_datasets.append(
 dict(
 type=AGIEvalDataset_v2,
-path='./data/AGIEval/data/v1/',
+path='opencompass/agieval',
 name=_name,
 abbr='agieval-' + _name,
 setting_name='zero-shot',
@@ -48,7 +48,7 @@ for name, test_type in settings:
 bbh_datasets.append(
 dict(
 type=BBHDataset,
-path='./data/BBH/data',
+path='opencompass/bbh',
 name=name,
 abbr='bbh-' + name,
 reader_cfg=bbh_reader_cfg.copy(),
@@ -64,7 +64,7 @@ for _name in bbh_multiple_choice_sets:
 bbh_datasets.append(
 dict(
 type=BBHDataset,
-path=f'./data/BBH/data',
+path='opencompass/bbh',
 name=_name,
 abbr='bbh-' + _name,
 reader_cfg=bbh_reader_cfg,
@@ -91,7 +91,7 @@ for _name in bbh_free_form_sets:
 bbh_datasets.append(
 dict(
 type=BBHDataset,
-path=f'./data/BBH/data',
+path='opencompass/bbh',
 name=_name,
 abbr='bbh-' + _name,
 reader_cfg=bbh_reader_cfg,
@@ -64,7 +64,7 @@ for _name in bbh_multiple_choice_sets:
 bbh_datasets.append(
 dict(
 type=BBHDataset,
-path=f'./data/BBH/data',
+path='opencompass/bbh',
 name=_name,
 abbr='bbh-' + _name,
 reader_cfg=bbh_reader_cfg,
@@ -91,7 +91,7 @@ for _name in bbh_free_form_sets:
 bbh_datasets.append(
 dict(
 type=BBHDataset,
-path=f'./data/BBH/data',
+path='opencompass/bbh',
 name=_name,
 abbr='bbh-' + _name,
 reader_cfg=bbh_reader_cfg,
@@ -59,7 +59,7 @@ for _name in bbh_multiple_choice_sets:
 bbh_datasets.append(
 dict(
 type=BBHDataset,
-path=f'./data/BBH/data',
+path='opencompass/bbh',
 name=_name,
 abbr='bbh-' + _name,
 reader_cfg=bbh_reader_cfg,
@@ -82,7 +82,7 @@ for _name in bbh_free_form_sets:
 bbh_datasets.append(
 dict(
 type=BBHDataset,
-path=f'./data/BBH/data',
+path='opencompass/bbh',
 name=_name,
 abbr='bbh-' + _name,
 reader_cfg=bbh_reader_cfg,
@@ -5,6 +5,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccContaminationEvaluator
 from opencompass.datasets import CEvalDatasetClean as CEvalDataset

+
 ceval_subject_mapping = {
 'computer_network': ['Computer Network', '计算机网络', 'STEM'],
 'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -92,7 +93,7 @@ for _split in ['val']:
 ceval_datasets.append(
 dict(
 type=CEvalDataset,
-path='./data/ceval/formal_ceval',
+path='opencompass/ceval-exam',
 name=_name,
 abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' + _name,
 reader_cfg=dict(
@@ -91,7 +91,7 @@ for _split in ['val', 'test']:
 ceval_datasets.append(
 dict(
 type=CEvalDataset,
-path='./data/ceval/formal_ceval',
+path='opencompass/ceval-exam',
 name=_name,
 abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
 _name,
@@ -5,6 +5,7 @@ from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset
 from opencompass.utils.text_postprocessors import first_capital_postprocess

+
 ceval_subject_mapping = {
 'computer_network': ['Computer Network', '计算机网络', 'STEM'],
 'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -91,7 +92,7 @@ for _split in ['val']:
 ceval_datasets.append(
 dict(
 type=CEvalDataset,
-path='./data/ceval/formal_ceval',
+path='opencompass/ceval-exam',
 name=_name,
 abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
 _name,
@@ -4,6 +4,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset

+
 ceval_subject_mapping = {
 'computer_network': ['Computer Network', '计算机网络', 'STEM'],
 'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -93,7 +94,7 @@ for _split in ['val', 'test']:
 ceval_datasets.append(
 dict(
 type=CEvalDataset,
-path='./data/ceval_internal/formal_ceval',
+path='opencompass/ceval-exam',
 name=_name,
 abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' + _name,
 reader_cfg=ceval_reader_cfg,
@@ -4,6 +4,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset

+
 ceval_subject_mapping = {
 'computer_network': ['Computer Network', '计算机网络', 'STEM'],
 'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -93,7 +94,7 @@ for _split in ['val', 'test']:
 ceval_datasets.append(
 dict(
 type=CEvalDataset,
-path='./data/ceval/formal_ceval',
+path='opencompass/ceval-exam',
 name=_name,
 abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' + _name,
 reader_cfg=ceval_reader_cfg,
@@ -4,6 +4,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset

+
 ceval_subject_mapping = {
 'computer_network': ['Computer Network', '计算机网络', 'STEM'],
 'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -91,7 +92,7 @@ for _split in ['val']:
 ceval_datasets.append(
 dict(
 type=CEvalDataset,
-path='./data/ceval/formal_ceval',
+path='opencompass/ceval-exam',
 name=_name,
 abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
 _name,
@@ -4,6 +4,7 @@ from opencompass.openicl.icl_inferencer import PPLInferencer
 from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset

+
 ceval_subject_mapping = {
 'computer_network': ['Computer Network', '计算机网络', 'STEM'],
 'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -91,7 +92,7 @@ for _split in ['val', 'test']:
 ceval_datasets.append(
 dict(
 type=CEvalDataset,
-path='./data/ceval/formal_ceval',
+path='opencompass/ceval-exam',
 name=_name,
 abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
 _name,
@@ -5,6 +5,7 @@ from opencompass.openicl.icl_evaluator import AccEvaluator
 from opencompass.datasets import CEvalDataset
 from opencompass.utils.text_postprocessors import first_option_postprocess

+
 ceval_subject_mapping = {
 'computer_network': ['Computer Network', '计算机网络', 'STEM'],
 'operating_system': ['Operating System', '操作系统', 'STEM'],
@@ -91,7 +92,7 @@ for _split in ['val']:
 ceval_datasets.append(
 dict(
 type=CEvalDataset,
-path='./data/ceval/formal_ceval',
+path='opencompass/ceval-exam',
 name=_name,
 abbr='ceval-' + _name if _split == 'val' else 'ceval-test-' +
 _name,
@@ -33,8 +33,8 @@ maxmin_datasets = [
 dict(
 type=MaxminDataset,
 abbr=f'maxmin',
-test_path=f'data/clozeTest-maxmin/python/clozeTest.json',
-answer_path=f'data/clozeTest-maxmin/python/answers.txt',
+test_path='opencompass/clozeTest_maxmin',
+answer_path='opencompass/clozeTest_maxmin_answers',
 reader_cfg=maxmin_reader_cfg,
 infer_cfg=maxmin_infer_cfg,
 eval_cfg=maxmin_eval_cfg,
@@ -5,6 +5,7 @@ from opencompass.openicl.icl_evaluator import AccwithDetailsEvaluator
 from opencompass.datasets import CMMLUDataset
 from opencompass.utils.text_postprocessors import first_capital_postprocess

+
 cmmlu_subject_mapping = {
 'agronomy': '农学',
 'anatomy': '解剖学',
@@ -107,7 +108,7 @@ for _name in cmmlu_all_sets:
 cmmlu_datasets.append(
 dict(
 type=CMMLUDataset,
-path='./data/cmmlu/',
+path='opencompass/cmmlu',
 name=_name,
 abbr=f'cmmlu-{_name}',
 reader_cfg=dict(
@@ -102,7 +102,7 @@ for _name in cmmlu_all_sets:
 cmmlu_datasets.append(
 dict(
 type=CMMLUDataset,
-path='./data/cmmlu/',
+path='opencompass/cmmlu',
 name=_name,
 abbr=f'cmmlu-{_name}',
 reader_cfg=dict(
@@ -107,7 +107,7 @@ for _name in cmmlu_all_sets:
 cmmlu_datasets.append(
 dict(
 type=CMMLUDataset,
-path='./data/cmmlu/',
+path='opencompass/cmmlu',
 name=_name,
 abbr=f'cmmlu-{_name}',
 reader_cfg=dict(
@@ -45,7 +45,7 @@ commonsenseqa_datasets = [
 dict(
 abbr='commonsense_qa',
 type=commonsenseqaDataset,
-path='./data/commonsenseqa',
+path='opencompass/commonsense_qa',
 reader_cfg=commonsenseqa_reader_cfg,
 infer_cfg=commonsenseqa_infer_cfg,
 eval_cfg=commonsenseqa_eval_cfg,
@@ -52,7 +52,7 @@ commonsenseqa_datasets = [
 dict(
 abbr='commonsense_qa',
 type=commonsenseqaDataset,
-path='./data/commonsenseqa',
+path='opencompass/commonsense_qa',
 reader_cfg=commonsenseqa_reader_cfg,
 infer_cfg=commonsenseqa_infer_cfg,
 eval_cfg=commonsenseqa_eval_cfg,
@@ -47,7 +47,7 @@ commonsenseqa_datasets = [
 dict(
 abbr='commonsense_qa',
 type=commonsenseqaDataset,
-path='./data/commonsenseqa',
+path='opencompass/commonsense_qa',
 reader_cfg=commonsenseqa_reader_cfg,
 infer_cfg=commonsenseqa_infer_cfg,
 eval_cfg=commonsenseqa_eval_cfg)
@@ -42,7 +42,7 @@ commonsenseqa_datasets = [
 dict(
 abbr='commonsense_qa',
 type=commonsenseqaDataset,
-path='./data/commonsenseqa',
+path='opencompass/commonsense_qa',
 reader_cfg=commonsenseqa_reader_cfg,
 infer_cfg=commonsenseqa_infer_cfg,
 eval_cfg=commonsenseqa_eval_cfg)
@@ -38,7 +38,7 @@ commonsenseqa_datasets = [
 dict(
 abbr='commonsense_qa',
 type=commonsenseqaDataset,
-path='./data/commonsenseqa',
+path='opencompass/commonsense_qa',
 reader_cfg=commonsenseqa_reader_cfg,
 infer_cfg=commonsenseqa_infer_cfg,
 eval_cfg=commonsenseqa_eval_cfg)
@@ -34,7 +34,7 @@ commonsenseqa_datasets = [
 dict(
 abbr='commonsense_qa',
 type=commonsenseqaDataset,
-path='./data/commonsenseqa',
+path='opencompass/commonsense_qa',
 reader_cfg=commonsenseqa_reader_cfg,
 infer_cfg=commonsenseqa_infer_cfg,
 eval_cfg=commonsenseqa_eval_cfg)
@@ -35,7 +35,7 @@ commonsenseqa_datasets = [
 dict(
 abbr='commonsense_qa',
 type=commonsenseqaDataset,
-path='./data/commonsenseqa',
+path='opencompass/commonsense_qa',
 reader_cfg=commonsenseqa_reader_cfg,
 infer_cfg=commonsenseqa_infer_cfg,
 eval_cfg=commonsenseqa_eval_cfg)
@@ -92,7 +92,7 @@ for _split in list(compassbench_v1_knowledge_sets.keys()):
 )


-from opencompass.datasets import TriviaQADataset_V3, TriviaQAEvaluator
+from opencompass.datasets import TriviaQADatasetV3, TriviaQAEvaluator

 triviaqa_and_nq_reader_cfg = dict(input_columns=['question'], output_column='answer')

@@ -123,7 +123,7 @@ triviaqa_and_nq_eval_cfg = dict(evaluator=dict(type=TriviaQAEvaluator), pred_rol

 compassbench_v1_knowledge_datasets.append(
 dict(
-type=TriviaQADataset_V3,
+type=TriviaQADatasetV3,
 abbr='compassbench_v1_knowledge-mixed-cloze_en',
 path='data/compassbench_v1.1/knowledge/mixed/cloze_en.jsonl',
 reader_cfg=triviaqa_and_nq_reader_cfg,
@@ -92,7 +92,7 @@ for _split in list(compassbench_v1_knowledge_sets.keys()):
 )


-from opencompass.datasets import TriviaQADataset_V3, TriviaQAEvaluator
+from opencompass.datasets import TriviaQADatasetV3, TriviaQAEvaluator

 triviaqa_and_nq_reader_cfg = dict(input_columns=['question'], output_column='answer')

@@ -123,7 +123,7 @@ triviaqa_and_nq_eval_cfg = dict(evaluator=dict(type=TriviaQAEvaluator), pred_rol

 compassbench_v1_knowledge_datasets.append(
 dict(
-type=TriviaQADataset_V3,
+type=TriviaQADatasetV3,
 abbr='compassbench_v1_knowledge-mixed-cloze_en_public',
 path='data/compassbench_v1.1.public/knowledge/mixed/cloze_en.jsonl',
 reader_cfg=triviaqa_and_nq_reader_cfg,
@ -25,7 +25,7 @@ for split in ['train', 'test']:
|
||||
dict(
|
||||
abbr=f'mbpp-{split}-ppl',
|
||||
type=SanitizedMBPPDataset,
|
||||
-path='./data/mbpp/sanitized-mbpp.jsonl',
|
||||
+path='opencompass/sanitized_mbpp',
|
||||
reader_cfg=mbpp_reader_cfg,
|
||||
infer_cfg=mbpp_infer_cfg,
|
||||
eval_cfg=mbpp_eval_cfg)
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
-from opencompass.datasets import crowspairsDataset_V2
|
||||
+from opencompass.datasets import CrowspairsDatasetV2
|
||||
from opencompass.utils.text_postprocessors import first_capital_postprocess
|
||||
|
||||
crowspairs_reader_cfg = dict(
|
||||
@ -32,7 +32,7 @@ crowspairs_eval_cfg = dict(
|
||||
|
||||
crowspairs_datasets = [
|
||||
dict(
|
||||
-type=crowspairsDataset_V2,
|
||||
+type=CrowspairsDatasetV2,
|
||||
path='crows_pairs',
|
||||
reader_cfg=crowspairs_reader_cfg,
|
||||
infer_cfg=crowspairs_infer_cfg,
|
||||
|
@ -1,7 +1,7 @@
|
||||
from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import GenInferencer
|
||||
-from opencompass.datasets import (crowspairsDataset_V2, crowspairs_postprocess,
|
||||
+from opencompass.datasets import (CrowspairsDatasetV2, crowspairs_postprocess,
|
||||
CrowspairsEvaluator)
|
||||
|
||||
crowspairs_reader_cfg = dict(
|
||||
@ -41,7 +41,7 @@ crowspairs_eval_cfg = dict(
|
||||
crowspairs_datasets = [
|
||||
dict(
|
||||
abbr='crows_pairs',
|
||||
-type=crowspairsDataset_V2,
|
||||
+type=CrowspairsDatasetV2,
|
||||
path='crows_pairs',
|
||||
reader_cfg=crowspairs_reader_cfg,
|
||||
infer_cfg=crowspairs_infer_cfg,
|
||||
|
@ -2,7 +2,7 @@ from opencompass.openicl.icl_prompt_template import PromptTemplate
|
||||
from opencompass.openicl.icl_retriever import ZeroRetriever
|
||||
from opencompass.openicl.icl_inferencer import PPLInferencer
|
||||
from opencompass.openicl.icl_evaluator import AccEvaluator
|
||||
-from opencompass.datasets import crowspairsDataset
|
||||
+from opencompass.datasets import CrowspairsDataset
|
||||
|
||||
crowspairs_reader_cfg = dict(
|
||||
input_columns=['sent_more', 'sent_less'],
|
||||
@ -24,7 +24,7 @@ crowspairs_eval_cfg = dict(evaluator=dict(type=AccEvaluator), )
|
||||
|
||||
crowspairs_datasets = [
|
||||
dict(
|
||||
-type=crowspairsDataset,
|
||||
+type=CrowspairsDataset,
|
||||
path='crows_pairs',
|
||||
reader_cfg=crowspairs_reader_cfg,
|
||||
infer_cfg=crowspairs_infer_cfg,
|
||||
|
Some files were not shown because too many files have changed in this diff