# OpenCompass/opencompass/datasets/QuALITY.py

import json

from datasets import Dataset

from opencompass.openicl.icl_evaluator import BaseEvaluator
from opencompass.registry import LOAD_DATASET
from opencompass.utils import get_data_path

from .base import BaseDataset


@LOAD_DATASET.register_module()
class QuALITYDataset(BaseDataset):

    @staticmethod
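    def load(path: str):
        """Load a QuALITY JSONL file as a flat HuggingFace ``Dataset``.

        Each line of the file is one article record with an ``article``
        text and a ``questions`` list; every question becomes its own row.
        The four ``options`` are spread over the columns ``A`` to ``D``,
        and the 1-based integer ``gold_label`` is mapped to its letter
        (1 -> 'A', 2 -> 'B', 3 -> 'C', 4 -> 'D').
        """
        path = get_data_path(path, local_mode=True)
        dataset_list = []
        with open(path, 'r', encoding='utf-8') as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines in the JSONL file
                record = json.loads(line)
                for question in record['questions']:
                    options = question['options']
                    dataset_list.append({
                        'article': record['article'],
                        'question': question['question'],
                        'A': options[0],
                        'B': options[1],
                        'C': options[2],
                        'D': options[3],
                        'gold_label': 'ABCD'[question['gold_label'] - 1],
                        'difficult': question['difficult'],
                    })
        return Dataset.from_list(dataset_list)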


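class QuALITYEvaluator(BaseEvaluator):
    """Accuracy on QuALITY, reported overall and split by difficulty."""

    def score(self, predictions, references, test_set):
        assert len(predictions) == len(references)
        # ``total`` avoids shadowing the built-in ``all``.
        easy, hard, total = [], [], []
        for pred, refer, test in zip(predictions, references, test_set):
            correct = pred == refer
            total.append(correct)
            # ``difficult == 0`` marks the easy subset; anything else is hard.
            if test['difficult'] == 0:
                easy.append(correct)
            else:
                hard.append(correct)

        def accuracy(results):
            # Percentage of correct answers; an empty subset (e.g. a split
            # with no hard questions) reports 0.0 instead of raising
            # ZeroDivisionError.
            return sum(results) / len(results) * 100 if results else 0.0

        # hard_acc is now divided by the number of hard questions, not easy.
        return dict(easy_acc=accuracy(easy),
                    hard_acc=accuracy(hard),
                    all_acc=accuracy(total))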