OpenCompass/configs/datasets/subjective/judgerbench/judgerbench.py

from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.datasets.subjective import JudgerBenchDataset, JudgerBenchEvaluator
from mmengine.config import read_base

subjective_reader_cfg = dict(
    input_columns=['judge_prompt'],
    output_column='judge',
)

subjective_all_sets = [
    'judgerbench_A_cn', 'judgerbench_A_en', 'judgerbench_B'
]

judgerbench_datasets = []

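for _name in subjective_all_sets:
    # Zero-shot generation: the pre-built judge prompt is passed through
    # unchanged, and ZeroRetriever supplies no in-context examples.
    subjective_infer_cfg = dict(
        prompt_template=dict(
            type=PromptTemplate,
            template=dict(round=[
                dict(
                    role='HUMAN',
                    prompt='{judge_prompt}'
                ),
            ]),
        ),
        retriever=dict(type=ZeroRetriever),
        inferencer=dict(type=GenInferencer, max_out_len=4096),
    )

    # Predictions are taken from the BOT role and passed to the
    # JudgerBench evaluator.
    subjective_eval_cfg = dict(
        evaluator=dict(
            type=JudgerBenchEvaluator,
        ),
        pred_role='BOT',
    )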

    judgerbench_datasets.append(
        dict(
            abbr=f'{_name}',
            type=JudgerBenchDataset,
            path='./data/subjective/judgerbench',
            name=_name,
            reader_cfg=subjective_reader_cfg,
            infer_cfg=subjective_infer_cfg,
            eval_cfg=subjective_eval_cfg,
        ))
# ds1000_eval_cfg = dict(
#     evaluator=dict(type=DS1000Evaluator),
#     pred_role='BOT',
#     pred_postprocessor=dict(type=ds1000_postprocess),
# )
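
The loop above can be hard to picture without running OpenCompass, so here is a minimal, self-contained sketch of what it builds. It is not part of the config: the five classes are empty stand-ins (hypothetical placeholders for the real OpenCompass classes imported at the top of the file), and `build_judgerbench_datasets` is an illustrative helper name that does not exist in the repository.

```python
# Empty stand-ins for the OpenCompass classes (assumption: only their
# identity matters here, since the config just stores them under `type`).
class PromptTemplate: ...
class ZeroRetriever: ...
class GenInferencer: ...
class JudgerBenchDataset: ...
class JudgerBenchEvaluator: ...


def build_judgerbench_datasets(names):
    """Mirror the config loop: one dataset dict per subset name."""
    # Shared by every subset, as in the original config.
    reader_cfg = dict(input_columns=['judge_prompt'], output_column='judge')
    datasets = []
    for name in names:
        # Rebuilt on each iteration, so every subset gets its own dict.
        infer_cfg = dict(
            prompt_template=dict(
                type=PromptTemplate,
                template=dict(round=[
                    dict(role='HUMAN', prompt='{judge_prompt}'),
                ]),
            ),
            retriever=dict(type=ZeroRetriever),
            inferencer=dict(type=GenInferencer, max_out_len=4096),
        )
        eval_cfg = dict(
            evaluator=dict(type=JudgerBenchEvaluator),
            pred_role='BOT',
        )
        datasets.append(
            dict(
                abbr=name,
                type=JudgerBenchDataset,
                path='./data/subjective/judgerbench',
                name=name,
                reader_cfg=reader_cfg,
                infer_cfg=infer_cfg,
                eval_cfg=eval_cfg,
            ))
    return datasets


datasets = build_judgerbench_datasets(
    ['judgerbench_A_cn', 'judgerbench_A_en', 'judgerbench_B'])
for d in datasets:
    print(d['abbr'], d['path'], d['infer_cfg']['inferencer']['max_out_len'])
```

Each subset ends up as its own entry that differs only in `abbr` and `name`; the reader config is one shared object, while the inference and evaluation configs are fresh dicts per subset.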