1. Please refer to the given standard answer. You don't need to re-generate the answer to the question because the standard answer has been given. You only need to judge whether the candidate's answer is consistent with the standard answer according to the form of the question. Don't try to answer the original question. You can assume that the standard answer is definitely correct.
2. Because the candidate's answer may be different from the standard answer in the form of expression, before making a judgment, please understand the question and the standard answer first, and then judge whether the candidate's answer is correct, but be careful not to try to answer the original question.
Here is your task. Simply reply with either CORRECT, INCORRECT. Don't apologize or correct yourself if there was a mistake; we are just trying to grade the answer.
<Original Question Begin>: {input}
A) {A}
B) {B}
C) {C}
D) {D}
<Original Question End>
<Gold Target Begin>:
{target}
<Gold Target End>
<Predicted Answer Begin>:
{prediction}
<Predicted End>
Judging the correctness of candidates' answers:
""".strip()
scieval_reader_cfg = dict(
    input_columns=['input', 'A', 'B', 'C', 'D'],
    output_column='target',
    train_split='test',
)
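# Build one dataset config per SciEval life-science subset.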
scieval_datasets = []

for name in SciEval_lifescience_subsets:
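    # Inference: zero-shot generation, with the multiple-choice query template as the user turn.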
    scieval_infer_cfg = dict(
        prompt_template=dict(
            type=PromptTemplate,
            template=dict(
                round=[
                    dict(role='HUMAN', prompt=QUERY_TEMPLATE),
                ]
            )
        ),
        retriever=dict(type=ZeroRetriever),
        inferencer=dict(type=GenInferencer),
    )
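    # Evaluation: an LLM judge (GenericLLMEvaluator) compares each prediction against the
    # gold answer using the grader template defined above and replies CORRECT / INCORRECT.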
    scieval_eval_cfg = dict(
        evaluator=dict(
            type=GenericLLMEvaluator,
            prompt_template=dict(
                type=PromptTemplate,
                template=dict(
                    begin=[
                        dict(
                            role='SYSTEM',
                            fallback_role='HUMAN',
                            prompt=(
                                'You are a helpful assistant who evaluates the correctness '