1. Please refer to the given standard answer. You don't need to re-generate the answer to the question because the standard answer has been given. You only need to judge whether the candidate's answer is consistent with the standard answer according to the form of the question. Don't try to answer the original question. You can assume that the standard answer is definitely correct.
2. Because the candidate's answer may be different from the standard answer in the form of expression, before making a judgment, please understand the question and the standard answer first, and then judge whether the candidate's answer is correct, but be careful not to try to answer the original question.
Here is your task. Simply reply with either CORRECT, INCORRECT. Don't apologize or correct yourself if there was a mistake; we are just trying to grade the answer.
                    prompt=f"Think step by step, and when you provide the final answer, please use the prefix \"The answer is:\" without any modification, and provide the answer directly, with no formatting, no bolding, and no markup. For instance: \"The answer is: 42\" or \"The answer is: yes\". If the question is multiple choice with a single correct answer, the final answer must only be the letter corresponding to the correct answer. For example, \"The answer is: (a)\"\n\nQ: {{input}}\nA: ",
                )
            ]
        ),
    ),
    retriever=dict(type=ZeroRetriever),
    inferencer=dict(type=GenInferencer),
)
bbeh_eval_cfg = dict(
    evaluator=dict(
        type=GenericLLMEvaluator,
        prompt_template=dict(
            type=PromptTemplate,
            template=dict(
                begin=[
                    dict(
                        role='SYSTEM',
                        fallback_role='HUMAN',
                        prompt="You are a helpful assistant who evaluates the correctness and quality of models' outputs.",
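
# Illustrative sketch only, not part of the original config: one possible way
# to pull the final answer out of a completion that follows the
# "The answer is:" convention requested in the inference prompt above.
# The helper name `extract_final_answer` is hypothetical. The evaluator
# configured below grades with an LLM judge and does not call this function;
# move it out of the config if your loader does not accept helper functions.
import re


def extract_final_answer(completion):
    """Return the text after the last "The answer is:" line, or None."""
    # `.+` stops at a newline, so each match covers the rest of one line.
    matches = re.findall(r"The answer is:\s*(.+)", completion)
    if not matches:
        return None
    # Keep the last matching line, since earlier reasoning may restate the prefix.
    return matches[-1].strip()
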