OpenCompass/opencompass/configs/datasets/OpenHuEval/HuSimpleQA/HuSimpleQA_setting.py
2025-02-10 21:22:45 +08:00

INSTRUCTIONS = {
'en':
"""Question: {question}
Please provide your best answer to this question in Hungarian and indicate your confidence in your answer using a score from 0 to 100. Please provide your response in the following JSON format:
{
"answer": "Your answer here",
"confidence_score": number
}
""",
'hu':
"""Kérdés: {question}
Kérjük, magyar nyelven adja meg a legjobb választ erre a kérdésre, és 0-tól 100-ig terjedő pontszámmal jelezze, hogy bízik a válaszában. Kérjük, válaszát a következő JSON formátumban adja meg:
{
"answer": "Az Ön válasza itt",
"confidence_score": szám
}
"""
}
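The templates above mix a `{question}` placeholder with literal JSON braces, so calling `str.format` on them directly would trip over the unescaped `{ ... }` block. The following is a minimal sketch, not the OpenCompass implementation, of one way to fill such a template and read back the model's JSON reply. `DEMO_TEMPLATE`, `fill_instruction`, and `parse_response` are hypothetical names introduced only for illustration; how the framework actually substitutes `{question}` is not shown in this file.

```python
import json
import re

# Shortened stand-in for INSTRUCTIONS['en']; same brace layout (assumption).
DEMO_TEMPLATE = (
    'Question: {question}\n'
    'Please provide your response in the following JSON format:\n'
    '{\n"answer": "Your answer here",\n"confidence_score": number\n}\n'
)


def fill_instruction(template, question):
    """Insert the question with a literal replace so JSON braces stay intact."""
    return template.replace('{question}', question)


def parse_response(text):
    """Extract the first-to-last brace span from a reply and decode it as JSON.

    Returns a dict with 'answer' and 'confidence_score', or None when no
    valid JSON object can be found.
    """
    match = re.search(r'\{.*\}', text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    return {
        'answer': obj.get('answer'),
        'confidence_score': obj.get('confidence_score'),
    }
```

A literal `replace` is used here because it needs no brace escaping in the template; a reply that cannot be parsed is reported as `None` rather than raising, so the caller can decide how to score it.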
JUDGE_PROMPT = {
'system_prompt':
"""
- Role: Hungarian General Knowledge Quiz Review Expert
- Background: The Hungarian General Knowledge Quiz covers many aspects of Hungarian history, culture, geography, politics, people, etc., and a professional review expert is needed to ensure the accuracy of the answers.
- Goals: Based on the given question and reference answer, accurately judge whether the answer entered by the user is correct, and output one of the evaluation results: "CORRECT", "INCORRECT", "NOT_ATTEMPTED".
- Evaluation criteria:
    - "CORRECT":
        - The predicted_answer fully contains the important information in the gold_target.
        - The predicted_answer does not contain any information that contradicts the gold_target.
        - Only semantic meaning matters; capitalization, punctuation, grammar, and order don't matter.
        - Hedging and guessing are permissible, provided that the gold_target is fully included and the response contains no incorrect information or contradictions.
        - The answer must be consistent with the scope of the question. For example, if the question asks “on which day was someone born,” the answer must specify the exact date, such as “January 3, 1997.”
    - "NOT_ATTEMPTED":
        - Questions that the user has not attempted to answer should be marked as "NOT_ATTEMPTED".
        - The important information in the gold_target is not included in the answer.
        - No statements in the answer contradict the gold_target.
- Workflow:
1. Receive questions, reference answers, and user answers.
2. Compare the reference answers and user answers to determine whether they are consistent.
3. Based on the judgment results, output the corresponding evaluation results.
- Constraints:
    - For grading questions where the gold_target is a number, the predicted_answer needs to be correct to the last significant figure in the gold answer. For example, consider a question “Hány látogató érkezett Magyarországra 2024-ben?” with gold_target “440k”.
        - predicted_answers “440k”, “444k”, and “435k” are all CORRECT.
        - predicted_answers “400k” and “413k” are INCORRECT.
        - predicted_answers “körülbelül 400k” and “több mint 300k” are considered NOT_ATTEMPTED because they neither confirm nor contradict the gold_target.
    - The gold_target may contain more information than the question. In such cases, the predicted_answer only needs to contain the information that is in the question.
        - For example, consider the question “Where was The Martian filmed to represent the NASA Johnson Space Center?” with the gold_target “Budapesti Bálna (HU)”. Either “Budapesti Bálna” or “Bálna (építmény)” would be considered a CORRECT answer.
    - Do not punish predicted_answers if they omit information that would be clearly inferred from the question.
        - For example, consider the question “Melyik városban található a Magyar Nemzeti Bank székhelye?” with the gold_target “Budapest, Magyarország”. The predicted_answer “Budapest” would be considered CORRECT, even though it does not include “Magyarország”.
        - Consider the question “Milyen díjat nyert a Napsugár című magyar film a 2018-as Velencei Filmfesztiválon?” with the gold_target “Legjobb Forgatókönyv Díja”. The predicted_answer “Legjobb Forgatókönyv” would be considered CORRECT, because “díj” is presumed in the question.
        - For the question “Mi Dominik Szoboszlai magassága méterben?” with the gold_target “1,85 m”, the predicted_answer “1,85” would be considered CORRECT, because meters is specified in the question.
        - For the question “Mi Magyarország első női miniszterelnökének neve?” with the gold_target “Wittner Mária”, the predicted_answer “Mária” would be considered CORRECT, because the last name is presumed.
    - Do not punish typos in people's names if it is clearly the same name.
        - For example, if the gold_target is “Nagy László”, you can consider the following predicted_answers as correct: “László Nagy”, “NAGY LÁSZLÓ”, or “nagy lászló”.
Example1:
Input:
{
"question": "Melyik törvény foglalkozik a találmányok szabadalmi oltalmával az 1969-es jogalkotásban?",
"gold_target": "1969. évi II. törvény",
"predicted_answer": "Nem áll rendelkezésre internetes keresés, így nem tudom megválaszolni a kérdést. Azonban 1969-ben valóban elfogadták a szabadalmi védelmi törvényt."
}
Output:
{
"evaluation": "NOT_ATTEMPTED"
}
Example2:
Input:
{
"question": "Melyik városban született Tisza Lajos 1832. szeptember 12-én?",
"gold_target": "Nagyváradon",
"predicted_answer": "Nagyváradon született Tisza Lajos 1832. szeptember 12-én."
}
Output: # Although the answer is long, it accurately answers the question
{
"evaluation": "CORRECT"
}
Example3:
Input:
{
"question": "Melyik olimpián vett részt Bogár Pál kosárlabda-játékosként?",
"gold_target": "1952-es Helsinki nyári olimpia",
"predicted_answer": "1952 Helsinki olimpián."
}
Output: # The descriptions are slightly different, but they all refer to the same Olympic Games, so they are considered correct
{
"evaluation": "CORRECT"
}
Example4:
Input:
{
"question": "Melyik labdarúgócsapat kötődik Budapest XIX. kerületéhez, amely 14-szeres magyar bajnok?",
"gold_target": "Budapest Honvéd FC",
"predicted_answer": "Ferencváros"
}
Output: # Although Ferencváros is a very famous football club in Hungary, it has no connection with the 19th district of Budapest and its number of championships does not match the description in the question.
{
"evaluation": "INCORRECT"
}
Example5:
Input:
{
"question": "Milyen biztosítás bevezetését szabályozta egy 1952-es törvényerejű rendelet Magyarországon?",
"gold_target": "kötelező tűz- és jégbiztosítás",
"predicted_answer": "Kötelező tűzbiztosítás"
}
Output: # The predicted_answer contains only part of the gold_target ("jégbiztosítás" is missing)
{
"evaluation": "INCORRECT"
}
""",
'user_prompt':
"""Please strictly follow the examples and requirements above and evaluate the following answer. Input:
{{
"question": {question},
"gold_target": {answer},
"predicted_answer": {pred_answer}
}}
Please respond strictly in JSON format. Do not include any additional text outside the JSON structure.
Output:
{{
"evaluation": "CORRECT"/"INCORRECT"/"NOT_ATTEMPTED"
}}
"""
}
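Unlike `INSTRUCTIONS`, the `user_prompt` above doubles its literal braces (`{{` and `}}`), so it can be filled with `str.format`. The sketch below shows one possible way to build the judge input and to normalize the judge's verdict; it is an illustration under stated assumptions, not the evaluator shipped with OpenCompass. `DEMO_USER_PROMPT`, `build_judge_input`, and `parse_verdict` are hypothetical names, and quoting the values with `json.dumps` is an assumption, since the template itself does not quote its placeholders.

```python
import json
import re

# Shortened stand-in for JUDGE_PROMPT['user_prompt'] (assumption).
DEMO_USER_PROMPT = """Input:
{{
"question": {question},
"gold_target": {answer},
"predicted_answer": {pred_answer}
}}
"""

VALID_LABELS = {'CORRECT', 'INCORRECT', 'NOT_ATTEMPTED'}


def build_judge_input(question, answer, pred_answer):
    """Fill the judge template; json.dumps adds quotes and escapes the values."""
    return DEMO_USER_PROMPT.format(
        question=json.dumps(question, ensure_ascii=False),
        answer=json.dumps(answer, ensure_ascii=False),
        pred_answer=json.dumps(pred_answer, ensure_ascii=False),
    )


def parse_verdict(judge_output):
    """Return the normalized label from the judge's reply, or None if absent.

    Case and spacing are normalized, so "Correct" and "Not attempted"
    map to "CORRECT" and "NOT_ATTEMPTED".
    """
    match = re.search(r'"evaluation"\s*:\s*"([^"]*)"', judge_output)
    if match is None:
        return None
    label = match.group(1).strip().upper().replace(' ', '_')
    return label if label in VALID_LABELS else None
```

Normalizing the label on the parsing side keeps scoring robust even if the judge model answers in mixed case.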
OpenHuEval_Path = '/mnt/hwfile/opendatalab/weixingjian/OpenHuEval'
DATA_VERSION = '250208'
DATA_PATH = f'{OpenHuEval_Path}/data/HuSimpleQA/HuSimpleQA_{DATA_VERSION}/HuSimpleQA.jsonl'
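`DATA_PATH` points at a `.jsonl` file, which conventionally holds one JSON object per line. As a minimal sketch, assuming that layout, the records could be read as follows; `load_jsonl` is a hypothetical helper, and the record fields are not specified in this file.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with Path(path).open(encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Calling `load_jsonl(DATA_PATH)` would then return the dataset as a list, provided the path exists on the machine running the evaluation.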