gpqa_0shot_nocot_gen_772ea0.py | [Update] Update Fullbench (#1712) | 2024-11-26 14:26:55 +08:00
gpqa_0shot_nocot_genericllmeval_gen_772ea0.py | [Feature] Update o1 evaluation with JudgeLLM (#1795) | 2024-12-30 17:31:00 +08:00
gpqa_0shot_nocot_genericllmeval_xml_gen_772ea0.py | [Refactor] Code refactoarization (#1831) | 2025-01-20 19:17:38 +08:00
gpqa_0shot_nocot_llmjudge_gen_772ea0.py | [Update] Update Skywork/Qwen-QwQ (#1728) | 2024-12-05 19:30:43 +08:00
gpqa_cascade_eval_gen_772ea0.py | [Update] Add CascadeEvaluator with Data Replica (#2022) | 2025-05-20 16:46:55 +08:00
gpqa_few_shot_ppl_4b5a83.py | [Fix] gpqa_few_shot_ppl prompt bug (#1627) | 2024-10-21 16:59:06 +08:00
gpqa_gen_4baadb.py | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00
gpqa_gen_015262.py | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00
gpqa_gen.py | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00
gpqa_llm_judge_gen.py | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00
gpqa_openai_simple_evals_gen_5aeece.py | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00
gpqa_ppl_6bf57a.py | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00
README.md | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00