__init__.py
[Refactor] Change HuSimpleQA to subjective evaluation
2025-02-12 20:25:03 +08:00
alignmentbench.py
[Fix] add bc for alignbench summarizer ( #1306 )
2024-07-12 11:06:20 +08:00
all_obj.py
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README ( #1103 )
2024-04-28 21:58:58 +08:00
alpacaeval.py
[Fix] Fix model summarizer abbr ( #1789 )
2024-12-27 14:45:08 +08:00
arenahard.py
[Refactor] Reorganize subjective eval ( #1284 )
2024-07-05 22:11:37 +08:00
charm.py
[Feature] Update CHARM Memorization ( #1230 )
2024-07-26 18:42:30 +08:00
common_summarizer.py
[ci] add common_summarizer return ( #1724 )
2024-12-11 20:38:32 +08:00
compass_arena_bradley_terry.py
added predicted win rates reporting to bradley terry subj eval methods with an option to switch between win rates and elo ratings ( #1815 )
2025-01-10 18:20:25 +08:00
compass_arena.py
[Fix] Fix model summarizer abbr ( #1789 )
2024-12-27 14:45:08 +08:00
compassbench_v13.py
[Update] Compassbench v1.3 ( #1396 )
2024-08-12 19:09:19 +08:00
compassbench.py
[Feature] Update Models, Summarizers ( #1600 )
2024-10-29 18:37:15 +08:00
corev2.py
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00
creationbench.py
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00
flames.py
[Fix] fix Flames ( #1599 )
2024-10-12 14:34:59 +08:00
fofo.py
[Refactor] Reorganize subjective eval ( #1284 )
2024-07-05 22:11:37 +08:00
followbench.py
[Feature] add support for internal Followbench ( #1511 )
2024-09-11 13:32:34 +08:00
husimpleqa.py
add some features ( #32 )
2025-02-14 20:44:53 +08:00
mtbench101.py
[Refactor] Reorganize subjective eval ( #1284 )
2024-07-05 22:11:37 +08:00
mtbench.py
[Refactor] Reorganize subjective eval ( #1284 )
2024-07-05 22:11:37 +08:00
multiround.py
[Fix] Fix MultiRound Subjective Evaluation ( #1043 )
2024-04-22 12:06:03 +08:00
qacompassbench.py
add new dataset summarizer ( #1758 )
2024-12-13 09:50:43 +08:00
subjective_post_process.py
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00
subjective.py
[Fix] Sub summarizer order fix ( #1426 )
2024-08-15 21:08:18 +08:00
utils.py
[Feature] add support for internal Followbench ( #1511 )
2024-09-11 13:32:34 +08:00
wildbench.py
[Feature] Add OpenHuEval-HuLifeQA ( #4 )
2025-01-24 10:32:17 +08:00