OpenCompass/opencompass
Alexander Lam 1bd594fc62
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model (#1751)
* fix lint issues

* updated gitignore

* changed infer_order from random to double for pairwise_judge.py (not changing for pairwise_bt_judge.py)

* added return statement to CompassArenaBradleyTerrySummarizer to return overall score for each judger model
2024-12-16 13:41:28 +08:00
cli [Feature] Support import configs/models/summarizers from whl (#1376) 2024-08-01 00:42:48 +08:00
configs [Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model (#1751) 2024-12-16 13:41:28 +08:00
datasets [Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model (#1751) 2024-12-16 13:41:28 +08:00
lagent Update CIBench (#1089) 2024-04-26 18:46:02 +08:00
metrics [Feat] Support multi-modal evaluation on MME benchmark. (#197) 2023-08-21 15:53:20 +08:00
models [Update] Update O1-style Benchmark and Prompts (#1742) 2024-12-09 13:48:56 +08:00
openicl [Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model (#1751) 2024-12-16 13:41:28 +08:00
partitioners [Fix] fix duplicate error in partitioner (#1552) 2024-09-23 19:45:21 +08:00
runners [Feature] DLC runner Lark report (#1735) 2024-12-04 18:03:12 +08:00
summarizers [Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model (#1751) 2024-12-16 13:41:28 +08:00
tasks [Update] Update Skywork/Qwen-QwQ (#1728) 2024-12-05 19:30:43 +08:00
utils add new dataset summerizer (#1758) 2024-12-13 09:50:43 +08:00
__init__.py [Bump] Bump version to 0.3.7 (#1733) 2024-12-03 19:34:57 +08:00
registry.py [Feature] Add Judgerbench and reorg subeval (#1593) 2024-10-15 16:36:05 +08:00