bittersweet1999
3f50b1dc49
[Fix] fix order bug Update arena_hard.py ( #2015 )
2025-04-11 16:59:40 +08:00
zhulinJulia24
6ac9b06bc2
[ci] update baseline for kernal change of vllm and lmdeploy ( #2011 )
...
* update
* update
* update
* update
* update
* update
* update
2025-04-09 14:09:35 +08:00
Alexander Lam
f871e80887
[Feature] Add Bradley-Terry Subjective Evaluation method to Arena Hard dataset ( #1802 )
...
* added base_models_abbrs to references (passed from LMEvaluator); added bradleyterry subjective evaluation method for wildbench, alpacaeval, and compassarena datasets; added all_scores output files for reference in CompassArenaBradleyTerrySummarizer;
* added bradleyterry subjective evaluation method to arena_hard dataset
2025-01-03 16:33:43 +08:00
Alexander Lam
dc6035cfcb
[Feature] Added Bradley-Terry subjective evaluation
2024-12-31 11:01:23 +08:00
Alexander Lam
1bd594fc62
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model ( #1751 )
...
* fix lint issues
* updated gitignore
* changed infer_order from random to double for the pairwise_judge.py (not changing for pairwise_bt_judge.py
* added return statement to CompassArenaBradleyTerrySummarizer to return overall score for each judger model
2024-12-16 13:41:28 +08:00
zhulinJulia24
aeded4c4db
add new dataset summerizer ( #1758 )
...
add new dataset summerizer
2024-12-13 09:50:43 +08:00
bittersweet1999
54c0fb7a93
[Change] Change Compassarena metric ( #1749 )
...
* fix pip version
* fix pip version
* fix summarizer bug
* fix compassarena
* fix compassarena
* fix compassarena
2024-12-10 14:45:32 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench ( #1645 )
...
* fix pip version
* fix pip version
* add compassarenasubjectivebench
* add compassarenasubjectivebench
* add compassarenabench
2024-11-01 13:52:22 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena ( #1614 )
...
* fix pip version
* fix pip version
* update docs and add compassarena
* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench ( #1607 )
...
* upload hellobench
* update hellobench
* update readme.md
* update eval_hellobench.py
* update lastest
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval ( #1593 )
...
* fix pip version
* fix pip version
* update (#1522 )
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames ( #1599 )
...
* fix pip version
* fix pip version
* fix flames
* fix flames
2024-10-12 14:34:59 +08:00
bittersweet1999
7c7fa36235
[Feature] add support for internal Followbench ( #1511 )
...
* fix pip version
* fix pip version
* add internal followbench
* add internal followbench
* fix lint
* fix lint
2024-09-11 13:32:34 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 ( #1423 )
...
* [Update] Support auto-download of FOFO/MT-Bench-101
* Update wildbench
2024-08-16 11:57:41 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV ( #1392 )
...
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench ( #1335 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation ( #1339 )
...
* fix pip version
* fix pip version
* support checklist eval
* init
* add lan
* fix typo
2024-07-19 16:40:44 +08:00
Fengzhe Zhou
a62c613d3e
[Sync] bump version 0.2.6+local ( #1294 )
2024-07-06 00:44:06 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval ( #1284 )
...
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
2024-07-05 22:11:37 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 ( #1279 )
2024-06-28 14:16:34 +08:00
Xingyuan Bu
842fb1cd70
Update mtbench101.py ( #1276 )
...
fix wrong-used import
from torch.utils.data import DataLoader, Dataset
2024-06-26 00:40:22 +08:00
klein
1fa62c4a42
Support wildbench ( #1266 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo ( #1224 )
...
* add fofo dataset
* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 ( #1215 )
...
* add mt-bench-101
* add readme and requirements
* add mt-bench-101 data
* Update readme_mtbench101.md
* update readme
* update leaderboard
* fix typo
* Update readme_mtbench101.md
* fit newest opencompass
* update readme.md
* mtbench101 to opencompass
* mtbench101 to opencompass
* for code review
* for code review
* for code review
* hook
* hook
---------
Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format ( #1214 )
2024-05-30 00:21:58 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation ( #1096 )
...
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench ( #829 )
...
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
bittersweet1999
2d4da8dd02
[Feature] Add CompassArena ( #828 )
...
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 15:12:46 +08:00
bittersweet1999
814b3f73bd
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00