Alexander Lam
1bd594fc62
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model ( #1751 )
...
* fix lint issues
* updated gitignore
* changed infer_order from random to double for pairwise_judge.py (not changed for pairwise_bt_judge.py)
* added return statement to CompassArenaBradleyTerrySummarizer to return overall score for each judger model
2024-12-16 13:41:28 +08:00
Linchen Xiao
bd7b705be4
[Update] Update dataset configuration with no max_out_len ( #1754 )
2024-12-11 18:20:29 +08:00
Linchen Xiao
0d26b348e4
[Feature] Add OC academic 2412 ( #1750 )
2024-12-10 21:53:06 +08:00
bittersweet1999
54c0fb7a93
[Change] Change Compassarena metric ( #1749 )
...
* fix pip version
* fix pip version
* fix summarizer bug
* fix compassarena
* fix compassarena
* fix compassarena
2024-12-10 14:45:32 +08:00
Linchen Xiao
9de27b4d85
[Update] Update max_out_len for datasets ( #1726 )
...
* [Update] Update max_out_len for datasets
* Update eval_regression_chat_objective_fullbench.py
* Update eval_regression_chat.py
* Update eval_regression_chat.py
* Update oc_score_baseline_fullbench.yaml
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
2024-12-02 11:42:07 +08:00
Songyang Zhang
f97c4eae42
[Update] Update Fullbench ( #1712 )
...
* Update JudgerBench
* Support O1-style Prompts
* Update Code
2024-11-26 14:26:55 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench ( #1645 )
...
* fix pip version
* fix pip version
* add compassarenasubjectivebench
* add compassarenasubjectivebench
* add compassarenabench
2024-11-01 13:52:22 +08:00
Linchen Xiao
8172af49bb
[Update] Update wildbench max_seq_len ( #1648 )
...
* [Update] Wildbench max_seq_len update
* [Update] Wildbench max_seq_len update
2024-10-29 13:21:31 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions ( #1616 )
...
* fix pip version
* fix pip version
* Compatible with old versions
* update configs
2024-10-21 10:16:29 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena ( #1614 )
...
* fix pip version
* fix pip version
* update docs and add compassarena
* update docs
2024-10-17 14:39:06 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval ( #1593 )
...
* fix pip version
* fix pip version
* update (#1522 )
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames ( #1599 )
...
* fix pip version
* fix pip version
* fix flames
* fix flames
2024-10-12 14:34:59 +08:00
Linchen Xiao
80cda1980e
[BUG] fix followbench dataset config ( #1564 )
...
* [BUG] fix followbench dataset config
* [BUG] fix followbench dataset config
2024-09-25 20:58:34 +08:00
bittersweet1999
7c7fa36235
[Feature] add support for internal Followbench ( #1511 )
...
* fix pip version
* fix pip version
* add internal followbench
* add internal followbench
* fix lint
* fix lint
2024-09-11 13:32:34 +08:00
bittersweet1999
c2bcd8725e
[Fix] Fix wildbench ( #1508 )
...
* fix pip version
* fix pip version
* fix_wildbench
2024-09-10 17:35:07 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 ( #1423 )
...
* [Update] Support auto-download of FOFO/MT-Bench-101
* Update wildbench
2024-08-16 11:57:41 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 ( #1396 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* fix lint
* update docs
* Update lint
* changes saved
* changes saved
* CompassBench subjective summarizer added (#1349 )
* subjective summarizer added
* fix lint
[Fix] Fix MathBench (#1351 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
[Update] Update model support list (#1353 )
* fix pip version
* fix pip version
* update model support
subjective summarizer updated
knowledge, math objective done (data need update)
remove secrets
objective changes saved
knowledge data added
* secrets removed
* changes added
* summarizer modified
* summarizer modified
* compassbench coding added
* fix lint
* objective summarizer updated
* compass_bench_v1.3 updated
* update files in config folder
* remove unused model
* lcbench modified
* removed model evaluation configs
* remove duplicated sdk implementation
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV ( #1392 )
...
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench ( #1335 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
Songyang Zhang
96f644de69
[Fix] Update path and folder ( #1344 )
...
* Update path and folder
* Update path
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-21 08:18:14 +08:00
Linchen Xiao
a56678190b
[Feature] CompassBench v1_3 subjective evaluation ( #1341 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* remove unneeded content
* fix lint
* update docs
* Update lint
* Update
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 23:12:23 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation ( #1339 )
...
* fix pip version
* fix pip version
* support checklist eval
* init
* add lan
* fix typo
2024-07-19 16:40:44 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer ( #1306 )
...
* fix pip version
* fix pip version
* fix alignbench
* fix import error
2024-07-12 11:06:20 +08:00
bittersweet1999
889e7e1140
[Fix] Change abbr for arenahard dataset ( #1302 )
...
* fix pip version
* fix pip version
* change abbr for arenahard
2024-07-11 12:42:03 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval ( #1284 )
...
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
2024-07-05 22:11:37 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 ( #1279 )
2024-06-28 14:16:34 +08:00
klein
1fa62c4a42
Support wildbench ( #1266 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo ( #1224 )
...
* add fofo dataset
* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 ( #1215 )
...
* add mt-bench-101
* add readme and requirements
* add mt-bench-101 data
* Update readme_mtbench101.md
* update readme
* update leaderboard
* fix typo
* Update readme_mtbench101.md
* fit newest opencompass
* update readme.md
* mtbench101 to opencompass
* mtbench101 to opencompass
* for code review
* for code review
* for code review
* hook
* hook
---------
Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format ( #1214 )
2024-05-30 00:21:58 +08:00
bittersweet1999
07a6dacf33
fix length ( #1180 )
2024-05-24 23:30:01 +08:00
Fengzhe Zhou
aa2dd2b58c
[Format] Add config lints ( #892 )
2024-05-14 15:35:58 +08:00
bittersweet1999
5432dfc1ff
fix multiround ( #1146 )
2024-05-13 15:58:39 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation ( #1096 )
...
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6f98c8d9ab
[Fix] Fix MultiRound Subjective Evaluation ( #1043 )
...
* fix multiround
* fix
2024-04-22 12:06:03 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
bittersweet1999
2d4e559763
[Feature] Add multi-model judge and fix some problems ( #1016 )
...
* support multi-model judge and moe judge
* test_moe
* test_moe
* test
* add moe judge
* support multi-judge-model
2024-04-02 11:52:06 +08:00
bittersweet1999
02e7eec911
[Feature] Support AlpacaEval_V2 ( #1006 )
...
* support alpacaeval_v2
* support alpacaeval
* update docs
* update docs
2024-03-28 16:49:04 +08:00
bittersweet1999
c78a4df923
add support for set prediction path ( #984 )
2024-03-19 14:32:15 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different question in mtbench ( #954 )
...
* add temp for mtbench
* add document for mtbench
* add document for mtbench
2024-03-11 17:24:39 +08:00
bittersweet1999
001e77fea2
[Feature] add support for gemini ( #931 )
...
* add gemini
* add gemini
* add gemini
2024-02-28 19:38:34 +08:00
bittersweet1999
7806cd0f64
[Feature] support alpacaeval ( #809 )
...
* support alpacaeval_v1
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/alpacaeval_v1.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix conflict
* support alpacaeval v2
* support alpacav2
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-02-04 14:18:36 +08:00
bittersweet1999
5c6dc908cd
fix compass arena ( #854 )
2024-01-30 16:34:38 +08:00
bittersweet1999
77be07dbb5
[Fix] fix corev2 ( #838 )
...
* fix corev2
* fix corev2
2024-01-24 18:15:29 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench ( #829 )
...
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
bittersweet1999
2d4da8dd02
[Feature] Add CompassArena ( #828 )
...
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 15:12:46 +08:00
bittersweet1999
814b3f73bd
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00