klein
|
65fad8e2ac
|
[Fix] minor update wildbench (#1335)
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
|
2024-07-26 11:19:04 +08:00 |
|
Songyang Zhang
|
96f644de69
|
[Fix] Update path and folder (#1344)
* Update path and folder
* Update path
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
|
2024-07-21 08:18:14 +08:00 |
|
Linchen Xiao
|
a56678190b
|
[Feature] CompassBench v1_3 subjective evaluation (#1341)
* stash files
* compassbench subjective evaluation added
* evaluation update
* remove unneeded content
* fix lint
* update docs
* Update lint
* Update
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
|
2024-07-19 23:12:23 +08:00 |
|
bittersweet1999
|
1f9f728f22
|
[Feature] support compassbench Checklist evaluation (#1339)
* fix pip version
* fix pip version
* support checklist eval
* init
* add lan
* fix typo
|
2024-07-19 16:40:44 +08:00 |
|
bittersweet1999
|
8e7ad2e981
|
[Fix] add bc for alignbench summarizer (#1306)
* fix pip version
* fix pip version
* fix alignbench
* fix import error
|
2024-07-12 11:06:20 +08:00 |
|
bittersweet1999
|
889e7e1140
|
[Fix] Change abbr for arenahard dataset (#1302)
* fix pip version
* fix pip version
* change abbr for arenahard
|
2024-07-11 12:42:03 +08:00 |
|
bittersweet1999
|
68ca48496b
|
[Refactor] Reorganize subjective eval (#1284)
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
|
2024-07-05 22:11:37 +08:00 |
|
Fengzhe Zhou
|
a32f21a356
|
[Sync] Sync with internal codes 2024.06.28 (#1279)
|
2024-06-28 14:16:34 +08:00 |
|
klein
|
1fa62c4a42
|
Support wildbench (#1266)
Co-authored-by: Leymore <zfz-960727@163.com>
|
2024-06-24 13:16:27 +08:00 |
|
bittersweet1999
|
982e024540
|
[Feature] add dataset Fofo (#1224)
* add fofo dataset
* add dataset fofo
|
2024-06-06 11:40:48 +08:00 |
|
Xingyuan Bu
|
02a0a4e857
|
MT-Bench-101 (#1215)
* add mt-bench-101
* add readme and requirements
* add mt-bench-101 data
* Update readme_mtbench101.md
* update readme
* update leaderboard
* fix typo
* Update readme_mtbench101.md
* fit newest opencompass
* update readme.md
* mtbench101 to opencompass
* mtbench101 to opencompass
* for code review
* for code review
* for code review
* hook
* hook
---------
Co-authored-by: liujie <ljie@buaa.edu.cn>
|
2024-06-03 14:52:12 +08:00 |
|
Fengzhe Zhou
|
a77b8a5cec
|
[Sync] format (#1214)
|
2024-05-30 00:21:58 +08:00 |
|
bittersweet1999
|
07a6dacf33
|
fix length (#1180)
|
2024-05-24 23:30:01 +08:00 |
|
Fengzhe Zhou
|
aa2dd2b58c
|
[Format] Add config lints (#892)
|
2024-05-14 15:35:58 +08:00 |
|
bittersweet1999
|
5432dfc1ff
|
fix multiround (#1146)
|
2024-05-13 15:58:39 +08:00 |
|
bittersweet1999
|
e404b72c52
|
[Feature] support arenahard evaluation (#1096)
* support arenahard
* support arenahard
* support arenahard
|
2024-04-26 15:42:00 +08:00 |
|
bittersweet1999
|
6f98c8d9ab
|
[Fix] Fix MultiRound Subjective Evaluation (#1043)
* fix multiround
* fix
|
2024-04-22 12:06:03 +08:00 |
|
Fengzhe Zhou
|
b39f501563
|
[Sync] update taco (#1030)
|
2024-04-09 17:50:23 +08:00 |
|
bittersweet1999
|
2d4e559763
|
[Feature] Add multi-model judge and fix some problems (#1016)
* support multi-model judge and moe judge
* test_moe
* test_moe
* test
* add moe judge
* support multi-judge-model
|
2024-04-02 11:52:06 +08:00 |
|
bittersweet1999
|
02e7eec911
|
[Feature] Support AlpacaEval_V2 (#1006)
* support alpacaeval_v2
* support alpacaeval
* update docs
* update docs
|
2024-03-28 16:49:04 +08:00 |
|
bittersweet1999
|
c78a4df923
|
add support for set prediction path (#984)
|
2024-03-19 14:32:15 +08:00 |
|
bittersweet1999
|
848e7c8a76
|
[fix] add different temp for different question in mtbench (#954)
* add temp for mtbench
* add document for mtbench
* add document for mtbench
|
2024-03-11 17:24:39 +08:00 |
|
bittersweet1999
|
001e77fea2
|
[Feature] add support for gemini (#931)
* add gemini
* add gemini
* add gemini
|
2024-02-28 19:38:34 +08:00 |
|
bittersweet1999
|
7806cd0f64
|
[Feature] support alpacaeval (#809)
* support alpacaeval_v1
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/alpacaeval_v1.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix conflict
* support alpacaeval v2
* support alpacav2
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
|
2024-02-04 14:18:36 +08:00 |
|
bittersweet1999
|
5c6dc908cd
|
fix compass arena (#854)
|
2024-01-30 16:34:38 +08:00 |
|
bittersweet1999
|
77be07dbb5
|
[Fix] fix corev2 (#838)
* fix corev2
* fix corev2
|
2024-01-24 18:15:29 +08:00 |
|
bittersweet1999
|
2ee8e8a1a1
|
[Feature] add mtbench (#829)
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
|
2024-01-24 12:11:47 +08:00 |
|
bittersweet1999
|
2d4da8dd02
|
[Feature] Add CompassArena (#828)
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
|
2024-01-23 15:12:46 +08:00 |
|
bittersweet1999
|
814b3f73bd
|
reorganize subject files (#801)
|
2024-01-16 18:03:11 +08:00 |
|