bittersweet1999
|
3f50b1dc49
|
[Fix] Fix order bug; update arena_hard.py (#2015)
|
2025-04-11 16:59:40 +08:00 |
|
Alexander Lam
|
f871e80887
|
[Feature] Add Bradley-Terry Subjective Evaluation method to Arena Hard dataset (#1802)
* Added base_models_abbrs to references (passed from LMEvaluator); added the Bradley-Terry subjective evaluation method for the WildBench, AlpacaEval, and CompassArena datasets; added all_scores output files for reference in CompassArenaBradleyTerrySummarizer.
* Added the Bradley-Terry subjective evaluation method to the Arena Hard dataset.
|
2025-01-03 16:33:43 +08:00 |
|
bittersweet1999
|
fa54aa62f6
|
[Feature] Add Judgerbench and reorg subeval (#1593)
* fix pip version
* fix pip version
* update (#1522)
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518)
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527)
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
|
2024-10-15 16:36:05 +08:00 |
|
Songyang Zhang
|
c81329b548
|
[Fix] Fix Slurm ENV (#1392)
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
|
2024-08-06 01:35:20 +08:00 |
|
bittersweet1999
|
68ca48496b
|
[Refactor] Reorganize subjective eval (#1284)
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
|
2024-07-05 22:11:37 +08:00 |
|
bittersweet1999
|
e404b72c52
|
[Feature] Support Arena Hard evaluation (#1096)
* support arenahard
* support arenahard
* support arenahard
|
2024-04-26 15:42:00 +08:00 |
|