wanyu2018umac
|
90efcf2216
|
[Feature] Add P-MMEval (#1714)
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
---------
Co-authored-by: liushz <qq1791167085@163.com>
|
2024-11-27 21:26:18 +08:00 |
|
Junnan Liu
|
f7dbe6bb7d
|
[Feature] Add Arc Prize Public Evaluation (#1690)
* support arc prize
* update arc-prize dataset info & update arc-prize evaluation performance
|
2024-11-27 15:44:41 +08:00 |
|
Songyang Zhang
|
f97c4eae42
|
[Update] Update Fullbench (#1712)
* Update JudgerBench
* Support O1-style Prompts
* Update Code
|
2024-11-26 14:26:55 +08:00 |
|
Yufeng Zhao
|
300adc31e8
|
[Feature] Add Korbench dataset (#1713)
* first version for korbench
* first stage for korbench
* korbench_1
* korbench_1
* korbench_1
* korbench_1
* korbench_1_revised
* korbench_combined_1
* korbench_combined_1
* kor_combined
* kor_combined
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
|
2024-11-25 20:11:27 +08:00 |
|
Linchen Xiao
|
ab8fdbbaab
|
[Update] Update Math auto-download data (#1700)
|
2024-11-18 20:24:35 +08:00 |
|
abrohamLee
|
e9e4b69ddb
|
[Feature] MuSR Dataset Evaluation (#1689)
* MuSR Dataset Evaluation
* MuSR Dataset Evaluation
Add an assertion and a Readme.md
|
2024-11-14 20:42:12 +08:00 |
|
Linchen Xiao
|
e92a5d4230
|
[Feature] BABILong Dataset added (#1684)
* update
* update
* update
* update
|
2024-11-14 15:32:43 +08:00 |
|
Linchen Xiao
|
2fee63f537
|
[Update] Auto-download for followbench (#1685)
|
2024-11-13 15:47:29 +08:00 |
|
liushz
|
f7d899823c
|
[Update] Update mmmlu_lite dataload (#1658)
* update mmmlu_lite dataload from oss
* update mmmlu_lite dataload from oss
|
2024-11-01 17:32:29 +08:00 |
|
Songyang Zhang
|
c789ce5698
|
[Fix] Automatic download for several datasets (#1652)
* [Fix] Automatic download for several datasets
* Update
* Update
* Update CI
|
2024-11-01 15:57:18 +08:00 |
|
Linchen Xiao
|
d91d66792a
|
[Update] Update Needlebench OSS path (#1651)
|
2024-10-29 18:05:44 +08:00 |
|
Junnan Liu
|
645c5f3b2c
|
[Datasets] Add datasets CMO&AIME (#1610)
* add datasets cmo&aime
* delete unused modules
* modify prompt
* update __init__
* update data load and add README
* update data load
* update performance
* update md5
* remove indents
* add indent
* fix log for debug mode
|
2024-10-28 18:08:02 +08:00 |
|
Songyang Zhang
|
a4d5a6c81b
|
[Feature] Support LiveCodeBench (#1617)
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
|
2024-10-21 20:50:39 +08:00 |
|
Songyang Zhang
|
6997990c93
|
[Feature] Update Models (#1518)
* Update Models
* Update
* Update humanevalx
* Update
* Update
|
2024-09-12 23:35:30 +08:00 |
|
Linchen Xiao
|
317763381c
|
update (#1517)
|
2024-09-11 13:31:20 +08:00 |
|
Linchen Xiao
|
87ffa71d68
|
[Feature] Longbench dataset update
|
2024-09-06 15:50:12 +08:00 |
|
Hari Seldon
|
faf5260155
|
[Feature] Optimize Evaluation Speed of SciCode (#1489)
* update scicode
* update comments
* remove redundant variable
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
|
2024-09-06 00:59:41 +08:00 |
|
Linchen Xiao
|
6c9cd9a260
|
[Feature] Needlebench auto-download update (#1480)
* update
* update
* update
|
2024-09-05 17:22:42 +08:00 |
|
Linchen Xiao
|
9693be46b7
|
[Feature] Mmlu-pro auto-download (#1464)
* update
* update
* update
* update
* update
|
2024-08-30 10:03:40 +08:00 |
|
Songyang Zhang
|
e5a8eb2283
|
[Feature] Update Lint and Leaderboard (#1458)
* [Feature] Update Lint and Leaderboard
* Update
* Update
|
2024-08-28 22:36:42 +08:00 |
|
Linchen Xiao
|
245664f4c0
|
[Feature] Fullbench v0.1 language update (#1463)
* update
* update
* update
* update
|
2024-08-28 14:01:05 +08:00 |
|
Songyang Zhang
|
7c2d25b557
|
[Fix] Update SciCode and Gemma model (#1449)
* [Fix] Update SciCode and Gemma model
* Update
* Update
|
2024-08-23 10:42:27 +08:00 |
|
Hari Seldon
|
14b4b735cb
|
[Feature] Add support for SciCode (#1417)
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
|
2024-08-22 13:42:25 +08:00 |
|
Linchen Xiao
|
a4b54048ae
|
[Feature] Add Ruler datasets (#1310)
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
|
2024-08-20 11:40:11 +08:00 |
|
Songyang Zhang
|
9b3613f10b
|
[Update] Support auto-download of FOFO/MT-Bench-101 (#1423)
* [Update] Support auto-download of FOFO/MT-Bench-101
* Update wildbench
|
2024-08-16 11:57:41 +08:00 |
|
Songyang Zhang
|
c81329b548
|
[Fix] Fix Slurm ENV (#1392)
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
|
2024-08-06 01:35:20 +08:00 |
|