JuchengHu
|
a2093a81ef
|
[Dataset] Matbench (#2021)
* add support for matbench
* fix dataset path
* fix data load
* fix
* fix lint
---------
Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
|
2025-04-21 15:50:47 +08:00 |
|
Linchen Xiao
|
b2da1c08a8
|
[Dataset] Add SmolInstruct, Update Chembench (#2025)
* [Dataset] Add SmolInstruct, Update Chembench
* Add dataset metadata
* update
* update
* update
|
2025-04-18 17:21:29 +08:00 |
|
Myhs_phz
|
75e7834b59
|
[Feature] Add Datasets: ClimateQA, Physics (#2017)
* feat ClimateQA
* feat PHYSICS
* fix
* fix
* fix
* fix
|
2025-04-14 20:18:47 +08:00 |
|
Dongsheng Zhu
|
8a5029b121
|
[Feature] Add MultiPL-E & Code Evaluator (#1963)
* multiple_code develop
* multiple_code update
* comments update
* index update
|
2025-03-21 20:09:25 +08:00 |
|
liushz
|
709bc4af0e
|
[Update] Add AIME2025 oss info (#1936)
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
* Add HLE dataset
* Add HLE dataset
* Add HLE dataset
* Add AIME2025 oss info
---------
Co-authored-by: sudanl <sudanl@foxmail.com>
|
2025-03-12 18:41:16 +08:00 |
|
Yufeng Zhao
|
bc2969dba8
|
[Feature] Add support for BBEH dataset (#1925)
* bbeh
* bbeh
* fix_smallbugs_bbeh
* removeprint
* results
---------
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
|
2025-03-12 10:53:31 +08:00 |
|
Dongsheng Zhu
|
fff2d51440
|
[Update] Code evaluation alignment (#1909)
* code alignment
* update oss md5
* bigcodebench update
* lint
* lint_
* lint yapf
|
2025-03-04 18:49:38 +08:00 |
|
Shudong Liu
|
412199f802
|
[Feature] Support OlympiadBench Benchmark (#1841)
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
---------
Co-authored-by: liushz <qq1791167085@163.com>
|
2025-01-24 10:00:01 +08:00 |
|
Songyang Zhang
|
8fdb72f567
|
[Update] Update o1 eval prompt (#1806)
* Update XML prediction post-process
* Update LiveMathBench
* Update LiveMathBench
* Update New O1 Evaluation
|
2025-01-07 00:14:32 +08:00 |
|
Linchen Xiao
|
117dc500ad
|
[Feature] Add Longbenchv2 support (#1801)
* Create eval_longbenchv2.py
* Create longbenchv2_gen.py
* Update __init__.py
* Create longbenchv2.py
* Update datasets_info.py
* update
* update
* update
* update
* update
* update
---------
Co-authored-by: abrohamLee <146956824+abrohamLee@users.noreply.github.com>
|
2025-01-03 12:04:29 +08:00 |
|
liushz
|
9c980cbc62
|
[Feature] Add LiveStemBench Dataset (#1794)
* [Fix] Fix vllm max_seq_len parameter transfer
* [Fix] Fix vllm max_seq_len parameter transfer
* Add livestembench dataset
* Add livestembench dataset
* Add livestembench dataset
* Update livestembench_gen_3e3c50.py
* Update eval_livestembench.py
* Update eval_livestembench.py
|
2024-12-31 15:17:39 +08:00 |
|
zhulinJulia24
|
aeded4c4db
|
add new dataset summarizer (#1758)
add new dataset summarizer
|
2024-12-13 09:50:43 +08:00 |
|
OpenStellarTeam
|
1a5b3fc11e
|
Add Chinese SimpleQA config (#1697)
* add chinese simpleqa config
* add chinese simpleqa config
* add chinese simpleqa config
* add chinese simpleqa config
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
---------
Co-authored-by: 明念 <heyancheng.hyc@taobao.com>
Co-authored-by: liushz <qq1791167085@163.com>
|
2024-12-11 18:03:39 +08:00 |
|
Songyang Zhang
|
fb43dd1906
|
[Update] Update Skywork/Qwen-QwQ (#1728)
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
|
2024-12-05 19:30:43 +08:00 |
|
liushz
|
b063779034
|
[Fix] Update P-MMEVAL OSS data (#1722)
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
* Fix pmmeval_gen config
* Update P-MMEVAL data
---------
Co-authored-by: wanyu <wanyu2018umac@gmail.com>
Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>
|
2024-11-28 20:55:46 +08:00 |
|
liushz
|
c437135fad
|
[Feature] Add Openai Simpleqa dataset (#1720)
* Add Openai SimpleQA dataset
* Add Openai SimpleQA dataset
* Add Openai SimpleQA dataset
* Update eval_simpleqa.py
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
|
2024-11-28 19:16:07 +08:00 |
|
wanyu2018umac
|
90efcf2216
|
[Feature] Add P-MMEval (#1714)
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
---------
Co-authored-by: liushz <qq1791167085@163.com>
|
2024-11-27 21:26:18 +08:00 |
|
Junnan Liu
|
f7dbe6bb7d
|
[Feature] Add Arc Prize Public Evaluation (#1690)
* support arc prize
* update arc-prize dataset info & update arc-prize evaluation performance
|
2024-11-27 15:44:41 +08:00 |
|
Songyang Zhang
|
f97c4eae42
|
[Update] Update Fullbench (#1712)
* Update JudgerBench
* Support O1-style Prompts
* Update Code
|
2024-11-26 14:26:55 +08:00 |
|
Yufeng Zhao
|
300adc31e8
|
[Feature] Add Korbench dataset (#1713)
* first version for korbench
* first stage for korbench
* korbench_1
* korbench_1
* korbench_1
* korbench_1
* korbench_1_revised
* korbench_combined_1
* korbench_combined_1
* kor_combined
* kor_combined
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
|
2024-11-25 20:11:27 +08:00 |
|
Linchen Xiao
|
ab8fdbbaab
|
[Update] Update Math auto-download data (#1700)
|
2024-11-18 20:24:35 +08:00 |
|
abrohamLee
|
e9e4b69ddb
|
[Feature] MuSR Dataset Evaluation (#1689)
* MuSR Dataset Evaluation
* MuSR Dataset Evaluation
Add an assertion and a Readme.md
|
2024-11-14 20:42:12 +08:00 |
|
Linchen Xiao
|
e92a5d4230
|
[Feature] BABILong Dataset added (#1684)
* update
* update
* update
* update
|
2024-11-14 15:32:43 +08:00 |
|
Linchen Xiao
|
2fee63f537
|
[Update] Auto-download for followbench (#1685)
|
2024-11-13 15:47:29 +08:00 |
|
liushz
|
f7d899823c
|
[Update] Update mmmlu_lite dataload (#1658)
* update mmmlu_lite dataload from oss
* update mmmlu_lite dataload from oss
|
2024-11-01 17:32:29 +08:00 |
|
Songyang Zhang
|
c789ce5698
|
[Fix] the automatic download for several datasets (#1652)
* [Fix] the automatic download for several datasets
* Update
* Update
* Update CI
|
2024-11-01 15:57:18 +08:00 |
|
Linchen Xiao
|
d91d66792a
|
[Update] Update Needlebench OSS path (#1651)
|
2024-10-29 18:05:44 +08:00 |
|
Junnan Liu
|
645c5f3b2c
|
[Datasets] Add datasets CMO&AIME (#1610)
* add datasets cmo&aime
* delete unused modules
* modify prompt
* update __init__
* update data load and add README
* update data load
* update performance
* update md5
* remove indents
* add indent
* fix log for debug mode
|
2024-10-28 18:08:02 +08:00 |
|
Songyang Zhang
|
a4d5a6c81b
|
[Feature] Support LiveCodeBench (#1617)
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
|
2024-10-21 20:50:39 +08:00 |
|
Songyang Zhang
|
6997990c93
|
[Feature] Update Models (#1518)
* Update Models
* Update
* Update humanevalx
* Update
* Update
|
2024-09-12 23:35:30 +08:00 |
|
Linchen Xiao
|
317763381c
|
update (#1517)
|
2024-09-11 13:31:20 +08:00 |
|
Linchen Xiao
|
87ffa71d68
|
[Feature] Longbench dataset update
|
2024-09-06 15:50:12 +08:00 |
|
Hari Seldon
|
faf5260155
|
[Feature] Optimize Evaluation Speed of SciCode (#1489)
* update scicode
* update comments
* remove redundant variable
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
|
2024-09-06 00:59:41 +08:00 |
|
Linchen Xiao
|
6c9cd9a260
|
[Feature] Needlebench auto-download update (#1480)
* update
* update
* update
|
2024-09-05 17:22:42 +08:00 |
|
Linchen Xiao
|
9693be46b7
|
[Feature] Mmlu-pro auto-download (#1464)
* update
* update
* update
* update
* update
|
2024-08-30 10:03:40 +08:00 |
|
Songyang Zhang
|
e5a8eb2283
|
[Feature] Update Lint and Leaderboard (#1458)
* [Feature] Update Lint and Leaderboard
* Update
* Update
|
2024-08-28 22:36:42 +08:00 |
|
Linchen Xiao
|
245664f4c0
|
[Feature] Fullbench v0.1 language update (#1463)
* update
* update
* update
* update
|
2024-08-28 14:01:05 +08:00 |
|
Songyang Zhang
|
7c2d25b557
|
[Fix] Update SciCode and Gemma model (#1449)
* [Fix] Update SciCode and Gemma model
* Update
* Update
|
2024-08-23 10:42:27 +08:00 |
|
Hari Seldon
|
14b4b735cb
|
[Feature] Add support for SciCode (#1417)
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
|
2024-08-22 13:42:25 +08:00 |
|
Linchen Xiao
|
a4b54048ae
|
[Feature] Add Ruler datasets (#1310)
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
|
2024-08-20 11:40:11 +08:00 |
|
Songyang Zhang
|
9b3613f10b
|
[Update] Support auto-download of FOFO/MT-Bench-101 (#1423)
* [Update] Support auto-download of FOFO/MT-Bench-101
* Update wildbench
|
2024-08-16 11:57:41 +08:00 |
|
Songyang Zhang
|
c81329b548
|
[Fix] Fix Slurm ENV (#1392)
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
|
2024-08-06 01:35:20 +08:00 |
|