Wei Li
a685ed7daf
[Dataset] Add nejm ai benchmark ( #2063 )
...
* support nejm ai benchmark
* add dataset files
* revise gen name
* revise gen name
* revise class name & remove csv file & add dataset-index.yml info
* update
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-08 16:44:05 +08:00
JuchengHu
a2093a81ef
[Dataset] Matbench ( #2021 )
...
* add support for matbench
* fix dataset path
* fix data load
* fix
* fix lint
---------
Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
2025-04-21 15:50:47 +08:00
Linchen Xiao
b2da1c08a8
[Dataset] Add SmolInstruct, Update Chembench ( #2025 )
...
* [Dataset] Add SmolInstruct, Update Chembench
* Add dataset metadata
* update
* update
* update
2025-04-18 17:21:29 +08:00
Myhs_phz
75e7834b59
[Feature] Add Datasets: ClimateQA, Physics ( #2017 )
...
* feat ClimateQA
* feat PHYSICS
* fix
* fix
* fix
* fix
2025-04-14 20:18:47 +08:00
Linchen Xiao
12213207b6
[Refactor] Refactorize openicl eval task ( #1990 )
...
* [Refactor] Refactorize openicl eval task
* update
2025-04-09 15:52:23 +08:00
Dongsheng Zhu
8a5029b121
[Feature] Add MultiPL-E & Code Evaluator ( #1963 )
...
* multiple_code develop
* multiple_code update
* comments update
* index update
2025-03-21 20:09:25 +08:00
Songyang Zhang
c98599271b
[Update] Update OlympiadBench and Update LLM Judge ( #1954 )
2025-03-18 20:15:20 +08:00
Jason Cheung
5d2d253d83
[BUG] Fix model_kwargs pass logic for vllm ( #1958 )
2025-03-18 20:08:15 +08:00
liushz
709bc4af0e
[Update] Add AIME2025 oss info ( #1936 )
...
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
* Add HLE dataset
* Add HLE dataset
* Add HLE dataset
* Add AIME2025 oss info
---------
Co-authored-by: sudanl <sudanl@foxmail.com>
2025-03-12 18:41:16 +08:00
Yufeng Zhao
bc2969dba8
[Feature] Add support for BBEH dataset ( #1925 )
...
* bbeh
* bbeh
* fix_smallbugs_bbeh
* removeprint
* results
---------
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2025-03-12 10:53:31 +08:00
Myhs_phz
570c30cf1b
[Fix] Fix CLI option for results persistence ( #1920 )
...
* fix
* fix
* fix
2025-03-07 18:24:30 +08:00
Myhs_phz
1585c0adbe
[Feature] Evaluation Results Persistence ( #1894 )
...
* feat results_station.py
* lint
* feat save_to_station
* feat result_station.py and lint
* feat
* fix
* fix and lint
* fix
* fix subjective processing
* fix
* fix
* style function name
* lint
2025-03-05 18:33:34 +08:00
Dongsheng Zhu
fff2d51440
[Update] Code evaluation alignment ( #1909 )
...
* code alignment
* update oss md5
* bigcodebench update
* lint
* lint_
* lint yapf
2025-03-04 18:49:38 +08:00
Junnan Liu
73c80953c6
[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator ( #1886 )
...
* support dataset repeat and g-pass compute for each evaluator
* fix pre-commit errors
* delete print
* delete gpassk_evaluator and fix potential errors
* change `repeat` to `n`
* fix `repeat` to `n` in openicl_eval
* update doc for multi-run and g-pass
* update latex equation in doc
* update eng doc for multi-run and g-pass
* update datasets.md
* update datasets.md
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation in zh_cn user_guides
* modify pre-commit-zh-cn
* recover pre-commit and edit math expr in doc
* del [TIP]
* del cite tag in doc
* del extract_model param in livemathbench config
2025-02-26 19:43:12 +08:00
Linchen Xiao
27c916661d
[Feature] Math Verify with model post_processor ( #1881 )
...
* update
* [Feature] Update model post_processor
* update
* update
* update
2025-02-20 19:32:12 +08:00
bittersweet1999
f407930475
[Feature] Support subjective evaluation for reasoning model ( #1868 )
...
* fix pip version
* fix pip version
* add subeval for reasoning model
* add subeval for reasoning model
* update configs
* update config
* update config
* update config
* update files
2025-02-20 12:19:46 +08:00
Shudong Liu
412199f802
[Feature] Support OlympiadBench Benchmark ( #1841 )
...
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
---------
Co-authored-by: liushz <qq1791167085@163.com>
2025-01-24 10:00:01 +08:00
Songyang Zhang
8fdb72f567
[Update] Update o1 eval prompt ( #1806 )
...
* Update XML prediction post-process
* Update LiveMathBench
* Update LiveMathBench
* Update New O1 Evaluation
2025-01-07 00:14:32 +08:00
Linchen Xiao
117dc500ad
[Feature] Add Longbenchv2 support ( #1801 )
...
* Create eval_longbenchv2.py
* Create longbenchv2_gen.py
* Update __init__.py
* Create longbenchv2.py
* Update datasets_info.py
* update
* update
* update
* update
* update
* update
---------
Co-authored-by: abrohamLee <146956824+abrohamLee@users.noreply.github.com>
2025-01-03 12:04:29 +08:00
liushz
9c980cbc62
[Feature] Add LiveStemBench Dataset ( #1794 )
...
* [Fix] Fix vllm max_seq_len parameter transfer
* [Fix] Fix vllm max_seq_len parameter transfer
* Add livestembench dataset
* Add livestembench dataset
* Add livestembench dataset
* Update livestembench_gen_3e3c50.py
* Update eval_livestembench.py
* Update eval_livestembench.py
2024-12-31 15:17:39 +08:00
liushz
5c8e91f329
[Fix] Fix vllm max_seq_len parameter transfer ( #1745 )
...
* [Fix] Fix vllm max_seq_len parameter transfer
* [Fix] Fix vllm max_seq_len parameter transfer
* Update pr-run-test.yml
* Update pr-run-test.yml
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
2024-12-16 21:44:36 +08:00
zhulinJulia24
aeded4c4db
add new dataset summarizer ( #1758 )
...
add new dataset summarizer
2024-12-13 09:50:43 +08:00
OpenStellarTeam
1a5b3fc11e
Add Chinese SimpleQA config ( #1697 )
...
* add chinese simpleqa config
* add chinese simpleqa config
* add chinese simpleqa config
* add chinese simpleqa config
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
---------
Co-authored-by: 明念 <heyancheng.hyc@taobao.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-12-11 18:03:39 +08:00
Songyang Zhang
fb43dd1906
[Update] Update Skywork/Qwen-QwQ ( #1728 )
...
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
2024-12-05 19:30:43 +08:00
liushz
b063779034
[Fix] Update P-MMEVAL OSS data ( #1722 )
...
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
* Fix pmmeval_gen config
* Update P-MMEVAL data
---------
Co-authored-by: wanyu <wanyu2018umac@gmail.com>
Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>
2024-11-28 20:55:46 +08:00
liushz
c437135fad
[Feature] Add Openai Simpleqa dataset ( #1720 )
...
* Add Openai SimpleQA dataset
* Add Openai SimpleQA dataset
* Add Openai SimpleQA dataset
* Update eval_simpleqa.py
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-11-28 19:16:07 +08:00
wanyu2018umac
90efcf2216
[Feature] Add P-MMEval ( #1714 )
...
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
---------
Co-authored-by: liushz <qq1791167085@163.com>
2024-11-27 21:26:18 +08:00
Junnan Liu
f7dbe6bb7d
[Feature] Add Arc Prize Public Evaluation ( #1690 )
...
* support arc prize
* update arc-prize dataset info & update arc-prize evaluation performance
2024-11-27 15:44:41 +08:00
Songyang Zhang
f97c4eae42
[Update] Update Fullbench ( #1712 )
...
* Update JudgerBench
* Support O1-style Prompts
* Update Code
2024-11-26 14:26:55 +08:00
Yufeng Zhao
300adc31e8
[Feature] Add Korbench dataset ( #1713 )
...
* first version for korbench
* first stage for korbench
* korbench_1
* korbench_1
* korbench_1
* korbench_1
* korbench_1_revised
* korbench_combined_1
* korbench_combined_1
* kor_combined
* kor_combined
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2024-11-25 20:11:27 +08:00
liushz
e49fcfd3a3
[Update] Update MATH dataset with model judge ( #1711 )
...
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
2024-11-25 15:14:55 +08:00
Linchen Xiao
ab8fdbbaab
[Update] Update Math auto-download data ( #1700 )
2024-11-18 20:24:35 +08:00
Linchen Xiao
98242ff1d1
[Update] first_option_postprocess ( #1699 )
...
* update first_option_postprocess
* update
2024-11-18 20:14:29 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Dataset Evaluation ( #1689 )
...
* MuSR Dataset Evaluation
* MuSR Dataset Evaluation
Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
d415439f9b
[Fix] Fix bug for first_option_postprocess ( #1688 )
2024-11-14 16:45:59 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added ( #1684 )
...
* update
* update
* update
* update
2024-11-14 15:32:43 +08:00
Linchen Xiao
2fee63f537
[Update] Auto-download for followbench ( #1685 )
2024-11-13 15:47:29 +08:00
liushz
f7d899823c
[Update] Update mmmlu_lite dataload ( #1658 )
...
* update mmmlu_lite dataload from oss
* update mmmlu_lite dataload from oss
2024-11-01 17:32:29 +08:00
Songyang Zhang
c789ce5698
[Fix] Fix automatic download for several datasets ( #1652 )
...
* [Fix] Fix automatic download for several datasets
* Update
* Update
* Update CI
2024-11-01 15:57:18 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers ( #1600 )
2024-10-29 18:37:15 +08:00
Linchen Xiao
d91d66792a
[Update] Update Needlebench OSS path ( #1651 )
2024-10-29 18:05:44 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME ( #1610 )
...
* add datasets cmo&aime
* delete unused modules
* modify prompt
* update __init__
* update data load and add README
* update data load
* update performance
* update md5
* remove indents
* add indent
* fix log for debug mode
2024-10-28 18:08:02 +08:00
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench ( #1617 )
...
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
2024-10-21 20:50:39 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval ( #1593 )
...
* fix pip version
* fix pip version
* update (#1522 )
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config ( #1589 )
...
* Add GaoKaoMath Dataset
* Add MATH LLM Eval
* Update GAOKAO Math Eval Dataset
* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
Lyu Han
b52ba65c26
[Feature] Integrate lmdeploy pipeline api ( #1198 )
...
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* fix ci check
* compatibility
* remove concurrency
* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py
* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md
* [Bug] fix lint
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-10-09 22:58:06 +08:00
liushz
2e9db77d57
[Feature] Add custom model postprocess function ( #1519 )
...
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:40:51 +08:00
Songyang Zhang
6997990c93
[Feature] Update Models ( #1518 )
...
* Update Models
* Update
* Update humanevalx
* Update
* Update
2024-09-12 23:35:30 +08:00
Linchen Xiao
317763381c
update ( #1517 )
2024-09-11 13:31:20 +08:00
Linchen Xiao
f04f3546bc
[Fix] Import fix ( #1500 )
2024-09-06 18:29:24 +08:00