JuchengHu
|
a2093a81ef
|
[Dataset] Matbench (#2021)
* add support for matbench
* fix dataset path
* fix data load
* fix
* fix lint
---------
Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
|
2025-04-21 15:50:47 +08:00 |
|
Linchen Xiao
|
b2da1c08a8
|
[Dataset] Add SmolInstruct, Update Chembench (#2025)
* [Dataset] Add SmolInstruct, Update Chembench
* Add dataset metadata
* update
* update
* update
|
2025-04-18 17:21:29 +08:00 |
|
Myhs_phz
|
75e7834b59
|
[Feature] Add Datasets: ClimateQA,Physics (#2017)
* feat ClimateQA
* feat PHYSICS
* fix
* fix
* fix
* fix
|
2025-04-14 20:18:47 +08:00 |
|
Linchen Xiao
|
12213207b6
|
[Refactor] Refactor openicl eval task (#1990)
* [Refactor] Refactor openicl eval task
* update
|
2025-04-09 15:52:23 +08:00 |
|
Dongsheng Zhu
|
8a5029b121
|
[Feature] Add MultiPL-E & Code Evaluator (#1963)
* multiple_code develop
* multiple_code update
* comments update
* index update
|
2025-03-21 20:09:25 +08:00 |
|
Songyang Zhang
|
c98599271b
|
[Update] Update OlympiadBench and Update LLM Judge (#1954)
|
2025-03-18 20:15:20 +08:00 |
|
Jason Cheung
|
5d2d253d83
|
[BUG] Fix model_kwargs pass logic for vllm (#1958)
|
2025-03-18 20:08:15 +08:00 |
|
liushz
|
709bc4af0e
|
[Update] Add AIME2025 oss info (#1936)
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
* Add HLE dataset
* Add HLE dataset
* Add HLE dataset
* Add AIME2025 oss info
---------
Co-authored-by: sudanl <sudanl@foxmail.com>
|
2025-03-12 18:41:16 +08:00 |
|
Yufeng Zhao
|
bc2969dba8
|
[Feature] Add support for BBEH dataset (#1925)
* bbeh
* bbeh
* fix_smallbugs_bbeh
* remove print
* results
---------
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
|
2025-03-12 10:53:31 +08:00 |
|
Myhs_phz
|
570c30cf1b
|
[Fix] Fix CLI option for results persistence (#1920)
* fix
* fix
* fix
|
2025-03-07 18:24:30 +08:00 |
|
Myhs_phz
|
1585c0adbe
|
[Feature] Evaluation Results Persistence (#1894)
* feat results_station.py
* lint
* feat save_to_station
* feat result_station.py and lint
* feat
* fix
* fix and lint
* fix
* fix subjective processing
* fix
* fix
* style function name
* lint
|
2025-03-05 18:33:34 +08:00 |
|
Dongsheng Zhu
|
fff2d51440
|
[Update] Code evaluation alignment (#1909)
* code alignment
* update oss md5
* bigcodebench update
* lint
* lint_
* lint yapf
|
2025-03-04 18:49:38 +08:00 |
|
Junnan Liu
|
73c80953c6
|
[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886)
* support dataset repeat and g-pass compute for each evaluator
* fix pre-commit errors
* delete print
* delete gpassk_evaluator and fix potential errors
* change `repeat` to `n`
* fix `repeat` to `n` in openicl_eval
* update doc for multi-run and g-pass
* update latex equation in doc
* update eng doc for multi-run and g-pass
* update datasets.md
* update datasets.md
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation in zh_cn user_guides
* modify pre-commit-zh-cn
* recover pre-commit and edit math expr in doc
* del [TIP]
* del cite tag in doc
* del extract_model param in livemathbench config
|
2025-02-26 19:43:12 +08:00 |
|
Linchen Xiao
|
27c916661d
|
[Feature] Math Verify with model post_processor (#1881)
* update
* [Feature] Update model post_processor
* update
* update
* update
|
2025-02-20 19:32:12 +08:00 |
|
bittersweet1999
|
f407930475
|
[Feature] Support subjective evaluation for reasoning model (#1868)
* fix pip version
* fix pip version
* add subeval for reasoning model
* add subeval for reasoning model
* update configs
* update config
* update config
* update config
* update files
|
2025-02-20 12:19:46 +08:00 |
|
Shudong Liu
|
412199f802
|
[Feature] Support OlympiadBench Benchmark (#1841)
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
---------
Co-authored-by: liushz <qq1791167085@163.com>
|
2025-01-24 10:00:01 +08:00 |
|
Songyang Zhang
|
8fdb72f567
|
[Update] Update o1 eval prompt (#1806)
* Update XML prediction post-process
* Update LiveMathBench
* Update LiveMathBench
* Update New O1 Evaluation
|
2025-01-07 00:14:32 +08:00 |
|
Linchen Xiao
|
117dc500ad
|
[Feature] Add Longbenchv2 support (#1801)
* Create eval_longbenchv2.py
* Create longbenchv2_gen.py
* Update __init__.py
* Create longbenchv2.py
* Update datasets_info.py
* update
* update
* update
* update
* update
* update
---------
Co-authored-by: abrohamLee <146956824+abrohamLee@users.noreply.github.com>
|
2025-01-03 12:04:29 +08:00 |
|
liushz
|
9c980cbc62
|
[Feature] Add LiveStemBench Dataset (#1794)
* [Fix] Fix vllm max_seq_len parameter transfer
* [Fix] Fix vllm max_seq_len parameter transfer
* Add livestembench dataset
* Add livestembench dataset
* Add livestembench dataset
* Update livestembench_gen_3e3c50.py
* Update eval_livestembench.py
* Update eval_livestembench.py
|
2024-12-31 15:17:39 +08:00 |
|
liushz
|
5c8e91f329
|
[Fix] Fix vllm max_seq_len parameter transfer (#1745)
* [Fix] Fix vllm max_seq_len parameter transfer
* [Fix] Fix vllm max_seq_len parameter transfer
* Update pr-run-test.yml
* Update pr-run-test.yml
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
|
2024-12-16 21:44:36 +08:00 |
|
zhulinJulia24
|
aeded4c4db
|
add new dataset summarizer (#1758)
add new dataset summarizer
|
2024-12-13 09:50:43 +08:00 |
|
OpenStellarTeam
|
1a5b3fc11e
|
Add Chinese SimpleQA config (#1697)
* add chinese simpleqa config
* add chinese simpleqa config
* add chinese simpleqa config
* add chinese simpleqa config
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
* Update CsimpleQA
---------
Co-authored-by: 明念 <heyancheng.hyc@taobao.com>
Co-authored-by: liushz <qq1791167085@163.com>
|
2024-12-11 18:03:39 +08:00 |
|
Songyang Zhang
|
fb43dd1906
|
[Update] Update Skywork/Qwen-QwQ (#1728)
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
|
2024-12-05 19:30:43 +08:00 |
|
liushz
|
b063779034
|
[Fix] Update P-MMEVAL OSS data (#1722)
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
* Fix pmmeval_gen config
* Update P-MMEVAL data
---------
Co-authored-by: wanyu <wanyu2018umac@gmail.com>
Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>
|
2024-11-28 20:55:46 +08:00 |
|
liushz
|
c437135fad
|
[Feature] Add OpenAI SimpleQA dataset (#1720)
* Add OpenAI SimpleQA dataset
* Add OpenAI SimpleQA dataset
* Add OpenAI SimpleQA dataset
* Update eval_simpleqa.py
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
|
2024-11-28 19:16:07 +08:00 |
|
wanyu2018umac
|
90efcf2216
|
[Feature] Add P-MMEval (#1714)
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
---------
Co-authored-by: liushz <qq1791167085@163.com>
|
2024-11-27 21:26:18 +08:00 |
|
Junnan Liu
|
f7dbe6bb7d
|
[Feature] Add Arc Prize Public Evaluation (#1690)
* support arc prize
* update arc-prize dataset info & update arc-prize evaluation performance
|
2024-11-27 15:44:41 +08:00 |
|
Songyang Zhang
|
f97c4eae42
|
[Update] Update Fullbench (#1712)
* Update JudgerBench
* Support O1-style Prompts
* Update Code
|
2024-11-26 14:26:55 +08:00 |
|
Yufeng Zhao
|
300adc31e8
|
[Feature] Add Korbench dataset (#1713)
* first version for korbench
* first stage for korbench
* korbench_1
* korbench_1
* korbench_1
* korbench_1
* korbench_1_revised
* korbench_combined_1
* korbench_combined_1
* kor_combined
* kor_combined
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
|
2024-11-25 20:11:27 +08:00 |
|
liushz
|
e49fcfd3a3
|
[Update] Update MATH dataset with model judge (#1711)
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
|
2024-11-25 15:14:55 +08:00 |
|
Linchen Xiao
|
ab8fdbbaab
|
[Update] Update Math auto-download data (#1700)
|
2024-11-18 20:24:35 +08:00 |
|
Linchen Xiao
|
98242ff1d1
|
[Update] first_option_postprocess (#1699)
* update first_option_postprocess
* update
|
2024-11-18 20:14:29 +08:00 |
|
abrohamLee
|
e9e4b69ddb
|
[Feature] MuSR Dataset Evaluation (#1689)
* MuSR Dataset Evaluation
* MuSR Dataset Evaluation
Add an assertion and a Readme.md
|
2024-11-14 20:42:12 +08:00 |
|
Linchen Xiao
|
d415439f9b
|
[Fix] Fix bug for first_option_postprocess (#1688)
|
2024-11-14 16:45:59 +08:00 |
|
Linchen Xiao
|
e92a5d4230
|
[Feature] BABILong Dataset added (#1684)
* update
* update
* update
* update
|
2024-11-14 15:32:43 +08:00 |
|
Linchen Xiao
|
2fee63f537
|
[Update] Auto-download for followbench (#1685)
|
2024-11-13 15:47:29 +08:00 |
|
liushz
|
f7d899823c
|
[Update] Update mmmlu_lite dataload (#1658)
* update mmmlu_lite dataload from oss
* update mmmlu_lite dataload from oss
|
2024-11-01 17:32:29 +08:00 |
|
Songyang Zhang
|
c789ce5698
|
[Fix] Fix automatic download for several datasets (#1652)
* [Fix] Fix automatic download for several datasets
* Update
* Update
* Update CI
|
2024-11-01 15:57:18 +08:00 |
|
Linchen Xiao
|
df57c08ccf
|
[Feature] Update Models, Summarizers (#1600)
|
2024-10-29 18:37:15 +08:00 |
|
Linchen Xiao
|
d91d66792a
|
[Update] Update Needlebench OSS path (#1651)
|
2024-10-29 18:05:44 +08:00 |
|
Junnan Liu
|
645c5f3b2c
|
[Datasets] Add datasets CMO&AIME (#1610)
* add datasets cmo&aime
* delete unused modules
* modify prompt
* update __init__
* update data load and add README
* update data load
* update performance
* update md5
* remove indents
* add indent
* fix log for debug mode
|
2024-10-28 18:08:02 +08:00 |
|
Songyang Zhang
|
a4d5a6c81b
|
[Feature] Support LiveCodeBench (#1617)
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
|
2024-10-21 20:50:39 +08:00 |
|
bittersweet1999
|
fa54aa62f6
|
[Feature] Add Judgerbench and reorg subeval (#1593)
* fix pip version
* fix pip version
* update (#1522)
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518)
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527)
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
|
2024-10-15 16:36:05 +08:00 |
|
liushz
|
5faee929db
|
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589)
* Add GaoKaoMath Dataset
* Add MATH LLM Eval
* Update GAOKAO Math Eval Dataset
* Update GAOKAO Math Eval Dataset
|
2024-10-12 19:13:06 +08:00 |
|
Lyu Han
|
b52ba65c26
|
[Feature] Integrate lmdeploy pipeline api (#1198)
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* fix ci check
* compatibility
* remove concurrency
* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py
* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md
* [Bug] fix lint
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
|
2024-10-09 22:58:06 +08:00 |
|
liushz
|
2e9db77d57
|
[Feature] Add custom model postprocess function (#1519)
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
|
2024-09-18 14:40:51 +08:00 |
|
Songyang Zhang
|
6997990c93
|
[Feature] Update Models (#1518)
* Update Models
* Update
* Update humanevalx
* Update
* Update
|
2024-09-12 23:35:30 +08:00 |
|
Linchen Xiao
|
317763381c
|
update (#1517)
|
2024-09-11 13:31:20 +08:00 |
|
Linchen Xiao
|
f04f3546bc
|
[Fix] Import fix (#1500)
|
2024-09-06 18:29:24 +08:00 |
|
Linchen Xiao
|
87ffa71d68
|
[Feature] Longbench dataset update
|
2024-09-06 15:50:12 +08:00 |
|