Commit Graph

108 Commits

Author SHA1 Message Date
Jun
d4a69ba65f add 2 2025-05-27 03:26:40 +00:00
tcheng
3d1760aba2
[Dataset] Add Scieval (#2089)
* style: pass all formatting hooks (yapf & quote fixer)

* revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* all categories of SciEval (datasets + configs + loader+dataset-index.yml)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* revise :SciEval 5shot

---------

Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-14 10:25:03 +08:00
Wei Li
b84518c656
[Dataset] Support MedMCQA and MedBullets benchmark (#2054)
* support medmcqa and medbullets benchmark

* Add Medbullets data folder for benchmark support

* revise gen name

* revise config file & remove csv file & add dataset info to dataset-index.yml

* remove csv file

* remove print in medbullets.py

* revise class name

* update_oss_info

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-13 17:10:50 +08:00
Dongsheng Zhu
2c79dc5227
[Dataset] Add human_eval/mbpp pro (#2092)
* add bench

* update

* bug fix

* time update

* add index

* fix repeat bug
2025-05-12 18:38:13 +08:00
huihui1999
345674f700
[Dataset] Add SciknowEval Dataset (#2070)
* first

* first

* first

* first

* SciKnowEval

* fix hash

* fix dataset-index & use official llm_judge_postprocess

* fix dataset-index.yml

* use official llmjudge_postprocess

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* merge with main

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-12 17:23:44 +08:00
huihui1999
44a7024ed5
[Dataset] MedCalc_Bench (#2072)
* MedCalc_Bench

* MedCal_Bench

* add hash

* fix hash

* fix comments &dataset-index yml

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-09 16:58:55 +08:00
Jin Ye
6097186a95
[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064)
* Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1

* Fix bugs for MedQA. Add info in dataset-index

* Add version code for MedQA and ProteinLMBench

* Add version code for MedQA and ProteinLMBench
2025-05-09 14:47:44 +08:00
Linchen Xiao
d72df59363
[Revert] Add Lifescience Sub-set Support for SciEval (#2059) (#2087)
This reverts commit c5048bfec7.
2025-05-09 14:46:27 +08:00
tcheng
c5048bfec7
[Dataset] Add Lifescience Sub-set Support for SciEval (#2059)
* style: pass all formatting hooks (yapf & quote fixer)

* revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

---------

Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-09 14:31:12 +08:00
huihui1999
a7f3ac20b2
[Dataset] Add CARDBiomedBench (#2071)
* CARDBiomedBench

* fix hash

* fix dataset-index

* use official llmjudge postprocess

* use official llmjudge_postprocess

* fix lint

* fix init
2025-05-08 19:44:46 +08:00
Wei Li
a685ed7daf
[Dataset] Add nejm ai benchmark (#2063)
* support nejm ai benchmark

* add dataset files

* revise gen name

* revise gen name

* revise class name & remove csv file & add dataset-index.yml info

* update

* update

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-08 16:44:05 +08:00
Dongsheng Zhu
ba0e32292c
[Feature] Support InternSandbox (#2049)
* internsandbox init

* internsandbox

* dataset_index

* dataset_index_add
2025-05-07 16:42:09 +08:00
Taolin Zhang
c69110361b
[Add] add rewardbench (#2029)
* add rewardbench

* add rewardbench
2025-04-21 17:18:51 +08:00
JuchengHu
a2093a81ef
[Dataset] Matbench (#2021)
* add support for matbench

* fix dataset path

* fix data load

* fix

* fix lint

---------

Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
2025-04-21 15:50:47 +08:00
Linchen Xiao
b2da1c08a8
[Dataset] Add SmolInstruct, Update Chembench (#2025)
* [Dataset] Add SmolInstruct, Update Chembench

* Add dataset metadata

* update

* update

* update
2025-04-18 17:21:29 +08:00
Myhs_phz
75e7834b59
[Feature] Add Datasets: ClimateQA,Physics (#2017)
* feat ClimateQA

* feat PHYSICS

* fix

* fix

* fix

* fix
2025-04-14 20:18:47 +08:00
Jin Ye
b564e608b1
[Dataset] Add MedXpertQA (#2002)
* Add MedXpertQA

* Add MedXpertQA

* Add MedXpertQA

* Fix lint

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-04-08 10:44:48 +08:00
liushz
32d6859679
[Feature] Add olymmath dataset (#1982)
* Add olymmath dataset

* Add olymmath dataset

* Add olymmath dataset

* Update olymmath dataset
2025-04-02 17:34:07 +08:00
Dongsheng Zhu
8a5029b121
[Feature] Add MultiPL-E & Code Evaluator (#1963)
* multiple_code develop

* multiple_code update

* comments upadate

* index upadate
2025-03-21 20:09:25 +08:00
Yufeng Zhao
bc2969dba8
[Feature] Add support for BBEH dataset (#1925)
* bbeh

* bbeh

* fix_smallbugs_bbeh

* removeprint

* results

---------

Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2025-03-12 10:53:31 +08:00
Kangreen
59e49aedf1
[Feature] Support SuperGPQA (#1924)
* support supergpqa

* remove unnecessary code

* remove unnecessary code

* Add Readme

* Add Readme

* fix lint

* fix lint

* update

* update

---------

Co-authored-by: mkj3085003 <mkj3085003@gmail.com>
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-03-11 19:32:08 +08:00
liushz
198c08632e
[Feature] Add HLE (Humanity's Last Exam) dataset (#1902)
* Support OlympiadBench Benchmark

* Support OlympiadBench Benchmark

* Support OlympiadBench Benchmark

* update dataset path

* Update olmpiadBench

* Update olmpiadBench

* Update olmpiadBench

* Add HLE dataset

* Add HLE dataset

* Add HLE dataset

---------

Co-authored-by: sudanl <sudanl@foxmail.com>
2025-03-04 16:42:37 +08:00
Shudong Liu
412199f802
[Feature] Support OlympiadBench Benchmark (#1841)
* Support OlympiadBench Benchmark

* Support OlympiadBench Benchmark

* Support OlympiadBench Benchmark

* update dataset path

* Update olmpiadBench

* Update olmpiadBench

* Update olmpiadBench

---------

Co-authored-by: liushz <qq1791167085@163.com>
2025-01-24 10:00:01 +08:00
Zhao Qihao
e039f3efa0
[Feature] Support MMLU-CF Benchmark (#1775)
* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* Update mmlu-cf

* Update mmlu-cf

* Update mmlu-cf

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* [Feature] Support MMLU-CF Benchmark

* Remove outside configs

---------

Co-authored-by: liushz <qq1791167085@163.com>
2025-01-09 14:11:20 +08:00
Linchen Xiao
117dc500ad
[Feature] Add Longbenchv2 support (#1801)
* Create eval_longbenchv2.py

* Create longbenchv2_gen.py

* Update __init__.py

* Create longbenchv2.py

* Update datasets_info.py

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: abrohamLee <146956824+abrohamLee@users.noreply.github.com>
2025-01-03 12:04:29 +08:00
liushz
9c980cbc62
[Feature] Add LiveStemBench Dataset (#1794)
* [Fix] Fix vllm max_seq_len parameter transfer

* [Fix] Fix vllm max_seq_len parameter transfer

* Add livestembench dataset

* Add livestembench dataset

* Add livestembench dataset

* Update livestembench_gen_3e3c50.py

* Update eval_livestembench.py

* Update eval_livestembench.py
2024-12-31 15:17:39 +08:00
OpenStellarTeam
1a5b3fc11e
Add Chinese SimpleQA config (#1697)
* add chinese simpleqa config

* add chinese simpleqa config

* add chinese simpleqa config

* add chinese simpleqa config

* Update CsimpleQA

* Update CsimpleQA

* Update CsimpleQA

* Update CsimpleQA

* Update CsimpleQA

* Update CsimpleQA

* pdate Csimpleqa

---------

Co-authored-by: 明念 <heyancheng.hyc@taobao.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-12-11 18:03:39 +08:00
Songyang Zhang
fb43dd1906
[Update] Update Skywork/Qwen-QwQ (#1728)
* Update JuderBench

* Support O1-style Prompts

* Update Code

* Update OpenAI

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update
2024-12-05 19:30:43 +08:00
Linchen Xiao
9de27b4d85
[Update] Update max_out_len for datasets (#1726)
* [Update] Update max_out_len for datasets

* Update eval_regression_chat_objective_fullbench.py

* Update eval_regression_chat.py

* Update eval_regression_chat.py

* Update oc_score_baseline_fullbench.yaml

---------

Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
2024-12-02 11:42:07 +08:00
liushz
c437135fad
[Feature] Add Openai Simpleqa dataset (#1720)
* Add Openai SimpleQA dataset

* Add Openai SimpleQA dataset

* Add Openai SimpleQA dataset

* Update eval_simpleqa.py

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-11-28 19:16:07 +08:00
Yufeng Zhao
300adc31e8
[Feature] Add Korbench dataset (#1713)
* first version for korbench

* first stage for korbench

* korbench_1

* korbench_1

* korbench_1

* korbench_1

* korbench_1_revised

* korbench_combined_1

* korbench_combined_1

* kor_combined

* kor_combined

* update

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2024-11-25 20:11:27 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Datset Evaluation (#1689)
* MuSR Datset Evaluation

* MuSR Datset Evaluation

Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added (#1684)
* update

* update

* update

* update
2024-11-14 15:32:43 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME (#1610)
* add datasets cmo&aime

* delete unused modules

* modify prompt

* update __init__

* update data load and add README

* update data load

* update performance

* update md5

* remove indents

* add indent

* fix log for debug mode
2024-10-28 18:08:02 +08:00
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench (#1617)
* Update

* Update LCB

* Update

* Update

* Update

* Update

* Update
2024-10-21 20:50:39 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565)
* rm folder

* modify format according to reviewer

* modify format according to reviewer

* modify format according to reviewer

* add some files requirement

* fix some bug

* fix bug

* change load type

* Update MMMLU Dataset

* Update MMMLU Dataset

* Add MMMLU-Lite Dataset

* update MMMMLU datast

* update MMMMLU datast

* update MMMMLU datast

---------

Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589)
* Add GaoKaoMath Dataset

* Add MATH LLM Eval

* Update GAOKAO Math Eval Dataset

* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames (#1599)
* fix pip version

* fix pip version

* fix flames

* fix flames
2024-10-12 14:34:59 +08:00
shijinpjlab
7528b8ab8a
[Feature] Add dingo test (#1529)
* add qa dingo

* update

* change name qa to dingo

* eval model: llm_base

* update path

* change name and move path

* add eval_dingo

* update import

* add for pip

* add dingo package

* change import place

* update import place

* fix lint fail

* isort

* double quoted

---------

Co-authored-by: sj <shijin@pjlab.org.cn>
2024-09-29 19:24:58 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode (#1417)
* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode w/ bg

* add scicode

* Update README.md

* Update README.md

* Delete configs/eval_SciCode.py

* rename

* 1

* rename

* Update README.md

* Update scicode.py

* Update scicode.py

* fix some bugs

* Update

* Update

---------

Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets (#1310)
* [Feature] Add Ruler datasets

* pre-commit fixed

* Add model specific tokenizer to dataset

* pre-commit modified

* remove unused import

* fix linting

* add trust_remote to tokenizer load

* lint fix

* comments resolved

* fix lint

* Add readme

* Fix lint

* ruler refactorize

* fix lint

* lint fix

* updated

* lint fix

* fix wonderwords import issue

* prompt modified

* update

* readme updated

* update

* ruler dataset added

* Update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Peng Bo
07c96ac659
Calm dataset (#1385)
* Add CALM Dataset
2024-08-01 10:03:21 +08:00
Songyang Zhang
704853e5e7
[Feature] Update pip install (#1324)
* [Feature] Update pip install

* Update Configuration

* Update

* Update

* Update

* Update Internal Config

* Update collect env
2024-07-29 18:32:50 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets (#1289)
* add ceval, gsm8k modelscope surpport

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* udpate dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* udpate dataset for modelscope support

* update readme

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* update readme

* remove tydiqa japanese subset

* add ceval, gsm8k modelscope surpport

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* udpate dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update readme

* udpate dataset for modelscope support

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* remove tydiqa japanese subset

* update util

* remove .DS_Store

* fix md format

* move util into package

* update docs/get_started.md

* restore eval_api_zhipu_v2.py, add environment setting

* Update dataset

* Update

* Update

* Update

* Update

---------

Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
bittersweet1999
d3782c1d47
Revert "Calm dataset (#1287)" (#1366)
This reverts commit edd0ffdf70.
2024-07-26 18:27:29 +08:00
Peng Bo
edd0ffdf70
Calm dataset (#1287)
* add calm dataset

* modify config max_out_len

* update README

* Modify README

* update README

* update README

* update README

* update README

* update README

* add summarizer and modify readme

* delete summarizer config comment

* update summarizer

* modify same response to all questions

* update README
2024-07-26 11:48:16 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets (#1315)
* commit inference ppl datasets

* revised format

* revise

* revise

* revise

* revise

* revise

* revise
2024-07-22 17:59:30 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 (#1279) 2024-06-28 14:16:34 +08:00
jxd
608ff5810d
support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190)
* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks

* fix lint error

* add dataset card for CHARM

* minor refactor

* add txt

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
bittersweet1999
826d8307ac
fix links (#1120) 2024-05-08 15:13:18 +08:00