OpenCompass

Yu Sun d572761cef [Dataset] Add Smolinstruct configs (#2127 ) * 0-shot Smolinstruct Add 0-shot evaluation and postprocess functions for Smolinstruct * fix acc postprocessor * update 0-shot acc postprocessor * rename 0-shot	2025-05-29 14:09:08 +08:00
agieval	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
babilong	[Feature] BABILong Dataset added (#1684 )	2024-11-14 15:32:43 +08:00
bigcodebench	[Update] History code bench pass@k update (#2102 )	2025-05-19 17:03:33 +08:00
calm	[Fix] Fix CaLM import (#1395 )	2024-08-06 12:17:45 +08:00
IFEval	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
infinitebench	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
judge	[Add] Add Judgerbenchv2 (#2067 )	2025-04-30 17:12:34 +08:00
korbench	[Update] Add CascadeEvaluator with Data Replica (#2022 )	2025-05-20 16:46:55 +08:00
lawbench	[Fix] Update lawbench data path (#2037 )	2025-05-07 16:18:43 +08:00
leval	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
livecodebench	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
livemathbench	[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899 )	2025-03-03 18:56:11 +08:00
livereasonbench	[Update] Update o1 eval prompt (#1806 )	2025-01-07 00:14:32 +08:00
longbench	[Feature] Longbench dataset update	2024-09-06 15:50:12 +08:00
lveval	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
matbench	[Dataset] Matbench (#2021 )	2025-04-21 15:50:47 +08:00
medbench	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
musr	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
needlebench	[Feature] Add long context evaluation for base models (#1666 )	2024-11-08 10:53:29 +08:00
NPHardEval	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
PMMEval	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
reasonbench	[Sync] Sync with internal codes 2023.01.08 (#777 )	2024-01-08 14:07:24 +00:00
ruler	[Feature] Add Ruler datasets (#1310 )	2024-08-20 11:40:11 +08:00
subjective	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
supergpqa	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
teval	[Update] Add CascadeEvaluator with Data Replica (#2022 )	2025-05-20 16:46:55 +08:00
TheoremQA	[Update] Update dataset configs (#2030 )	2025-04-21 18:55:06 +08:00
__init__.py	[Dataset] Add Scieval (#2089 )	2025-05-14 10:25:03 +08:00
advglue.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
afqmcd.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
aime2024.py	[Update] Update Skywork/Qwen-QwQ (#1728 )	2024-12-05 19:30:43 +08:00
anli.py	[Feature] Add Xiezhi SQuAD2.0 ANLI (#101 )	2023-08-10 14:04:18 +08:00
anthropics_evals.py	[Feat] support antropics evals dataset (#422 )	2023-09-20 18:36:44 +08:00
apps.py	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
arc_prize_public_evaluation.py	[Feature] Add Arc Prize Public Evaluation (#1690 )	2024-11-27 15:44:41 +08:00
arc.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
ax.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
base.py	[Update] Add CascadeEvaluator with Data Replica (#2022 )	2025-05-20 16:46:55 +08:00
bbeh.py	[Feature] Add support for BBEH dataset (#1925 )	2025-03-12 10:53:31 +08:00
bbh.py	[Feature] Update pip install (#1324 )	2024-07-29 18:32:50 +08:00
benbench.py	[Sync] Sync with internal codes 2024.06.28 (#1279 )	2024-06-28 14:16:34 +08:00
boolq.py	[Feature] Fullbench v0.1 language update (#1463 )	2024-08-28 14:01:05 +08:00
bustum.py	[Sync] update (#517 )	2023-10-27 20:31:22 +08:00
c3.py	[Sync] update (#517 )	2023-10-27 20:31:22 +08:00
CARDBiomedBench.py	[Dataset] Add CARDBiomedBench (#2071 )	2025-05-08 19:44:46 +08:00
cb.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
ceval.py	[Fix] modelscope dataset load problem (#1406 )	2024-08-08 14:01:06 +08:00
charm.py	[Fix] Fix Slurm ENV (#1392 )	2024-08-06 01:35:20 +08:00
chembench.py	[Dataset] Add SmolInstruct, Update Chembench (#2025 )	2025-04-18 17:21:29 +08:00
chid.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
chinese_simpleqa.py	Add Chinese SimpleQA config (#1697 )	2024-12-11 18:03:39 +08:00
cibench.py	Update CIBench (#1089 )	2024-04-26 18:46:02 +08:00
circular.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
civilcomments.py	[Feat] support opencompass	2023-07-04 22:11:33 +08:00
climaqa.py	[Feature] Add Datasets: ClimateQA,Physics (#2017 )	2025-04-14 20:18:47 +08:00
ClinicBench.py	[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061 )	2025-05-08 16:25:43 +08:00
clozeTest_maxmin.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
cluewsc.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
cmb.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
cmmlu.py	[Update] Update Skywork/Qwen-QwQ (#1728 )	2024-12-05 19:30:43 +08:00
cmnli.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
cmo_fib.py	[Datasets] Add datasets CMO&AIME (#1610 )	2024-10-28 18:08:02 +08:00
cmrc.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
commonsenseqa_cn.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
commonsenseqa.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
compassbench_obj.py	[Update] Update MATH dataset with model judge (#1711 )	2024-11-25 15:14:55 +08:00
copa.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
crowspairs_cn.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
crowspairs.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
csl.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
custom.py	[Feature] Add MultiPL-E & Code Evaluator (#1963 )	2025-03-21 20:09:25 +08:00
cvalues.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
dingo.py	[Dataset] Update dingo 1.5.0 (#2008 )	2025-04-07 17:21:15 +08:00
drcd.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
drop_simple_eval.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
drop.py	[Feature] Use dataset in local path (#570 )	2023-11-13 13:00:37 +08:00
ds1000_interpreter.py	[Feat] Support cibench (#538 )	2023-11-07 19:11:44 +08:00
ds1000.py	[Feature] Update Models (#1518 )	2024-09-12 23:35:30 +08:00
eprstmt.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
FinanceIQ.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
flores.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
game24.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
gaokao_math.py	[Update] Update MATH dataset with model judge (#1711 )	2024-11-25 15:14:55 +08:00
GaokaoBench.py	[Fix] the automatically download for several datasets (#1652 )	2024-11-01 15:57:18 +08:00
generic.py	[Update] Fix LLM Judge metrics cacluation & Add reasoning content concat to OpenAI SDK	2025-04-15 11:33:16 +08:00
govrepcrs.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
gpqa.py	[Update] Update Skywork/Qwen-QwQ (#1728 )	2024-12-05 19:30:43 +08:00
gsm8k.py	[Fix] modelscope dataset load problem (#1406 )	2024-08-08 14:01:06 +08:00
gsm_hard.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
hellaswag.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
hle.py	[Dataset] HLE Biomedical version support (#2080 )	2025-05-12 10:14:11 +08:00
huggingface.py	[Feature] Support OpenAI ChatCompletion (#1389 )	2024-08-01 19:10:13 +08:00
humaneval_multi.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
humaneval_pro.py	[Dataset] Add human_eval/mbpp pro (#2092 )	2025-05-12 18:38:13 +08:00
humaneval.py	[Update] History code bench pass@k update (#2102 )	2025-05-19 17:03:33 +08:00
humanevalx.py	[Feature] Update Models (#1518 )	2024-09-12 23:35:30 +08:00
hungarian_math.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
inference_ppl.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
internsandbox.py	[Feature] Support InternSandbox (#2049 )	2025-05-07 16:42:09 +08:00
iwslt2017.py	Add release contribution	2023-07-05 03:15:31 +00:00
jigsawmultilingual.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
jsonl.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
kaoshi.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
lambada.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
LCBench.py	[Update] Update BigCodeBench & LCBench load path (#1857 )	2025-02-08 15:15:47 +08:00
lcsts.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
livestembench.py	[Feature] Add LiveStemBench Dataset (#1794 )	2024-12-31 15:17:39 +08:00
llm_compression.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
lmeval.py	[Sync] update github token (#475 )	2023-10-13 06:50:54 -05:00
longbenchv2.py	[Feature] Add Longbenchv2 support (#1801 )	2025-01-03 12:04:29 +08:00
mastermath2024v1.py	[Feature] Add new dataset mastermath2024v1 (#744 )	2024-01-01 15:53:24 +08:00
math401.py	[Sync] Sync with internal codes 2023.01.08 (#777 )	2024-01-08 14:07:24 +00:00
math_intern.py	[Sync] Updata dataset cfg for internMath (#837 )	2024-01-24 16:30:32 +08:00
math.py	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
mathbench.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
mbpp_pro.py	[Dataset] Add human_eval/mbpp pro (#2092 )	2025-05-12 18:38:13 +08:00
mbpp.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
Medbullets.py	[Dataset] Support MedMCQA and MedBullets benchmark (#2054 )	2025-05-13 17:10:50 +08:00
MedCalc_Bench.py	[Dataset] MedCalc_Bench (#2072 )	2025-05-09 16:58:55 +08:00
medmcqa.py	[Dataset] Support MedMCQA and MedBullets benchmark (#2054 )	2025-05-13 17:10:50 +08:00
MedQA.py	[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064 )	2025-05-09 14:47:44 +08:00
MedXpertQA.py	[Dataset] Add MedXpertQA (#2002 )	2025-04-08 10:44:48 +08:00
mgsm.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
mmlu_cf.py	[Feature] Support MMLU-CF Benchmark (#1775 )	2025-01-09 14:11:20 +08:00
mmlu_pro.py	[Feature] Update MathBench & WikiBench for FullBench (#1521 )	2024-09-18 14:35:30 +08:00
mmlu.py	[Update] Update Skywork/Qwen-QwQ (#1728 )	2024-12-05 19:30:43 +08:00
MMLUArabic.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
mmmlu.py	[Update] Update mmmlu_lite dataload (#1658 )	2024-11-01 17:32:29 +08:00
multipl_e.py	[Dataset] Add human_eval/mbpp pro (#2092 )	2025-05-12 18:38:13 +08:00
multirc.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
narrativeqa.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
natural_question_cn.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
natural_question.py	[Fix] the automatically download for several datasets (#1652 )	2024-11-01 15:57:18 +08:00
nejmaibench.py	[Dataset] Add nejm ai benchmark (#2063 )	2025-05-08 16:44:05 +08:00
obqa.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
olymmath.py	[Feature] Add olymmath dataset (#1982 )	2025-04-02 17:34:07 +08:00
OlympiadBench.py	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
omni_math.py	[Feature] Support Omni-Math (#1837 )	2025-01-23 18:36:54 +08:00
OpenFinData.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
physics.py	[Feature] Add Datasets: ClimateQA,Physics (#2017 )	2025-04-14 20:18:47 +08:00
piqa.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
ProteinLMBench.py	[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064 )	2025-05-09 14:47:44 +08:00
PubMedQA.py	[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061 )	2025-05-08 16:25:43 +08:00
py150.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
qasper.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
qaspercut.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
QuALITY.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
race.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
realtoxicprompts.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
record.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
rolebench.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
s3eval.py	[Feature] Add S3Eval Dataset (#916 )	2024-05-06 19:41:52 +08:00
safety.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
scibench.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
scicode.py	[Feature] Optimize Evaluation Speed of SciCode (#1489 )	2024-09-06 00:59:41 +08:00
ScienceQA.py	[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061 )	2025-05-08 16:25:43 +08:00
SciEval.py	[Dataset] Add Scieval (#2089 )	2025-05-14 10:25:03 +08:00
SciKnowEval.py	[Dataset] Add SciknowEval Dataset (#2070 )	2025-05-12 17:23:44 +08:00
simpleqa.py	[Feature] Add Openai Simpleqa dataset (#1720 )	2024-11-28 19:16:07 +08:00
siqa.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
smolinstruct.py	[Dataset] Add Smolinstruct configs (#2127 )	2025-05-29 14:09:08 +08:00
squad20.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
storycloze.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
strategyqa.py	[Fix] modelscope dataset load problem (#1406 )	2024-08-08 14:01:06 +08:00
summedits.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
summscreen.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
svamp.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
tabmwp.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
taco.py	[Dataset] Add SuperGPQA subfield configs (#2124 )	2025-05-28 14:12:58 +08:00
tnews.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
triviaqa.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
triviaqarc.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
truthfulqa.py	[Bug] Fix-NPU-Support (#1618 )	2024-10-21 17:42:53 +08:00
tydiqa.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
wic.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
wikibench.py	[Fix] the automatically download for several datasets (#1652 )	2024-11-01 15:57:18 +08:00
winograd.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
winogrande.py	[Fix] modelscope dataset load problem (#1406 )	2024-08-08 14:01:06 +08:00
wnli.py	[Feat] implementation for support promptbench (#239 )	2023-09-15 15:06:53 +08:00
wsc.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
xcopa.py	Add Release Contraibution	2023-07-05 02:22:40 +00:00
xiezhi.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
xlsum.py	update datasets	2023-07-05 01:45:26 +00:00
xsum.py	[Feature] Support ModelScope datasets (#1289 )	2024-07-29 13:48:32 +08:00
