agieval
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
babilong
|
[Feature] BABILong Dataset added (#1684)
|
2024-11-14 15:32:43 +08:00 |
bigcodebench
|
[Update] Code evaluation alignment (#1909)
|
2025-03-04 18:49:38 +08:00 |
calm
|
[Fix] Fix CaLM import (#1395)
|
2024-08-06 12:17:45 +08:00 |
IFEval
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
infinitebench
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
korbench
|
[Update] Update Skywork/Qwen-QwQ (#1728)
|
2024-12-05 19:30:43 +08:00 |
lawbench
|
[Fix] the issue where scores are negative in the Lawbench dataset evaluation (#1402) (#1403)
|
2024-08-08 16:08:26 +08:00 |
leval
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
livecodebench
|
[Update] Add configurations for llmjudge dataset (#1940)
|
2025-03-13 17:30:04 +08:00 |
livemathbench
|
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899)
|
2025-03-03 18:56:11 +08:00 |
livereasonbench
|
[Update] Update o1 eval prompt (#1806)
|
2025-01-07 00:14:32 +08:00 |
longbench
|
[Feature] Longbench dataset update
|
2024-09-06 15:50:12 +08:00 |
lveval
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
medbench
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
musr
|
[Feature] MuSR Dataset Evaluation (#1689)
|
2024-11-14 20:42:12 +08:00 |
needlebench
|
[Feature] Add long context evaluation for base models (#1666)
|
2024-11-08 10:53:29 +08:00 |
NPHardEval
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
PMMEval
|
[Feature] Add P-MMEval (#1714)
|
2024-11-27 21:26:18 +08:00 |
reasonbench
|
[Sync] Sync with internal codes 2023.01.08 (#777)
|
2024-01-08 14:07:24 +00:00 |
ruler
|
[Feature] Add Ruler datasets (#1310)
|
2024-08-20 11:40:11 +08:00 |
subjective
|
[Fix] fix order bug Update arena_hard.py (#2015)
|
2025-04-11 16:59:40 +08:00 |
supergpqa
|
[Update] Add SuperGPQA subset metrics (#1966)
|
2025-03-24 14:25:12 +08:00 |
teval
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
TheoremQA
|
[Update] Requirements update (#1993)
|
2025-04-02 12:03:45 +08:00 |
__init__.py
|
[Dataset] Add SmolInstruct, Update Chembench (#2025)
|
2025-04-18 17:21:29 +08:00 |
advglue.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
afqmcd.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
aime2024.py
|
[Update] Update Skywork/Qwen-QwQ (#1728)
|
2024-12-05 19:30:43 +08:00 |
anli.py
|
[Feature] Add Xiezhi SQuAD2.0 ANLI (#101)
|
2023-08-10 14:04:18 +08:00 |
anthropics_evals.py
|
[Feat] support anthropics evals dataset (#422)
|
2023-09-20 18:36:44 +08:00 |
apps.py
|
[Sync] deprecate old mbpps (#1064)
|
2024-04-19 20:49:46 +08:00 |
arc_prize_public_evaluation.py
|
[Feature] Add Arc Prize Public Evaluation (#1690)
|
2024-11-27 15:44:41 +08:00 |
arc.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
ax.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
base.py
|
[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886)
|
2025-02-26 19:43:12 +08:00 |
bbeh.py
|
[Feature] Add support for BBEH dataset (#1925)
|
2025-03-12 10:53:31 +08:00 |
bbh.py
|
[Feature] Update pip install (#1324)
|
2024-07-29 18:32:50 +08:00 |
benbench.py
|
[Sync] Sync with internal codes 2024.06.28 (#1279)
|
2024-06-28 14:16:34 +08:00 |
boolq.py
|
[Feature] Fullbench v0.1 language update (#1463)
|
2024-08-28 14:01:05 +08:00 |
bustum.py
|
[Sync] update (#517)
|
2023-10-27 20:31:22 +08:00 |
c3.py
|
[Sync] update (#517)
|
2023-10-27 20:31:22 +08:00 |
cb.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
ceval.py
|
[Fix] modelscope dataset load problem (#1406)
|
2024-08-08 14:01:06 +08:00 |
charm.py
|
[Fix] Fix Slurm ENV (#1392)
|
2024-08-06 01:35:20 +08:00 |
chembench.py
|
[Dataset] Add SmolInstruct, Update Chembench (#2025)
|
2025-04-18 17:21:29 +08:00 |
chid.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
chinese_simpleqa.py
|
Add Chinese SimpleQA config (#1697)
|
2024-12-11 18:03:39 +08:00 |
cibench.py
|
Update CIBench (#1089)
|
2024-04-26 18:46:02 +08:00 |
circular.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
civilcomments.py
|
[Feat] support opencompass
|
2023-07-04 22:11:33 +08:00 |
climaqa.py
|
[Feature] Add Datasets: ClimateQA,Physics (#2017)
|
2025-04-14 20:18:47 +08:00 |
clozeTest_maxmin.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
cluewsc.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
cmb.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
cmmlu.py
|
[Update] Update Skywork/Qwen-QwQ (#1728)
|
2024-12-05 19:30:43 +08:00 |
cmnli.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
cmo_fib.py
|
[Datasets] Add datasets CMO&AIME (#1610)
|
2024-10-28 18:08:02 +08:00 |
cmrc.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
commonsenseqa_cn.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
commonsenseqa.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
compassbench_obj.py
|
[Update] Update MATH dataset with model judge (#1711)
|
2024-11-25 15:14:55 +08:00 |
copa.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
crowspairs_cn.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
crowspairs.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
csl.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
custom.py
|
[Feature] Add MultiPL-E & Code Evaluator (#1963)
|
2025-03-21 20:09:25 +08:00 |
cvalues.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
dingo.py
|
[Dataset] Update dingo 1.5.0 (#2008)
|
2025-04-07 17:21:15 +08:00 |
drcd.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
drop_simple_eval.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
drop.py
|
[Feature] Use dataset in local path (#570)
|
2023-11-13 13:00:37 +08:00 |
ds1000_interpreter.py
|
[Feat] Support cibench (#538)
|
2023-11-07 19:11:44 +08:00 |
ds1000.py
|
[Feature] Update Models (#1518)
|
2024-09-12 23:35:30 +08:00 |
eprstmt.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
FinanceIQ.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
flores.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
game24.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
gaokao_math.py
|
[Update] Update MATH dataset with model judge (#1711)
|
2024-11-25 15:14:55 +08:00 |
GaokaoBench.py
|
[Fix] the automatically download for several datasets (#1652)
|
2024-11-01 15:57:18 +08:00 |
generic.py
|
[Update] Fix LLM Judge metrics calculation & Add reasoning content concat to OpenAI SDK
|
2025-04-15 11:33:16 +08:00 |
govrepcrs.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
gpqa.py
|
[Update] Update Skywork/Qwen-QwQ (#1728)
|
2024-12-05 19:30:43 +08:00 |
gsm8k.py
|
[Fix] modelscope dataset load problem (#1406)
|
2024-08-08 14:01:06 +08:00 |
gsm_hard.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
hellaswag.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
hle.py
|
[Feature] Add HLE (Humanity's Last Exam) dataset (#1902)
|
2025-03-04 16:42:37 +08:00 |
huggingface.py
|
[Feature] Support OpenAI ChatCompletion (#1389)
|
2024-08-01 19:10:13 +08:00 |
humaneval_multi.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
humaneval.py
|
[Update] Update Skywork/Qwen-QwQ (#1728)
|
2024-12-05 19:30:43 +08:00 |
humanevalx.py
|
[Feature] Update Models (#1518)
|
2024-09-12 23:35:30 +08:00 |
hungarian_math.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
inference_ppl.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
iwslt2017.py
|
Add release contribution
|
2023-07-05 03:15:31 +00:00 |
jigsawmultilingual.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
jsonl.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
kaoshi.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
lambada.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
LCBench.py
|
[Update] Update BigCodeBench & LCBench load path (#1857)
|
2025-02-08 15:15:47 +08:00 |
lcsts.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
livestembench.py
|
[Feature] Add LiveStemBench Dataset (#1794)
|
2024-12-31 15:17:39 +08:00 |
llm_compression.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
lmeval.py
|
[Sync] update github token (#475)
|
2023-10-13 06:50:54 -05:00 |
longbenchv2.py
|
[Feature] Add Longbenchv2 support (#1801)
|
2025-01-03 12:04:29 +08:00 |
mastermath2024v1.py
|
[Feature] Add new dataset mastermath2024v1 (#744)
|
2024-01-01 15:53:24 +08:00 |
math401.py
|
[Sync] Sync with internal codes 2023.01.08 (#777)
|
2024-01-08 14:07:24 +00:00 |
math_intern.py
|
[Sync] Update dataset cfg for internMath (#837)
|
2024-01-24 16:30:32 +08:00 |
math.py
|
[Update] Update Skywork/Qwen-QwQ (#1728)
|
2024-12-05 19:30:43 +08:00 |
mathbench.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
mbpp.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
MedXpertQA.py
|
[Dataset] Add MedXpertQA (#2002)
|
2025-04-08 10:44:48 +08:00 |
mgsm.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
mmlu_cf.py
|
[Feature] Support MMLU-CF Benchmark (#1775)
|
2025-01-09 14:11:20 +08:00 |
mmlu_pro.py
|
[Feature] Update MathBench & WikiBench for FullBench (#1521)
|
2024-09-18 14:35:30 +08:00 |
mmlu.py
|
[Update] Update Skywork/Qwen-QwQ (#1728)
|
2024-12-05 19:30:43 +08:00 |
MMLUArabic.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
mmmlu.py
|
[Update] Update mmmlu_lite dataload (#1658)
|
2024-11-01 17:32:29 +08:00 |
multipl_e.py
|
[Feature] Add MultiPL-E & Code Evaluator (#1963)
|
2025-03-21 20:09:25 +08:00 |
multirc.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
narrativeqa.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
natural_question_cn.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
natural_question.py
|
[Fix] the automatically download for several datasets (#1652)
|
2024-11-01 15:57:18 +08:00 |
obqa.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
olymmath.py
|
[Feature] Add olymmath dataset (#1982)
|
2025-04-02 17:34:07 +08:00 |
OlympiadBench.py
|
[Feature] Support OlympiadBench Benchmark (#1841)
|
2025-01-24 10:00:01 +08:00 |
omni_math.py
|
[Feature] Support Omni-Math (#1837)
|
2025-01-23 18:36:54 +08:00 |
OpenFinData.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
physics.py
|
[Feature] Add Datasets: ClimateQA,Physics (#2017)
|
2025-04-14 20:18:47 +08:00 |
piqa.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
py150.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
qasper.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
qaspercut.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
QuALITY.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
race.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
realtoxicprompts.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
record.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
rolebench.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
s3eval.py
|
[Feature] Add S3Eval Dataset (#916)
|
2024-05-06 19:41:52 +08:00 |
safety.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
scibench.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
scicode.py
|
[Feature] Optimize Evaluation Speed of SciCode (#1489)
|
2024-09-06 00:59:41 +08:00 |
simpleqa.py
|
[Feature] Add Openai Simpleqa dataset (#1720)
|
2024-11-28 19:16:07 +08:00 |
siqa.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
smolinstruct.py
|
[Dataset] Add SmolInstruct, Update Chembench (#2025)
|
2025-04-18 17:21:29 +08:00 |
squad20.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
storycloze.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
strategyqa.py
|
[Fix] modelscope dataset load problem (#1406)
|
2024-08-08 14:01:06 +08:00 |
summedits.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
summscreen.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
svamp.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
tabmwp.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
taco.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
tnews.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
triviaqa.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
triviaqarc.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
truthfulqa.py
|
[Bug] Fix-NPU-Support (#1618)
|
2024-10-21 17:42:53 +08:00 |
tydiqa.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
wic.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
wikibench.py
|
[Fix] the automatically download for several datasets (#1652)
|
2024-11-01 15:57:18 +08:00 |
winograd.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
winogrande.py
|
[Fix] modelscope dataset load problem (#1406)
|
2024-08-08 14:01:06 +08:00 |
wnli.py
|
[Feat] implementation for support promptbench (#239)
|
2023-09-15 15:06:53 +08:00 |
wsc.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
xcopa.py
|
Add Release Contribution
|
2023-07-05 02:22:40 +00:00 |
xiezhi.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |
xlsum.py
|
update datasets
|
2023-07-05 01:45:26 +00:00 |
xsum.py
|
[Feature] Support ModelScope datasets (#1289)
|
2024-07-29 13:48:32 +08:00 |