adv_glue
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
agieval
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
aime2024
[Bug] Aime2024 config fix ( #1974 )
2025-03-25 17:57:11 +08:00
aime2025
[Update] Add configurations for llmjudge dataset ( #1940 )
2025-03-13 17:30:04 +08:00
anli
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
anthropics_evals
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
apps
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
ARC_c
[Feature] Dataset prompts update for ARC, BoolQ, Race ( #1527 )
2024-09-13 10:30:43 +08:00
ARC_e
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
ARC_Prize_Public_Evaluation
[Update] Add dataset configurations of no max_out_len ( #1967 )
2025-03-24 14:24:12 +08:00
babilong
[Feature] BABILong Dataset added ( #1684 )
2024-11-14 15:32:43 +08:00
bbeh
[Update] Add configurations for llmjudge dataset ( #1940 )
2025-03-13 17:30:04 +08:00
bbh
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
bigcodebench
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
calm
[Feature] Support OpenAI ChatCompletion ( #1389 )
2024-08-01 19:10:13 +08:00
CARDBiomedBench
[Dataset] Add CARDBiomedBench ( #2071 )
2025-05-08 19:44:46 +08:00
ceval
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
CHARM
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
ChemBench
[Update] Update dataset configs ( #2030 )
2025-04-21 18:55:06 +08:00
chinese_simpleqa
[Refactor] Code refactorization ( #1831 )
2025-01-20 19:17:38 +08:00
CIBench
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
civilcomments
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
ClimaQA
[Feature] Add Datasets: ClimateQA,Physics ( #2017 )
2025-04-14 20:18:47 +08:00
ClinicBench
[Datasets] Add ClinicBench, PubMedQA and ScienceQA ( #2061 )
2025-05-08 16:25:43 +08:00
clozeTest_maxmin
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
CLUE_afqmc
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
CLUE_C3
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
CLUE_cmnli
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
CLUE_CMRC
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
CLUE_DRCD
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
CLUE_ocnli
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
cmb
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
cmmlu
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
cmo_fib
[Update] Add dataset configurations of no max_out_len ( #1967 )
2025-03-24 14:24:12 +08:00
collections
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
commonsenseqa
[Bug] Commonsenseqa dataset fix ( #1425 )
2024-08-16 15:54:07 +08:00
commonsenseqa_cn
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
compassbench_20_v1_1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
compassbench_20_v1_1_public
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
compassbench_v1_3
[Update] Compassbench v1.3 ( #1396 )
2024-08-12 19:09:19 +08:00
contamination
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
crowspairs
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
crowspairs_cn
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
cvalues
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
demo
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
dingo
[Feature] Add dingo test ( #1529 )
2024-09-29 19:24:58 +08:00
drop
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
ds1000
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
FewCLUE_bustm
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
FewCLUE_chid
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
FewCLUE_cluewsc
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
FewCLUE_csl
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
FewCLUE_eprstmt
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
FewCLUE_ocnli_fc
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
FewCLUE_tnews
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
FinanceIQ
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
flores
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
game24
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
gaokao_math
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config ( #1589 )
2024-10-12 19:13:06 +08:00
GaokaoBench
[Update] Add dataset configurations of no max_out_len ( #1967 )
2025-03-24 14:24:12 +08:00
GLUE_CoLA
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
GLUE_MRPC
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
GLUE_QQP
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
govrepcrs
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
gpqa
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
gsm8k
[Update] Add dataset configurations of no max_out_len ( #1967 )
2025-03-24 14:24:12 +08:00
gsm8k_contamination
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
gsm_hard
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
hellaswag
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
HLE
[Dataset] HLE Biomedical version support ( #2080 )
2025-05-12 10:14:11 +08:00
humaneval
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
humaneval_cn
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
humaneval_multi
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
humaneval_plus
[Update] Update Fullbench ( #1712 )
2024-11-26 14:26:55 +08:00
humaneval_pro
[Dataset] Add human_eval/mbpp pro ( #2092 )
2025-05-12 18:38:13 +08:00
humanevalx
[Update] Update dataset configuration with no max_out_len ( #1754 )
2024-12-11 18:20:29 +08:00
hungarian_exam
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
IFEval
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
inference_ppl
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
infinitebench
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
internsandbox
[Feature] Support InternSandbox ( #2049 )
2025-05-07 16:42:09 +08:00
iwslt2017
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
jigsawmultilingual
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
judge
[Add] Add Judgerbenchv2 ( #2067 )
2025-04-30 17:12:34 +08:00
kaoshi
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
korbench
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
lambada
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
lawbench
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
LCBench
[Update] Update Skywork/Qwen-QwQ ( #1728 )
2024-12-05 19:30:43 +08:00
lcsts
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
leval
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
livecodebench
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
livemathbench
[Dataset] Add SmolInstruct, Update Chembench ( #2025 )
2025-04-18 17:21:29 +08:00
livereasonbench
[Refactor] Code refactorization ( #1831 )
2025-01-20 19:17:38 +08:00
livestembench
[Refactor] Code refactorization ( #1831 )
2025-01-20 19:17:38 +08:00
llm_compression
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
longbench
[Feature] Longbench dataset update
2024-09-06 15:50:12 +08:00
longbenchv2
[Feature] Add Longbenchv2 support ( #1801 )
2025-01-03 12:04:29 +08:00
lveval
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
mastermath2024v1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
matbench
[Dataset] Matbench ( #2021 )
2025-04-21 15:50:47 +08:00
math
[Fix] OpenICL Math Evaluator Config ( #2007 )
2025-04-08 14:38:35 +08:00
math401
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
MathBench
[Update] Add dataset configurations of no max_out_len ( #1967 )
2025-03-24 14:24:12 +08:00
mbpp
[Update] Update Fullbench ( #1712 )
2024-11-26 14:26:55 +08:00
mbpp_cn
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
mbpp_plus
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
mbpp_pro
[Dataset] Add human_eval/mbpp pro ( #2092 )
2025-05-12 18:38:13 +08:00
MedBench
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
Medbullets
[Dataset] Support MedMCQA and MedBullets benchmark ( #2054 )
2025-05-13 17:10:50 +08:00
MedCalc_Bench
[Dataset] MedCalc_Bench ( #2072 )
2025-05-09 16:58:55 +08:00
medmcqa
[Dataset] Support MedMCQA and MedBullets benchmark ( #2054 )
2025-05-13 17:10:50 +08:00
MedQA
[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 ( #2064 )
2025-05-09 14:47:44 +08:00
MedXpertQA
[Dataset] Add MedXpertQA ( #2002 )
2025-04-08 10:44:48 +08:00
mgsm
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
mmlu
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
mmlu_cf
[Feature] Support MMLU-CF Benchmark ( #1775 )
2025-01-09 14:11:20 +08:00
mmlu_pro
[Dataset] MMLU_Pro Biomedical Version Support ( #2081 )
2025-05-09 15:07:26 +08:00
MMLUArabic
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
mmmlu
[Refactor] Code refactorization ( #1831 )
2025-01-20 19:17:38 +08:00
mmmlu_lite
[Update] Update mmmlu_lite dataload ( #1658 )
2024-11-01 17:32:29 +08:00
multipl_e
[Dataset] Add human_eval/mbpp pro ( #2092 )
2025-05-12 18:38:13 +08:00
musr
[Feature] Add recommendation configs for datasets ( #1937 )
2025-03-25 14:54:13 +08:00
narrativeqa
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
needlebench
[Feature] Add long context evaluation for base models ( #1666 )
2024-11-08 10:53:29 +08:00
nejm_ai_benchmark
[Dataset] Add nejm ai benchmark ( #2063 )
2025-05-08 16:44:05 +08:00
NPHardEval
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
nq
[Update] Add dataset configurations of no max_out_len ( #1967 )
2025-03-24 14:24:12 +08:00
nq_cn
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
obqa
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
OlymMATH
[Feature] Add olymmath dataset ( #1982 )
2025-04-02 17:34:07 +08:00
OlympiadBench
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard ( #1899 )
2025-03-03 18:56:11 +08:00
omni_math
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard ( #1899 )
2025-03-03 18:56:11 +08:00
OpenFinData
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
PHYSICS
[Feature] Add Datasets: ClimateQA,Physics ( #2017 )
2025-04-14 20:18:47 +08:00
piqa
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
PJExam
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
PMMEval
[Refactor] Code refactorization ( #1831 )
2025-01-20 19:17:38 +08:00
promptbench
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
ProteinLMBench
[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 ( #2064 )
2025-05-09 14:47:44 +08:00
PubMedQA
[Datasets] Add ClinicBench, PubMedQA and ScienceQA ( #2061 )
2025-05-08 16:25:43 +08:00
py150
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
qabench
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
qasper
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
qaspercut
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
QuALITY
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
race
[Feature] Dataset prompts update for ARC, BoolQ, Race ( #1527 )
2024-09-13 10:30:43 +08:00
realtoxicprompts
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
rolebench
[Feature] Add abbr for rolebench dataset ( #1431 )
2024-08-20 11:22:48 +08:00
ruler
[Update] Customizable tokenizer for RULER ( #1731 )
2024-12-19 18:02:11 +08:00
s3eval
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
safety
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
scibench
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
scicode
[Update] Add dataset configurations of no max_out_len ( #1967 )
2025-03-24 14:24:12 +08:00
ScienceQA
[Datasets] Add ClinicBench, PubMedQA and ScienceQA ( #2061 )
2025-05-08 16:25:43 +08:00
SciKnowEval
[Dataset] Add SciknowEval Dataset ( #2070 )
2025-05-12 17:23:44 +08:00
SimpleQA
[Feature] Add Openai Simpleqa dataset ( #1720 )
2024-11-28 19:16:07 +08:00
siqa
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SmolInstruct
[Dataset] Add SmolInstruct, Update Chembench ( #2025 )
2025-04-18 17:21:29 +08:00
squad20
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
storycloze
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
strategyqa
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
subjective
[Add] Add writingbench ( #2028 )
2025-04-29 16:29:32 +08:00
summedits
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
summscreen
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_AX_b
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_AX_g
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_BoolQ
[Feature] Dataset prompts update for ARC, BoolQ, Race ( #1527 )
2024-09-13 10:30:43 +08:00
SuperGLUE_CB
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_COPA
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_MultiRC
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_ReCoRD
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_RTE
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_WiC
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
SuperGLUE_WSC
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
supergpqa
[Update] Add SuperGPQA subset metrics ( #1966 )
2025-03-24 14:25:12 +08:00
SVAMP
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
TabMWP
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
taco
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
teval
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
TheoremQA
[Update] Add 0shot CoT config for TheoremQA ( #1783 )
2024-12-27 16:17:27 +08:00
triviaqa
[Update] Add dataset configurations of no max_out_len ( #1967 )
2025-03-24 14:24:12 +08:00
triviaqarc
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
truthfulqa
[Fix] Update SciCode and Gemma model ( #1449 )
2024-08-23 10:42:27 +08:00
tydiqa
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
wikibench
[Fix] the automatically download for several datasets ( #1652 )
2024-11-01 15:57:18 +08:00
wikitext
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
winograd
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
winogrande
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
XCOPA
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
xiezhi
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
XLSum
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00
Xsum
[Feature] Support import configs/models/summarizers from whl ( #1376 )
2024-08-01 00:42:48 +08:00