Name | Last commit | Date |
adv_glue | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
agieval | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
aime2024 | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
aime2025 | [Update] Add configurations for llmjudge dataset (#1940) | 2025-03-13 17:30:04 +08:00 |
anli | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
anthropics_evals | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
apps | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
ARC_c | [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) | 2024-09-13 10:30:43 +08:00 |
ARC_e | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
ARC_Prize_Public_Evaluation | [Update] Add dataset configurations of no max_out_len (#1967) | 2025-03-24 14:24:12 +08:00 |
babilong | [Feature] BABILong Dataset added (#1684) | 2024-11-14 15:32:43 +08:00 |
bbeh | [Update] Add configurations for llmjudge dataset (#1940) | 2025-03-13 17:30:04 +08:00 |
bbh | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
bigcodebench | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
calm | [Feature] Support OpenAI ChatCompletion (#1389) | 2024-08-01 19:10:13 +08:00 |
ceval | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
CHARM | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
ChemBench | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
chinese_simpleqa | [Refactor] Code refactoarization (#1831) | 2025-01-20 19:17:38 +08:00 |
CIBench | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
civilcomments | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
clozeTest_maxmin | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
CLUE_afqmc | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
CLUE_C3 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
CLUE_cmnli | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
CLUE_CMRC | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
CLUE_DRCD | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
CLUE_ocnli | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
cmb | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
cmmlu | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
cmo_fib | [Update] Add dataset configurations of no max_out_len (#1967) | 2025-03-24 14:24:12 +08:00 |
collections | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
commonsenseqa | [Bug] Commonsenseqa dataset fix (#1425) | 2024-08-16 15:54:07 +08:00 |
commonsenseqa_cn | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
compassbench_20_v1_1 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
compassbench_20_v1_1_public | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
compassbench_v1_3 | [Update] Compassbench v1.3 (#1396) | 2024-08-12 19:09:19 +08:00 |
contamination | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
crowspairs | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
crowspairs_cn | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
cvalues | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
demo | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
dingo | [Feature] Add dingo test (#1529) | 2024-09-29 19:24:58 +08:00 |
drop | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
ds1000 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
FewCLUE_bustm | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
FewCLUE_chid | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
FewCLUE_cluewsc | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
FewCLUE_csl | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
FewCLUE_eprstmt | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
FewCLUE_ocnli_fc | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
FewCLUE_tnews | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
FinanceIQ | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
flores | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
game24 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
gaokao_math | [Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589) | 2024-10-12 19:13:06 +08:00 |
GaokaoBench | [Update] Add dataset configurations of no max_out_len (#1967) | 2025-03-24 14:24:12 +08:00 |
GLUE_CoLA | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
GLUE_MRPC | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
GLUE_QQP | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
govrepcrs | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
gpqa | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
gsm8k | [Update] Add dataset configurations of no max_out_len (#1967) | 2025-03-24 14:24:12 +08:00 |
gsm8k_contamination | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
gsm_hard | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
hellaswag | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
HLE | [Feature] Add HLE (Humanity's Last Exam) dataset (#1902) | 2025-03-04 16:42:37 +08:00 |
humaneval | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
humaneval_cn | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
humaneval_multi | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
humaneval_plus | [Update] Update Fullbench (#1712) | 2024-11-26 14:26:55 +08:00 |
humanevalx | [Update] Update dataset configuration with no max_out_len (#1754) | 2024-12-11 18:20:29 +08:00 |
hungarian_exam | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
IFEval | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
inference_ppl | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
infinitebench | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
iwslt2017 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
jigsawmultilingual | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
kaoshi | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
korbench | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
lambada | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
lawbench | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
LCBench | [Update] Update Skywork/Qwen-QwQ (#1728) | 2024-12-05 19:30:43 +08:00 |
lcsts | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
leval | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
livecodebench | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
livemathbench | [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899) | 2025-03-03 18:56:11 +08:00 |
livereasonbench | [Refactor] Code refactoarization (#1831) | 2025-01-20 19:17:38 +08:00 |
livestembench | [Refactor] Code refactoarization (#1831) | 2025-01-20 19:17:38 +08:00 |
llm_compression | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
longbench | [Feature] Longbench dataset update | 2024-09-06 15:50:12 +08:00 |
longbenchv2 | [Feature] Add Longbenchv2 support (#1801) | 2025-01-03 12:04:29 +08:00 |
lveval | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
mastermath2024v1 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
math | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
math401 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
MathBench | [Update] Add dataset configurations of no max_out_len (#1967) | 2025-03-24 14:24:12 +08:00 |
mbpp | [Update] Update Fullbench (#1712) | 2024-11-26 14:26:55 +08:00 |
mbpp_cn | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
mbpp_plus | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
MedBench | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
mgsm | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
mmlu | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
mmlu_cf | [Feature] Support MMLU-CF Benchmark (#1775) | 2025-01-09 14:11:20 +08:00 |
mmlu_pro | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
MMLUArabic | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
mmmlu | [Refactor] Code refactoarization (#1831) | 2025-01-20 19:17:38 +08:00 |
mmmlu_lite | [Update] Update mmmlu_lite dataload (#1658) | 2024-11-01 17:32:29 +08:00 |
multipl_e | [Feature] Add MultiPL-E & Code Evaluator (#1963) | 2025-03-21 20:09:25 +08:00 |
musr | [Feature] Add recommendation configs for datasets (#1937) | 2025-03-25 14:54:13 +08:00 |
narrativeqa | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
needlebench | [Feature] Add long context evaluation for base models (#1666) | 2024-11-08 10:53:29 +08:00 |
NPHardEval | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
nq | [Update] Add dataset configurations of no max_out_len (#1967) | 2025-03-24 14:24:12 +08:00 |
nq_cn | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
obqa | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
OlympiadBench | [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899) | 2025-03-03 18:56:11 +08:00 |
omni_math | [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899) | 2025-03-03 18:56:11 +08:00 |
OpenFinData | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
piqa | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
PJExam | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
PMMEval | [Refactor] Code refactoarization (#1831) | 2025-01-20 19:17:38 +08:00 |
promptbench | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
py150 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
qabench | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
qasper | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
qaspercut | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
QuALITY | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
race | [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) | 2024-09-13 10:30:43 +08:00 |
realtoxicprompts | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
rolebench | [Feature] Add abbr for rolebench dataset (#1431) | 2024-08-20 11:22:48 +08:00 |
ruler | [Update] Customizable tokenizer for RULER (#1731) | 2024-12-19 18:02:11 +08:00 |
s3eval | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
safety | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
scibench | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
scicode | [Update] Add dataset configurations of no max_out_len (#1967) | 2025-03-24 14:24:12 +08:00 |
SimpleQA | [Feature] Add Openai Simpleqa dataset (#1720) | 2024-11-28 19:16:07 +08:00 |
siqa | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
squad20 | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
storycloze | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
strategyqa | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
subjective | [Feature] Support subjective evaluation for reasoning model (#1868) | 2025-02-20 12:19:46 +08:00 |
summedits | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
summscreen | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_AX_b | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_AX_g | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_BoolQ | [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) | 2024-09-13 10:30:43 +08:00 |
SuperGLUE_CB | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_COPA | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_MultiRC | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_ReCoRD | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_RTE | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_WiC | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
SuperGLUE_WSC | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
supergpqa | [Update] Add SuperGPQA subset metrics (#1966) | 2025-03-24 14:25:12 +08:00 |
SVAMP | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
TabMWP | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
taco | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
teval | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
TheoremQA | [Update] Add 0shot CoT config for TheoremQA (#1783) | 2024-12-27 16:17:27 +08:00 |
triviaqa | [Update] Add dataset configurations of no max_out_len (#1967) | 2025-03-24 14:24:12 +08:00 |
triviaqarc | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
truthfulqa | [Fix] Update SciCode and Gemma model (#1449) | 2024-08-23 10:42:27 +08:00 |
tydiqa | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
wikibench | [Fix] the automatically download for several datasets (#1652) | 2024-11-01 15:57:18 +08:00 |
wikitext | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
winograd | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
winogrande | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
XCOPA | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
xiezhi | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
XLSum | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |
Xsum | [Feature] Support import configs/models/summarizers from whl (#1376) | 2024-08-01 00:42:48 +08:00 |