99 Commits

Author SHA1 Message Date
liushz
e49fcfd3a3
[Update] Update MATH dataset with model judge (#1711)
* Update math with llm judge

* Update math with llm judge

* Update math with llm judge

* Update math with llm judge

* Update math with llm judge
2024-11-25 15:14:55 +08:00
Linchen Xiao
80e3b9ef37
[Update] Add math prm 800k (#1708) 2024-11-21 21:29:43 +08:00
Linchen Xiao
500fb1032a
[Update] Update configurations (#1704) 2024-11-21 16:51:18 +08:00
Linchen Xiao
40a9f0be0d
[Update] MUSR dataset config prefix update (#1692) 2024-11-15 11:06:30 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Datset Evaluation (#1689)
* MuSR Datset Evaluation

* MuSR Datset Evaluation

Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added (#1684)
* update

* update

* update

* update
2024-11-14 15:32:43 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models (#1666)
* [Update] Add base long context evaluation

* update
2024-11-08 10:53:29 +08:00
liushz
f7d899823c
[Update] Update mmmlu_lite dataload (#1658)
* update mmmlu_lite dataload from oss

* update mmmlu_lite dataload from oss
2024-11-01 17:32:29 +08:00
Songyang Zhang
c789ce5698
[Fix] the automatically download for several datasets (#1652)
* [Fix] the automatically download for several datasets

* Update

* Update

* Update CI
2024-11-01 15:57:18 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench (#1645)
* fix pip version

* fix pip version

* add compassarenasubjectivebench

* add compassarenasubjectivebench

* add compassarenabench
2024-11-01 13:52:22 +08:00
Chang Lan
46affab882
[Fix] Fix ruler_16k_gen (#1643) 2024-10-29 17:58:43 +08:00
Linchen Xiao
8172af49bb
[Update] Update wildbench max_seq_len (#1648)
* [Update] Wildbench max_seq_len update

* [Update] Wildbench max_seq_len update
2024-10-29 13:21:31 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME (#1610)
* add datasets cmo&aime

* delete unused modules

* modify prompt

* update __init__

* update data load and add README

* update data load

* update performance

* update md5

* remove indents

* add indent

* fix log for debug mode
2024-10-28 18:08:02 +08:00
Linchen Xiao
a61e8a0803
[Update] Internal humaneval add (#1641)
* [Update] internal_humaneval_add

* update
2024-10-25 19:08:42 +08:00
Chang Lan
a927bba1cf
[Fix] Fix RULER datasets (#1628)
We need to ensure that we don't import anything that ends with "_datasets",
or they will be picked up by the runner, leading to duplicate / unwanted datasets
being evaluated.
2024-10-22 11:59:02 +08:00
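The RULER fix above (a927bba1cf) describes a name-based pickup: anything imported under a name ending in "_datasets" gets evaluated. The following is a minimal Python sketch of how such a collision can arise, assuming the config gathers datasets by scanning variable names for that suffix. The variable names and the collect_datasets helper are hypothetical and are not taken from the repository.

```python
# Hypothetical stand-ins for dataset lists a config might import.
ruler_4k_datasets = [{"abbr": "ruler_4k"}]
ruler_8k_datasets = [{"abbr": "ruler_8k"}]
# A convenience list whose name also happens to end with "_datasets".
ruler_combined_datasets = ruler_4k_datasets + ruler_8k_datasets


def collect_datasets(namespace):
    """Concatenate every list bound to a name ending in '_datasets'."""
    collected = []
    for name, value in namespace.items():
        if name.endswith("_datasets") and isinstance(value, list):
            collected.extend(value)
    return collected


# Importing all three names: the combined list is collected as well,
# so each dataset ends up scheduled twice.
leaky = collect_datasets({
    "ruler_4k_datasets": ruler_4k_datasets,
    "ruler_8k_datasets": ruler_8k_datasets,
    "ruler_combined_datasets": ruler_combined_datasets,
})

# Binding the combined list to a name without the suffix avoids the pickup.
clean = collect_datasets({
    "ruler_4k_datasets": ruler_4k_datasets,
    "ruler_8k_datasets": ruler_8k_datasets,
    "ruler_combined": ruler_combined_datasets,
})

leaky_abbrs = [d["abbr"] for d in leaky]
clean_abbrs = [d["abbr"] for d in clean]
print(leaky_abbrs)
print(clean_abbrs)
```

Under this assumption, the only difference between the two calls is the suffix on the third name, which is why the commit message treats the import name itself as the thing to guard.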
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench (#1617)
* Update

* Update LCB

* Update

* Update

* Update

* Update

* Update
2024-10-21 20:50:39 +08:00
liushz
500b44ba2d
[Fix] gpqa_few_shot_ppl prompt bug (#1627) 2024-10-21 16:59:06 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions (#1616)
* fix pip version

* fix pip version

* Compatible with old versions

* compati old version

* compati old version

* compati old version

* update configs
2024-10-21 10:16:29 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565)
* rm folder

* modify format according to reviewer

* modify format according to reviewer

* modify format according to reviewer

* add some files requirement

* fix some bug

* fix bug

* change load type

* Update MMMLU Dataset

* Update MMMLU Dataset

* Add MMMLU-Lite Dataset

* update MMMMLU datast

* update MMMMLU datast

* update MMMMLU datast

---------

Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena (#1614)
* fix pip version

* fix pip version

* update docs and add compassarena

* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench (#1607)
* upload hellobench

* update hellobench

* update readme.md

* update eval_hellobench.py

* update lastest

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval (#1593)
* fix pip version

* fix pip version

* update (#1522)

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>

* [Feature] Update Models (#1518)

* Update Models

* Update

* Update humanevalx

* Update

* Update

* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527)

add judgerbench and reorg sub

add judgerbench and reorg subeval

add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

---------

Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589)
* Add GaoKaoMath Dataset

* Add MATH LLM Eval

* Update GAOKAO Math Eval Dataset

* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames (#1599)
* fix pip version

* fix pip version

* fix flames

* fix flames
2024-10-12 14:34:59 +08:00
shijinpjlab
7528b8ab8a
[Feature] Add dingo test (#1529)
* add qa dingo

* update

* change name qa to dingo

* eval model: llm_base

* update path

* change name and move path

* add eval_dingo

* update import

* add for pip

* add dingo package

* change import place

* update import place

* fix lint fail

* isort

* double quoted

---------

Co-authored-by: sj <shijin@pjlab.org.cn>
2024-09-29 19:24:58 +08:00
Linchen Xiao
80cda1980e
[BUG] fix followbench dataset config (#1564)
* [BUG] fix followbench dataset config

* [BUG] fix followbench dataset config
2024-09-25 20:58:34 +08:00
liushz
83eeb52b09
[Feature] Update WikiBench base model config (#1553)
* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update GPQA & MMLU_Pro

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & Math base config

* Update WikiBench base model config

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-25 11:26:36 +08:00
liushz
a0cfd61129
[Feature] Update MathBench & Math base model config (#1550)
* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update GPQA & MMLU_Pro

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & Math base config

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-23 14:03:59 +08:00
liushz
2e9db77d57
[Feature] Add custom model postprocess function (#1519)
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:40:51 +08:00
liushz
c9a7026f59
[Feature] Update MathBench & WikiBench for FullBench (#1521)
* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update GPQA & MMLU_Pro

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:35:30 +08:00
Linchen Xiao
90279b6461
[Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) 2024-09-13 10:30:43 +08:00
bittersweet1999
7c7fa36235
[Feature] add support for internal Followbench (#1511)
* fix pip version

* fix pip version

* add internal followbench

* add internal followbench

* fix lint

* fix lint
2024-09-11 13:32:34 +08:00
bittersweet1999
c2bcd8725e
[Fix] Fix wildbench (#1508)
* fix pip version

* fix pip version

* fix_wildbench
2024-09-10 17:35:07 +08:00
Alexander Lam
a31a77c5c1
[Feature] Add SciCode summarizer config (#1514)
* [Feature] added SciCode summarizer config and dataset config for with background evaluation

* fix lint issues

* removed unnecessary type in summarizer group
2024-09-10 16:06:02 +08:00
Linchen Xiao
87ffa71d68
[Feature] Longbench dataset update 2024-09-06 15:50:12 +08:00
liushz
00fc8da5be
[Feature] Add model postprocess function (#1484)
* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update (#1480)
* update

* update

* update
2024-09-05 17:22:42 +08:00
Linchen Xiao
9693be46b7
[Feature] Mmlu-pro auto-download (#1464)
* update

* update

* update

* update

* update
2024-08-30 10:03:40 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update (#1463)
* update

* update

* update

* update
2024-08-28 14:01:05 +08:00
Songyang Zhang
7c2d25b557
[Fix] Update SciCode and Gemma model (#1449)
* [Fix] Update SciCode and Gemma model

* Update

* Update
2024-08-23 10:42:27 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode (#1417)
* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode w/ bg

* add scicode

* Update README.md

* Update README.md

* Delete configs/eval_SciCode.py

* rename

* 1

* rename

* Update README.md

* Update scicode.py

* Update scicode.py

* fix some bugs

* Update

* Update

---------

Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets (#1310)
* [Feature] Add Ruler datasets

* pre-commit fixed

* Add model specific tokenizer to dataset

* pre-commit modified

* remove unused import

* fix linting

* add trust_remote to tokenizer load

* lint fix

* comments resolved

* fix lint

* Add readme

* Fix lint

* ruler refactorize

* fix lint

* lint fix

* updated

* lint fix

* fix wonderwords import issue

* prompt modified

* update

* readme updated

* update

* ruler dataset added

* Update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Xu Song
99b5122ed5
[Feature] Add abbr for rolebench dataset (#1431)
* Add abbr for rolebench dataset

* add
2024-08-20 11:22:48 +08:00
Linchen Xiao
ecf9bb3e4c
[Bug] Commonsenseqa dataset fix (#1425)
* longbench dataset load fix

* update

* Update

* Update

* Update

* update

* update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-16 15:54:07 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 (#1423)
* [Update] Support auto-download of FOFO/MT-Bench-101

* Update wildbench
2024-08-16 11:57:41 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 (#1396)
* stash files

* compassbench subjective evaluation added

* evaluation update

* fix lint

* update docs

* Update lint

* changes saved

* changes saved

* CompassBench subjective summarizer added (#1349)

* subjective summarizer added

* fix lint

[Fix] Fix MathBench (#1351)

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

[Update] Update model support list (#1353)

* fix pip version

* fix pip version

* update model support

subjective summarizer updated

knowledge, math objective done (data need update)

remove secrets

objective changes saved

knowledge data added

* secrets removed

* changed added

* summarizer modified

* summarizer modified

* compassbench coding added

* fix lint

* objective summarizer updated

* compass_bench_v1.3 updated

* update files in config folder

* remove unused model

* lcbench modified

* removed model evaluation configs

* remove duplicated sdk implementation

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV (#1392)
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion (#1389)
* [Feature] Support import configs/models/summarizers from whl

* Update

* Update openai sdk

* Update

* Update gemma
2024-08-01 19:10:13 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl (#1376)
* [Feature] Support import configs/models/summarizers from whl

* Update LCBench configs

* Update

* Update

* Update

* Update

* update

* Update

* Update

* Update

* Update

* Update
2024-08-01 00:42:48 +08:00