liushz
c9a7026f59
[Feature] Update MathBench & WikiBench for FullBench ( #1521 )
...
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:35:30 +08:00
Linchen Xiao
90279b6461
[Feature] Dataset prompts update for ARC, BoolQ, Race ( #1527 )
2024-09-13 10:30:43 +08:00
bittersweet1999
7c7fa36235
[Feature] add support for internal Followbench ( #1511 )
...
* fix pip version
* fix pip version
* add internal followbench
* add internal followbench
* fix lint
* fix lint
2024-09-11 13:32:34 +08:00
bittersweet1999
c2bcd8725e
[Fix] Fix wildbench ( #1508 )
...
* fix pip version
* fix pip version
* fix_wildbench
2024-09-10 17:35:07 +08:00
Alexander Lam
a31a77c5c1
[Feature] Add SciCode summarizer config ( #1514 )
...
* [Feature] added SciCode summarizer config and dataset config for with background evaluation
* fix lint issues
* removed unnecessary type in summarizer group
2024-09-10 16:06:02 +08:00
Linchen Xiao
87ffa71d68
[Feature] Longbench dataset update
2024-09-06 15:50:12 +08:00
liushz
00fc8da5be
[Feature] Add model postprocess function ( #1484 )
...
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update ( #1480 )
...
* update
* update
* update
2024-09-05 17:22:42 +08:00
Linchen Xiao
9693be46b7
[Feature] Mmlu-pro auto-download ( #1464 )
...
* update
* update
* update
* update
* update
2024-08-30 10:03:40 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update ( #1463 )
...
* update
* update
* update
* update
2024-08-28 14:01:05 +08:00
Songyang Zhang
7c2d25b557
[Fix] Update SciCode and Gemma model ( #1449 )
...
* [Fix] Update SciCode and Gemma model
* Update
* Update
2024-08-23 10:42:27 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode ( #1417 )
...
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets ( #1310 )
...
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Xu Song
99b5122ed5
[Feature] Add abbr for rolebench dataset ( #1431 )
...
* Add abbr for rolebench dataset
* add
2024-08-20 11:22:48 +08:00
Linchen Xiao
ecf9bb3e4c
[Bug] Commonsenseqa dataset fix ( #1425 )
...
* longbench dataset load fix
* update
* Update
* Update
* Update
* update
* update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-16 15:54:07 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 ( #1423 )
...
* [Update] Support auto-download of FOFO/MT-Bench-101
* Update wildbench
2024-08-16 11:57:41 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 ( #1396 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* fix lint
* update docs
* Update lint
* changes saved
* changes saved
* CompassBench subjective summarizer added (#1349 )
* subjective summarizer added
* fix lint
[Fix] Fix MathBench (#1351 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
[Update] Update model support list (#1353 )
* fix pip version
* fix pip version
* update model support
subjective summarizer updated
knowledge, math objective done (data need update)
remove secrets
objective changes saved
knowledge data added
* secrets removed
* changed added
* summarizer modified
* summarizer modified
* compassbench coding added
* fix lint
* objective summarizer updated
* compass_bench_v1.3 updated
* update files in config folder
* remove unused model
* lcbench modified
* removed model evaluation configs
* remove duplicated sdk implementation
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV ( #1392 )
...
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion ( #1389 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update
* Update openai sdk
* Update
* Update gemma
2024-08-01 19:10:13 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update LCBench configs
* Update
* Update
* Update
* Update
* update
* Update
* Update
* Update
* Update
* Update
2024-08-01 00:42:48 +08:00