liushz
c437135fad
[Feature] Add Openai Simpleqa dataset ( #1720 )
* Add Openai SimpleQA dataset
* Add Openai SimpleQA dataset
* Add Openai SimpleQA dataset
* Update eval_simpleqa.py
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-11-28 19:16:07 +08:00
wanyu2018umac
90efcf2216
[Feature] Add P-MMEval ( #1714 )
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
---------
Co-authored-by: liushz <qq1791167085@163.com>
2024-11-27 21:26:18 +08:00
Yufeng Zhao
300adc31e8
[Feature] Add Korbench dataset ( #1713 )
* first version for korbench
* first stage for korbench
* korbench_1
* korbench_1
* korbench_1
* korbench_1
* korbench_1_revised
* korbench_combined_1
* korbench_combined_1
* kor_combined
* kor_combined
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2024-11-25 20:11:27 +08:00
Chang Lan
5c1916ea4c
[Update] Add RULER 64k config ( #1709 )
2024-11-25 19:35:27 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Dataset Evaluation ( #1689 )
* MuSR Dataset Evaluation
* MuSR Dataset Evaluation
Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added ( #1684 )
* update
* update
* update
* update
2024-11-14 15:32:43 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models ( #1666 )
* [Update] Add base long context evaluation
* update
2024-11-08 10:53:29 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark ( #1565 )
* rm folder
* modify format according to reviewer
* modify format according to reviewer
* modify format according to reviewer
* add some files requirement
* fix some bug
* fix bug
* change load type
* Update MMMLU Dataset
* Update MMMLU Dataset
* Add MMMLU-Lite Dataset
* update MMMLU dataset
* update MMMLU dataset
* update MMMLU dataset
---------
Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
klein
24915aeb3f
[BUG] Update CIbench config ( #1544 )
* BUG: Update cibench.py
* BUG: Update cibench.py
2024-09-23 18:32:27 +08:00
Songyang Zhang
6997990c93
[Feature] Update Models ( #1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
2024-09-12 23:35:30 +08:00
Alexander Lam
a31a77c5c1
[Feature] Add SciCode summarizer config ( #1514 )
* [Feature] added SciCode summarizer config and dataset config for with background evaluation
* fix lint issues
* removed unnecessary type in summarizer group
2024-09-10 16:06:02 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update ( #1480 )
* update
* update
* update
2024-09-05 17:22:42 +08:00
liushz
9fdbc744dc
[Fix] Update option postprocess & mathbench language summarizer ( #1413 )
* Update option postprocess & mathbench language summarizer
* Update option postprocess & mathbench language summarizer
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-08-22 14:49:07 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets ( #1310 )
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactor
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 ( #1396 )
* stash files
* compassbench subjective evaluation added
* evaluation update
* fix lint
* update docs
* Update lint
* changes saved
* changes saved
* CompassBench subjective summarizer added (#1349 )
* subjective summarizer added
* fix lint
[Fix] Fix MathBench (#1351 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
[Update] Update model support list (#1353 )
* fix pip version
* fix pip version
* update model support
subjective summarizer updated
knowledge, math objective done (data need update)
remove secrets
objective changes saved
knowledge data added
* secrets removed
* changes added
* summarizer modified
* summarizer modified
* compassbench coding added
* fix lint
* objective summarizer updated
* compass_bench_v1.3 updated
* update files in config folder
* remove unused model
* lcbench modified
* removed model evaluation configs
* remove duplicated sdk implementation
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion ( #1389 )
* [Feature] Support import configs/models/summarizers from whl
* Update
* Update openai sdk
* Update
* Update gemma
2024-08-01 19:10:13 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
* [Feature] Support import configs/models/summarizers from whl
* Update LCBench configs
* Update
* Update
* Update
* Update
* update
* Update
* Update
* Update
* Update
* Update
2024-08-01 00:42:48 +08:00