OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
liushz	e49fcfd3a3	[Update] Update MATH dataset with model judge (#1711) * Update math with llm judge	2024-11-25 15:14:55 +08:00
Linchen Xiao	80e3b9ef37	[Update] Add math prm 800k (#1708 )	2024-11-21 21:29:43 +08:00
Linchen Xiao	500fb1032a	[Update] Update configurations (#1704 )	2024-11-21 16:51:18 +08:00
Linchen Xiao	40a9f0be0d	[Update] MUSR dataset config prefix update (#1692 )	2024-11-15 11:06:30 +08:00
abrohamLee	e9e4b69ddb	[Feature] MuSR Dataset Evaluation (#1689) * MuSR Dataset Evaluation * Add an assertion and a Readme.md	2024-11-14 20:42:12 +08:00
Linchen Xiao	e92a5d4230	[Feature] BABILong Dataset added (#1684 ) * update * update * update * update	2024-11-14 15:32:43 +08:00
Linchen Xiao	835bf75a36	[Feature] Add long context evaluation for base models (#1666 ) * [Update] Add base long context evaluation * update	2024-11-08 10:53:29 +08:00
liushz	f7d899823c	[Update] Update mmmlu_lite dataload (#1658 ) * update mmmlu_lite dataload from oss * update mmmlu_lite dataload from oss	2024-11-01 17:32:29 +08:00
Songyang Zhang	c789ce5698	[Fix] Automatic download for several datasets (#1652) * [Fix] Automatic download for several datasets * Update * Update * Update CI	2024-11-01 15:57:18 +08:00
bittersweet1999	a0853c939d	[Add] Add CompassArenaSubjectiveBench (#1645 ) * fix pip version * fix pip version * add compassarenasubjectivebench * add compassarenasubjectivebench * add compassarenabench	2024-11-01 13:52:22 +08:00
Chang Lan	46affab882	[Fix] Fix ruler_16k_gen (#1643 )	2024-10-29 17:58:43 +08:00
Linchen Xiao	8172af49bb	[Update] Update wildbench max_seq_len (#1648 ) * [Update] Wildbench max_seq_len update * [Update] Wildbench max_seq_len update	2024-10-29 13:21:31 +08:00
Junnan Liu	645c5f3b2c	[Datasets] Add datasets CMO&AIME (#1610 ) * add datasets cmo&aime * delete unused modules * modify prompt * update __init__ * update data load and add README * update data load * update performance * update md5 * remove indents * add indent * fix log for debug mode	2024-10-28 18:08:02 +08:00
Linchen Xiao	a61e8a0803	[Update] Internal humaneval add (#1641 ) * [Update] internal_humaneval_add * update	2024-10-25 19:08:42 +08:00
Chang Lan	a927bba1cf	[Fix] Fix RULER datasets (#1628 ) We need to ensure that we don't import anything that ends with "_datasets", or they will be picked up by the runner, leading to duplicate / unwanted datasets being evaluated.	2024-10-22 11:59:02 +08:00
Songyang Zhang	a4d5a6c81b	[Feature] Support LiveCodeBench (#1617 ) * Update * Update LCB * Update * Update * Update * Update * Update	2024-10-21 20:50:39 +08:00
liushz	500b44ba2d	[Fix] gpqa_few_shot_ppl prompt bug (#1627 )	2024-10-21 16:59:06 +08:00
bittersweet1999	a11e2b2fd4	[Fix] Compatible with old versions (#1616 ) * fix pip version * fix pip version * Compatible with old versions * compati old version * compati old version * compati old version * update configs	2024-10-21 10:16:29 +08:00
Bob Tsang	dd0b655bd0	[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565) * rm folder * modify format according to reviewer * add some files requirement * fix some bugs * fix bug * change load type * Update MMMLU Dataset * Add MMMLU-Lite Dataset * update MMMLU dataset --------- Co-authored-by: BobTsang <BobTsang1995@gmail.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-10-17 19:09:34 +08:00
bittersweet1999	f0d436496e	[Update] update docs and add compassarena (#1614 ) * fix pip version * fix pip version * update docs and add compassarena * update docs	2024-10-17 14:39:06 +08:00
Haoran Que	4fe251729b	Upload HelloBench (#1607) * upload hellobench * update hellobench * update readme.md * update eval_hellobench.py * update latest --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-10-15 17:11:37 +08:00
bittersweet1999	fa54aa62f6	[Feature] Add Judgerbench and reorg subeval (#1593 ) * fix pip version * fix pip version * update (#1522) Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> * [Feature] Update Models (#1518) * Update Models * Update * Update humanevalx * Update * Update * [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) add judgerbench and reorg sub add judgerbench and reorg subeval add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com> Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-10-15 16:36:05 +08:00
liushz	5faee929db	[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589 ) * Add GaoKaoMath Dataset * Add MATH LLM Eval * Update GAOKAO Math Eval Dataset * Update GAOKAO Math Eval Dataset	2024-10-12 19:13:06 +08:00
bittersweet1999	3f7a3730d7	[Fix] fix Flames (#1599 ) * fix pip version * fix pip version * fix flames * fix flames	2024-10-12 14:34:59 +08:00
shijinpjlab	7528b8ab8a	[Feature] Add dingo test (#1529 ) * add qa dingo * update * change name qa to dingo * eval model: llm_base * update path * change name and move path * add eval_dingo * update import * add for pip * add dingo package * change import place * update import place * fix lint fail * isort * double quoted --------- Co-authored-by: sj <shijin@pjlab.org.cn>	2024-09-29 19:24:58 +08:00
Linchen Xiao	80cda1980e	[BUG] fix followbench dataset config (#1564 ) * [BUG] fix followbench dataset config * [BUG] fix followbench dataset config	2024-09-25 20:58:34 +08:00
liushz	83eeb52b09	[Feature] Update WikiBench base model config (#1553 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & Math base config * Update WikiBench base model config --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-25 11:26:36 +08:00
liushz	a0cfd61129	[Feature] Update MathBench & Math base model config (#1550 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & Math base config --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-23 14:03:59 +08:00
liushz	2e9db77d57	[Feature] Add custom model postprocess function (#1519 ) Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:40:51 +08:00
liushz	c9a7026f59	[Feature] Update MathBench & WikiBench for FullBench (#1521 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:35:30 +08:00
Linchen Xiao	90279b6461	[Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )	2024-09-13 10:30:43 +08:00
bittersweet1999	7c7fa36235	[Feature] add support for internal Followbench (#1511 ) * fix pip version * fix pip version * add internal followbench * add internal followbench * fix lint * fix lint	2024-09-11 13:32:34 +08:00
bittersweet1999	c2bcd8725e	[Fix] Fix wildbench (#1508 ) * fix pip version * fix pip version * fix_wildbench	2024-09-10 17:35:07 +08:00
Alexander Lam	a31a77c5c1	[Feature] Add SciCode summarizer config (#1514 ) * [Feature] added SciCode summarizer config and dataset config for with background evaluation * fix lint issues * removed unnecessary type in summarizer group	2024-09-10 16:06:02 +08:00
Linchen Xiao	87ffa71d68	[Feature] Longbench dataset update	2024-09-06 15:50:12 +08:00
liushz	00fc8da5be	[Feature] Add model postprocess function (#1484 ) * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-05 21:10:29 +08:00
Linchen Xiao	6c9cd9a260	[Feature] Needlebench auto-download update (#1480 ) * update * update * update	2024-09-05 17:22:42 +08:00
Linchen Xiao	9693be46b7	[Feature] Mmlu-pro auto-download (#1464 ) * update * update * update * update * update	2024-08-30 10:03:40 +08:00
Linchen Xiao	245664f4c0	[Feature] Fullbench v0.1 language update (#1463 ) * update * update * update * update	2024-08-28 14:01:05 +08:00
Songyang Zhang	7c2d25b557	[Fix] Update SciCode and Gemma model (#1449 ) * [Fix] Update SciCode and Gemma model * Update * Update	2024-08-23 10:42:27 +08:00
Hari Seldon	14b4b735cb	[Feature] Add support for SciCode (#1417 ) * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode w/ bg * add scicode * Update README.md * Update README.md * Delete configs/eval_SciCode.py * rename * 1 * rename * Update README.md * Update scicode.py * Update scicode.py * fix some bugs * Update * Update --------- Co-authored-by: root <HariSeldon0> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-22 13:42:25 +08:00
Linchen Xiao	a4b54048ae	[Feature] Add Ruler datasets (#1310 ) * [Feature] Add Ruler datasets * pre-commit fixed * Add model specific tokenizer to dataset * pre-commit modified * remove unused import * fix linting * add trust_remote to tokenizer load * lint fix * comments resolved * fix lint * Add readme * Fix lint * ruler refactorize * fix lint * lint fix * updated * lint fix * fix wonderwords import issue * prompt modified * update * readme updated * update * ruler dataset added * Update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-20 11:40:11 +08:00
Xu Song	99b5122ed5	[Feature] Add abbr for rolebench dataset (#1431 ) * Add abbr for rolebench dataset * add	2024-08-20 11:22:48 +08:00
Linchen Xiao	ecf9bb3e4c	[Bug] Commonsenseqa dataset fix (#1425 ) * longbench dataset load fix * update * Update * Update * Update * update * update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-16 15:54:07 +08:00
Songyang Zhang	9b3613f10b	[Update] Support auto-download of FOFO/MT-Bench-101 (#1423 ) * [Update] Support auto-download of FOFO/MT-Bench-101 * Update wildbench	2024-08-16 11:57:41 +08:00
Linchen Xiao	8e55c9c6ee	[Update] Compassbench v1.3 (#1396 ) * stash files * compassbench subjective evaluation added * evaluation update * fix lint * update docs * Update lint * changes saved * changes saved * CompassBench subjective summarizer added (#1349) * subjective summarizer added * fix lint [Fix] Fix MathBench (#1351) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> [Update] Update model support list (#1353) * fix pip version * fix pip version * update model support subjective summarizer updated knowledge, math objective done (data need update) remove secrets objective changes saved knowledge data added * secrets removed * changed added * summarizer modified * summarizer modified * compassbench coding added * fix lint * objective summarizer updated * compass_bench_v1.3 updated * update files in config folder * remove unused model * lcbench modified * removed model evaluation configs * remove duplicated sdk implementation --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-08-12 19:09:19 +08:00
Songyang Zhang	c81329b548	[Fix] Fix Slurm ENV (#1392 ) 1. Support Slurm Cluster 2. Support automatic data download 3. Update InternLM2.5-1.8B/20B-Chat	2024-08-06 01:35:20 +08:00
Songyang Zhang	c09fc79ba8	[Feature] Support OpenAI ChatCompletion (#1389 ) * [Feature] Support import configs/models/summarizers from whl * Update * Update openai sdk * Update * Update gemma	2024-08-01 19:10:13 +08:00
Songyang Zhang	46cc7894e1	[Feature] Support import configs/models/summarizers from whl (#1376 ) * [Feature] Support import configs/models/summarizers from whl * Update LCBench configs * Update * Update * Update * Update * update * Update * Update * Update * Update * Update	2024-08-01 00:42:48 +08:00


99 Commits