OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Linchen Xiao	a0ef2fd3b4	[Update] Dingo Dataset update (#1670 ) * [Update] Dingo Dataset update * update	2024-11-08 14:38:43 +08:00
Linchen Xiao	835bf75a36	[Feature] Add long context evaluation for base models (#1666 ) * [Update] Add base long context evaluation * update	2024-11-08 10:53:29 +08:00
Chang Cheng	fd7aa83c01	[Update] Update DLC Runner(#1662 ) * push interntrain hard code * push interntrain hard code * remove redundant post process --------- Co-authored-by: changcheng <changcheng@pjlab.org.cb> Co-authored-by: changcheng <changcheng@pjlab.org.cn>	2024-11-07 15:45:35 +08:00
Linchen Xiao	db258eb7d5	[Bump] Bump version to v0.3.5 (#1657 )	2024-11-03 21:23:35 +08:00
Lyu Han	888f1f3bef	[Fix] Update loglikehood compatibility (#1659 )	2024-11-02 17:19:11 +08:00
liushz	f7d899823c	[Update] Update mmmlu_lite dataload (#1658 ) * update mmmlu_lite dataload from oss * update mmmlu_lite dataload from oss	2024-11-01 17:32:29 +08:00
Songyang Zhang	c789ce5698	[Fix] the automatically download for several datasets (#1652 ) * [Fix] the automatically download for several datasets * Update * Update * Update CI	2024-11-01 15:57:18 +08:00
Linchen Xiao	695738a89b	[Update] Add lmdeploy DeepSeek configs (#1656 ) * [Update] Add lmdeploy DeepSeek configs * update max out length	2024-11-01 15:34:23 +08:00
bittersweet1999	a0853c939d	[Add] Add CompassArenaSubjectiveBench (#1645 ) * fix pip version * fix pip version * add compassarenasubjectivebench * add compassarenasubjectivebench * add compassarenabench	2024-11-01 13:52:22 +08:00
Songyang Zhang	d611907d14	[Doc] Update Doc (#1655 )	2024-10-31 18:08:09 +08:00
Linchen Xiao	5212ffe8e2	[Update] Add new model configs (#1653 )	2024-10-30 17:24:53 +08:00
Linchen Xiao	df57c08ccf	[Feature] Update Models, Summarizers (#1600 )	2024-10-29 18:37:15 +08:00
Linchen Xiao	d91d66792a	[Update] Update Needlebench OSS path (#1651 )	2024-10-29 18:05:44 +08:00
Chang Lan	46affab882	[Fix] Fix ruler_16k_gen (#1643 )	2024-10-29 17:58:43 +08:00
Linchen Xiao	8172af49bb	[Update] Update wildbench max_seq_len (#1648 ) * [Update] Wildbench max_seq_len update * [Update] Wildbench max_seq_len update	2024-10-29 13:21:31 +08:00
Junnan Liu	645c5f3b2c	[Datasets] Add datasets CMO&AIME (#1610 ) * add datasets cmo&aime * delete unused modules * modify prompt * update __init__ * update data load and add README * update data load * update performance * update md5 * remove indents * add indent * fix log for debug mode	2024-10-28 18:08:02 +08:00
Linchen Xiao	9c39cb68d4	[Bump] Bump version to 0.3.4 (#1639 )	2024-10-25 20:10:16 +08:00
Linchen Xiao	a61e8a0803	[Update] Internal humaneval add (#1641 ) * [Update] internal_humaneval_add * update	2024-10-25 19:08:42 +08:00
Songyang Zhang	84be90669b	[Update] Fix issue of *_param.py, avoid name conflict;add keep_tmp_file flag to support keep the temp config file. (#1640 )	2024-10-25 16:39:25 +08:00
BigDong	2542bc6907	[Feature] Support results saving as md format table (#1638 )	2024-10-25 15:50:33 +08:00
Linchen Xiao	22fdea4bf2	[Update] Update DLC runner (#1637 )	2024-10-24 21:36:16 +08:00
Lyu Han	fb12c3f98a	[Update] strip stop_words (#1635 )	2024-10-24 20:39:20 +08:00
Linchen Xiao	662dddf41a	[Update] Add internal humaneval postprocess (#1636 )	2024-10-24 17:45:21 +08:00
Linchen Xiao	be3c06a158	[Fix] Update common summarizer regex extraction (#1631 )	2024-10-22 14:35:45 +08:00
Chang Lan	a927bba1cf	[Fix] Fix RULER datasets (#1628 ) We need to ensure that we don't import anything that ends with "_datasets", or they will be picked up by the runner, leading to duplicate / unwanted datasets being evaluated.	2024-10-22 11:59:02 +08:00
Songyang Zhang	a4d5a6c81b	[Feature] Support LiveCodeBench (#1617 ) * Update * Update LCB * Update * Update * Update * Update * Update	2024-10-21 20:50:39 +08:00
Chenguang Li	5868d5afa4	[Bug] Fix-NPU-Support (#1618 ) * bugfix NPU support * formatting --------- Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-10-21 17:42:53 +08:00
liushz	500b44ba2d	[Fix] gpqa_few_shot_ppl prompt bug (#1627 )	2024-10-21 16:59:06 +08:00
Linchen Xiao	096c347e7d	[Fix] Qwen 2.5 model config (#1626 ) * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config	2024-10-21 16:58:18 +08:00
bittersweet1999	1188e1ecf0	[Update] eval_judgerbench.py (#1625 )	2024-10-21 15:30:29 +08:00
zhulinJulia24	825d3388d5	[CI] Test PR staging fixed (#1624 ) * Update oc_score_baseline.yaml * Update runtime.txt	2024-10-21 11:02:37 +08:00
bittersweet1999	a11e2b2fd4	[Fix] Compatible with old versions (#1616 ) * fix pip version * fix pip version * Compatible with old versions * compati old version * compati old version * compati old version * update configs	2024-10-21 10:16:29 +08:00
Lyu Han	6e8adf5221	[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator (#1623 ) * remove prefix bos_token from messages when using lmdeploy as the accelerator * update	2024-10-19 20:03:47 +08:00
zhulinJulia24	b89c7b2fc3	[CI] Update daily-run-test.yml (#1620 )	2024-10-18 18:30:35 +08:00
Bob Tsang	dd0b655bd0	[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565 ) * rm folder * modify format according to reviewer * modify format according to reviewer * modify format according to reviewer * add some files requirement * fix some bug * fix bug * change load type * Update MMMLU Dataset * Update MMMLU Dataset * Add MMMLU-Lite Dataset * update MMMMLU datast * update MMMMLU datast * update MMMMLU datast --------- Co-authored-by: BobTsang <BobTsang1995@gmail.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-10-17 19:09:34 +08:00
bittersweet1999	f0d436496e	[Update] update docs and add compassarena (#1614 ) * fix pip version * fix pip version * update docs and add compassarena * update docs	2024-10-17 14:39:06 +08:00
Haoran Que	4fe251729b	Upload HelloBench (#1607 ) * upload hellobench * update hellobench * update readme.md * update eval_hellobench.py * update lastest --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-10-15 17:11:37 +08:00
bittersweet1999	fa54aa62f6	[Feature] Add Judgerbench and reorg subeval (#1593 ) * fix pip version * fix pip version * update (#1522) Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> * [Feature] Update Models (#1518) * Update Models * Update * Update humanevalx * Update * Update * [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) add judgerbench and reorg sub add judgerbench and reorg subeval add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com> Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-10-15 16:36:05 +08:00
x54-729	2b1afa7d1e	[Fix] fix interntrain's tokenizer truncate (#1605 ) Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>	2024-10-15 16:03:57 +08:00
zhulinJulia24	8aba547e06	[ci] fix stable issue of daily test (#1602 ) * update * update * update * Update daily-run-test.yml * update * Update daily-run-test.yml * update * update * update * Update pr-run-test.yml * Update pr-run-test.yml * update * update * Update daily-run-test.yml * update * update * update * update * Update daily-run-test.yml * Update daily-run-test.yml * updaste --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-10-15 10:14:49 +08:00
Linchen Xiao	f390697a5e	[Fix] Update dlc runner python env (#1604 )	2024-10-14 15:50:21 +08:00
Lyu Han	4fde41036f	[Feature] Update TurboMindModel by integrating lmdeploy pipeline API (#1556 ) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * update * update * rollback class name * update * remove unused code * update * update * use pipeline * fix ci check * compatibility * compatibility * remove concurrency * update * fix table content * update	2024-10-14 15:33:40 +08:00
liushz	5faee929db	[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589 ) * Add GaoKaoMath Dataset * Add MATH LLM Eval * Update GAOKAO Math Eval Dataset * Update GAOKAO Math Eval Dataset	2024-10-12 19:13:06 +08:00
Linchen Xiao	69997f11f8	[Feature] Update requirements.txt (#1601 ) * update crb * update crbbench * update crbbench * update crbbench * minor update wildbench * [Fix] Update doc of wildbench, and merge wildbench into subjective * [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench * Update crb.md * Update crb_pair_judge.py * Update crb_single_judge.py * Update subjective_evaluation.md * Update openai_api.py * [Update] update wildbench readme * [Update] update wildbench readme * [Update] update wildbench readme, remove crb * Delete configs/eval_subjective_wildbench_pair.py * Delete configs/eval_subjective_wildbench_single.py * Update __init__.py * [Fix] fix version mismatch for CIBench * [Fix] fix version mismatch for CIBench, local runer * [Fix] fix version mismatch for CIBench, local runer, remove oracle mode * BUG: Update cibench.py * BUG: Update cibench.py * [Bug] Update agent.txt * update agent * Update agent.txt * update readme * update --------- Co-authored-by: kleinzcy <zhangchy2@shanghaitech.edu.cn> Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-10-12 18:26:57 +08:00
bittersweet1999	3f7a3730d7	[Fix] fix Flames (#1599 ) * fix pip version * fix pip version * fix flames * fix flames	2024-10-12 14:34:59 +08:00
Lyu Han	b52ba65c26	[Feature] Integrate lmdeploy pipeline api (#1198 ) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * update * update * rollback class name * update * remove unused code * update * update * fix ci check * compatibility * remove concurrency * Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py * Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md * [Bug] fix lint --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-10-09 22:58:06 +08:00
Songyang Zhang	d2ab51abbd	[Bug] Fix pre-commit hook (#1592 )	2024-10-09 17:09:48 +08:00
x54-729	4d6349dfe1	[FIX] fix interntrain get_loglikelihood (#1584 )	2024-10-08 11:34:04 +08:00
zhulinJulia24	89abcba486	[CI] Fix testcase failure (#1582 ) * update * Update oc_score_baseline.yaml * Update daily-run-test.yml * Update daily-run-test.yml * Update daily-run-test.yml * Update daily-run-test.yml --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-10-02 12:30:38 +08:00
Linchen Xiao	22a4e76511	[BUMP] Bump version to 0.3.3 (#1581 )	2024-09-30 16:57:41 +08:00

953 Commits