OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Alexander Lam	1bd594fc62	[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model (#1751 ) * fix lint issues * updated gitignore * changed infer_order from random to double for the pairwise_judge.py (not changing for pairwise_bt_judge.py) * added return statement to CompassArenaBradleyTerrySummarizer to return overall score for each judger model	2024-12-16 13:41:28 +08:00
Linchen Xiao	bd7b705be4	[Update] Update dataset configuration with no max_out_len (#1754 )	2024-12-11 18:20:29 +08:00
OpenStellarTeam	1a5b3fc11e	Add Chinese SimpleQA config (#1697 ) * add chinese simpleqa config * add chinese simpleqa config * add chinese simpleqa config * add chinese simpleqa config * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update Csimpleqa --------- Co-authored-by: 明念 <heyancheng.hyc@taobao.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-12-11 18:03:39 +08:00
Linchen Xiao	0d26b348e4	[Feature] Add OC academic 2412 (#1750 )	2024-12-10 21:53:06 +08:00
bittersweet1999	54c0fb7a93	[Change] Change Compassarena metric (#1749 ) * fix pip version * fix pip version * fix summarizer bug * fix compassarena * fix compassarena * fix compassarena	2024-12-10 14:45:32 +08:00
Songyang Zhang	fb43dd1906	[Update] Update Skywork/Qwen-QwQ (#1728 ) * Update JudgerBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update	2024-12-05 19:30:43 +08:00
Linchen Xiao	9de27b4d85	[Update] Update max_out_len for datasets (#1726 ) * [Update] Update max_out_len for datasets * Update eval_regression_chat_objective_fullbench.py * Update eval_regression_chat.py * Update eval_regression_chat.py * Update oc_score_baseline_fullbench.yaml --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>	2024-12-02 11:42:07 +08:00
liushz	c437135fad	[Feature] Add Openai Simpleqa dataset (#1720 ) * Add Openai SimpleQA dataset * Add Openai SimpleQA dataset * Add Openai SimpleQA dataset * Update eval_simpleqa.py --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-11-28 19:16:07 +08:00
wanyu2018umac	90efcf2216	[Feature] Add P-MMEval (#1714 ) * Update with PMMEval * Update * Update __init__.py * Fix Bugs * Delete .pre-commit-config.yaml * Pull merge --------- Co-authored-by: liushz <qq1791167085@163.com>	2024-11-27 21:26:18 +08:00
Yi Ding	bcb707dbfc	[Fix] Fix BailingAPI model (#1707 ) * [fix] sequence under the multiple samples * resolve the lint problems * change the parameter name * add another error code for retry * output the log for invalid response * format correction * update * update * update * update * add two model python files * update the default parameter * use random for delay * update the api example of bailing * remove the unnecessary parameter	2024-11-26 19:24:47 +08:00
Songyang Zhang	f97c4eae42	[Update] Update Fullbench (#1712 ) * Update JudgerBench * Support O1-style Prompts * Update Code	2024-11-26 14:26:55 +08:00
Yufeng Zhao	300adc31e8	[Feature] Add Korbench dataset (#1713 ) * first version for korbench * first stage for korbench * korbench_1 * korbench_1 * korbench_1 * korbench_1 * korbench_1_revised * korbench_combined_1 * korbench_combined_1 * kor_combined * kor_combined * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2024-11-25 20:11:27 +08:00
Chang Lan	5c1916ea4c	[Update] Add RULER 64k config (#1709 )	2024-11-25 19:35:27 +08:00
liushz	e49fcfd3a3	[Update] Update MATH dataset with model judge (#1711 ) * Update math with llm judge * Update math with llm judge * Update math with llm judge * Update math with llm judge * Update math with llm judge	2024-11-25 15:14:55 +08:00
Linchen Xiao	500fb1032a	[Update] Update configurations (#1704 )	2024-11-21 16:51:18 +08:00
Linchen Xiao	40a9f0be0d	[Update] MUSR dataset config prefix update (#1692 )	2024-11-15 11:06:30 +08:00
abrohamLee	e9e4b69ddb	[Feature] MuSR Dataset Evaluation (#1689 ) * MuSR Dataset Evaluation * MuSR Dataset Evaluation Add an assertion and a Readme.md	2024-11-14 20:42:12 +08:00
Linchen Xiao	e92a5d4230	[Feature] BABILong Dataset added (#1684 ) * update * update * update * update	2024-11-14 15:32:43 +08:00
Linchen Xiao	835bf75a36	[Feature] Add long context evaluation for base models (#1666 ) * [Update] Add base long context evaluation * update	2024-11-08 10:53:29 +08:00
Songyang Zhang	c789ce5698	[Fix] the automatically download for several datasets (#1652 ) * [Fix] the automatically download for several datasets * Update * Update * Update CI	2024-11-01 15:57:18 +08:00
bittersweet1999	a0853c939d	[Add] Add CompassArenaSubjectiveBench (#1645 ) * fix pip version * fix pip version * add compassarenasubjectivebench * add compassarenasubjectivebench * add compassarenabench	2024-11-01 13:52:22 +08:00
Chang Lan	46affab882	[Fix] Fix ruler_16k_gen (#1643 )	2024-10-29 17:58:43 +08:00
Linchen Xiao	8172af49bb	[Update] Update wildbench max_seq_len (#1648 ) * [Update] Wildbench max_seq_len update * [Update] Wildbench max_seq_len update	2024-10-29 13:21:31 +08:00
Chang Lan	a927bba1cf	[Fix] Fix RULER datasets (#1628 ) We need to ensure that we don't import anything that ends with "_datasets", or they will be picked up by the runner, leading to duplicate / unwanted datasets being evaluated.	2024-10-22 11:59:02 +08:00
Songyang Zhang	a4d5a6c81b	[Feature] Support LiveCodeBench (#1617 ) * Update * Update LCB * Update * Update * Update * Update * Update	2024-10-21 20:50:39 +08:00
liushz	500b44ba2d	[Fix] gpqa_few_shot_ppl prompt bug (#1627 )	2024-10-21 16:59:06 +08:00
Linchen Xiao	096c347e7d	[Fix] Qwen 2.5 model config (#1626 ) * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config	2024-10-21 16:58:18 +08:00
bittersweet1999	1188e1ecf0	[Update] eval_judgerbench.py (#1625 )	2024-10-21 15:30:29 +08:00
bittersweet1999	a11e2b2fd4	[Fix] Compatible with old versions (#1616 ) * fix pip version * fix pip version * Compatible with old versions * compati old version * compati old version * compati old version * update configs	2024-10-21 10:16:29 +08:00
bittersweet1999	f0d436496e	[Update] update docs and add compassarena (#1614 ) * fix pip version * fix pip version * update docs and add compassarena * update docs	2024-10-17 14:39:06 +08:00
Haoran Que	4fe251729b	Upload HelloBench (#1607 ) * upload hellobench * update hellobench * update readme.md * update eval_hellobench.py * update latest --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-10-15 17:11:37 +08:00
bittersweet1999	fa54aa62f6	[Feature] Add Judgerbench and reorg subeval (#1593 ) * fix pip version * fix pip version * update (#1522) Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> * [Feature] Update Models (#1518) * Update Models * Update * Update humanevalx * Update * Update * [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) add judgerbench and reorg sub add judgerbench and reorg subeval add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com> Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-10-15 16:36:05 +08:00
liushz	5faee929db	[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589 ) * Add GaoKaoMath Dataset * Add MATH LLM Eval * Update GAOKAO Math Eval Dataset * Update GAOKAO Math Eval Dataset	2024-10-12 19:13:06 +08:00
bittersweet1999	3f7a3730d7	[Fix] fix Flames (#1599 ) * fix pip version * fix pip version * fix flames * fix flames	2024-10-12 14:34:59 +08:00
Lyu Han	b52ba65c26	[Feature] Integrate lmdeploy pipeline api (#1198 ) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * update * update * rollback class name * update * remove unused code * update * update * fix ci check * compatibility * remove concurrency * Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py * Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md * [Bug] fix lint --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-10-09 22:58:06 +08:00
shijinpjlab	7528b8ab8a	[Feature] Add dingo test (#1529 ) * add qa dingo * update * change name qa to dingo * eval model: llm_base * update path * change name and move path * add eval_dingo * update import * add for pip * add dingo package * change import place * update import place * fix lint fail * isort * double quoted --------- Co-authored-by: sj <shijin@pjlab.org.cn>	2024-09-29 19:24:58 +08:00
Songyang Zhang	e8437db98f	[Feature] Update BailingLM/OpenAI verbose (#1568 ) * [Feature] 1. Update CoreBench Base 2. Fix lint issue in BailingAPI * Update * [Feature] Update API * Update	2024-09-27 11:15:25 +08:00
Songyang Zhang	a7bacfdf7e	[Feature] Update CoreBench 2.0 (#1566 ) * [Feature] 1. Update CoreBench Base 2. Fix lint issue in BailingAPI * Update * Update	2024-09-26 18:44:00 +08:00
Yi Ding	3f833186dc	[Feature] Support the reasoning from BaiLing LLM (#1541 ) * [Feature] Support the reasoning from BaiLing LLM This commit includes the access to BaiLing LLM and gets the reasoning. * Add the api example The example of evaluating the bailing api * Revise the generation arguments Based on current experiment, we update some generation arguments for better reasoning * [fix] set the batch size * Retry under flow control on the server side * add dependent package into requirement.txt add dependent package retrying to clean up the pre-commit check. * correct the file names and make the file copy correct the file names. copy the files under configs to opencompass * fix the lint issue --------- Co-authored-by: christopher.dy <christopher.dy@antgroup.com>	2024-09-26 16:49:52 +08:00
Linchen Xiao	80cda1980e	[BUG] fix followbench dataset config (#1564 ) * [BUG] fix followbench dataset config * [BUG] fix followbench dataset config	2024-09-25 20:58:34 +08:00
Songyang Zhang	fe84bbd9a0	[Feature] Add Config for CoreBench (#1547 ) * [Feature] Add Config for CoreBench * Update	2024-09-25 11:36:43 +08:00
liushz	83eeb52b09	[Feature] Update WikiBench base model config (#1553 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & Math base config * Update WikiBench base model config --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-25 11:26:36 +08:00
Songyang Zhang	e7681943f3	[Feature] Update the max_out_len for many models (#1559 )	2024-09-24 21:52:28 +08:00
klein	24915aeb3f	[BUG] Update CIbench config (#1544 ) * BUG: Update cibench.py * BUG: Update cibench.py	2024-09-23 18:32:27 +08:00
liushz	a0cfd61129	[Feature] Update MathBench & Math base model config (#1550 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & Math base config --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-23 14:03:59 +08:00
Songyang Zhang	5a27c2bd6f	[Model] Support Qwen2.5 Instruct (#1543 )	2024-09-19 16:16:07 +08:00
Songyang Zhang	be460fbb21	[Feature] Support OpenAI O1 models (#1539 ) * [Feature] Support OpenAI O1 models * Update README.md --------- Co-authored-by: liushz <qq1791167085@163.com>	2024-09-18 22:41:17 +08:00
liushz	2e9db77d57	[Feature] Add custom model postprocess function (#1519 ) Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:40:51 +08:00
liushz	c9a7026f59	[Feature] Update MathBench & WikiBench for FullBench (#1521 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:35:30 +08:00
Linchen Xiao	90279b6461	[Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )	2024-09-13 10:30:43 +08:00


419 Commits