Linchen Xiao
e8bc8c1e8c
[Bug] Concat OpenAI SDK reasoning content ( #2041 )
...
* [Bug] Concat OpenAI SDK reasoning content
* [Bug] Concat OpenAI SDK reasoning content
* update
* update
2025-04-25 14:10:33 +08:00
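The fix above concatenates the `reasoning_content` field that reasoning models expose through OpenAI-compatible APIs. A minimal sketch of the idea, assuming streamed deltas shaped like the SDK's chunk objects (the helper and the sample stream below are hypothetical, not the repository's code):

```python
def concat_reasoning(chunks):
    """Accumulate reasoning_content and content across streamed deltas.

    Reasoning models served behind OpenAI-compatible endpoints emit the
    chain of thought in `reasoning_content` alongside the final answer in
    `content`; both fields must be concatenated chunk by chunk.
    """
    reasoning, content = [], []
    for delta in chunks:
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            content.append(delta["content"])
    return "".join(reasoning), "".join(content)

# Simulated stream: reasoning arrives first, then the answer.
stream = [
    {"reasoning_content": "First, "},
    {"reasoning_content": "check parity."},
    {"content": "The answer "},
    {"content": "is 4."},
]
reasoning, answer = concat_reasoning(stream)
```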
Linchen Xiao
65ff602cf5
[Update] Fix LLM Judge metrics calculation & Add reasoning content concat to OpenAI SDK
2025-04-15 11:33:16 +08:00
Linchen Xiao
f66b0b347a
[Update] Requirements update ( #1993 )
2025-04-02 12:03:45 +08:00
Linchen Xiao
b9de8b0e2b
[Update] Unset disallowed_special token for OpenAI model ( #1960 )
2025-03-18 20:24:07 +08:00
Songyang Zhang
c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard ( #1899 )
...
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify
* Update
* Update
* Update DeepSeek-R1 example
* Update DeepSeek-R1 example
* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
Junnan Liu
22a33d8759
[Update] Update LiveMathBench Hard Configs ( #1826 )
...
* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add NotImplementedError
* update livemathbench-hard configs
* remove max_out_len from livemathbench_hard_greedy_gen_9befbf.py
* remove max_out_len from livemathbench_hard_gen_9befbf.py
* rename livemathbench_hard_gen_9befbf.py to livemathbench_hard_gen_353ae7.py
* rename livemathbench_hard_greedy_gen_9befbf.py to livemathbench_hard_greedy_gen_353ae7.py
* update livemathbench_gen_9befbf.py
* remove whitespace
* upload livemathbench hard configs
2025-02-25 17:24:36 +08:00
Linchen Xiao
d7daee6e25
[Update] OpenAI model update, bigcodebench update ( #1879 )
...
* [Update] Openai model update, bigcodebench update
* update
2025-02-20 19:33:25 +08:00
Junnan Liu
70f2c963d3
[Feature] Support Omni-Math ( #1837 )
...
* support omni-math
* update config
* upload README
* Delete opencompass/configs/datasets/omni_math/__init__.py
---------
Co-authored-by: liushz <qq1791167085@163.com>
2025-01-23 18:36:54 +08:00
Linchen Xiao
03415b2a66
[Fix] Update max_out_len logic for OpenAI model ( #1839 )
2025-01-21 15:46:14 +08:00
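The max_out_len fix above concerns budgeting generation length against the model's context window. A sketch of the usual clamping logic, with hypothetical names (the PR adjusts similar logic inside the OpenAI model wrapper, not this exact helper):

```python
def effective_max_out_len(max_seq_len, prompt_len, requested_max_out_len):
    """Clamp the requested generation budget to what the context window
    actually leaves after the prompt tokens (illustrative helper)."""
    remaining = max_seq_len - prompt_len
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested_max_out_len, remaining)
```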
Linchen Xiao
a6193b4c02
[Refactor] Code refactorization ( #1831 )
...
* Update
* fix lint
* update
* fix lint
2025-01-20 19:17:38 +08:00
Linchen Xiao
117dc500ad
[Feature] Add Longbenchv2 support ( #1801 )
...
* Create eval_longbenchv2.py
* Create longbenchv2_gen.py
* Update __init__.py
* Create longbenchv2.py
* Update datasets_info.py
* update
* update
* update
* update
* update
* update
---------
Co-authored-by: abrohamLee <146956824+abrohamLee@users.noreply.github.com>
2025-01-03 12:04:29 +08:00
Junnan Liu
8e8d4f1c64
[Feature] Support G-Pass@k and LiveMathBench ( #1772 )
...
* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add NotImplementedError
2024-12-30 16:59:39 +08:00
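G-Pass@k generalizes Pass@k: among k samples drawn without replacement from n generations of which c are correct, it asks for at least ⌈τ·k⌉ correct ones, which yields a hypergeometric sum. A minimal sketch of that formula, assuming the standard definition (not the GPassKEvaluator's exact code):

```python
from math import ceil, comb

def g_pass_at_k(n, c, k, tau):
    """Probability that at least ceil(tau * k) of k samples drawn without
    replacement from n generations (c of them correct) are correct.
    Hypergeometric form of G-Pass@k; a sketch under the standard definition."""
    m = ceil(tau * k)
    total = comb(n, k)
    # Empty range (m > min(c, k)) correctly yields 0.0.
    return sum(comb(c, j) * comb(n - c, k - j)
               for j in range(m, min(c, k) + 1)) / total
```

With tau=1.0 this reduces to requiring all k samples correct, e.g. c=n gives 1.0.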
Junnan Liu
499302857f
[Fix] Fix Local Runner Params Save Path ( #1768 )
...
* update local runner params save dir
* fix remove
* fix directory remove
* Fix *_params.py by uuid4
2024-12-19 16:07:34 +08:00
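The last bullet above de-duplicates `*_params.py` files with uuid4 so concurrent tasks don't overwrite each other's parameter dumps. A sketch of the naming scheme (names here are illustrative, not the local runner's exact code):

```python
import uuid
from pathlib import Path

def params_file(out_dir, task_name):
    """Build a collision-free *_params.py path for a launched task.
    uuid4 guarantees concurrent runner tasks write distinct files."""
    return Path(out_dir) / f"{task_name}_{uuid.uuid4().hex}_params.py"
```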
Songyang Zhang
0d8df541bc
[Update] Update O1-style Benchmark and Prompts ( #1742 )
...
* Update JuderBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
* Update
* Update
* Update
2024-12-09 13:48:56 +08:00
Songyang Zhang
fb43dd1906
[Update] Update Skywork/Qwen-QwQ ( #1728 )
...
* Update JuderBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
2024-12-05 19:30:43 +08:00
Linchen Xiao
9de27b4d85
[Update] Update max_out_len for datasets ( #1726 )
...
* [Update] Update max_out_len for datasets
* Update eval_regression_chat_objective_fullbench.py
* Update eval_regression_chat.py
* Update eval_regression_chat.py
* Update oc_score_baseline_fullbench.yaml
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
2024-12-02 11:42:07 +08:00
Yi Ding
bcb707dbfc
[Fix] Fix BailingAPI model ( #1707 )
...
* [fix] sequence under the multiple samples
* resolve the lint problems
* change the parameter name
* add another error code for retry
* output the log for invalid response
* format correction
* update
* update
* update
* update
* add two model python files
* update the default parameter
* use random for delay
* update the api example of bailing
* remove the unnecessary parameter
2024-11-26 19:24:47 +08:00
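Two of the bullets above — retrying on another error code and using a random delay — follow a common API-client pattern: retry only on retryable statuses, with a randomized sleep so concurrent workers don't retry in lockstep. A sketch of the pattern (error codes and names are illustrative, not the BailingAPI model's exact code):

```python
import random
import time

RETRYABLE = {429, 503}  # illustrative retryable statuses

def call_with_retry(request, max_retries=3, base_delay=1.0):
    """Retry a flaky API call on retryable status codes, sleeping a
    randomized, growing delay between attempts to avoid thundering-herd
    retries from parallel workers."""
    for attempt in range(max_retries + 1):
        status, body = request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_retries:
            time.sleep(random.uniform(0, base_delay * (attempt + 1)))
    return status, body
```

The commit itself uses the `retrying` package for this; the loop above just makes the control flow explicit.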
Songyang Zhang
f97c4eae42
[Update] Update Fullbench ( #1712 )
...
* Update JuderBench
* Support O1-style Prompts
* Update Code
2024-11-26 14:26:55 +08:00
Linchen Xiao
80e3b9ef37
[Update] Add math prm 800k ( #1708 )
2024-11-21 21:29:43 +08:00
Linchen Xiao
500fb1032a
[Update] Update configurations ( #1704 )
2024-11-21 16:51:18 +08:00
Yi Ding
05044dfaf2
[Update] Support new error code for Bailing model ( #1702 )
...
* support new error code
* fix the lint problems
2024-11-20 16:40:22 +08:00
bittersweet1999
aca8ec3c6a
[Hotfix] Hotfix ( #1683 )
...
* fix pip version
* fix pip version
* fix lint
* hotfix
2024-11-13 10:14:27 +08:00
sobeit
3ec178f4a9
Add single LoRA adapter support for vLLM inference ( #1679 )
2024-11-12 17:31:36 +08:00
bittersweet1999
17b5e52f6c
[Hotfix] lmdeploy temp ( #1674 )
...
* fix pip version
* fix pip version
* hotfix
2024-11-12 16:10:16 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models ( #1666 )
...
* [Update] Add base long context evaluation
* update
2024-11-08 10:53:29 +08:00
Chang Cheng
fd7aa83c01
[Update] Update DLC Runner( #1662 )
...
* push interntrain hard code
* push interntrain hard code
* remove redundant post process
---------
Co-authored-by: changcheng <changcheng@pjlab.org.cb>
Co-authored-by: changcheng <changcheng@pjlab.org.cn>
2024-11-07 15:45:35 +08:00
Lyu Han
888f1f3bef
[Fix] Update loglikelihood compatibility ( #1659 )
2024-11-02 17:19:11 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers ( #1600 )
2024-10-29 18:37:15 +08:00
Lyu Han
fb12c3f98a
[Update] strip stop_words ( #1635 )
2024-10-24 20:39:20 +08:00
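Stripping stop words means cutting a generation at the first stop sequence and trimming whitespace; stripping the stop words themselves first lets entries with stray newlines still match. A minimal sketch of that behavior, under the assumption that this is what the commit adjusts (the helper is hypothetical):

```python
def truncate_at_stop_words(text, stop_words):
    """Cut a generation at the earliest stop word and strip surrounding
    whitespace. Stop words are stripped before matching so entries like
    '\n<|im_end|>' still fire."""
    cut = len(text)
    for sw in stop_words:
        sw = sw.strip()
        if not sw:
            continue
        idx = text.find(sw)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()
```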
Chenguang Li
5868d5afa4
[Bug] Fix-NPU-Support ( #1618 )
...
* bugfix NPU support
* formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-10-21 17:42:53 +08:00
Lyu Han
6e8adf5221
[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator ( #1623 )
...
* remove prefix bos_token from messages when using lmdeploy as the accelerator
* update
2024-10-19 20:03:47 +08:00
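The fix above avoids a doubled BOS token: lmdeploy's tokenizer prepends its own BOS, so a textual bos_token left in the rendered prompt would be encoded a second time. A sketch of the removal (illustrative helper, not lmdeploy's API):

```python
def drop_duplicate_bos(prompt, bos_token="<s>"):
    """Remove a leading textual bos_token from the rendered prompt,
    since the serving tokenizer adds its own BOS during encoding."""
    if bos_token and prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt
```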
x54-729
2b1afa7d1e
[Fix] fix interntrain's tokenizer truncate ( #1605 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-10-15 16:03:57 +08:00
Lyu Han
4fde41036f
[Feature] Update TurboMindModel by integrating lmdeploy pipeline API ( #1556 )
...
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* use pipeline
* fix ci check
* compatibility
* compatibility
* remove concurrency
* update
* fix table content
* update
2024-10-14 15:33:40 +08:00
Lyu Han
b52ba65c26
[Feature] Integrate lmdeploy pipeline api ( #1198 )
...
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* fix ci check
* compatibility
* remove concurrency
* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py
* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md
* [Bug] fix lint
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-10-09 22:58:06 +08:00
x54-729
4d6349dfe1
[FIX] fix interntrain get_loglikelihood ( #1584 )
2024-10-08 11:34:04 +08:00
x54-729
bbdca5eb4c
[BUG] Fix eos token handling and add comments for InternTrain ( #1569 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-30 15:46:06 +08:00
Yi Ding
85a28874aa
[BUG]: Fix Bailing API configs ( #1570 )
2024-09-27 11:56:57 +08:00
Songyang Zhang
e8437db98f
[Feature] Update BailingLM/OpenAI verbose ( #1568 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* [Feature] Update API
* Update
2024-09-27 11:15:25 +08:00
Songyang Zhang
7d50294117
[Feature] Update Bailing ( #1567 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* Update
* Update
2024-09-26 18:56:17 +08:00
Songyang Zhang
a7bacfdf7e
[Feature] Update CoreBench 2.0 ( #1566 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* Update
2024-09-26 18:44:00 +08:00
Yi Ding
3f833186dc
[Feature] Support the reasoning from BaiLing LLM ( #1541 )
...
* [Feature] Support the reasoning from BaiLing LLM
This commit adds access to the BaiLing LLM and retrieves its reasoning.
* Add the API example
An example of evaluating the Bailing API
* Revise the generation arguments
Based on current experiments, we update some generation arguments for better reasoning
* [fix] set the batch size
* Retry under server-side flow control
* add dependent package into requirements.txt
Add the dependent package `retrying` to clean up the pre-commit check.
* correct the file names and make the file copy
Correct the file names.
Copy the files under configs to opencompass
* fix the lint issue
---------
Co-authored-by: christopher.dy <christopher.dy@antgroup.com>
2024-09-26 16:49:52 +08:00
x54-729
335667183a
[Feature] Add Interntrain model support ( #1548 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-23 19:10:26 +08:00
Songyang Zhang
ee058e25b2
[Feature] Support verbose for OpenAI API ( #1546 )
2024-09-20 17:12:52 +08:00
hailsham
a81bbb85bf
[FIX] Added handling for the "begin section" in meta_template to APITemplateParser ( #1405 )
...
Co-authored-by: leifei <nuuooo@icloud.com>
2024-09-19 18:12:04 +08:00
Songyang Zhang
be460fbb21
[Feature] Support OpenAI O1 models ( #1539 )
...
* [Feature] Support OpenAI O1 models
* Update README.md
---------
Co-authored-by: liushz <qq1791167085@163.com>
2024-09-18 22:41:17 +08:00
zhulinJulia24
3754dc1b67
update ( #1522 )
...
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-12 15:00:52 +08:00
Albert Yan
928d0cfc3a
[Feature] Add support for Rendu API ( #1468 )
...
* Add support for Rendu API
* fix lint issue
* fix lint issue
* fix lint issue
* Update
---------
Co-authored-by: 13190 <zeyu.yan@transn.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-09-06 01:00:43 +08:00
Maxime SHE
45efdc994d
[Feature] Add an attribute api_key into TurboMindAPIModel default None ( #1475 )
...
Co-authored-by: Maxime <maximeshe@163.com>
Add an attribute api_key (default None) to TurboMindAPIModel so the api_key can be set when using lmdeploy to deploy the LLM
2024-09-05 17:51:16 +08:00
zhulinJulia24
716d46e1f5
[ci] fix badcase and add env info ( #1491 )
...
* update
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-05 16:43:45 +08:00
zhulinJulia24
fb6a0df652
[ci] fix test env for vllm and add vllm baselines ( #1481 )
...
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-04 19:24:09 +08:00