OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Myhs_phz	6f3c670b99	add qwen3 lmdeply (#2126 )	2025-05-27 19:41:13 +08:00
Jin Ye	6097186a95	[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064 ) * Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 * Fix bugs for MedQA. Add info in dataset-index * Add version code for MedQA and ProteinLMBench * Add version code for MedQA and ProteinLMBench	2025-05-09 14:47:44 +08:00
Mo Li	ff3275edf0	[Update] Add Long-Context configs for Gemma, OREAL, and Qwen2.5 models (#2048 ) * [Update] Update Gemma, Oreal, Qwen Config * fix lint	2025-05-08 19:06:56 +08:00
Dongsheng Zhu	d62b69aaef	[Fix] Fix InternVL model config (#2068 ) * intervl-8b&38b * intervl adjustment * internvl fix	2025-05-07 15:51:18 +08:00
zhulinJulia24	f982d6278e	[CI] fix baseline score (#2000 ) * update * update * update * update * update * update * update * updaste * update * update * updaste * updaste * update * update * update * update * update * update * update * update	2025-04-03 19:32:36 +08:00
Dongsheng Zhu	330a6e5ca7	[Update] Add Intervl-8b&38b model configs (#1978 )	2025-04-01 11:51:37 +08:00
Myhs_phz	37307fa996	[Update] Add QWQ32b model config (#1959 ) * feat qwq-32b * fix * feat phi_4 --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2025-03-24 14:51:39 +08:00
Dongsheng Zhu	8a5029b121	[Feature] Add MultiPL-E & Code Evaluator (#1963 ) * multiple_code develop * multiple_code update * comments upadate * index upadate	2025-03-21 20:09:25 +08:00
Dongsheng Zhu	465e93e10e	[Update] Academic bench llm judge update (#1876 ) * BigCodeBench update * update LCBench * update LCBench 2 * update code * academicBench update * academic bench ifeval&math update * generic_llmjudge_aime_academic_postprocess delete * aime delete * postprocessors update * ifeval delete * update work_dir * linting * linting double-quote-string-fixer * r1-distill out_len update * fix lint --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-02-24 15:45:24 +08:00
Linchen Xiao	a6193b4c02	[Refactor] Code refactoarization (#1831 ) * Update * fix lint * update * fix lint	2025-01-20 19:17:38 +08:00
Linchen Xiao	531643e771	[Feature] Add support for InternLM3 (#1829 ) * update * update * update * update	2025-01-16 14:28:27 +08:00
Linchen Xiao	eadbdcb4cb	[Update] Update requirement and deepseek configurations (#1764 )	2024-12-17 10:16:47 +08:00
Songyang Zhang	0d8df541bc	[Update] Update O1-style Benchmark and Prompts (#1742 ) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update * Update * Update * Update	2024-12-09 13:48:56 +08:00
Songyang Zhang	fb43dd1906	[Update] Update Skywork/Qwen-QwQ (#1728 ) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update	2024-12-05 19:30:43 +08:00
Linchen Xiao	e2a290fd46	[Bump] Bump version to 0.3.7 (#1733 )	2024-12-03 19:34:57 +08:00
Linchen Xiao	9de27b4d85	[Update] Update max_out_len for datasets (#1726 ) * [Update] Update max_out_len for datasets * Update eval_regression_chat_objective_fullbench.py * Update eval_regression_chat.py * Update eval_regression_chat.py * Update oc_score_baseline_fullbench.yaml --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>	2024-12-02 11:42:07 +08:00
Yi Ding	bcb707dbfc	[Fix] Fix BailingAPI model (#1707 ) * [fix] sequence under the multiple samples * resolve the lint problems * change the parameter name * add another error code for retry * output the log for invalid response * format correction * update * update * update * update * add two model python files * update the default parameter * use random for delay * update the api example of bailing * remove the unnecessary parameter	2024-11-26 19:24:47 +08:00
Linchen Xiao	ef695e28e5	[Bug] Fix Korbench dataset module (#1717 )	2024-11-26 17:13:28 +08:00
Linchen Xiao	500fb1032a	[Update] Update configurations (#1704 )	2024-11-21 16:51:18 +08:00
Linchen Xiao	835bf75a36	[Feature] Add long context evaluation for base models (#1666 ) * [Update] Add base long context evaluation * update	2024-11-08 10:53:29 +08:00
Linchen Xiao	695738a89b	[Update] Add lmdeploy DeepSeek configs (#1656 ) * [Update] Add lmdeploy DeepSeek configs * update max out length	2024-11-01 15:34:23 +08:00
Linchen Xiao	5212ffe8e2	[Update] Add new model configs (#1653 )	2024-10-30 17:24:53 +08:00
Linchen Xiao	df57c08ccf	[Feature] Update Models, Summarizers (#1600 )	2024-10-29 18:37:15 +08:00
Songyang Zhang	a4d5a6c81b	[Feature] Support LiveCodeBench (#1617 ) * Update * Update LCB * Update * Update * Update * Update * Update	2024-10-21 20:50:39 +08:00
Linchen Xiao	096c347e7d	[Fix] Qwen 2.5 model config (#1626 ) * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config	2024-10-21 16:58:18 +08:00
Lyu Han	b52ba65c26	[Feature] Integrate lmdeploy pipeline api (#1198 ) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * update * update * rollback class name * update * remove unused code * update * update * fix ci check * compatibility * remove concurrency * Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py * Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md * [Bug] fix lint --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-10-09 22:58:06 +08:00
Songyang Zhang	a7bacfdf7e	[Feature] Update CoreBench 2.0 (#1566 ) * [Feature] 1. Update CoreBench Base\n 2. Fix lint issue in BalingAPI * Update * Update	2024-09-26 18:44:00 +08:00
Yi Ding	3f833186dc	[Feature] Support the reasoning from BaiLing LLM (#1541 ) * [Feature] Support the reasoning from BaiLing LLM This commit includes the access to BaiLing LLM and gets the reasoning. * Add the api example The example of evalute bailing api * Revise the generation arguments Based on current experiment, we update some generation arguments for better reasoning * [fix] set the batch size * Retry under flowcontrol of serverside * add dependent package into requirement.txt add dependent package retrying to clean up the pre-comment check. * correct the file names and make the file copy correct the file names. copy the files under configs to opencompass * fix the lint issue --------- Co-authored-by: christopher.dy <christopher.dy@antgroup.com>	2024-09-26 16:49:52 +08:00
Songyang Zhang	e7681943f3	[Feature] Update the max_out_len for many models (#1559 )	2024-09-24 21:52:28 +08:00
Songyang Zhang	5a27c2bd6f	[Model] Support Qwen2.5 Instruct (#1543 )	2024-09-19 16:16:07 +08:00
Songyang Zhang	be460fbb21	[Feature] Support OpenAI O1 models (#1539 ) * [Feature] Support OpenAI O1 models * Update README.md --------- Co-authored-by: liushz <qq1791167085@163.com>	2024-09-18 22:41:17 +08:00
Songyang Zhang	6997990c93	[Feature] Update Models (#1518 ) * Update Models * Update * Update humanevalx * Update * Update	2024-09-12 23:35:30 +08:00
Linchen Xiao	da74cbfa39	[Fix] Model configs update	2024-09-04 18:57:10 +08:00
Linchen Xiao	245664f4c0	[Feature] Fullbench v0.1 language update (#1463 ) * update * update * update * update	2024-08-28 14:01:05 +08:00
Songyang Zhang	7c2d25b557	[Fix] Update SciCode and Gemma model (#1449 ) * [Fix] Update SciCode and Gemma model * Update * Update	2024-08-23 10:42:27 +08:00
Linchen Xiao	8e55c9c6ee	[Update] Compassbench v1.3 (#1396 ) * stash files * compassbench subjective evaluation added * evaluation update * fix lint * update docs * Update lint * changes saved * changes saved * CompassBench subjective summarizer added (#1349) * subjective summarizer added * fix lint [Fix] Fix MathBench (#1351) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> [Update] Update model support list (#1353) * fix pip version * fix pip version * update model support subjective summarizer updated knowledge, math objective done (data need update) remove secrets objective changes saved knowledge data added * secrets removed * changed added * summarizer modified * summarizer modified * compassbench coding added * fix lint * objective summarizer updated * compass_bench_v1.3 updated * update files in config folder * remove unused model * lcbench modified * removed model evaluation configs * remove duplicated sdk implementation --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-08-12 19:09:19 +08:00
Songyang Zhang	c81329b548	[Fix] Fix Slurm ENV (#1392 ) 1. Support Slurm Cluster 2. Support automatic data download 3. Update InternLM2.5-1.8B/20B-Chat	2024-08-06 01:35:20 +08:00
Songyang Zhang	c09fc79ba8	[Feature] Support OpenAI ChatCompletion (#1389 ) * [Feature] Support import configs/models/summarizers from whl * Update * Update openai sdk * Update * Update gemma	2024-08-01 19:10:13 +08:00
Songyang Zhang	46cc7894e1	[Feature] Support import configs/models/summarizers from whl (#1376 ) * [Feature] Support import configs/models/summarizers from whl * Update LCBench configs * Update * Update * Update * Update * update * Update * Update * Update * Update * Update	2024-08-01 00:42:48 +08:00

39 Commits