OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Myhs_phz	75e7834b59	[Feature] Add Datasets: ClimateQA,Physics (#2017 ) * feat ClimateQA * feat PHYSICS * fix * fix * fix * fix	2025-04-14 20:18:47 +08:00
Linchen Xiao	6a6a1a5c0b	[Feature] LLM Judge sanity check (#2012 ) * update * update	2025-04-11 19:01:39 +08:00
bittersweet1999	3f50b1dc49	[Fix] fix order bug Update arena_hard.py (#2015 )	2025-04-11 16:59:40 +08:00
Junnan Liu	20660ab507	[Fix] Fix compare error when k is list in base_evaluator (#2010 ) * fix gpass compare error of list k * fix compare error in 177	2025-04-10 19:47:21 +08:00
Linchen Xiao	12213207b6	[Refactor] Refactorize openicl eval task (#1990 ) * [Refactor] Refactorize openicl eval task * update	2025-04-09 15:52:23 +08:00
zhulinJulia24	6ac9b06bc2	[ci] update baseline for kernal change of vllm and lmdeploy (#2011 ) * update * update * update * update * update * update * update	2025-04-09 14:09:35 +08:00
Linchen Xiao	a05f9da134	[Feature] Make dump-eval-details default behavior (#1999 ) * Update * update * update	2025-04-08 14:42:26 +08:00
Myhs_phz	fd82bea747	[Fix] OpenICL Math Evaluator Config (#2007 ) * fix * fix recommended * fix * fix * fix * fix	2025-04-08 14:38:35 +08:00
Linchen Xiao	bb58cfc85d	[Feature] Add CascadeEvaluator (#1992 ) * [Feature] Add CascadeEvaluator * update * updat	2025-04-08 11:58:14 +08:00
Jin Ye	b564e608b1	[Dataset] Add MedXpertQA (#2002 ) * Add MedXpertQA * Add MedXpertQA * Add MedXpertQA * Fix lint --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-04-08 10:44:48 +08:00
shijinpjlab	828fb745c9	[Dataset] Update dingo 1.5.0 (#2008 ) Co-authored-by: shiin <shijin@pjlab.org.cn>	2025-04-07 17:21:15 +08:00
zhulinJulia24	f982d6278e	[CI] fix baseline score (#2000 ) * update * update * update * update * update * update * update * updaste * update * update * updaste * updaste * update * update * update * update * update * update * update * update	2025-04-03 19:32:36 +08:00
Myhs_phz	3a9a384173	[Doc] Fix links between zh & en (#2001 ) * test * test * test	2025-04-03 17:37:53 +08:00
Myhs_phz	9b489e9ea0	[Update] Revert math500 dataset configs (#1998 )	2025-04-03 15:11:02 +08:00
Linchen Xiao	dc8deb6af0	[BUMP] Bump version to 0.4.2 (#1997 )	2025-04-02 17:47:15 +08:00
liushz	32d6859679	[Feature] Add olymmath dataset (#1982 ) * Add olymmath dataset * Add olymmath dataset * Add olymmath dataset * Update olymmath dataset	2025-04-02 17:34:07 +08:00
zhulinJulia24	97236c8e97	[CI] Fix baseline score (#1996 ) * update * update * update * update	2025-04-02 14:25:16 +08:00
Linchen Xiao	f66b0b347a	[Update] Requirements update (#1993 )	2025-04-02 12:03:45 +08:00
Dongsheng Zhu	330a6e5ca7	[Update] Add Intervl-8b&38b model configs (#1978 )	2025-04-01 11:51:37 +08:00
Myhs_phz	f71eb78c72	[Doc] Add TBD Token in Datasets Statistics (#1986 ) * feat * doc * doc * doc * doc	2025-03-31 19:08:55 +08:00
Linchen Xiao	0f46c35211	[Bug] Aime2024 config fix (#1974 ) * [Bug] Aime2024 config fix * fix	2025-03-25 17:57:11 +08:00
Myhs_phz	6118596362	[Feature] Add recommendation configs for datasets (#1937 ) * feat datasetrefine drop * fix datasets in fullbench_int3 * fix * fix * back * fix * fix and doc * feat * fix hook * fix * fix * fix * fix * fix * fix * fix * fix * fix * doc * fix * fix * Update dataset-index.yml	2025-03-25 14:54:13 +08:00
Linchen Xiao	07930b854a	[Update] Add Korbench config with no max_out_len (#1968 ) * Add Korbench no max_out_len * Add Korbench no max_out_len	2025-03-24 18:38:06 +08:00
Myhs_phz	37307fa996	[Update] Add QWQ32b model config (#1959 ) * feat qwq-32b * fix * feat phi_4 --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2025-03-24 14:51:39 +08:00
Linchen Xiao	db96161a4e	[Update] Add SuperGPQA subset metrics (#1966 )	2025-03-24 14:25:12 +08:00
Linchen Xiao	aa05993922	[Update] Add dataset configurations of no max_out_len (#1967 ) * [Update] Add dataset configurations of no max_out_len * update test torch version * update test torch version * update test torch version * update test torch version	2025-03-24 14:24:12 +08:00
Linchen Xiao	64128916d0	[Update] Increase memory size for CPU job of VOLC Runner (#1962 ) * [Update] Increase memory size for CPU job of VOLC Runner * [Update] Increase memory size for CPU job of VOLC Runner	2025-03-24 11:21:14 +08:00
Dongsheng Zhu	8a5029b121	[Feature] Add MultiPL-E & Code Evaluator (#1963 ) * multiple_code develop * multiple_code update * comments upadate * index upadate	2025-03-21 20:09:25 +08:00
Linchen Xiao	b9de8b0e2b	[Update] Unset disallowed_special token for Openai model (#1960 )	2025-03-18 20:24:07 +08:00
Songyang Zhang	c98599271b	[Update] Update OlympiadBench and Update LLM Judge (#1954 )	2025-03-18 20:15:20 +08:00
Jason Cheung	5d2d253d83	[BUG] Fix model_kwargs pass logic for vllm (#1958 )	2025-03-18 20:08:15 +08:00
Linchen Xiao	0b7f76e193	[Bug] Fix Summarizer logic (#1953 )	2025-03-17 18:25:08 +08:00
Yufeng Zhao	15c825a51a	[Update] Bbeh harmony summarizer added (#1951 ) * bbeh * bbeh * fix_smallbugs_bbeh * removeprint * harmonic * update_summerizer * harmonic-tested * harmonic-tested * clean * clean * cleaned_rebased --------- Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>	2025-03-17 17:19:56 +08:00
Linchen Xiao	854c6bf025	[Update] Update requirement and base evaluator	2025-03-13 20:52:50 +08:00
Linchen Xiao	1c60e3a0f6	[Update] Add configurations for llmjudge dataset (#1940 ) * Add configurations for llmjudge dataset * update	2025-03-13 17:30:04 +08:00
liushz	709bc4af0e	[Update] Add AIME2025 oss info (#1936 ) * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * update dataset path * Update olmpiadBench * Update olmpiadBench * Update olmpiadBench * Add HLE dataset * Add HLE dataset * Add HLE dataset * Add AIME2025 oss info --------- Co-authored-by: sudanl <sudanl@foxmail.com>	2025-03-12 18:41:16 +08:00
Yufeng Zhao	bc2969dba8	[Feature] Add support for BBEH dataset (#1925 ) * bbeh * bbeh * fix_smallbugs_bbeh * removeprint * results --------- Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>	2025-03-12 10:53:31 +08:00
Kangreen	59e49aedf1	[Feature] Support SuperGPQA (#1924 ) * support supergpqa * remove unnecessary code * remove unnecessary code * Add Readme * Add Readme * fix lint * fix lint * update * update --------- Co-authored-by: mkj3085003 <mkj3085003@gmail.com> Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-03-11 19:32:08 +08:00
Linchen Xiao	e403fd21be	[Fix] Fix math-verify evaluator (#1917 ) * update * update * update	2025-03-11 17:35:04 +08:00
Linchen Xiao	cbf84fb33c	[Feature] Update LLM Evaluation for MMLU-Pro (#1923 )	2025-03-07 21:01:20 +08:00
Myhs_phz	570c30cf1b	[Fix] Fix CLI option for results persistence (#1920 ) * fix * fix * fix	2025-03-07 18:24:30 +08:00
Shudong Liu	277d7946f5	[Fix] Fix typo in deepseed_r1.md (#1916 )	2025-03-05 19:37:22 +08:00
Myhs_phz	1585c0adbe	[Feature] Evaluation Results Persistence (#1894 ) * feat results_station.py * lint * feat save_to_station * feat result_station.py and lint * feat * fix * fix and lint * fix * fix subjective processing * fix * fix * style function name * lint	2025-03-05 18:33:34 +08:00
Myhs_phz	54324657f0	[Docs] Results persistance (#1908 ) * feat persistance.md * doc * doc * lint * doc * fix * doc	2025-03-05 18:23:54 +08:00
Dongsheng Zhu	fff2d51440	[Update] Code evaluation alignment (#1909 ) * code alignment * update oss md5 * bigcodebench update * lint * lint_ * lint yapf	2025-03-04 18:49:38 +08:00
Linchen Xiao	5547fd1592	[Bump] Bump version to 0.4.1	2025-03-04 18:26:14 +08:00
liushz	198c08632e	[Feature] Add HLE (Humanity's Last Exam) dataset (#1902 ) * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * update dataset path * Update olmpiadBench * Update olmpiadBench * Update olmpiadBench * Add HLE dataset * Add HLE dataset * Add HLE dataset --------- Co-authored-by: sudanl <sudanl@foxmail.com>	2025-03-04 16:42:37 +08:00
Songyang Zhang	c84bc18ac1	[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899 ) * [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify * Update * Update * Update DeepSeek-R1 example * Update DeepSeek-R1 example * Update DeepSeek-R1 example	2025-03-03 18:56:11 +08:00
Junnan Liu	f0809fe6f6	[Update] Fix Hard Configs With General GPassK (#1906 ) * support dataset repeat and g-pass compute for each evaluator * fix pre-commit errors * delete print * delete gpassk_evaluator and fix potential errors * change `repeat` to `n` * fix `repeat` to `n` in openicl_eval * update doc for multi-run and g-pass * update latex equation in doc * update eng doc for multi-run and g-pass * update datasets.md * update datasets.md * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation in zh_cn user_guides * mmodify pre-commit-zh-cn * recover pre-commit and edit math expr in doc * del [TIP] * del cite tag in doc * del extract_model param in livemathbench config * fix livemathbench hard configs	2025-03-03 18:17:15 +08:00
Linchen Xiao	6a573f671b	[Fix] Fix compatible issue	2025-03-03 15:35:57 +08:00

908 Commits