OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Junnan Liu	73c80953c6	[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886 ) * support dataset repeat and g-pass compute for each evaluator * fix pre-commit errors * delete print * delete gpassk_evaluator and fix potential errors * change `repeat` to `n` * fix `repeat` to `n` in openicl_eval * update doc for multi-run and g-pass * update latex equation in doc * update eng doc for multi-run and g-pass * update datasets.md * update datasets.md * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation in zh_cn user_guides * mmodify pre-commit-zh-cn * recover pre-commit and edit math expr in doc * del [TIP] * del cite tag in doc * del extract_model param in livemathbench config	2025-02-26 19:43:12 +08:00
Junnan Liu	22a33d8759	[Update] Update LiveMathBench Hard Configs (#1826 ) * support G-Pass@k and livemathbench * fix bugs * fix comments of GPassKEvaluator * update saved details of GPassKEvaluator * update saved details of GPassKEvaluator * fix eval api configs & update openai_api for ease of debugging * update huggingface path * fix method name of G-Pass@k * fix default value of eval_model_name * refactor G-Pass@k evaluator * log generation params for each backend * fix evaluation resume * add notimplementerror * update livemathbench-hard configs * remove max_out_len from livemathbench_hard_greedy_gen_9befbf.py * remove max_out_len from livemathbench_hard_gen_9befbf.py * rename livemathbench_hard_gen_9befbf.py to livemathbench_hard_gen_353ae7.py * rename livemathbench_hard_greedy_gen_9befbf.py to livemathbench_hard_greedy_gen_353ae7.py * update livemathbench_gen_9befbf.py * remove whitespace * upload livemathbench hard configs	2025-02-25 17:24:36 +08:00
Songyang Zhang	8fdb72f567	[Update] Update o1 eval prompt (#1806 ) * Update XML prediction post-process * Update LiveMathBench * Update LiveMathBench * Update New O1 Evaluation	2025-01-07 00:14:32 +08:00
Junnan Liu	8e8d4f1c64	[Feature] Support G-Pass@k and LiveMathBench (#1772 ) * support G-Pass@k and livemathbench * fix bugs * fix comments of GPassKEvaluator * update saved details of GPassKEvaluator * update saved details of GPassKEvaluator * fix eval api configs & update openai_api for ease of debugging * update huggingface path * fix method name of G-Pass@k * fix default value of eval_model_name * refactor G-Pass@k evaluator * log generation params for each backend * fix evaluation resume * add notimplementerror	2024-12-30 16:59:39 +08:00
Songyang Zhang	0d8df541bc	[Update] Update O1-style Benchmark and Prompts (#1742 ) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update * Update * Update * Update	2024-12-09 13:48:56 +08:00
Junnan Liu	f333be177c	[Update] Add MATH500 & AIME2024 to LiveMathBench (#1741 ) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500 * update std computation & k-pass computation * add AIME224 * update README	2024-12-06 14:36:49 +08:00
Songyang Zhang	fb43dd1906	[Update] Update Skywork/Qwen-QwQ (#1728 ) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update	2024-12-05 19:30:43 +08:00
Junnan Liu	6181ac1122	[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation (#1730 ) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500	2024-12-05 16:54:16 +08:00
Junnan Liu	fe6d76fb13	[Feature] Support LiveMathBench (#1727 )	2024-11-30 00:07:19 +08:00

9 Commits