OpenCompass

Mirror of https://github.com/open-compass/opencompass.git (synced 2025-05-30 16:03:24 +08:00)

Author	SHA1	Message	Date
jnanliu	b0330ef1c6	change `repeat` to `n`	2025-02-24 08:11:27 +00:00
jnanliu	2349fcff2c	delete gpassk_evaluator and fix potential errors	2025-02-24 06:25:17 +00:00
jnanliu	8def69369a	support dataset repeat and g-pass compute for each evaluator	2025-02-23 03:05:42 +00:00
Junnan Liu	046b6f75c6	[Update] Update Greedy Config & README of LiveMathBench (#1862) * support omni-math * update config * upload README * Delete opencompass/configs/datasets/omni_math/__init__.py * update greedy config & README of LiveMathBench * update intro for max_out_len * rename livemathbench greedy config * delete greedy config --------- Co-authored-by: liushz <qq1791167085@163.com>	2025-02-20 19:47:04 +08:00
Songyang Zhang	f1e50d4bf0	[Update] Update LiveMathBench (#1809) * Update LiveMathBench * Update New O1 Evaluation * Update O1 evaluation	2025-01-07 19:16:12 +08:00
Songyang Zhang	98435dd98e	[Feature] Update o1 evaluation with JudgeLLM (#1795) * Update Generic LLM Evaluator * Update o1 style evaluator	2024-12-30 17:31:00 +08:00
Junnan Liu	8e8d4f1c64	[Feature] Support G-Pass@k and LiveMathBench (#1772) * support G-Pass@k and livemathbench * fix bugs * fix comments of GPassKEvaluator * update saved details of GPassKEvaluator * update saved details of GPassKEvaluator * fix eval api configs & update openai_api for ease of debugging * update huggingface path * fix method name of G-Pass@k * fix default value of eval_model_name * refactor G-Pass@k evaluator * log generation params for each backend * fix evaluation resume * add notimplementerror	2024-12-30 16:59:39 +08:00
Songyang Zhang	0d8df541bc	[Update] Update O1-style Benchmark and Prompts (#1742) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update * Update * Update * Update	2024-12-09 13:48:56 +08:00
Junnan Liu	f333be177c	[Update] Add MATH500 & AIME2024 to LiveMathBench (#1741) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500 * update std computation & k-pass computation * add AIME2024 * update README	2024-12-06 14:36:49 +08:00
Junnan Liu	6181ac1122	[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation (#1730) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500	2024-12-05 16:54:16 +08:00
Junnan Liu	fe6d76fb13	[Feature] Support LiveMathBench (#1727)	2024-11-30 00:07:19 +08:00

11 Commits