OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Junnan Liu	8e8d4f1c64	[Feature] Support G-Pass@k and LiveMathBench (#1772) * support G-Pass@k and livemathbench * fix bugs * fix comments of GPassKEvaluator * update saved details of GPassKEvaluator * update saved details of GPassKEvaluator * fix eval api configs & update openai_api for ease of debugging * update huggingface path * fix method name of G-Pass@k * fix default value of eval_model_name * refactor G-Pass@k evaluator * log generation params for each backend * fix evaluation resume * add notimplementerror	2024-12-30 16:59:39 +08:00
Songyang Zhang	0d8df541bc	[Update] Update O1-style Benchmark and Prompts (#1742) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update * Update * Update * Update	2024-12-09 13:48:56 +08:00
Junnan Liu	f333be177c	[Update] Add MATH500 & AIME2024 to LiveMathBench (#1741) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500 * update std computation & k-pass computation * add AIME224 * update README	2024-12-06 14:36:49 +08:00
Songyang Zhang	fb43dd1906	[Update] Update Skywork/Qwen-QwQ (#1728) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update	2024-12-05 19:30:43 +08:00
Junnan Liu	6181ac1122	[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation (#1730) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500	2024-12-05 16:54:16 +08:00
Junnan Liu	fe6d76fb13	[Feature] Support LiveMathBench (#1727)	2024-11-30 00:07:19 +08:00

6 Commits