OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
jnanliu	a4c42b3cb3	Merge branch 'main' of https://github.com/open-compass/opencompass into general-gpass	2025-03-03 02:41:14 +00:00
Junnan Liu	73c80953c6	[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886 ) * support dataset repeat and g-pass compute for each evaluator * fix pre-commit errors * delete print * delete gpassk_evaluator and fix potential errors * change `repeat` to `n` * fix `repeat` to `n` in openicl_eval * update doc for multi-run and g-pass * update latex equation in doc * update eng doc for multi-run and g-pass * update datasets.md * update datasets.md * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation in zh_cn user_guides * modify pre-commit-zh-cn * recover pre-commit and edit math expr in doc * del [TIP] * del cite tag in doc * del extract_model param in livemathbench config	2025-02-26 19:43:12 +08:00
zhulinJulia24	6042b88e58	[CI] update dailytest scheduler and baseline's score(#1898 )	2025-02-26 19:04:01 +08:00
Linchen Xiao	bdb2d46f59	[Feature] Add general math, llm judge evaluator (#1892 ) * update_doc * update llm_judge * update README * update md file name	2025-02-26 15:08:50 +08:00
jnanliu	32a8d81b1d	del extract_model param in livemathbench config	2025-02-26 06:39:12 +00:00
jnanliu	66b1c6c64c	del cite tag in doc	2025-02-26 04:23:30 +00:00
jnanliu	12f46044f0	Merge branch 'general-gpass' of https://github.com/jnanliu/opencompass into general-gpass	2025-02-26 04:01:30 +00:00
jnanliu	97594676e8	del [TIP]	2025-02-26 04:01:05 +00:00
Junnan Liu	bb4d53e0cb	Merge branch 'main' into general-gpass	2025-02-26 11:56:45 +08:00
jnanliu	46cd631e13	recover pre-commit and edit math expr in doc	2025-02-26 03:53:10 +00:00
Songyang Zhang	fd6fbf01a2	[Update] Support AIME-24 Evaluation for DeepSeek-R1 series (#1888 ) * Update * Update * Update * Update	2025-02-25 20:34:41 +08:00
jnanliu	830142ecfd	modify pre-commit-zh-cn	2025-02-25 09:47:21 +00:00
jnanliu	76381c94ee	fix multi-line equation in zh_cn user_guides	2025-02-25 09:41:36 +00:00
jnanliu	7fc189d715	fix multi-line equation	2025-02-25 09:40:29 +00:00
jnanliu	fea7411820	fix multi-line equation	2025-02-25 09:39:34 +00:00
jnanliu	6a6ac3c7f7	fix multi-line equation	2025-02-25 09:35:25 +00:00
jnanliu	a7d15f8aa7	fix multi-line equation	2025-02-25 09:31:51 +00:00
jnanliu	516313d42e	fix multi-line equation	2025-02-25 09:30:21 +00:00
jnanliu	fed2df4c3e	fix multi-line equation	2025-02-25 09:29:16 +00:00
Junnan Liu	22a33d8759	[Update] Update LiveMathBench Hard Configs (#1826 ) * support G-Pass@k and livemathbench * fix bugs * fix comments of GPassKEvaluator * update saved details of GPassKEvaluator * update saved details of GPassKEvaluator * fix eval api configs & update openai_api for ease of debugging * update huggingface path * fix method name of G-Pass@k * fix default value of eval_model_name * refactor G-Pass@k evaluator * log generation params for each backend * fix evaluation resume * add notimplementerror * update livemathbench-hard configs * remove max_out_len from livemathbench_hard_greedy_gen_9befbf.py * remove max_out_len from livemathbench_hard_gen_9befbf.py * rename livemathbench_hard_gen_9befbf.py to livemathbench_hard_gen_353ae7.py * rename livemathbench_hard_greedy_gen_9befbf.py to livemathbench_hard_greedy_gen_353ae7.py * update livemathbench_gen_9befbf.py * remove whitespace * upload livemathbench hard configs	2025-02-25 17:24:36 +08:00
Junnan Liu	91111ce9ec	update datasets.md	2025-02-25 17:17:39 +08:00
Junnan Liu	2915d77045	update datasets.md	2025-02-25 17:17:07 +08:00
jnanliu	c1fe59d015	update eng doc for multi-run and g-pass	2025-02-25 09:15:08 +00:00
jnanliu	8ebb8a5d11	update latex equation in doc	2025-02-25 09:05:53 +00:00
jnanliu	4e07fcbfac	update doc for multi-run and g-pass	2025-02-25 08:21:21 +00:00
jnanliu	4e63ebbf0c	fix `repeat` to `n` in openicl_eval	2025-02-24 08:14:08 +00:00
jnanliu	b0330ef1c6	change `repeat` to `n`	2025-02-24 08:11:27 +00:00
Dongsheng Zhu	465e93e10e	[Update] Academic bench llm judge update (#1876 ) * BigCodeBench update * update LCBench * update LCBench 2 * update code * academicBench update * academic bench ifeval&math update * generic_llmjudge_aime_academic_postprocess delete * aime delete * postprocessors update * ifeval delete * update work_dir * linting * linting double-quote-string-fixer * r1-distill out_len update * fix lint --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-02-24 15:45:24 +08:00
jnanliu	2349fcff2c	delete gpassk_evaluator and fix potential errors	2025-02-24 06:25:17 +00:00
jnanliu	6d5a996deb	delete print	2025-02-23 03:25:58 +00:00
jnanliu	762b66d740	fix pre-commit errors	2025-02-23 03:14:13 +00:00
jnanliu	8def69369a	support dataset repeat and g-pass compute for each evaluator	2025-02-23 03:05:42 +00:00
Junnan Liu	046b6f75c6	[Update] Update Greedy Config & README of LiveMathBench (#1862 ) * support omni-math * update config * upload README * Delete opencompass/configs/datasets/omni_math/__init__.py * update greedy config & README of LiveMathBench * update intro for max_out_len * rename livemathbench greedy confi * delete greedy config --------- Co-authored-by: liushz <qq1791167085@163.com>	2025-02-20 19:47:04 +08:00
Linchen Xiao	d7daee6e25	[Update] OpenAI model update, bigcodebench update (#1879 ) * [Update] Openai model update, bigcodebench update * update	2025-02-20 19:33:25 +08:00
Linchen Xiao	27c916661d	[Feature] Math Verify with model post_processor (#1881 ) * update * [Feature] Update model post_processor * update * update * update	2025-02-20 19:32:12 +08:00
zhulinJulia24	bc22749fd8	[CI] update daily test scores (#1870 ) * update * Update daily-run-test.yml * Update dlc.py	2025-02-20 14:08:18 +08:00
bittersweet1999	f407930475	[Feature] Support subjective evaluation for reasoning model (#1868 ) * fix pip version * fix pip version * add subeval for reasoning model * add subeval for reasoning model * update configs * update config * update config * update config * update files	2025-02-20 12:19:46 +08:00
Myhs_phz	68a9838907	[Feature] Add list of supported datasets at html page (#1850 ) * feat dataset-index.yml and stat.py * fix * fix * fix * feat url of paper and config file * doc all supported dataset list * docs zh and en * docs README zh and en * docs new_dataset * docs new_dataset	2025-02-14 16:17:30 +08:00
Dongsheng Zhu	3fd8b4e0cd	[Update] Update BigCodeBench & LCBench load path (#1857 ) * BigCodeBench update * update LCBench * update LCBench 2 * update code	2025-02-08 15:15:47 +08:00
Pablo Hinojosa	9c2e6a192c	[Fix] Update broken links in README.md (#1852 )	2025-02-07 15:41:08 +08:00
zhulinJulia24	ffc04cf650	[CI] Update daily-run-test.yml (#1854 )	2025-02-07 14:40:16 +08:00
Linchen Xiao	862bf78464	[Demo] Internlm3 math500 thinking demo (#1846 ) * [Demo] Add demo for Internlm3 math500 thinking * [Demo] Add demo for Internlm3 math500 thinking * update max_out_len * update start instruction	2025-01-24 14:56:41 +08:00
Shudong Liu	412199f802	[Feature] Support OlympiadBench Benchmark (#1841 ) * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * update dataset path * Update olmpiadBench * Update olmpiadBench * Update olmpiadBench --------- Co-authored-by: liushz <qq1791167085@163.com>	2025-01-24 10:00:01 +08:00
Junnan Liu	70f2c963d3	[Feature] Support Omni-Math (#1837 ) * support omni-math * update config * upload README * Delete opencompass/configs/datasets/omni_math/__init__.py --------- Co-authored-by: liushz <qq1791167085@163.com>	2025-01-23 18:36:54 +08:00
Linchen Xiao	35ec307c6b	[Bump] Bump version to 0.4.0 (#1838 )	2025-01-22 11:41:46 +08:00
Linchen Xiao	03415b2a66	[Fix] Update max_out_len logic for OpenAI model (#1839 )	2025-01-21 15:46:14 +08:00
Linchen Xiao	a6193b4c02	[Refactor] Code refactorization (#1831 ) * Update * fix lint * update * fix lint	2025-01-20 19:17:38 +08:00
Jishnu Nair	ffdc917523	[Doc] Installation.md update (#1830 )	2025-01-17 11:08:09 +08:00
Myhs_phz	70da9b7776	[Update] Update method to add dataset in docs (#1827 ) * create new branch * docs new_dataset.md zh * docs new_dataset.md zh and en	2025-01-17 11:07:19 +08:00
Linchen Xiao	531643e771	[Feature] Add support for InternLM3 (#1829 ) * update * update * update * update	2025-01-16 14:28:27 +08:00

884 Commits