OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Songyang Zhang	aa2b89b6f8	[Update] Add CascadeEvaluator with Data Replica (#2022 ) * Update CascadeEvaluator * Update CascadeEvaluator * Update CascadeEvaluator * Update Config * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update	2025-05-20 16:46:55 +08:00
Linchen Xiao	a05f9da134	[Feature] Make dump-eval-details default behavior (#1999 ) * Update * update * update	2025-04-08 14:42:26 +08:00
Myhs_phz	fd82bea747	[Fix] OpenICL Math Evaluator Config (#2007 ) * fix * fix recommended * fix * fix * fix * fix	2025-04-08 14:38:35 +08:00
Linchen Xiao	bb58cfc85d	[Feature] Add CascadeEvaluator (#1992 ) * [Feature] Add CascadeEvaluator * update * updat	2025-04-08 11:58:14 +08:00
Myhs_phz	3a9a384173	[Doc] Fix links between zh & en (#2001 ) * test * test * test	2025-04-03 17:37:53 +08:00
Myhs_phz	f71eb78c72	[Doc] Add TBD Token in Datasets Statistics (#1986 ) * feat * doc * doc * doc * doc	2025-03-31 19:08:55 +08:00
Myhs_phz	6118596362	[Feature] Add recommendation configs for datasets (#1937 ) * feat datasetrefine drop * fix datasets in fullbench_int3 * fix * fix * back * fix * fix and doc * feat * fix hook * fix * fix * fix * fix * fix * fix * fix * fix * fix * doc * fix * fix * Update dataset-index.yml	2025-03-25 14:54:13 +08:00
Songyang Zhang	c98599271b	[Update] Update OlympiadBench and Update LLM Judge (#1954 )	2025-03-18 20:15:20 +08:00
Shudong Liu	277d7946f5	[Fix] Fix typo in deepseed_r1.md (#1916 )	2025-03-05 19:37:22 +08:00
Myhs_phz	54324657f0	[Docs] Results persistance (#1908 ) * feat persistance.md * doc * doc * lint * doc * fix * doc	2025-03-05 18:23:54 +08:00
Songyang Zhang	c84bc18ac1	[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899 ) * [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify * Update * Update * Update DeepSeek-R1 example * Update DeepSeek-R1 example * Update DeepSeek-R1 example	2025-03-03 18:56:11 +08:00
Junnan Liu	73c80953c6	[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886 ) * support dataset repeat and g-pass compute for each evaluator * fix pre-commit errors * delete print * delete gpassk_evaluator and fix potential errors * change `repeat` to `n` * fix `repeat` to `n` in openicl_eval * update doc for multi-run and g-pass * update latex equation in doc * update eng doc for multi-run and g-pass * update datasets.md * update datasets.md * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation in zh_cn user_guides * mmodify pre-commit-zh-cn * recover pre-commit and edit math expr in doc * del [TIP] * del cite tag in doc * del extract_model param in livemathbench config	2025-02-26 19:43:12 +08:00
Linchen Xiao	bdb2d46f59	[Feature] Add general math, llm judge evaluator (#1892 ) * update_doc * update llm_judge * update README * update md file name	2025-02-26 15:08:50 +08:00
Myhs_phz	68a9838907	[Feature] Add list of supported datasets at html page (#1850 ) * feat dataset-index.yml and stat.py * fix * fix * fix * feat url of paper and config file * doc all supported dataset list * docs zh and en * docs README zh and en * docs new_dataset * docs new_dataset	2025-02-14 16:17:30 +08:00
Linchen Xiao	a6193b4c02	[Refactor] Code refactoarization (#1831 ) * Update * fix lint * update * fix lint	2025-01-20 19:17:38 +08:00
Jishnu Nair	ffdc917523	[Doc] Installation.md update (#1830 )	2025-01-17 11:08:09 +08:00
Myhs_phz	70da9b7776	[Update] Update method to add dataset in docs (#1827 ) * create new branch * docs new_dataset.md zh * docs new_dataset.md zh and en	2025-01-17 11:07:19 +08:00
Songyang Zhang	d611907d14	[Doc] Update Doc (#1655 )	2024-10-31 18:08:09 +08:00
bittersweet1999	f0d436496e	[Update] update docs and add compassarena (#1614 ) * fix pip version * fix pip version * update docs and add compassarena * update docs	2024-10-17 14:39:06 +08:00
Lyu Han	4fde41036f	[Feature] Update TurboMindModel by integrating lmdeploy pipeline API (#1556 ) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * update * update * rollback class name * update * remove unused code * update * update * use pipeline * fix ci check * compatibility * compatibility * remove concurrency * update * fix table content * update	2024-10-14 15:33:40 +08:00
Lyu Han	b52ba65c26	[Feature] Integrate lmdeploy pipeline api (#1198 ) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * update * update * rollback class name * update * remove unused code * update * update * fix ci check * compatibility * remove concurrency * Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py * Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md * [Bug] fix lint --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-10-09 22:58:06 +08:00
Songyang Zhang	cfbd308edf	[Doc] Update README (#1528 ) * ' * Update	2024-09-14 16:02:17 +08:00
zhulinJulia24	fb6a0df652	[ci] fix test env for vllm and add vllm baselines (#1481 ) * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-04 19:24:09 +08:00
Linchen Xiao	0fe9756c5d	[Doc] Update Readme (#1439 ) * update * update * update * update * update * update * update * update * update * update * update * update	2024-08-22 14:48:45 +08:00
liushz	e076dc5acf	[Fix] Fix openai api tiktoken bug for api server (#1433 ) * Fix openai api tiktoken * Fix openai api tiktoken --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-08-20 22:02:14 +08:00
Xingjun.Wang	edab1c07ba	[Feature] Support ModelScope datasets (#1289 ) * add ceval, gsm8k modelscope surpport * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * udpate dataset for modelscope support * merge myl_dev and update test_ms_dataset * udpate dataset for modelscope support * update readme * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * update readme * remove tydiqa japanese subset * add ceval, gsm8k modelscope surpport * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * udpate dataset for modelscope support * merge myl_dev and update test_ms_dataset * update readme * udpate dataset for modelscope support * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * remove tydiqa japanese subset * update util * remove .DS_Store * fix md format * move util into package * update docs/get_started.md * restore eval_api_zhipu_v2.py, add environment setting * Update dataset * Update * Update * Update * Update --------- Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local> Co-authored-by: Yunnglin <mao.looper@qq.com> Co-authored-by: Yun lin <yunlin@laptop.local> Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-29 13:48:32 +08:00
klein	65fad8e2ac	[Fix] minor update wildbench (#1335 ) * update crb * update crbbench * update crbbench * update crbbench * minor update wildbench * [Fix] Update doc of wildbench, and merge wildbench into subjective * [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench * Update crb.md * Update crb_pair_judge.py * Update crb_single_judge.py * Update subjective_evaluation.md * Update openai_api.py * [Update] update wildbench readme * [Update] update wildbench readme * [Update] update wildbench readme, remove crb * Delete configs/eval_subjective_wildbench_pair.py * Delete configs/eval_subjective_wildbench_single.py * Update __init__.py --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-26 11:19:04 +08:00
Mo Li	69aa2f2d57	[Feature] Make NeedleBench available on HF (#1364 ) * update_lint * update_huggingface format * fix bug * update docs	2024-07-25 19:01:56 +08:00
Fengzhe Zhou	c3c02c2960	update docs (#1318 ) * update docs * Efficient Evaluation -> Data Sharding * update * update * Update faq.md --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-25 18:44:25 +08:00
Linchen Xiao	a56678190b	[Feature] CompassBench v1_3 subjective evaluation (#1341 ) * stash files * compassbench subjective evaluation added * evaluation update * remove unneeded content * fix lint * update docs * Update lint * Update --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-19 23:12:23 +08:00
Mo Li	104bddf647	[Doc] Update NeedleBench Docs (#1330 ) * update needlebench docs * update model_name_mapping dict * update README * Update README_zh-CN.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-07-18 13:16:19 +08:00
bittersweet1999	3aeabbc427	[Fix] update Faq (#1313 ) * fix pip version * fix pip version * update faq * update faq * update faq --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-07-12 11:29:26 +08:00
Fengzhe Zhou	1d3a26c732	[Doc] quick start swap tabs (#1263 ) * [doc] quick start swap tabs * update docs * update * update * update * update * update * update * update	2024-07-05 23:51:42 +08:00
bittersweet1999	68ca48496b	[Refactor] Reorganize subjective eval (#1284 ) * fix pip version * fix pip version * reorganize subjective eval * reorg sub * reorg subeval * reorg subeval * update subjective doc * reorg subeval * reorg subeval	2024-07-05 22:11:37 +08:00
Fengzhe Zhou	a32f21a356	[Sync] Sync with internal codes 2024.06.28 (#1279 )	2024-06-28 14:16:34 +08:00
liushz	e5ee1647fb	Add doc for accelerator function (#1252 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update acclerator * Update MathBench * Update accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-06-24 14:53:51 +08:00
Fengzhe Zhou	d656e818f8	[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205 ) * [Docs] Remove --no-batch-padding and Use -hf-num-gpus * update	2024-05-29 16:30:10 +08:00
Fengzhe Zhou	7505b3cadf	[Feature] Add huggingface apply_chat_template (#1098 ) * add TheoremQA with 5-shot * add huggingface_above_v4_33 classes * use num_worker partitioner in cli * update theoremqa * update TheoremQA * add TheoremQA * rename theoremqa -> TheoremQA * update TheoremQA output path * rewrite many model configs * update huggingface * further update * refine configs * update configs * update configs * add configs/eval_llama3_instruct.py * add summarizer multi faceted * update bbh datasets * update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py * rename class * update readme * update hf above v4.33	2024-05-14 14:50:16 +08:00
Mo Li	cb080fa7de	[Fix] Fix NeedleBench Summarizer Typo (#1125 ) * update needleinahaystack eval docs * update needlebench summarizer * fix english docs typo	2024-05-08 20:00:15 +08:00
Songyang Zhang	063f5f5f49	[Update] Update performance of common benchmarks (#1109 ) * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks	2024-04-30 00:09:08 +08:00
liushz	a6f67e1a65	[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-28 21:58:58 +08:00
Mo Li	76dd814c4d	[Doc] Update NeedleInAHaystack Docs (#1102 ) * update NeedleInAHaystack Test Docs * update docs	2024-04-28 18:51:47 +08:00
Haodong Duan	3a232db471	[Deperecate] Remove multi-modal related stuff (#1072 ) * Remove MultiModal * update index.rst * update README * remove mmbench codes * update news --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:20:14 +08:00
bittersweet1999	e404b72c52	[Feature] support arenahard evaluation (#1096 ) * support arenahard * support arenahard * support arenahard	2024-04-26 15:42:00 +08:00
Fengzhe Zhou	a256753221	[Feature] Add LLaMA-3 Series Configs (#1065 ) * add LLaMA-3 Series configs * update readme	2024-04-22 14:39:31 +08:00
Fengzhe Zhou	8c85edd1cd	[Sync] deprecate old mbpps (#1064 )	2024-04-19 20:49:46 +08:00
Y0oMu	c220550fb9	updates docs (#1015 ) Co-authored-by: youmuspc <yejiayi2004@outlook.com>	2024-04-02 10:30:04 +08:00
bittersweet1999	02e7eec911	[Feature] Support AlpacaEval_V2 (#1006 ) * support alpacaeval_v2 * support alpacaeval * update docs * update docs	2024-03-28 16:49:04 +08:00
seanzhang-zhichen	7baa711fc7	[Fix] Fix doc problem (#975 ) Co-authored-by: zhangzc <2608882093@qq.com>	2024-03-15 13:44:46 +08:00
Fengzhe Zhou	2a741477fe	update links and checkers (#890 )	2024-03-13 11:01:35 +08:00

158 Commits