OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Xingjun.Wang	edab1c07ba	[Feature] Support ModelScope datasets (#1289 ) * add ceval, gsm8k modelscope support * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * update dataset for modelscope support * merge myl_dev and update test_ms_dataset * update dataset for modelscope support * update readme * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * update readme * remove tydiqa japanese subset * add ceval, gsm8k modelscope support * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * update dataset for modelscope support * merge myl_dev and update test_ms_dataset * update readme * update dataset for modelscope support * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * remove tydiqa japanese subset * update util * remove .DS_Store * fix md format * move util into package * update docs/get_started.md * restore eval_api_zhipu_v2.py, add environment setting * Update dataset * Update * Update * Update * Update --------- Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local> Co-authored-by: Yunnglin <mao.looper@qq.com> Co-authored-by: Yun lin <yunlin@laptop.local> Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-29 13:48:32 +08:00
klein	65fad8e2ac	[Fix] minor update wildbench (#1335 ) * update crb * update crbbench * update crbbench * update crbbench * minor update wildbench * [Fix] Update doc of wildbench, and merge wildbench into subjective * [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench * Update crb.md * Update crb_pair_judge.py * Update crb_single_judge.py * Update subjective_evaluation.md * Update openai_api.py * [Update] update wildbench readme * [Update] update wildbench readme * [Update] update wildbench readme, remove crb * Delete configs/eval_subjective_wildbench_pair.py * Delete configs/eval_subjective_wildbench_single.py * Update __init__.py --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-26 11:19:04 +08:00
Mo Li	69aa2f2d57	[Feature] Make NeedleBench available on HF (#1364 ) * update_lint * update_huggingface format * fix bug * update docs	2024-07-25 19:01:56 +08:00
Fengzhe Zhou	c3c02c2960	update docs (#1318 ) * update docs * efficient evaluation -> data sharding * update * update * Update faq.md --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-25 18:44:25 +08:00
Linchen Xiao	a56678190b	[Feature] CompassBench v1_3 subjective evaluation (#1341 ) * stash files * compassbench subjective evaluation added * evaluation update * remove unneeded content * fix lint * update docs * Update lint * Update --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-19 23:12:23 +08:00
Mo Li	104bddf647	[Doc] Update NeedleBench Docs (#1330 ) * update needlebench docs * update model_name_mapping dict * update README * Update README_zh-CN.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-07-18 13:16:19 +08:00
bittersweet1999	3aeabbc427	[Fix] update Faq (#1313 ) * fix pip version * fix pip version * update faq * update faq * update faq --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-07-12 11:29:26 +08:00
Fengzhe Zhou	1d3a26c732	[Doc] quick start swap tabs (#1263 ) * [doc] quick start swap tabs * update docs * update * update * update * update * update * update * update	2024-07-05 23:51:42 +08:00
bittersweet1999	68ca48496b	[Refactor] Reorganize subjective eval (#1284 ) * fix pip version * fix pip version * reorganize subjective eval * reorg sub * reorg subeval * reorg subeval * update subjective doc * reorg subeval * reorg subeval	2024-07-05 22:11:37 +08:00
Fengzhe Zhou	a32f21a356	[Sync] Sync with internal codes 2024.06.28 (#1279 )	2024-06-28 14:16:34 +08:00
liushz	e5ee1647fb	Add doc for accelerator function (#1252 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update accelerator * Update MathBench * Update accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-06-24 14:53:51 +08:00
Fengzhe Zhou	d656e818f8	[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205 ) * [Docs] Remove --no-batch-padding and Use -hf-num-gpus * update	2024-05-29 16:30:10 +08:00
Fengzhe Zhou	7505b3cadf	[Feature] Add huggingface apply_chat_template (#1098 ) * add TheoremQA with 5-shot * add huggingface_above_v4_33 classes * use num_worker partitioner in cli * update theoremqa * update TheoremQA * add TheoremQA * rename theoremqa -> TheoremQA * update TheoremQA output path * rewrite many model configs * update huggingface * further update * refine configs * update configs * update configs * add configs/eval_llama3_instruct.py * add summarizer multi faceted * update bbh datasets * update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py * rename class * update readme * update hf above v4.33	2024-05-14 14:50:16 +08:00
Mo Li	cb080fa7de	[Fix] Fix NeedleBench Summarizer Typo (#1125 ) * update needleinahaystack eval docs * update needlebench summarizer * fix english docs typo	2024-05-08 20:00:15 +08:00
Songyang Zhang	063f5f5f49	[Update] Update performance of common benchmarks (#1109 ) * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks	2024-04-30 00:09:08 +08:00
liushz	a6f67e1a65	[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-28 21:58:58 +08:00
Mo Li	76dd814c4d	[Doc] Update NeedleInAHaystack Docs (#1102 ) * update NeedleInAHaystack Test Docs * update docs	2024-04-28 18:51:47 +08:00
Haodong Duan	3a232db471	[Deprecate] Remove multi-modal related stuff (#1072 ) * Remove MultiModal * update index.rst * update README * remove mmbench codes * update news --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:20:14 +08:00
bittersweet1999	e404b72c52	[Feature] support arenahard evaluation (#1096 ) * support arenahard * support arenahard * support arenahard	2024-04-26 15:42:00 +08:00
Fengzhe Zhou	a256753221	[Feature] Add LLaMA-3 Series Configs (#1065 ) * add LLaMA-3 Series configs * update readme	2024-04-22 14:39:31 +08:00
Fengzhe Zhou	8c85edd1cd	[Sync] deprecate old mbpps (#1064 )	2024-04-19 20:49:46 +08:00
Y0oMu	c220550fb9	updates docs (#1015 ) Co-authored-by: youmuspc <yejiayi2004@outlook.com>	2024-04-02 10:30:04 +08:00
bittersweet1999	02e7eec911	[Feature] Support AlpacaEval_V2 (#1006 ) * support alpacaeval_v2 * support alpacaeval * update docs * update docs	2024-03-28 16:49:04 +08:00
seanzhang-zhichen	7baa711fc7	[Fix] Fix doc problem (#975 ) Co-authored-by: zhangzc <2608882093@qq.com>	2024-03-15 13:44:46 +08:00
Fengzhe Zhou	2a741477fe	update links and checkers (#890 )	2024-03-13 11:01:35 +08:00
Songyang Zhang	47cb75a3f7	[Docs] Update README (#956 ) * [Docs] Update README * Update README.md * [Docs] Update README	2024-03-12 11:40:34 +08:00
bittersweet1999	848e7c8a76	[fix] add different temp for different question in mtbench (#954 ) * add temp for mtbench * add document for mtbench * add document for mtbench	2024-03-11 17:24:39 +08:00
Songyang Zhang	7c1a819bb4	[Fix] Chinese version of ReadTheDoc (#947 ) * [Fix] Chinese version of ReadTheDoc * rename --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-03-08 18:10:05 +08:00
Yang Yong	107e022cf4	Support prompt template for LightllmApi. Update LightllmApi token bucket. (#945 )	2024-03-06 15:33:53 +08:00
Fengzhe Zhou	ba7cd58da3	[Update] Rename dataset pack (#922 )	2024-02-28 10:54:04 +08:00
RunningLeon	4c87e777d8	[Feature] Add end_str for turbomind (#859 ) * fix * update * fix internlm1 * fix docs * remove sys	2024-02-01 22:31:14 +08:00
Fengzhe Zhou	f367551668	update doc (#830 )	2024-01-24 13:39:28 +08:00
Yang Yong	f09a2ff418	Add LightllmApi KeyError log & Update doc (#816 ) * Add LightllmApi KeyError log * Update LightllmApi doc	2024-01-18 22:23:38 +08:00
RunningLeon	61fe873c89	[Fix] Fix turbomind and update docs (#808 ) * update * update docs * add engine_config and gen_config in eval_config * update * fix * fix * fix * fix docstr * fix url	2024-01-18 14:41:35 +08:00
Fengzhe Zhou	9e5746d3d8	[Doc] Update News (#810 )	2024-01-17 18:22:12 +08:00
Mo Li	acae560911	Added support for multi-needle testing in needle-in-a-haystack test (#802 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug * Added support for multi-needle testing in needle-in-a-haystack test * Optimize the code for plotting in the needle-in-a-haystack test. * Correct the typo in the dataset parameters. * update needleinahaystack test docs --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-17 13:47:34 +08:00
RunningLeon	0836aec67b	[Feature] Update evaluate turbomind (#804 ) * update * fix * fix * fix	2024-01-17 11:09:50 +08:00
Fengzhe Zhou	f78fcf6eeb	[Docs] Update contamination docs (#775 )	2024-01-08 16:37:28 +08:00
tpoisonooo	ba1b684fec	typo(installation.md): fix unzip commands (#774 ) * Update installation.md * Update installation.md	2024-01-08 14:23:35 +08:00
Songyang Zhang	0c75f0f95a	[Update] Update introduction of CompassBench-2024-Q1 (#769 ) * [Doc] Update Example of CompassBench * [Doc] Update Example of CompassBench * [Doc] Update Example of CompassBench * update * Update docs/zh_cn/advanced_guides/compassbench_intro.md Co-authored-by: Fengzhe Zhou <zfz-960727@163.com> --------- Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>	2024-01-05 20:39:36 +08:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734 )	2023-12-25 21:59:16 +08:00
Mo Li	0e24f4213e	[Feature] Add NeedleInAHaystack Test Support (#714 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-23 12:00:51 +08:00
RunningLeon	e34c552282	[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721 ) * add llama2 test * fix * test qwen chat-7b * test w4 * add baichuan2 * update * update * update configs and docs * update	2023-12-21 18:22:17 +08:00
Hubert	fdf18a3238	[Docs] Update Docker docs (#718 ) * [Docs] update docker docs * [Docs] update docker docs	2023-12-19 23:29:43 +08:00
bittersweet1999	97c2068bd9	[Feature] Add JudgeLLMs (#710 ) * add judgellms * add judgellms * add sub_size_partition * add docs * add ref	2023-12-19 18:40:25 +08:00
Songyang Zhang	637628a70f	[Doc] Update Doc for Alignbench (#707 ) * update alignmentbench * update alignmentbench * update doc * update * update	2023-12-15 15:07:25 +08:00
Fengzhe Zhou	cadab9474f	[Doc] Update contamination docs (#698 ) * update contamination docs * add citation * Update contamination_eval.md * Update contamination_eval.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-13 18:03:39 +08:00
bittersweet1999	6130394165	[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692 ) * add features * add doc string * add doc string	2023-12-12 20:58:17 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680 ) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639 ) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00

133 Commits