OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Fengzhe Zhou	a32f21a356	[Sync] Sync with internal codes 2024.06.28 (#1279)	2024-06-28 14:16:34 +08:00
liushz	e5ee1647fb	Add doc for accelerator function (#1252) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update acclerator * Update MathBench * Update accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-06-24 14:53:51 +08:00
Fengzhe Zhou	d656e818f8	[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205) * [Docs] Remove --no-batch-padding and Use -hf-num-gpus * update	2024-05-29 16:30:10 +08:00
Fengzhe Zhou	7505b3cadf	[Feature] Add huggingface apply_chat_template (#1098) * add TheoremQA with 5-shot * add huggingface_above_v4_33 classes * use num_worker partitioner in cli * update theoremqa * update TheoremQA * add TheoremQA * rename theoremqa -> TheoremQA * update TheoremQA output path * rewrite many model configs * update huggingface * further update * refine configs * update configs * update configs * add configs/eval_llama3_instruct.py * add summarizer multi faceted * update bbh datasets * update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py * rename class * update readme * update hf above v4.33	2024-05-14 14:50:16 +08:00
Mo Li	cb080fa7de	[Fix] Fix NeedleBench Summarizer Typo (#1125) * update needleinahaystack eval docs * update needlebench summarizer * fix english docs typo	2024-05-08 20:00:15 +08:00
Songyang Zhang	063f5f5f49	[Update] Update performance of common benchmarks (#1109) * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks	2024-04-30 00:09:08 +08:00
liushz	a6f67e1a65	[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-28 21:58:58 +08:00
Mo Li	76dd814c4d	[Doc] Update NeedleInAHaystack Docs (#1102) * update NeedleInAHaystack Test Docs * update docs	2024-04-28 18:51:47 +08:00
Haodong Duan	3a232db471	[Deperecate] Remove multi-modal related stuff (#1072) * Remove MultiModal * update index.rst * update README * remove mmbench codes * update news --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:20:14 +08:00
bittersweet1999	e404b72c52	[Feature] support arenahard evaluation (#1096) * support arenahard * support arenahard * support arenahard	2024-04-26 15:42:00 +08:00
Fengzhe Zhou	a256753221	[Feature] Add LLaMA-3 Series Configs (#1065) * add LLaMA-3 Series configs * update readme	2024-04-22 14:39:31 +08:00
Fengzhe Zhou	8c85edd1cd	[Sync] deprecate old mbpps (#1064)	2024-04-19 20:49:46 +08:00
Y0oMu	c220550fb9	updates docs (#1015) Co-authored-by: youmuspc <yejiayi2004@outlook.com>	2024-04-02 10:30:04 +08:00
bittersweet1999	02e7eec911	[Feature] Support AlpacaEval_V2 (#1006) * support alpacaeval_v2 * support alpacaeval * update docs * update docs	2024-03-28 16:49:04 +08:00
seanzhang-zhichen	7baa711fc7	[Fix] Fix doc problem (#975) Co-authored-by: zhangzc <2608882093@qq.com>	2024-03-15 13:44:46 +08:00
Fengzhe Zhou	2a741477fe	update links and checkers (#890)	2024-03-13 11:01:35 +08:00
Songyang Zhang	47cb75a3f7	[Docs] Update README (#956) * [Docs] Update README * Update README.md * [Docs] Update README	2024-03-12 11:40:34 +08:00
bittersweet1999	848e7c8a76	[fix] add different temp for different question in mtbench (#954) * add temp for mtbench * add document for mtbench * add document for mtbench	2024-03-11 17:24:39 +08:00
Songyang Zhang	7c1a819bb4	[Fix] Chinese version of ReadTheDoc (#947) * [Fix] Chinese version of ReadTheDoc * rename --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-03-08 18:10:05 +08:00
Yang Yong	107e022cf4	Support prompt template for LightllmApi. Update LightllmApi token bucket. (#945)	2024-03-06 15:33:53 +08:00
Fengzhe Zhou	ba7cd58da3	[Update] Rename dataset pack (#922)	2024-02-28 10:54:04 +08:00
RunningLeon	4c87e777d8	[Feature] Add end_str for turbomind (#859) * fix * update * fix internlm1 * fix docs * remove sys	2024-02-01 22:31:14 +08:00
Fengzhe Zhou	f367551668	update doc (#830)	2024-01-24 13:39:28 +08:00
Yang Yong	f09a2ff418	Add LightllmApi KeyError log & Update doc (#816) * Add LightllmApi KeyError log * Update LightllmApi doc	2024-01-18 22:23:38 +08:00
RunningLeon	61fe873c89	[Fix] Fix turbomind and update docs (#808) * update * update docs * add engine_config and gen_config in eval_config * update * fix * fix * fix * fix docstr * fix url	2024-01-18 14:41:35 +08:00
Fengzhe Zhou	9e5746d3d8	[Doc] Update News (#810)	2024-01-17 18:22:12 +08:00
Mo Li	acae560911	Added support for multi-needle testing in needle-in-a-haystack test (#802) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug * Added support for multi-needle testing in needle-in-a-haystack test * Optimize the code for plotting in the needle-in-a-haystack test. * Correct the typo in the dataset parameters. * update needleinahaystack test docs --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-17 13:47:34 +08:00
RunningLeon	0836aec67b	[Feature] Update evaluate turbomind (#804) * update * fix * fix * fix	2024-01-17 11:09:50 +08:00
Fengzhe Zhou	f78fcf6eeb	[Docs] Update contamination docs (#775)	2024-01-08 16:37:28 +08:00
tpoisonooo	ba1b684fec	typo(installation.md): fix unzip commands (#774) * Update installation.md * Update installation.md	2024-01-08 14:23:35 +08:00
Songyang Zhang	0c75f0f95a	[Update] Update introduction of CompassBench-2024-Q1 (#769) * [Doc] Update Example of CompassBench * [Doc] Update Example of CompassBench * [Doc] Update Example of CompassBench * update * Update docs/zh_cn/advanced_guides/compassbench_intro.md Co-authored-by: Fengzhe Zhou <zfz-960727@163.com> --------- Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>	2024-01-05 20:39:36 +08:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734)	2023-12-25 21:59:16 +08:00
Mo Li	0e24f4213e	[Feature] Add NeedleInAHaystack Test Support (#714) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-23 12:00:51 +08:00
RunningLeon	e34c552282	[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721) * add llama2 test * fix * test qwen chat-7b * test w4 * add baichuan2 * update * update * update configs and docs * update	2023-12-21 18:22:17 +08:00
Hubert	fdf18a3238	[Docs] Update Docker docs (#718) * [Docs] update docker docs * [Docs] update docker docs	2023-12-19 23:29:43 +08:00
bittersweet1999	97c2068bd9	[Feature] Add JudgeLLMs (#710) * add judgellms * add judgellms * add sub_size_partition * add docs * add ref	2023-12-19 18:40:25 +08:00
Songyang Zhang	637628a70f	[Doc] Update Doc for Alignbench (#707) * update alignmentbench * update alignmentbench * update doc * update * update	2023-12-15 15:07:25 +08:00
Fengzhe Zhou	cadab9474f	[Doc] Update contamination docs (#698) * update contamination docs * add citation * Update contamination_eval.md * Update contamination_eval.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-13 18:03:39 +08:00
bittersweet1999	6130394165	[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692) * add features * add doc string * add doc string	2023-12-12 20:58:17 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00
Fengzhe Zhou	79f6449d85	[Doc] Update FAQ (#628) * update faq * Update docs/zh_cn/get_started/faq.md * Update docs/en/get_started/faq.md * Update docs/zh_cn/get_started/faq.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-11-23 18:19:17 +08:00
Fengzhe Zhou	d949e3c003	[Feature] Add circular eval (#610) * refactor default, add circular summarizer * add circular * update impl * update doc * minor update * no more to be added	2023-11-23 16:45:47 +08:00
Songyang Zhang	5329724b65	[Doc] Update README and requirements. (#622) * update readme * update doc	2023-11-22 19:16:54 +08:00
Hubert	8c1483e3ce	[Docs] update ds1000 code eval docs (#618)	2023-11-22 13:37:53 +08:00
Lyu Han	eb56fd6d16	Integrate turbomind python api (#484) * integrate turbomind python api * update * update user guide * update * fix according to reviewer's comments * fix error * fix linting * update user guide * remove debug log --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-11-21 22:34:46 +08:00
Yang Yong	d3b0d5c4ce	[Feature] Support Lightllm API (#613) * [Feature] Support Lightllm api * formatting & renaming --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-21 19:18:40 +08:00
Hubert	91fba2c2e9	[Feat] support humaneval and mbpp pass@k (#598) * [Feat] support pass@ k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k docs * update naming --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-16 21:22:06 +08:00
Wei Jueqi	14e6fe6f13	Fix bugs in subjective evaluation (#589) * rename * fix sub bugs and update docs * update * update	2023-11-14 16:11:55 +08:00
Songyang Zhang	01a0f2f3c7	[Doc] Update README (#582)	2023-11-13 20:39:43 +08:00

124 Commits