OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Fengzhe Zhou	2b3d4150f3	[Sync] update evaluator (#1175 )	2024-05-21 14:22:46 +08:00
Fengzhe Zhou	7505b3cadf	[Feature] Add huggingface apply_chat_template (#1098 ) * add TheoremQA with 5-shot * add huggingface_above_v4_33 classes * use num_worker partitioner in cli * update theoremqa * update TheoremQA * add TheoremQA * rename theoremqa -> TheoremQA * update TheoremQA output path * rewrite many model configs * update huggingface * further update * refine configs * update configs * update configs * add configs/eval_llama3_instruct.py * add summarizer multi faceted * update bbh datasets * update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py * rename class * update readme * update hf above v4.33	2024-05-14 14:50:16 +08:00
Alexander Lam	35c94d0cde	[Feature] Adding support for LLM Compression Evaluation (#1108 ) * fixed formatting based on pre-commit tests * fixed typo in comments; reduced the number of models in the eval config * fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset * removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English	2024-04-30 10:51:01 +08:00
bittersweet1999	6ba1c4937d	[Feature] Support Math evaluation via judgemodel (#1094 ) * support openai math evaluation * support openai math evaluation * support openai math evaluation * support math llm judge * support math llm judge	2024-04-26 14:56:23 +08:00
bittersweet1999	6f98c8d9ab	[Fix] Fix MultiRound Subjective Evaluation(#1043 ) * fix multiround * fix	2024-04-22 12:06:03 +08:00
Fengzhe Zhou	b39f501563	[Sync] update taco (#1030 )	2024-04-09 17:50:23 +08:00
bittersweet1999	2d4e559763	[Feature] Add multi-model judge and fix some problems (#1016 ) * support multi-model judge and moe judge * test_moe * test_moe * test * add moe judge * support multi-judge-model	2024-04-02 11:52:06 +08:00
Fengzhe Zhou	ab6cdb2be8	[Sync] Bump version 0.2.3 (#957 )	2024-03-12 11:51:56 +08:00
bittersweet1999	848e7c8a76	[fix] add different temp for different question in mtbench (#954 ) * add temp for mtbench * add document for mtbench * add document for mtbench	2024-03-11 17:24:39 +08:00
Yang Yong	3829be87b1	Fix LightllmApi ppl test (#951 )	2024-03-08 12:04:44 +08:00
Fengzhe Zhou	9afbfa3639	[Sync] Fix TEvalEvaluator (#929 )	2024-02-28 16:05:30 +08:00
Hubert	4aa74565e2	[Feat] minor update agent related (#839 ) * [Feat] update cibench * [Feat] Support CIBench * [Feat] Support CIBench * [Feat] Support CIBench * [Feat] Support CIBench	2024-01-26 14:15:51 +08:00
bittersweet1999	2ee8e8a1a1	[Feature] add mtbench (#829 ) * add mtbench * add mtbench * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/mtbench.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix mtbench --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-24 12:11:47 +08:00
Fengzhe Zhou	b4afe3e7c1	[Sync] Add InternLM2 Keyset Evaluation Demo (#807 ) Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>	2024-01-17 13:48:12 +08:00
Fengzhe Zhou	32f40a8f83	[Sync] Sync with internal codes 2023.01.08 (#777 )	2024-01-08 14:07:24 +00:00
bittersweet1999	be369c3e06	[Feature] Add multi_round dataset evaluation (#766 ) * multi_round dataset * add multi_round evaluation	2024-01-04 10:37:52 +00:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734 )	2023-12-25 21:59:16 +08:00
bittersweet1999	1fe152b3e8	[Feature] Support AlignmentBench infer and judge (#697 ) * alignmentbench infer and judge * alignmentbench * alignmentbench done * alignment all done * alignment all done	2023-12-13 19:59:30 +08:00
bittersweet1999	6130394165	[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692 ) * add features * add doc string * add doc string	2023-12-12 20:58:17 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680 ) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
Hubert	e78857ac36	[Sync] minor test (#683 )	2023-12-11 17:42:53 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639 ) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00
Ma Zerun	6aaf3b91ec	[Feature] Support chat style inferencer. (#643 ) * [Feature] Support chat style inferencer. * [Fix] use new prompt * [Fix] use new prompt --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-30 14:00:06 +08:00
Fengzhe Zhou	d4d1330a5a	[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes (#625 )	2023-11-23 14:05:59 +08:00
Fengzhe Zhou	fb30b7c7a2	[Fix] Fix gen inferencer (#615 )	2023-11-22 12:04:31 +08:00
Songyang Zhang	721a45c68f	[Bug] Update api with generation_kargs (#614 ) * update api * update generation_kwargs impl --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-22 10:02:57 +08:00
Hubert	91fba2c2e9	[Feat] support humaneval and mbpp pass@k (#598 ) * [Feat] support pass@ k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k docs * update naming --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-16 21:22:06 +08:00
Hubert	fcab30f82e	[Fix] change save_every defaults to 1 (#592 )	2023-11-15 13:00:25 +08:00
Fengzhe Zhou	d3de5c41fb	[Sync] update model configs (#574 )	2023-11-13 15:15:34 +08:00
Hubert	bb2ecf416e	[Feat] Support cibench (#538 ) * [Feat] support cidataset * [Feat] support cidataset * [Feat] support cidataset * [Feat] support cidataset * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * rename cibench * rename cibench * rename cibench * rename cibench * minor fix * minor fix * minor fix	2023-11-07 19:11:44 +08:00
Songyang Zhang	239c2a346e	[Feature] Add support for MiniMax API (#548 ) * update requirement * update requirement * update with minimax * update api model * Update readme * fix error --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2023-11-06 21:57:32 +08:00
Fengzhe Zhou	dbb20b8270	[Sync] update (#517 )	2023-10-27 20:31:22 +08:00
Hubert	b3f5d9e421	[Feat] support math/gms8k agent config (#494 ) * support math agent * support gsm8k agent * support gsm8k agent * minor fix * minor fix * minor fix * Update configs/eval_codeagent.py	2023-10-25 23:05:15 +08:00
liushz	2737249f31	[Feature] Add mathbench dataset and circular evaluator (#408 ) * add_mathbench * update mathbench * support non circular eval dataset --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-10-18 04:08:31 -05:00
Leymore	fbf5089c40	[Sync] update github token (#475 )	2023-10-13 06:50:54 -05:00
Leymore	362c33dff4	fix jieba rouge (#467 )	2023-10-12 10:25:19 +08:00
Leymore	d7ff933a73	[Fix] Use jieba rouge in lcsts (#459 ) * use jieba rouge in lcsts * use rouge_chinese	2023-10-09 10:10:33 +08:00
Tong Gao	119bfd1569	[Refactor] Move fix_id_list to Retriever (#442 ) * [Refactor] Move fix_id_list to Retriever * update * move to base * fix	2023-10-07 12:53:41 +08:00
Hubert	d9f3e88dfe	[Fix] fix clp potential error and support bs>1 (#439 ) * [Fix] fix clp potential error and support bs>1 * [Fix] fix clp potential error and support bs>1 * minor fix * minor fix	2023-09-27 16:32:57 +08:00
Tong Gao	a1ea3c094a	[Sync] Initial support of subjective evaluation (#421 ) Co-authored-by: Leymore <zfz-960727@163.com>	2023-09-22 15:42:31 +08:00
Ma Zerun	0f2c388280	Support GSM8k evaluation with tools by Lagent and LangChain (#277 ) * Support GSM8k evaluation with tools by Lagent and LangChain * Avoid to use MMEngine new feature * update document --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-09-22 15:28:22 +08:00
Tong Gao	681d3013de	[Feature] Log gold answer in prediction output (#419 ) * [Feature] Log gold answer in prediction output * support clp golden ans * minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-09-22 12:44:40 +08:00
Leymore	ae0cd8752f	[Feature] Use local accuracy from hf implements (#416 ) * use local accuracy from hf implements * add load from hf fallback	2023-09-20 16:35:22 +08:00
Hubert	a11cb45c83	[Feat] implementation for support promptbench (#239 ) * [Feat] support adv_glue dataset for adversarial robustness * reorg files * minor fix * minor fix * support prompt bench demo * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix	2023-09-15 15:06:53 +08:00
cdpath	722eb39526	fix potential oom issue (#387 )	2023-09-12 10:41:03 +08:00
Leymore	880b34e759	[Fix] Quick lint fix (#362 ) * add default value * lint fix * use None	2023-09-06 14:33:13 +08:00
Leymore	b8bf16e81c	[Fix] zero retriever add default value (#361 )	2023-09-05 10:37:42 +08:00
Leymore	8774465a8f	[Enhancement] ignore ZeroRetriever error when id_list provided (#340 )	2023-09-04 11:12:16 +08:00
Leymore	e810974068	[Fix] Fix when missing both pad and eos token (#287 ) * fix when missing both pad and eos token * update pad_token_id impl	2023-08-31 16:53:39 +08:00
liushz	02ce139bc6	[Feature] Add Tree-of-Thought method (#173 ) * Add ToT method * Update ToT * Update ToT * Update ToT * Update ToT * Update ToT * Update ToT * Update ToT * Update chain_of_thought.md * Update icl_tot_inferencer.py --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2023-08-23 12:23:05 +08:00

72 Commits