OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Alexander Lam	35c94d0cde	[Feature] Adding support for LLM Compression Evaluation (#1108) * fixed formatting based on pre-commit tests * fixed typo in comments; reduced the number of models in the eval config * fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset * removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English	2024-04-30 10:51:01 +08:00
liushz	a6f67e1a65	[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-28 21:58:58 +08:00
Yggdrasill7D6	58a57a4c45	[Feature] add support for Flames datasets (#1093) * add flames datasets * fix lint * rm quota * add judgemodel info and fix os path * support flames dataset * support flames dataset --------- Co-authored-by: bittersweet1999 <1487910649@qq.com>	2024-04-28 18:56:24 +08:00
klein	e4830a6926	Update CIBench (#1089) * modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 * update cibench: dataset and evaluation * cibench summarizer bug * update cibench * move extract_code import --------- Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 18:46:02 +08:00
bittersweet1999	e404b72c52	[Feature] support arenahard evaluation (#1096) * support arenahard * support arenahard * support arenahard	2024-04-26 15:42:00 +08:00
bittersweet1999	6ba1c4937d	[Feature] Support Math evaluation via judgemodel (#1094) * support openai math evaluation * support openai math evaluation * support openai math evaluation * support math llm judge * support math llm judge	2024-04-26 14:56:23 +08:00
bittersweet1999	6f98c8d9ab	[Fix] Fix MultiRound Subjective Evaluation (#1043) * fix multiround * fix	2024-04-22 12:06:03 +08:00
Fengzhe Zhou	8c85edd1cd	[Sync] deprecate old mbpps (#1064)	2024-04-19 20:49:46 +08:00
Fengzhe Zhou	b39f501563	[Sync] update taco (#1030)	2024-04-09 17:50:23 +08:00
Mo Li	16f29b25f1	[Fix] Simplify needlebench summarizer (#1024) * Conflicts: configs/summarizers/needlebench.py * fix lint problems	2024-04-07 17:51:13 +08:00
bittersweet1999	2d4e559763	[Feature] Add multi-model judge and fix some problems (#1016) * support multi-model judge and moe judge * test_moe * test_moe * test * add moe judge * support multi-judge-model	2024-04-02 11:52:06 +08:00
bittersweet1999	848e7c8a76	[fix] add different temp for different question in mtbench (#954) * add temp for mtbench * add document for mtbench * add document for mtbench	2024-03-11 17:24:39 +08:00
Mo Li	8142f399a8	[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench (#913) * add needlebench * simplify needlebench 32k, 128k, 200k for eval * update act prompt * fix bug in needlebench summarizer * add needlebench intro, fix summarizer * lint summarizer * fix linting error * move readme.md * update readme for needlebench * update docs of needlebench * simplify needlebench summarizers	2024-03-04 11:10:52 +08:00
bittersweet1999	1c8e193de8	[Fix] hotfix for mtbench (#877) * hotfix for mtbench * hotfix	2024-02-06 21:26:47 +08:00
Fengzhe Zhou	d34ba11106	[Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876)	2024-02-05 23:29:10 +08:00
bittersweet1999	7806cd0f64	[Feature] support alpacaeval (#809) * support alpacaeval_v1 * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/alpacaeval_v1.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix conflict * support alpacaeval v2 * support alpacav2 --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-02-04 14:18:36 +08:00
bittersweet1999	5c6dc908cd	fix compass arena (#854)	2024-01-30 16:34:38 +08:00
bittersweet1999	2ee8e8a1a1	[Feature] add mtbench (#829) * add mtbench * add mtbench * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/mtbench.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix mtbench --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-24 12:11:47 +08:00
bittersweet1999	3d9bb4aed7	[Fix] fix strings (#833) * add compass arena * add compass_arena * add compass arena * Update opencompass/summarizers/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix check position bias * fix string --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-23 10:57:26 +00:00
bittersweet1999	2d4da8dd02	[Feature] Add CompassArena (#828) * add compass arena * add compass_arena * add compass arena * Update opencompass/summarizers/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix check position bias --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-23 15:12:46 +08:00
bittersweet1999	814b3f73bd	reorganize subject files (#801)	2024-01-16 18:03:11 +08:00
Fengzhe Zhou	32f40a8f83	[Sync] Sync with internal codes 2023.01.08 (#777)	2024-01-08 14:07:24 +00:00
bittersweet1999	2163f9398f	[Feature] add subject ir dataset (#755) * add subject ir * Add ir dataset * Add ir dataset	2024-01-05 12:00:57 +00:00
bittersweet1999	be369c3e06	[Feature] Add multi_round dataset evaluation (#766) * multi_round dataset * add multi_round evaluation	2024-01-04 10:37:52 +00:00
bittersweet1999	7cd65d49d8	[Fix] Fix small bug in alignbench (#764) * fix small bugs * fix small bugs	2024-01-03 07:44:53 +00:00
bittersweet1999	fe0b717033	add creationbench (#753)	2023-12-29 10:03:44 +00:00
bittersweet1999	dfd9ac0fd9	[Feature] Add other judgelm prompts for Alignbench (#731) * add judgellm prompts * add judgelm prompts * update import info * fix situation that no abbr in config * fix situation that no abbr in config * add summarizer for other judgellm * change config name * add maxlen * add maxlen * dict assert * dict assert * fix strings * fix strings	2023-12-27 17:54:53 +08:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734)	2023-12-25 21:59:16 +08:00
bittersweet1999	e985100cd1	[Fix] Fix subjective alignbench (#730)	2023-12-23 20:06:53 +08:00
bittersweet1999	fbb912ddf3	[Feature] Add abbr for judgemodel in subjective evaluation (#724) * add_judgemodel_abbr * add judgemodel abbr	2023-12-21 15:58:20 +08:00
Songyang Zhang	bfe4aa2af5	[Fix] Update alignmentbench (#704) * update alignmentbench * update alignmentbench * update alignmentbench	2023-12-14 18:24:21 +08:00
bittersweet1999	1fe152b3e8	[Feature] Support AlignmentBench infer and judge (#697) * alignmentbench infer and judge * alignmentbench * alignmentbench done * alignment all done * alignment all done	2023-12-13 19:59:30 +08:00
Hubert	4780b39eda	[Sync] format (#690) Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-12 14:03:45 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
Jingming	7cb53a95fa	[Fix] fix bug on standart_deviation summarizer (#675)	2023-12-08 13:38:07 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00
bittersweet1999	1c95790fdd	New subjective judgement (#660) * TabMWP * TabMWP * fixed * fixed * fixed * done * done * done * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * modified to a more general way * modified to a more general way * final * final * add summarizer * add new summarize * fixed * fixed * fixed --------- Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>	2023-12-06 13:28:33 +08:00
Fengzhe Zhou	9083dea683	[Sync] some renaming (#641)	2023-11-27 16:06:49 +08:00
Fengzhe Zhou	79f6449d85	[Doc] Update FAQ (#628) * update faq * Update docs/zh_cn/get_started/faq.md * Update docs/en/get_started/faq.md * Update docs/zh_cn/get_started/faq.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-11-23 18:19:17 +08:00
Fengzhe Zhou	d949e3c003	[Feature] Add circular eval (#610) * refactor default, add circular summarizer * add circular * update impl * update doc * minor update * no more to be added	2023-11-23 16:45:47 +08:00
Jingming	5e75e29711	[Feature] Add multi-prompt generation demo (#568) * [Feature] Add multi-prompt generation demo * [Fix] change form in winogrande_gen_XXX.py * [Fix] make multi prompt demo more directly * [Fix] fix bug * [Fix] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-20 16:16:37 +08:00
Fengzhe Zhou	d3de5c41fb	[Sync] update model configs (#574)	2023-11-13 15:15:34 +08:00
Qing	e2355a2ede	[Feature] Add multi model viz (#509) * add viz_multi_model.py tool * Modify the viz_multi_model.py script according to the review * highlight multiple optimal scores --------- Co-authored-by: wq.chu <wq.chu@tianrang-inc.com> Co-authored-by: Leymore <zfz-960727@163.com>	2023-10-30 12:11:33 +08:00
Fengzhe Zhou	dbb20b8270	[Sync] update (#517)	2023-10-27 20:31:22 +08:00
Leymore	fbf5089c40	[Sync] update github token (#475)	2023-10-13 06:50:54 -05:00

45 Commits