Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation ( #1108 )
...
* fixed formatting based on pre-commit tests
* fixed typo in comments; reduced the number of models in the eval config
* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset
* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
liushz
a6f67e1a65
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README ( #1103 )
...
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-28 21:58:58 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets ( #1093 )
...
* add flames datasets
* fix lint
* rm quota
* add judgemodel info and fix os path
* support flames dataset
* support flames dataset
---------
Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
klein
e4830a6926
Update CIBench ( #1089 )
...
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4
* update cibench: dataset and evluation
* cibench summarizer bug
* update cibench
* move extract_code import
---------
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation ( #1096 )
...
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel ( #1094 )
...
* support openai math evaluation
* support openai math evaluation
* support openai math evaluation
* support math llm judge
* support math llm judge
2024-04-26 14:56:23 +08:00
bittersweet1999
6f98c8d9ab
[Fix] Fix MultiRound Subjective Evaluation( #1043 )
...
* fix multiround
* fix
2024-04-22 12:06:03 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
Mo Li
16f29b25f1
[Fix] Simplify needlebench summarizer ( #1024 )
...
* Conflicts:
configs/summarizers/needlebench.py
* fix lint problems
2024-04-07 17:51:13 +08:00
bittersweet1999
2d4e559763
[Feature] Add multi-model judge and fix some problems ( #1016 )
...
* support multi-model judge and moe judge
* test_moe
* test_moe
* test
* add moe judge
* support multi-judge-model
2024-04-02 11:52:06 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different question in mtbench ( #954 )
...
* add temp for mtbench
* add document for mtbench
* add document for mtbench
2024-03-11 17:24:39 +08:00
Mo Li
8142f399a8
[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench ( #913 )
...
* add needlebench
* simplify needlebench 32k, 128k, 200k for eval
* update act prompt
* fix bug in needlebench summarizer
* add needlebench intro, fix summarizer
* lint summarizer
* fix linting error
* move readme.md
* update readme for needlebench
* update docs of needlebench
* simplify needlebench summarizers
2024-03-04 11:10:52 +08:00
bittersweet1999
1c8e193de8
[Fix] hotfix for mtbench ( #877 )
...
* hotfix for mtbench
* hotfix
2024-02-06 21:26:47 +08:00
Fengzhe Zhou
d34ba11106
[Sync] Merge branch 'dev' into zfz/update-keyset-demo ( #876 )
2024-02-05 23:29:10 +08:00
bittersweet1999
7806cd0f64
[Feature] support alpacaeval ( #809 )
...
* support alpacaeval_v1
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/alpacaeval_v1.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix conflict
* support alpacaeval v2
* support alpacav2
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-02-04 14:18:36 +08:00
bittersweet1999
5c6dc908cd
fix compass arena ( #854 )
2024-01-30 16:34:38 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench ( #829 )
...
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
bittersweet1999
3d9bb4aed7
[Fix] fix strings ( #833 )
...
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
* fix string
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 10:57:26 +00:00
bittersweet1999
2d4da8dd02
[Feature] Add CompassArena ( #828 )
...
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 15:12:46 +08:00
bittersweet1999
814b3f73bd
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
bittersweet1999
2163f9398f
[Feature] add subject ir dataset ( #755 )
...
* add subject ir
* Add ir dataset
* Add ir dataset
2024-01-05 12:00:57 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation ( #766 )
...
* multi_round dataset
* add multi_round evaluation
2024-01-04 10:37:52 +00:00
bittersweet1999
7cd65d49d8
[Fix] Fix small bug in alignbench ( #764 )
...
* fix small bugs
* fix small bugs
2024-01-03 07:44:53 +00:00
bittersweet1999
fe0b717033
add creationbench ( #753 )
2023-12-29 10:03:44 +00:00
bittersweet1999
dfd9ac0fd9
[Feature] Add other judgelm prompts for Alignbench ( #731 )
...
* add judgellm prompts
* add judgelm prompts
* update import info
* fix situation that no abbr in config
* fix situation that no abbr in config
* add summarizer for other judgellm
* change config name
* add maxlen
* add maxlen
* dict assert
* dict assert
* fix strings
* fix strings
2023-12-27 17:54:53 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs ( #734 )
2023-12-25 21:59:16 +08:00
bittersweet1999
e985100cd1
[Fix] Fix subjective alignbench ( #730 )
2023-12-23 20:06:53 +08:00
bittersweet1999
fbb912ddf3
[Feature] Add abbr for judgemodel in subjective evaluation ( #724 )
...
* add_judgemodel_abbr
* add judgemodel abbr
2023-12-21 15:58:20 +08:00
Songyang Zhang
bfe4aa2af5
[Fix] Update alignmentbench ( #704 )
...
* update alignmentbench
* update alignmentbench
* update alignmentbench
2023-12-14 18:24:21 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge ( #697 )
...
* alignmentbench infer and judge
* alignmentbench
* alignmentbench done
* alignment all done
* alignment all done
2023-12-13 19:59:30 +08:00
Hubert
4780b39eda
[Sync] format ( #690 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-12 14:03:45 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation ( #680 )
...
* new version of subject
* fixed draw
* fixed draw
* fixed draw
* done
* done
* done
* done
* fixed lint
2023-12-11 22:22:11 +08:00
Jingming
7cb53a95fa
[Fix] fix bug on standart_deviation summarizer ( #675 )
2023-12-08 13:38:07 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis ( #639 )
...
* add contamination analysis to ceval
* fix bugs
* add contamination docs
* to pass CI check
* update
---------
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
bittersweet1999
1c95790fdd
New subjective judgement ( #660 )
...
* TabMWP
* TabMWP
* fixed
* fixed
* fixed
* done
* done
* done
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* modified to a more general way
* modified to a more general way
* final
* final
* add summarizer
* add new summarize
* fixed
* fixed
* fixed
---------
Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-12-06 13:28:33 +08:00
Fengzhe Zhou
9083dea683
[Sync] some renaming ( #641 )
2023-11-27 16:06:49 +08:00
Fengzhe Zhou
79f6449d85
[Doc] Update FAQ ( #628 )
...
* update faq
* Update docs/zh_cn/get_started/faq.md
* Update docs/en/get_started/faq.md
* Update docs/zh_cn/get_started/faq.md
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-23 18:19:17 +08:00
Fengzhe Zhou
d949e3c003
[Feature] Add circular eval ( #610 )
...
* refactor default, add circular summarizer
* add circular
* update impl
* update doc
* minor update
* no more to be added
2023-11-23 16:45:47 +08:00
Jingming
5e75e29711
[Feature] Add multi-prompt generation demo ( #568 )
...
* [Feature] Add multi-prompt generation demo
* [Fix] change form in winogrande_gen_XXX.py
* [Fix] make multi prompt demo more directly
* [Fix] fix bug
* [Fix] minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-20 16:16:37 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs ( #574 )
2023-11-13 15:15:34 +08:00
Qing
e2355a2ede
[Feature] Add multi model viz ( #509 )
...
* add viz_multi_model.py tool
* Modify the viz_multi_model.py script according to the review
* highlight multiple optimal scores
---------
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-30 12:11:33 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update ( #517 )
2023-10-27 20:31:22 +08:00
Leymore
fbf5089c40
[Sync] update github token ( #475 )
2023-10-13 06:50:54 -05:00