OpenCompass/opencompass
RunningLeon e34c552282
[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721)
* add llama2 test

* fix

* test qwen chat-7b

* test w4

* add baichuan2

* update

* update

* update configs and docs

* update
2023-12-21 18:22:17 +08:00
datasets [Feature] Add ReasonBench(Internal) dataset (#577) 2023-12-20 17:57:42 +08:00
lagent [Feat] update python action and slurm (#694) 2023-12-13 10:41:10 +08:00
metrics [Feat] Support multi-modal evaluation on MME benchmark. (#197) 2023-08-21 15:53:20 +08:00
models [Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721) 2023-12-21 18:22:17 +08:00
multimodal [Feature]: To be compatible with the latest version of MiniGPT-4 (#539) 2023-11-04 09:50:36 +08:00
openicl [Feature] Support AlignmentBench infer and judge (#697) 2023-12-13 19:59:30 +08:00
partitioners [Feature] Add JudgeLLMs (#710) 2023-12-19 18:40:25 +08:00
runners [Fix] Update alignmentbench (#704) 2023-12-14 18:24:21 +08:00
summarizers [Feature] Add abbr for judgemodel in subjective evaluation (#724) 2023-12-21 15:58:20 +08:00
tasks [Feature] Add abbr for judgemodel in subjective evaluation (#724) 2023-12-21 15:58:20 +08:00
utils [Sync] minor test (#683) 2023-12-11 17:42:53 +08:00
__init__.py [Sync] format (#690) 2023-12-12 14:03:45 +08:00
registry.py [Sync] update github token (#475) 2023-10-13 06:50:54 -05:00