OpenCompass/.github/scripts/oc_score_baseline.yaml
zhulinJulia24 b4a9acd7be
Update daily test (#871)
* add daily test case

* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update pr-run-test.yml

* Update daily-run-test.yml

* Update oc_score_assert.py

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* update testcase baseline

* fix test case name

* add more models into daily test

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-05 15:52:00 +08:00

internlm-7b-hf:
  ARC-c: 36.27
  chid-dev: 81.68
  chid-test: 83.67
  openai_humaneval: 10.37
  openbookqa: 44.4
  openbookqa_fact: 73.2
internlm-chat-7b-hf:
  ARC-c: 36.95
  chid-dev: 71.78
  chid-test: 76.87
  openai_humaneval: 21.34
  openbookqa: 66.6
  openbookqa_fact: 80.4
chatglm3-6b-base-hf:
  ARC-c: 43.05
  chid-dev: 80.2
  chid-test: 80.77
  openai_humaneval: 20.73
  openbookqa: 79.8
  openbookqa_fact: 92.2
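
The baseline above maps each model to its expected score per dataset. As a rough illustration of how a daily test might consume such a mapping, here is a minimal sketch that compares measured scores against the baseline. It is not the repository's `oc_score_assert.py`; the helper name `find_mismatches`, the `abs_tol` tolerance, and the inlined `BASELINE` dict (a subset of the values above, written out by hand instead of being parsed from the YAML) are all assumptions made for this example.

```python
import math

# Subset of the baseline above, inlined so the sketch needs no YAML parser.
BASELINE = {
    "internlm-7b-hf": {"ARC-c": 36.27, "openbookqa": 44.4},
    "chatglm3-6b-base-hf": {"ARC-c": 43.05, "openbookqa": 79.8},
}


def find_mismatches(baseline, measured, abs_tol=0.01):
    """Hypothetical helper: list every baseline score that is missing
    from `measured` or differs from it by more than `abs_tol`.

    Returns a list of (model, dataset, expected, actual) tuples, where
    `actual` is None when the measured result is absent.
    """
    mismatches = []
    for model, datasets in baseline.items():
        for dataset, expected in datasets.items():
            actual = measured.get(model, {}).get(dataset)
            if actual is None or not math.isclose(
                actual, expected, rel_tol=0.0, abs_tol=abs_tol
            ):
                mismatches.append((model, dataset, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Example run: one score drifted, one model was not evaluated at all.
    measured = {"internlm-7b-hf": {"ARC-c": 35.0, "openbookqa": 44.4}}
    for model, dataset, expected, actual in find_mismatches(BASELINE, measured):
        print(f"{model}/{dataset}: expected {expected}, got {actual}")
```

Using an absolute tolerance rather than exact equality is a design choice of this sketch: it lets scores that differ only by rounding pass while still flagging real drifts.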