OpenCompass/.github/scripts/oc_score_baseline.yaml
zhulinJulia24 94eb90569f
update test workflow (#1167)
* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml

* Update daily-run-test.yml

* Update oc_score_assert.py

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-05-16 15:32:57 +08:00

internlm-7b-hf:
    ARC-c: 34.24
    chid-dev: 79.70
    chid-test: 81.12
    openai_humaneval: 10.98
    openbookqa: 47.20
    openbookqa_fact: 74.00
internlm-chat-7b-hf:
    ARC-c: 36.95
    chid-dev: 71.78
    chid-test: 76.87
    openai_humaneval: 21.34
    openbookqa: 66.6
    openbookqa_fact: 80.4
chatglm3-6b-base-hf:
    ARC-c: 44.41
    chid-dev: 78.22
    chid-test: 78.57
    openai_humaneval: 20.73
    openbookqa: 78.40
    openbookqa_fact: 92.00
internlm2-7b-hf:
    ARC-c: 34.92
    chid-dev: 55.94
    chid-test: 53.70
    openai_humaneval: 44.51
    openbookqa: 83.00
    openbookqa_fact: 83.00
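
A baseline like this maps each model to its expected score per dataset, so a test run can be checked against it. The sketch below is one possible way to do that comparison, not the logic of `oc_score_assert.py`, which is not shown here. The function names, the `measured` results, and the 0.01 absolute tolerance are all assumptions made for illustration. The baseline excerpt is embedded and parsed by hand so the example has no dependencies; with PyYAML installed, `yaml.safe_load` would be the usual way to read the file.

```python
import math

# Excerpt of the baseline above, embedded so the sketch runs without PyYAML.
BASELINE_TEXT = """\
internlm-7b-hf:
    ARC-c: 34.24
    openai_humaneval: 10.98
internlm2-7b-hf:
    ARC-c: 34.92
    openai_humaneval: 44.51
"""


def parse_baseline(text):
    """Parse the two-level 'model -> dataset -> score' layout into nested dicts."""
    baseline = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        key, _, value = raw.strip().partition(":")
        if raw[0].isspace():
            # Indented line: a dataset score for the current model.
            baseline[current][key] = float(value)
        else:
            # Unindented line: start of a new model entry.
            current = key
            baseline[current] = {}
    return baseline


def find_regressions(baseline, measured, abs_tol=0.01):
    """Return (model, dataset, expected, actual) for each missing or deviating score."""
    failures = []
    for model, scores in baseline.items():
        for dataset, expected in scores.items():
            actual = measured.get(model, {}).get(dataset)
            if actual is None or not math.isclose(actual, expected, abs_tol=abs_tol):
                failures.append((model, dataset, expected, actual))
    return failures


baseline = parse_baseline(BASELINE_TEXT)

# Hypothetical results from a test run; one humaneval score has drifted.
measured = {
    "internlm-7b-hf": {"ARC-c": 34.24, "openai_humaneval": 10.98},
    "internlm2-7b-hf": {"ARC-c": 34.92, "openai_humaneval": 43.90},
}

failures = find_regressions(baseline, measured)
for model, dataset, expected, actual in failures:
    print(f"{model}/{dataset}: expected {expected}, got {actual}")
```

Collecting every mismatch before reporting, rather than stopping at the first one, lets a single CI run show all scores that moved. A looser or relative tolerance may suit datasets whose scores vary between runs.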