Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>

2025-03-12 10:53:31 +08:00

BBEH

python3 run.py --models hf_internlm2_7b --datasets bbeh_gen --debug
python3 run.py --models hf_meta_llama3_8b_instruct --datasets bbeh_gen --debug
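
The two invocations above differ only in the model name, so they can be generated from a list. The following is a minimal sketch, assuming it is launched from the directory that contains `run.py`. The helper names `build_command` and `run_all` are hypothetical and not part of OpenCompass; only the flags shown in the commands above are used.

```python
import shlex
import subprocess

# Model config names taken from the commands above.
MODELS = ["hf_internlm2_7b", "hf_meta_llama3_8b_instruct"]


def build_command(model, dataset="bbeh_gen", debug=True):
    """Return the argv list for one evaluation run (hypothetical helper)."""
    cmd = ["python3", "run.py", "--models", model, "--datasets", dataset]
    if debug:
        cmd.append("--debug")
    return cmd


def run_all(models=MODELS):
    """Run the evaluations one after another, stopping at the first failure."""
    for model in models:
        subprocess.run(build_command(model), check=True)


if __name__ == "__main__":
    # Dry run: only print the commands; call run_all() to execute them.
    for model in MODELS:
        print(shlex.join(build_command(model)))
```

Printing the commands first makes it easy to check them against the ones listed above before starting a long evaluation.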

Models

model	score
Meta-Llama-3-8B-Instruct-LMDeploy-API	10.93

model	boolean_expressions	disambiguation_qa	geometric_shapes	hyperbaton	movie_recommendation	nycc	shuffled_objects	boardgame_qa
Meta-Llama-3-8B-Instruct-LMDeploy-API	14.00	33.33	13.50	1.00	28.00	11.00	10.00	18.50

model	buggy_tables	causal_understanding	dyck_languages	linguini	multistep_arithmetic	object_counting	object_properties	sarc_triples
Meta-Llama-3-8B-Instruct-LMDeploy-API	0.00	42.50	3.50	2.00	0.00	0.00	1.00	17.00

model	spatial_reasoning	sportqa	temporal_sequence	time_arithmetic	web_of_lies	word_sorting	zebra_puzzles
Meta-Llama-3-8B-Instruct-LMDeploy-API	4.00	5.00	2.00	3.00	7.50	2.00	3.50