Yufeng Zhao bc2969dba8
[Feature] Add support for BBEH dataset (#1925)
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2025-03-12 10:53:31 +08:00

BBEH

python3 run.py --models hf_internlm2_7b --datasets bbeh_gen --debug
python3 run.py --models hf_meta_llama3_8b_instruct --datasets bbeh_gen --debug

Models

| model | score |
|:---|---:|
| Meta-Llama-3-8B-Instruct-LMDeploy-API | 10.93 |

Details

| model | boolean_expressions | disambiguation_qa | geometric_shapes | hyperbaton | movie_recommendation | nycc | shuffled_objects | boardgame_qa |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Meta-Llama-3-8B-Instruct-LMDeploy-API | 14.00 | 33.33 | 13.50 | 1.00 | 28.00 | 11.00 | 10.00 | 18.50 |

| model | buggy_tables | causal_understanding | dyck_languages | linguini | multistep_arithmetic | object_counting | object_properties | sarc_triples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Meta-Llama-3-8B-Instruct-LMDeploy-API | 0.00 | 42.50 | 3.50 | 2.00 | 0.00 | 0.00 | 1.00 | 17.00 |

| model | spatial_reasoning | sportqa | temporal_sequence | time_arithmetic | web_of_lies | word_sorting | zebra_puzzles |
|:---|---:|---:|---:|---:|---:|---:|---:|
| Meta-Llama-3-8B-Instruct-LMDeploy-API | 4.00 | 5.00 | 2.00 | 3.00 | 7.50 | 2.00 | 3.50 |
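
The per-task numbers above can be summarized with a few lines of Python. This is a minimal sketch, not part of OpenCompass: the `scores` dictionary is copied by hand from the Details tables, and `summarize` is a hypothetical helper. It reports a plain unweighted mean over the 23 tasks, which is only one possible aggregation; this README does not state how the overall score in the Models table is computed, so the two numbers need not agree.

```python
from statistics import mean

# Per-task scores for Meta-Llama-3-8B-Instruct-LMDeploy-API,
# copied by hand from the Details tables (not read from OpenCompass output).
scores = {
    "boolean_expressions": 14.00, "disambiguation_qa": 33.33,
    "geometric_shapes": 13.50, "hyperbaton": 1.00,
    "movie_recommendation": 28.00, "nycc": 11.00,
    "shuffled_objects": 10.00, "boardgame_qa": 18.50,
    "buggy_tables": 0.00, "causal_understanding": 42.50,
    "dyck_languages": 3.50, "linguini": 2.00,
    "multistep_arithmetic": 0.00, "object_counting": 0.00,
    "object_properties": 1.00, "sarc_triples": 17.00,
    "spatial_reasoning": 4.00, "sportqa": 5.00,
    "temporal_sequence": 2.00, "time_arithmetic": 3.00,
    "web_of_lies": 7.50, "word_sorting": 2.00,
    "zebra_puzzles": 3.50,
}


def summarize(task_scores, top_k=3):
    """Hypothetical helper: basic statistics over a {task: score} mapping."""
    # Sort tasks from highest to lowest score.
    ranked = sorted(task_scores.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "num_tasks": len(task_scores),
        # Unweighted arithmetic mean; an assumption, not the official metric.
        "unweighted_mean": round(mean(task_scores.values()), 2),
        "best": ranked[:top_k],
        "zero_score": sorted(t for t, s in task_scores.items() if s == 0.0),
    }


if __name__ == "__main__":
    summary = summarize(scores)
    print(f"tasks: {summary['num_tasks']}")
    print(f"unweighted mean: {summary['unweighted_mean']}")
    print("best tasks:", summary["best"])
    print("zero-score tasks:", summary["zero_score"])
```

Listing the zero-score tasks separately is a deliberate choice: they are the first place to check whether the failure lies in the model's reasoning or in answer extraction.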