mirror of
https://github.com/open-compass/opencompass.git
synced 2025-05-30 16:03:24 +08:00
![]() * bbeh * bbeh * fix_smallbugs_bbeh * removeprint * results --------- Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn> |
||
---|---|---|
.. | ||
bbeh_gen.py | ||
README.md |
BB#H
python3 run.py --models hf_internlm2_7b --datasets bbeh_gen --debug
python3 run.py --models hf_meta_llama3_8b_instruct --datasets bbeh_gen --debug
Models
model | score |
---|---|
Meta-Llama-3-8B-Instruct-LMDeploy-API | 10.93 |
Details
model | boolean_expressions | disambiguation_qa | geometric_shapes | hyperbaton | movie_recommendation | nycc | shuffled_objects | boardgame_qa |
---|---|---|---|---|---|---|---|---|
Meta-Llama-3-8B-Instruct-LMDeploy-API | 14.00 | 33.33 | 13.50 | 1.00 | 28.00 | 11.00 | 10.00 | 18.50 |
model | buggy_tables | causal_understanding | dyck_languages | linguini | multistep_arithmetic | object_counting | object_properties | sarc_triples |
---|---|---|---|---|---|---|---|---|
Meta-Llama-3-8B-Instruct-LMDeploy-API | 0.00 | 42.50 | 3.50 | 2.00 | 0.00 | 0.00 | 1.00 | 17.00 |
model | spatial_reasoning | sportqa | temporal_sequence | time_arithmetic | web_of_lies | word_sorting | zebra_puzzles |
---|---|---|---|---|---|---|---|
Meta-Llama-3-8B-Instruct-LMDeploy-API | 4.00 | 5.00 | 2.00 | 3.00 | 7.50 | 2.00 | 3.50 |