
# LCBench2023

LCBench2023 collects questions from LeetCode weekly contests held in 2022 and 2023. It contains Chinese and English versions, each with 581 questions.
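For context on the pass@1 columns below: code benchmarks commonly report pass@k, the estimated probability that at least one of k sampled generations solves a problem. A minimal sketch of the standard unbiased estimator (from the HumanEval evaluation methodology) follows; whether the numbers in this README use this estimator or a plain single-sample pass rate is not stated here, so treat this as background rather than the exact scoring code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn from n generations of which c are correct, passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 correct generations out of 10, pass@1 reduces to c/n:
print(round(pass_at_k(10, 3, 1), 2))  # 0.3
```

For k=1 the formula collapses to c/n, i.e. the fraction of correct generations.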

## Base Models

| model | lcbench/pass@1 | en/pass@1 | cn/pass@1 | lcbench/pass | lcbench/timeout | lcbench/failed | lcbench/wrong_answer | en/pass | en/timeout | en/failed | en/wrong_answer | cn/pass | cn/timeout | cn/failed | cn/wrong_answer |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| llama-7b-turbomind | 1.30 | 2.61 | 0.00 | 15 | 28 | 843 | 266 | 15 | 14 | 290 | 257 | 0 | 14 | 553 | 9 |
| llama-13b-turbomind | 2.09 | 4.17 | 0.00 | 24 | 31 | 823 | 274 | 24 | 16 | 270 | 266 | 0 | 15 | 553 | 8 |
| llama-30b-turbomind | 3.48 | 6.78 | 0.17 | 40 | 41 | 780 | 291 | 39 | 25 | 226 | 286 | 1 | 16 | 554 | 5 |
| llama-65b-turbomind | 4.00 | 7.83 | 0.17 | 46 | 22 | 755 | 329 | 45 | 10 | 205 | 316 | 1 | 12 | 550 | 13 |
| llama-2-7b-turbomind | 0.78 | 1.57 | 0.00 | 9 | 28 | 825 | 290 | 9 | 16 | 274 | 277 | 0 | 12 | 551 | 13 |
| llama-2-13b-turbomind | 2.52 | 5.04 | 0.00 | 29 | 29 | 761 | 333 | 29 | 17 | 207 | 323 | 0 | 12 | 554 | 10 |
| llama-2-70b-turbomind | 5.04 | 9.57 | 0.52 | 58 | 47 | 684 | 363 | 55 | 28 | 140 | 353 | 3 | 19 | 544 | 10 |
| llama-3-8b-turbomind | 16.59 | 16.70 | 16.49 | 191 | 30 | 236 | 695 | 96 | 13 | 119 | 348 | 95 | 17 | 117 | 347 |
| llama-3-70b-turbomind | 38.49 | 38.43 | 38.54 | 443 | 2 | 120 | 587 | 221 | 2 | 58 | 295 | 222 | 0 | 62 | 292 |
| internlm2-1.8b-turbomind | 4.34 | 5.04 | 3.65 | 50 | 33 | 333 | 736 | 29 | 18 | 177 | 352 | 21 | 15 | 156 | 384 |
| internlm2-7b-turbomind | 12.16 | 12.52 | 11.81 | 140 | 41 | 166 | 805 | 72 | 23 | 92 | 389 | 68 | 18 | 74 | 416 |
| internlm2-20b-turbomind | 18.46 | 20.96 | 15.97 | 213 | 54 | 134 | 751 | 121 | 24 | 57 | 374 | 92 | 30 | 77 | 377 |
| qwen-1.8b-turbomind | 1.82 | 1.91 | 1.74 | 21 | 31 | 449 | 651 | 11 | 17 | 208 | 340 | 10 | 14 | 241 | 311 |
| qwen-7b-turbomind | 4.95 | 5.39 | 4.51 | 57 | 37 | 388 | 670 | 31 | 15 | 197 | 333 | 26 | 22 | 191 | 337 |
| qwen-14b-turbomind | 8.86 | 9.74 | 7.99 | 102 | 2 | 245 | 803 | 56 | 0 | 120 | 400 | 46 | 2 | 125 | 403 |
| qwen-72b-turbomind | 16.86 | 19.48 | 14.24 | 194 | 12 | 229 | 717 | 112 | 4 | 112 | 348 | 82 | 8 | 117 | 369 |
| qwen1.5-0.5b-hf | 0.87 | 0.52 | 1.22 | 10 | 29 | 499 | 614 | 3 | 10 | 259 | 304 | 7 | 19 | 240 | 310 |
| qwen1.5-1.8b-hf | 2.00 | 2.26 | 1.74 | 23 | 26 | 434 | 669 | 13 | 10 | 220 | 333 | 10 | 16 | 214 | 336 |
| qwen1.5-4b-hf | 5.65 | 6.96 | 4.34 | 65 | 37 | 349 | 701 | 40 | 19 | 161 | 356 | 25 | 18 | 188 | 345 |
| qwen1.5-7b-hf | 6.69 | 8.00 | 5.38 | 77 | 30 | 283 | 762 | 46 | 12 | 124 | 394 | 31 | 18 | 159 | 368 |
| qwen1.5-14b-hf | 12.69 | 13.74 | 11.63 | 146 | 43 | 232 | 731 | 79 | 22 | 122 | 353 | 67 | 21 | 110 | 378 |
| qwen1.5-32b-hf | 14.34 | 16.70 | 11.98 | 165 | 45 | 191 | 751 | 96 | 18 | 88 | 374 | 69 | 27 | 103 | 377 |
| qwen1.5-72b-hf | 15.29 | 15.65 | 14.93 | 176 | 11 | 242 | 723 | 90 | 7 | 118 | 361 | 86 | 4 | 124 | 362 |
| qwen1.5-moe-a2-7b-hf | 9.56 | 10.09 | 9.03 | 110 | 10 | 272 | 760 | 58 | 5 | 129 | 384 | 52 | 5 | 143 | 376 |
| mistral-7b-v0.1-hf | 11.38 | 11.83 | 10.94 | 131 | 30 | 221 | 770 | 68 | 11 | 100 | 397 | 63 | 19 | 121 | 373 |
| mistral-7b-v0.2-hf | 11.38 | 11.13 | 11.63 | 131 | 2 | 259 | 760 | 64 | 2 | 124 | 386 | 67 | 0 | 135 | 374 |
| mixtral-8x7b-v0.1-hf | 21.11 | 21.39 | 20.83 | 243 | 7 | 165 | 737 | 123 | 4 | 76 | 373 | 120 | 3 | 89 | 364 |
| mixtral-8x22b-v0.1-hf | 30.97 | 31.22 | 30.73 | 357 | 6 | 131 | 658 | 180 | 3 | 66 | 327 | 177 | 3 | 65 | 331 |
| yi-6b-hf | 2.43 | 2.78 | 2.08 | 28 | 7 | 456 | 661 | 16 | 2 | 214 | 344 | 12 | 5 | 242 | 317 |
| yi-34b-hf | 8.25 | 8.35 | 8.16 | 95 | 8 | 319 | 730 | 48 | 5 | 163 | 360 | 47 | 3 | 156 | 370 |
| deepseek-7b-base-hf | 5.30 | 5.22 | 5.38 | 61 | 7 | 325 | 759 | 30 | 4 | 165 | 377 | 31 | 3 | 160 | 382 |
| deepseek-67b-base-hf | 26.50 | 26.96 | 26.04 | 305 | 9 | 202 | 636 | 155 | 4 | 105 | 312 | 150 | 5 | 97 | 324 |
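The count columns decompose each split's outcomes into pass, timeout, failed (e.g. compile or runtime error), and wrong_answer. The pass@1 columns appear to track 100 × pass / (pass + timeout + failed + wrong_answer), up to rounding; a few rows differ in the last digit, so the exact aggregation is an assumption here. A minimal sketch of that relationship:

```python
def pass_rate(passed: int, timeout: int, failed: int, wrong_answer: int) -> float:
    """Percentage of problems whose generation passed, out of all outcomes."""
    total = passed + timeout + failed + wrong_answer
    return 100.0 * passed / total

# llama-7b-turbomind, lcbench split (both languages, 1152 outcomes):
print(round(pass_rate(15, 28, 843, 266), 2))  # 1.3, matching the reported 1.30
```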

## Chat Models

| model | lcbench/pass@1 | en/pass@1 | cn/pass@1 | lcbench/pass | lcbench/timeout | lcbench/failed | lcbench/wrong_answer | en/pass | en/timeout | en/failed | en/wrong_answer | cn/pass | cn/timeout | cn/failed | cn/wrong_answer |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| qwen1.5-0.5b-chat-hf | 0.00 | 0.00 | 0.00 | 0 | 0 | 1152 | 0 | 0 | 0 | 576 | 0 | 0 | 0 | 576 | 0 |
| qwen1.5-1.8b-chat-hf | 1.65 | 1.57 | 1.74 | 19 | 5 | 603 | 525 | 9 | 2 | 298 | 267 | 10 | 3 | 305 | 258 |
| qwen1.5-4b-chat-hf | 5.56 | 5.22 | 5.90 | 64 | 17 | 484 | 587 | 30 | 8 | 242 | 296 | 34 | 9 | 242 | 291 |
| qwen1.5-7b-chat-hf | 8.78 | 9.57 | 7.99 | 101 | 25 | 333 | 693 | 55 | 12 | 151 | 358 | 46 | 13 | 182 | 335 |
| qwen1.5-14b-chat-hf | 14.42 | 16.52 | 12.33 | 166 | 18 | 222 | 746 | 95 | 10 | 110 | 361 | 71 | 8 | 112 | 385 |
| qwen1.5-32b-chat-hf | 10.78 | 13.04 | 8.51 | 124 | 15 | 516 | 497 | 75 | 10 | 195 | 296 | 49 | 5 | 321 | 201 |
| qwen1.5-72b-chat-hf | 18.77 | 18.78 | 18.75 | 216 | 23 | 164 | 749 | 108 | 12 | 89 | 367 | 108 | 11 | 75 | 382 |
| qwen1.5-110b-chat-hf | 34.58 | 34.43 | 34.72 | 399 | 20 | 176 | 557 | 199 | 12 | 85 | 280 | 200 | 8 | 91 | 277 |
| internlm2-chat-1.8b-hf | 4.52 | 5.04 | 3.99 | 52 | 10 | 364 | 726 | 29 | 4 | 172 | 371 | 23 | 6 | 192 | 355 |
| internlm2-chat-1.8b-sft-hf | 3.56 | 3.83 | 3.30 | 41 | 12 | 403 | 696 | 22 | 6 | 211 | 337 | 19 | 6 | 192 | 359 |
| internlm2-chat-7b-hf | 14.60 | 13.74 | 15.45 | 168 | 12 | 238 | 734 | 79 | 7 | 142 | 348 | 89 | 5 | 96 | 386 |
| internlm2-chat-7b-sft-hf | 14.34 | 14.61 | 14.06 | 165 | 9 | 275 | 703 | 84 | 3 | 174 | 315 | 81 | 6 | 101 | 388 |
| internlm2-chat-20b-hf | 19.64 | 20.00 | 19.27 | 226 | 11 | 191 | 724 | 115 | 7 | 83 | 371 | 111 | 4 | 108 | 353 |
| internlm2-chat-20b-sft-hf | 20.55 | 19.91 | 21.18 | 237 | 11 | 195 | 709 | 115 | 6 | 94 | 361 | 122 | 5 | 101 | 348 |
| llama-3-8b-instruct-hf | 28.50 | 29.04 | 27.95 | 328 | 17 | 95 | 712 | 167 | 7 | 44 | 358 | 161 | 10 | 51 | 354 |
| llama-3-70b-instruct-hf | 45.44 | 46.09 | 44.79 | 523 | 8 | 52 | 569 | 265 | 2 | 25 | 284 | 258 | 6 | 27 | 285 |
| llama-3-8b-instruct-lmdeploy | 29.02 | 29.39 | 28.65 | 334 | 19 | 94 | 705 | 169 | 11 | 42 | 354 | 165 | 8 | 52 | 351 |
| llama-3-70b-instruct-lmdeploy | 44.66 | 46.78 | 42.53 | 514 | 11 | 44 | 583 | 269 | 5 | 19 | 283 | 245 | 6 | 25 | 300 |
| mistral-7b-instruct-v0.1-hf | 9.82 | 10.78 | 8.85 | 113 | 17 | 316 | 706 | 62 | 9 | 152 | 353 | 51 | 8 | 164 | 353 |
| mistral-7b-instruct-v0.2-hf | 7.90 | 6.26 | 9.55 | 91 | 8 | 572 | 481 | 36 | 4 | 345 | 191 | 55 | 4 | 227 | 290 |
| mixtral-8x7b-instruct-v0.1-hf | 16.29 | 15.91 | 16.67 | 188 | 13 | 370 | 581 | 92 | 6 | 241 | 237 | 96 | 7 | 129 | 344 |