
# LCBench2023

LCBench2023 collects questions from LeetCode weekly contests held in 2022 and 2023. It contains Chinese and English versions, each with 581 questions.
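For context on the pass@1 columns below: code benchmarks commonly report pass@k, the estimated probability that at least one of k sampled generations solves a problem. A minimal sketch of the standard unbiased estimator (from the HumanEval evaluation methodology) follows; whether the numbers in this README use this estimator or a plain single-sample pass rate is not stated here, so treat this as background rather than the exact scoring code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn from n generations of which c are correct, passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 correct generations out of 10, pass@1 reduces to c/n:
print(round(pass_at_k(10, 3, 1), 2))  # 0.3
```

For k=1 the formula collapses to c/n, i.e. the fraction of correct generations.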

## Base Models

| model | lcbench/pass@1 | en/pass@1 | cn/pass@1 | lcbench/pass | lcbench/timeout | lcbench/failed | lcbench/wrong_answer | en/pass | en/timeout | en/failed | en/wrong_answer | cn/pass | cn/timeout | cn/failed | cn/wrong_answer |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| llama-7b-turbomind | 1.30 | 2.61 | 0.00 | 15 | 28 | 843 | 266 | 15 | 14 | 290 | 257 | 0 | 14 | 553 | 9 |
| llama-13b-turbomind | 2.09 | 4.17 | 0.00 | 24 | 31 | 823 | 274 | 24 | 16 | 270 | 266 | 0 | 15 | 553 | 8 |
| llama-30b-turbomind | 3.48 | 6.78 | 0.17 | 40 | 41 | 780 | 291 | 39 | 25 | 226 | 286 | 1 | 16 | 554 | 5 |
| llama-65b-turbomind | 4.00 | 7.83 | 0.17 | 46 | 22 | 755 | 329 | 45 | 10 | 205 | 316 | 1 | 12 | 550 | 13 |
| llama-2-7b-turbomind | 0.78 | 1.57 | 0.00 | 9 | 28 | 825 | 290 | 9 | 16 | 274 | 277 | 0 | 12 | 551 | 13 |
| llama-2-13b-turbomind | 2.52 | 5.04 | 0.00 | 29 | 29 | 761 | 333 | 29 | 17 | 207 | 323 | 0 | 12 | 554 | 10 |
| llama-2-70b-turbomind | 5.04 | 9.57 | 0.52 | 58 | 47 | 684 | 363 | 55 | 28 | 140 | 353 | 3 | 19 | 544 | 10 |
| llama-3-8b-turbomind | 16.59 | 16.70 | 16.49 | 191 | 30 | 236 | 695 | 96 | 13 | 119 | 348 | 95 | 17 | 117 | 347 |
| llama-3-70b-turbomind | 38.49 | 38.43 | 38.54 | 443 | 2 | 120 | 587 | 221 | 2 | 58 | 295 | 222 | 0 | 62 | 292 |
| internlm2-1.8b-turbomind | 4.34 | 5.04 | 3.65 | 50 | 33 | 333 | 736 | 29 | 18 | 177 | 352 | 21 | 15 | 156 | 384 |
| internlm2-7b-turbomind | 12.16 | 12.52 | 11.81 | 140 | 41 | 166 | 805 | 72 | 23 | 92 | 389 | 68 | 18 | 74 | 416 |
| internlm2-20b-turbomind | 18.46 | 20.96 | 15.97 | 213 | 54 | 134 | 751 | 121 | 24 | 57 | 374 | 92 | 30 | 77 | 377 |
| qwen-1.8b-turbomind | 1.82 | 1.91 | 1.74 | 21 | 31 | 449 | 651 | 11 | 17 | 208 | 340 | 10 | 14 | 241 | 311 |
| qwen-7b-turbomind | 4.95 | 5.39 | 4.51 | 57 | 37 | 388 | 670 | 31 | 15 | 197 | 333 | 26 | 22 | 191 | 337 |
| qwen-14b-turbomind | 8.86 | 9.74 | 7.99 | 102 | 2 | 245 | 803 | 56 | 0 | 120 | 400 | 46 | 2 | 125 | 403 |
| qwen-72b-turbomind | 16.86 | 19.48 | 14.24 | 194 | 12 | 229 | 717 | 112 | 4 | 112 | 348 | 82 | 8 | 117 | 369 |
| qwen1.5-0.5b-hf | 0.87 | 0.52 | 1.22 | 10 | 29 | 499 | 614 | 3 | 10 | 259 | 304 | 7 | 19 | 240 | 310 |
| qwen1.5-1.8b-hf | 2.00 | 2.26 | 1.74 | 23 | 26 | 434 | 669 | 13 | 10 | 220 | 333 | 10 | 16 | 214 | 336 |
| qwen1.5-4b-hf | 5.65 | 6.96 | 4.34 | 65 | 37 | 349 | 701 | 40 | 19 | 161 | 356 | 25 | 18 | 188 | 345 |
| qwen1.5-7b-hf | 6.69 | 8.00 | 5.38 | 77 | 30 | 283 | 762 | 46 | 12 | 124 | 394 | 31 | 18 | 159 | 368 |
| qwen1.5-14b-hf | 12.69 | 13.74 | 11.63 | 146 | 43 | 232 | 731 | 79 | 22 | 122 | 353 | 67 | 21 | 110 | 378 |
| qwen1.5-32b-hf | 14.34 | 16.70 | 11.98 | 165 | 45 | 191 | 751 | 96 | 18 | 88 | 374 | 69 | 27 | 103 | 377 |
| qwen1.5-72b-hf | 15.29 | 15.65 | 14.93 | 176 | 11 | 242 | 723 | 90 | 7 | 118 | 361 | 86 | 4 | 124 | 362 |
| qwen1.5-moe-a2-7b-hf | 9.56 | 10.09 | 9.03 | 110 | 10 | 272 | 760 | 58 | 5 | 129 | 384 | 52 | 5 | 143 | 376 |
| mistral-7b-v0.1-hf | 11.38 | 11.83 | 10.94 | 131 | 30 | 221 | 770 | 68 | 11 | 100 | 397 | 63 | 19 | 121 | 373 |
| mistral-7b-v0.2-hf | 11.38 | 11.13 | 11.63 | 131 | 2 | 259 | 760 | 64 | 2 | 124 | 386 | 67 | 0 | 135 | 374 |
| mixtral-8x7b-v0.1-hf | 21.11 | 21.39 | 20.83 | 243 | 7 | 165 | 737 | 123 | 4 | 76 | 373 | 120 | 3 | 89 | 364 |
| mixtral-8x22b-v0.1-hf | 30.97 | 31.22 | 30.73 | 357 | 6 | 131 | 658 | 180 | 3 | 66 | 327 | 177 | 3 | 65 | 331 |
| yi-6b-hf | 2.43 | 2.78 | 2.08 | 28 | 7 | 456 | 661 | 16 | 2 | 214 | 344 | 12 | 5 | 242 | 317 |
| yi-34b-hf | 8.25 | 8.35 | 8.16 | 95 | 8 | 319 | 730 | 48 | 5 | 163 | 360 | 47 | 3 | 156 | 370 |
| deepseek-7b-base-hf | 5.30 | 5.22 | 5.38 | 61 | 7 | 325 | 759 | 30 | 4 | 165 | 377 | 31 | 3 | 160 | 382 |
| deepseek-67b-base-hf | 26.50 | 26.96 | 26.04 | 305 | 9 | 202 | 636 | 155 | 4 | 105 | 312 | 150 | 5 | 97 | 324 |
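The count columns decompose each split's outcomes into pass, timeout, failed (e.g. compile or runtime error), and wrong_answer. The pass@1 columns appear to track 100 × pass / (pass + timeout + failed + wrong_answer), up to rounding; a few rows differ in the last digit, so the exact aggregation is an assumption here. A minimal sketch of that relationship:

```python
def pass_rate(passed: int, timeout: int, failed: int, wrong_answer: int) -> float:
    """Percentage of problems whose generation passed, out of all outcomes."""
    total = passed + timeout + failed + wrong_answer
    return 100.0 * passed / total

# llama-7b-turbomind, lcbench split (both languages, 1152 outcomes):
print(round(pass_rate(15, 28, 843, 266), 2))  # 1.3, matching the reported 1.30
```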

## Chat Models

| model | lcbench/pass@1 | en/pass@1 | cn/pass@1 | lcbench/pass | lcbench/timeout | lcbench/failed | lcbench/wrong_answer | en/pass | en/timeout | en/failed | en/wrong_answer | cn/pass | cn/timeout | cn/failed | cn/wrong_answer |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| qwen1.5-0.5b-chat-hf | 0.00 | 0.00 | 0.00 | 0 | 0 | 1152 | 0 | 0 | 0 | 576 | 0 | 0 | 0 | 576 | 0 |
| qwen1.5-1.8b-chat-hf | 1.65 | 1.57 | 1.74 | 19 | 5 | 603 | 525 | 9 | 2 | 298 | 267 | 10 | 3 | 305 | 258 |
| qwen1.5-4b-chat-hf | 5.56 | 5.22 | 5.90 | 64 | 17 | 484 | 587 | 30 | 8 | 242 | 296 | 34 | 9 | 242 | 291 |
| qwen1.5-7b-chat-hf | 8.78 | 9.57 | 7.99 | 101 | 25 | 333 | 693 | 55 | 12 | 151 | 358 | 46 | 13 | 182 | 335 |
| qwen1.5-14b-chat-hf | 14.42 | 16.52 | 12.33 | 166 | 18 | 222 | 746 | 95 | 10 | 110 | 361 | 71 | 8 | 112 | 385 |
| qwen1.5-32b-chat-hf | 10.78 | 13.04 | 8.51 | 124 | 15 | 516 | 497 | 75 | 10 | 195 | 296 | 49 | 5 | 321 | 201 |
| qwen1.5-72b-chat-hf | 18.77 | 18.78 | 18.75 | 216 | 23 | 164 | 749 | 108 | 12 | 89 | 367 | 108 | 11 | 75 | 382 |
| qwen1.5-110b-chat-hf | 34.58 | 34.43 | 34.72 | 399 | 20 | 176 | 557 | 199 | 12 | 85 | 280 | 200 | 8 | 91 | 277 |
| internlm2-chat-1.8b-hf | 4.52 | 5.04 | 3.99 | 52 | 10 | 364 | 726 | 29 | 4 | 172 | 371 | 23 | 6 | 192 | 355 |
| internlm2-chat-1.8b-sft-hf | 3.56 | 3.83 | 3.30 | 41 | 12 | 403 | 696 | 22 | 6 | 211 | 337 | 19 | 6 | 192 | 359 |
| internlm2-chat-7b-hf | 14.60 | 13.74 | 15.45 | 168 | 12 | 238 | 734 | 79 | 7 | 142 | 348 | 89 | 5 | 96 | 386 |
| internlm2-chat-7b-sft-hf | 14.34 | 14.61 | 14.06 | 165 | 9 | 275 | 703 | 84 | 3 | 174 | 315 | 81 | 6 | 101 | 388 |
| internlm2-chat-20b-hf | 19.64 | 20.00 | 19.27 | 226 | 11 | 191 | 724 | 115 | 7 | 83 | 371 | 111 | 4 | 108 | 353 |
| internlm2-chat-20b-sft-hf | 20.55 | 19.91 | 21.18 | 237 | 11 | 195 | 709 | 115 | 6 | 94 | 361 | 122 | 5 | 101 | 348 |
| llama-3-8b-instruct-hf | 28.50 | 29.04 | 27.95 | 328 | 17 | 95 | 712 | 167 | 7 | 44 | 358 | 161 | 10 | 51 | 354 |
| llama-3-70b-instruct-hf | 45.44 | 46.09 | 44.79 | 523 | 8 | 52 | 569 | 265 | 2 | 25 | 284 | 258 | 6 | 27 | 285 |
| llama-3-8b-instruct-lmdeploy | 29.02 | 29.39 | 28.65 | 334 | 19 | 94 | 705 | 169 | 11 | 42 | 354 | 165 | 8 | 52 | 351 |
| llama-3-70b-instruct-lmdeploy | 44.66 | 46.78 | 42.53 | 514 | 11 | 44 | 583 | 269 | 5 | 19 | 283 | 245 | 6 | 25 | 300 |
| mistral-7b-instruct-v0.1-hf | 9.82 | 10.78 | 8.85 | 113 | 17 | 316 | 706 | 62 | 9 | 152 | 353 | 51 | 8 | 164 | 353 |
| mistral-7b-instruct-v0.2-hf | 7.90 | 6.26 | 9.55 | 91 | 8 | 572 | 481 | 36 | 4 | 345 | 191 | 55 | 4 | 227 | 290 |
| mixtral-8x7b-instruct-v0.1-hf | 16.29 | 15.91 | 16.67 | 188 | 13 | 370 | 581 | 92 | 6 | 241 | 237 | 96 | 7 | 129 | 344 |