OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Linchen Xiao	eadbdcb4cb	[Update] Update requirement and deepseek configurations (#1764 )	2024-12-17 10:16:47 +08:00
Alexander Lam	1bd594fc62	[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model (#1751 ) * fix lint issues * updated gitignore * changed infer_order from random to double for the pairwise_judge.py (not changing for pairwise_bt_judge.py * added return statement to CompassArenaBradleyTerrySummarizer to return overall score for each judger model	2024-12-16 13:41:28 +08:00
liushz	c4ce0174fe	[Fix] Fix ChineseSimpleQA max_out_len (#1757 ) * add chinese simpleqa config * add chinese simpleqa config * add chinese simpleqa config * add chinese simpleqa config * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * pdate Csimpleqa * pdate Csimpleqa * Update Csimpleqa --------- Co-authored-by: 明念 <heyancheng.hyc@taobao.com>	2024-12-11 19:51:27 +08:00
Linchen Xiao	bd7b705be4	[Update] Update dataset configuration with no max_out_len (#1754 )	2024-12-11 18:20:29 +08:00
OpenStellarTeam	1a5b3fc11e	Add Chinese SimpleQA config (#1697 ) * add chinese simpleqa config * add chinese simpleqa config * add chinese simpleqa config * add chinese simpleqa config * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * pdate Csimpleqa --------- Co-authored-by: 明念 <heyancheng.hyc@taobao.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-12-11 18:03:39 +08:00
Linchen Xiao	0d26b348e4	[Feature] Add OC academic 2412 (#1750 )	2024-12-10 21:53:06 +08:00
bittersweet1999	54c0fb7a93	[Change] Change Compassarena metric (#1749 ) * fix pip version * fix pip version * fix summarizer bug * fix compassarena * fix compassarena * fix compassarena	2024-12-10 14:45:32 +08:00
Songyang Zhang	0d8df541bc	[Update] Update O1-style Benchmark and Prompts (#1742 ) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update * Update * Update * Update	2024-12-09 13:48:56 +08:00
Junnan Liu	f333be177c	[Update] Add MATH500 & AIME2024 to LiveMathBench (#1741 ) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500 * update std computation & k-pass computation * add AIME224 * update README	2024-12-06 14:36:49 +08:00
Songyang Zhang	fb43dd1906	[Update] Update Skywork/Qwen-QwQ (#1728 ) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update	2024-12-05 19:30:43 +08:00
Junnan Liu	6181ac1122	[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation (#1730 ) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500	2024-12-05 16:54:16 +08:00
Yufeng Zhao	4d773904d4	[Update] Korbench readme supplementation (#1734 ) * renewed * readme --------- Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>	2024-12-05 11:24:35 +08:00
Linchen Xiao	e2a290fd46	[Bump] Bump version to 0.3.7 (#1733 )	2024-12-03 19:34:57 +08:00
Yufeng Zhao	98c4666d65	[Update] Update Korbench dataset abbr (#1729 ) Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>	2024-12-02 16:20:58 +08:00
Linchen Xiao	9de27b4d85	[Update] Update max_out_len for datasets (#1726 ) * [Update] Update max_out_len for datasets * Update eval_regression_chat_objective_fullbench.py * Update eval_regression_chat.py * Update eval_regression_chat.py * Update oc_score_baseline_fullbench.yaml --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>	2024-12-02 11:42:07 +08:00
Junnan Liu	fe6d76fb13	[Feature] Support LiveMathBench (#1727 )	2024-11-30 00:07:19 +08:00
liushz	c437135fad	[Feature] Add Openai Simpleqa dataset (#1720 ) * Add Openai SimpleQA dataset * Add Openai SimpleQA dataset * Add Openai SimpleQA dataset * Update eval_simpleqa.py --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-11-28 19:16:07 +08:00
liushz	06ab27861e	[Fix] Fix pmmeval_gen config (#1719 ) * Update with PMMEval * Update * Update __init__.py * Fix Bugs * Delete .pre-commit-config.yaml * Pull merge * Fix pmmeval_gen config --------- Co-authored-by: wanyu <wanyu2018umac@gmail.com> Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>	2024-11-28 11:53:36 +08:00
wanyu2018umac	90efcf2216	[Feature] Add P-MMEval (#1714 ) * Update with PMMEval * Update * Update __init__.py * Fix Bugs * Delete .pre-commit-config.yaml * Pull merge --------- Co-authored-by: liushz <qq1791167085@163.com>	2024-11-27 21:26:18 +08:00
Junnan Liu	f7dbe6bb7d	[Feature] Add Arc Prize Public Evaluation (#1690 ) * support arc prize * update arc-prize dataset info & update arc-prize evaluation performance	2024-11-27 15:44:41 +08:00
Yi Ding	bcb707dbfc	[Fix] Fix BailingAPI model (#1707 ) * [fix] sequence under the multiple samples * resolve the lint problems * change the parameter name * add another error code for retry * output the log for invalid response * format correction * update * update * update * update * add two model python files * update the default parameter * use random for delay * update the api example of bailing * remove the unnecessary parameter	2024-11-26 19:24:47 +08:00
Linchen Xiao	ef695e28e5	[Bug] Fix Korbench dataset module (#1717 )	2024-11-26 17:13:28 +08:00
Songyang Zhang	f97c4eae42	[Update] Update Fullbench (#1712 ) * Update JuderBench * Support O1-style Prompts * Update Code	2024-11-26 14:26:55 +08:00
Yufeng Zhao	300adc31e8	[Feature] Add Korbench dataset (#1713 ) * first version for korbench * first stage for korbench * korbench_1 * korbench_1 * korbench_1 * korbench_1 * korbench_1_revised * korbench_combined_1 * korbench_combined_1 * kor_combined * kor_combined * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2024-11-25 20:11:27 +08:00
Chang Lan	5c1916ea4c	[Update] Add RULER 64k config (#1709 )	2024-11-25 19:35:27 +08:00
liushz	e49fcfd3a3	[Update] Update MATH dataset with model judge (#1711 ) * Update math with llm judge * Update math with llm judge * Update math with llm judge * Update math with llm judge * Update math with llm judge	2024-11-25 15:14:55 +08:00
Linchen Xiao	80e3b9ef37	[Update] Add math prm 800k (#1708 )	2024-11-21 21:29:43 +08:00
Linchen Xiao	500fb1032a	[Update] Update configurations (#1704 )	2024-11-21 16:51:18 +08:00
Linchen Xiao	40a9f0be0d	[Update] MUSR dataset config prefix update (#1692 )	2024-11-15 11:06:30 +08:00
abrohamLee	e9e4b69ddb	[Feature] MuSR Datset Evaluation (#1689 ) * MuSR Datset Evaluation * MuSR Datset Evaluation Add an assertion and a Readme.md	2024-11-14 20:42:12 +08:00
Linchen Xiao	e92a5d4230	[Feature] BABILong Dataset added (#1684 ) * update * update * update * update	2024-11-14 15:32:43 +08:00
Linchen Xiao	835bf75a36	[Feature] Add long context evaluation for base models (#1666 ) * [Update] Add base long context evaluation * update	2024-11-08 10:53:29 +08:00
liushz	f7d899823c	[Update] Update mmmlu_lite dataload (#1658 ) * update mmmlu_lite dataload from oss * update mmmlu_lite dataload from oss	2024-11-01 17:32:29 +08:00
Songyang Zhang	c789ce5698	[Fix] the automatically download for several datasets (#1652 ) * [Fix] the automatically download for several datasets * Update * Update * Update CI	2024-11-01 15:57:18 +08:00
Linchen Xiao	695738a89b	[Update] Add lmdeploy DeepSeek configs (#1656 ) * [Update] Add lmdeploy DeepSeek configs * update max out length	2024-11-01 15:34:23 +08:00
bittersweet1999	a0853c939d	[Add] Add CompassArenaSubjectiveBench (#1645 ) * fix pip version * fix pip version * add compassarenasubjectivebench * add compassarenasubjectivebench * add compassarenabench	2024-11-01 13:52:22 +08:00
Linchen Xiao	5212ffe8e2	[Update] Add new model configs (#1653 )	2024-10-30 17:24:53 +08:00
Linchen Xiao	df57c08ccf	[Feature] Update Models, Summarizers (#1600 )	2024-10-29 18:37:15 +08:00
Chang Lan	46affab882	[Fix] Fix ruler_16k_gen (#1643 )	2024-10-29 17:58:43 +08:00
Linchen Xiao	8172af49bb	[Update] Update wildbench max_seq_len (#1648 ) * [Update] Wildbench max_seq_len update * [Update] Wildbench max_seq_len update	2024-10-29 13:21:31 +08:00
Junnan Liu	645c5f3b2c	[Datasets] Add datasets CMO&AIME (#1610 ) * add datasets cmo&aime * delete unused modules * modify prompt * update __init__ * update data load and add README * update data load * update performance * update md5 * remove indents * add indent * fix log for debug mode	2024-10-28 18:08:02 +08:00
Linchen Xiao	a61e8a0803	[Update] Internal humaneval add (#1641 ) * [Update] internal_humaneval_add * update	2024-10-25 19:08:42 +08:00
Chang Lan	a927bba1cf	[Fix] Fix RULER datasets (#1628 ) We need to ensure that we don't import anything that ends with "_datasets", or they will be picked up by the runner, leading to duplicate / unwanted datasets being evaluated.	2024-10-22 11:59:02 +08:00
Songyang Zhang	a4d5a6c81b	[Feature] Support LiveCodeBench (#1617 ) * Update * Update LCB * Update * Update * Update * Update * Update	2024-10-21 20:50:39 +08:00
liushz	500b44ba2d	[Fix] gpqa_few_shot_ppl prompt bug (#1627 )	2024-10-21 16:59:06 +08:00
Linchen Xiao	096c347e7d	[Fix] Qwen 2.5 model config (#1626 ) * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config	2024-10-21 16:58:18 +08:00
bittersweet1999	a11e2b2fd4	[Fix] Compatible with old versions (#1616 ) * fix pip version * fix pip version * Compatible with old versions * compati old version * compati old version * compati old version * update configs	2024-10-21 10:16:29 +08:00
Bob Tsang	dd0b655bd0	[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565 ) * rm folder * modify format according to reviewer * modify format according to reviewer * modify format according to reviewer * add some files requirement * fix some bug * fix bug * change load type * Update MMMLU Dataset * Update MMMLU Dataset * Add MMMLU-Lite Dataset * update MMMMLU datast * update MMMMLU datast * update MMMMLU datast --------- Co-authored-by: BobTsang <BobTsang1995@gmail.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-10-17 19:09:34 +08:00
bittersweet1999	f0d436496e	[Update] update docs and add compassarena (#1614 ) * fix pip version * fix pip version * update docs and add compassarena * update docs	2024-10-17 14:39:06 +08:00
Haoran Que	4fe251729b	Upload HelloBench (#1607 ) * upload hellobench * update hellobench * update readme.md * update eval_hellobench.py * update lastest --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-10-15 17:11:37 +08:00

1 2

89 Commits