OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Alexander Lam	7f2aeeff26	Added predicted win-rate reporting to Bradley-Terry subjective evaluation methods, with an option to switch between win rates and Elo ratings (#1815)	2025-01-10 18:20:25 +08:00
Zhao Qihao	e039f3efa0	[Feature] Support MMLU-CF Benchmark (#1775) * [Feature] Support MMLU-CF Benchmark * Update mmlu-cf * [Feature] Support MMLU-CF Benchmark * Remove outside configs --------- Co-authored-by: liushz <qq1791167085@163.com>	2025-01-09 14:11:20 +08:00
Songyang Zhang	f1e50d4bf0	[Update] Update LiveMathBench (#1809) * Update LiveMathBench * Update New O1 Evaluation * Update O1 evaluation	2025-01-07 19:16:12 +08:00
Songyang Zhang	8fdb72f567	[Update] Update o1 eval prompt (#1806) * Update XML prediction post-process * Update LiveMathBench * Update New O1 Evaluation	2025-01-07 00:14:32 +08:00
Alexander Lam	f871e80887	[Feature] Add Bradley-Terry Subjective Evaluation method to Arena Hard dataset (#1802) * Added base_models_abbrs to references (passed from LMEvaluator); added Bradley-Terry subjective evaluation method for WildBench, AlpacaEval, and CompassArena datasets; added all_scores output files for reference in CompassArenaBradleyTerrySummarizer * Added Bradley-Terry subjective evaluation method to arena_hard dataset	2025-01-03 16:33:43 +08:00
Linchen Xiao	117dc500ad	[Feature] Add Longbenchv2 support (#1801) * Create eval_longbenchv2.py * Create longbenchv2_gen.py * Update __init__.py * Create longbenchv2.py * Update datasets_info.py * update --------- Co-authored-by: abrohamLee <146956824+abrohamLee@users.noreply.github.com>	2025-01-03 12:04:29 +08:00
Linchen Xiao	f3220438bc	[BUMP] Bump version to 0.3.9 (#1790)	2024-12-31 16:52:47 +08:00
liushz	9c980cbc62	[Feature] Add LiveStemBench Dataset (#1794) * [Fix] Fix vllm max_seq_len parameter transfer * Add livestembench dataset * Update livestembench_gen_3e3c50.py * Update eval_livestembench.py	2024-12-31 15:17:39 +08:00
Songyang Zhang	fc0556ec8e	[Fix] Fix generic_llm_evaluator output_path (#1798) * Fix output_path * Add Logger	2024-12-31 13:05:05 +08:00
Alexander Lam	dc6035cfcb	[Feature] Added Bradley-Terry subjective evaluation	2024-12-31 11:01:23 +08:00
Songyang Zhang	98435dd98e	[Feature] Update o1 evaluation with JudgeLLM (#1795) * Update Generic LLM Evaluator * Update o1-style evaluator	2024-12-30 17:31:00 +08:00
Junnan Liu	8e8d4f1c64	[Feature] Support G-Pass@k and LiveMathBench (#1772) * support G-Pass@k and livemathbench * fix bugs * fix comments of GPassKEvaluator * update saved details of GPassKEvaluator * fix eval api configs & update openai_api for ease of debugging * update huggingface path * fix method name of G-Pass@k * fix default value of eval_model_name * refactor G-Pass@k evaluator * log generation params for each backend * fix evaluation resume * add NotImplementedError	2024-12-30 16:59:39 +08:00
Linchen Xiao	42b54d6bb8	[Update] Add 0shot CoT config for TheoremQA (#1783)	2024-12-27 16:17:27 +08:00
bittersweet1999	357ce8c7a4	[Fix] Fix model summarizer abbr (#1789) * fix pip version * fix model summarizer abbr --------- Co-authored-by: root <bittersweet1999>	2024-12-27 14:45:08 +08:00
Linchen Xiao	56eaac6d8f	[Update] Volc status exception handling (#1780) * update	2024-12-26 15:43:24 +08:00
Linchen Xiao	ebefffed61	[Update] Update OC academic 202412 (#1771) * [Update] Update academic settings * Update * update	2024-12-19 18:07:34 +08:00
Chang Lan	d70100cdf2	[Update] Customizable tokenizer for RULER (#1731) * Customizable tokenizer for RULER * Relax requirements	2024-12-19 18:02:11 +08:00
Junnan Liu	499302857f	[Fix] Fix Local Runner Params Save Path (#1768) * update local runner params save dir * fix remove * fix directory remove * Fix *_params.py by uuid4	2024-12-19 16:07:34 +08:00
Mashiro	9a5adbde6a	[Fix] Fix lark reporter issue (#1769)	2024-12-18 19:33:06 +08:00
bittersweet1999	38dba9919b	[Fix] Fix Subjective summarizer order error (#1767) * fix pip version * fix order error	2024-12-18 13:21:31 +08:00
Linchen Xiao	d593bfeac8	[Bump] Bump version to 0.3.8 (#1765) * [Bump] Bump version to 0.3.8 * Update README.md	2024-12-17 19:17:18 +08:00
Linchen Xiao	eadbdcb4cb	[Update] Update requirements and DeepSeek configurations (#1764)	2024-12-17 10:16:47 +08:00
liushz	5c8e91f329	[Fix] Fix vllm max_seq_len parameter transfer (#1745) * [Fix] Fix vllm max_seq_len parameter transfer * Update pr-run-test.yml --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>	2024-12-16 21:44:36 +08:00
Alexander Lam	1bd594fc62	[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model (#1751) * fix lint issues * updated gitignore * changed infer_order from random to double for pairwise_judge.py (not for pairwise_bt_judge.py) * added return statement to CompassArenaBradleyTerrySummarizer to return overall score for each judger model	2024-12-16 13:41:28 +08:00
zhulinJulia24	aeded4c4db	Add new dataset summarizer (#1758)	2024-12-13 09:50:43 +08:00
zhulinJulia24	a1c00cc8b7	[ci] add common_summarizer return (#1724) * Update common_summarizer.py	2024-12-11 20:38:32 +08:00
liushz	c4ce0174fe	[Fix] Fix ChineseSimpleQA max_out_len (#1757) * add chinese simpleqa config * Update CsimpleQA * Update Csimpleqa --------- Co-authored-by: 明念 <heyancheng.hyc@taobao.com>	2024-12-11 19:51:27 +08:00
Linchen Xiao	bd7b705be4	[Update] Update dataset configuration with no max_out_len (#1754)	2024-12-11 18:20:29 +08:00
OpenStellarTeam	1a5b3fc11e	Add Chinese SimpleQA config (#1697) * add chinese simpleqa config * Update CsimpleQA * Update Csimpleqa --------- Co-authored-by: 明念 <heyancheng.hyc@taobao.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-12-11 18:03:39 +08:00
Linchen Xiao	0d26b348e4	[Feature] Add OC academic 2412 (#1750)	2024-12-10 21:53:06 +08:00
bittersweet1999	54c0fb7a93	[Change] Change Compassarena metric (#1749) * fix pip version * fix summarizer bug * fix compassarena	2024-12-10 14:45:32 +08:00
Songyang Zhang	0d8df541bc	[Update] Update O1-style Benchmark and Prompts (#1742) * Update JudgerBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update	2024-12-09 13:48:56 +08:00
Junnan Liu	f333be177c	[Update] Add MATH500 & AIME2024 to LiveMathBench (#1741) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500 * update std computation & k-pass computation * add AIME2024 * update README	2024-12-06 14:36:49 +08:00
bittersweet1999	08d63b5bf3	[Fix] Fix error in subjective default summarizer (#1740) * fix pip version * fix summarizer bug	2024-12-06 11:03:53 +08:00
Songyang Zhang	fb43dd1906	[Update] Update Skywork/Qwen-QwQ (#1728) * Update JudgerBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update	2024-12-05 19:30:43 +08:00
Junnan Liu	6181ac1122	[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation (#1730) * upload dataset definitions & configs * add single dataset split specific metrics * add k-pass@threshold & MATH500	2024-12-05 16:54:16 +08:00
Linchen Xiao	ac23f0ce1f	[Update] Update init file for Korbench (#1737)	2024-12-05 11:26:00 +08:00
Yufeng Zhao	4d773904d4	[Update] Korbench readme supplementation (#1734) * renewed * readme --------- Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>	2024-12-05 11:24:35 +08:00
Linchen Xiao	a011be6798	[Feature] DLC runner Lark report (#1735) * [Bump] Bump version to 0.3.7 * DLC lark report update	2024-12-04 18:03:12 +08:00
Linchen Xiao	e2a290fd46	[Bump] Bump version to 0.3.7 (#1733)	2024-12-03 19:34:57 +08:00
Yufeng Zhao	98c4666d65	[Update] Update Korbench dataset abbr (#1729) Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>	2024-12-02 16:20:58 +08:00
Linchen Xiao	9de27b4d85	[Update] Update max_out_len for datasets (#1726) * [Update] Update max_out_len for datasets * Update eval_regression_chat_objective_fullbench.py * Update eval_regression_chat.py * Update oc_score_baseline_fullbench.yaml --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>	2024-12-02 11:42:07 +08:00
Junnan Liu	fe6d76fb13	[Feature] Support LiveMathBench (#1727)	2024-11-30 00:07:19 +08:00
liushz	b063779034	[Fix] Update P-MMEVAL OSS data (#1722) * Update with PMMEval * Update * Update __init__.py * Fix Bugs * Delete .pre-commit-config.yaml * Pull merge * Fix pmmeval_gen config * Update P-MMEVAL data --------- Co-authored-by: wanyu <wanyu2018umac@gmail.com> Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>	2024-11-28 20:55:46 +08:00
liushz	c437135fad	[Feature] Add Openai Simpleqa dataset (#1720) * Add Openai SimpleQA dataset * Update eval_simpleqa.py --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-11-28 19:16:07 +08:00
liushz	06ab27861e	[Fix] Fix pmmeval_gen config (#1719) * Update with PMMEval * Update * Update __init__.py * Fix Bugs * Delete .pre-commit-config.yaml * Pull merge * Fix pmmeval_gen config --------- Co-authored-by: wanyu <wanyu2018umac@gmail.com> Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>	2024-11-28 11:53:36 +08:00
wanyu2018umac	90efcf2216	[Feature] Add P-MMEval (#1714) * Update with PMMEval * Update * Update __init__.py * Fix Bugs * Delete .pre-commit-config.yaml * Pull merge --------- Co-authored-by: liushz <qq1791167085@163.com>	2024-11-27 21:26:18 +08:00
Junnan Liu	f7dbe6bb7d	[Feature] Add Arc Prize Public Evaluation (#1690) * support arc prize * update arc-prize dataset info & update arc-prize evaluation performance	2024-11-27 15:44:41 +08:00
Yi Ding	bcb707dbfc	[Fix] Fix BailingAPI model (#1707) * [fix] sequence under the multiple samples * resolve the lint problems * change the parameter name * add another error code for retry * output the log for invalid response * format correction * update * add two model python files * update the default parameter * use random for delay * update the api example of bailing * remove the unnecessary parameter	2024-11-26 19:24:47 +08:00
Linchen Xiao	ef695e28e5	[Bug] Fix Korbench dataset module (#1717)	2024-11-26 17:13:28 +08:00
Songyang Zhang	f97c4eae42	[Update] Update Fullbench (#1712) * Update JudgerBench * Support O1-style Prompts * Update Code	2024-11-26 14:26:55 +08:00
Yufeng Zhao	300adc31e8	[Feature] Add Korbench dataset (#1713) * first version for korbench * first stage for korbench * korbench_1 * korbench_1_revised * korbench_combined_1 * kor_combined * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2024-11-25 20:11:27 +08:00
Chang Lan	5c1916ea4c	[Update] Add RULER 64k config (#1709)	2024-11-25 19:35:27 +08:00
liushz	e49fcfd3a3	[Update] Update MATH dataset with model judge (#1711) * Update math with llm judge	2024-11-25 15:14:55 +08:00
Linchen Xiao	80e3b9ef37	[Update] Add math prm 800k (#1708)	2024-11-21 21:29:43 +08:00
Linchen Xiao	500fb1032a	[Update] Update configurations (#1704)	2024-11-21 16:51:18 +08:00
Yi Ding	05044dfaf2	[Update] Support new error code for Bailing model (#1702) * support new error code * fix the lint problems	2024-11-20 16:40:22 +08:00
Linchen Xiao	ff831b153e	[BUMP] Bump version to 0.3.6 (#1694)	2024-11-18 20:24:50 +08:00
Linchen Xiao	ab8fdbbaab	[Update] Update Math auto-download data (#1700)	2024-11-18 20:24:35 +08:00
Linchen Xiao	98242ff1d1	[Update] first_option_postprocess (#1699) * update first_option_postprocess * update	2024-11-18 20:14:29 +08:00
Linchen Xiao	4653f6976e	[Update] update volc CPU flavor (#1698)	2024-11-18 12:33:51 +08:00
Linchen Xiao	40a9f0be0d	[Update] MUSR dataset config prefix update (#1692)	2024-11-15 11:06:30 +08:00
abrohamLee	e9e4b69ddb	[Feature] MuSR Dataset Evaluation (#1689) * MuSR Dataset Evaluation * MuSR Dataset Evaluation Add an assertion and a Readme.md	2024-11-14 20:42:12 +08:00
Linchen Xiao	d415439f9b	[Fix] Fix bug for first_option_postprocess (#1688)	2024-11-14 16:45:59 +08:00
Linchen Xiao	e92a5d4230	[Feature] BABILong Dataset added (#1684) * update	2024-11-14 15:32:43 +08:00
Linchen Xiao	2fee63f537	[Update] Auto-download for followbench (#1685)	2024-11-13 15:47:29 +08:00
bittersweet1999	aca8ec3c6a	[Hotfix] Hotfix (#1683) * fix pip version * fix lint * hotfix	2024-11-13 10:14:27 +08:00
sobeit	3ec178f4a9	add single lora adapter support for vLLM inference. (#1679)	2024-11-12 17:31:36 +08:00
bittersweet1999	17b5e52f6c	[Hotfix] lmdeploy temp (#1674) * fix pip version * hotfix	2024-11-12 16:10:16 +08:00
Linchen Xiao	a0ef2fd3b4	[Update] Dingo Dataset update (#1670) * [Update] Dingo Dataset update * update	2024-11-08 14:38:43 +08:00
Linchen Xiao	835bf75a36	[Feature] Add long context evaluation for base models (#1666) * [Update] Add base long context evaluation * update	2024-11-08 10:53:29 +08:00
Chang Cheng	fd7aa83c01	[Update] Update DLC Runner (#1662) * push interntrain hard code * remove redundant post process --------- Co-authored-by: changcheng <changcheng@pjlab.org.cb> Co-authored-by: changcheng <changcheng@pjlab.org.cn>	2024-11-07 15:45:35 +08:00
Linchen Xiao	db258eb7d5	[Bump] Bump version to v0.3.5 (#1657)	2024-11-03 21:23:35 +08:00
Lyu Han	888f1f3bef	[Fix] Update loglikelihood compatibility (#1659)	2024-11-02 17:19:11 +08:00
liushz	f7d899823c	[Update] Update mmmlu_lite dataload (#1658) * update mmmlu_lite dataload from oss	2024-11-01 17:32:29 +08:00
Songyang Zhang	c789ce5698	[Fix] Fix the automatic download for several datasets (#1652) * [Fix] Fix the automatic download for several datasets * Update * Update CI	2024-11-01 15:57:18 +08:00
Linchen Xiao	695738a89b	[Update] Add lmdeploy DeepSeek configs (#1656) * [Update] Add lmdeploy DeepSeek configs * update max out length	2024-11-01 15:34:23 +08:00
bittersweet1999	a0853c939d	[Add] Add CompassArenaSubjectiveBench (#1645) * fix pip version * add compassarenasubjectivebench * add compassarenabench	2024-11-01 13:52:22 +08:00
Linchen Xiao	5212ffe8e2	[Update] Add new model configs (#1653)	2024-10-30 17:24:53 +08:00
Linchen Xiao	df57c08ccf	[Feature] Update Models, Summarizers (#1600)	2024-10-29 18:37:15 +08:00
Linchen Xiao	d91d66792a	[Update] Update Needlebench OSS path (#1651)	2024-10-29 18:05:44 +08:00
Chang Lan	46affab882	[Fix] Fix ruler_16k_gen (#1643)	2024-10-29 17:58:43 +08:00
Linchen Xiao	8172af49bb	[Update] Update wildbench max_seq_len (#1648) * [Update] Wildbench max_seq_len update	2024-10-29 13:21:31 +08:00
Junnan Liu	645c5f3b2c	[Datasets] Add datasets CMO&AIME (#1610) * add datasets cmo&aime * delete unused modules * modify prompt * update __init__ * update data load and add README * update data load * update performance * update md5 * remove indents * add indent * fix log for debug mode	2024-10-28 18:08:02 +08:00
Linchen Xiao	9c39cb68d4	[Bump] Bump version to 0.3.4 (#1639)	2024-10-25 20:10:16 +08:00
Linchen Xiao	a61e8a0803	[Update] Add internal humaneval (#1641) * [Update] internal_humaneval_add * update	2024-10-25 19:08:42 +08:00
Songyang Zhang	84be90669b	[Update] Fix issue of *_param.py to avoid name conflicts; add keep_tmp_file flag to support keeping the temp config file (#1640)	2024-10-25 16:39:25 +08:00
BigDong	2542bc6907	[Feature] Support saving results as a Markdown table (#1638)	2024-10-25 15:50:33 +08:00
Linchen Xiao	22fdea4bf2	[Update] Update DLC runner (#1637)	2024-10-24 21:36:16 +08:00
Lyu Han	fb12c3f98a	[Update] strip stop_words (#1635)	2024-10-24 20:39:20 +08:00
Linchen Xiao	662dddf41a	[Update] Add internal humaneval postprocess (#1636)	2024-10-24 17:45:21 +08:00
Linchen Xiao	be3c06a158	[Fix] Update common summarizer regex extraction (#1631)	2024-10-22 14:35:45 +08:00
Chang Lan	a927bba1cf	[Fix] Fix RULER datasets (#1628) We need to ensure that we don't import anything that ends with "_datasets", or they will be picked up by the runner, leading to duplicate / unwanted datasets being evaluated.	2024-10-22 11:59:02 +08:00
Songyang Zhang	a4d5a6c81b	[Feature] Support LiveCodeBench (#1617) * Update * Update LCB * Update	2024-10-21 20:50:39 +08:00
Chenguang Li	5868d5afa4	[Bug] Fix NPU support (#1618) * bugfix NPU support * formatting --------- Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-10-21 17:42:53 +08:00
liushz	500b44ba2d	[Fix] gpqa_few_shot_ppl prompt bug (#1627)	2024-10-21 16:59:06 +08:00
Linchen Xiao	096c347e7d	[Fix] Qwen 2.5 model config (#1626) * [Fix] Fix Qwen 2.5 model config	2024-10-21 16:58:18 +08:00
bittersweet1999	a11e2b2fd4	[Fix] Compatible with old versions (#1616) * fix pip version * Compatible with old versions * compatible with old versions * update configs	2024-10-21 10:16:29 +08:00
Lyu Han	6e8adf5221	[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator (#1623) * remove prefix bos_token from messages when using lmdeploy as the accelerator * update	2024-10-19 20:03:47 +08:00
Bob Tsang	dd0b655bd0	[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565) * rm folder * modify format according to reviewer * add some files requirement * fix some bug * fix bug * change load type * Update MMMLU Dataset * Add MMMLU-Lite Dataset * update MMMLU dataset --------- Co-authored-by: BobTsang <BobTsang1995@gmail.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-10-17 19:09:34 +08:00
bittersweet1999	f0d436496e	[Update] update docs and add compassarena (#1614) * fix pip version * update docs and add compassarena * update docs	2024-10-17 14:39:06 +08:00
Haoran Que	4fe251729b	Upload HelloBench (#1607) * upload hellobench * update hellobench * update readme.md * update eval_hellobench.py * update latest --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-10-15 17:11:37 +08:00
bittersweet1999	fa54aa62f6	[Feature] Add Judgerbench and reorg subeval (#1593) * fix pip version * update (#1522) Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> * [Feature] Update Models (#1518) * Update Models * Update * Update humanevalx * Update * [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) * add judgerbench and reorg subeval --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com> Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-10-15 16:36:05 +08:00
x54-729	2b1afa7d1e	[Fix] fix interntrain's tokenizer truncate (#1605) Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>	2024-10-15 16:03:57 +08:00
Linchen Xiao	f390697a5e	[Fix] Update dlc runner python env (#1604)	2024-10-14 15:50:21 +08:00
Lyu Han	4fde41036f	[Feature] Update TurboMindModel by integrating lmdeploy pipeline API (#1556) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * rollback class name * update * remove unused code * update * use pipeline * fix ci check * compatibility * remove concurrency * update * fix table content * update	2024-10-14 15:33:40 +08:00
liushz	5faee929db	[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589) * Add GaoKaoMath Dataset * Add MATH LLM Eval * Update GAOKAO Math Eval Dataset	2024-10-12 19:13:06 +08:00
bittersweet1999	3f7a3730d7	[Fix] fix Flames (#1599) * fix pip version * fix flames	2024-10-12 14:34:59 +08:00
Lyu Han	b52ba65c26	[Feature] Integrate lmdeploy pipeline api (#1198) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * rollback class name * update * remove unused code * update * fix ci check * compatibility * remove concurrency * Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py * Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md * [Bug] fix lint --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-10-09 22:58:06 +08:00
x54-729	4d6349dfe1	[FIX] fix interntrain get_loglikelihood (#1584)	2024-10-08 11:34:04 +08:00
Linchen Xiao	22a4e76511	[BUMP] Bump version to 0.3.3 (#1581)	2024-09-30 16:57:41 +08:00
x54-729	bbdca5eb4c	[BUG] Fix eos token handling and add comments for InternTrain (#1569) Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>	2024-09-30 15:46:06 +08:00
Linchen Xiao	763d7755b6	[BUG] GaokaoBench dataset fix (#1583)	2024-09-30 15:13:26 +08:00
shijinpjlab	7528b8ab8a	[Feature] Add dingo test (#1529) * add qa dingo * update * change name qa to dingo * eval model: llm_base * update path * change name and move path * add eval_dingo * update import * add for pip * add dingo package * change import place * update import place * fix lint fail * isort * double quoted --------- Co-authored-by: sj <shijin@pjlab.org.cn>	2024-09-29 19:24:58 +08:00
Yi Ding	85a28874aa	[BUG]: Fix Bailing API configs (#1570)	2024-09-27 11:56:57 +08:00
Songyang Zhang	e8437db98f	[Feature] Update BailingLM/OpenAI verbose (#1568) * [Feature] 1. Update CoreBench Base 2. Fix lint issue in BailingAPI * Update * [Feature] Update API * Update	2024-09-27 11:15:25 +08:00
Songyang Zhang	7d50294117	[Feature] Update Bailing (#1567) * [Feature] 1. Update CoreBench Base 2. Fix lint issue in BailingAPI * Update	2024-09-26 18:56:17 +08:00
Songyang Zhang	a7bacfdf7e	[Feature] Update CoreBench 2.0 (#1566) * [Feature] 1. Update CoreBench Base 2. Fix lint issue in BailingAPI * Update	2024-09-26 18:44:00 +08:00
Yi Ding	3f833186dc	[Feature] Support the reasoning from BaiLing LLM (#1541) * [Feature] Support the reasoning from BaiLing LLM: this commit adds access to BaiLing LLM and retrieves its reasoning. * Add the API example: an example of evaluating the Bailing API * Revise the generation arguments: based on current experiments, update some generation arguments for better reasoning * [fix] set the batch size * Retry under server-side flow control * Add dependent package into requirement.txt: add the dependent package retrying to clean up the pre-commit check * Correct the file names and make the file copy: copy the files under configs to opencompass * fix the lint issue --------- Co-authored-by: christopher.dy <christopher.dy@antgroup.com>	2024-09-26 16:49:52 +08:00
Linchen Xiao	80cda1980e	[BUG] fix followbench dataset config (#1564) * [BUG] fix followbench dataset config	2024-09-25 20:58:34 +08:00
zhulinJulia24	87df8a73a3	[CI] add a common summarizer for qabench summarizer (#1545) * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-25 13:40:47 +08:00
Linchen Xiao	c3fb9065db	[Feature] Add dlc sleep time (#1562)	2024-09-25 11:53:48 +08:00
liushz	83eeb52b09	[Feature] Update WikiBench base model config (#1553) * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & Math base config * Update WikiBench base model config --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-25 11:26:36 +08:00
Songyang Zhang	e7681943f3	[Feature] Update the max_out_len for many models (#1559)	2024-09-24 21:52:28 +08:00
bittersweet1999	a2e9bc0c41	[Fix] fix duplicate error in partitioner (#1552) * fix pip version * fix duplicate error in partitioner	2024-09-23 19:45:21 +08:00
x54-729	335667183a	[Feature] Add Interntrain model support (#1548) Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>	2024-09-23 19:10:26 +08:00
klein	24915aeb3f	[BUG] Update CIbench config (#1544) * BUG: Update cibench.py	2024-09-23 18:32:27 +08:00
liushz	a0cfd61129	[Feature] Update MathBench & Math base model config (#1550) * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & Math base config --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-23 14:03:59 +08:00
Songyang Zhang	ee058e25b2	[Feature] Support verbose for OpenAI API (#1546)	2024-09-20 17:12:52 +08:00
hailsham	a81bbb85bf	[FIX] Added handling for the "begin section" in meta_template to APITemplateParser (#1405) Co-authored-by: leifei <nuuooo@icloud.com>	2024-09-19 18:12:04 +08:00
Songyang Zhang	5a27c2bd6f	[Model] Support Qwen2.5 Instruct (#1543)	2024-09-19 16:16:07 +08:00
Songyang Zhang	be460fbb21	[Feature] Support OpenAI O1 models (#1539) * [Feature] Support OpenAI O1 models * Update README.md --------- Co-authored-by: liushz <qq1791167085@163.com>	2024-09-18 22:41:17 +08:00
liushz	2e9db77d57	[Feature] Add custom model postprocess function (#1519) Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:40:51 +08:00
liushz	c9a7026f59	[Feature] Update MathBench & WikiBench for FullBench (#1521) * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:35:30 +08:00
Linchen Xiao	90279b6461	[Feature] Dataset prompts update for ARC, BoolQ, Race (#1527)	2024-09-13 10:30:43 +08:00
Songyang Zhang	6997990c93	[Feature] Update Models (#1518) * Update Models * Update * Update humanevalx * Update	2024-09-12 23:35:30 +08:00
zhulinJulia24	3754dc1b67	update (#1522) Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-12 15:00:52 +08:00
bittersweet1999	7c7fa36235	[Feature] add support for internal Followbench (#1511) * fix pip version * add internal followbench * fix lint	2024-09-11 13:32:34 +08:00
Linchen Xiao	317763381c	update (#1517)	2024-09-11 13:31:20 +08:00
bittersweet1999	c2bcd8725e	[Fix] Fix wildbench (#1508) * fix pip version * fix_wildbench	2024-09-10 17:35:07 +08:00
Alexander Lam	a31a77c5c1	[Feature] Add SciCode summarizer config (#1514) * [Feature] added SciCode summarizer config and dataset config for with background evaluation * fix lint issues * removed unnecessary type in summarizer group	2024-09-10 16:06:02 +08:00
Linchen Xiao	b5f8afb57b	[Bump] Bump version to 0.3.2.post1	2024-09-06 19:09:30 +08:00
Linchen Xiao	f04f3546bc	[Fix] Import fix (#1500)	2024-09-06 18:29:24 +08:00
Linchen Xiao	ff18545f0e	[Bump] Bump version to 0.3.2 (#1497)	2024-09-06 16:10:45 +08:00
Linchen Xiao	87ffa71d68	[Feature] Longbench dataset update	2024-09-06 15:50:12 +08:00
Albert Yan	928d0cfc3a	[Feature] Add support for Rendu API (#1468) * Add support for Rendu API * fix lint issue * Update --------- Co-authored-by: 13190 <zeyu.yan@transn.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-09-06 01:00:43 +08:00
Hari Seldon	faf5260155	[Feature] Optimize Evaluation Speed of SciCode (#1489) * update scicode * update comments * remove redundant variable * Update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-09-06 00:59:41 +08:00
liushz	00fc8da5be	[Feature] Add model postprocess function (#1484) * Add model postprocess function --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-05 21:10:29 +08:00
Maxime SHE	45efdc994d	[Feature] Add an api_key attribute to TurboMindAPIModel, defaulting to None (#1475) Co-authored-by: Maxime <maximeshe@163.com> Add an api_key attribute (default None) to TurboMindAPIModel so that the api_key can be set when deploying the LLM with lmdeploy	2024-09-05 17:51:16 +08:00
Linchen Xiao	6c9cd9a260	[Feature] Needlebench auto-download update (#1480) * update	2024-09-05 17:22:42 +08:00

680 Commits