OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Songyang Zhang	6997990c93	[Feature] Update Models (#1518 ) * Update Models * Update * Update humanevalx * Update * Update	2024-09-12 23:35:30 +08:00
bittersweet1999	7c7fa36235	[Feature] add support for internal Followbench (#1511 ) * fix pip version * fix pip version * add internal followbench * add internal followbench * fix lint * fix lint	2024-09-11 13:32:34 +08:00
bittersweet1999	c2bcd8725e	[Fix] Fix wildbench (#1508 ) * fix pip version * fix pip version * fix_wildbench	2024-09-10 17:35:07 +08:00
Alexander Lam	a31a77c5c1	[Feature] Add SciCode summarizer config (#1514 ) * [Feature] added SciCode summarizer config and dataset config for with background evaluation * fix lint issues * removed unnecessary type in summarizer group	2024-09-10 16:06:02 +08:00
Linchen Xiao	87ffa71d68	[Feature] Longbench dataset update	2024-09-06 15:50:12 +08:00
Albert Yan	928d0cfc3a	[Feature] Add support for Rendu API (#1468 ) * Add support for Rendu API * fix lint issue * fix lint issue * fix lint issue * Update --------- Co-authored-by: 13190 <zeyu.yan@transn.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-09-06 01:00:43 +08:00
liushz	00fc8da5be	[Feature] Add model postprocess function (#1484 ) * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-05 21:10:29 +08:00
Maxime SHE	45efdc994d	[Feature] Add an attribute api_key into TurboMindAPIModel default None (#1475 ) Co-authored-by: Maxime <maximeshe@163.com> Add an attribute api_key into TurboMindAPIModel default None then we can set the api_key while using lmdeploy to deploy the llm model	2024-09-05 17:51:16 +08:00
Linchen Xiao	6c9cd9a260	[Feature] Needlebench auto-download update (#1480 ) * update * update * update	2024-09-05 17:22:42 +08:00
Linchen Xiao	da74cbfa39	[Fix] Model configs update	2024-09-04 18:57:10 +08:00
Linchen Xiao	9693be46b7	[Feature] Mmlu-pro auto-download (#1464 ) * update * update * update * update * update	2024-08-30 10:03:40 +08:00
Songyang Zhang	e5a8eb2283	[Feature] Update Lint and Leaderboard (#1458 ) * [Feature] Update Lint and Leaderboard * Update * Update	2024-08-28 22:36:42 +08:00
Linchen Xiao	245664f4c0	[Feature] Fullbench v0.1 language update (#1463 ) * update * update * update * update	2024-08-28 14:01:05 +08:00
Songyang Zhang	7c2d25b557	[Fix] Update SciCode and Gemma model (#1449 ) * [Fix] Update SciCode and Gemma model * Update * Update	2024-08-23 10:42:27 +08:00
liushz	9fdbc744dc	[Fix] Update option postprocess & mathbench language summarizer (#1413 ) * Update option postprocess & mathbench language summarizer * Update option postprocess & mathbench language summarizer --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-08-22 14:49:07 +08:00
Linchen Xiao	0fe9756c5d	[Doc] Update Readme (#1439 ) * update * update * update * update * update * update * update * update * update * update * update * update	2024-08-22 14:48:45 +08:00
Hari Seldon	14b4b735cb	[Feature] Add support for SciCode (#1417 ) * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode w/ bg * add scicode * Update README.md * Update README.md * Delete configs/eval_SciCode.py * rename * 1 * rename * Update README.md * Update scicode.py * Update scicode.py * fix some bugs * Update * Update --------- Co-authored-by: root <HariSeldon0> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-22 13:42:25 +08:00
Linchen Xiao	a4b54048ae	[Feature] Add Ruler datasets (#1310 ) * [Feature] Add Ruler datasets * pre-commit fixed * Add model specific tokenizer to dataset * pre-commit modified * remove unused import * fix linting * add trust_remote to tokenizer load * lint fix * comments resolved * fix lint * Add readme * Fix lint * ruler refactorize * fix lint * lint fix * updated * lint fix * fix wonderwords import issue * prompt modified * update * readme updated * update * ruler dataset added * Update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-20 11:40:11 +08:00
Xu Song	99b5122ed5	[Feature] Add abbr for rolebench dataset (#1431 ) * Add abbr for rolebench dataset * add	2024-08-20 11:22:48 +08:00
Linchen Xiao	ecf9bb3e4c	[Bug] Commonsenseqa dataset fix (#1425 ) * longbench dataset load fix * update * Update * Update * Update * update * update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-16 15:54:07 +08:00
Songyang Zhang	9b3613f10b	[Update] Support auto-download of FOFO/MT-Bench-101 (#1423 ) * [Update] Support auto-download of FOFO/MT-Bench-101 * Update wildbench	2024-08-16 11:57:41 +08:00
Linchen Xiao	8e55c9c6ee	[Update] Compassbench v1.3 (#1396 ) * stash files * compassbench subjective evaluation added * evaluation update * fix lint * update docs * Update lint * changes saved * changes saved * CompassBench subjective summarizer added (#1349) * subjective summarizer added * fix lint [Fix] Fix MathBench (#1351) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> [Update] Update model support list (#1353) * fix pip version * fix pip version * update model support subjective summarizer updated knowledge, math objective done (data need update) remove secrets objective changes saved knowledge data added * secrets removed * changed added * summarizer modified * summarizer modified * compassbench coding added * fix lint * objective summarizer updated * compass_bench_v1.3 updated * update files in config folder * remove unused model * lcbench modified * removed model evaluation configs * remove duplicated sdk implementation --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-08-12 19:09:19 +08:00
Songyang Zhang	c81329b548	[Fix] Fix Slurm ENV (#1392 ) 1. Support Slurm Cluster 2. Support automatic data download 3. Update InternLM2.5-1.8B/20B-Chat	2024-08-06 01:35:20 +08:00
Songyang Zhang	c09fc79ba8	[Feature] Support OpenAI ChatCompletion (#1389 ) * [Feature] Support import configs/models/summarizers from whl * Update * Update openai sdk * Update * Update gemma	2024-08-01 19:10:13 +08:00
Peng Bo	07c96ac659	Calm dataset (#1385 ) * Add CALM Dataset	2024-08-01 10:03:21 +08:00
Songyang Zhang	46cc7894e1	[Feature] Support import configs/models/summarizers from whl (#1376 ) * [Feature] Support import configs/models/summarizers from whl * Update LCBench configs * Update * Update * Update * Update * update * Update * Update * Update * Update * Update	2024-08-01 00:42:48 +08:00
Mo Li	b83396f57c	add 1m config (#1383 )	2024-07-31 14:53:51 +08:00
klein	52eccc4f0e	[Fix] Fix version mismatch of CIBench (#1380 ) * update crb * update crbbench * update crbbench * update crbbench * minor update wildbench * [Fix] Update doc of wildbench, and merge wildbench into subjective * [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench * Update crb.md * Update crb_pair_judge.py * Update crb_single_judge.py * Update subjective_evaluation.md * Update openai_api.py * [Update] update wildbench readme * [Update] update wildbench readme * [Update] update wildbench readme, remove crb * Delete configs/eval_subjective_wildbench_pair.py * Delete configs/eval_subjective_wildbench_single.py * Update __init__.py * [Fix] fix version mismatch for CIBench * [Fix] fix version mismatch for CIBench, local runer * [Fix] fix version mismatch for CIBench, local runer, remove oracle mode --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-30 17:51:24 +08:00
QXY	fea11b1d20	[Feature] add support for hf_pulse_7b (#1255 ) * add support for hf_pulse_7b * Update hf_pulse_7b.py	2024-07-29 19:01:52 +08:00
Songyang Zhang	704853e5e7	[Feature] Update pip install (#1324 ) * [Feature] Update pip install * Update Configuration * Update * Update * Update * Update Internal Config * Update collect env	2024-07-29 18:32:50 +08:00
Xingjun.Wang	edab1c07ba	[Feature] Support ModelScope datasets (#1289 ) * add ceval, gsm8k modelscope surpport * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * udpate dataset for modelscope support * merge myl_dev and update test_ms_dataset * udpate dataset for modelscope support * update readme * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * update readme * remove tydiqa japanese subset * add ceval, gsm8k modelscope surpport * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * udpate dataset for modelscope support * merge myl_dev and update test_ms_dataset * update readme * udpate dataset for modelscope support * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * remove tydiqa japanese subset * update util * remove .DS_Store * fix md format * move util into package * update docs/get_started.md * restore eval_api_zhipu_v2.py, add environment setting * Update dataset * Update * Update * Update * Update --------- Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local> Co-authored-by: Yunnglin <mao.looper@qq.com> Co-authored-by: Yun lin <yunlin@laptop.local> Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-29 13:48:32 +08:00
jxd	12b84aeb3b	[Feature] Update CHARM Memeorziation (#1230 ) * update gemini api and add gemini models * add openai models * update CHARM evaluation * add CHARM memorization tasks * add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis * update CHARM readme --------- Co-authored-by: wujiang <wujiang@pjlab.org.cn>	2024-07-26 18:42:30 +08:00
bittersweet1999	d3782c1d47	Revert "Calm dataset (#1287 )" (#1366 ) This reverts commit `edd0ffdf70`.	2024-07-26 18:27:29 +08:00
Xu Song	9b9855a008	Add `en` and `zh` groups to longbench summarizer; Fix longbench overall score (#1216 ) * Add longbench groups * update * update	2024-07-26 11:50:41 +08:00
Peng Bo	edd0ffdf70	Calm dataset (#1287 ) * add calm dataset * modify config max_out_len * update README * Modify README * update README * update README * update README * update README * update README * add summarizer and modify readme * delete summarizer config comment * update summarizer * modify same response to all questions * update README	2024-07-26 11:48:16 +08:00
LeavittLang	8ee7fecb68	Adding support for Doubao API (#1218 ) * Adding support for Doubao API * Update doubao_api.py Fixed the bug that the connection would be retried even if it was normal. * Update doubao_api.py --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-26 11:44:51 +08:00
klein	65fad8e2ac	[Fix] minor update wildbench (#1335 ) * update crb * update crbbench * update crbbench * update crbbench * minor update wildbench * [Fix] Update doc of wildbench, and merge wildbench into subjective * [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench * Update crb.md * Update crb_pair_judge.py * Update crb_single_judge.py * Update subjective_evaluation.md * Update openai_api.py * [Update] update wildbench readme * [Update] update wildbench readme * [Update] update wildbench readme, remove crb * Delete configs/eval_subjective_wildbench_pair.py * Delete configs/eval_subjective_wildbench_single.py * Update __init__.py --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-26 11:19:04 +08:00
bittersweet1999	8fe75e9937	[Update] update Subeval demo config (#1358 ) * fix pip version * fix pip version * update demo config	2024-07-24 15:48:28 +08:00
liushz	cf3e942f73	[Fix] Fix MathBench (#1351 ) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-07-23 13:35:38 +08:00
Linchen Xiao	8127fc3518	CompassBench subjective summarizer added (#1349 ) * subjective summarizer added * fix lint	2024-07-23 12:29:57 +08:00
Que Haoran	a244453d9e	[Feature] Support inference ppl datasets (#1315 ) * commit inference ppl datasets * revised format * revise * revise * revise * revise * revise * revise	2024-07-22 17:59:30 +08:00
Xu Song	e9384823f2	Upgrade default math `pred_postprocessor` (#1340 ) * Change default math postprocessor * Update math_gen_265cce.py	2024-07-22 14:00:49 +08:00
Songyang Zhang	96f644de69	[Fix] Update path and folder (#1344 ) * Update path and folder * Update path --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-21 08:18:14 +08:00
Linchen Xiao	a56678190b	[Feature] CompassBench v1_3 subjective evaluation (#1341 ) * stash files * compassbench subjective evaluation added * evaluation update * remove unneeded content * fix lint * update docs * Update lint * Update --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-19 23:12:23 +08:00
liushz	98c58f8a6c	[Feature] Add compassbench knowledge&math part (#1342 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update acclerator * Update MathBench * Update accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Update compassbench august wiki&math * Update compassbench august wiki&math * Update compassbench august wiki&math * Update compassbench_aug_gen_068af0.py * Update compassbench_aug_gen_068af0.py * Update --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-19 22:54:46 +08:00
bittersweet1999	1f9f728f22	[Feature] support compassbench Checklist evaluation (#1339 ) * fix pip version * fix pip version * support checklist eval * init * add lan * fix typo	2024-07-19 16:40:44 +08:00
Xu Song	0a1c89e618	[Fix] Fix rouge evaluator of rolebench_zh (#1322 )	2024-07-16 16:18:13 +08:00
bittersweet1999	8e7ad2e981	[Fix] add bc for alignbench summarizer (#1306 ) * fix pip version * fix pip version * fix alignbench * fix import error	2024-07-12 11:06:20 +08:00
bittersweet1999	889e7e1140	[Fix] Change abbr for arenahard dataset (#1302 ) * fix pip version * fix pip version * change abbr for arenahard	2024-07-11 12:42:03 +08:00
Fengzhe Zhou	1d3a26c732	[Doc] quick start swap tabs (#1263 ) * [doc] quick start swap tabs * update docs * update * update * update * update * update * update * update	2024-07-05 23:51:42 +08:00
bittersweet1999	68ca48496b	[Refactor] Reorganize subjective eval (#1284 ) * fix pip version * fix pip version * reorganize subjective eval * reorg sub * reorg subeval * reorg subeval * update subjective doc * reorg subeval * reorg subeval	2024-07-05 22:11:37 +08:00
Songyang Zhang	409a042d93	[Feature] Add InternLM2.5 (#1286 ) * [Feature] Add InternLM2.5 * Update * update readme --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-07-04 20:10:31 +08:00
zhulinJulia24	167cfdcca3	[ci] update daily testcase (#1285 ) * Update daily-run-test.yml * Create eval_regression_chat.py * Delete .github/scripts/.github/scripts/eval_regression_chat.py * Create eval_regression_chat.py * Update pr-run-test.yml * Update daily-run-test.yml * Update daily-run-test.yml * Update daily-run-test.yml * Update oc_score_baseline.yaml * Update oc_score_assert.py * Update daily-run-test.yml * Update daily-run-test.yml * Update oc_score_baseline.yaml * Update oc_score_assert.py * Update oc_score_assert.py * fix lint * update * update * update * update * update * update * update * update * update * Update daily-run-test.yml * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-07-03 18:56:09 +08:00
liushz	fc2c9dea8c	Update MathBench summarizer & fix cot setting (#1282 ) * Update MathBench * Update MathBench * Update MathBench --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-07-01 21:51:17 +08:00
Fengzhe Zhou	a32f21a356	[Sync] Sync with internal codes 2024.06.28 (#1279 )	2024-06-28 14:16:34 +08:00
klein	1fa62c4a42	Support wildbench (#1266 ) Co-authored-by: Leymore <zfz-960727@163.com>	2024-06-24 13:16:27 +08:00
bittersweet1999	e0d7808b4e	[Fix] fix pip version (#1228 ) * fix pip version * fix pip version	2024-06-06 11:48:07 +08:00
bittersweet1999	982e024540	[Feature] add dataset Fofo (#1224 ) * add fofo dataset * add dataset fofo	2024-06-06 11:40:48 +08:00
Xingyuan Bu	02a0a4e857	MT-Bench-101 (#1215 ) * add mt-bench-101 * add readme and requirements * add mt-bench-101 data * Update readme_mtbench101.md * update readme * update leaderboard * fix typo * Update readme_mtbench101.md * fit newest opencompass * update readme.md * mtbench101 to opencompass * mtbench101 to opencompass * for code review * for code review * for code review * hook * hook --------- Co-authored-by: liujie <ljie@buaa.edu.cn>	2024-06-03 14:52:12 +08:00
bittersweet1999	7c381e5be8	[Fix] fix summarizer (#1217 ) * fix summarizer * fix summarizer	2024-05-31 11:40:47 +08:00
Fengzhe Zhou	a77b8a5cec	[Sync] format (#1214 )	2024-05-30 00:21:58 +08:00
Fengzhe Zhou	d59189b87f	[Doc] Update running command in README (#1206 )	2024-05-30 00:06:39 +08:00
Fengzhe Zhou	0b50112dc1	[Fix] Rollback opt model configs (#1213 )	2024-05-30 00:03:22 +08:00
Xu Song	808582d952	Fix VLLM argument error (#1207 )	2024-05-29 10:14:08 +08:00
Fengzhe Zhou	2954913d9b	[Sync] bump version (#1204 )	2024-05-28 23:09:59 +08:00
Fengzhe Zhou	9fa80b0f93	[Feat] Update charm summary (#1194 )	2024-05-27 16:17:01 +08:00
jxd	608ff5810d	support CHARM (https://github.com/opendatalab/CHARM ) reasoning tasks (#1190 ) * support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks * fix lint error * add dataset card for CHARM * minor refactor * add txt --------- Co-authored-by: wujiang <wujiang@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-27 13:48:22 +08:00
bittersweet1999	07a6dacf33	fix length (#1180 )	2024-05-24 23:30:01 +08:00
klein	5eb8f14d97	[Fix] Fix drop_gen.py (#1191 ) Fix the bug in drop_gen: wrong import	2024-05-24 23:17:50 +08:00
bittersweet1999	31afe87026	fix yi-chat template (#1178 )	2024-05-21 18:14:12 +08:00
liushz	1448be00e2	Update MathBench (#1176 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update acclerator * Update MathBench --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-05-21 14:45:43 +08:00
Fengzhe Zhou	2b3d4150f3	[Sync] update evaluator (#1175 )	2024-05-21 14:22:46 +08:00
Fengzhe Zhou	5de85406ce	[Sync] add OC16 entry (#1171 )	2024-05-17 16:50:58 +08:00
Fengzhe Zhou	8ea2c404d7	[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163 ) * enable HuggingFacewithChatTemplate with --accelerator via cli * rm vllm_internlm2_chat_7b	2024-05-15 21:51:07 +08:00
Fengzhe Zhou	62dbf04708	[Sync] update github workflow (#1156 )	2024-05-14 22:42:23 +08:00
Fengzhe Zhou	aa2dd2b58c	[Format] Add config lints (#892 )	2024-05-14 15:35:58 +08:00
Xu Song	3dbba11945	[Feat] Support dataset_suffix check for mixed configs (#973 ) * [Feat] Support dataset_suffix check for mixed configs * update mixed suffix * update suffix --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-14 15:03:28 +08:00
Fengzhe Zhou	7505b3cadf	[Feature] Add huggingface apply_chat_template (#1098 ) * add TheoremQA with 5-shot * add huggingface_above_v4_33 classes * use num_worker partitioner in cli * update theoremqa * update TheoremQA * add TheoremQA * rename theoremqa -> TheoremQA * update TheoremQA output path * rewrite many model configs * update huggingface * further update * refine configs * update configs * update configs * add configs/eval_llama3_instruct.py * add summarizer multi faceted * update bbh datasets * update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py * rename class * update readme * update hf above v4.33	2024-05-14 14:50:16 +08:00
Mo Li	6c711cb262	[Fix] Fix Needlebench Summarizer (#1143 ) * update few-shot example * add 128k	2024-05-13 15:59:34 +08:00
bittersweet1999	5432dfc1ff	fix multiround (#1146 )	2024-05-13 15:58:39 +08:00
bittersweet1999	833a35140b	[Fix] fix alpacaeval while add caching path (#1139 ) * fix alpacaeval * fix alpacaeval	2024-05-11 14:02:26 +08:00
Alexander Lam	a71122ee18	[Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs (#1123 ) * added qwen moe and mixtral 8x22 model configs * updated README files news section	2024-05-09 11:04:26 +08:00
Mo Li	cb080fa7de	[Fix] Fix NeedleBench Summarizer Typo (#1125 ) * update needleinahaystack eval docs * update needlebench summarizer * fix english docs typo	2024-05-08 20:00:15 +08:00
JuhaoLiang	d2c40e5648	[Feature] Add AceGPT-MMLUArabic benchmark (#1099 ) * add AceGPT-MMLUArabic benchmark * update readme and fix lint issue * remove unused package * add MMLUArabic zero-shot settings * rename filename and update readme	2024-05-08 15:00:26 +08:00
Fangyu Lei	862044fb7d	[Feature] Add S3Eval Dataset (#916 ) * s3eval_branch * update s3eval	2024-05-06 19:41:52 +08:00
Xu Song	d501710155	[Fix] Fix AGIEval chinese sets (#972 ) * [Fix] Fix AGIEval chinese sets * Create agieval_gen_617738.py * [Fix] Fix AGIEval chinese sets * Restore agieval_gen_64afd3.py * Update agieval_gen.py * Create agieval_mixed_0fa998.py * Update agieval_mixed.py	2024-05-06 15:31:42 +08:00
Yggdrasill7D6	af10ecc272	add mgsm datasets (#1081 ) * add mgsm datasets * fix lint * fix lint * update mgsm * update mgsm * ease code spell * update * update * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-06 15:29:34 +08:00
klein	153c4fc988	[Feature] update drop dataset from openai simple eval (#1092 ) * [Feature] update drop dataset from openai simple eval * update drop template presentation * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-06 13:37:08 +08:00
Fengzhe Zhou	d43392a3bb	[Feature] Add mmlu prompt from simple_evals, openai (#1074 ) * add mmlu prompt from simple_evals, openai * return empty str on failure	2024-05-06 13:26:26 +08:00
Yang Yong	53fe390454	fix LightllmApi workers bug (#1113 )	2024-04-30 22:09:22 +08:00
Alexander Lam	35c94d0cde	[Feature] Adding support for LLM Compression Evaluation (#1108 ) * fixed formatting based on pre-commit tests * fixed typo in comments; reduced the number of models in the eval config * fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset * removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English	2024-04-30 10:51:01 +08:00
liushz	a6f67e1a65	[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-28 21:58:58 +08:00
bittersweet1999	0b7de67c4a	fix prompt template (#1104 )	2024-04-28 21:54:30 +08:00
Yggdrasill7D6	58a57a4c45	[Feature] add support for Flames datasets (#1093 ) * add flames datasets * fix lint * rm quota * add judgemodel info and fix os path * support flames dataset * support flames dataset --------- Co-authored-by: bittersweet1999 <1487910649@qq.com>	2024-04-28 18:56:24 +08:00
Mo Li	76dd814c4d	[Doc] Update NeedleInAHaystack Docs (#1102 ) * update NeedleInAHaystack Test Docs * update docs	2024-04-28 18:51:47 +08:00
Haodong Duan	3a232db471	[Deperecate] Remove multi-modal related stuff (#1072 ) * Remove MultiModal * update index.rst * update README * remove mmbench codes * update news --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:20:14 +08:00
Francis-llgg	f1ee11de14	[Feature] Add gpqa prompt from simple_evals, openai (#1080 ) * add gpqa_openai_simple_eval * trigger CI build * reorg --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 20:13:00 +08:00
klein	e4830a6926	Update CIBench (#1089 ) * modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 * update cibench: dataset and evluation * cibench summarizer bug * update cibench * move extract_code import --------- Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 18:46:02 +08:00
bittersweet1999	e404b72c52	[Feature] support arenahard evaluation (#1096 ) * support arenahard * support arenahard * support arenahard	2024-04-26 15:42:00 +08:00
bittersweet1999	6ba1c4937d	[Feature] Support Math evaluation via judgemodel (#1094 ) * support openai math evaluation * support openai math evaluation * support openai math evaluation * support math llm judge * support math llm judge	2024-04-26 14:56:23 +08:00

419 Commits