Commit Graph

  • 6156974794 srbench_add Jun 2025-05-20 03:18:59 +0000
  • 19e7fec7fb srbench Jun 2025-05-20 02:59:00 +0000
  • 36d8b19399 srbench Jun 2025-05-20 02:57:38 +0000
  • 8274603540 Update zhangsongyang 2025-05-19 13:39:11 +0000
  • 7a7a4517ab [Update] History code bench pass@k update (#2102) Dongsheng Zhu 2025-05-19 17:03:33 +0800
  • d9de21a8c3 update earth silver benchmark Zhouzone 2025-05-18 19:18:10 +0800
  • 32c1e38207 Merge 642cd2839b into 8c0ccf9a6b BigDong 2025-05-17 13:43:54 +0800
  • 95d8d2ba4d fix irrelevant files huihui 2025-05-16 12:52:07 +0000
  • 7363bd57b9 template update Dongsheng Zhu 2025-05-16 11:41:55 +0000
  • e591fe22a1 max_out fix Dongsheng Zhu 2025-05-16 09:48:53 +0000
  • 8c0ccf9a6b [CI] Fix Lint error (#2103) kkscilife 2025-05-16 15:36:45 +0800
  • 4ef1f65873 use another way to avoid lint error kkscilife 2025-05-16 15:00:15 +0800
  • 6f3b6a5d12 [CI] Add gitleaks check (#2101) kkscilife 2025-05-16 14:34:57 +0800
  • a515dfbfde template Dongsheng Zhu 2025-05-16 06:05:06 +0000
  • 543bc1c1cb add check rule kkscilife 2025-05-16 11:52:51 +0800
  • 7278a4ed19 healthbench huihui 2025-05-15 08:50:05 +0000
  • 722ba22a57 Merge branch 'main' of https://github.com/open-compass/opencompass into history_code_bench_update Dongsheng Zhu 2025-05-15 05:39:20 +0000
  • d90833f8bc fix bug Dongsheng Zhu 2025-05-15 05:38:00 +0000
  • 580a2b7980 Update zhangsongyang 2025-05-15 03:46:14 +0000
  • 58b36c37bb humaneval_plus Dongsheng Zhu 2025-05-15 02:27:24 +0000
  • 3ead360328 phybench_fix yufeng zhao 2025-05-14 11:38:52 +0000
  • 01af69a685 fixed lint leoyizhang 2025-05-14 18:33:07 +0800
  • cf7f2425cf Update zhangsongyang 2025-05-14 10:14:52 +0000
  • 6ea181d15c Update zhangsongyang 2025-05-14 08:30:23 +0000
  • bfe693cc6f Update zhangsongyang 2025-05-14 07:10:10 +0000
  • 3ba4455ee6 Update zhangsongyang 2025-05-06 14:10:37 +0000
  • af9ee506c3 Update zhangsongyang 2025-05-06 13:50:39 +0000
  • 7f76b12ae6 Update zhangsongyang 2025-05-06 13:13:11 +0000
  • ba0186ba1c Update zhangsongyang 2025-05-06 09:55:26 +0000
  • 61a2a6b763 Update zhangsongyang 2025-05-06 06:51:44 +0000
  • 93ecc670df Update zhangsongyang 2025-04-30 09:47:51 +0000
  • 4c2e66d335 Update zhangsongyang 2025-04-30 08:50:25 +0000
  • 7605cc2ca4 Update zhangsongyang 2025-04-28 15:43:05 +0000
  • 8fc6343119 Update zhangsongyang 2025-04-27 11:31:07 +0000
  • 57e98f30bb Update zhangsongyang 2025-04-25 09:43:27 +0000
  • 45af358798 Update Config zhangsongyang 2025-04-25 08:35:28 +0000
  • eac7a6230d Update CascadeEvaluator zhangsongyang 2025-04-16 16:36:40 +0000
  • 14cf872184 Update CascadeEvaluator zhangsongyang 2025-04-14 16:50:04 +0000
  • 16e9884c2f Update CascadeEvaluator zhangsongyang 2025-04-14 13:15:58 +0000
  • 63ce20c8ea mbpp Dongsheng Zhu 2025-05-14 09:28:56 +0000
  • cb34f95984 livecodebench Dongsheng Zhu 2025-05-14 09:13:36 +0000
  • 431047ab05 humanevalx Dongsheng Zhu 2025-05-14 06:17:12 +0000
  • cf585621ca humanevalx Dongsheng Zhu 2025-05-14 06:16:43 +0000
  • 3d477dd265 humaneval Dongsheng Zhu 2025-05-14 03:47:10 +0000
  • febd188403 bigcodebench Dongsheng Zhu 2025-05-14 03:38:46 +0000
  • 3d1760aba2 [Dataset] Add Scieval (#2089) tcheng 2025-05-14 10:25:03 +0800
  • 4f115ebd59 revise :SciEval 5shot root 2025-05-13 09:44:16 +0000
  • 9bde347000 Merge d9f27fd676 into b84518c656 tcheng 2025-05-13 17:26:13 +0800
  • b84518c656 [Dataset] Support MedMCQA and MedBullets benchmark (#2054) Wei Li 2025-05-13 17:10:50 +0800
  • 6caa345fcd update_oss_info MaiziXiao 2025-05-13 08:02:22 +0000
  • 03f16c8a83 [Fix] Fix precommit Mor-Li 2025-05-13 14:59:32 +0800
  • a0c3a24aa1 [Docs] Update Default Settings for NeedleBench and ATC Configs Mor-Li 2025-05-13 14:56:46 +0800
  • 98a6f6119b [Docs] update NeedleBenchV2 Docs Mor-Li 2025-05-13 14:32:26 +0800
  • f7242fdea8 Merge branch 'update_needlebench_docs' into needlebench_v2_pr Mor-Li 2025-05-13 14:19:48 +0800
  • 35518f612f [Docs] Update NeedleBench Docs Mor-Li 2025-05-13 14:17:11 +0800
  • d60f59dcab [CI] update baseline and fix lmdeploy version (#2098) zhulinJulia24 2025-05-13 14:01:47 +0800
  • 1cc85721cc update zhulinJulia24 2025-05-13 13:15:51 +0800
  • a31aabf6bc revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) root 2025-05-13 04:39:40 +0000
  • 4641a2890f update zhulinJulia24 2025-05-13 11:24:09 +0800
  • 474769c25d update zhulinJulia24 2025-05-13 11:20:32 +0800
  • 6dbbd80af0 update zhulinJulia24 2025-05-13 10:45:13 +0800
  • 9eaa1f6fec Update icl_judge_evaluator.py (#2095) bittersweet1999 2025-05-13 10:44:24 +0800
  • 3472ed113d update zhulinJulia24 2025-05-13 10:32:06 +0800
  • c269cc054d update zhulinJulia24 2025-05-12 20:51:36 +0800
  • d590f557bb [Update] OpenaiSDK handle empty content (#2096) Linchen Xiao 2025-05-12 19:38:30 +0800
  • b4fd65924a Resolve merge conflict with upstream root 2025-05-12 11:26:54 +0000
  • 1425f423f4 revise latest conflict marcry 2025-05-12 11:10:36 +0000
  • c492e49e79 [Update] Add o4 in OpenaiSDK (#2083) yuehua-s 2025-05-12 18:39:44 +0800
  • 2c79dc5227 [Dataset] Add human_eval/mbpp pro (#2092) Dongsheng Zhu 2025-05-12 18:38:13 +0800
  • 225453f09b update MaiziXiao 2025-05-12 10:21:51 +0000
  • 5efe9cf479 Update icl_judge_evaluator.py bittersweet1999-patch-3 bittersweet1999 2025-05-12 17:48:01 +0800
  • 345674f700 [Dataset] Add SciknowEval Dataset (#2070) huihui1999 2025-05-12 17:23:44 +0800
  • e693952b32 Merge d26e808c9f into 8aa18df368 chenzihong 2025-05-12 12:58:47 +0800
  • d9f27fd676 PromptCBLUE:Life Science dataset+data root 2025-05-12 04:36:42 +0000
  • 8aa18df368 [Dataset] HLE Biomedical version support (#2080) Kun Yuan 2025-05-12 04:14:11 +0200
  • 59c94c778b fix repeat bug Dongsheng Zhu 2025-05-12 02:11:32 +0000
  • 1040546e95 add index Dongsheng Zhu 2025-05-11 12:08:55 +0000
  • 695f814bec time update Dongsheng Zhu 2025-05-11 07:42:10 +0000
  • 5d8c96b001 [Dataset] Add R-Bench (ICML 2025) leoyizhang 2025-05-11 13:26:25 +0800
  • 40f81179a9 bug fix Dongsheng Zhu 2025-05-10 12:39:41 +0000
  • bcb297a46b update Dongsheng Zhu 2025-05-10 11:35:53 +0000
  • 57bf9c1030 merge with main huihui 2025-05-10 09:54:12 +0000
  • c850734aa2 set up default category value for hle Flaick 2025-05-09 12:44:31 +0000
  • ef3ca3ebc1 Merge branch 'open-compass:main' into hle_biomed Kun Yuan 2025-05-09 14:45:19 +0200
  • d75494841d remove choice version Mor-Li 2025-05-09 20:21:24 +0800
  • 40c6c68162 [Fix] Fix pre-commit Mor-Li 2025-05-09 20:19:04 +0800
  • d1da4a577c Add NeedleBench_V2 Mor-Li 2025-05-09 19:37:39 +0800
  • 493be22598 revise conflict after latest pr marcry 2025-05-09 09:04:16 +0000
  • 3119903840 Merge branch 'main' into SciKnowEval Linchen Xiao 2025-05-09 17:03:15 +0800
  • 44a7024ed5 [Dataset] MedCalc_Bench (#2072) huihui1999 2025-05-09 16:58:55 +0800
  • 9c2916e1ab fix lint huihui 2025-05-09 08:56:42 +0000
  • b7c7d1fd5c fix lint huihui 2025-05-09 08:55:51 +0000
  • d18fc62f53 Merge branch 'main' into MedCalc_Bench Linchen Xiao 2025-05-09 16:45:48 +0800
  • 0e182a3845 all categories of SciEval (datasets + configs + loader+dataset-index.yml) root 2025-05-09 08:07:02 +0000
  • 075f9c53d4 fix lint huihui 2025-05-09 08:00:44 +0000
  • 508e2b0cb2 [Update] Set load_from_cache_file to False (#2085) Linchen Xiao 2025-05-09 15:21:47 +0800
  • 8ceec52170 PromptCBLUE:Life Science dataset+data root 2025-05-09 07:13:58 +0000
  • 0180dd9be4 revise class name marcry 2025-05-09 07:10:49 +0000
  • 7bdd3c1904 [Dataset] MMLU_Pro Biomedical Version Support (#2081) Kun Yuan 2025-05-09 09:07:26 +0200
  • 325b70be6d revise latest conflict marcry 2025-05-09 07:00:38 +0000