Commit Graph

  • 6097186a95 [Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064) Jin Ye 2025-05-09 16:47:44 +1000
  • fba250a094 PromptCBLUE:Life Science dataset+data root 2025-05-09 06:47:26 +0000
  • d72df59363 [Revert] Add Lifescience Sub-set Support for SciEval (#2059) (#2087) Linchen Xiao 2025-05-09 14:46:27 +0800
  • e28868e72e Merge branch 'main' into SciKnowEval Linchen Xiao 2025-05-09 14:42:44 +0800
  • b1b429b680 Revert "[Dataset] Add Lifescience Sub-set Support for SciEval (#2059)" revert-2059-mmlu_andscieval__tc Linchen Xiao 2025-05-09 14:36:49 +0800
  • c5048bfec7 [Dataset] Add Lifescience Sub-set Support for SciEval (#2059) tcheng 2025-05-09 14:31:12 +0800
  • f09c085817 Add version code for MedQA and ProteinLMBench Yejin0111 2025-05-09 06:26:14 +0000
  • f7ae6c690e resolve new conflicts marcry 2025-05-09 05:59:40 +0000
  • 6ed5f0c8bc Add version code for MedQA and ProteinLMBench Yejin0111 2025-05-09 05:47:03 +0000
  • 9c8244aa44 Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) root 2025-05-09 05:00:57 +0000
  • f238298512 revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) root 2025-05-09 04:48:22 +0000
  • efae720249 fix lint huihui 2025-05-09 04:46:47 +0000
  • 47fd267d4d fix lint huihui 2025-05-09 04:39:28 +0000
  • 89752ce5bd fix lint huihui 2025-05-09 03:59:06 +0000
  • 70192c284b Merge branch 'main' into SciKnowEval Linchen Xiao 2025-05-09 12:19:18 +0800
  • 936acd8a3c update MaiziXiao 2025-05-09 04:03:28 +0000
  • 37155ebe5b fix lint huihui 2025-05-09 03:59:06 +0000
  • d939e32438 add bench Dongsheng Zhu 2025-05-09 02:36:39 +0000
  • d28e3e4c80 Fix bugs for MedQA. Add info in dataset-index Yejin0111 2025-05-08 14:41:15 +0000
  • 5744ae6b7b feature:change 4o-mini to 4o yuehuazhang 2025-05-08 21:25:09 +0800
  • a7f3ac20b2 [Dataset] Add CARDBiomedBench (#2071) huihui1999 2025-05-08 19:44:46 +0800
  • 1b5e467065 fix init huihui 2025-05-08 11:14:28 +0000
  • 724472ee5d fix lint huihui 2025-05-08 11:12:17 +0000
  • ff3275edf0 [Update] Add Long-Context configs for Gemma, OREAL, and Qwen2.5 models (#2048) Mo Li 2025-05-08 19:06:56 +0800
  • 3f2ce77543 fix lint huihui 2025-05-08 10:59:45 +0000
  • 1b05a473d2 fix lint huihui 2025-05-08 10:55:51 +0000
  • 862cf61f64 fix lint huihui 2025-05-08 06:07:08 +0000
  • a685ed7daf [Dataset] Add nejm ai benchmark (#2063) Wei Li 2025-05-08 16:44:05 +0800
  • 9ec23c145b [Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061) Jiahao Xu 2025-05-08 16:25:43 +0800
  • 295f10e749 update MaiziXiao 2025-05-08 08:10:35 +0000
  • 4579828ac3 feature:1.add o4-mini;2.o3 or o4-mini only support temperature==1 yuehuazhang 2025-05-08 16:09:39 +0800
  • 1acb3c30c0 update MaiziXiao 2025-05-08 07:26:18 +0000
  • 734073cc53 Update hf_path xuxuxuxuxuxjh 2025-05-08 14:53:29 +0800
  • 26adccc20c use official llmjudge_postprocess huihui 2025-05-08 06:07:08 +0000
  • 314cfc0754 remove print in medbullets.py marcry 2025-05-08 05:04:47 +0000
  • 021c0d896a fix dataset-index.yml huihui 2025-05-08 05:02:36 +0000
  • c40c52b221 resolve dataset-index conflict marcry 2025-05-08 05:00:28 +0000
  • 23fb3c7fa9 resove dataset-index conflicts marcry 2025-05-08 04:54:39 +0000
  • 5e8bfee3f4 fix dataset-index & use official llm_judge_postprocess huihui 2025-05-08 04:31:11 +0000
  • 85ecf3c932 use official llmjudge_postprocess huihui 2025-05-08 04:23:20 +0000
  • 6ff36c1b1f use official llmjudge postprocess huihui 2025-05-08 04:20:10 +0000
  • b9aa1c17f7 fix comments &dataset-index yml huihui 2025-05-08 04:01:48 +0000
  • b3aa62ba5c fix dataset-index huihui 2025-05-08 03:49:18 +0000
  • b6d1bc60dc Update datasets_info & hf_path xuxuxuxuxuxjh 2025-05-07 23:05:36 +0800
  • adc33cd4f8 revise class name & remove csv file & add dataset-index.yml info marcry 2025-05-07 14:35:48 +0000
  • cc7d39ecdd remove csv file marcry 2025-05-07 14:11:19 +0000
  • b9f025a902 revise config file & remove csv file & add dataset info to dataset-index.yml marcry 2025-05-07 14:06:19 +0000
  • 2a40298950 MMLU_Pro Biomedical Version Support Flaick 2025-05-07 11:38:25 +0000
  • 1359cfacea HLE Biomedical version support Flaick 2025-05-07 11:30:37 +0000
  • e8ce8f82c7 Merge 3e63508bd0 into ba0e32292c Kun Yuan 2025-05-07 11:24:34 +0000
  • 3e63508bd0 Merge branch 'hle_biomed' of github.com:Flaick/opencompass into hle_biomed Flaick 2025-05-07 11:23:57 +0000
  • 6a13d32b57 new HLE_biomed support Flaick 2025-05-07 11:22:13 +0000
  • 93c52fb97a Merge 6201c3cc84 into ba0e32292c Kun Yuan 2025-05-07 17:01:51 +0800
  • ba0e32292c [Feature] Support InternSandbox (#2049) Dongsheng Zhu 2025-05-07 16:42:09 +0800
  • 43b2c4ed76 [Fix] Update lawbench data path (#2037) 谢昕辰 2025-05-07 16:18:43 +0800
  • d62b69aaef [Fix] Fix InternVL model config (#2068) Dongsheng Zhu 2025-05-07 15:51:18 +0800
  • af8432e1d6 [Update] OpenAI SDK model reasoning content (#2078) Linchen Xiao 2025-05-07 14:06:40 +0800
  • 024434dbdd update MaiziXiao 2025-05-07 05:51:04 +0000
  • 442c829e0f Add PubMedQA & ScienceQA & ClinicBench xuxuxuxuxuxjh 2025-05-07 13:38:15 +0800
  • c66423fc99 fix hash huihui 2025-05-07 05:28:48 +0000
  • bc9ba0126f fix hash huihui 2025-05-07 05:27:37 +0000
  • dfa157c74d fix hash huihui 2025-05-07 05:24:14 +0000
  • a240f979ab update MaiziXiao 2025-05-07 03:57:55 +0000
  • 1673501a08 update MaiziXiao 2025-05-07 03:46:49 +0000
  • ddc9cc0afb [Add] add a config to Judge dataset all (#2077) bittersweet1999 2025-05-07 10:57:23 +0800
  • 1af2c2cbdf add judgedatasetall bittersweet1999 2025-05-07 02:51:47 +0000
  • 66f45af8f0 add judgedatasetall bittersweet1999 2025-05-07 02:47:41 +0000
  • a77a040ba7 add judgedatasetall bittersweet1999 2025-05-07 02:44:58 +0000
  • 4c90cf9d79 Add PubMedQA & ScienceQA & ClinicBench xuxuxuxuxuxjh 2025-05-07 01:55:18 +0800
  • b65b2789fe Merge branch 'open-compass:main' into hle_biomed Kun Yuan 2025-05-07 01:13:54 +0800
  • d1cc275f03 rename files Flaick 2025-05-07 01:12:00 +0800
  • 4f8c1a2078 revise name: PromptCBLUE:Life Science dataset root 2025-05-06 15:05:05 +0000
  • aedfbcc809 revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader) root 2025-05-06 14:56:34 +0000
  • 6a3b550881 revise gen name marcry 2025-05-06 13:19:27 +0000
  • e7b04afa3c revise gen name marcry 2025-05-06 13:11:44 +0000
  • 5ee365593e revise gen name marcry 2025-05-06 13:09:31 +0000
  • 36d085d3e2 Merge branch 'open-compass:main' into main bittersweet1999 2025-05-06 16:58:12 +0800
  • a36d34f135 add hash huihui 2025-05-06 04:36:57 +0000
  • 61b52844be BaseInferencer batch_size and max_seq_len cast to int Francesco Bertolotti 2025-05-05 19:55:21 +0200
  • 41df5e5604 PromptCBLUE:Life Science dataset root 2025-05-04 12:00:36 +0000
  • 72ec9ba289 MedCal_Bench huihui 2025-05-02 16:35:27 +0000
  • c6e1955cae MedCalc_Bench huihui 2025-05-02 16:33:25 +0000
  • 9db1fea758 CARDBiomedBench huihui 2025-05-02 12:55:47 +0000
  • 272efd7d25 SciKnowEval huihui 2025-05-02 12:08:58 +0000
  • de6e4909bd first huihui 2025-05-02 11:57:24 +0000
  • 111c584049 phybench yufeng zhao 2025-04-30 12:34:35 +0000
  • de10fb1194 hybench yufeng zhao 2025-04-30 11:59:02 +0000
  • a159b03c81 phy_bench_newest yufeng zhao 2025-04-30 12:23:38 +0000
  • 71173c4fef phybench yufeng zhao 2025-04-30 12:29:54 +0000
  • 7f0fd50c02 internvl fix Dongsheng Zhu 2025-04-30 10:11:56 +0000
  • 37cbaf8d92 [Add] Add Judgerbenchv2 (#2067) bittersweet1999 2025-04-30 17:12:34 +0800
  • 9ef97268a2 Update __init__.py bittersweet1999 2025-04-30 17:05:43 +0800
  • 36ab6e25e6 Merge branch 'main' into judgerbenchv2 bittersweet1999 2025-04-30 17:01:23 +0800
  • 18d415847f add judgerbenchv2 bittersweet1999 2025-04-30 08:59:46 +0000
  • b6148aa198 add Judgebench (#2066) Taolin Zhang 2025-04-30 15:01:10 +0800
  • f6c519e283 add judgebench taolinzhang 2025-04-30 06:34:00 +0000
  • ea413544e2 add judgebench taolinzhang 2025-04-30 06:33:26 +0000
  • ad466bb658 add judgebench taolinzhang 2025-04-30 06:31:42 +0000
  • f931d2ca94 first huihui 2025-04-30 05:29:40 +0000
  • 44aadf627b first huihui 2025-04-30 05:29:04 +0000