Commit Graph

  • dfa26b24bd first huihui 2025-04-30 05:20:38 +0000
  • 63f80134c8 Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 Yejin0111 2025-04-29 15:00:28 +0000
  • f953ad3178 add dataset files marcry 2025-04-29 09:23:42 +0000
  • 48ac21f371 support nejm ai benchmark marcry 2025-04-29 09:16:25 +0000
  • b0b209e443 Merge branch 'open-compass:main' into main bittersweet1999 2025-04-29 16:38:35 +0800
  • 527a80947b [Add] Add writingbench (#2028) bittersweet1999 2025-04-29 16:29:32 +0800
  • d4daa991ba add writingbench bittersweet1999 2025-04-29 08:27:39 +0000
  • 0d1e403cbd add writingbench bittersweet1999 2025-04-29 08:24:50 +0000
  • a5ded53937 Add ClinicBench xuxuxuxuxuxjh 2025-04-29 14:09:50 +0800
  • 642cd2839b fix cache hardcode in partitioners BIGWangYuDong 2025-04-28 18:20:45 +0800
  • 86503894f9 api: add tmp_dir in runners to avoid hardcode BIGWangYuDong 2025-04-28 18:10:32 +0800
  • e2f80574ec style: pass all formatting hooks (yapf & quote fixer) root 2025-04-28 08:03:47 +0000
  • f794feb03d dataset_index_add Dongsheng Zhu 2025-04-28 02:31:53 +0000
  • b1197aa108 dataset_index Dongsheng Zhu 2025-04-28 02:30:21 +0000
  • 7b47afb757 fix llm judge evaluator import and docs Jucheng Hu 2025-04-27 17:13:42 +0800
  • f8e41dfeb4 [Docs] fix needlebench examples Mor-Li 2025-04-27 16:36:59 +0800
  • 8c74e6a39e add RMB Bench (#2056) Taolin Zhang 2025-04-27 16:26:01 +0800
  • 000db832f7 add rmb datasets taolinzhang 2025-04-27 08:16:40 +0000
  • 3b83a5f4a3 add rmb datasets taolinzhang 2025-04-27 08:14:42 +0000
  • 9d6f3a4866 add rmb datasets taolinzhang 2025-04-27 08:10:29 +0000
  • bb056099a0 Add Medbullets data folder for benchmark support marcry 2025-04-26 05:53:27 +0000
  • 890f051609 update docs typo Mor-Li 2025-04-26 13:38:32 +0800
  • 831713ba5d update docs Mor-Li 2025-04-26 13:35:45 +0800
  • ca1865cdac update docs typo Mor-Li 2025-04-26 13:34:12 +0800
  • 7297a00181 update bilingual needlebench docs Mor-Li 2025-04-26 13:24:56 +0800
  • 2602c1ad82 update needlebench docs for chinese Mor-Li 2025-04-26 13:18:10 +0800
  • ff7927ea48 fix needlebench pic format Mor-Li 2025-04-26 12:07:57 +0800
  • 7ad5116620 support medmcqa and medbullets benchmark marcry 2025-04-26 03:53:08 +0000
  • f1182e82f7 Merge branch 'open-compass:main' into main Deadwalk 2025-04-26 11:32:27 +0800
  • e373a2bbcf Fix Bug:Fix GAIA datasets lint error Deadwalk 2025-04-26 11:30:24 +0800
  • d049098303 update config Mor-Li 2025-04-26 01:43:25 +0800
  • 6201c3cc84 MMLU_pro support biomed subset Flaick 2025-04-25 10:51:59 +0000
  • 9ebb5968c1 internsandbox Dongsheng Zhu 2025-04-25 10:38:40 +0000
  • 70e0dc1674 fix lint Mor-Li 2025-04-25 18:14:57 +0800
  • 3914aef997 [Update] Update Gemma, Oreal, Qwen Config Mor-Li 2025-04-25 18:09:59 +0800
  • 12597edea6 hle biomed test Flaick 2025-04-25 10:09:33 +0000
  • 5af8fb2061 update lint Mor-Li 2025-04-25 17:55:04 +0800
  • 86184fd277 fix lint for E501 Mor-Li 2025-04-25 17:53:28 +0800
  • 5b1a5fa596 fix lint Mor-Li 2025-04-25 17:51:03 +0800
  • 081c185b8f update needlebench plot Mor-Li 2025-04-25 17:46:45 +0800
  • bd17b3c984 update needlebench Mor-Li 2025-04-25 17:42:47 +0800
  • d26e808c9f Merge branch 'main' into SeedBench chenzihong 2025-04-25 17:04:25 +0800
  • be86ebcb4b PubMedQA & ScienceQA xuxuxuxuxuxjh 2025-04-25 14:50:45 +0800
  • 14311ec0b7 PubMedQA & ScienceQA xuxuxuxuxuxjh 2025-04-25 14:47:20 +0800
  • e8bc8c1e8c [Bug] Concat OpenaiSDK reasoning content (#2041) Linchen Xiao 2025-04-25 14:10:33 +0800
  • 66cf9be640 update MaiziXiao 2025-04-25 04:34:07 +0000
  • 28fa6fc63c update MaiziXiao 2025-04-25 03:55:21 +0000
  • 50bcffc4f7 [Feature] Support AntFinix LLM xsq2060 2025-04-25 00:22:20 +0800
  • 6407994f67 [Bug] Concat OpenaiSDK reasoning content MaiziXiao 2025-04-24 09:48:43 +0000
  • e2826a6f53 [Bug] Concat OpenaiSDK reasoning content MaiziXiao 2025-04-24 09:47:25 +0000
  • 7cffdf1cfb ScienceQA xuxuxuxuxuxjh 2025-04-24 00:04:21 +0800
  • 97010dc4ce [Update] Update dataset repeat concatenation (#2039) Junnan Liu 2025-04-23 16:16:28 +0800
  • 119fc1a47c fix dataset repeat by concatenating jnanliu 2025-04-23 07:56:57 +0000
  • dcbf899369 [Bug] Fix SmolInsturct logger import (#2036) Linchen Xiao 2025-04-23 11:10:30 +0800
  • 51d37db5e9 --dev=fix lawbench evaluation xiexinch 2025-04-23 10:37:20 +0800
  • 9bb9605b37 update MaiziXiao 2025-04-23 02:34:38 +0000
  • d0534e308e replace the model name for new bailing christopher.dy 2025-04-21 10:37:00 +0800
  • bf74f26603 [Update] Safe SmolInstruct meteor calculation (#2033) Linchen Xiao 2025-04-22 18:27:48 +0800
  • a63b2277b1 update MaiziXiao 2025-04-22 10:20:49 +0000
  • bd38d942c3 internsandbox init Dongsheng Zhu 2025-04-22 08:03:04 +0000
  • 455bb05d1b [Update] Update dataset configs (#2030) Linchen Xiao 2025-04-21 18:55:06 +0800
  • 190677fa25 Fix lint MaiziXiao 2025-04-21 09:51:55 +0000
  • b455d08891 Merge remote-tracking branch 'upstream/main' into ai4s_dataset_update MaiziXiao 2025-04-21 09:51:05 +0000
  • 26a6ab9d8d [Update] Update dataset configs MaiziXiao 2025-04-21 09:40:58 +0000
  • c69110361b [Add] add rewardbench (#2029) Taolin Zhang 2025-04-21 17:18:51 +0800
  • 00c3ec428e add rewardbench taolinzhang 2025-04-21 09:17:21 +0000
  • 99124aefd0 add rewardbench taolinzhang 2025-04-21 09:00:52 +0000
  • a2093a81ef [Dataset] Matbench (#2021) JuchengHu 2025-04-21 15:50:47 +0800
  • 2ded84a70c Merge branch 'main' into SeedBench chenzihong 2025-04-21 12:05:30 +0800
  • e6f4276412 add writingbench bittersweet1999 2025-04-18 10:42:29 +0000
  • b2da1c08a8 [Dataset] Add SmolInstruct, Update Chembench (#2025) Linchen Xiao 2025-04-18 17:21:29 +0800
  • b93afe7764 add writingbench bittersweet1999 2025-04-18 09:21:01 +0000
  • 560b148b23 update MaiziXiao 2025-04-18 08:38:53 +0000
  • d74bb2ac7f update MaiziXiao 2025-04-18 07:27:13 +0000
  • 20e3759115 fix lint Myhs-phz 2025-04-18 07:18:18 +0000
  • 8b0d8f56d8 fix Myhs-phz 2025-04-18 07:01:40 +0000
  • 1eb3933f9b fix data load Myhs-phz 2025-04-18 06:46:39 +0000
  • 3460accfe9 update MaiziXiao 2025-04-17 11:21:14 +0000
  • a46c209eb4 Add dataset metadata MaiziXiao 2025-04-17 10:08:31 +0000
  • 7df956e8f9 [Dataset] Add SmolInstruct, Update Chembench MaiziXiao 2025-04-17 10:04:49 +0000
  • 5fee3b237a Merge branch 'open-compass:main' into main bittersweet1999 2025-04-17 15:43:19 +0800
  • e335b29e12 fix: fix typo chenzihong-gavin 2025-04-15 15:34:06 +0800
  • 332acdf448 Merge branch 'SeedBench' of https://github.com/ChenZiHong-Gavin/opencompass into SeedBench chenzihong-gavin 2025-04-15 14:26:37 +0800
  • 39b34d6488 fix: delete unnecessary code chenzihong-gavin 2025-04-15 14:26:27 +0800
  • cbfac1ea4c Merge branch 'open-compass:main' into SeedBench chenzihong 2025-04-15 13:19:33 +0800
  • 65ff602cf5 [Update] Fix LLM Judge metrics cacluation & Add reasoning content concat to OpenAI SDK Linchen Xiao 2025-04-15 11:33:16 +0800
  • a484de0f25 update MaiziXiao 2025-04-15 03:19:04 +0000
  • c9ea024c67 fix: fix load function for SeedBenchDataset chenzihong-gavin 2025-04-15 03:15:00 +0800
  • 8000375069 Merge branch 'main' into SeedBench chenzihong 2025-04-14 21:22:20 +0800
  • db04df78d4 refactor: delete unnecessary comment chenzihong-gavin 2025-04-14 21:20:29 +0800
  • 75e7834b59 [Feature] Add Datasets: ClimateQA,Physics (#2017) Myhs_phz 2025-04-14 20:18:47 +0800
  • f9b1636598 docs: add README for SeedBench chenzihong-gavin 2025-04-14 19:51:01 +0800
  • 7d2d663ae3 fix Myhs-phz 2025-04-14 11:35:42 +0000
  • 6e1e1edcde fix Myhs-phz 2025-04-14 11:34:19 +0000
  • 02becd2261 fix Myhs-phz 2025-04-14 11:30:25 +0000
  • da9b77be1c fix Myhs-phz 2025-04-14 11:25:33 +0000
  • 7b99ffe823 fix dataset path Jucheng Hu 2025-04-14 18:02:52 +0800
  • 01c97dd32e add support for matbench Jucheng Hu 2025-04-14 17:55:43 +0800
  • dae700e0b0 [Dataset] Add SeedBench Dataset chenzihong-gavin 2025-04-14 14:23:29 +0800
  • 2223c6e915 Merge 8e6d2ab7e6 into 6a6a1a5c0b Haoyu Zhang. 2025-04-14 11:09:16 +0800