Commit Graph

  • 520bf5867d Add AIME2025 oss info liushz 2025-03-12 10:23:46 +0000
  • db9809ca69 Merge branch 'main' of github.com:open-compass/opencompass into tmp_olmpbench liushz 2025-03-12 10:22:22 +0000
  • 25b25c8b78 [Feature] Support eval WildBench-Score Hoter Young 2025-02-15 17:00:36 +0800
  • 4b09860f4a [Fix] Enalrge token length of deepseek_v3_api Hoter Young 2025-03-12 16:48:22 +0800
  • 20c27b9ec8 [Feature] Modify HuSimpleQA Summarizer Hoter Young 2025-03-12 16:46:59 +0800
  • 6b1671e029 [Chores] Change datasets path Hoter Young 2025-03-12 16:15:46 +0800
  • f478f842cd feat phi_4 Myhs-phz 2025-03-12 08:47:27 +0000
  • 4289e3914e fix bugs when evaluate calm panzhuoshi 2025-03-12 15:44:01 +0800
  • 8709af3d25 fix Myhs-phz 2025-03-12 06:02:14 +0000
  • bc2969dba8 [Feature] Add support for BBEH dataset (#1925) Yufeng Zhao 2025-03-12 10:53:31 +0800
  • a5abe18aa3 results yufeng zhao 2025-03-11 12:05:02 +0000
  • 4b9838fb94 feat datasetrefine drop Myhs-phz 2025-03-11 11:59:05 +0000
  • 59e49aedf1 [Feature] Support SuperGPQA (#1924) Kangreen 2025-03-11 19:32:08 +0800
  • fd1c1769b4 update MaiziXiao 2025-03-11 11:13:32 +0000
  • 47f48f605f update MaiziXiao 2025-03-11 09:44:55 +0000
  • 4322b4a470 fix lint MaiziXiao 2025-03-11 09:37:39 +0000
  • e403fd21be [Fix] Fix math-verify evaluator (#1917) Linchen Xiao 2025-03-11 17:35:04 +0800
  • 7f31ef7357 fix lint MaiziXiao 2025-03-11 09:32:35 +0000
  • 7938f352d7 Add Readme MaiziXiao 2025-03-11 09:19:17 +0000
  • 7c4020bdb5 Add Readme MaiziXiao 2025-03-11 09:18:28 +0000
  • 95ff8cbb1f remove unnecessary code MaiziXiao 2025-03-11 09:10:41 +0000
  • 14dcdaa0de remove unnecessary code MaiziXiao 2025-03-11 09:07:47 +0000
  • 37ac8a1798 Merge branch 'main' of github.com:open-compass/opencompass into tmp_olmpbench liushz 2025-03-11 08:49:28 +0000
  • e313f45bac xx zichuan 2025-03-10 20:37:54 +0800
  • f95064dcda removeprint yufeng zhao 2025-03-10 06:47:11 +0000
  • d99179d0ef fix_smallbugs_bbeh yufeng zhao 2025-03-10 04:54:28 +0000
  • 9f491fa2d1 bbeh yufeng zhao 2025-03-10 04:25:52 +0000
  • 1f0c5cbb5f bbeh yufeng zhao 2025-03-10 04:24:52 +0000
  • 89bbf13f5a Merge branch 'main' of https://github.com/kangreen0210/opencompass kangreen0210 2025-03-07 16:30:36 +0000
  • cbf84fb33c [Feature] Update LLM Evaluation for MMLU-Pro (#1923) Linchen Xiao 2025-03-07 21:01:20 +0800
  • b67655c61d feat qwq-32b Myhs-phz 2025-03-07 10:29:23 +0000
  • 570c30cf1b [Fix] Fix CLI option for results persistence (#1920) Myhs_phz 2025-03-07 18:24:30 +0800
  • 81f193b775 fix Myhs-phz 2025-03-07 10:19:13 +0000
  • c696a91e8e update MaiziXiao 2025-03-07 10:10:04 +0000
  • 47ab0b4dfd 测试 (Test) zichuan 2025-03-07 18:00:48 +0800
  • 4e40563462 support supergpqa mkj3085003 2025-03-07 09:36:00 +0000
  • 583ee5ff75 fix Myhs-phz 2025-03-06 08:56:25 +0000
  • 8c63c181f0 Merge branch 'main' of github.com:open-compass/opencompass into tmp_olmpbench liushz 2025-03-06 08:13:44 +0000
  • 30aaadbc5d fix Myhs-phz 2025-03-06 08:13:40 +0000
  • 31cbd1c795 fix Myhs-phz 2025-03-06 08:09:45 +0000
  • 5922bfed26 update MaiziXiao 2025-03-05 11:38:16 +0000
  • 277d7946f5 [Fix] Fix typo in deepseed_r1.md (#1916) Shudong Liu 2025-03-05 19:37:22 +0800
  • b835437c4c update MaiziXiao 2025-03-05 11:36:06 +0000
  • 1a67b39974 update MaiziXiao 2025-03-05 11:35:11 +0000
  • 09afcd0c7c fix typo in deepseed_r1.md sudanl 2025-03-05 10:58:04 +0000
  • 1585c0adbe [Feature] Evaluation Results Persistence (#1894) Myhs_phz 2025-03-05 18:33:34 +0800
  • 54324657f0 [Docs] Results persistance (#1908) Myhs_phz 2025-03-05 18:23:54 +0800
  • afec558845 doc Myhs-phz 2025-03-05 09:48:31 +0000
  • 2cd27813ef lint Myhs-phz 2025-03-05 09:46:14 +0000
  • 28d179c540 style function name Myhs-phz 2025-03-05 09:43:31 +0000
  • 300c567b14 fix Myhs-phz 2025-03-05 09:40:39 +0000
  • 28cdf4e776 fix Myhs-phz 2025-03-05 09:30:39 +0000
  • a058b9b493 fix Myhs-phz 2025-03-05 08:51:44 +0000
  • d8a50ba5ff doc Myhs-phz 2025-03-05 07:19:13 +0000
  • cbba0a876f fix subjective processing Myhs-phz 2025-03-05 06:27:45 +0000
  • 37b894d4a1 [Model] Add new model: Ola Mr.Li 2025-03-04 23:10:00 +0800
  • fff2d51440 [Update] Code evaluation alignment (#1909) Dongsheng Zhu 2025-03-04 18:49:38 +0800
  • 5547fd1592 [Bump] Bump version to 0.4.1 (tag: 0.4.1) Linchen Xiao 2025-03-04 18:26:14 +0800
  • 60e3b40267 lint Myhs-phz 2025-03-04 10:17:07 +0000
  • b4e2a0692f doc Myhs-phz 2025-03-04 10:15:55 +0000
  • 5c7b28420d lint yapf Dongsheng Zhu 2025-03-04 09:36:52 +0000
  • ff50937869 lint_ Dongsheng Zhu 2025-03-04 09:18:32 +0000
  • a7070ba2d2 lint Dongsheng Zhu 2025-03-04 09:16:02 +0000
  • 9d18c55095 update MaiziXiao 2025-03-04 09:10:20 +0000
  • 198c08632e [Feature] Add HLE (Humanity's Last Exam) dataset (#1902) liushz 2025-03-04 16:42:37 +0800
  • 63c7970937 bigcodebench update Dongsheng Zhu 2025-03-04 08:38:24 +0000
  • 175d01f44c doc Myhs-phz 2025-03-04 05:32:42 +0000
  • 64644b37b4 feat persistance.md Myhs-phz 2025-03-04 03:51:17 +0000
  • c84bc18ac1 [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899) Songyang Zhang 2025-03-03 18:56:11 +0800
  • a69d02f746 Add HLE dataset liushz 2025-03-03 10:54:32 +0000
  • 5a2462a26f Add HLE dataset liushz 2025-03-03 10:52:30 +0000
  • 30bb39076f fix Myhs-phz 2025-03-03 10:41:55 +0000
  • 9e06ab535a fix and lint Myhs-phz 2025-03-03 10:38:44 +0000
  • f0809fe6f6 [Update] Fix Hard Configs With General GPassK (#1906) Junnan Liu 2025-03-03 18:17:15 +0800
  • 6a573f671b [Fix] Fix compatible issue Linchen Xiao 2025-03-03 15:35:57 +0800
  • fb531e1458 update MaiziXiao 2025-03-03 07:29:31 +0000
  • df64ae1997 fix Myhs-phz 2025-03-03 05:03:42 +0000
  • 8a52351e41 feat Myhs-phz 2025-03-03 04:54:54 +0000
  • 84ade2ef3c update oss md5 Dongsheng Zhu 2025-03-03 03:55:28 +0000
  • 4a2637cc58 fix livemathbench hard configs jnanliu 2025-03-03 02:42:53 +0000
  • a4c42b3cb3 Merge branch 'main' of https://github.com/open-compass/opencompass into general-gpass jnanliu 2025-03-03 02:41:14 +0000
  • 7ae59ef7f9 code alignment Dongsheng Zhu 2025-03-03 02:35:12 +0000
  • 84e7a7b793 Update DeepSeek-R1 example zhangsongyang 2025-02-28 08:54:52 +0000
  • 9843e3c63c Update DeepSeek-R1 example zhangsongyang 2025-02-28 08:52:22 +0000
  • 34cc0a5f5f Add HLE dataset liushz 2025-02-28 07:55:17 +0000
  • c3ad4b5603 feat result_station.py and lint Myhs-phz 2025-02-28 06:38:25 +0000
  • 8103c0d245 Update DeepSeek-R1 example zhangsongyang 2025-02-27 16:02:20 +0000
  • 2aaab41dc9 feat save_to_station Myhs-phz 2025-02-27 14:44:24 +0000
  • 01041af484 Merge branch 'main' of github.com:open-compass/opencompass into tmp_olmpbench liushz 2025-02-27 10:00:30 +0000
  • 61afff8836 add h2o@infinitebench implementation ziyang zhang 2025-02-27 14:22:07 +0800
  • ba7163ce2e Update zhangsongyang 2025-02-27 05:06:53 +0000
  • effddff840 Update zhangsongyang 2025-02-27 04:47:50 +0000
  • ff621ddb39 [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify zhangsongyang 2025-02-26 15:29:19 +0000
  • 73c80953c6 [Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886) Junnan Liu 2025-02-26 19:43:12 +0800
  • 6042b88e58 [CI] update dailytest sceduler and baseline's score(#1898) zhulinJulia24 2025-02-26 19:04:01 +0800
  • 45033bd413 update zhulinJulia24 2025-02-26 17:11:00 +0800
  • bdb2d46f59 [Feature] Add general math, llm judge evaluator (#1892) Linchen Xiao 2025-02-26 15:08:50 +0800
  • 42644a9a2b update md file name MaiziXiao 2025-02-26 06:58:25 +0000
  • 114cf1366c lint Myhs-phz 2025-02-26 06:42:28 +0000
  • 32a8d81b1d del extract_model param in livemathbench config jnanliu 2025-02-26 06:39:12 +0000