jnanliu
a4c42b3cb3
Merge branch 'main' of https://github.com/open-compass/opencompass into general-gpass
2025-03-03 02:41:14 +00:00
Junnan Liu
73c80953c6
[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator ( #1886 )
* support dataset repeat and g-pass compute for each evaluator
* fix pre-commit errors
* delete print
* delete gpassk_evaluator and fix potential errors
* change `repeat` to `n`
* fix `repeat` to `n` in openicl_eval
* update doc for multi-run and g-pass
* update latex equation in doc
* update eng doc for multi-run and g-pass
* update datasets.md
* update datasets.md
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation in zh_cn user_guides
* modify pre-commit-zh-cn
* recover pre-commit and edit math expr in doc
* del [TIP]
* del cite tag in doc
* del extract_model param in livemathbench config
2025-02-26 19:43:12 +08:00
zhulinJulia24
6042b88e58
[CI] update dailytest scheduler and baseline's score ( #1898 )
2025-02-26 19:04:01 +08:00
Linchen Xiao
bdb2d46f59
[Feature] Add general math, llm judge evaluator ( #1892 )
* update_doc
* update llm_judge
* update README
* update md file name
2025-02-26 15:08:50 +08:00
jnanliu
32a8d81b1d
del extract_model param in livemathbench config
2025-02-26 06:39:12 +00:00
jnanliu
66b1c6c64c
del cite tag in doc
2025-02-26 04:23:30 +00:00
jnanliu
12f46044f0
Merge branch 'general-gpass' of https://github.com/jnanliu/opencompass into general-gpass
2025-02-26 04:01:30 +00:00
jnanliu
97594676e8
del [TIP]
2025-02-26 04:01:05 +00:00
Junnan Liu
bb4d53e0cb
Merge branch 'main' into general-gpass
2025-02-26 11:56:45 +08:00
jnanliu
46cd631e13
recover pre-commit and edit math expr in doc
2025-02-26 03:53:10 +00:00
Songyang Zhang
fd6fbf01a2
[Update] Support AIME-24 Evaluation for DeepSeek-R1 series ( #1888 )
* Update
* Update
* Update
* Update
2025-02-25 20:34:41 +08:00
jnanliu
830142ecfd
modify pre-commit-zh-cn
2025-02-25 09:47:21 +00:00
jnanliu
76381c94ee
fix multi-line equation in zh_cn user_guides
2025-02-25 09:41:36 +00:00
jnanliu
7fc189d715
fix multi-line equation
2025-02-25 09:40:29 +00:00
jnanliu
fea7411820
fix multi-line equation
2025-02-25 09:39:34 +00:00
jnanliu
6a6ac3c7f7
fix multi-line equation
2025-02-25 09:35:25 +00:00
jnanliu
a7d15f8aa7
fix multi-line equation
2025-02-25 09:31:51 +00:00
jnanliu
516313d42e
fix multi-line equation
2025-02-25 09:30:21 +00:00
jnanliu
fed2df4c3e
fix multi-line equation
2025-02-25 09:29:16 +00:00
Junnan Liu
22a33d8759
[Update] Update LiveMathBench Hard Configs ( #1826 )
* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add NotImplementedError
* update livemathbench-hard configs
* remove max_out_len from livemathbench_hard_greedy_gen_9befbf.py
* remove max_out_len from livemathbench_hard_gen_9befbf.py
* rename livemathbench_hard_gen_9befbf.py to livemathbench_hard_gen_353ae7.py
* rename livemathbench_hard_greedy_gen_9befbf.py to livemathbench_hard_greedy_gen_353ae7.py
* update livemathbench_gen_9befbf.py
* remove whitespace
* upload livemathbench hard configs
2025-02-25 17:24:36 +08:00
Junnan Liu
91111ce9ec
update datasets.md
2025-02-25 17:17:39 +08:00
Junnan Liu
2915d77045
update datasets.md
2025-02-25 17:17:07 +08:00
jnanliu
c1fe59d015
update eng doc for multi-run and g-pass
2025-02-25 09:15:08 +00:00
jnanliu
8ebb8a5d11
update latex equation in doc
2025-02-25 09:05:53 +00:00
jnanliu
4e07fcbfac
update doc for multi-run and g-pass
2025-02-25 08:21:21 +00:00
jnanliu
4e63ebbf0c
fix `repeat` to `n` in openicl_eval
2025-02-24 08:14:08 +00:00
jnanliu
b0330ef1c6
change `repeat` to `n`
Dongsheng Zhu
465e93e10e
[Update] Academic bench llm judge update ( #1876 )
* BigCodeBench update
* update LCBench
* update LCBench 2
* update code
* academicBench update
* academic bench ifeval&math update
* generic_llmjudge_aime_academic_postprocess delete
* aime delete
* postprocessors update
* ifeval delete
* update work_dir
* linting
* linting double-quote-string-fixer
* r1-distill out_len update
* fix lint
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-02-24 15:45:24 +08:00
jnanliu
2349fcff2c
delete gpassk_evaluator and fix potential errors
2025-02-24 06:25:17 +00:00
jnanliu
6d5a996deb
delete print
2025-02-23 03:25:58 +00:00
jnanliu
762b66d740
fix pre-commit errors
2025-02-23 03:14:13 +00:00
jnanliu
8def69369a
support dataset repeat and g-pass compute for each evaluator
2025-02-23 03:05:42 +00:00
Junnan Liu
046b6f75c6
[Update] Update Greedy Config & README of LiveMathBench ( #1862 )
* support omni-math
* update config
* upload README
* Delete opencompass/configs/datasets/omni_math/__init__.py
* update greedy config & README of LiveMathBench
* update intro for max_out_len
* rename livemathbench greedy config
* delete greedy config
---------
Co-authored-by: liushz <qq1791167085@163.com>
2025-02-20 19:47:04 +08:00
Linchen Xiao
d7daee6e25
[Update] OpenAI model update, bigcodebench update ( #1879 )
* [Update] Openai model update, bigcodebench update
* update
2025-02-20 19:33:25 +08:00
Linchen Xiao
27c916661d
[Feature] Math Verify with model post_processor ( #1881 )
* update
* [Feature] Update model post_processor
* update
* update
* update
2025-02-20 19:32:12 +08:00
zhulinJulia24
bc22749fd8
[CI] update daily test scores ( #1870 )
* update
* Update daily-run-test.yml
* Update dlc.py
2025-02-20 14:08:18 +08:00
bittersweet1999
f407930475
[Feature] Support subjective evaluation for reasoning model ( #1868 )
* fix pip version
* fix pip version
* add subeval for reasoning model
* add subeval for reasoning model
* update configs
* update config
* update config
* update config
* update files
2025-02-20 12:19:46 +08:00
Myhs_phz
68a9838907
[Feature] Add list of supported datasets at html page ( #1850 )
* feat dataset-index.yml and stat.py
* fix
* fix
* fix
* feat url of paper and config file
* doc all supported dataset list
* docs zh and en
* docs README zh and en
* docs new_dataset
* docs new_dataset
2025-02-14 16:17:30 +08:00
Dongsheng Zhu
3fd8b4e0cd
[Update] Update BigCodeBench & LCBench load path ( #1857 )
* BigCodeBench update
* update LCBench
* update LCBench 2
* update code
2025-02-08 15:15:47 +08:00
Pablo Hinojosa
9c2e6a192c
[Fix] Update broken links in README.md ( #1852 )
2025-02-07 15:41:08 +08:00
zhulinJulia24
ffc04cf650
[CI] Update daily-run-test.yml ( #1854 )
2025-02-07 14:40:16 +08:00
Linchen Xiao
862bf78464
[Demo] Internlm3 math500 thinking demo ( #1846 )
* [Demo] Add demo for Internlm3 math500 thinking
* [Demo] Add demo for Internlm3 math500 thinking
* update max_out_len
* update start instruction
2025-01-24 14:56:41 +08:00
Shudong Liu
412199f802
[Feature] Support OlympiadBench Benchmark ( #1841 )
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
---------
Co-authored-by: liushz <qq1791167085@163.com>
2025-01-24 10:00:01 +08:00
Junnan Liu
70f2c963d3
[Feature] Support Omni-Math ( #1837 )
* support omni-math
* update config
* upload README
* Delete opencompass/configs/datasets/omni_math/__init__.py
---------
Co-authored-by: liushz <qq1791167085@163.com>
2025-01-23 18:36:54 +08:00
Linchen Xiao
35ec307c6b
[Bump] Bump version to 0.4.0 ( #1838 )
2025-01-22 11:41:46 +08:00
Linchen Xiao
03415b2a66
[Fix] Update max_out_len logic for OpenAI model ( #1839 )
2025-01-21 15:46:14 +08:00
Linchen Xiao
a6193b4c02
[Refactor] Code refactoring ( #1831 )
* Update
* fix lint
* update
* fix lint
2025-01-20 19:17:38 +08:00
Jishnu Nair
ffdc917523
[Doc] Installation.md update ( #1830 )
2025-01-17 11:08:09 +08:00
Myhs_phz
70da9b7776
[Update] Update method to add dataset in docs ( #1827 )
* create new branch
* docs new_dataset.md zh
* docs new_dataset.md zh and en
2025-01-17 11:07:19 +08:00
Linchen Xiao
531643e771
[Feature] Add support for InternLM3 ( #1829 )
* update
* update
* update
* update
2025-01-16 14:28:27 +08:00