Linchen Xiao
f3220438bc
[BUMP] Bump version to 0.3.9 ( #1790 )
2024-12-31 16:52:47 +08:00
liushz
9c980cbc62
[Feature] Add LiveStemBench Dataset ( #1794 )
* [Fix] Fix vllm max_seq_len parameter transfer
* Add livestembench dataset
* Update livestembench_gen_3e3c50.py
* Update eval_livestembench.py
2024-12-31 15:17:39 +08:00
Songyang Zhang
fc0556ec8e
[Fix] Fix generic_llm_evaluator output_path ( #1798 )
* Fix output_path
* Add Logger
2024-12-31 13:05:05 +08:00
Alexander Lam
dc6035cfcb
[Feature] Added Bradley-Terry subjective evaluation
2024-12-31 11:01:23 +08:00
Songyang Zhang
98435dd98e
[Feature] Update o1 evaluation with JudgeLLM ( #1795 )
* Update Generic LLM Evaluator
* Update o1 style evaluator
2024-12-30 17:31:00 +08:00
Junnan Liu
8e8d4f1c64
[Feature] Support G-Pass@k and LiveMathBench ( #1772 )
* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add NotImplementedError
2024-12-30 16:59:39 +08:00
Linchen Xiao
42b54d6bb8
[Update] Add 0shot CoT config for TheoremQA ( #1783 )
2024-12-27 16:17:27 +08:00
bittersweet1999
357ce8c7a4
[Fix] Fix model summarizer abbr ( #1789 )
* fix pip version
* fix model summarizer abbr
---------
Co-authored-by: root <bittersweet1999>
2024-12-27 14:45:08 +08:00
Linchen Xiao
ae9efb73ad
[CI] Pypi deploy workflow update ( #1786 )
2024-12-27 14:08:37 +08:00
Linchen Xiao
f103e90764
[CI] Update deploy python version ( #1784 )
2024-12-27 13:35:36 +08:00
zhulinJulia24
ebeb578fbf
[ci] remove daily step retry and update pr score ( #1782 )
[ci] remove daily step retry
2024-12-26 16:51:26 +08:00
Linchen Xiao
56eaac6d8f
[Update] Volc status exception handle ( #1780 )
* update
2024-12-26 15:43:24 +08:00
zhulinJulia24
c48bbde26f
[ci] remove testcase into volc engine ( #1777 )
* update
* Update daily-run-test.yml
2024-12-25 17:26:50 +08:00
Linchen Xiao
ebefffed61
[Update] Update OC academic 202412 ( #1771 )
* [Update] Update academic settings
* Update
* update
2024-12-19 18:07:34 +08:00
Chang Lan
d70100cdf2
[Update] Customizable tokenizer for RULER ( #1731 )
* Customizable tokenizer for RULER
* Relax requirements
2024-12-19 18:02:11 +08:00
Junnan Liu
499302857f
[Fix] Fix Local Runner Params Save Path ( #1768 )
* update local runner params save dir
* fix remove
* fix directory remove
* Fix *_params.py by uuid4
2024-12-19 16:07:34 +08:00
Mashiro
9a5adbde6a
[Fix] Fix lark reporter issue ( #1769 )
2024-12-18 19:33:06 +08:00
zhulinJulia24
111f817e04
[ci] add fullbench testcase ( #1766 )
add volc testcase
2024-12-18 13:24:28 +08:00
bittersweet1999
38dba9919b
[Fix] Fix Subjective summarizer order error ( #1767 )
* fix pip version
* fix order error
2024-12-18 13:21:31 +08:00
Linchen Xiao
d593bfeac8
[Bump] Bump version to 0.3.8 ( #1765 )
* [Bump] Bump version to 0.3.8
* Update README.md
2024-12-17 19:17:18 +08:00
Linchen Xiao
eadbdcb4cb
[Update] Update requirement and deepseek configurations ( #1764 )
2024-12-17 10:16:47 +08:00
liushz
5c8e91f329
[Fix] Fix vllm max_seq_len parameter transfer ( #1745 )
* [Fix] Fix vllm max_seq_len parameter transfer
* Update pr-run-test.yml
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
2024-12-16 21:44:36 +08:00
Alexander Lam
1bd594fc62
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model ( #1751 )
* fix lint issues
* updated gitignore
* changed infer_order from random to double for pairwise_judge.py (not changed for pairwise_bt_judge.py)
* added return statement to CompassArenaBradleyTerrySummarizer to return overall score for each judger model
2024-12-16 13:41:28 +08:00
zhulinJulia24
aeded4c4db
add new dataset summarizer ( #1758 )
add new dataset summarizer
2024-12-13 09:50:43 +08:00
zhulinJulia24
a1c00cc8b7
[ci] add common_summarizer return ( #1724 )
* Update common_summarizer.py
2024-12-11 20:38:32 +08:00
liushz
c4ce0174fe
[Fix] Fix ChineseSimpleQA max_out_len ( #1757 )
* add chinese simpleqa config
* Update CsimpleQA
* Update Csimpleqa
---------
Co-authored-by: 明念 <heyancheng.hyc@taobao.com>
2024-12-11 19:51:27 +08:00
Linchen Xiao
bd7b705be4
[Update] Update dataset configuration with no max_out_len ( #1754 )
2024-12-11 18:20:29 +08:00
OpenStellarTeam
1a5b3fc11e
Add Chinese SimpleQA config ( #1697 )
* add chinese simpleqa config
* Update CsimpleQA
* Update Csimpleqa
---------
Co-authored-by: 明念 <heyancheng.hyc@taobao.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-12-11 18:03:39 +08:00
Linchen Xiao
0d26b348e4
[Feature] Add OC academic 2412 ( #1750 )
2024-12-10 21:53:06 +08:00
bittersweet1999
54c0fb7a93
[Change] Change Compassarena metric ( #1749 )
* fix pip version
* fix summarizer bug
* fix compassarena
2024-12-10 14:45:32 +08:00
Songyang Zhang
0d8df541bc
[Update] Update O1-style Benchmark and Prompts ( #1742 )
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update
2024-12-09 13:48:56 +08:00
Junnan Liu
f333be177c
[Update] Add MATH500 & AIME2024 to LiveMathBench ( #1741 )
* upload dataset definitions & configs
* add single dataset split specific metrics
* add k-pass@threshold & MATH500
* update std computation & k-pass computation
* add AIME2024
* update README
2024-12-06 14:36:49 +08:00
bittersweet1999
08d63b5bf3
[Fix] Fix error in subjective default summarizer ( #1740 )
* fix pip version
* fix summarizer bug
2024-12-06 11:03:53 +08:00
Songyang Zhang
fb43dd1906
[Update] Update Skywork/Qwen-QwQ ( #1728 )
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update
2024-12-05 19:30:43 +08:00
Junnan Liu
6181ac1122
[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation ( #1730 )
* upload dataset definitions & configs
* add single dataset split specific metrics
* add k-pass@threshold & MATH500
2024-12-05 16:54:16 +08:00
Linchen Xiao
4f317d1bd5
[Update] Update Manifest ( #1738 )
2024-12-05 13:59:56 +08:00
Linchen Xiao
ac23f0ce1f
[Update] Update init file for Korbench ( #1737 )
2024-12-05 11:26:00 +08:00
Yufeng Zhao
4d773904d4
[Update] Korbench readme supplementation ( #1734 )
* renewed
* readme
---------
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2024-12-05 11:24:35 +08:00
Linchen Xiao
a011be6798
[Feature] DLC runner Lark report ( #1735 )
* [Bump] Bump version to 0.3.7
* DLC lark report update
2024-12-04 18:03:12 +08:00
Linchen Xiao
e2a290fd46
[Bump] Bump version to 0.3.7 ( #1733 )
2024-12-03 19:34:57 +08:00
Yufeng Zhao
98c4666d65
[Update] Update Korbench dataset abbr ( #1729 )
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2024-12-02 16:20:58 +08:00
Linchen Xiao
9de27b4d85
[Update] Update max_out_len for datasets ( #1726 )
* [Update] Update max_out_len for datasets
* Update eval_regression_chat_objective_fullbench.py
* Update eval_regression_chat.py
* Update oc_score_baseline_fullbench.yaml
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
2024-12-02 11:42:07 +08:00
Junnan Liu
fe6d76fb13
[Feature] Support LiveMathBench ( #1727 )
2024-11-30 00:07:19 +08:00
liushz
b063779034
[Fix] Update P-MMEVAL OSS data ( #1722 )
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
* Fix pmmeval_gen config
* Update P-MMEVAL data
---------
Co-authored-by: wanyu <wanyu2018umac@gmail.com>
Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>
2024-11-28 20:55:46 +08:00
liushz
c437135fad
[Feature] Add Openai Simpleqa dataset ( #1720 )
* Add Openai SimpleQA dataset
* Update eval_simpleqa.py
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-11-28 19:16:07 +08:00
liushz
06ab27861e
[Fix] Fix pmmeval_gen config ( #1719 )
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
* Fix pmmeval_gen config
---------
Co-authored-by: wanyu <wanyu2018umac@gmail.com>
Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>
2024-11-28 11:53:36 +08:00
wanyu2018umac
90efcf2216
[Feature] Add P-MMEval ( #1714 )
* Update with PMMEval
* Update
* Update __init__.py
* Fix Bugs
* Delete .pre-commit-config.yaml
* Pull merge
---------
Co-authored-by: liushz <qq1791167085@163.com>
2024-11-27 21:26:18 +08:00
Junnan Liu
f7dbe6bb7d
[Feature] Add Arc Prize Public Evaluation ( #1690 )
* support arc prize
* update arc-prize dataset info & update arc-prize evaluation performance
2024-11-27 15:44:41 +08:00
Yi Ding
bcb707dbfc
[Fix] Fix BailingAPI model ( #1707 )
* [fix] sequence under the multiple samples
* resolve the lint problems
* change the parameter name
* add another error code for retry
* output the log for invalid response
* format correction
* update
* add two model python files
* update the default parameter
* use random for delay
* update the api example of bailing
* remove the unnecessary parameter
2024-11-26 19:24:47 +08:00
Linchen Xiao
ef695e28e5
[Bug] Fix Korbench dataset module ( #1717 )
2024-11-26 17:13:28 +08:00