Commit Graph

179 Commits

Author SHA1 Message Date
Songyang Zhang
aa2b89b6f8
[Update] Add CascadeEvaluator with Data Replica (#2022)
* Update CascadeEvaluator

* Update CascadeEvaluator

* Update CascadeEvaluator

* Update Config

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update
2025-05-20 16:46:55 +08:00
kkscilife
8c0ccf9a6b
[CI] Fix Lint error (#2103) 2025-05-16 15:36:45 +08:00
kkscilife
6f3b6a5d12
[CI] Add gitleaks check (#2101) 2025-05-16 14:34:57 +08:00
Linchen Xiao
d590f557bb
[Update] OpenaiSDK handle empty content (#2096) 2025-05-12 19:38:30 +08:00
yuehua-s
c492e49e79
[Update] Add o4 in OpenaiSDK (#2083)
* feature:1.add o4-mini;2.o3 or o4-mini only support temperature==1

* feature:change 4o-mini to 4o

---------

Co-authored-by: yuehuazhang <yuehuazhang@tencent.com>
2025-05-12 18:39:44 +08:00
Linchen Xiao
af8432e1d6
[Update] OpenAI SDK model reasoning content (#2078)
* update

* update

* update
2025-05-07 14:06:40 +08:00
Linchen Xiao
e8bc8c1e8c
[Bug] Concat OpenaiSDK reasoning content (#2041)
* [Bug] Concat OpenaiSDK reasoning content

* [Bug] Concat OpenaiSDK reasoning content

* update

* update
2025-04-25 14:10:33 +08:00
Linchen Xiao
65ff602cf5
[Update] Fix LLM Judge metrics calculation & Add reasoning content concat to OpenAI SDK 2025-04-15 11:33:16 +08:00
Linchen Xiao
f66b0b347a
[Update] Requirements update (#1993) 2025-04-02 12:03:45 +08:00
Linchen Xiao
b9de8b0e2b
[Update] Unset disallowed_special token for Openai model (#1960) 2025-03-18 20:24:07 +08:00
Songyang Zhang
c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899)
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify

* Update

* Update

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
Junnan Liu
22a33d8759
[Update] Update LiveMathBench Hard Configs (#1826)
* support G-Pass@k and livemathbench

* fix bugs

* fix comments of GPassKEvaluator

* update saved details of GPassKEvaluator

* update saved details of GPassKEvaluator

* fix eval api configs & update openai_api for ease of debugging

* update huggingface path

* fix method name of G-Pass@k

* fix default value of eval_model_name

* refactor G-Pass@k evaluator

* log generation params for each backend

* fix evaluation resume

* add notimplementerror

* update livemathbench-hard configs

* remove max_out_len from livemathbench_hard_greedy_gen_9befbf.py

* remove max_out_len from livemathbench_hard_gen_9befbf.py

* rename livemathbench_hard_gen_9befbf.py to livemathbench_hard_gen_353ae7.py

* rename livemathbench_hard_greedy_gen_9befbf.py to livemathbench_hard_greedy_gen_353ae7.py

* update livemathbench_gen_9befbf.py

* remove whitespace

* upload livemathbench hard configs
2025-02-25 17:24:36 +08:00
Linchen Xiao
d7daee6e25
[Update] OpenAI model update, bigcodebench update (#1879)
* [Update] Openai model update, bigcodebench update

* update
2025-02-20 19:33:25 +08:00
Junnan Liu
70f2c963d3
[Feature] Support Omni-Math (#1837)
* support omni-math

* update config

* upload README

* Delete opencompass/configs/datasets/omni_math/__init__.py

---------

Co-authored-by: liushz <qq1791167085@163.com>
2025-01-23 18:36:54 +08:00
Linchen Xiao
03415b2a66
[Fix] Update max_out_len logic for OpenAI model (#1839) 2025-01-21 15:46:14 +08:00
Linchen Xiao
a6193b4c02
[Refactor] Code refactorization (#1831)
* Update

* fix lint

* update

* fix lint
2025-01-20 19:17:38 +08:00
Linchen Xiao
117dc500ad
[Feature] Add Longbenchv2 support (#1801)
* Create eval_longbenchv2.py

* Create longbenchv2_gen.py

* Update __init__.py

* Create longbenchv2.py

* Update datasets_info.py

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: abrohamLee <146956824+abrohamLee@users.noreply.github.com>
2025-01-03 12:04:29 +08:00
Junnan Liu
8e8d4f1c64
[Feature] Support G-Pass@k and LiveMathBench (#1772)
* support G-Pass@k and livemathbench

* fix bugs

* fix comments of GPassKEvaluator

* update saved details of GPassKEvaluator

* update saved details of GPassKEvaluator

* fix eval api configs & update openai_api for ease of debugging

* update huggingface path

* fix method name of G-Pass@k

* fix default value of eval_model_name

* refactor G-Pass@k evaluator

* log generation params for each backend

* fix evaluation resume

* add notimplementerror
2024-12-30 16:59:39 +08:00
Junnan Liu
499302857f
[Fix] Fix Local Runner Params Save Path (#1768)
* update local runner params save dir

* fix remove

* fix directory remove

* Fix *_params.py by uuid4
2024-12-19 16:07:34 +08:00
Songyang Zhang
0d8df541bc
[Update] Update O1-style Benchmark and Prompts (#1742)
* Update JudgerBench

* Support O1-style Prompts

* Update Code

* Update OpenAI

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update

* Update

* Update

* Update
2024-12-09 13:48:56 +08:00
Songyang Zhang
fb43dd1906
[Update] Update Skywork/Qwen-QwQ (#1728)
* Update JudgerBench

* Support O1-style Prompts

* Update Code

* Update OpenAI

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update
2024-12-05 19:30:43 +08:00
Linchen Xiao
9de27b4d85
[Update] Update max_out_len for datasets (#1726)
* [Update] Update max_out_len for datasets

* Update eval_regression_chat_objective_fullbench.py

* Update eval_regression_chat.py

* Update eval_regression_chat.py

* Update oc_score_baseline_fullbench.yaml

---------

Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
2024-12-02 11:42:07 +08:00
Yi Ding
bcb707dbfc
[Fix] Fix BailingAPI model (#1707)
* [fix] sequence under the multiple samples

* resolve the lint problems

* change the parameter name

* add another error code for retry

* output the log for invalid response

* format correction

* update

* update

* update

* update

* add two model python files

* update the default parameter

* use random for delay

* update the api example of bailing

* remove the unnecessary parameter
2024-11-26 19:24:47 +08:00
Songyang Zhang
f97c4eae42
[Update] Update Fullbench (#1712)
* Update JudgerBench

* Support O1-style Prompts

* Update Code
2024-11-26 14:26:55 +08:00
Linchen Xiao
80e3b9ef37
[Update] Add math prm 800k (#1708) 2024-11-21 21:29:43 +08:00
Linchen Xiao
500fb1032a
[Update] Update configurations (#1704) 2024-11-21 16:51:18 +08:00
Yi Ding
05044dfaf2
[Update] Support new error code for Bailing model (#1702)
* support new error code

* fix the lint problems
2024-11-20 16:40:22 +08:00
bittersweet1999
aca8ec3c6a
[Hotfix] Hotfix (#1683)
* fix pip version

* fix pip version

* fix lint

* hotfix
2024-11-13 10:14:27 +08:00
sobeit
3ec178f4a9
add single lora adapter support for vLLM inference. (#1679) 2024-11-12 17:31:36 +08:00
bittersweet1999
17b5e52f6c
[Hotfix] lmdeploy temp (#1674)
* fix pip version

* fix pip version

* hotfix
2024-11-12 16:10:16 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models (#1666)
* [Update] Add base long context evaluation

* update
2024-11-08 10:53:29 +08:00
Chang Cheng
fd7aa83c01
[Update] Update DLC Runner (#1662)
* push interntrain hard code

* push interntrain hard code

* remove redundant post process

---------

Co-authored-by: changcheng <changcheng@pjlab.org.cb>
Co-authored-by: changcheng <changcheng@pjlab.org.cn>
2024-11-07 15:45:35 +08:00
Lyu Han
888f1f3bef
[Fix] Update loglikelihood compatibility (#1659) 2024-11-02 17:19:11 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers (#1600) 2024-10-29 18:37:15 +08:00
Lyu Han
fb12c3f98a
[Update] strip stop_words (#1635) 2024-10-24 20:39:20 +08:00
Chenguang Li
5868d5afa4
[Bug] Fix-NPU-Support (#1618)
* bugfix NPU support

* formatting

---------

Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-10-21 17:42:53 +08:00
Lyu Han
6e8adf5221
[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator (#1623)
* remove prefix bos_token from messages when using lmdeploy as the accelerator

* update
2024-10-19 20:03:47 +08:00
x54-729
2b1afa7d1e
[Fix] fix interntrain's tokenizer truncate (#1605)
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-10-15 16:03:57 +08:00
Lyu Han
4fde41036f
[Feature] Update TurboMindModel by integrating lmdeploy pipeline API (#1556)
* integrate lmdeploy's pipeline api

* fix linting

* update user guide

* rename

* update

* update

* update

* rollback class name

* update

* remove unused code

* update

* update

* use pipeline

* fix ci check

* compatibility

* compatibility

* remove concurrency

* update

* fix table content

* update
2024-10-14 15:33:40 +08:00
Lyu Han
b52ba65c26
[Feature] Integrate lmdeploy pipeline api (#1198)
* integrate lmdeploy's pipeline api

* fix linting

* update user guide

* rename

* update

* update

* update

* rollback class name

* update

* remove unused code

* update

* update

* fix ci check

* compatibility

* remove concurrency

* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py

* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md

* [Bug] fix lint

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-10-09 22:58:06 +08:00
x54-729
4d6349dfe1
[FIX] fix interntrain get_loglikelihood (#1584) 2024-10-08 11:34:04 +08:00
x54-729
bbdca5eb4c
[BUG] Fix eos token handling and add comments for InternTrain (#1569)
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-30 15:46:06 +08:00
Yi Ding
85a28874aa
[BUG]: Fix Bailing API configs (#1570) 2024-09-27 11:56:57 +08:00
Songyang Zhang
e8437db98f
[Feature] Update BailingLM/OpenAI verbose (#1568)
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI

* Update

* [Feature] Update API

* Update
2024-09-27 11:15:25 +08:00
Songyang Zhang
7d50294117
[Feature] Update Bailing (#1567)
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI

* Update

* Update

* Update
2024-09-26 18:56:17 +08:00
Songyang Zhang
a7bacfdf7e
[Feature] Update CoreBench 2.0 (#1566)
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI

* Update

* Update
2024-09-26 18:44:00 +08:00
Yi Ding
3f833186dc
[Feature] Support the reasoning from BaiLing LLM (#1541)
* [Feature] Support the reasoning from BaiLing LLM

This commit includes the access to BaiLing LLM and gets the reasoning.

* Add the api example

An example of evaluating the Bailing API

* Revise the generation arguments

Based on current experiment, we update some generation arguments for better reasoning

* [fix] set the batch size

* Retry under flowcontrol of serverside

* add dependent package into requirement.txt

add the dependent package retrying to clean up the pre-commit check.

* correct the file names and make the file copy

correct the file names.
copy the files under configs to opencompass

* fix the lint issue

---------

Co-authored-by: christopher.dy <christopher.dy@antgroup.com>
2024-09-26 16:49:52 +08:00
x54-729
335667183a
[Feature] Add Interntrain model support (#1548)
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-23 19:10:26 +08:00
Songyang Zhang
ee058e25b2
[Feature] Support verbose for OpenAI API (#1546) 2024-09-20 17:12:52 +08:00
hailsham
a81bbb85bf
[FIX] Added handling for the "begin section" in meta_template to APITemplateParser (#1405)
Co-authored-by: leifei <nuuooo@icloud.com>
2024-09-19 18:12:04 +08:00