Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench ( #1617 )
...
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
2024-10-21 20:50:39 +08:00
zhulinJulia24
825d3388d5
[CI] Test PR staging fixed ( #1624 )
...
* Update oc_score_baseline.yaml
* Update runtime.txt
2024-10-21 11:02:37 +08:00
Linchen Xiao
69997f11f8
[Feature] Update requirements.txt ( #1601 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
* [Fix] fix version mismatch for CIBench
* [Fix] fix version mismatch for CIBench, local runner
* [Fix] fix version mismatch for CIBench, local runner, remove oracle mode
* BUG: Update cibench.py
* BUG: Update cibench.py
* [Bug] Update agent.txt
* update agent
* Update agent.txt
* update readme
* update
---------
Co-authored-by: kleinzcy <zhangchy2@shanghaitech.edu.cn>
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-12 18:26:57 +08:00
shijinpjlab
7528b8ab8a
[Feature] Add dingo test ( #1529 )
...
* add qa dingo
* update
* change name qa to dingo
* eval model: llm_base
* update path
* change name and move path
* add eval_dingo
* update import
* add for pip
* add dingo package
* change import place
* update import place
* fix lint fail
* isort
* double quoted
---------
Co-authored-by: sj <shijin@pjlab.org.cn>
2024-09-29 19:24:58 +08:00
Yi Ding
3f833186dc
[Feature] Support the reasoning from BaiLing LLM ( #1541 )
...
* [Feature] Support the reasoning from BaiLing LLM
This commit includes the access to BaiLing LLM and gets the reasoning.
* Add the api example
An example of evaluating the BaiLing API
* Revise the generation arguments
Based on current experiments, we update some generation arguments for better reasoning
* [fix] set the batch size
* Retry under server-side flow control
* add dependent package into requirement.txt
add the dependent package retrying to clean up the pre-commit check.
* correct the file names and make the file copy
correct the file names.
copy the files under configs to opencompass
* fix the lint issue
---------
Co-authored-by: christopher.dy <christopher.dy@antgroup.com>
2024-09-26 16:49:52 +08:00
Linchen Xiao
95aad6c282
[Fix] Requirements update
2024-09-03 18:50:40 +08:00
Linchen Xiao
0fe9756c5d
[Doc] Update Readme ( #1439 )
...
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2024-08-22 14:48:45 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode ( #1417 )
...
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets ( #1310 )
...
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 ( #1396 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* fix lint
* update docs
* Update lint
* changes saved
* changes saved
* CompassBench subjective summarizer added (#1349 )
* subjective summarizer added
* fix lint
[Fix] Fix MathBench (#1351 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
[Update] Update model support list (#1353 )
* fix pip version
* fix pip version
* update model support
subjective summarizer updated
knowledge, math objective done (data need update)
remove secrets
objective changes saved
knowledge data added
* secrets removed
* changes added
* summarizer modified
* summarizer modified
* compassbench coding added
* fix lint
* objective summarizer updated
* compass_bench_v1.3 updated
* update files in config folder
* remove unused model
* lcbench modified
* removed model evaluation configs
* remove duplicated sdk implementation
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV ( #1392 )
...
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update LCBench configs
* Update
* Update
* Update
* Update
* update
* Update
* Update
* Update
* Update
* Update
2024-08-01 00:42:48 +08:00
LeavittLang
8ee7fecb68
Adding support for Doubao API ( #1218 )
...
* Adding support for Doubao API
* Update doubao_api.py
Fixed the bug that the connection would be retried even if it was normal.
* Update doubao_api.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:44:51 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 ( #1279 )
2024-06-28 14:16:34 +08:00
LIU Xiao
83b9fd9eaa
add ",<2.0.0" to "numpy>=1.23.4" in requirements/runtime.txt, as pandas<2.0.0 isn't compatible with numpy>=2.0.0 ( #1267 )
2024-06-24 11:03:42 +08:00
bittersweet1999
e0d7808b4e
[Fix] fix pip version ( #1228 )
...
* fix pip version
* fix pip version
2024-06-06 11:48:07 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version ( #1204 )
2024-05-28 23:09:59 +08:00
klein
e4830a6926
Update CIBench ( #1089 )
...
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4
* update cibench: dataset and evaluation
* cibench summarizer bug
* update cibench
* move extract_code import
---------
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
bittersweet1999
02e7eec911
[Feature] Support AlpacaEval_V2 ( #1006 )
...
* support alpacaeval_v2
* support alpacaeval
* update docs
* update docs
2024-03-28 16:49:04 +08:00
klein
4d2591acb2
modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 ( #983 )
...
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
2024-03-18 20:25:55 +08:00
bittersweet1999
45c606bcd0
[Fix] Fix IFEval ( #906 )
...
* fix ifeval
* fix ifeval
* fix ifeval
* fix ifeval
2024-02-22 16:51:34 +08:00
Guo Qipeng
4f78388c71
Update runtime.txt to fix rouge_chinese bugs. ( #803 )
...
* Update runtime.txt to fix rouge_chinese bugs.
The wheel file of rouge_chinese overwrites the rouge package, causing bugs. Replace it with the GitHub code, which is the correct version.
* fix PEP format issues
* fix PEP format issues
* enable pip install
---------
Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-01-29 19:18:22 +08:00
Hubert
4aa74565e2
[Feat] minor update agent related ( #839 )
...
* [Feat] update cibench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
2024-01-26 14:15:51 +08:00
Songyang Zhang
793e32c9cc
[Feature] Update API implementation ( #834 )
2024-01-24 13:35:21 +08:00
Hubert
d0dc3534e5
[Fix] hot fix for requirements ( #789 )
2024-01-11 15:48:32 +08:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
Hubert
e78857ac36
[Sync] minor test ( #683 )
2023-12-11 17:42:53 +08:00
Hubert
1029119e39
[Feat] support pr merge test ci ( #669 )
...
* [Feat] support ci
* [Feat] support ci
* [Feat] support ci
* [Feat] support ci
* init docs
* init docs
* init docs
2023-12-11 14:12:04 +08:00
bittersweet1999
1c95790fdd
New subjective judgement ( #660 )
...
* TabMWP
* TabMWP
* fixed
* fixed
* fixed
* done
* done
* done
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* modified to a more general way
* modified to a more general way
* final
* final
* add summarizer
* add new summarize
* fixed
* fixed
* fixed
---------
Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-12-06 13:28:33 +08:00
Hubert
e9e75fb4eb
[Fix] remove colossalai dependency ( #645 )
2023-11-28 14:09:44 +08:00
Songyang Zhang
5329724b65
[Doc] Update README and requirements. ( #622 )
...
* update readme
* update doc
2023-11-22 19:16:54 +08:00
Songyang Zhang
d925748266
[Feature] Support 360API and FixKRetriever for CSQA dataset ( #601 )
...
* [Feature] Support 360API and FixKRetriever for CSQA dataset
* Update API
* Update API
* [Feature] Support 360API and FixKRetriever for CSQA dataset
* Update API
* Update API
* rm mathbench
* fix_lint
* Update opencompass/models/bytedance_api.py
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
* update
* update
* update
---------
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-11-21 20:25:47 +08:00
Yuan Feng
7199acc25d
Add support for DataCanvas Alaya LM ( #612 )
...
* Support for Alaya
* Remove useless requirements
2023-11-21 17:51:30 +08:00
Songyang Zhang
32884f2e39
[Feature] Update api.txt ( #567 )
2023-11-10 15:55:23 +08:00
Hubert
cf5a6d1ab7
[Fix] fix unnecessary import and update requirements ( #555 )
2023-11-08 17:58:49 +08:00
Songyang Zhang
239c2a346e
[Feature] Add support for MiniMax API ( #548 )
...
* update requirement
* update requirement
* update with minimax
* update api model
* Update readme
* fix error
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2023-11-06 21:57:32 +08:00
Qing
e2355a2ede
[Feature] Add multi model viz ( #509 )
...
* add viz_multi_model.py tool
* Modify the viz_multi_model.py script according to the review
* highlight multiple optimal scores
---------
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-30 12:11:33 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update ( #517 )
2023-10-27 20:31:22 +08:00
Leymore
861942ab1b
[Feature] Add lawbench ( #460 )
...
* add lawbench
* update requirements
* update
2023-10-13 06:51:36 -05:00
Leymore
fbf5089c40
[Sync] update github token ( #475 )
2023-10-13 06:50:54 -05:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts ( #459 )
...
* use jieba rouge in lcsts
* use rouge_chinese
2023-10-09 10:10:33 +08:00
Tong Gao
767c12a660
[Docs] update get_started ( #435 )
...
* [Docs] update get_started
* [Docs] Refactor get_started
* update
* add zh FAQ
* add cn doc
* update
* fix dead links
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-07 11:49:40 +08:00
Tong Gao
2a271dbf60
[Docs] Update doc theme ( #332 )
...
* [Docs] Update doc theme
* update
2023-08-31 10:44:21 +08:00
philipwangOvO
655a807f4b
[Dataset] LongBench ( #236 )
...
Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-08-21 14:15:20 +08:00
Tong Gao
c6a3494993
[Fix] requirements ( #229 )
2023-08-18 14:34:20 +08:00
dependabot[bot]
0555d59a6a
Bump requests from 2.28.1 to 2.31.0 ( #178 )
...
Bumps [requests](https://github.com/psf/requests) from 2.28.1 to 2.31.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.28.1...v2.31.0)
---
updated-dependencies:
- dependency-name: requests
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-09 19:41:09 +08:00
Tong Gao
1e44541730
[Enhancement] Test linting in CI and fix existing linting errors ( #69 )
...
* [Enhancement] Test linting in CI
* fix linting
2023-07-17 15:59:10 +08:00
tonysy
e6b5bdcb87
OpenCompass Public MR
2023-07-05 03:15:21 +00:00
gaotongxiao
7d346000bb
initial commit
2023-07-04 21:34:55 +08:00