Songyang Zhang
f97c4eae42
[Update] Update Fullbench ( #1712 )
...
* Update JuderBench
* Support O1-style Prompts
* Update Code
2024-11-26 14:26:55 +08:00
Yufeng Zhao
300adc31e8
[Feature] Add Korbench dataset ( #1713 )
...
* first version for korbench
* first stage for korbench
* korbench_1
* korbench_1
* korbench_1
* korbench_1
* korbench_1_revised
* korbench_combined_1
* korbench_combined_1
* kor_combined
* kor_combined
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2024-11-25 20:11:27 +08:00
liushz
e49fcfd3a3
[Update] Update MATH dataset with model judge ( #1711 )
...
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
2024-11-25 15:14:55 +08:00
Linchen Xiao
ab8fdbbaab
[Update] Update Math auto-download data ( #1700 )
2024-11-18 20:24:35 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Datset Evaluation ( #1689 )
...
* MuSR Datset Evaluation
* MuSR Datset Evaluation
Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added ( #1684 )
...
* update
* update
* update
* update
2024-11-14 15:32:43 +08:00
Linchen Xiao
a0ef2fd3b4
[Update] Dingo Dataset update ( #1670 )
...
* [Update] Dingo Dataset update
* update
2024-11-08 14:38:43 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models ( #1666 )
...
* [Update] Add base long context evaluation
* update
2024-11-08 10:53:29 +08:00
liushz
f7d899823c
[Update] Update mmmlu_lite dataload ( #1658 )
...
* update mmmlu_lite dataload from oss
* update mmmlu_lite dataload from oss
2024-11-01 17:32:29 +08:00
Songyang Zhang
c789ce5698
[Fix] the automatically download for several datasets ( #1652 )
...
* [Fix] the automatically download for several datasets
* Update
* Update
* Update CI
2024-11-01 15:57:18 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench ( #1645 )
...
* fix pip version
* fix pip version
* add compassarenasubjectivebench
* add compassarenasubjectivebench
* add compassarenabench
2024-11-01 13:52:22 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers ( #1600 )
2024-10-29 18:37:15 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME ( #1610 )
...
* add datasets cmo&aime
* delete unused modules
* modify prompt
* update __init__
* update data load and add README
* update data load
* update performance
* update md5
* remove indents
* add indent
* fix log for debug mode
2024-10-28 18:08:02 +08:00
Linchen Xiao
a61e8a0803
[Update] Internal humaneval add ( #1641 )
...
* [Update] internal_humaneval_add
* update
2024-10-25 19:08:42 +08:00
Linchen Xiao
662dddf41a
[Update] Add internal humaneval postprocess ( #1636 )
2024-10-24 17:45:21 +08:00
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench ( #1617 )
...
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
2024-10-21 20:50:39 +08:00
Chenguang Li
5868d5afa4
[Bug] Fix-NPU-Support ( #1618 )
...
* bugfix NPU support
* formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-10-21 17:42:53 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark ( #1565 )
...
* rm folder
* modify format according to reviewer
* modify format according to reviewer
* modify format according to reviewer
* add some files requirement
* fix some bug
* fix bug
* change load type
* Update MMMLU Dataset
* Update MMMLU Dataset
* Add MMMLU-Lite Dataset
* update MMMMLU datast
* update MMMMLU datast
* update MMMMLU datast
---------
Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena ( #1614 )
...
* fix pip version
* fix pip version
* update docs and add compassarena
* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench ( #1607 )
...
* upload hellobench
* update hellobench
* update readme.md
* update eval_hellobench.py
* update lastest
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval ( #1593 )
...
* fix pip version
* fix pip version
* update (#1522 )
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config ( #1589 )
...
* Add GaoKaoMath Dataset
* Add MATH LLM Eval
* Update GAOKAO Math Eval Dataset
* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames ( #1599 )
...
* fix pip version
* fix pip version
* fix flames
* fix flames
2024-10-12 14:34:59 +08:00
Linchen Xiao
763d7755b6
[BUG]GaokaoBench dataset fix ( #1583 )
2024-09-30 15:13:26 +08:00
shijinpjlab
7528b8ab8a
[Feature] Add dingo test ( #1529 )
...
* add qa dingo
* update
* change name qa to dingo
* eval model: llm_base
* update path
* change name and move path
* add eval_dingo
* update import
* add for pip
* add dingo package
* change import place
* update import place
* fix lint fail
* isort
* double quoted
---------
Co-authored-by: sj <shijin@pjlab.org.cn>
2024-09-29 19:24:58 +08:00
liushz
c9a7026f59
[Feature] Update MathBench & WikiBench for FullBench ( #1521 )
...
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:35:30 +08:00
Songyang Zhang
6997990c93
[Feature] Update Models ( #1518 )
...
* Update Models
* Update
* Update humanevalx
* Update
* Update
2024-09-12 23:35:30 +08:00
bittersweet1999
7c7fa36235
[Feature] add support for internal Followbench ( #1511 )
...
* fix pip version
* fix pip version
* add internal followbench
* add internal followbench
* fix lint
* fix lint
2024-09-11 13:32:34 +08:00
Linchen Xiao
87ffa71d68
[Feature] Longbench dataset update
2024-09-06 15:50:12 +08:00
Hari Seldon
faf5260155
[Feature] Optimize Evaluation Speed of SciCode ( #1489 )
...
* update scicode
* update comments
* remove redundant variable
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-09-06 00:59:41 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update ( #1480 )
...
* update
* update
* update
2024-09-05 17:22:42 +08:00
Linchen Xiao
9693be46b7
[Feature] Mmlu-pro auto-download ( #1464 )
...
* update
* update
* update
* update
* update
2024-08-30 10:03:40 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update ( #1463 )
...
* update
* update
* update
* update
2024-08-28 14:01:05 +08:00
Songyang Zhang
7c2d25b557
[Fix] Update SciCode and Gemma model ( #1449 )
...
* [Fix] Update SciCode and Gemma model
* Update
* Update
2024-08-23 10:42:27 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode ( #1417 )
...
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets ( #1310 )
...
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Linchen Xiao
ecf9bb3e4c
[Bug] Commonsenseqa dataset fix ( #1425 )
...
* longbench dataset load fix
* update
* Update
* Update
* Update
* update
* update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-16 15:54:07 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 ( #1423 )
...
* [Update] Support auto-download of FOFO/MT-Bench-101
* Update wildbench
2024-08-16 11:57:41 +08:00
Linchen Xiao
2596f226f4
[Fix] longbench dataset load fix ( #1422 )
2024-08-15 11:30:30 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 ( #1396 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* fix lint
* update docs
* Update lint
* changes saved
* changes saved
* CompassBench subjective summarizer added (#1349 )
* subjective summarizer added
* fix lint
[Fix] Fix MathBench (#1351 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
[Update] Update model support list (#1353 )
* fix pip version
* fix pip version
* update model support
subjective summarizer updated
knowledge, math objective done (data need update)
remove secrets
objective changes saved
knowledge data added
* secrets removed
* changed added
* summarizer modified
* summarizer modified
* compassbench coding added
* fix lint
* objective summarizer updated
* compass_bench_v1.3 updated
* update files in config folder
* remove unused model
* lcbench modified
* removed model evaluation configs
* remove duplicated sdk implementation
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
yaoyingyy
decb621ff6
[Fix] the issue where scores are negative in the Lawbench dataset evaluation( #1402 ) ( #1403 )
2024-08-08 16:08:26 +08:00
Yunlin Mao
818d72a650
[Fix] modelscope dataset load problem ( #1406 )
...
* fix modelscope dataset load
* fix lint
2024-08-08 14:01:06 +08:00
Songyang Zhang
fed1a4998b
[Fix] Fix CaLM import ( #1395 )
2024-08-06 12:17:45 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV ( #1392 )
...
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion ( #1389 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update
* Update openai sdk
* Update
* Update gemma
2024-08-01 19:10:13 +08:00
Peng Bo
07c96ac659
Calm dataset ( #1385 )
...
* Add CALM Dataset
2024-08-01 10:03:21 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update LCBench configs
* Update
* Update
* Update
* Update
* update
* Update
* Update
* Update
* Update
* Update
2024-08-01 00:42:48 +08:00
Songyang Zhang
eee5a5be23
[Fix] Update get_data_path for LCBench and HumanEval ( #1375 )
2024-07-29 19:28:09 +08:00
Songyang Zhang
704853e5e7
[Feature] Update pip install ( #1324 )
...
* [Feature] Update pip install
* Update Configuration
* Update
* Update
* Update
* Update Internal Config
* Update collect env
2024-07-29 18:32:50 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets ( #1289 )
...
* add ceval, gsm8k modelscope surpport
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* udpate dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* udpate dataset for modelscope support
* update readme
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* update readme
* remove tydiqa japanese subset
* add ceval, gsm8k modelscope surpport
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* udpate dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update readme
* udpate dataset for modelscope support
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* remove tydiqa japanese subset
* update util
* remove .DS_Store
* fix md format
* move util into package
* update docs/get_started.md
* restore eval_api_zhipu_v2.py, add environment setting
* Update dataset
* Update
* Update
* Update
* Update
---------
Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
jxd
12b84aeb3b
[Feature] Update CHARM Memeorziation ( #1230 )
...
* update gemini api and add gemini models
* add openai models
* update CHARM evaluation
* add CHARM memorization tasks
* add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis
* update CHARM readme
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
2024-07-26 18:42:30 +08:00
bittersweet1999
d3782c1d47
Revert "Calm dataset ( #1287 )" ( #1366 )
...
This reverts commit edd0ffdf70
.
2024-07-26 18:27:29 +08:00
Peng Bo
edd0ffdf70
Calm dataset ( #1287 )
...
* add calm dataset
* modify config max_out_len
* update README
* Modify README
* update README
* update README
* update README
* update README
* update README
* add summarizer and modify readme
* delete summarizer config comment
* update summarizer
* modify same response to all questions
* update README
2024-07-26 11:48:16 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench ( #1335 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
Mo Li
69aa2f2d57
[Feature] Make NeedleBench available on HF ( #1364 )
...
* update_lint
* update_huggingface format
* fix bug
* update docs
2024-07-25 19:01:56 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets ( #1315 )
...
* commit inference ppl datasets
* revised format
* revise
* revise
* revise
* revise
* revise
* revise
2024-07-22 17:59:30 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part ( #1342 )
...
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update acclerator
* Update MathBench
* Update accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench_aug_gen_068af0.py
* Update compassbench_aug_gen_068af0.py
* Update
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation ( #1339 )
...
* fix pip version
* fix pip version
* support checklist eval
* init
* add lan
* fix typo
2024-07-19 16:40:44 +08:00
Fengzhe Zhou
a62c613d3e
[Sync] bump version 0.2.6+local ( #1294 )
2024-07-06 00:44:06 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval ( #1284 )
...
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
2024-07-05 22:11:37 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 ( #1279 )
2024-06-28 14:16:34 +08:00
Xingyuan Bu
842fb1cd70
Update mtbench101.py ( #1276 )
...
fix wrong-used import
from torch.utils.data import DataLoader, Dataset
2024-06-26 00:40:22 +08:00
klein
1fa62c4a42
Support wildbench ( #1266 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo ( #1224 )
...
* add fofo dataset
* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 ( #1215 )
...
* add mt-bench-101
* add readme and requirements
* add mt-bench-101 data
* Update readme_mtbench101.md
* update readme
* update leaderboard
* fix typo
* Update readme_mtbench101.md
* fit newest opencompass
* update readme.md
* mtbench101 to opencompass
* mtbench101 to opencompass
* for code review
* for code review
* for code review
* hook
* hook
---------
Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
mqy004
b272803d8a
解决release版本安装后不能导入opencompass.cli.main的问题 ( #1221 )
...
* Create __init__.py
* Create __init__.py
* Create __init__.py
* Create __init__.py
* Create __init__.py
* Create __init__.py
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-31 13:23:33 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format ( #1214 )
2024-05-30 00:21:58 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version ( #1204 )
2024-05-28 23:09:59 +08:00
jxd
608ff5810d
support CHARM ( https://github.com/opendatalab/CHARM ) reasoning tasks ( #1190 )
...
* support CHARM (https://github.com/opendatalab/CHARM ) reasoning tasks
* fix lint error
* add dataset card for CHARM
* minor refactor
* add txt
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
yaoyingyy
749e4cea71
[Fix] temporary files using tempfile ( #1186 )
...
Co-authored-by: yaoying <yaoying@kingsoft.com>
2024-05-24 23:27:37 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator ( #1175 )
2024-05-21 14:22:46 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry ( #1171 )
2024-05-17 16:50:58 +08:00
Fengzhe Zhou
80f831b425
[Fix] use ProcessPoolExecutor during mbpp eval ( #1159 )
2024-05-15 13:48:29 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template ( #1098 )
...
* add TheoremQA with 5-shot
* add huggingface_above_v4_33 classes
* use num_worker partitioner in cli
* update theoremqa
* update TheoremQA
* add TheoremQA
* rename theoremqa -> TheoremQA
* update TheoremQA output path
* rewrite many model configs
* update huggingface
* further update
* refine configs
* update configs
* update configs
* add configs/eval_llama3_instruct.py
* add summarizer multi faceted
* update bbh datasets
* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py
* rename class
* update readme
* update hf above v4.33
2024-05-14 14:50:16 +08:00
bittersweet1999
826d8307ac
fix links ( #1120 )
2024-05-08 15:13:18 +08:00
JuhaoLiang
d2c40e5648
[Feature] Add AceGPT-MMLUArabic benchmark ( #1099 )
...
* add AceGPT-MMLUArabic benchmark
* update readme and fix lint issue
* remove unused package
* add MMLUArabic zero-shot settings
* rename filename and update readme
2024-05-08 15:00:26 +08:00
Fangyu Lei
862044fb7d
[Feature] Add S3Eval Dataset ( #916 )
...
* s3eval_branch
* update s3eval
2024-05-06 19:41:52 +08:00
Yggdrasill7D6
af10ecc272
add mgsm datasets ( #1081 )
...
* add mgsm datasets
* fix lint
* fix lint
* update mgsm
* update mgsm
* ease code spell
* update
* update
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 15:29:34 +08:00
klein
153c4fc988
[Feature] update drop dataset from openai simple eval ( #1092 )
...
* [Feature] update drop dataset from openai simple eval
* update drop template presentation
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 13:37:08 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation ( #1108 )
...
* fixed formatting based on pre-commit tests
* fixed typo in comments; reduced the number of models in the eval config
* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset
* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
bittersweet1999
3de48e9b35
[Bug] Fix CMB dataset ( #1106 )
2024-04-30 00:33:43 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets ( #1093 )
...
* add flames datasets
* fix lint
* rm quota
* add judgemodel info and fix os path
* support flames dataset
* support flames dataset
---------
Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
Francis-llgg
f1ee11de14
[Feature] Add gpqa prompt from simple_evals, openai ( #1080 )
...
* add gpqa_openai_simple_eval
* 触发CI构建
* reorg
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 20:13:00 +08:00
klein
e4830a6926
Update CIBench ( #1089 )
...
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4
* update cibench: dataset and evluation
* cibench summarizer bug
* update cibench
* move extract_code import
---------
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation ( #1096 )
...
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel ( #1094 )
...
* support openai math evaluation
* support openai math evaluation
* support openai math evaluation
* support math llm judge
* support math llm judge
2024-04-26 14:56:23 +08:00
Fengzhe Zhou
004ed79593
[Feature] Add TheoremQA with 5-shot ( #1048 )
...
* add TheoremQA with 5-shot
* cherry pick from add-huggingface-above-v4.33, good TheoremQA results
2024-04-22 15:22:04 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
liuwei130
a00e57296f
[Feature] Add ChemBench ( #1032 )
...
* add ChemBench
* update results
* molbench -> ChemBench
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-12 08:46:26 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
Mo Li
f2af49337d
[Feature] Add ATC Choice Version ( #1019 )
...
* Squashed commit of the following:
commit c48ad194c3976dc63d1b60d8c8ab2d5ff9e1cbfe
Author: DseidLi <2568818204@qq.com>
Date: Tue Apr 2 16:57:43 2024 +0800
add atc_choice
commit 3ac6efea29619573e6fac8fa3cce464853dcead0
Merge: 2d4e559
8e3a9c3
Author: DseidLi <2568818204@qq.com>
Date: Tue Apr 2 16:41:38 2024 +0800
Merge branch 'atc_choice' into atc_add_choice
commit 8e3a9c396a3e5546d3faf584183f6fd60b974d5e
Merge: 150a036 0a6a03f
Author: DseidLi <2568818204@qq.com>
Date: Tue Mar 26 04:47:07 2024 +0800
Merge branch 'main' into atc_choice
Conflicts:
configs/summarizers/needlebench.py
opencompass/datasets/needlebench/multi.py
opencompass/datasets/needlebench/origin.py
opencompass/datasets/needlebench/parallel.py
commit 150a036d6d990f26a57c974d1af83d88c31a0f9d
Merge: 8d6ac9a 940dd18
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 20 03:49:08 2024 +0800
Merge branch 'needlebench_fix' into atc_choice
commit 8d6ac9a1a43b1c9d0f0ea27e7d58968a203ea898
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 20 03:41:49 2024 +0800
optimize needlebench code
commit 940dd18a4270f24bc69edd2a780182c68918e1a9
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 20 03:39:46 2024 +0800
fix vllm
commit d8be6877bc41051f3edcc0421c462c834c0f1c9a
Merge: ecad78a 2527fda
Author: DseidLi <2568818204@qq.com>
Date: Tue Mar 19 21:07:08 2024 +0800
Merge remote-tracking branch 'origin/add_1M_dataset' into atc_choice
commit 2527fda8a5
Author: DseidLi <2568818204@qq.com>
Date: Tue Mar 19 16:03:40 2024 +0800
add model configs
commit 75425acdf8
Author: DseidLi <2568818204@qq.com>
Date: Tue Mar 19 16:02:15 2024 +0800
add prompt postion args
commit 367ba1ba61
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 21:40:00 2024 +0800
add Needlebench-1000K configs
commit ecad78af14c4bb00fe325779114b384c57ab30bf
Author: DseidLi <2568818204@qq.com>
Date: Thu Mar 14 22:08:32 2024 +0800
fix atc
commit 08772c0787b18872abadc9ffec3223941a5ee0c2
Merge: 9f3f8cf caf1cf8
Author: DseidLi <2568818204@qq.com>
Date: Thu Mar 14 22:07:28 2024 +0800
Merge branch 'main' into atc_choice
Conflicts:
configs/datasets/needlebench/readme.md
configs/datasets/needlebench/readme_zh-CN.md
configs/summarizers/needlebench.py
opencompass/datasets/needlebench/atc.py
opencompass/summarizers/needlebench.py
commit 9f3f8cfb4452722734d334114ac1d14110e57406
Author: DseidLi <2568818204@qq.com>
Date: Thu Mar 14 21:35:53 2024 +0800
add atc-choice test
commit 52be7c1202376b4e09821188b826f1a805328129
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 6 02:54:15 2024 +0800
update needlebench randomseed and add vllm qwen14b
commit fc1effce596ae2e5ece4933e8cd34aef8e64a6f9
Merge: 4e747ed caf1cf8
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 6 02:51:14 2024 +0800
Merge branch 'main' into add_model_configs
commit 31834f9b23af3354ac3581ec86d693d0f05cdd1c
Merge: 7dabc82 120bf8b
Author: DseidLi <2568818204@qq.com>
Date: Sun Mar 3 23:29:42 2024 +0800
Merge branch 'main' of https://github.com/open-compass/opencompass into atc_choice
commit 4e747ed1988ddbcfcc7fff334601259ade72d363
Author: DseidLi <2568818204@qq.com>
Date: Sun Mar 3 22:15:25 2024 +0800
add internlm2-lmdeploy model and gemma configs
commit 7dabc828123d711c8cf834d6aab4137bb55e85ed
Author: DseidLi <2568818204@qq.com>
Date: Sat Mar 2 17:26:15 2024 +0800
add atc choice version -ZH
commit 996f8ae43d
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:58:56 2024 +0800
update readme for needlebench
commit f7266e873c
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:44:53 2024 +0800
move readme.md
commit 1c7375681d
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:38:31 2024 +0800
fix linting error
commit b6524f3ebf
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:33:51 2024 +0800
lint summarizer
commit c0d1190e39
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:29:03 2024 +0800
add needlebench intro, fix summarizer
commit 0965baf785
Author: DseidLi <2568818204@qq.com>
Date: Mon Feb 26 13:31:26 2024 +0800
fix bug in needlebench summarizer
commit 5d32b31eb8
Author: DseidLi <2568818204@qq.com>
Date: Sat Feb 24 03:19:08 2024 +0800
update act prompt
commit af82a7f085
Merge: 32bf9fe
53fe788
Author: DseidLi <2568818204@qq.com>
Date: Fri Feb 23 17:50:32 2024 +0800
Merge remote-tracking branch 'upstream/main' into needlebench
commit 32bf9fe802
Author: DseidLi <2568818204@qq.com>
Date: Fri Feb 23 17:31:32 2024 +0800
simplify needlebench 32k, 128k, 200k for eval
commit a7cb025e05
Author: DseidLi <2568818204@qq.com>
Date: Fri Feb 23 14:48:58 2024 +0800
add needlebench
* fix summarizer
* remove repeated code
* remove chinese comments
2024-04-07 15:46:20 +08:00
Mo Li
0a6a03fe1a
[Feature] update needlebench and configs ( #986 )
...
* add Needlebench-1000K configs
* add prompt postion args
* add model configs
* Update parallel.py
* fix lint
2024-03-25 18:05:01 +08:00
Connor-Shen
0221d30877
[Fix] Update APPS/TACO ( #988 )
...
* [Feature] update apps/taco
* [Feature] update apps/taco
2024-03-19 20:21:39 +08:00
Connor-Shen
8a3c6e51ed
[Feature] Update APPS ( #985 )
...
* update post process
* update post process
2024-03-19 15:47:05 +08:00
Connor-Shen
d92595b671
[Feat] Support TACO ( #966 )
...
* [Feat] Support TACO
* update README
* update README
2024-03-19 15:39:16 +08:00
Jingming
89a8a8917b
[Feature] Add the implement of QuALITY datasets ( #976 )
...
#976
2024-03-15 21:22:38 +08:00
Connor-Shen
3098d78845
[Bench] Support APPS ( #963 )
...
* [Feat] support apps
* [Feat] support apps
* [Feat] support apps
* update README
2024-03-13 16:09:23 +08:00
Fengzhe Zhou
bdd85358cc
[Sync] update 20240308 ( #953 )
2024-03-11 22:34:19 +08:00
Fengzhe Zhou
b03d5dc531
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
yuantao2108
bbec7d8733
[Feature] add lveval benchmark ( #914 )
...
* add lveval benchmark
* add LVEval readme file
* update LVEval readme file
* Update configs/eval_bluelm_32k_lveval.py
* Update configs/eval_llama2_7b_lveval.py
---------
Co-authored-by: yuantao <yuantao@infini-ai.com>
Co-authored-by: Mo Li <82895469+DseidLi@users.noreply.github.com>
2024-03-04 11:22:03 +08:00