changyeyu
59586a8b4a
[Feature] Enable Truncation of Mid-Section for Long Prompts in huggingface_above_v4_33.py ( #1373 )
...
* Retain the first and last halves of the tokens from the prompt, discarding the middle, to avoid exceeding the model's maximum length.
* Add default parameter: mode
* Modified a comment.
* Modified variable names.
* fix yapf lint
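The mid-section truncation described in this commit can be illustrated with a minimal sketch. It is not the code from huggingface_above_v4_33.py: the function name `truncate_middle` and the parameter `max_len` are hypothetical, and the even head/tail split is an assumption based on the "first and last halves" wording above.

```python
def truncate_middle(token_ids, max_len):
    """Keep the head and tail of a token sequence, dropping the middle.

    Hypothetical helper for illustration only; the names and the exact
    split are assumptions, not the repository's implementation.
    """
    if max_len <= 0:
        return []
    if len(token_ids) <= max_len:
        # Short enough already: return an unmodified copy.
        return list(token_ids)
    head = max_len // 2       # tokens kept from the start
    tail = max_len - head     # tokens kept from the end (always >= 1 here)
    return list(token_ids[:head]) + list(token_ids[-tail:])


if __name__ == "__main__":
    print(truncate_middle(list(range(10)), 4))
```

Keeping both ends preserves the instruction at the start of a prompt and the question at its end, which is usually where a long-context prompt carries the most task-relevant text.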
2024-08-09 11:36:30 +08:00
Songyang Zhang
88eb91219b
[Doc] Update README ( #1404 )
...
* [Doc] Update README
* Update
2024-08-08 16:18:33 +08:00
yaoyingyy
decb621ff6
[Fix] Fix the issue where scores are negative in the Lawbench dataset evaluation ( #1402 ) ( #1403 )
2024-08-08 16:08:26 +08:00
Yunlin Mao
818d72a650
[Fix] modelscope dataset load problem ( #1406 )
...
* fix modelscope dataset load
* fix lint
2024-08-08 14:01:06 +08:00
Songyang Zhang
264fd23129
[Bump] Bump version for v0.3.0 ( #1398 )
2024-08-07 01:25:24 +08:00
Songyang Zhang
fed1a4998b
[Fix] Fix CaLM import ( #1395 )
2024-08-06 12:17:45 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV ( #1392 )
...
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion ( #1389 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update
* Update openai sdk
* Update
* Update gemma
2024-08-01 19:10:13 +08:00
Peng Bo
07c96ac659
Calm dataset ( #1385 )
...
* Add CALM Dataset
2024-08-01 10:03:21 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update LCBench configs
* Update
* Update
* Update
* Update
* update
* Update
* Update
* Update
* Update
* Update
2024-08-01 00:42:48 +08:00
Mo Li
b83396f57c
add 1m config ( #1383 )
2024-07-31 14:53:51 +08:00
klein
52eccc4f0e
[Fix] Fix version mismatch of CIBench ( #1380 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
* [Fix] fix version mismatch for CIBench
* [Fix] fix version mismatch for CIBench, local runner
* [Fix] fix version mismatch for CIBench, local runner, remove oracle mode
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-30 17:51:24 +08:00
Songyang Zhang
33ceaa0eb8
[Bug] Fix bug in turbomind ( #1377 )
2024-07-30 09:37:50 +08:00
Songyang Zhang
eee5a5be23
[Fix] Update get_data_path for LCBench and HumanEval ( #1375 )
2024-07-29 19:28:09 +08:00
QXY
fea11b1d20
[Feature] add support for hf_pulse_7b ( #1255 )
...
* add support for hf_pulse_7b
* Update hf_pulse_7b.py
2024-07-29 19:01:52 +08:00
Songyang Zhang
704853e5e7
[Feature] Update pip install ( #1324 )
...
* [Feature] Update pip install
* Update Configuration
* Update
* Update
* Update
* Update Internal Config
* Update collect env
2024-07-29 18:32:50 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets ( #1289 )
...
* add ceval, gsm8k modelscope support
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* update dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update dataset for modelscope support
* update readme
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* update readme
* remove tydiqa japanese subset
* add ceval, gsm8k modelscope support
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* update dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update readme
* update dataset for modelscope support
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* remove tydiqa japanese subset
* update util
* remove .DS_Store
* fix md format
* move util into package
* update docs/get_started.md
* restore eval_api_zhipu_v2.py, add environment setting
* Update dataset
* Update
* Update
* Update
* Update
---------
Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
jxd
12b84aeb3b
[Feature] Update CHARM Memorization ( #1230 )
...
* update gemini api and add gemini models
* add openai models
* update CHARM evaluation
* add CHARM memorization tasks
* add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis)
* update CHARM readme
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
2024-07-26 18:42:30 +08:00
bittersweet1999
d3782c1d47
Revert "Calm dataset ( #1287 )" ( #1366 )
...
This reverts commit edd0ffdf70.
2024-07-26 18:27:29 +08:00
Xu Song
9b9855a008
Add en and zh groups to longbench summarizer; Fix longbench overall score ( #1216 )
...
* Add longbench groups
* update
* update
2024-07-26 11:50:41 +08:00
Peng Bo
edd0ffdf70
Calm dataset ( #1287 )
...
* add calm dataset
* modify config max_out_len
* update README
* Modify README
* update README
* update README
* update README
* update README
* update README
* add summarizer and modify readme
* delete summarizer config comment
* update summarizer
* modify same response to all questions
* update README
2024-07-26 11:48:16 +08:00
mqy004
a08931f214
[Fix] origin_prompt should be None in llm-compression task ( #1225 )
...
Co-authored-by: Qinyang Mou <qinyang_mou@intsig.net>
2024-07-26 11:46:02 +08:00
LeavittLang
8ee7fecb68
Adding support for Doubao API ( #1218 )
...
* Adding support for Doubao API
* Update doubao_api.py
Fixed a bug where the connection was retried even when it was working normally.
* Update doubao_api.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:44:51 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench ( #1335 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
baymax591
51a94aee01
[Bug] fix bug: delete & ( #1365 )
...
Co-authored-by: 白超 <baichao19@huawei.com>
2024-07-26 11:03:55 +08:00
Mo Li
69aa2f2d57
[Feature] Make NeedleBench available on HF ( #1364 )
...
* update_lint
* update_huggingface format
* fix bug
* update docs
2024-07-25 19:01:56 +08:00
Fengzhe Zhou
c3c02c2960
update docs ( #1318 )
...
* update docs
* Efficient Evaluation -> Data Sharding
* update
* update
* Update faq.md
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-25 18:44:25 +08:00
heya5
73aa55af6d
[Fix] Support HF models deployed with an OpenAI-compatible API. ( #1352 )
...
* Support HF models deployed with an OpenAI-compatible API.
* resolve lint issue
* add extra_body arguments
There are many other arguments available when using an OpenAI-compatible API, such as these: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-chat-api
* fix linting issue
* fix yapf linting issue
2024-07-25 18:38:23 +08:00
WANG WENJIN
0aad8199c7
Fix the summary error in subjective.py ( #1363 )
2024-07-25 18:36:13 +08:00
bittersweet1999
8fe75e9937
[Update] update Subeval demo config ( #1358 )
...
* fix pip version
* fix pip version
* update demo config
2024-07-24 15:48:28 +08:00
bittersweet1999
86b6d18731
[Update] Update model support list ( #1353 )
...
* fix pip version
* fix pip version
* update model support
2024-07-23 13:35:58 +08:00
liushz
cf3e942f73
[Fix] Fix MathBench ( #1351 )
...
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-07-23 13:35:38 +08:00
Linchen Xiao
8127fc3518
CompassBench subjective summarizer added ( #1349 )
...
* subjective summarizer added
* fix lint
2024-07-23 12:29:57 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets ( #1315 )
...
* commit inference ppl datasets
* revised format
* revise
* revise
* revise
* revise
* revise
* revise
2024-07-22 17:59:30 +08:00
Xu Song
e9384823f2
Upgrade default math pred_postprocessor ( #1340 )
...
* Change default math postprocessor
* Update math_gen_265cce.py
2024-07-22 14:00:49 +08:00
Songyang Zhang
96f644de69
[Fix] Update path and folder ( #1344 )
...
* Update path and folder
* Update path
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-21 08:18:14 +08:00
Linchen Xiao
a56678190b
[Feature] CompassBench v1_3 subjective evaluation ( #1341 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* remove unneeded content
* fix lint
* update docs
* Update lint
* Update
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 23:12:23 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part ( #1342 )
...
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update accelerator
* Update MathBench
* Update accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench_aug_gen_068af0.py
* Update compassbench_aug_gen_068af0.py
* Update
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation ( #1339 )
...
* fix pip version
* fix pip version
* support checklist eval
* init
* add lan
* fix typo
2024-07-19 16:40:44 +08:00
Mo Li
f40add2596
[Fix] Fix lint ( #1334 )
...
* update needlebench docs
* update model_name_mapping dict
* update README
* fix_lint
2024-07-18 17:15:06 +08:00
Xu Song
1bfb4217ff
Fix typing and typo ( #1331 )
2024-07-18 13:41:24 +08:00
Mo Li
104bddf647
[Doc] Update NeedleBench Docs ( #1330 )
...
* update needlebench docs
* update model_name_mapping dict
* update README
* Update README_zh-CN.md
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-07-18 13:16:19 +08:00
Xu Song
0a1c89e618
[Fix] Fix rouge evaluator of rolebench_zh ( #1322 )
2024-07-16 16:18:13 +08:00
bittersweet1999
3aeabbc427
[Fix] update Faq ( #1313 )
...
* fix pip version
* fix pip version
* update faq
* update faq
* update faq
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-12 11:29:26 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer ( #1306 )
...
* fix pip version
* fix pip version
* fix alignbench
* fix import error
2024-07-12 11:06:20 +08:00
Fengzhe Zhou
62f55987f1
force register ( #1311 )
2024-07-11 19:59:35 +08:00
bittersweet1999
889e7e1140
[Fix] Change abbr for arenahard dataset ( #1302 )
...
* fix pip version
* fix pip version
* change abbr for arenahard
2024-07-11 12:42:03 +08:00
Fengzhe Zhou
a62c613d3e
[Sync] bump version 0.2.6+local ( #1294 )
2024-07-06 00:44:06 +08:00
Fengzhe Zhou
1d3a26c732
[Doc] quick start swap tabs ( #1263 )
...
* [doc] quick start swap tabs
* update docs
* update
* update
* update
* update
* update
* update
* update
2024-07-05 23:51:42 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval ( #1284 )
...
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
2024-07-05 22:11:37 +08:00