Fengzhe Zhou
80f831b425
[Fix] use ProcessPoolExecutor during mbpp eval ( #1159 )
2024-05-15 13:48:29 +08:00
bittersweet1999
8a8987be0b
fix arenahard summarizer ( #1154 )
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-15 13:31:29 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow ( #1156 )
2024-05-14 22:42:23 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template ( #1098 )
* add TheoremQA with 5-shot
* add huggingface_above_v4_33 classes
* use num_worker partitioner in cli
* update theoremqa
* update TheoremQA
* add TheoremQA
* rename theoremqa -> TheoremQA
* update TheoremQA output path
* rewrite many model configs
* update huggingface
* further update
* refine configs
* update configs
* update configs
* add configs/eval_llama3_instruct.py
* add summarizer multi faceted
* update bbh datasets
* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py
* rename class
* update readme
* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
6c711cb262
[Fix] Fix Needlebench Summarizer ( #1143 )
* update few-shot example
* add 128k
2024-05-13 15:59:34 +08:00
bittersweet1999
833a35140b
[Fix] fix alpacaeval while adding caching path ( #1139 )
* fix alpacaeval
* fix alpacaeval
2024-05-11 14:02:26 +08:00
Fengzhe Zhou
19d7e630d6
[Sync] Update accelerator ( #1122 )
(cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-09 14:32:31 +08:00
bittersweet1999
826d8307ac
fix links ( #1120 )
2024-05-08 15:13:18 +08:00
JuhaoLiang
d2c40e5648
[Feature] Add AceGPT-MMLUArabic benchmark ( #1099 )
* add AceGPT-MMLUArabic benchmark
* update readme and fix lint issue
* remove unused package
* add MMLUArabic zero-shot settings
* rename filename and update readme
2024-05-08 15:00:26 +08:00
Fangyu Lei
862044fb7d
[Feature] Add S3Eval Dataset ( #916 )
* s3eval_branch
* update s3eval
2024-05-06 19:41:52 +08:00
Yggdrasill7D6
af10ecc272
add mgsm datasets ( #1081 )
* add mgsm datasets
* fix lint
* fix lint
* update mgsm
* update mgsm
* ease code spell
* update
* update
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 15:29:34 +08:00
klein
153c4fc988
[Feature] update drop dataset from openai simple eval ( #1092 )
* [Feature] update drop dataset from openai simple eval
* update drop template presentation
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 13:37:08 +08:00
Fengzhe Zhou
d43392a3bb
[Feature] Add mmlu prompt from simple_evals, openai ( #1074 )
* add mmlu prompt from simple_evals, openai
* return empty str on failure
2024-05-06 13:26:26 +08:00
Yang Yong
53fe390454
fix LightllmApi workers bug ( #1113 )
2024-04-30 22:09:22 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation ( #1108 )
* fixed formatting based on pre-commit tests
* fixed typo in comments; reduced the number of models in the eval config
* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset
* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
bittersweet1999
3de48e9b35
[Bug] Fix CMB dataset ( #1106 )
2024-04-30 00:33:43 +08:00
liushz
a6f67e1a65
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README ( #1103 )
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-28 21:58:58 +08:00
Lyu Han
1013dce60c
adapt to lmdeploy v0.4.0 ( #1073 )
* adapt to lmdeploy v0.4.0
* compatible
2024-04-28 19:57:40 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets ( #1093 )
* add flames datasets
* fix lint
* rm quota
* add judgemodel info and fix os path
* support flames dataset
* support flames dataset
---------
Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
dmitrysarov
cce5b6fbb6
fix output typing, change mutable list to immutable tuple ( #989 )
* fix output typing, change mutable list to immutable tuple
* import missing type
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 23:07:34 +08:00
binary-husky
701ecbb292
[Fix] python path bug ( #1063 )
* fix relative path bug
* format
---------
Co-authored-by: hmp <505030475@qq.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:58:45 +08:00
Wang Xingjin
048d41a1c4
add vllm get_ppl ( #1003 )
* add vllm get_ppl
* add vllm get_ppl
* format
---------
Co-authored-by: xingjin.wang <xingjin.wang@mihoyo.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:31:56 +08:00
Haodong Duan
3a232db471
[Deprecate] Remove multi-modal related stuff ( #1072 )
* Remove MultiModal
* update index.rst
* update README
* remove mmbench codes
* update news
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:20:14 +08:00
Francis-llgg
f1ee11de14
[Feature] Add gpqa prompt from simple_evals, openai ( #1080 )
* add gpqa_openai_simple_eval
* trigger CI build
* reorg
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 20:13:00 +08:00
klein
e4830a6926
Update CIBench ( #1089 )
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4
* update cibench: dataset and evaluation
* cibench summarizer bug
* update cibench
* move extract_code import
---------
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation ( #1096 )
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel ( #1094 )
* support openai math evaluation
* support openai math evaluation
* support openai math evaluation
* support math llm judge
* support math llm judge
2024-04-26 14:56:23 +08:00
Ke Bao
81d0e4d793
[Feature] Add lmdeploy tis python backend model ( #1014 )
* add lmdeploy tis python backend model
* fix pr check
* update
2024-04-23 14:27:11 +08:00
Fengzhe Zhou
8fe7b271cc
[Fix] Fix sequential runner ( #1070 )
2024-04-23 11:31:10 +08:00
Fengzhe Zhou
004ed79593
[Feature] Add TheoremQA with 5-shot ( #1048 )
* add TheoremQA with 5-shot
* cherry pick from add-huggingface-above-v4.33, good TheoremQA results
2024-04-22 15:22:04 +08:00
bittersweet1999
6f98c8d9ab
[Fix] Fix MultiRound Subjective Evaluation ( #1043 )
* fix multiround
* fix
2024-04-22 12:06:03 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
Robin Chen
c172401323
[Fix] Fixed repeated loading of VLLM ( #1051 )
* [fix] Fixed the issue caused by the repeated loading of the VLLM model during task segmentation.
* [fix] avoid TypeError: VLLM.__init__() got an unexpected keyword argument 'tokenizer_only'
* restore .pre-commit-config.yaml
* restore opencompass/tasks/openicl_infer.py
---------
Co-authored-by: IcyFeather <mengzhuo.happy@gmail.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-17 20:36:08 +08:00
Fengzhe Zhou
881bdbf6bd
[Sync] Bump version to 0.2.4 ( #1052 )
(cherry picked from commit 16ac6306c72fa202173289b55eaefe85e0fcb73c)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-16 18:09:46 +08:00
Fengzhe Zhou
7a41951dda
[Fix] logger.error -> logger.debug in OpenAI wrapper ( #1050 )
* logger.error -> logger.info in OpenAI
* logger.info -> logger.debug in OpenAI
2024-04-15 21:08:13 +08:00
liuwei130
a00e57296f
[Feature] Add ChemBench ( #1032 )
* add ChemBench
* update results
* molbench -> ChemBench
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-12 08:46:26 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
Mo Li
16f29b25f1
[Fix] Simplify needlebench summarizer ( #1024 )
* Conflicts:
configs/summarizers/needlebench.py
* fix lint problems
2024-04-07 17:51:13 +08:00
Mo Li
f2af49337d
[Feature] Add ATC Choice Version ( #1019 )
* Squashed commit of the following:
commit c48ad194c3976dc63d1b60d8c8ab2d5ff9e1cbfe
Author: DseidLi <2568818204@qq.com>
Date: Tue Apr 2 16:57:43 2024 +0800
add atc_choice
commit 3ac6efea29619573e6fac8fa3cce464853dcead0
Merge: 2d4e559 8e3a9c3
Author: DseidLi <2568818204@qq.com>
Date: Tue Apr 2 16:41:38 2024 +0800
Merge branch 'atc_choice' into atc_add_choice
commit 8e3a9c396a3e5546d3faf584183f6fd60b974d5e
Merge: 150a036 0a6a03f
Author: DseidLi <2568818204@qq.com>
Date: Tue Mar 26 04:47:07 2024 +0800
Merge branch 'main' into atc_choice
Conflicts:
configs/summarizers/needlebench.py
opencompass/datasets/needlebench/multi.py
opencompass/datasets/needlebench/origin.py
opencompass/datasets/needlebench/parallel.py
commit 150a036d6d990f26a57c974d1af83d88c31a0f9d
Merge: 8d6ac9a 940dd18
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 20 03:49:08 2024 +0800
Merge branch 'needlebench_fix' into atc_choice
commit 8d6ac9a1a43b1c9d0f0ea27e7d58968a203ea898
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 20 03:41:49 2024 +0800
optimize needlebench code
commit 940dd18a4270f24bc69edd2a780182c68918e1a9
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 20 03:39:46 2024 +0800
fix vllm
commit d8be6877bc41051f3edcc0421c462c834c0f1c9a
Merge: ecad78a 2527fda
Author: DseidLi <2568818204@qq.com>
Date: Tue Mar 19 21:07:08 2024 +0800
Merge remote-tracking branch 'origin/add_1M_dataset' into atc_choice
commit 2527fda8a5
Author: DseidLi <2568818204@qq.com>
Date: Tue Mar 19 16:03:40 2024 +0800
add model configs
commit 75425acdf8
Author: DseidLi <2568818204@qq.com>
Date: Tue Mar 19 16:02:15 2024 +0800
add prompt position args
commit 367ba1ba61
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 21:40:00 2024 +0800
add Needlebench-1000K configs
commit ecad78af14c4bb00fe325779114b384c57ab30bf
Author: DseidLi <2568818204@qq.com>
Date: Thu Mar 14 22:08:32 2024 +0800
fix atc
commit 08772c0787b18872abadc9ffec3223941a5ee0c2
Merge: 9f3f8cf caf1cf8
Author: DseidLi <2568818204@qq.com>
Date: Thu Mar 14 22:07:28 2024 +0800
Merge branch 'main' into atc_choice
Conflicts:
configs/datasets/needlebench/readme.md
configs/datasets/needlebench/readme_zh-CN.md
configs/summarizers/needlebench.py
opencompass/datasets/needlebench/atc.py
opencompass/summarizers/needlebench.py
commit 9f3f8cfb4452722734d334114ac1d14110e57406
Author: DseidLi <2568818204@qq.com>
Date: Thu Mar 14 21:35:53 2024 +0800
add atc-choice test
commit 52be7c1202376b4e09821188b826f1a805328129
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 6 02:54:15 2024 +0800
update needlebench randomseed and add vllm qwen14b
commit fc1effce596ae2e5ece4933e8cd34aef8e64a6f9
Merge: 4e747ed caf1cf8
Author: DseidLi <2568818204@qq.com>
Date: Wed Mar 6 02:51:14 2024 +0800
Merge branch 'main' into add_model_configs
commit 31834f9b23af3354ac3581ec86d693d0f05cdd1c
Merge: 7dabc82 120bf8b
Author: DseidLi <2568818204@qq.com>
Date: Sun Mar 3 23:29:42 2024 +0800
Merge branch 'main' of https://github.com/open-compass/opencompass into atc_choice
commit 4e747ed1988ddbcfcc7fff334601259ade72d363
Author: DseidLi <2568818204@qq.com>
Date: Sun Mar 3 22:15:25 2024 +0800
add internlm2-lmdeploy model and gemma configs
commit 7dabc828123d711c8cf834d6aab4137bb55e85ed
Author: DseidLi <2568818204@qq.com>
Date: Sat Mar 2 17:26:15 2024 +0800
add atc choice version -ZH
commit 996f8ae43d
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:58:56 2024 +0800
update readme for needlebench
commit f7266e873c
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:44:53 2024 +0800
move readme.md
commit 1c7375681d
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:38:31 2024 +0800
fix linting error
commit b6524f3ebf
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:33:51 2024 +0800
lint summarizer
commit c0d1190e39
Author: DseidLi <2568818204@qq.com>
Date: Wed Feb 28 16:29:03 2024 +0800
add needlebench intro, fix summarizer
commit 0965baf785
Author: DseidLi <2568818204@qq.com>
Date: Mon Feb 26 13:31:26 2024 +0800
fix bug in needlebench summarizer
commit 5d32b31eb8
Author: DseidLi <2568818204@qq.com>
Date: Sat Feb 24 03:19:08 2024 +0800
update act prompt
commit af82a7f085
Merge: 32bf9fe 53fe788
Author: DseidLi <2568818204@qq.com>
Date: Fri Feb 23 17:50:32 2024 +0800
Merge remote-tracking branch 'upstream/main' into needlebench
commit 32bf9fe802
Author: DseidLi <2568818204@qq.com>
Date: Fri Feb 23 17:31:32 2024 +0800
simplify needlebench 32k, 128k, 200k for eval
commit a7cb025e05
Author: DseidLi <2568818204@qq.com>
Date: Fri Feb 23 14:48:58 2024 +0800
add needlebench
* fix summarizer
* remove repeated code
* remove chinese comments
2024-04-07 15:46:20 +08:00
Mo Li
b50d163265
[Fix] Refactor Needlebench Configs for CLI Testing Support ( #1020 )
* add needlebench datasets suffix
* fix import
* update run.py args for summarizer key and dataset suffix
* update utils/run.py
2024-04-07 15:12:56 +08:00
bittersweet1999
2d4e559763
[Feature] Add multi-model judge and fix some problems ( #1016 )
* support multi-model judge and moe judge
* test_moe
* test_moe
* test
* add moe judge
* support multi-judge-model
2024-04-02 11:52:06 +08:00
bittersweet1999
02e7eec911
[Feature] Support AlpacaEval_V2 ( #1006 )
* support alpacaeval_v2
* support alpacaeval
* update docs
* update docs
2024-03-28 16:49:04 +08:00
Mo Li
0a6a03fe1a
[Feature] update needlebench and configs ( #986 )
* add Needlebench-1000K configs
* add prompt position args
* add model configs
* Update parallel.py
* fix lint
2024-03-25 18:05:01 +08:00
Chaseldot
1d3198554b
[Fix] base.py change status into list ( #994 )
2024-03-22 17:06:34 +08:00
Ke Bao
e415ddf96a
[Fix] Fix turbomind_tis ( #992 )
2024-03-22 15:50:12 +08:00
Connor-Shen
0221d30877
[Fix] Update APPS/TACO ( #988 )
* [Feature] update apps/taco
* [Feature] update apps/taco
2024-03-19 20:21:39 +08:00
Connor-Shen
8a3c6e51ed
[Feature] Update APPS ( #985 )
* update post process
* update post process
2024-03-19 15:47:05 +08:00
Connor-Shen
d92595b671
[Feat] Support TACO ( #966 )
* [Feat] Support TACO
* update README
* update README
2024-03-19 15:39:16 +08:00
bittersweet1999
c78a4df923
add support for setting prediction path ( #984 )
2024-03-19 14:32:15 +08:00
Jingming
89a8a8917b
[Feature] Add the implementation of QuALITY datasets ( #976 )
#976
2024-03-15 21:22:38 +08:00
Connor-Shen
3098d78845
[Bench] Support APPS ( #963 )
* [Feat] support apps
* [Feat] support apps
* [Feat] support apps
* update README
2024-03-13 16:09:23 +08:00
Fengzhe Zhou
ab6cdb2be8
[Sync] Bump version 0.2.3 ( #957 )
2024-03-12 11:51:56 +08:00
Fengzhe Zhou
64fde73b15
[Fix] Use logger.error on failure ( #960 )
2024-03-12 11:51:39 +08:00
Fengzhe Zhou
bdd85358cc
[Sync] update 20240308 ( #953 )
2024-03-11 22:34:19 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different questions in mtbench ( #954 )
* add temp for mtbench
* add document for mtbench
* add document for mtbench
2024-03-11 17:24:39 +08:00
Yang Yong
3829be87b1
Fix LightllmApi ppl test ( #951 )
2024-03-08 12:04:44 +08:00
Yang Yong
107e022cf4
Support prompt template for LightllmApi. Update LightllmApi token bucket. ( #945 )
2024-03-06 15:33:53 +08:00
RunningLeon
c54a5d3b0f
Support get_ppl for TurbomindModel ( #878 )
* update ppl for turbomindmodel
* update api_server
* rename config and set thread_safe for pytorch engine if possible
2024-03-06 11:44:19 +08:00
Fengzhe Zhou
b03d5dc531
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
yuantao2108
bbec7d8733
[Feature] add lveval benchmark ( #914 )
* add lveval benchmark
* add LVEval readme file
* update LVEval readme file
* Update configs/eval_bluelm_32k_lveval.py
* Update configs/eval_llama2_7b_lveval.py
---------
Co-authored-by: yuantao <yuantao@infini-ai.com>
Co-authored-by: Mo Li <82895469+DseidLi@users.noreply.github.com>
2024-03-04 11:22:03 +08:00
Mo Li
8142f399a8
[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench ( #913 )
* add needlebench
* simplify needlebench 32k, 128k, 200k for eval
* update act prompt
* fix bug in needlebench summarizer
* add needlebench intro, fix summarizer
* lint summarizer
* fix linting error
* move readme.md
* update readme for needlebench
* update docs of needlebench
* simplify needlebench summarizers
2024-03-04 11:10:52 +08:00
Kdump
3e9844ed33
[Fix] Fixed the problem of never entering task.run() mode in local scheduling mode. ( #930 )
* Fixed the problem of never entering task.run() mode in local scheduling mode.
In the get_command_template method, CUDA_VISIBLE_DEVICES= or set CUDA_VISIBLE_DEVICES= is added to the command-line prefix, which means the task.run() branch is never taken.
* [Fix] Fixed the problem of never entering task.run() mode in local scheduling mode.
In the get_command_template method, CUDA_VISIBLE_DEVICES= or set CUDA_VISIBLE_DEVICES= is added to the command-line prefix, which means the task.run() branch is never taken.
* [Fix] Fixed the problem of never entering task.run() mode in local scheduling mode.
In the get_command_template method, CUDA_VISIBLE_DEVICES= or set CUDA_VISIBLE_DEVICES= is added to the command-line prefix, which means the task.run() branch is never taken.
2024-02-29 14:35:45 +08:00
Skyfall-xzz
4c45a71bbc
[Feature] Support OpenFinData ( #896 )
* [Feature] Support OpenFinData
* add README for OpenFinData
* update README
2024-02-29 12:55:07 +08:00
bittersweet1999
001e77fea2
[Feature] add support for gemini ( #931 )
* add gemini
* add gemini
* add gemini
2024-02-28 19:38:34 +08:00
Fengzhe Zhou
9afbfa3639
[Sync] Fix TEvalEvaluator ( #929 )
2024-02-28 16:05:30 +08:00
Fengzhe Zhou
5ce8e0450e
[Fix] Fix type hint in IFEval ( #915 )
2024-02-28 10:53:40 +08:00
Jingming
53fe788d27
[Fix] fix ifeval ( #909 )
2024-02-23 16:52:03 +08:00
bittersweet1999
45c606bcd0
[Fix] Fix IFEval ( #906 )
* fix ifeval
* fix ifeval
* fix ifeval
* fix ifeval
2024-02-22 16:51:34 +08:00
RunningLeon
32ba0b074e
Support lmdeploy pytorch engine ( #875 )
* add lmdeploy pytorch model
* fix
* speed up encoding and decoding
* fix
* change tokenizer
2024-02-22 03:46:07 -03:00
Yang Yong
b6e21ece38
Support LightllmApi input_format ( #888 )
2024-02-19 10:02:59 +08:00
Fengzhe Zhou
08133e060a
[Sync] Bump version to 0.2.2 ( #880 )
2024-02-07 10:45:48 +08:00
hailsham
dd444685bb
fix bug of gsm8k_postprocess ( #863 )
* fix bug of gsm8k_postprocess
* update postprocess
---------
Co-authored-by: Lei Fei <SENSETIME\leifei1@cn3114002087l.domain.sensetime.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-06 23:52:47 +08:00
Connor-Shen
444d8d9507
[feat] support multipl-e ( #846 )
* [feat] support humaneval_multipl-e
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-06 23:30:28 +08:00
Yggdrasill7D6
a6c49f15ce
fix lawbench 2-1 f0.5 score calculation bug ( #795 )
* fix lawbench 2-1 f0.5 score calculation bug
* use path in overall datasets folder
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-06 22:20:11 +08:00
bittersweet1999
1c8e193de8
[Fix] hotfix for mtbench ( #877 )
* hotfix for mtbench
* hotfix
2024-02-06 21:26:47 +08:00
Fengzhe Zhou
d34ba11106
[Sync] Merge branch 'dev' into zfz/update-keyset-demo ( #876 )
2024-02-05 23:29:10 +08:00
Skyfall-xzz
7ad1168062
Support NPHardEval ( #835 )
* support NPHardEval
* add .md file and fix minor bugs
* refactor and minor fix
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-05 15:52:28 +08:00
Yuchen Yan
fed7d800c6
[Fix] Fix error in gsm8k evaluator ( #782 )
Co-authored-by: jiangjin1999 <1261842974@qq.com>
2024-02-04 22:55:11 +08:00
bittersweet1999
7806cd0f64
[Feature] support alpacaeval ( #809 )
* support alpacaeval_v1
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/alpacaeval_v1.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix conflict
* support alpacaeval v2
* support alpacav2
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-02-04 14:18:36 +08:00
RunningLeon
4c87e777d8
[Feature] Add end_str for turbomind ( #859 )
* fix
* update
* fix internlm1
* fix docs
* remove sys
2024-02-01 22:31:14 +08:00
bittersweet1999
5c6dc908cd
fix compass arena ( #854 )
2024-01-30 16:34:38 +08:00
Songyang Zhang
cdca59ff49
[Fix] Update Zhipu API and fix min_out_len issue of API models ( #847 )
* Update zhipu api and fix min_out_len issue of API class
* Update example
* Update example
2024-01-28 14:52:43 +08:00
Jingming
2801883351
[Fix] Fix acc of IFEval ( #849 )
* [Feature] Add IFEval
* [Fix] Changing the Score Rule.
2024-01-27 22:27:07 +08:00
Xiaoming Shi
35aace776a
[Fix] Update MedBench ( #845 )
2024-01-26 17:56:13 +08:00
Songyang Zhang
8ed022b4c4
Update Sensetime API ( #844 )
2024-01-26 16:40:49 +08:00
Hubert
4aa74565e2
[Feat] minor update agent related ( #839 )
* [Feat] update cibench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
2024-01-26 14:15:51 +08:00
Fengzhe Zhou
0991dd33a0
[Sync] Update dataset cfg for internMath ( #837 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-01-24 16:30:32 +08:00
Songyang Zhang
793e32c9cc
[Feature] Update API implementation ( #834 )
2024-01-24 13:35:21 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench ( #829 )
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
Jingming
e059a5c2bf
[Feature] Add IFEval ( #813 )
* [Feature] Add IFEval
* [Doc] add introduction of IFEval
2024-01-23 20:07:49 +08:00
bittersweet1999
3d9bb4aed7
[Fix] fix strings ( #833 )
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
* fix string
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 10:57:26 +00:00
bittersweet1999
2d4da8dd02
[Feature] Add CompassArena ( #828 )
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 15:12:46 +08:00
Guo Qipeng
e975a96fa1
Update cdme config and evaluator ( #812 )
* update cdme config and evaluator
* fix cdme prompt
* move CDME trim post-processor as a separate evaluator
---------
Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>
2024-01-19 11:29:27 +08:00
Yang Yong
f09a2ff418
Add LightllmApi KeyError log & Update doc ( #816 )
* Add LightllmApi KeyError log
* Update LightllmApi doc
2024-01-18 22:23:38 +08:00
RunningLeon
61fe873c89
[Fix] Fix turbomind and update docs ( #808 )
* update
* update docs
* add engine_config and gen_config in eval_config
* update
* fix
* fix
* fix
* fix docstr
* fix url
2024-01-18 14:41:35 +08:00
Fengzhe Zhou
b4afe3e7c1
[Sync] Add InternLM2 Keyset Evaluation Demo ( #807 )
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
2024-01-17 13:48:12 +08:00
Mo Li
acae560911
Added support for multi-needle testing in needle-in-a-haystack test ( #802 )
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
* add English version support
* change NeedleInAHaystackDataset to dynamic loading
* change NeedleInAHaystackDataset to dynamic loading
* fix needleinahaystack test eval bug
* fix needleinahaystack config bug
* Added support for multi-needle testing in needle-in-a-haystack test
* Optimize the code for plotting in the needle-in-a-haystack test.
* Correct the typo in the dataset parameters.
* update needleinahaystack test docs
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-17 13:47:34 +08:00
RunningLeon
0836aec67b
[Feature] Update evaluate turbomind ( #804 )
* update
* fix
* fix
* fix
2024-01-17 11:09:50 +08:00
bittersweet1999
814b3f73bd
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00
bittersweet1999
83d6c48378
[Feature] Add configs for creationbench ( #791 )
* add creationv2_zh
* add creationv2_zh
* add eng config for creationbench
* add eng config for creationbench
* add eng config for creationbench
2024-01-12 14:20:21 +08:00
notoschord
d3a0ddc3ef
[Feature] Add support for Nanbeige API ( #786 )
Co-authored-by: notoschord <wangzekai@kanzhun.com>
2024-01-11 13:54:27 +08:00
bittersweet1999
5679edb490
add temperature in alles ( #787 )
2024-01-11 03:57:24 +00:00
Xiaoming Shi
ad872a5dc2
[Feature] Update MedBench ( #779 )
* update medbench
* medbench update
* format medbench
* format
* Update
* update
* update
* update suffix
---------
Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-01-09 11:42:44 +08:00
Fengzhe Zhou
a74e4c1a8d
[Sync] Bump version to 0.2.1 ( #778 )
2024-01-08 14:56:28 +00:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
jiangjin1999
8194199d79
[Feature] *_batch_generate* function, add the MultiTokenEOSCriteria ( #772 )
* jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference.
* jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference.
---------
Co-authored-by: jiangjin08 <jiangjin08@MBP-2F32S5MD6P-0029.local>
Co-authored-by: jiangjin08 <jiangjin08@a.sh.vip.dianping.com>
2024-01-08 16:40:02 +08:00
liyucheng09
0b2863039e
[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c ( #699 )
* Contamination analysis for ARC_c, mmlu, and Hellaswag
* update `eval_contamination.py`
* update `contamination.py` summarizer
* fix `eval_contamination.py`
* add mmlu groups for contamination analysis
2024-01-08 15:51:48 +08:00
Connor-Shen
30a90d8dd8
Support Mbpp_plus dataset ( #770 )
* support mbpp+
* support mbpp+
* minor fix
* [Feat] minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2024-01-05 22:01:57 +08:00
bittersweet1999
3c606cb712
quick fix for postprocess pred extraction ( #771 )
2024-01-05 21:10:18 +08:00
bittersweet1999
2163f9398f
[Feature] add subject ir dataset ( #755 )
* add subject ir
* Add ir dataset
* Add ir dataset
2024-01-05 12:00:57 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation ( #766 )
* multi_round dataset
* add multi_round evaluation
2024-01-04 10:37:52 +00:00
bittersweet1999
7cd65d49d8
[Fix] Fix small bug in alignbench ( #764 )
* fix small bugs
* fix small bugs
2024-01-03 07:44:53 +00:00
Chris Liu
3eb225a5e6
[Feature] Support LLaMA2-Accessory ( #732 )
* Support LLaMA2-Accessory
* remove strip
* clear imports
* reformat
* fix lint
* fix lint
* update readme
* update readme
* update readme
* update readme
2024-01-02 20:48:51 +08:00
HUANG Fei
ba027eeeac
[Feature] Add support for qwen api ( #735 )
2024-01-02 20:47:12 +08:00
Mo Li
33f8df1ca3
[Update] Change NeedleInAHaystackDataset to dynamic dataset loading ( #754 )
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
* add English version support
* change NeedleInAHaystackDataset to dynamic loading
* change NeedleInAHaystackDataset to dynamic loading
* fix needleinahaystack test eval bug
* fix needleinahaystack config bug
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-02 17:22:56 +08:00
Francis-llgg
b69fe2343b
[Feature] Add GPQA Dataset ( #729 )
* check
* message
* add
* change prompt
* change a parameter name
* modify name of the file
* delete a useless file
2024-01-01 15:54:40 +08:00
Francis-llgg
ef3ae63539
[Feature] Add new dataset mastermath2024v1 ( #744 )
* add new dataset mastermath2024v1
* change it to a Simplified Chinese prompt
* change file name
2024-01-01 15:53:24 +08:00
Mo Li
17b8e929dd
[Feature] Update plot function in tools_needleinahaystack.py ( #747 )
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-29 18:51:09 +08:00
Hubert
327951087f
[Feat] update code config ( #749 )
* [Feat] update code dataset
* [Feat] update code dataset
* [Feat] update code dataset
2023-12-29 18:46:34 +08:00
bittersweet1999
fe0b717033
add creationbench ( #753 )
2023-12-29 10:03:44 +00:00
Connor-Shen
81098722d2
add chinese version of humaneval, mbpp ( #743 )
* add chinese_version of humaneval,mbpp
* add humaneval&mbpp gen.py
* minor fix
* minor add
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-28 14:47:56 +08:00
bittersweet1999
db919f0191
[Fix] SubSizePartition fix ( #746 )
* fix subjective_eval
* subject_eval partition situation fixed
* subject_eval partition situation fixed
2023-12-28 11:46:46 +08:00
Hubert
0a525985e8
[Feature] Support sanitized MBPP dataset ( #745 )
2023-12-27 22:17:23 +08:00
bittersweet1999
dfd9ac0fd9
[Feature] Add other judgelm prompts for Alignbench ( #731 )
* add judgellm prompts
* add judgelm prompts
* update import info
* fix situation that no abbr in config
* fix situation that no abbr in config
* add summarizer for other judgellm
* change config name
* add maxlen
* add maxlen
* dict assert
* dict assert
* fix strings
* fix strings
2023-12-27 17:54:53 +08:00
Yang Yong
54345c56b7
Update LightllmApi and Fix mmlu bug ( #738 )
* Update LightllmApi and Fix mmlu bug
* checkout mmlu_gen_a484b3.py
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-27 13:49:08 +08:00
philipwangOvO
34561ececb
[Feature] Add InfiniteBench ( #739 )
* add InfiniteBench
* add InfiniteBench
---------
Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-12-26 15:36:27 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs ( #734 )
2023-12-25 21:59:16 +08:00
AllentDan
336d8d76ff
add turbomind restful api support ( #693 )
* add turbomind restful api support
* config
* top_p 0.8
* top_k = 1
2023-12-24 01:40:00 +08:00
bittersweet1999
e985100cd1
[Fix] Fix subjective alignbench ( #730 )
2023-12-23 20:06:53 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support ( #714 )
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
RunningLeon
e34c552282
[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend ( #721 )
* add llama2 test
* fix
* test qwen chat-7b
* test w4
* add baichuan2
* update
* update
* update configs and docs
* update
2023-12-21 18:22:17 +08:00
bittersweet1999
fbb912ddf3
[Feature] Add abbr for judgemodel in subjective evaluation ( #724 )
* add_judgemodel_abbr
* add judgemodel abbr
2023-12-21 15:58:20 +08:00
Skyfall-xzz
b35d991786
[Feature] Add ReasonBench(Internal) dataset ( #577 )
* [Feature] Add reasonbench dataset
* add configs for supporting generative inference & merge datasets in the same category
* modify config filename to prompt version
* fix codes to meet pre-commit requirements
* lint the code to meet pre-commit requirements
* Align Load_data Sourcecode Briefly
* fix bugs
* reduce code redundancy
2023-12-20 17:57:42 +08:00
Jingming
76a95e9e81
[Feature] Support the use of humaneval_plus. ( #720 )
* [Feature] Support the use of humaneval_plus.
* [Feature] Add humaneval_plus_gen.py
* minor check
* [Fix] Fix bug
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-20 17:25:17 +08:00
bittersweet1999
97c2068bd9
[Feature] Add JudgeLLMs ( #710 )
* add judgellms
* add judgellms
* add sub_size_partition
* add docs
* add ref
2023-12-19 18:40:25 +08:00
Hubert
eda72e756e
[Fix] minor fix openai ( #711 )
2023-12-18 15:45:31 +08:00
Songyang Zhang
637628a70f
[Doc] Update Doc for Alignbench ( #707 )
* update alignmentbench
* update alignmentbench
* update doc
* update
* update
2023-12-15 15:07:25 +08:00
DseidLi
db2920326a
[Fix] remove redundant code in gsm8k.py ( #700 )
Removed redundant code in GSM8KDataset.load method.
2023-12-14 19:55:58 +08:00
Songyang Zhang
bfe4aa2af5
[Fix] Update alignmentbench ( #704 )
...
* update alignmentbench
* update alignmentbench
* update alignmentbench
2023-12-14 18:24:21 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge ( #697 )
* alignmentbench infer and judge
* alignmentbench
* alignmentbench done
* alignment all done
* alignment all done
2023-12-13 19:59:30 +08:00
Hubert
a94598d921
[Feat] update python action and slurm ( #694 )
2023-12-13 10:41:10 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and remove duplicated responses between two models ( #692 )
* add features
* add doc string
* add doc string
2023-12-12 20:58:17 +08:00
Hubert
4780b39eda
[Sync] format ( #690 )
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-12 14:03:45 +08:00
bittersweet1999
3e77175720
[Fix] Hotfix for Subjective Evaluation ( #686 )
2023-12-12 09:22:08 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation ( #680 )
* new version of subject
* fixed draw
* fixed draw
* fixed draw
* done
* done
* done
* done
* fixed lint
2023-12-11 22:22:11 +08:00
Hubert
4f0b373a0a
[Fix] fix docstring ( #684 )
2023-12-11 19:12:01 +08:00
Hubert
e78857ac36
[Sync] minor test ( #683 )
2023-12-11 17:42:53 +08:00
Jingming
dd4318f6ab
[Feature] enhance the ability of humaneval_postprocess ( #676 )
* [Feature] enhance the ability of humaneval_postprocess
* refactor
* [Feature] Keep the old version of the function and implement the new function in humaneval_postprocess_v2.
* Update opencompass/datasets/humaneval.py
---------
Co-authored-by: Leymore <zfz-960727@163.com>
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-12-11 14:39:56 +08:00
Songyang Zhang
e25c5f9525
[Enhancement] Update API Interface and Mixtral ( #681 )
* [Enhancement] Update API interface
* [Enhancement] Update API interface
* Update mixtral
* Update readme
2023-12-10 13:29:26 +08:00
Xiaoming Shi
1bf85949ef
[Feature] Add medbench ( #678 )
* update medbench
* medbench update
* format medbench
* format
---------
Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-09 16:05:46 +08:00