97 Commits

Author SHA1 Message Date
Myhs_phz
6118596362
[Feature] Add recommendation configs for datasets (#1937)
* feat datasetrefine drop

* fix datasets in fullbench_int3

* fix

* fix

* back

* fix

* fix and doc

* feat

* fix hook

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* doc

* fix

* fix

* Update dataset-index.yml
2025-03-25 14:54:13 +08:00
Kangreen
59e49aedf1
[Feature] Support SuperGPQA (#1924)
* support supergpqa

* remove unnecessary code

* remove unnecessary code

* Add Readme

* Add Readme

* fix lint

* fix lint

* update

* update

---------

Co-authored-by: mkj3085003 <mkj3085003@gmail.com>
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-03-11 19:32:08 +08:00
Songyang Zhang
c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899)
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify

* Update

* Update

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
Linchen Xiao
bdb2d46f59
[Feature] Add general math, llm judge evaluator (#1892)
* update_doc

* update llm_judge

* update README

* update md file name
2025-02-26 15:08:50 +08:00
Myhs_phz
68a9838907
[Feature] Add list of supported datasets at html page (#1850)
* feat dataset-index.yml and stat.py

* fix

* fix

* fix

* feat url of paper and config file

* doc all supported dataset list

* docs zh and en

* docs README zh and en

* docs new_dataset

* docs new_dataset
2025-02-14 16:17:30 +08:00
Pablo Hinojosa
9c2e6a192c
[Fix] Update broken links in README.md (#1852)
2025-02-07 15:41:08 +08:00
Linchen Xiao
a6193b4c02
[Refactor] Code refactoarization (#1831)
* Update

* fix lint

* update

* fix lint
2025-01-20 19:17:38 +08:00
Linchen Xiao
531643e771
[Feature] Add support for InternLM3 (#1829)
* update

* update

* update

* update
2025-01-16 14:28:27 +08:00
Linchen Xiao
ebefffed61
[Update] Update OC academic 202412 (#1771)
* [Update] Update academic settings

* Update

* update
2024-12-19 18:07:34 +08:00
Linchen Xiao
d593bfeac8
[Bump] Bump version to 0.3.8 (#1765)
* [Bump] Bump version to 0.3.8

* Update README.md
2024-12-17 19:17:18 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Datset Evaluation (#1689)
* MuSR Datset Evaluation

* MuSR Datset Evaluation

Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
d415439f9b
[Fix] Fix bug for first_option_postprocess (#1688)
2024-11-14 16:45:59 +08:00
Songyang Zhang
c789ce5698
[Fix] the automatically download for several datasets (#1652)
* [Fix] the automatically download for several datasets

* Update

* Update

* Update CI
2024-11-01 15:57:18 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565)
* rm folder

* modify format according to reviewer

* modify format according to reviewer

* modify format according to reviewer

* add some files requirement

* fix some bug

* fix bug

* change load type

* Update MMMLU Dataset

* Update MMMLU Dataset

* Add MMMLU-Lite Dataset

* update MMMMLU datast

* update MMMMLU datast

* update MMMMLU datast

---------

Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
Chuanyang Jin
17eefc0e1e
[Fix] Correct typos (#1561)
2024-09-25 11:27:17 +08:00
Songyang Zhang
5a27c2bd6f
[Model] Support Qwen2.5 Instruct (#1543)
2024-09-19 16:16:07 +08:00
Songyang Zhang
be460fbb21
[Feature] Support OpenAI O1 models (#1539)
* [Feature] Support OpenAI O1 models

* Update README.md

---------

Co-authored-by: liushz <qq1791167085@163.com>
2024-09-18 22:41:17 +08:00
Songyang Zhang
cfbd308edf
[Doc] Update README (#1528)
* '

* Update
2024-09-14 16:02:17 +08:00
liushz
00fc8da5be
[Feature] Add model postprocess function (#1484)
* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Linchen Xiao
2295a33a18
[Doc] Update readme (#1453)
2024-08-23 14:11:01 +08:00
Linchen Xiao
0fe9756c5d
[Doc] Update Readme (#1439)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2024-08-22 14:48:45 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode (#1417)
* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode w/ bg

* add scicode

* Update README.md

* Update README.md

* Delete configs/eval_SciCode.py

* rename

* 1

* rename

* Update README.md

* Update scicode.py

* Update scicode.py

* fix some bugs

* Update

* Update

---------

Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets (#1310)
* [Feature] Add Ruler datasets

* pre-commit fixed

* Add model specific tokenizer to dataset

* pre-commit modified

* remove unused import

* fix linting

* add trust_remote to tokenizer load

* lint fix

* comments resolved

* fix lint

* Add readme

* Fix lint

* ruler refactorize

* fix lint

* lint fix

* updated

* lint fix

* fix wonderwords import issue

* prompt modified

* update

* readme updated

* update

* ruler dataset added

* Update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Songyang Zhang
88eb91219b
[Doc] Update README (#1404)
* [Doc] Update README

* Update
2024-08-08 16:18:33 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion (#1389)
* [Feature] Support import configs/models/summarizers from whl

* Update

* Update openai sdk

* Update

* Update gemma
2024-08-01 19:10:13 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets (#1289)
* add ceval, gsm8k modelscope surpport

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* udpate dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* udpate dataset for modelscope support

* update readme

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* update readme

* remove tydiqa japanese subset

* add ceval, gsm8k modelscope surpport

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* udpate dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update readme

* udpate dataset for modelscope support

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* remove tydiqa japanese subset

* update util

* remove .DS_Store

* fix md format

* move util into package

* update docs/get_started.md

* restore eval_api_zhipu_v2.py, add environment setting

* Update dataset

* Update

* Update

* Update

* Update

---------

Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
bittersweet1999
86b6d18731
[Update] Update model support list (#1353)
* fix pip version

* fix pip version

* update model support
2024-07-23 13:35:58 +08:00
Linchen Xiao
a56678190b
[Feature] CompassBench v1_3 subjective evaluation (#1341)
* stash files

* compassbench subjective evaluation added

* evaluation update

* remove unneeded content

* fix lint

* update docs

* Update lint

* Update

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 23:12:23 +08:00
Mo Li
104bddf647
[Doc] Update NeedleBench Docs (#1330)
* update needlebench docs

* update model_name_mapping dict

* update README

* Update README_zh-CN.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-07-18 13:16:19 +08:00
Songyang Zhang
409a042d93
[Feature] Add InternLM2.5 (#1286)
* [Feature] Add InternLM2.5

* Update

* update readme

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-04 20:10:31 +08:00
liushz
e5ee1647fb
Add doc for accelerator function (#1252)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update acclerator

* Update MathBench

* Update accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-06-24 14:53:51 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33
2024-05-14 14:50:16 +08:00
Alexander Lam
a71122ee18
[Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs (#1123)
* added qwen moe and mixtral 8x22 model configs

* updated README files news section
2024-05-09 11:04:26 +08:00
Ikko Eltociear Ashimine
9c79224b39
[Docs] Update README.md (#1110)
requiresments -> requirements
2024-04-30 00:45:33 +08:00
Songyang Zhang
063f5f5f49
[Update] Update performance of common benchmarks (#1109)
* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks
2024-04-30 00:09:08 +08:00
Haodong Duan
3a232db471
[Deperecate] Remove multi-modal related stuff (#1072)
* Remove MultiModal

* update index.rst

* update README

* remove mmbench codes

* update news

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:20:14 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation (#1096)
* support arenahard

* support arenahard

* support arenahard
2024-04-26 15:42:00 +08:00
Fengzhe Zhou
a256753221
[Feature] Add LLaMA-3 Series Configs (#1065)
* add LLaMA-3 Series configs

* update readme
2024-04-22 14:39:31 +08:00
Songyang Zhang
629836146a
[Doc] Update README (#1053)
* [Update] Update readme

* [Update] Update readme

* [Update] Update readme
2024-04-16 19:54:12 +08:00
Songyang Zhang
47cb75a3f7
[Docs] Update README (#956)
* [Docs] Update README

* Update README.md

* [Docs] Update README
2024-03-12 11:40:34 +08:00
fanqiNO1
caf1cf8a17
[Docs] Update rank link (#911)
2024-03-05 20:33:44 +08:00
Fengzhe Zhou
ba7cd58da3
[Update] Rename dataset pack (#922)
2024-02-28 10:54:04 +08:00
Fengzhe Zhou
9e5746d3d8
[Doc] Update News (#810)
2024-01-17 18:22:12 +08:00
Songyang Zhang
0c75f0f95a
[Update] Update introduction of CompassBench-2024-Q1 (#769)
* [Doc] Update Example of CompassBench

* [Doc] Update Example of CompassBench

* [Doc] Update Example of CompassBench

* update

* Update docs/zh_cn/advanced_guides/compassbench_intro.md

Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>

---------

Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-01-05 20:39:36 +08:00
Chris Liu
3eb225a5e6
[Feature] Support LLaMA2-Accessory (#732)
* Support LLaMA2-Accessory

* remove strip

* clear imports

* reformat

* fix lint

* fix lint

* update readme

* update readme

* update readme

* update readme
2024-01-02 20:48:51 +08:00
loveSnowBest
4a2d1926a2
[News] add news for T-Eval (#727)
* add news for teval

* update

* update doc for cz&en
2023-12-22 19:58:24 +08:00
Haodong Duan
6a928b996a
[Doc] Update README (#682)
2023-12-10 21:27:46 +08:00
Songyang Zhang
e25c5f9525
[Enhancement] Update API Interface and Mixtral (#681)
* [Enhancement] Update API interface

* [Enhancement] Update API interface

* Update mixtral

* Update readme
2023-12-10 13:29:26 +08:00
Yggdrasill7D6
68c4c1ef86
[Fix] fix typo in README (#637)
2023-11-24 17:49:04 +08:00
Songyang Zhang
81b67e8d9e
[Doc] Update README (#629)
* update readme

* fix typo

* Update README.md

---------

Co-authored-by: liushz <qq1791167085@163.com>
2023-11-24 11:24:00 +08:00