Songyang Zhang
aa2b89b6f8
[Update] Add CascadeEvaluator with Data Replica ( #2022 )
* Update CascadeEvaluator
* Update CascadeEvaluator
* Update CascadeEvaluator
* Update Config
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
2025-05-20 16:46:55 +08:00
Linchen Xiao
bb58cfc85d
[Feature] Add CascadeEvaluator ( #1992 )
* [Feature] Add CascadeEvaluator
* update
* update
2025-04-08 11:58:14 +08:00
Myhs_phz
f71eb78c72
[Doc] Add TBD Token in Datasets Statistics ( #1986 )
* feat
* doc
* doc
* doc
* doc
2025-03-31 19:08:55 +08:00
Myhs_phz
6118596362
[Feature] Add recommendation configs for datasets ( #1937 )
* feat datasetrefine drop
* fix datasets in fullbench_int3
* fix
* fix
* back
* fix
* fix and doc
* feat
* fix hook
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* doc
* fix
* fix
* Update dataset-index.yml
2025-03-25 14:54:13 +08:00
Kangreen
59e49aedf1
[Feature] Support SuperGPQA ( #1924 )
* support supergpqa
* remove unnecessary code
* remove unnecessary code
* Add Readme
* Add Readme
* fix lint
* fix lint
* update
* update
---------
Co-authored-by: mkj3085003 <mkj3085003@gmail.com>
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-03-11 19:32:08 +08:00
Songyang Zhang
c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard ( #1899 )
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify
* Update
* Update
* Update DeepSeek-R1 example
* Update DeepSeek-R1 example
* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
Linchen Xiao
bdb2d46f59
[Feature] Add general math, llm judge evaluator ( #1892 )
* update_doc
* update llm_judge
* update README
* update md file name
2025-02-26 15:08:50 +08:00
Myhs_phz
68a9838907
[Feature] Add list of supported datasets at html page ( #1850 )
* feat dataset-index.yml and stat.py
* fix
* fix
* fix
* feat url of paper and config file
* doc all supported dataset list
* docs zh and en
* docs README zh and en
* docs new_dataset
* docs new_dataset
2025-02-14 16:17:30 +08:00
Pablo Hinojosa
9c2e6a192c
[Fix] Update broken links in README.md ( #1852 )
2025-02-07 15:41:08 +08:00
Linchen Xiao
a6193b4c02
[Refactor] Code refactorization ( #1831 )
* Update
* fix lint
* update
* fix lint
2025-01-20 19:17:38 +08:00
Linchen Xiao
531643e771
[Feature] Add support for InternLM3 ( #1829 )
* update
* update
* update
* update
2025-01-16 14:28:27 +08:00
Linchen Xiao
ebefffed61
[Update] Update OC academic 202412 ( #1771 )
* [Update] Update academic settings
* Update
* update
2024-12-19 18:07:34 +08:00
Linchen Xiao
d593bfeac8
[Bump] Bump version to 0.3.8 ( #1765 )
* [Bump] Bump version to 0.3.8
* Update README.md
2024-12-17 19:17:18 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Dataset Evaluation ( #1689 )
* MuSR Dataset Evaluation
* MuSR Dataset Evaluation
Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
d415439f9b
[Fix] Fix bug for first_option_postprocess ( #1688 )
2024-11-14 16:45:59 +08:00
Songyang Zhang
c789ce5698
[Fix] Fix automatic download for several datasets ( #1652 )
* [Fix] Fix automatic download for several datasets
* Update
* Update
* Update CI
2024-11-01 15:57:18 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark ( #1565 )
* rm folder
* modify format according to reviewer
* modify format according to reviewer
* modify format according to reviewer
* add some files requirement
* fix some bug
* fix bug
* change load type
* Update MMMLU Dataset
* Update MMMLU Dataset
* Add MMMLU-Lite Dataset
* update MMMLU dataset
* update MMMLU dataset
* update MMMLU dataset
---------
Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
Chuanyang Jin
17eefc0e1e
[Fix] Correct typos ( #1561 )
2024-09-25 11:27:17 +08:00
Songyang Zhang
5a27c2bd6f
[Model] Support Qwen2.5 Instruct ( #1543 )
2024-09-19 16:16:07 +08:00
Songyang Zhang
be460fbb21
[Feature] Support OpenAI O1 models ( #1539 )
* [Feature] Support OpenAI O1 models
* Update README.md
---------
Co-authored-by: liushz <qq1791167085@163.com>
2024-09-18 22:41:17 +08:00
Songyang Zhang
cfbd308edf
[Doc] Update README ( #1528 )
* '
* Update
2024-09-14 16:02:17 +08:00
liushz
00fc8da5be
[Feature] Add model postprocess function ( #1484 )
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Linchen Xiao
2295a33a18
[Doc] Update readme ( #1453 )
2024-08-23 14:11:01 +08:00
Linchen Xiao
0fe9756c5d
[Doc] Update Readme ( #1439 )
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2024-08-22 14:48:45 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode ( #1417 )
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets ( #1310 )
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Songyang Zhang
88eb91219b
[Doc] Update README ( #1404 )
* [Doc] Update README
* Update
2024-08-08 16:18:33 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion ( #1389 )
* [Feature] Support import configs/models/summarizers from whl
* Update
* Update openai sdk
* Update
* Update gemma
2024-08-01 19:10:13 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets ( #1289 )
* add ceval, gsm8k modelscope support
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* update dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update dataset for modelscope support
* update readme
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* update readme
* remove tydiqa japanese subset
* add ceval, gsm8k modelscope support
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* update dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update readme
* update dataset for modelscope support
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* remove tydiqa japanese subset
* update util
* remove .DS_Store
* fix md format
* move util into package
* update docs/get_started.md
* restore eval_api_zhipu_v2.py, add environment setting
* Update dataset
* Update
* Update
* Update
* Update
---------
Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
bittersweet1999
86b6d18731
[Update] Update model support list ( #1353 )
* fix pip version
* fix pip version
* update model support
2024-07-23 13:35:58 +08:00
Linchen Xiao
a56678190b
[Feature] CompassBench v1_3 subjective evaluation ( #1341 )
* stash files
* compassbench subjective evaluation added
* evaluation update
* remove unneeded content
* fix lint
* update docs
* Update lint
* Update
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 23:12:23 +08:00
Mo Li
104bddf647
[Doc] Update NeedleBench Docs ( #1330 )
* update needlebench docs
* update model_name_mapping dict
* update README
* Update README_zh-CN.md
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-07-18 13:16:19 +08:00
Songyang Zhang
409a042d93
[Feature] Add InternLM2.5 ( #1286 )
* [Feature] Add InternLM2.5
* Update
* update readme
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-04 20:10:31 +08:00
liushz
e5ee1647fb
Add doc for accelerator function ( #1252 )
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update accelerator
* Update MathBench
* Update accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-06-24 14:53:51 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template ( #1098 )
* add TheoremQA with 5-shot
* add huggingface_above_v4_33 classes
* use num_worker partitioner in cli
* update theoremqa
* update TheoremQA
* add TheoremQA
* rename theoremqa -> TheoremQA
* update TheoremQA output path
* rewrite many model configs
* update huggingface
* further update
* refine configs
* update configs
* update configs
* add configs/eval_llama3_instruct.py
* add summarizer multi faceted
* update bbh datasets
* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py
* rename class
* update readme
* update hf above v4.33
2024-05-14 14:50:16 +08:00
Alexander Lam
a71122ee18
[Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs ( #1123 )
* added qwen moe and mixtral 8x22 model configs
* updated README files news section
2024-05-09 11:04:26 +08:00
Ikko Eltociear Ashimine
9c79224b39
[Docs] Update README.md ( #1110 )
requiresments -> requirements
2024-04-30 00:45:33 +08:00
Songyang Zhang
063f5f5f49
[Update] Update performance of common benchmarks ( #1109 )
* [Update] Update performance of common benchmarks
* [Update] Update performance of common benchmarks
* [Update] Update performance of common benchmarks
2024-04-30 00:09:08 +08:00
Haodong Duan
3a232db471
[Deprecate] Remove multi-modal related stuff ( #1072 )
* Remove MultiModal
* update index.rst
* update README
* remove mmbench codes
* update news
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:20:14 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation ( #1096 )
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
Fengzhe Zhou
a256753221
[Feature] Add LLaMA-3 Series Configs ( #1065 )
* add LLaMA-3 Series configs
* update readme
2024-04-22 14:39:31 +08:00
Songyang Zhang
629836146a
[Doc] Update README ( #1053 )
* [Update] Update readme
* [Update] Update readme
* [Update] Update readme
2024-04-16 19:54:12 +08:00
Songyang Zhang
47cb75a3f7
[Docs] Update README ( #956 )
* [Docs] Update README
* Update README.md
* [Docs] Update README
2024-03-12 11:40:34 +08:00
fanqiNO1
caf1cf8a17
[Docs] Update rank link ( #911 )
2024-03-05 20:33:44 +08:00
Fengzhe Zhou
ba7cd58da3
[Update] Rename dataset pack ( #922 )
2024-02-28 10:54:04 +08:00
Fengzhe Zhou
9e5746d3d8
[Doc] Update News ( #810 )
2024-01-17 18:22:12 +08:00
Songyang Zhang
0c75f0f95a
[Update] Update introduction of CompassBench-2024-Q1 ( #769 )
* [Doc] Update Example of CompassBench
* [Doc] Update Example of CompassBench
* [Doc] Update Example of CompassBench
* update
* Update docs/zh_cn/advanced_guides/compassbench_intro.md
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
---------
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-01-05 20:39:36 +08:00
Chris Liu
3eb225a5e6
[Feature] Support LLaMA2-Accessory ( #732 )
* Support LLaMA2-Accessory
* remove strip
* clear imports
* reformat
* fix lint
* fix lint
* update readme
* update readme
* update readme
* update readme
2024-01-02 20:48:51 +08:00
loveSnowBest
4a2d1926a2
[News] add news for T-Eval ( #727 )
* add news for teval
* update
* update doc for cz&en
2023-12-22 19:58:24 +08:00
Haodong Duan
6a928b996a
[Doc] Update README ( #682 )
2023-12-10 21:27:46 +08:00