124 Commits

Author SHA1 Message Date
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 (#1279) 2024-06-28 14:16:34 +08:00
liushz
e5ee1647fb
Add doc for accelerator function (#1252)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-06-24 14:53:51 +08:00
Fengzhe Zhou
d656e818f8
[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205)
* [Docs] Remove --no-batch-padding and Use --hf-num-gpus

* update
2024-05-29 16:30:10 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
cb080fa7de
[Fix] Fix NeedleBench Summarizer Typo (#1125)
* update needleinahaystack eval docs

* update needlebench summarizer

* fix english docs typo
2024-05-08 20:00:15 +08:00
Songyang Zhang
063f5f5f49
[Update] Update performance of common benchmarks (#1109)
* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks
2024-04-30 00:09:08 +08:00
liushz
a6f67e1a65
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-28 21:58:58 +08:00
Mo Li
76dd814c4d
[Doc] Update NeedleInAHaystack Docs (#1102)
* update NeedleInAHaystack Test Docs

* update docs
2024-04-28 18:51:47 +08:00
Haodong Duan
3a232db471
[Deprecate] Remove multi-modal related stuff (#1072)
* Remove MultiModal

* update index.rst

* update README

* remove mmbench codes

* update news

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:20:14 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation (#1096)
* support arenahard

* support arenahard

* support arenahard
2024-04-26 15:42:00 +08:00
Fengzhe Zhou
a256753221
[Feature] Add LLaMA-3 Series Configs (#1065)
* add LLaMA-3 Series configs

* update readme
2024-04-22 14:39:31 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps (#1064) 2024-04-19 20:49:46 +08:00
Y0oMu
c220550fb9
updates docs (#1015)
Co-authored-by: youmuspc <yejiayi2004@outlook.com>
2024-04-02 10:30:04 +08:00
bittersweet1999
02e7eec911
[Feature] Support AlpacaEval_V2 (#1006)
* support alpacaeval_v2

* support alpacaeval

* update docs

* update docs
2024-03-28 16:49:04 +08:00
seanzhang-zhichen
7baa711fc7
[Fix] Fix doc problem (#975)
Co-authored-by: zhangzc <2608882093@qq.com>
2024-03-15 13:44:46 +08:00
Fengzhe Zhou
2a741477fe
update links and checkers (#890) 2024-03-13 11:01:35 +08:00
Songyang Zhang
47cb75a3f7
[Docs] Update README (#956)
* [Docs] Update README

* Update README.md

* [Docs] Update README
2024-03-12 11:40:34 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different question in mtbench (#954)
* add temp for mtbench

* add document for mtbench

* add document for mtbench
2024-03-11 17:24:39 +08:00
Songyang Zhang
7c1a819bb4
[Fix] Chinese version of ReadTheDoc (#947)
* [Fix] Chinese version of ReadTheDoc

* rename

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-03-08 18:10:05 +08:00
Yang Yong
107e022cf4
Support prompt template for LightllmApi. Update LightllmApi token bucket. (#945) 2024-03-06 15:33:53 +08:00
Fengzhe Zhou
ba7cd58da3
[Update] Rename dataset pack (#922) 2024-02-28 10:54:04 +08:00
RunningLeon
4c87e777d8
[Feature] Add end_str for turbomind (#859)
* fix

* update

* fix internlm1

* fix docs

* remove sys
2024-02-01 22:31:14 +08:00
Fengzhe Zhou
f367551668
update doc (#830) 2024-01-24 13:39:28 +08:00
Yang Yong
f09a2ff418
Add LightllmApi KeyError log & Update doc (#816)
* Add LightllmApi KeyError log

* Update LightllmApi doc
2024-01-18 22:23:38 +08:00
RunningLeon
61fe873c89
[Fix] Fix turbomind and update docs (#808)
* update

* update docs

* add engine_config and gen_config in eval_config

* update

* fix

* fix

* fix

* fix docstr

* fix url
2024-01-18 14:41:35 +08:00
Fengzhe Zhou
9e5746d3d8
[Doc] Update News (#810) 2024-01-17 18:22:12 +08:00
Mo Li
acae560911
Added support for multi-needle testing in needle-in-a-haystack test (#802)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

* add English version support

* change NeedleInAHaystackDataset to dynamic loading

* change NeedleInAHaystackDataset to dynamic loading

* fix needleinahaystack test eval bug

* fix needleinahaystack config bug

* Added support for multi-needle testing in needle-in-a-haystack test

* Optimize the code for plotting in the needle-in-a-haystack test.

* Correct the typo in the dataset parameters.

* update needleinahaystack test docs

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-17 13:47:34 +08:00
RunningLeon
0836aec67b
[Feature] Update evaluate turbomind (#804)
* update

* fix

* fix

* fix
2024-01-17 11:09:50 +08:00
Fengzhe Zhou
f78fcf6eeb
[Docs] Update contamination docs (#775) 2024-01-08 16:37:28 +08:00
tpoisonooo
ba1b684fec
typo(installation.md): fix unzip commands (#774)
* Update installation.md

* Update installation.md
2024-01-08 14:23:35 +08:00
Songyang Zhang
0c75f0f95a
[Update] Update introduction of CompassBench-2024-Q1 (#769)
* [Doc] Update Example of CompassBench

* [Doc] Update Example of CompassBench

* [Doc] Update Example of CompassBench

* update

* Update docs/zh_cn/advanced_guides/compassbench_intro.md

Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>

---------

Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-01-05 20:39:36 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs (#734) 2023-12-25 21:59:16 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support (#714)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
RunningLeon
e34c552282
[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721)
* add llama2 test

* fix

* test qwen chat-7b

* test w4

* add baichuan2

* update

* update

* update configs and docs

* update
2023-12-21 18:22:17 +08:00
Hubert
fdf18a3238
[Docs] Update Docker docs (#718)
* [Docs] update docker docs

* [Docs] update docker docs
2023-12-19 23:29:43 +08:00
bittersweet1999
97c2068bd9
[Feature] Add JudgeLLMs (#710)
* add judgellms

* add judgellms

* add sub_size_partition

* add docs

* add ref
2023-12-19 18:40:25 +08:00
Songyang Zhang
637628a70f
[Doc] Update Doc for Alignbench (#707)
* update alignmentbench

* update alignmentbench

* update doc

* update

* update
2023-12-15 15:07:25 +08:00
Fengzhe Zhou
cadab9474f
[Doc] Update contamination docs (#698)
* update contamination docs

* add citation

* Update contamination_eval.md

* Update contamination_eval.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-13 18:03:39 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692)
* add features

* add doc string

* add doc string
2023-12-12 20:58:17 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation (#680)
* new version of subject

* fixed draw

* fixed draw

* fixed draw

* done

* done

* done

* done

* fixed lint
2023-12-11 22:22:11 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis (#639)
* add contamination analysis to ceval

* fix bugs

* add contamination docs

* to pass CI check

* update

---------

Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
Fengzhe Zhou
79f6449d85
[Doc] Update FAQ (#628)
* update faq

* Update docs/zh_cn/get_started/faq.md

* Update docs/en/get_started/faq.md

* Update docs/zh_cn/get_started/faq.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-23 18:19:17 +08:00
Fengzhe Zhou
d949e3c003
[Feature] Add circular eval (#610)
* refactor default, add circular summarizer

* add circular

* update impl

* update doc

* minor update

* no more to be added
2023-11-23 16:45:47 +08:00
Songyang Zhang
5329724b65
[Doc] Update README and requirements. (#622)
* update readme

* update doc
2023-11-22 19:16:54 +08:00
Hubert
8c1483e3ce
[Docs] update ds1000 code eval docs (#618) 2023-11-22 13:37:53 +08:00
Lyu Han
eb56fd6d16
Integrate turbomind python api (#484)
* integrate turbomind python api

* update

* update user guide

* update

* fix according to reviewer's comments

* fix error

* fix linting

* update user guide

* remove debug log

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-21 22:34:46 +08:00
Yang Yong
d3b0d5c4ce
[Feature] Support Lightllm API (#613)
* [Feature] Support Lightllm api

* formatting & renaming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-21 19:18:40 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k (#598)
* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k docs

* update naming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Wei Jueqi
14e6fe6f13
Fix bugs in subjective evaluation (#589)
* rename

* fix sub bugs and update docs

* update

* update
2023-11-14 16:11:55 +08:00
Songyang Zhang
01a0f2f3c7
[Doc] Update README (#582) 2023-11-13 20:39:43 +08:00