Commit Graph

264 Commits

Author SHA1 Message Date
Connor-Shen
30a90d8dd8
Support Mbpp_plus dataset (#770)
* support mbpp+

* support mbpp+

* minor fix

* [Feat] minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2024-01-05 22:01:57 +08:00
bittersweet1999
2163f9398f
[Feature] add subject ir dataset (#755)
* add subject ir

* Add ir dataset

* Add ir dataset
2024-01-05 12:00:57 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation (#766)
* multi_round dataset

* add multi_round evaluation
2024-01-04 10:37:52 +00:00
Mo Li
33f8df1ca3
[Update] Change NeedleInAHaystackDataset to dynamic dataset loading (#754)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

* add English version support

* change NeedleInAHaystackDataset to dynamic loading

* change NeedleInAHaystackDataset to dynamic loading

* fix needleinahaystack test eval bug

* fix needleinahaystack config bug

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-02 17:22:56 +08:00
Francis-llgg
b69fe2343b
[Feature] Add GPQA Dataset (#729)
* check

* message

* add

* change prompt

* change a para name

* modify name of the file

* delete a useless file
2024-01-01 15:54:40 +08:00
Francis-llgg
ef3ae63539
[Feature] Add new dataset mastermath2024v1 (#744)
* add new dataset mastermath2024v1

* change it to simplified chinese prompt

* change file name
2024-01-01 15:53:24 +08:00
Mo Li
17b8e929dd
[Feature] Update plot function in tools_needleinahaystack.py (#747)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-29 18:51:09 +08:00
Hubert
327951087f
[Feat] update code config (#749)
* [Feat] update code dataset

* [Feat] update code dataset

* [Feat] update code dataset
2023-12-29 18:46:34 +08:00
bittersweet1999
fe0b717033
add creationbench (#753) 2023-12-29 10:03:44 +00:00
bittersweet1999
8728287a55
fix error in configs (#750) 2023-12-28 11:53:07 +00:00
Connor-Shen
81098722d2
add chinese version of humaneval, mbpp (#743)
* add chinese_version of humaneval,mbpp

* add humaneval&mbpp gen.py

* minor fix

* minor add

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-28 14:47:56 +08:00
Hubert
0a525985e8
[Feature] Support sanitized MBPP dataset (#745) 2023-12-27 22:17:23 +08:00
bittersweet1999
dfd9ac0fd9
[Feature] Add other judgelm prompts for Alignbench (#731)
* add judgellm prompts

* add judgelm prompts

* update import info

* fix situation that no abbr in config

* fix situation that no abbr in config

* add summarizer for other judgellm

* change config name

* add maxlen

* add maxlen

* dict assert

* dict assert

* fix strings

* fix strings
2023-12-27 17:54:53 +08:00
Yang Yong
54345c56b7
Update LightllmApi and Fix mmlu bug (#738)
* Update LightllmApi and Fix mmlu bug

* checkout mmlu_gen_a484b3.py

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-27 13:49:08 +08:00
philipwangOvO
34561ececb
[Feature] Add InfiniteBench (#739)
* add InfiniteBench

* add InfiniteBench

---------

Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-12-26 15:36:27 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs (#734) 2023-12-25 21:59:16 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support (#714)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
Skyfall-xzz
b35d991786
[Feature] Add ReasonBench(Internal) dataset (#577)
* [Feature] Add reasonbench dataset

* add configs for supporting generative inference & merge datasets in the same category

* modify config filename to prompt version

* fix codes to meet pre-commit requirements

* lint the code to meet pre-commit requirements

* Align Load_data Sourcecode Briefly

* fix bugs

* reduce code redundancy
2023-12-20 17:57:42 +08:00
Jingming
76a95e9e81
[Feature] Support the use of humaneval_plus. (#720)
* [Feature] Support the use of humaneval_plus.

* [Feature] Add humaneval_plus_gen.py

* minor check

* [Fix] Fix bug

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-20 17:25:17 +08:00
bittersweet1999
47e745d748
quick fix for maxoutlen (#719) 2023-12-20 00:00:28 +08:00
Hubert
5e8b838f51
[Feat] Update math/agent (#716)
* minor add

* minor add

* minor fix
2023-12-19 21:20:42 +08:00
Songyang Zhang
bfe4aa2af5
[Fix] Update alignmentbench (#704)
* update alignmentbench

* update alignmentbench

* update alignmentbench
2023-12-14 18:24:21 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge (#697)
* alignmentbench infer and judge

* alignmentbench

* alignmentbench done

* alignment all done

* alignment all done
2023-12-13 19:59:30 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692)
* add features

* add doc string

* add doc string
2023-12-12 20:58:17 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation (#680)
* new version of subject

* fixed draw

* fixed draw

* fixed draw

* done

* done

* done

* done

* fixed lint
2023-12-11 22:22:11 +08:00
Hubert
e78857ac36
[Sync] minor test (#683) 2023-12-11 17:42:53 +08:00
Xiaoming Shi
1bf85949ef
[Feature] Add medbench (#678)
* update medbench

* medbench update

* format medbench

* format

---------

Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-09 16:05:46 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis (#639)
* add contamination analysis to ceval

* fix bugs

* add contamination docs

* to pass CI check

* update

---------

Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
bittersweet1999
1c95790fdd
New subjective judgement (#660)
* TabMWP

* TabMWP

* fixed

* fixed

* fixed

* done

* done

* done

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* modified to a more general way

* modified to a more general way

* final

* final

* add summarizer

* add new summarize

* fixed

* fixed

* fixed

---------

Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-12-06 13:28:33 +08:00
rolellm
e10f1c9139
added rolebench dataset. (#633)
* added rolebench

* renamed unreasonable variable names

* renamed the variables mentioned in review comments
2023-12-01 22:54:42 +08:00
liushz
f4bbff6537
[Feature] Update MathBench CodeInterpreter & fix MathBench Bug (#657)
* Update MathBench CodeInterpreter & fix MathBench Bug

* Fix errors

* update

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2023-12-01 22:27:24 +08:00
Hubert
9eb5cadcac
[Feat] update gsm8k and math agent config (#652)
* [Feat] update gsm8k and math agent config

* minor fix
2023-12-01 15:08:38 +08:00
liushz
a331c9abfd
[Feature] Add wikibench dataset (#655)
* Add WikiBench

* Add WikiBench

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-01 14:56:54 +08:00
liushz
e019c831fe
[Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144)
* add Chinese version: csqa crowspairs nq

* Update cn_data

* Update cn_data

* update format

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-30 15:33:02 +08:00
Ma Zerun
6aaf3b91ec
[Feature] Support chat style inferencer. (#643)
* [Feature] Support chat style inferencer.

* [Fix] use new prompt

* [Fix] use new prompt

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-30 14:00:06 +08:00
Fengzhe Zhou
5933c04fda
fix hellaswag_ppl_47bff9 (#648) 2023-11-29 16:51:44 +08:00
liushz
6d0d78986c
[Feature] Add GSM_Hard dataset (#619)
* Add SVAMP dataset

* Add SVAMP dataset

* Add SVAMP dataset

* Add gsm_hard dataset

* Add gsm_hard dataset

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-27 17:40:34 +08:00
Fengzhe Zhou
9083dea683
[Sync] some renaming (#641) 2023-11-27 16:06:49 +08:00
Fengzhe Zhou
d4d1330a5a
[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes (#625) 2023-11-23 14:05:59 +08:00
liushz
048775192b
[Feature] Add SVAMP dataset (#604)
* Add SVAMP dataset

* Add SVAMP dataset

* Add SVAMP dataset
2023-11-22 14:54:39 +08:00
Songyang Zhang
d925748266
[Feature] Support 360API and FixKRetriever for CSQA dataset (#601)
* [Feature] Support 360API and FixKRetriever for CSQA dataset

* Update API

* Update API

* [Feature] Support 360API and FixKRetriever for CSQA dataset

* Update API

* Update API

* rm mathbench

* fix_lint

* Update opencompass/models/bytedance_api.py

Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>

* update

* update

* update

---------

Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-11-21 20:25:47 +08:00
liushz
dbacd36379
Add aritch to mathbench (#607) 2023-11-20 19:40:41 +08:00
liushz
c9c5c5d92e
Mathbench update postprocess (#600)
* Update mathbench

* Update mathbench
2023-11-20 16:48:55 +08:00
Jingming
5e75e29711
[Feature] Add multi-prompt generation demo (#568)
* [Feature] Add multi-prompt generation demo

* [Fix] change form in winogrande_gen_XXX.py

* [Fix] make multi prompt demo more directly

* [Fix] fix bug

* [Fix] minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-20 16:16:37 +08:00
Raymond Zhang
c0acd06b05
[Feature] Add FinanceIQ dataset (#596) 2023-11-16 17:47:57 +08:00
Yu
8160cb84e3
update word spell (#594) 2023-11-15 15:23:58 +08:00
Songyang Zhang
c8cb38e822
[Feature] Update mathbench (#580)
* update xunfei api

* fix lint

* update mathbench to avoid incomplete prediction
2023-11-14 16:04:02 +08:00
Fengzhe Zhou
1ea88d5822
[Sync] Bump version to 0.1.8 (#576) 2023-11-13 16:00:38 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs (#574) 2023-11-13 15:15:34 +08:00
Fengzhe Zhou
689ffe5b63
[Feature] Use dataset in local path (#570)
* update commonsenseqa

* update drop

* update flores_first100

* update gsm8k

* update humaneval

* update lambda

* update obqa

* update piqa

* update race

* update siqa

* update story_cloze

* update strategyqa

* update tydiqa

* update winogrande

* update doc

* update hellaswag

* fix obqa

* update collections

* update .zip name
2023-11-13 13:00:37 +08:00
Fengzhe Zhou
d6aaac22e7
[Feature] Update cmb (#571) 2023-11-13 00:09:05 +08:00
Kevin Wang
7f77e8dae5
[Docs] fix dataset name error (#533) 2023-11-10 18:54:20 +08:00
Hubert
95e0da0173
[Docs] add humanevalx dataset link in config (#559)
* [Docs] add humanevalx dataset link in config

* [Docs] add humanevalx dataset link in config

* minor fix
2023-11-10 18:18:58 +08:00
jingmingzhuo
b3cbef3226
[Feature] Add py150 and maxmin (#562)
* [feat] add clozeTesst_maxmin dataset

* [feat] add py150 datasets

* [feat] change __init__.py in opencompass/datasets

* [fix] pre-commit check

* [fix] rename py150 and maxmin datasets in configs

* [feat] add gen.py of py150 and maxmin in configs/datasets
2023-11-09 22:05:25 +08:00
Hubert
bb2ecf416e
[Feat] Support cibench (#538)
* [Feat] support cidataset

* [Feat] support cidataset

* [Feat] support cidataset

* [Feat] support cidataset

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* rename cibench

* rename cibench

* rename cibench

* rename cibench

* minor fix

* minor fix

* minor fix
2023-11-07 19:11:44 +08:00
liushz
214a34f0b8
[Feature] Update Mathbench dataset prompt and fix small errors (#546)
* Update mathbench

* Update mathbench

* Update mathbench
2023-11-06 21:58:31 +08:00
bittersweet1999
f25a980043
[Feat] Add an open-source dataset Tabmwp (#505)
* TabMWP

* TabMWP

* fixed

* fixed

* fixed

* done

* done

* done

---------

Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-11-03 11:15:46 +08:00
Qing
229a65f305
[Fix] Fix typo in WSC prompt (#520)
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
2023-10-30 12:16:26 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update (#517) 2023-10-27 20:31:22 +08:00
Wei Jueqi
b62842335d
[Doc] Update Subjective docs (#510)
* rename

* add en subdoc

* fix name

* fix writing

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-27 16:27:24 +08:00
Hubert
b3f5d9e421
[Feat] support math/gsm8k agent config (#494)
* support math agent

* support gsm8k agent

* support gsm8k agent

* minor fix

* minor fix

* minor fix

* Update configs/eval_codeagent.py
2023-10-25 23:05:15 +08:00
liushz
2737249f31
[Feature] Add mathbench dataset and circular evaluator (#408)
* add_mathbench

* update mathbench

* support non circular eval dataset

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-10-18 04:08:31 -05:00
Leymore
861942ab1b
[Feature] Add lawbench (#460)
* add lawbench

* update requirements

* update
2023-10-13 06:51:36 -05:00
Leymore
fbf5089c40
[Sync] update github token (#475) 2023-10-13 06:50:54 -05:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts (#459)
* use jieba rouge in lcsts

* use rouge_chinese
2023-10-09 10:10:33 +08:00
Tong Gao
119bfd1569
[Refactor] Move fix_id_list to Retriever (#442)
* [Refactor] Move fix_id_list to Retriever

* update

* move to base

* fix
2023-10-07 12:53:41 +08:00
philipwangOvO
3bb3d330eb
[Sync] Update LongEval (#443) 2023-09-27 16:32:40 +08:00
Kevin Wang
dc1b82c346
[SIG] add GLUE_MRPC dataset (#440) 2023-09-27 11:44:54 +08:00
Kevin Wang
14fdecfecc
[Dataset] add GLUE QQP dataset (#438) 2023-09-27 11:36:43 +08:00
Kevin Wang
d8354fe5d8
[SIG] add GLUE_CoLA dataset (#406)
* [Dataset] add GLUE_CoLA dataset

* [update] use HFDataset to load glue/cola dataset

* update

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-09-27 11:30:44 +08:00
Kevin Wang
012546666b
[SIG] add WikiText-2&103 (#397)
* fix conflict

* add eval_cfg
2023-09-26 14:31:15 +08:00
liushz
c5224c2a91
[Feature] Add kaoshi dataset (#392)
* Add ToT method

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Add Kaoshi

* Update Kaoshi

* Update Kaoshi

* Update kaoshi

* Update kaoshi

* Update Kaoshi

* Update Kaoshi

* Update Kaoshi

* Update Kaoshi

* update Kaoshi

* update

* update

* fix

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-09-22 18:46:33 +08:00
TTTTTiam
2a62bea1a4
add evaluation of scibench (#393)
* add evaluation of scibench

* add evaluation of scibench

* update scibench

* remove scibench evaluator

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 17:42:08 +08:00
Hubert
8803f7f7a6
[Feat] support anthropics evals dataset (#422)
* [Feat] support anthropics ai risk dataset

* [Feat] support anthropics evals dataset

* [Feat] support anthropics evals dataset
2023-09-20 18:36:44 +08:00
Hubert
2c15a0c01d
[Feat] refine docs and codes for more user guides (#409) 2023-09-18 16:12:13 +08:00
Hubert
a11cb45c83
[Feat] implementation for support promptbench (#239)
* [Feat] support adv_glue dataset for adversarial robustness

* reorg files

* minor fix

* minor fix

* support prompt bench demo

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix
2023-09-15 15:06:53 +08:00
Hubert
de8a154795
[Feat] support ds1000 dataset (#395)
* [Feat] support ds1000 dataset
2023-09-15 12:50:27 +08:00
Xidong Wang
47a752cd56
[Dataset] Add CMB (#376)
* Add CMB

* modify CMB

---------

Co-authored-by: wangxidong <xidongw@163.com>
2023-09-12 19:16:41 +08:00
Leymore
b48d084020
[Fix] update bbh implement & fix bbh suffix (#371) 2023-09-08 15:14:30 +08:00
Hubert
ddb8197212
[Feat] support wizardcoder series (#344)
* [Feat] support wizardcoder series

* minor fix
2023-09-06 17:52:35 +08:00
Leymore
7ca6ba625e
[Feature] Add qwen & qwen-chat support (#286)
* add and apply update suffix tool

* add tool doc

* add qwen configs

* add cmmlu

* rename bbh

* update datasets

* delete

* update hf_qwen_7b.py
2023-08-31 11:29:05 +08:00
Leymore
c26ecdb1b0
[Feature] Add and apply update suffix tool (#280)
* add and apply update suffix tool

* add dataset suffix updater as precommit hook

* update workflow

* update scripts

* update ci

* update

* ci with py3.8

* run in serial

* update bbh

* use py 3.10

* update pre commit zh cn
2023-08-28 17:35:04 +08:00
Tong Gao
9058be07b8
[Feature] Simplify entry script (#204)
* [Feature] Simplify entry script

* update
2023-08-25 17:36:30 +08:00
Tong Gao
fda42fd5fd
[Fix] wrong path in dataset collections (#272) 2023-08-25 15:50:30 +08:00
philipwangOvO
3f37c40aa3
[Dataset] Refactor LEval 2023-08-25 11:46:23 +08:00
liushz
02ce139bc6
[Feature] Add Tree-of-Thought method (#173)
* Add ToT method

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update chain_of_thought.md

* Update icl_tot_inferencer.py

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2023-08-23 12:23:05 +08:00
philipwangOvO
655a807f4b
[Dataset] LongBench (#236)
Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-08-21 14:15:20 +08:00
Ezra-Yu
17ccaa5980
[Feat] Add codegeex2 and Humanevalx (#210)
* add codegeex2

* add humanevalx dataset

* add evaluator

* update evaluator

* update configs

* update clean code

* update configs

* fix lint

* remove sleep

* fix lint

* update docs

* fix lint
2023-08-17 11:03:16 +08:00
Hubert
0fe2366a72
[Feat] support adv_glue dataset for adversarial robustness (#205)
* [Feat] support adv_glue dataset for adversarial robustness

* reorg files

* minor fix

* minor fix
2023-08-16 18:42:06 +08:00
Hubert
7c393192af
[Fix] fix bug for postprocessor (#195)
* [Fix] fix bug for postprocessor

* minor fix
2023-08-11 18:41:12 +08:00
Tong Gao
bf79ff1c6d
[Feature] Add LEval datasets
Co-authored-by: kennymckormick <dhd@pku.edu.cn>
2023-08-11 17:38:31 +08:00
Hubert
8d9cee060f
[Feat] update postprocessor to get first option more accurately (#193)
* [Feat] update postprocessor to get first option

* minor fix

* minor fix
2023-08-11 17:33:00 +08:00
Leymore
14332e08fd
[Feature] add llama-oriented dataset configs (#82)
* add llama-oriented dataset configs

* update

* revert cvalues & update llama_example
2023-08-11 12:48:05 +08:00
Hubert
5a9539f375
[Feat] add safety to collections (#185)
* [Feat] add safety to collections

* minor fix
2023-08-11 11:19:26 +08:00
Tong Gao
2931f3dcb8
[Enhancement] Add humaneval postprocessor for GPT models & eval config for GPT4, enhance the original humaneval postprocessor (#129)
* [Enhancement] Enhance humaneval postprocessor

* add human-eval testcase

* update

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-08-10 16:31:12 +08:00
Leymore
e7fc54baf1
[Feature] Add Xiezhi SQuAD2.0 ANLI (#101)
* add Xiezhi SQuAD2.0 ANLI; update WSC

* update

* update

* update doc string
2023-08-10 14:04:18 +08:00
Leymore
876ade71a5
[Fix] Fix AGIEval multiple choice (#137)
* update agieval data

* rename variables
2023-08-10 11:38:24 +08:00
Tong Gao
c00179d46b
[Feature] Evaluating acc based on minimum edit distance, update SIQA (#130)
* [Feature] Support evaluating acc based on minimum edit distance, update SIQA

* update
2023-08-01 14:24:27 +08:00
Leymore
d862f570aa
[Feature] Add SC (#126)
* add self-consistency

* add CoT method Self-Consistency

* fix typo error and update openicl_eval

* add tydiQA-GoldP task

* fix sc

* rename gsm8k_sc

* fix sc

* add self-consistency doc

* refine sc

---------

Authored-by: liushz <qq1791167085@163.com>
2023-07-28 17:29:37 +08:00
Hubert
b7184e9db5
[Refactor] Update crows-pairs evaluation (#98)
* [Refactor] Update crows-pairs evaluation

* [Refactor] Update crows-pairs evaluation

* minor
2023-07-26 11:21:32 +08:00
Haonan Li
e9cdb24ddd
[Feature] Add CMMLU dataset (#91)
* add CMMLU

* debug cmmlu

* add slurm args `qos`

* fix format: space before comment

* remove unused variable

* change the location of `answer is`

---------

Co-authored-by: 李浩楠 <lihaonan@lihaonandeMacBook-Air.local>
Co-authored-by: 李浩楠 <haonan.li>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-07-25 10:14:27 +08:00
Hubert
f83e125e5a
[Feat] Support CValues Responsibility dataset (#78)
* [Feat] support CValues

* minor fix
2023-07-18 18:45:15 +08:00
liushz
f36c0496f3
[Feature] Add tydiqa-goldp (#75)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2023-07-18 14:54:35 +08:00
Leymore
1326aff77e
[Feature] Add logger info and remove dataset bugs (#61)
* Add logger info and remove dataset bugs

* fix typo
2023-07-17 14:26:30 +08:00
Leymore
86d5ec3d0f
Update configs (#9)
* Update implements

* Update
2023-07-06 12:27:41 +08:00
Tong Gao
16e759b996
Align prompt files with their hash (#1)
* fix bbh

* fix bbh

* rename
2023-07-05 18:28:58 +08:00
mzr1996
04dd01a235 Update configs and code 2023-07-05 11:45:08 +08:00
Leymore
c94cc94348 Add release contribution 2023-07-05 03:15:31 +00:00
tonysy
e6b5bdcb87 OpenCompass Public MR 2023-07-05 03:15:21 +00:00
Ezra-Yu
cbe9fe2cdb Add Release Contribution 2023-07-05 02:22:40 +00:00
cky
36f111100f update datasets 2023-07-05 01:45:26 +00:00
mzr1996
3cfe73de3f Support a batch of datasets. 2023-07-05 01:30:27 +00:00
yingfhu
fb11108723 [Feat] support opencompass 2023-07-04 22:11:33 +08:00
gaotongxiao
7d346000bb initial commit 2023-07-04 21:34:55 +08:00