# Evaluation pipeline on MMBench

## Intro to each data sample in MMBench
MMBench is split into **dev** and **test** splits, and each data sample in each split contains the following fields:

```
img: the raw data of an image
question: the question
options: the concatenated options
category: the leaf category
l2-category: the l2-level category
options_dict: the dict that contains all options
index: the unique identifier of the current question
context (optional): the context of a question
answer: the target answer of the current question (only exists in the dev split; kept confidential for the test split on our evaluation server)
```
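For a quick look at what one sample contains, the dev split can be opened directly with pandas, as in the sketch below. The TSV filename is a placeholder, and the column names follow the loading code in the next section.

```python
import base64
import io

import pandas as pd
from PIL import Image

# 'mmbench_dev_20230712.tsv' is a placeholder for your local copy of the dev split.
df = pd.read_csv('mmbench_dev_20230712.tsv', sep='\t')
print(df.columns.tolist())

sample = df.iloc[0]
print(sample['index'], sample['category'], sample['l2-category'])
print(sample['question'])

# The image field is a base64 string that decodes to an ordinary PIL image.
image = Image.open(io.BytesIO(base64.b64decode(sample['image'])))
print(image.size)
```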
## Load MMBench

We provide a code snippet as an example of loading MMBench:
```python
import base64
import io
import random

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


def decode_base64_to_image(base64_string):
    # Images are stored as base64 strings in the TSV file.
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image


class MMBenchDataset(Dataset):

    def __init__(self,
                 data_file,
                 sys_prompt='There are several options:'):
        self.df = pd.read_csv(data_file, sep='\t')
        self.sys_prompt = sys_prompt

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        index = self.df.iloc[idx]['index']
        image = self.df.iloc[idx]['image']
        image = decode_base64_to_image(image)
        question = self.df.iloc[idx]['question']
        # The answer column only exists in the dev split.
        answer = self.df.iloc[idx]['answer'] if 'answer' in self.df.iloc[0].keys() else None
        category = self.df.iloc[idx]['category']
        l2_category = self.df.iloc[idx]['l2-category']

        # Collect only the options that are actually present for this question.
        option_candidate = ['A', 'B', 'C', 'D', 'E']
        options = {
            cand: self.load_from_df(idx, cand)
            for cand in option_candidate
            if self.load_from_df(idx, cand) is not None
        }
        options_prompt = f'{self.sys_prompt}\n'
        for key, item in options.items():
            options_prompt += f'{key}. {item}\n'

        hint = self.load_from_df(idx, 'hint')
        data = {
            'img': image,
            'question': question,
            'answer': answer,
            'options': options_prompt,
            'category': category,
            'l2-category': l2_category,
            'options_dict': options,
            'index': index,
            'context': hint,
        }
        return data

    def load_from_df(self, idx, key):
        # Return None for missing or NaN fields so optional columns are easy to handle.
        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
            return self.df.iloc[idx][key]
        else:
            return None
```
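A minimal usage sketch, assuming the dev split has been downloaded locally (the filename below is a placeholder):

```python
dataset = MMBenchDataset(data_file='mmbench_dev_20230712.tsv')
print(len(dataset))

sample = dataset[0]
print(sample['question'])
print(sample['options'])       # the options rendered as a single prompt string
print(sample['options_dict'])  # the same options as a dict, e.g. {'A': ..., 'B': ...}
print(sample['context'])       # None when the sample has no hint
print(sample['answer'])        # None for the test split
```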
## How to construct the inference prompt

```python
if data_sample['context'] is not None:
    prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
else:
    prompt = data_sample['question'] + ' ' + data_sample['options']
```
For example:

Question: Which category does this image belong to?
A. Oil Painting
B. Sketch
C. Digital art
D. Photo

<div align=center>
<img src="https://user-images.githubusercontent.com/56866854/252847545-ea829a95-b063-492f-8760-d27143b5c834.jpg" width="10%"/>
</div>

```
prompt = ###Human: Question: Which category does this image belong to? There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo ###Assistant:
```

You can make custom modifications to the prompt.
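If you want to produce this template programmatically, a hypothetical helper along the lines of the sketch below could be used. The `###Human:` / `###Assistant:` wrapping mirrors the example above; the helper name and the whitespace flattening are assumptions you should adapt to your own model's chat format.

```python
def build_prompt(data_sample):
    """Hypothetical helper: assemble the ###Human ... ###Assistant prompt
    from one sample produced by MMBenchDataset."""
    if data_sample['context'] is not None:
        body = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
    else:
        body = data_sample['question'] + ' ' + data_sample['options']
    # The options string contains newlines; collapse whitespace into a single line.
    body = ' '.join(body.split())
    return f'###Human: {body} ###Assistant:'
```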
## How to save results

You should dump your model's predictions into an Excel (.xlsx) file, and this file should contain the following fields:

```
question: the question
A: the first choice
B: the second choice
C: the third choice
D: the fourth choice
prediction: the prediction of your model for the current question
category: the leaf category
l2_category: the l2-level category
index: the unique identifier of the current question
```

If there are any questions with fewer than four options, simply leave those fields blank.
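One way to produce such a file is to collect one record per question and write it out with pandas. The sketch below is an assumption about your inference loop: the `dataset` object comes from the snippet above, the prediction value and the output filename are placeholders, and `DataFrame.to_excel` needs an Excel engine such as `openpyxl` installed.

```python
import pandas as pd

records = []
for i in range(len(dataset)):          # MMBenchDataset instance from the snippet above
    sample = dataset[i]
    records.append({
        'question': sample['question'],
        # Missing options are simply left blank, as noted above.
        'A': sample['options_dict'].get('A', ''),
        'B': sample['options_dict'].get('B', ''),
        'C': sample['options_dict'].get('C', ''),
        'D': sample['options_dict'].get('D', ''),
        'prediction': 'A',             # placeholder: your model's predicted option
        'category': sample['category'],
        'l2_category': sample['l2-category'],
        'index': sample['index'],
    })

# Writing .xlsx requires an engine such as openpyxl.
pd.DataFrame(records).to_excel('mmbench_predictions.xlsx', index=False)
```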