# Evaluation pipeline on MMBench
## Intro to each data sample in MMBench
MMBench is split into **dev** and **test** splits, and each data sample in each split contains the following fields:
```
img: the raw data of an image
question: the question
options: the concatenated options
category: the leaf category
l2-category: the l2-level category
options_dict: a dict containing all options
index: the unique identifier of the current question
context (optional): the context of the question
answer: the target answer to the current question (only exists in the dev split; kept confidential for the test split on our evaluation server)
```
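As a quick sanity check, you can peek at the raw fields with pandas. This is a minimal sketch; the file name is a placeholder for your local copy of the split, which is distributed as a TSV file:
```python
import pandas as pd

df = pd.read_csv('mmbench_dev.tsv', sep='\t')  # hypothetical local path
print(df.columns.tolist())     # the field names described above
print(df.iloc[0]['question'])  # inspect the first question
```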
## Load MMBench
We provide the following code snippet as an example of loading MMBench:
```python
import base64
import io

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


def decode_base64_to_image(base64_string):
    # Images are stored as base64-encoded strings in the TSV file
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image


class MMBenchDataset(Dataset):

    def __init__(self,
                 data_file,
                 sys_prompt='There are several options:'):
        self.df = pd.read_csv(data_file, sep='\t')
        self.sys_prompt = sys_prompt

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        index = self.df.iloc[idx]['index']
        image = decode_base64_to_image(self.df.iloc[idx]['image'])
        question = self.df.iloc[idx]['question']
        # The answer column only exists in the dev split
        answer = self.df.iloc[idx]['answer'] if 'answer' in self.df.columns else None
        category = self.df.iloc[idx]['category']
        l2_category = self.df.iloc[idx]['l2-category']

        # Collect only the options that are present for this question
        option_candidate = ['A', 'B', 'C', 'D', 'E']
        options = {
            cand: self.load_from_df(idx, cand)
            for cand in option_candidate
            if self.load_from_df(idx, cand) is not None
        }
        options_prompt = f'{self.sys_prompt}\n'
        for key, item in options.items():
            options_prompt += f'{key}. {item}\n'

        hint = self.load_from_df(idx, 'hint')
        data = {
            'img': image,
            'question': question,
            'answer': answer,
            'options': options_prompt,
            'category': category,
            'l2-category': l2_category,
            'options_dict': options,
            'index': index,
            'context': hint,
        }
        return data

    def load_from_df(self, idx, key):
        # Return None for columns that are absent or cells that are NaN
        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
            return self.df.iloc[idx][key]
        else:
            return None
```
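A minimal usage sketch (the TSV file name below is a placeholder for wherever you stored the downloaded split):
```python
dataset = MMBenchDataset('mmbench_dev.tsv')  # hypothetical local path
sample = dataset[0]
print(sample['question'])
print(sample['options'])
```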
## How to construct the inference prompt
Given a data sample loaded as above, concatenate the optional context, the question, and the options:
```python
if data_sample['context'] is not None:
prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
else:
prompt = data_sample['question'] + ' ' + data_sample['options']
```
For example:

```
Question: Which category does this image belong to?
A. Oil Painting
B. Sketch
C. Digital art
D. Photo
```
<div align=center>
<img src="https://github-production-user-asset-6210df.s3.amazonaws.com/34324155/255581681-1364ef43-bd27-4eb5-b9e5-241327b1f920.png" width="50%"/>
</div>
```python
prompt = """
###Human: Question: Which category does this image belong to?
There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo
###Assistant:
"""
```
You can make custom modifications to the prompt to fit your model's conversation template.
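For instance, here is a hedged sketch of assembling the final prompt with a MiniGPT-4-style conversation template; the `###Human`/`###Assistant` markers are illustrative and should be replaced with whatever your model expects:
```python
def build_prompt(data_sample):
    # Base prompt: optional context (hint), then question, then options
    if data_sample['context'] is not None:
        base = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
    else:
        base = data_sample['question'] + ' ' + data_sample['options']
    # Wrap in a conversation template; adapt the markers to your model
    return f'###Human: {base}\n###Assistant: '
```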
## How to save results
You should dump your model's predictions into an Excel (`.xlsx`) file containing the following fields:
```
question: the question
A: the first choice
B: the second choice
C: the third choice
D: the fourth choice
prediction: your model's prediction for the current question
category: the leaf category
l2_category: the l2-level category
index: the unique identifier of the current question
```
If there are any questions with fewer than four options, simply leave those fields blank.
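A minimal sketch of dumping predictions with pandas (the record values and output file name are placeholders; writing `.xlsx` files requires the `openpyxl` package):
```python
import pandas as pd

# One record per evaluated question; the values shown here are illustrative
results = [{
    'question': 'Which category does this image belong to?',
    'A': 'Oil Painting',
    'B': 'Sketch',
    'C': 'Digital art',
    'D': 'Photo',
    'prediction': 'D',
    'category': 'leaf_category_name',
    'l2_category': 'l2_category_name',
    'index': 0,
}]

pd.DataFrame(results).to_excel('mmbench_dev_results.xlsx', index=False)
```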