# Evaluation pipeline on MMBench

## Intro to each data sample in MMBench
MMBench is split into **dev** and **test** splits, and each data sample in each split contains the following fields:

```
img: the raw data of an image
question: the question
options: the concatenated options
category: the leaf category
l2-category: the l2-level category
options_dict: the dict that contains all options
index: the unique identifier of the current question
context (optional): the context of a question
answer: the target answer of the current question (only exists in the dev split; kept confidential for the test split on our evaluation server)
```
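For a quick look at what one sample contains, the dev split can be opened directly with pandas, as in the sketch below. The TSV filename is a placeholder, and the column names follow the loading code in the next section.

```python
import base64
import io

import pandas as pd
from PIL import Image

# 'mmbench_dev_20230712.tsv' is a placeholder for your local copy of the dev split.
df = pd.read_csv('mmbench_dev_20230712.tsv', sep='\t')
print(df.columns.tolist())

sample = df.iloc[0]
print(sample['index'], sample['category'], sample['l2-category'])
print(sample['question'])

# The image field is a base64 string that decodes to an ordinary PIL image.
image = Image.open(io.BytesIO(base64.b64decode(sample['image'])))
print(image.size)
```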
## Load MMBench

We provide a code snippet as an example of loading MMBench:
```python
import base64
import io
import random

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


def decode_base64_to_image(base64_string):
    # Images are stored as base64 strings in the TSV file.
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image


class MMBenchDataset(Dataset):

    def __init__(self,
                 data_file,
                 sys_prompt='There are several options:'):
        self.df = pd.read_csv(data_file, sep='\t')
        self.sys_prompt = sys_prompt

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        index = self.df.iloc[idx]['index']
        image = self.df.iloc[idx]['image']
        image = decode_base64_to_image(image)
        question = self.df.iloc[idx]['question']
        # The answer column only exists in the dev split.
        answer = self.df.iloc[idx]['answer'] if 'answer' in self.df.iloc[0].keys() else None
        category = self.df.iloc[idx]['category']
        l2_category = self.df.iloc[idx]['l2-category']

        # Collect only the options that are actually present for this question.
        option_candidate = ['A', 'B', 'C', 'D', 'E']
        options = {
            cand: self.load_from_df(idx, cand)
            for cand in option_candidate
            if self.load_from_df(idx, cand) is not None
        }
        options_prompt = f'{self.sys_prompt}\n'
        for key, item in options.items():
            options_prompt += f'{key}. {item}\n'

        hint = self.load_from_df(idx, 'hint')
        data = {
            'img': image,
            'question': question,
            'answer': answer,
            'options': options_prompt,
            'category': category,
            'l2-category': l2_category,
            'options_dict': options,
            'index': index,
            'context': hint,
        }
        return data

    def load_from_df(self, idx, key):
        # Return None for missing or NaN fields so optional columns are easy to handle.
        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
            return self.df.iloc[idx][key]
        else:
            return None
```
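A minimal usage sketch, assuming the dev split has been downloaded locally (the filename below is a placeholder):

```python
dataset = MMBenchDataset(data_file='mmbench_dev_20230712.tsv')
print(len(dataset))

sample = dataset[0]
print(sample['question'])
print(sample['options'])       # the options rendered as a single prompt string
print(sample['options_dict'])  # the same options as a dict, e.g. {'A': ..., 'B': ...}
print(sample['context'])       # None when the sample has no hint
print(sample['answer'])        # None for the test split
```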
## How to construct the inference prompt

```python
if data_sample['context'] is not None:
    prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
else:
    prompt = data_sample['question'] + ' ' + data_sample['options']
```
For example:

Question: Which category does this image belong to?
A. Oil Painting
B. Sketch
C. Digital art
D. Photo

<div align=center>
<img src="https://user-images.githubusercontent.com/56866854/252847545-ea829a95-b063-492f-8760-d27143b5c834.jpg" width="10%"/>
</div>

```
prompt = ###Human: Question: Which category does this image belong to? There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo ###Assistant:
```

You can make custom modifications to the prompt.
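If you want to produce this template programmatically, a hypothetical helper along the lines of the sketch below could be used. The `###Human:` / `###Assistant:` wrapping mirrors the example above; the helper name and the whitespace flattening are assumptions you should adapt to your own model's chat format.

```python
def build_prompt(data_sample):
    """Hypothetical helper: assemble the ###Human ... ###Assistant prompt
    from one sample produced by MMBenchDataset."""
    if data_sample['context'] is not None:
        body = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
    else:
        body = data_sample['question'] + ' ' + data_sample['options']
    # The options string contains newlines; collapse whitespace into a single line.
    body = ' '.join(body.split())
    return f'###Human: {body} ###Assistant:'
```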
## How to save results

You should dump your model's predictions into an Excel (.xlsx) file, and this file should contain the following fields:

```
question: the question
A: the first choice
B: the second choice
C: the third choice
D: the fourth choice
prediction: the prediction of your model for the current question
category: the leaf category
l2_category: the l2-level category
index: the unique identifier of the current question
```

If there are any questions with fewer than four options, simply leave those fields blank.
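One way to produce such a file is to collect one record per question and write it out with pandas. The sketch below is an assumption about your inference loop: the `dataset` object comes from the snippet above, the prediction value and the output filename are placeholders, and `DataFrame.to_excel` needs an Excel engine such as `openpyxl` installed.

```python
import pandas as pd

records = []
for i in range(len(dataset)):          # MMBenchDataset instance from the snippet above
    sample = dataset[i]
    records.append({
        'question': sample['question'],
        # Missing options are simply left blank, as noted above.
        'A': sample['options_dict'].get('A', ''),
        'B': sample['options_dict'].get('B', ''),
        'C': sample['options_dict'].get('C', ''),
        'D': sample['options_dict'].get('D', ''),
        'prediction': 'A',             # placeholder: your model's predicted option
        'category': sample['category'],
        'l2_category': sample['l2-category'],
        'index': sample['index'],
    })

# Writing .xlsx requires an engine such as openpyxl.
pd.DataFrame(records).to_excel('mmbench_predictions.xlsx', index=False)
```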