# Evaluation pipeline on MMBench
## Intro to each data sample in MMBench

MMBench is split into dev and test splits, and each data sample in each split contains the following fields:
- `img`: the raw data of an image
- `question`: the question
- `options`: the concatenated options
- `category`: the leaf category
- `l2-category`: the L2-level category
- `options_dict`: the dict that contains all options
- `index`: the unique identifier of the current question
- `context` (optional): the context to a question
- `answer`: the target answer to the current question (only exists in the dev split; kept confidential for the test split on our evaluation server)
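To make the schema concrete, here is a minimal sketch of a single record in the same TSV layout. All field values (the category names, the image bytes, the hint) are made up for illustration only; the real files are distributed by the MMBench team.

```python
import base64
import io

import pandas as pd

# Hypothetical one-row table mimicking the MMBench TSV schema.
record = {
    'index': 0,
    'image': base64.b64encode(b'fake-image-bytes').decode('utf-8'),
    'question': 'Which category does this image belong to?',
    'A': 'Oil Painting',
    'B': 'Sketch',
    'C': 'Digital art',
    'D': 'Photo',
    'hint': 'The picture hangs in a museum.',  # optional context
    'category': 'image_style',                 # made-up leaf category
    'l2-category': 'attribute_reasoning',      # made-up L2 category
    'answer': 'D',                             # present in the dev split only
}

df = pd.DataFrame([record])

# Round-trip through TSV, the on-disk format MMBench uses.
buf = io.StringIO()
df.to_csv(buf, sep='\t', index=False)
buf.seek(0)
reloaded = pd.read_csv(buf, sep='\t')
print(sorted(reloaded.columns))
```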
## Load MMBench

We provide a code snippet as an example of loading MMBench:
```python
import base64
import io

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


def decode_base64_to_image(base64_string):
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image


class MMBenchDataset(Dataset):

    def __init__(self, data_file, sys_prompt='There are several options:'):
        self.df = pd.read_csv(data_file, sep='\t')
        self.sys_prompt = sys_prompt

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        index = self.df.iloc[idx]['index']
        image = self.df.iloc[idx]['image']
        image = decode_base64_to_image(image)
        question = self.df.iloc[idx]['question']
        # The 'answer' column only exists in the dev split.
        answer = self.df.iloc[idx]['answer'] if 'answer' in self.df.columns else None
        category = self.df.iloc[idx]['category']
        l2_category = self.df.iloc[idx]['l2-category']
        option_candidate = ['A', 'B', 'C', 'D', 'E']
        options = {
            cand: self.load_from_df(idx, cand)
            for cand in option_candidate
            if self.load_from_df(idx, cand) is not None
        }
        options_prompt = f'{self.sys_prompt}\n'
        for key, item in options.items():
            options_prompt += f'{key}. {item}\n'
        hint = self.load_from_df(idx, 'hint')
        data = {
            'img': image,
            'question': question,
            'answer': answer,
            'options': options_prompt,
            'category': category,
            'l2-category': l2_category,
            'options_dict': options,
            'index': index,
            'context': hint,
        }
        return data

    def load_from_df(self, idx, key):
        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
            return self.df.iloc[idx][key]
        else:
            return None
```
## How to construct the inference prompt
```python
if data_sample['context'] is not None:
    prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
else:
    prompt = data_sample['question'] + ' ' + data_sample['options']
```
For example, given the question "Which category does this image belong to?" with options "A. Oil Painting B. Sketch C. Digital art D. Photo", the final prompt could be:

```
prompt = "###Human: Question: Which category does this image belong to? There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo ###Assistant:"
```
You can make custom modifications to the prompt.
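The rule above can be wrapped in a small self-contained helper. The sample dicts and the `build_prompt` name below are hypothetical, for illustration only:

```python
# Made-up samples with the fields produced by MMBenchDataset.
sample_with_ctx = {
    'context': 'Hint: the picture was taken indoors.',
    'question': 'Which category does this image belong to?',
    'options': 'There are several options:\nA. Oil Painting\nB. Sketch\n',
}
sample_no_ctx = dict(sample_with_ctx, context=None)


def build_prompt(data_sample):
    # Prepend the context only when the sample actually has one.
    if data_sample['context'] is not None:
        return data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
    return data_sample['question'] + ' ' + data_sample['options']


print(build_prompt(sample_with_ctx))
print(build_prompt(sample_no_ctx))
```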
## How to save results

You should dump your model's predictions into an Excel (.xlsx) file, and this file should contain the following fields:
- `question`: the question
- `A`: the first choice
- `B`: the second choice
- `C`: the third choice
- `D`: the fourth choice
- `prediction`: the prediction of your model for the current question
- `category`: the leaf category
- `l2_category`: the L2-level category
- `index`: the unique identifier of the current question
If there are any questions with fewer than four options, simply leave those fields blank.
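A minimal sketch of building and saving such a file is below. The questions, category names, and the `submission.xlsx` file name are made up for illustration; note that writing .xlsx with pandas requires an Excel engine such as openpyxl to be installed:

```python
import pandas as pd

# Hypothetical predictions for two questions; the second has only two
# options, so C and D are left blank (None) as required.
preds = pd.DataFrame([
    {'question': 'Which category does this image belong to?',
     'A': 'Oil Painting', 'B': 'Sketch', 'C': 'Digital art', 'D': 'Photo',
     'prediction': 'D', 'category': 'image_style',
     'l2_category': 'attribute_reasoning', 'index': 0},
    {'question': 'Is the sky visible in this image?',
     'A': 'Yes', 'B': 'No', 'C': None, 'D': None,
     'prediction': 'A', 'category': 'image_scene',
     'l2_category': 'coarse_perception', 'index': 1},
])

# Dump to .xlsx for submission (requires an Excel engine, e.g. openpyxl).
preds.to_excel('submission.xlsx', index=False)
```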