Manga109

MangaUB

MangaUB is a benchmark for evaluating the manga comprehension abilities of Large Multimodal Models (Vision-Language Models). Built on Manga109 and its derivative datasets, it includes manga understanding tasks that vary in difficulty and content. MangaUB is designed to assess both the recognition and understanding of content within a single panel and the comprehension of information conveyed across multiple panels, which enables a fine-grained analysis of the capabilities required for manga understanding.

Contents:

Questions: 6,585
Prompts: 18,179

Tasks:

Single-Panel Recognition:
- Location
- Time of Day
- Weather
- Character Count

Single-Panel Understanding:
- Onomatopoeia-Scene
- Emotion

Multi-Panel Recognition:
- Panel Localization

Multi-Panel Understanding:
- Next Panel Inference
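
The task taxonomy above can be written down as a small lookup table, which may be handy when grouping evaluation results by category. This is only a sketch: the names `MANGAUB_TASKS` and `tasks_by_scope` are hypothetical and are not part of any official MangaUB code. The category and task names are copied from the list above.

```python
# Task taxonomy as listed on this page. The dict layout and the helper
# below are illustrative assumptions, not an official MangaUB API.
MANGAUB_TASKS = {
    "Single-Panel Recognition": [
        "Location",
        "Time of Day",
        "Weather",
        "Character Count",
    ],
    "Single-Panel Understanding": ["Onomatopoeia-Scene", "Emotion"],
    "Multi-Panel Recognition": ["Panel Localization"],
    "Multi-Panel Understanding": ["Next Panel Inference"],
}


def tasks_by_scope(scope):
    """Return every task whose category name starts with `scope`,
    for example "Single-Panel" or "Multi-Panel"."""
    return [
        task
        for category, tasks in MANGAUB_TASKS.items()
        if category.startswith(scope)
        for task in tasks
    ]


if __name__ == "__main__":
    for category, tasks in MANGAUB_TASKS.items():
        print(f"{category}: {', '.join(tasks)}")
```

For example, `tasks_by_scope("Multi-Panel")` collects the tasks from both multi-panel categories, so per-scope scores can be aggregated without hard-coding task names in several places.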

Links:

GitHub repository
Paper