Deep Compositional Question Answering with Neural Module Networks

태그
작성일자
1 more property
Dec 15, 2018 deep-learning, computer-vision

WHY?

Visual question answering task is compositional in nature.

WHAT?

This paper tries to solve VQA by composing modeules to construct a network architecture based on a given question. Primitive modules that can be composed into any configuration of questions are defined: attention, re-attention, combination, classification, and measurement. The key component of the modules is attention mechanism that allow model to focus on the parts of the given image. A question is parsed to form an universal dependency representation which can be map into a network layout. Embedded question is combined to the end of network to capture subtle differences. The composed model is trained end-to-end.

So?

This model achieved the best performance on VQA and SHAPES datasets.