Deep Compositional Question Answering with Neural Module Networks

태그

작성일자

1 more property

Dec 15, 2018 deep-learning, computer-vision

WHY?

Visual question answering task is compositional in nature.

WHAT?

This paper tries to solve VQA by composing modeules to construct a network architecture based on a given question. Primitive modules that can be composed into any configuration of questions are defined: attention, re-attention, combination, classification, and measurement. The key component of the modules is attention mechanism that allow model to focus on the parts of the given image. A question is parsed to form an universal dependency representation which can be map into a network layout. Embedded question is combined to the end of network to capture subtle differences. The composed model is trained end-to-end.

So?

This model achieved the best performance on VQA and SHAPES datasets.

Andreas, Jacob, et al. “Neural module networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.