Few-shot Visual Reasoning with Meta-analogical Contrastive Learning
Youngsung Kim, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
While humans can solve a visual puzzle that requires logical reasoning after observing only a few samples, state-of-the-art deep reasoning models need to be trained on a large amount of data to reach similar performance on the same task. In this work, we propose to solve such few-shot (or low-shot) visual reasoning problems by resorting to analogical reasoning, a distinctively human ability to identify structural or relational similarity between two sets. Specifically, given training and test sets that contain the same type of visual reasoning problem, we extract the structural relationships between elements in both domains and encourage them to be as similar as possible through analogical learning. We repeatedly apply this process with slightly modified queries of the same problem, under the assumption that the modification does not affect the relationship between a training sample and a test sample. This allows the model to learn the relational similarity between the two samples effectively, even from a single pair of samples. We validate our method on the RAVEN dataset, where it outperforms the state-of-the-art method, with larger gains when training data is scarce. We further meta-learn our analogical contrastive learning model over the same tasks with diverse attributes, and show that it generalizes to the same visual reasoning problem with unseen attributes.
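To make the relational-similarity objective described above more concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss over relation vectors. It is an illustration under stated assumptions, not the formulation used in this work: the function names (`cosine`, `analogical_contrastive_loss`), the temperature value, and the use of cosine similarity are all hypothetical choices, and the relation vectors are assumed to have already been extracted from the training and test problems.

```python
import math


def cosine(u, v):
    """Cosine similarity between two relation vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)  # small constant guards against division by zero


def analogical_contrastive_loss(train_rel, test_rel, temperature=0.1):
    """Hypothetical InfoNCE-style loss.

    Assumption: train_rel[i] and test_rel[i] encode the same underlying
    relation (the positive pair); every other test vector is a negative.
    """
    losses = []
    for i, anchor in enumerate(train_rel):
        logits = [cosine(anchor, cand) / temperature for cand in test_rel]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the matching candidate.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy relation vectors (illustrative values only, not data from the paper).
relations = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = analogical_contrastive_loss(relations, relations)
shuffled = analogical_contrastive_loss(relations, relations[1:] + relations[:1])
print(aligned < shuffled)  # True: correctly matched relations give the lower loss
```

In this sketch, minimizing the loss pulls together the relation vectors of a training and a test problem that share the same structure and pushes apart those that do not; the slightly modified queries mentioned above would, under this assumption, simply contribute additional positive pairs.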