Akrit Mohapatra

I am a second-year M.S. student in the ECE department at Virginia Tech. I am a member of the Machine Learning and Perception (MLP) Lab, led by Prof. Dhruv Batra, and work closely with Prof. Devi Parikh.

I received my bachelor's degree in Computer Engineering from Virginia Tech in 2016.

Email  /  CV

Research

My broad research interests lie in deep learning, computer vision, and natural language processing. I am interested in developing AI systems that better enable human interaction and improve our perception of the world. I am also keen on exploring how we can build more interpretable AI systems, which is important for establishing understanding and trust.

The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Aroma Mahendru*, Viraj Prabhu*, Akrit Mohapatra*, Dhruv Batra, Stefan Lee
* equal contribution

[Project] [Code] [Dataset]

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017

In this paper, we make a simple observation that questions about images often contain premises -- objects and relationships implied by the question -- and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer purely based on learned language biases, resulting in nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e., not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel question relevance detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning.
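To make the notion of a premise concrete, here is a minimal sketch of premise-based relevance checking. It is an illustration under simplifying assumptions, not the paper's actual extraction pipeline: it treats the head noun of each non-interrogative noun chunk (via spaCy) as an object premise and assumes ground-truth object labels for the image are available. The full approach also covers relationship premises.

```python
# A minimal sketch of premise-based relevance checking, assuming ground-truth
# object labels for the image (e.g., COCO annotations). This is an
# illustration, not the paper's actual premise-extraction pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_object_premises(question: str) -> set:
    """Treat the head noun of each non-interrogative noun chunk as a
    candidate object premise."""
    doc = nlp(question)
    return {
        chunk.root.lemma_.lower()
        for chunk in doc.noun_chunks
        if not any(tok.tag_.startswith("W") for tok in chunk)  # drop wh-phrases
    }

def is_relevant(question: str, image_objects: set) -> bool:
    """A question is irrelevant if any of its object premises is missing
    from the image."""
    return extract_object_premises(question).issubset(image_objects)

# Asking about a dog in an image that contains only a cat and a sofa:
print(is_relevant("What color is the dog?", {"cat", "sofa"}))  # False
```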

Towards Transparent AI Systems: Interpreting Visual Question Answering Models
Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra

International Conference on Machine Learning (ICML) Workshop on Visualization for Deep Learning, 2016
Best Student Paper
Interactive Visualizations: Question and Image

In this paper, we experiment with two visualization methods -- guided backpropagation and occlusion -- to interpret deep learning models for the task of Visual Question Answering. Specifically, we identify which parts of the input (pixels in images or words in questions) the VQA model focuses on while answering a question about an image.
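As a rough sketch of the occlusion method (not the paper's code), the idea is to slide a gray patch over the image and record how much the model's confidence in its original answer drops; the answer_prob function below is a hypothetical stand-in for a trained VQA model.

```python
# A rough sketch of occlusion-based visualization, not the paper's exact code.
# `answer_prob(image, question)` is a hypothetical stand-in that returns the
# model's probability for its originally predicted answer.
import numpy as np

def occlusion_map(image, question, answer_prob, patch=16, stride=8, fill=0.5):
    """Slide a gray patch over the image (pixels assumed in [0, 1]); large
    probability drops mark regions the model relies on for its answer."""
    h, w = image.shape[:2]
    base = answer_prob(image, question)
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill  # gray out one patch
            heat[i, j] = base - answer_prob(occluded, question)
    return heat  # higher value => more important region for the answer
```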

CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service
Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal, Neelima Chavali, Prakriti Banik, Akrit Mohapatra, Ahmed Osman, Dhruv Batra

Book Chapter, Mobile Cloud Visual Media Computing
Editors: Gang Hua, Xian-Sheng Hua. Springer, 2015.
Website

We present a comprehensive system to provide access to state-of-the-art distributed computer vision algorithms as a cloud service through a Web Interface and APIs.
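As an illustration of the usage pattern such a service enables, a client might upload an image over HTTP and receive model predictions back. The endpoint URL and parameters below are hypothetical placeholders, not CloudCV's actual API.

```python
# A hypothetical sketch of calling a cloud vision service over HTTP; the
# endpoint and parameters below are illustrative, not CloudCV's actual API.
import requests

def classify_remote(image_path: str) -> dict:
    """Upload an image and get back predictions from a server-side model."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            "https://example.org/api/classify",   # hypothetical endpoint
            files={"image": f},
            data={"model": "vgg16", "topk": 5},   # hypothetical parameters
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()  # e.g., {"predictions": [...]}
```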

Work Experience

Research Intern
Creative Technologies Lab (CTL), Adobe Research

Course Projects

Exploring Nearest Neighbor Approach on VQA
Fall 2015: ECE 5554/4984 Computer Vision by Prof. Devi Parikh

Teaching

Fall 2016: ECE 4554/5554: Computer Vision
Graduate Teaching Assistant
Instructor: Prof. Jia-Bin Huang

Other Projects

VQA Visualization

Interpreting Visual Question Answering Models (image-side visualizations)

Interpreting Visual Question Answering Models (question-side visualizations)

Bibtex to JS
Modified bibtex-js. Upload a .bib file and the website renders the publications as HTML.

Website credit: the template is based on Jon Barron's website.