Renjie Liao
Graduate Student Supervision
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay before the latest theses are added.
There has been growing interest in solving Visual Question Answering (VQA) tasks that require the model to reason beyond the content present in the image. In this work, we focus on questions that require commonsense reasoning. In contrast to previous methods, which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using the Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases. We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues in a new pre-trained Vision-Language-Commonsense transformer model, VLC-BERT. Through our evaluation on the knowledge-intensive OK-VQA and A-OKVQA datasets, we show that VLC-BERT outperforms existing models that rely on static knowledge bases. Furthermore, through a detailed analysis, we explain which questions benefit from COMET's contextualized commonsense knowledge and which do not.
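As a rough illustration of the generate-select-encode pipeline described in this abstract, here is a minimal Python sketch. The names generate_commonsense, select_relevant, and VLCBertLike are hypothetical placeholders rather than the actual VLC-BERT or COMET code; the sketch only shows how COMET-style inferences could be generated, filtered, and fused with visual and textual cues.

# Hypothetical sketch of the generate-select-encode flow described above.
# generate_commonsense, select_relevant, and VLCBertLike are illustrative
# stand-ins, not the actual VLC-BERT or COMET implementation.
from dataclasses import dataclass
from typing import List

@dataclass
class Example:
    question: str
    image_features: List[float]  # visual cues, e.g. pooled region features

def generate_commonsense(question: str) -> List[str]:
    """Stand-in for querying COMET for contextualized inferences."""
    # In the thesis, COMET expands the question into commonsense inferences.
    return [f"inference about: {question}"]

def select_relevant(question: str, inferences: List[str], k: int = 3) -> List[str]:
    """Stand-in for ranking inferences by relevance to the question."""
    # A real system might score inferences with sentence embeddings.
    return inferences[:k]

class VLCBertLike:
    """Toy encoder that fuses textual, visual, and commonsense inputs."""
    def answer(self, ex: Example, knowledge: List[str]) -> str:
        fused = ex.question + " [SEP] " + " [SEP] ".join(knowledge)
        # A real model would run a vision-language transformer over the fused input.
        return f"answer(fused text length={len(fused)}, visual feats={len(ex.image_features)})"

if __name__ == "__main__":
    ex = Example(question="Why is the man carrying an umbrella?",
                 image_features=[0.1, 0.7, 0.3])
    inferences = generate_commonsense(ex.question)
    knowledge = select_relevant(ex.question, inferences)
    print(VLCBertLike().answer(ex, knowledge))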
In this thesis, we investigate the ability of neural networks, particularly Transformers, to reason and memorize. First, we focus on graph neural networks and Transformers, and analyze their performance on algorithmic reasoning tasks. We show that while models can achieve high accuracy on data from the same distribution as their training data, their performance drops significantly when faced with new, out-of-distribution data. We further show that even high performance on benchmark numbers may be misleading and true reasoning capability of these models remains limited. We identify several challenges involved in achieving true reasoning abilities and generalization to new data. We propose solutions to some of these challenges, including fixing input representation issues, hybrid models, and enlarging the training dataset. We also examine the expressivity of Transformers, providing a theoretical analysis of their ability to memorize data points. The results show a linear relationship between a Transformer's memory capacity and both the number of its attention heads as well as the input's context size.
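In symbols, the scaling result summarized above can be written as follows. This is a hedged restatement of the abstract's claim only; the symbols H (number of attention heads) and n (context size) are illustrative, and writing the joint dependence as a single product is an assumption here rather than the thesis's exact statement.

% Hedged restatement of the linear-scaling claim above; H and n are
% illustrative symbols, and the exact constants, assumptions, and statement
% are given in the thesis.
\[
  \text{memorization capacity} \;=\; \Theta(H \cdot n),
\]
where $H$ is the number of attention heads and $n$ is the input context size; holding either factor fixed, capacity grows linearly in the other.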