Research

PhD Thesis

I defended my PhD thesis, titled “Data-Efficient Learning On Structured Output Data”, on Nov 19, 2024, and have since joined Samsung AI Toronto. [defense slides]

Publications

Hallucination Score

Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution

Weiming Ren*, Raghav Goyal*, Zhiming Hu*, Tristan Ty Aumentado-Armstrong*, Iqbal Mohomed, Alex Levinshtein (* equal contribution)
arXiv. 2507.14367. [pdf]

Long Video MAE

Extending Video Masked Autoencoders to 128 frames

Nitesh B. Gundavarapu*, Luke Friedman*, Raghav Goyal*, Chaitra Hegde*, Eirikur Agustsson, Sagar M. Waghmare, Mikhail Sirotenko, Ming-Hsuan Yang, Tobias Weyand, Boqing Gong, Leonid Sigal (* equal contribution)
In NeurIPS. Vancouver, Canada. 2024. [pdf]

TAM-VT

TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking

Raghav Goyal*, Wan-Cyuan Fan*, Mennatullah Siam, Leonid Sigal (* equal contribution)
In WACV. Tucson, USA. 2025. [pdf] [project page]

MINOTAUR

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran
arXiv. 2302.08063. [pdf]

Weakly-Supervised Human-centric Relation Detection

A Simple Baseline for Weakly-Supervised Human-centric Relation Detection

Raghav Goyal, Leonid Sigal
In BMVC. Virtual. 2021. [pdf]

UniT

UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation

Siddhesh Khandelwal*, Raghav Goyal*, Leonid Sigal (* equal contribution)
In CVPR. Virtual. 2021. [pdf]

Improved Few-Shot Visual Classification

Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal
In CVPR. Seattle, USA. 2020. [pdf]

Evaluating visual common sense

Evaluating visual “common sense” using fine-grained classification and captioning tasks

Raghav Goyal, Farzaneh Mahdisoltani, Guillaume Berger, Waseem Gharbieh, Ingo Bax, Roland Memisevic
In ICLR Workshop. Vancouver, Canada. 2018. [pdf]

Something Something Video Database

The “something something” video database for learning and evaluating visual common sense

Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, *, Ingo Bax, Roland Memisevic (* see paper for additional authors)
In ICCV. Venice, Italy. 2017. [pdf] [supp] [code] [data]

Natural Language Generation

Natural Language Generation through Character-based RNNs with Finite-state Prior Knowledge

Raghav Goyal, Marc Dymetman, Eric Gaussier
In COLING. Osaka, Japan. 2016. [pdf]

ML Challenges

(Sep, 2018) Placed 3rd in the Visual Dialog challenge, hosted as part of the SIVL workshop at ECCV’18. Rankings can be found here.
(Jul, 2017) Placed 3rd in the Kinetics video recognition challenge, hosted by DeepMind at the ActivityNet workshop at CVPR’17. Our approach is detailed in this blog post.

Miscellaneous

Reviewer: ECCV’20, ICLR’21, CVPR’21