PhD Thesis

I defended my PhD thesis, “Data-Efficient Learning On Structured Output Data”, on Nov 19, 2024, and started at Samsung AI Toronto. [defense slides]


Publications

Hallucination Score

Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution

Weiming Ren*, Raghav Goyal*, Zhiming Hu*, Tristan Ty Aumentado-Armstrong*, Iqbal Mohomed, Alex Levinshtein (* equal contribution)
arXiv. 2507.14367. [pdf]

Long Video MAE

Extending Video Masked Autoencoders to 128 frames

Nitesh B. Gundavarapu*, Luke Friedman*, Raghav Goyal*, Chaitra Hegde*, Eirikur Agustsson, Sagar M. Waghmare, Mikhail Sirotenko, Ming-Hsuan Yang, Tobias Weyand, Boqing Gong, Leonid Sigal (* equal contribution)
In NeurIPS. Vancouver, Canada. 2024. [pdf]

TAM-VT

TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking

Raghav Goyal*, Wan-Cyuan Fan*, Mennatullah Siam, Leonid Sigal (* equal contribution)
In WACV. Tucson, USA. 2025. [pdf] [project page]

MINOTAUR

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran
arXiv. 2302.08063. [pdf]

Weakly-Supervised Human-centric Relation Detection

A Simple Baseline for Weakly-Supervised Human-centric Relation Detection

Raghav Goyal, Leonid Sigal
In BMVC. Virtual. 2021. [pdf]

UniT

UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation

Siddhesh Khandelwal*, Raghav Goyal*, Leonid Sigal (* equal contribution)
In CVPR. Virtual. 2021. [pdf]

Improved Few-Shot Visual Classification

Improved Few-Shot Visual Classification

Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal
In CVPR. Seattle, USA. 2020. [pdf]

Evaluating visual common sense

Evaluating visual “common sense” using fine-grained classification and captioning tasks

Raghav Goyal, Farzaneh Mahdisoltani, Guillaume Berger, Waseem Gharbieh, Ingo Bax, Roland Memisevic
In ICLR Workshop. Vancouver, Canada. 2018. [pdf]

Something Something Video Database

The “something something” video database for learning and evaluating visual common sense

Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, *, Ingo Bax, Roland Memisevic (* see paper for additional authors)
In ICCV. Venice, Italy. 2017. [pdf] [supp] [code] [data]

Natural Language Generation

Natural Language Generation through Character-based RNNs with Finite-state Prior Knowledge

Raghav Goyal, Marc Dymetman, Eric Gaussier
In COLING. Osaka, Japan. 2016. [pdf]


ML Challenges

  • (Sep 2018) Placed 3rd in the Visual Dialog challenge, hosted as part of the SIVL workshop at ECCV’18. Rankings can be found here.
  • (Jul 2017) Placed 3rd in the Kinetics video recognition challenge, hosted by DeepMind at the ActivityNet workshop at CVPR’17, with our approach detailed in this blog post.

Miscellaneous

  • Reviewer: ECCV’20, ICLR’21, CVPR’21