University of Toronto
My research draws heavily on the Bayesian Brain Theory in cognitive science, which posits that intelligent agents internalize the rules governing their environment by constantly predicting incoming sensory signals, and then use this understanding as a prior for interaction and planning. Visual information stands out as the most fundamental signal in this process. Accordingly, my work centers on controllable video generation and its downstream use as a world model, with a focus on equipping pretrained models to grasp 3D world dynamics and to explore autonomously through imagination.