Dian Chen

I am a fourth-year undergraduate at UC Berkeley, studying Computer Science and Applied Mathematics. I currently work with Prof. Sergey Levine, Prof. Pieter Abbeel, and Prof. Jitendra Malik as a research assistant in the Berkeley Artificial Intelligence Research (BAIR) Lab.

Email  /  CV  /  LinkedIn  /  GitHub


My research interests lie in robotics, computer vision, and machine learning, including reinforcement learning. I am currently interested in how agents and robots can acquire complex skills in a learning-based manner through imitation and interaction.

Learning Instance Segmentation by Interaction
Deepak Pathak*, Dian Chen*, Fred Shentu*, Pulkit Agrawal*, Trevor Darrell, Sergey Levine, Jitendra Malik (*equal contribution)
(In Review) European Conference on Computer Vision (ECCV), 2018

We present a robotic system that learns to segment its visual observations into individual objects by experimenting with its environment in a completely self-supervised manner. Our system is on par with state-of-the-art instance segmentation algorithms trained with strong supervision.

Zero-Shot Visual Imitation
Deepak Pathak*, Parsa Mahmoudieh*, Michael Luo*, Pulkit Agrawal*, Dian Chen, Fred Shentu, Evan Shelhamer, Jitendra Malik, Alexei Efros, Trevor Darrell (*equal contribution)
(Oral Presentation) International Conference on Learning Representations (ICLR), 2018
website / arxiv

We present a novel skill policy architecture and a dynamics consistency loss that extend visual imitation to more complex environments while improving robustness. Experimental results are shown on a robotic knot-tying task and a first-person visual navigation task.

Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation
Ashvin Nair*, Dian Chen*, Pulkit Agrawal*, Jitendra Malik, Pieter Abbeel, Sergey Levine (*equal contribution)
IEEE International Conference on Robotics and Automation (ICRA), 2017
website / arxiv

We present a system in which a robot takes as input a sequence of images of a human manipulating a rope from an initial to a goal configuration, and outputs a sequence of actions that reproduces the human demonstration, using only monocular images as input.