👋 About me

I am a final-year Ph.D. candidate in the School of Software Engineering at South China University of Technology, advised by Prof. Mingkui Tan and Prof. Chuang Gan. My goal is to develop an agent that can understand and interact with the multi-modal world. Toward this goal, my research mainly focuses on:

  • Embodied AI: Visual Navigation; Robot Manipulation
  • Multi-Modal Video Understanding: Self-Supervised Video Representation Learning; Temporal Action Localization; Visually-Aligned Sound Generation

I am currently seeking opportunities at companies specializing in embodied AI or multi-modal video understanding. If you have a suitable position available, please feel free to contact me.

🗞️ News

  • 2024.05: 3D-VLA is accepted by ICML 2024
  • 2024.02: Two papers are accepted by CVPR 2024
  • 2024.01: One paper is accepted by ICLR 2024
  • 2023.09: Two papers are accepted by NeurIPS 2023, and one is selected as a Spotlight!
  • 2023.09: Happy to join UMass Amherst as a visiting scholar working closely with Prof. Chuang Gan!
  • 2023.07: One paper is accepted by ICCV 2023!
  • 2023.06: Happy to join the MIT-IBM Watson AI Lab for an internship!
  • 2023.02: One paper is accepted by CVPR 2023!
  • 2023.02: The code for MGMap and ActiveCamera is now available.
  • 2022.11: Two NeurIPS 2022 papers are selected as Spotlight!
  • 2022.10: Two papers are accepted by NeurIPS 2022!
  • 2021.01: One paper is accepted by AAAI 2021!


📝 Publications

3D-VLA: 3D Vision-Language-Action Generative World Model

Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

ICML 2024

Pdf BibTex Project Page Code
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Yining Hong, Zishuo Zheng, Peihao Chen, Yian Wang, Junyan Li, Zhenfang Chen, Chuang Gan

CVPR 2024

Pdf BibTex Project Page
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation

Zeyuan Yang, Jiageng Liu, Peihao Chen, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan

CVPR 2024

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan

ICLR 2024

Pdf BibTex Project Page
A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan

NeurIPS Workshop 2023

3D-LLM: Injecting the 3D World into Large Language Models

Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

NeurIPS 2023 (Spotlight)

Pdf BibTex Project Page Code
FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation

Xinyu Sun, Peihao Chen, Jugang Fan, Jian Chen, Thomas H. Li, Mingkui Tan

NeurIPS 2023

Pdf BibTex
Learning Vision-and-Language Navigation from YouTube Videos

Kunyang Lin*, Peihao Chen*, Diwei Huang, Thomas H. Li, Mingkui Tan, Chuang Gan

ICCV 2023

Pdf BibTex
Masked Motion Encoding for Self-Supervised Video Representation Learning

Xinyu Sun*, Peihao Chen*, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan

CVPR 2023

Pdf BibTex
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Peihao Chen*, Dongyu Ji*, Kunyang Lin, Runhao Zeng, Thomas H. Li, Mingkui Tan, Chuang Gan

NeurIPS 2022 (Spotlight)

Pdf BibTex Project Page Code
Learning Active Camera for Multi-Object Navigation

Peihao Chen, Dongyu Ji, Kunyang Lin, Weiwen Hu, Wenbing Huang, Thomas H. Li, Mingkui Tan, Chuang Gan

NeurIPS 2022 (Spotlight)

Pdf BibTex Code
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

AAAI 2021

Pdf BibTex Code
Foley Music: Learning to Generate Music from Videos

Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

ECCV 2020

Pdf BibTex Project Page
Dense Regression Network for Video Grounding

Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan

CVPR 2020

Pdf BibTex Code
Location-aware Graph Convolutional Networks for Video Question Answering

Deng Huang*, Peihao Chen*, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan

AAAI 2020

Pdf BibTex Code
Self-supervised Moving Vehicle Tracking with Stereo Sound

Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

ICCV 2019

Pdf BibTex Project Page


Generating Visually Aligned Sound from Videos

Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan

TIP 2020

Pdf BibTex Code
Relation Attention for Temporal Action Localization

Peihao Chen, Chuang Gan, Guangyao Shen, Wenbing Huang, Runhao Zeng, Mingkui Tan


Pdf BibTex
Breaking Winner-Takes-All: Iterative-Winners-Out Networks for Weakly Supervised Temporal Action Localization

Runhao Zeng, Chuang Gan, Peihao Chen, Wenbing Huang, Qingyao Wu, Mingkui Tan

IEEE Trans. Image Processing 28(12) 2019

Pdf BibTex

🏆 Awards

  • 2023: The Principal’s Scholarship of SCUT
  • 2020: The Principal’s Scholarship of SCUT
  • 2018: The First Prize Scholarship of SCUT
  • 2017: The Second Prize of the NXP Cup National University Students Intelligent Car Race