👋 About me
I am a final-year Ph.D. candidate in the School of Software Engineering at South China University of Technology, advised by Prof. Mingkui Tan and Prof. Chuang Gan. I work on developing agents that can understand and interact with the multi-modal world. Toward this goal, my research mainly focuses on:
- Embodied AI: Visual Navigation; Robot Manipulation
- Multi-Modal Video Understanding: Self-Supervised Video Representation Learning; Temporal Action Localization; Visually-Aligned Sound Generation
I am currently seeking opportunities at a company specializing in embodied AI or multi-modal video understanding. If you have a suitable position available, please feel free to contact me.
🗞️ News
- 2024.05: 3D-VLA is accepted by ICML 2024
- 2024.02: Two papers are accepted by CVPR 2024
- 2024.01: One paper is accepted by ICLR 2024
- 2023.09: Two papers are accepted by NeurIPS 2023 and one is selected as Spotlight!
- 2023.09: Happy to join UMass Amherst as a visiting scholar working closely with Prof. Chuang Gan!
- 2023.07: One paper is accepted by ICCV 2023!
- 2023.06: Happy to join MIT-IBM Watson Lab for an internship!
- 2023.02: One paper is accepted by CVPR 2023!
- 2023.02: The code for MGMap and ActiveCamera is now available.
- 2022.11: Two NeurIPS 2022 papers are selected as Spotlight!
- 2022.10: Two papers are accepted by NeurIPS 2022!
- 2021.01: One paper is accepted by AAAI 2021!
Conferences
- 3D-VLA: 3D Vision-Language-Action Generative World Model. ICML 2024. Pdf / BibTex / Project Page / Code
- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World. CVPR 2024. Pdf / BibTex / Project Page
- RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation. CVPR 2024.
- CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding. ICLR 2024. Pdf / BibTex / Project Page
- A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models. NeurIPS Workshop 2023.
- 3D-LLM: Injecting the 3D World into Large Language Models. NeurIPS 2023 (Spotlight). Pdf / BibTex / Project Page / Code
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation. NeurIPS 2022 (Spotlight). Pdf / BibTex / Project Page / Code
- Foley Music: Learning to Generate Music from Videos. ECCV 2020. Pdf / BibTex / Project Page
- Self-supervised Moving Vehicle Tracking with Stereo Sound. ICCV 2019. Pdf / BibTex / Project Page
Journals
🏆 Awards
- 2023: The Principal’s Scholarship of SCUT
- 2020: The Principal’s Scholarship of SCUT
- 2018: The First Prize Scholarship of SCUT
- 2017: The Second Prize of the NXP Cup National University Students Intelligent Car Race