Qiong Cao

mathqiong2012@gmail.com

I am a Senior Research Scientist at JD Explore Academy, JD.com. Formerly I was a Senior Researcher at Tencent Youtu Lab. Before that, I was a Postdoctoral Researcher in the Visual Geometry Group (VGG) at the University of Oxford, where I worked with Prof. Andrew Zisserman.

I obtained my PhD in Computer Science from the University of Exeter under the supervision of Prof. Yiming Ying and Richard Everson.

My research interests are in computer vision and deep learning, specifically human-centric 2D/3D visual perception and multi-modal perception and generation across modalities such as vision, text, and audio.

CV / Google Scholar / LinkedIn

Publications

MotionTrack: Learning motion predictor for multiple object tracking.
Changcheng Xiao, Qiong Cao*, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, Dacheng Tao. (*corresponding author.)
Neural Networks, 2024.

MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models.
Kanxue Li, Baosheng Yu, Qi Zheng, Yibing Zhan, Yuhui Zhang, Tianle Zhang, Yijun Yang, Yue Chen, Lei Sun, Qiong Cao, Li Shen, Lusong Li, Dapeng Tao, Xiaodong He.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI), 2024.

Human-aware 3D Scene Generation with Spatially-constrained Diffusion Models.
Xiaolin Hong, Hongwei Yi, Fazhi He*, Qiong Cao*. (*corresponding authors.)
ArXiv preprint arXiv:2406.18159, 2024.
code / arXiv / project

MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model.
Changcheng Xiao*, Qiong Cao*, Zhigang Luo, Long Lan. (*equal contribution.)
The ACM Multimedia (ACM MM), Oral, 2024.

Towards Variable and Coordinated Holistic Co-Speech Motion Generation.
Yifei Liu*, Qiong Cao*, Yandong Wen, Huaiguang Jiang, Changxing Ding. (*equal contribution.)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
code / paper / project

Contact-aware Human Motion Generation from Textual Descriptions.
Sihan Ma, Qiong Cao, Jing Zhang, Dacheng Tao.
ArXiv preprint arXiv:2403.15709, 2024.

GraMMaR: Ground-aware Motion Model for 3D Human Reconstruction.
Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, Dacheng Tao.
The ACM Multimedia (ACM MM), 2023.
code / paper / project

Generating Holistic 3D Human Motion from Speech.
Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao*, Yandong Wen, Timo Bolkart, Dacheng Tao, Michael J Black*. (*corresponding authors.)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
TalkSHOW code / SHOW code / paper / project

TriDet: Temporal Action Detection with Relative Boundary Modeling.
Dingfeng Shi, Yujie Zhong, Qiong Cao*, Lin Ma, Jia Li, Dacheng Tao. (*corresponding author.)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

Learning Sequence Representations by Non-local Recurrent Neural Memory.
Wenjie Pei, Xin Feng, Canmiao Fu, Qiong Cao, Guangming Lu, Yu-Wing Tai.
International Journal of Computer Vision (IJCV), 2022.

View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums.
Conghao Wong, Beihao Xia, Ziming Hong, Qinmu Peng, Wei Yuan, Qiong Cao, Yibo Yang, Xinge You.
European Conference on Computer Vision (ECCV), 2022.

ReAct: Temporal Action Detection with Relational Queries.
Dingfeng Shi, Yujie Zhong, Qiong Cao*, Jing Zhang, Lin Ma, Jia Li*, Dacheng Tao. (*corresponding authors.)
European Conference on Computer Vision (ECCV), 2022.

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers.
Xianing Chen, Qiong Cao*, Yujie Zhong, Jing Zhang, Shenghua Gao*, Dacheng Tao. (*corresponding authors.)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

Push for center learning via orthogonalization and subspace masking for person re-identification.
Weinong Wang, Wenjie Pei, Qiong Cao, Shu Liu, Xiaoyong Shen, Yu-Wing Tai.
IEEE Transactions on Image Processing (TIP), 2021.

Non-local Recurrent Neural Memory for Supervised Sequence Modeling.
Canmiao Fu, Wenjie Pei, Qiong Cao, Chaopeng Zhang, Yong Zhao, Xiaoyong Shen, Yu-Wing Tai.
International Conference on Computer Vision (ICCV), 2019.

MMFace: A Multi-Metric Regression Network for Unconstrained Face Reconstruction.
Hongwei Yi, Chen Li, Qiong Cao, Xiaoyong Shen, Sheng Li, Guoping Wang, Yu-Wing Tai.
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Automated video face labelling for films and TV material.
Omkar Parkhi, Esa Rahtu, Qiong Cao, Andrew Zisserman.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

VGGFace2: A dataset for recognising faces across pose and age.
Qiong Cao, Li Shen, Weidi Xie, Omkar M Parkhi, Andrew Zisserman.
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2018.
Download VGGFace2

Template adaptation for face verification and identification.
Nate Crosswhite, Jeffrey Byrne, Omkar M Parkhi, Chris Stauffer, Qiong Cao, Andrew Zisserman.
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2017.

Generalization bounds for metric and similarity learning.
Qiong Cao, Zheng-Chu Guo, Yiming Ying.
Machine Learning, 2016.

Similarity metric learning for face recognition.
Qiong Cao, Yiming Ying, Peng Li.
International Conference on Computer Vision (ICCV), 2013.

Distance metric learning revisited.
Qiong Cao, Yiming Ying, Peng Li.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2012.

Doctoral Thesis