Diffusion Model is a Good Pose Estimator from 3D RF-Vision

School of Electrical and Electronic Engineering, Nanyang Technological University; School of Mechanical and Aerospace Engineering, Nanyang Technological University


Accepted by ECCV 2024

Visualization of mmDiff: a diffusion-based human pose estimator for RF-vision, showing enhanced stability when handling signal noise and miss-detection.

Abstract

Human pose estimation (HPE) from Radio Frequency vision (RF-vision) performs human sensing using RF signals that penetrate obstacles without revealing privacy (e.g., facial information). Recently, mmWave radar has emerged as a promising RF-vision sensor, providing radar point clouds by processing RF signals. However, mmWave radar has limited resolution and suffers from severe noise, leading to inaccurate and inconsistent human pose estimation. This work proposes mmDiff, a novel diffusion-based pose estimator tailored for noisy radar data.

In this paper, we aim to provide reliable guidance as conditions for the diffusion model. mmDiff addresses two key challenges: (1) miss-detection of parts of the human body, handled by a module that isolates feature extraction for different body parts, and (2) signal inconsistency due to environmental interference, tackled by incorporating prior knowledge of body structure and motion. Several modules are designed for these goals, and their features serve as the conditions for the subsequent diffusion model, mitigating the miss-detection and instability of RF-vision-based HPE. Extensive experiments demonstrate that mmDiff significantly outperforms existing methods, achieving state-of-the-art performance on public datasets.
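To make the conditional-diffusion idea concrete, below is a minimal PyTorch sketch of a radar-conditioned reverse-diffusion step for 3D joint coordinates. Everything in it (the toy MLP denoiser PoseDenoiser, the 17-joint layout, the 256-dimensional condition, and the DDPM noise schedule) is an illustrative assumption, not mmDiff's actual architecture.

```python
import torch
import torch.nn as nn

class PoseDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to 3D joints, given a radar condition."""
    def __init__(self, num_joints=17, cond_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 3 + cond_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_joints * 3),
        )

    def forward(self, noisy_pose, cond, k):
        # noisy_pose: (B, J, 3); cond: (B, cond_dim); k: (B,) diffusion step index
        b, j, _ = noisy_pose.shape
        x = torch.cat([noisy_pose.reshape(b, -1), cond, k.float().unsqueeze(-1)], dim=-1)
        return self.net(x).reshape(b, j, 3)  # predicted noise

def reverse_step(model, x_k, cond, k, betas):
    """One DDPM reverse step x_k -> x_{k-1}, guided by the radar condition."""
    beta_k = betas[k]
    alpha_k = 1.0 - beta_k
    alpha_bar_k = torch.cumprod(1.0 - betas, dim=0)[k]
    step = torch.full((x_k.size(0),), k, device=x_k.device)
    eps_hat = model(x_k, cond, step)
    mean = (x_k - beta_k / torch.sqrt(1.0 - alpha_bar_k) * eps_hat) / torch.sqrt(alpha_k)
    z = torch.randn_like(x_k) if k > 0 else torch.zeros_like(x_k)
    return mean + torch.sqrt(beta_k) * z

# Illustrative sampling loop: start from Gaussian noise and denoise K steps.
K = 50
betas = torch.linspace(1e-4, 0.02, K)
model = PoseDenoiser()
cond = torch.randn(8, 256)   # fused radar condition (see the fusion sketch below)
x = torch.randn(8, 17, 3)    # pure noise pose
for k in reversed(range(K)):
    x = reverse_step(model, x, cond, k, betas)
# x now holds the estimated 3D poses, shape (B, 17, 3)
```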

Motivation


Left: challenges of mmWave PCs. Right: the performance of an existing SOTA method (P4Transformer) compared to ours. The PCs' sparsity and dispersion cause inaccurate spine and shoulder estimates. Inconsistent PCs with occasional miss-detection further cause size variance and pose vibration. mmDiff proposes diffusion-based pose estimation with enhanced accuracy and stability.

System Architecture


mmDiff performs diffusion-based pose estimation with a conditional diffusion model, using mmWave radar information as conditions. $k \in \{0, \dots, K\}$ denotes the diffusion step. Four modules are proposed to extract more reliable guidance, addressing the PCs' noise and inconsistency respectively: GRC and LRC first extract more robust global and local radar features, $C^{glo}$ and $C^{loc}$; SLC and TMC then extract consistent human structure and motion patterns, $C^{lim}$ and $C^{tem}$.
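For illustration, the sketch below shows one possible way to fuse the four guidance features into the conditioning vector consumed by a denoiser such as the one sketched in the abstract section; the mean-pooling and concatenation here are our assumptions, not the paper's actual fusion mechanism.

```python
import torch

def build_condition(c_glo, c_loc, c_lim, c_tem):
    """Fuse the four guidance features into one conditioning vector.

    c_glo: (B, D)    global radar context from GRC
    c_loc: (B, J, D) per-joint local radar context from LRC
    c_lim: (B, D)    structural limb-length feature from SLC
    c_tem: (B, D)    temporal motion feature from TMC
    Returns: (B, 4*D) flat condition (illustrative fusion choice).
    """
    c_loc_pooled = c_loc.mean(dim=1)  # pool per-joint features over joints
    return torch.cat([c_glo, c_loc_pooled, c_lim, c_tem], dim=-1)
```

With D = 64 this happens to yield a 256-dimensional condition, matching the toy denoiser above; mmDiff's real conditioning dimensions and injection scheme may differ.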

Human Pose Estimation Results

Quantitative results


Qualitative results on the mmBody dataset


Qualitative results on the MM-Fi dataset


License

mmDiff is released under the CC BY-NC 4.0 license.

Citation

@article{fan2024diffusion,
      title={Diffusion Model is a Good Pose Estimator from 3D RF-Vision},
      author={Fan, Junqiao and Yang, Jianfei and Xu, Yuecong and Xie, Lihua},
      journal={arXiv preprint arXiv:2403.16198},
      year={2024}
}