MIME: Human-Aware 3D Scene Generation

CVPR 2023

Hongwei Yi, Chun-Hao P. Huang, Shashank Tripathi, Lea Hering, Justus Thies, Michael J. Black

Max Planck Institute for Intelligent Systems, Tübingen, Germany

Abstract

Generating realistic 3D worlds occupied by moving humans has many applications in games, architecture, and synthetic data creation. But generating such scenes is expensive and labor intensive. Recent work generates human poses and motions given a 3D scene. Here, we take the opposite approach and generate 3D indoor scenes given 3D human motion. Such motions can come from archival motion capture or from IMU sensors worn on the body, effectively turning human movement into a “scanner” of the 3D world. Intuitively, human movement indicates the free-space in a room and human contact indicates surfaces or objects that support activities such as sitting, lying or touching. We propose MIME (Mining Interaction and Movement to infer 3D Environments), which is a generative model of indoor scenes that produces furniture layouts that are consistent with the human movement. MIME uses an auto-regressive transformer architecture that takes the already generated objects in the scene as well as the human motion as input, and outputs the next plausible object. To train MIME, we build a dataset by populating the 3D-FRONT scene dataset with 3D humans. Our experiments show that MIME produces more diverse and plausible 3D scenes than a recent generative scene method that does not know about human movement. Code and data will be available for research at https://mime.is.tue.mpg.de.
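The intuition in the abstract (movement marks free space, contact marks supporting objects, and objects are added one at a time conditioned on what is already placed) can be illustrated with a toy sketch. This is not the MIME model: a hand-written rejection sampler stands in for the learned auto-regressive transformer, furniture is reduced to 2D boxes on the floor plane, and every name (`Box`, `next_object`, `generate_layout`), object size, and room dimension below is an assumption made purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned furniture footprint on the floor plane (hypothetical, in metres)."""
    label: str
    cx: float
    cy: float
    w: float
    d: float

    def contains(self, x, y):
        return abs(x - self.cx) <= self.w / 2 and abs(y - self.cy) <= self.d / 2

    def overlaps(self, other):
        return (abs(self.cx - other.cx) < (self.w + other.w) / 2
                and abs(self.cy - other.cy) < (self.d + other.d) / 2)


def next_object(placed, free_points, contact_points, room=4.0, rng=random, tries=2000):
    """Propose the next object given the objects placed so far and the motion cues.

    Stand-in for the learned model: returns None when no valid proposal is
    found, which ends generation.
    """
    # Contact points not yet covered by a placed object still need support.
    pending = [p for p in contact_points if not any(b.contains(*p) for b in placed)]
    for _ in range(tries):
        if pending:
            # Place a supporting object (e.g. a chair) around the contact point.
            x, y = pending[0]
            w, d = rng.uniform(0.5, 0.8), rng.uniform(0.5, 0.8)
            cand = Box("contact", x + rng.uniform(-0.1, 0.1), y + rng.uniform(-0.1, 0.1), w, d)
        else:
            # Otherwise propose a non-contact object somewhere inside the room.
            w, d = rng.uniform(0.4, 1.2), rng.uniform(0.4, 1.2)
            cand = Box("non-contact", rng.uniform(w / 2, room - w / 2),
                       rng.uniform(d / 2, room - d / 2), w, d)
        blocks_path = any(cand.contains(x, y) for x, y in free_points)
        collides = any(cand.overlaps(b) for b in placed)
        if not blocks_path and not collides:
            return cand
    return None


def generate_layout(free_points, contact_points, max_objects=6, seed=0):
    """Add objects one at a time, each conditioned on the objects already placed."""
    rng = random.Random(seed)
    placed = []
    while len(placed) < max_objects:
        obj = next_object(placed, free_points, contact_points, rng=rng)
        if obj is None:
            break
        placed.append(obj)
    return placed


# Toy motion: walk along y = 2 m, then sit down at (3.3, 2.0).
walk = [(0.5 + 0.1 * i, 2.0) for i in range(21)]
sit = [(3.3, 2.0)]
layout = generate_layout(walk, sit)
for box in layout:
    print(box)
```

Changing `seed` yields different layouts for the same motion, which loosely mirrors the claim that one motion sequence can give rise to various rooms. In the actual method the proposal step is learned from data rather than sampled uniformly.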

Given one motion sequence, MIME can generate various rooms.

MIME can be trained to generate different types of rooms separately.

Illustrative Video

Paper

Code and Data

Code is released.

Data: please download through our download page.

Acknowledgement & Disclosure

Acknowledgement. We thank Despoina Paschalidou and Wamiq Para for useful feedback about the reimplementation of ATISS; Yuliang Xiu, Weiyang Liu, Yandong Wen, and Yao Feng for insightful discussions; and Benjamin Pellkofer for IT support. This work was supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039B.

Disclosure. MJB has received research gift funds from Adobe, Intel, Nvidia, Meta/Facebook, and Amazon. MJB has financial interests in Amazon, Datagen Technologies, and Meshcapade GmbH. JT has received research gift funds from Microsoft Research.

Citation

@inproceedings{yi2022mime,
  title     = {{MIME}: Human-Aware {3D} Scene Generation},
  author    = {Yi, Hongwei and Huang, Chun-Hao P. and Tripathi, Shashank and Hering, Lea and Thies, Justus and Black, Michael J.},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {12965--12976},
  month     = {June},
  year      = {2023}
}

Contact

For questions, please contact mime@tue.mpg.de

For commercial licensing, please contact ps-licensing@tue.mpg.de