RoHM

Robust Human Motion Reconstruction via Diffusion

Project Page | Paper

RoHM is a novel diffusion-based motion model that, conditioned on noisy and occluded input data, reconstructs complete, plausible motions in consistent global coordinates. We decompose the problem into two sub-tasks and learn two models, one for the global trajectory and one for the local motion. To capture the correlations between the two, we then introduce a novel conditioning module and combine it with an iterative inference scheme.
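
As a rough illustration of this decomposition and the iterative inference scheme, the sketch below alternates between a trajectory model and a local-pose model, each conditioned on the other's current estimate; the names (trajnet, posenet, cond_*, num_iters) are placeholders for illustration and do not correspond to this repository's actual API.

# Conceptual sketch only: alternate between denoising the global
# trajectory and the local body pose, each conditioned on the other.
def iterative_reconstruction(noisy_traj, noisy_pose, trajnet, posenet, num_iters=2):
    traj, pose = noisy_traj, noisy_pose
    for _ in range(num_iters):
        # Trajectory model refines the global trajectory, conditioned
        # on the current local-pose estimate.
        traj = trajnet(traj, cond_pose=pose)
        # Pose model refines the local body pose, conditioned on the
        # refined global trajectory.
        pose = posenet(pose, cond_traj=traj)
    return traj, pose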

Installation

Create a clean conda environment and install all dependencies by:

conda env create -f environment.yml

After the installation is complete, activate the conda environment by:

conda activate rohm

Data preparation

AMASS

  • Download the SMPL-X neutral annotations from the AMASS dataset and unzip the files.
  • To preprocess the raw AMASS data into the format used by RoHM, run the following script for each subset, where dataset_name is the name of the subset. The processed data will be saved to datasets/AMASS_smplx_preprocessed.
python preprocessing_amass.py --dataset_name=SUBSET_NAME --amass_root=PATH/TO/AMASS --save_root=datasets/AMASS_smplx_preprocessed
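
For example, assuming the downloaded AMASS data contains a subset folder named CMU under PATH/TO/AMASS (the subset name here is only an example), the call would be:

python preprocessing_amass.py --dataset_name=CMU --amass_root=PATH/TO/AMASS --save_root=datasets/AMASS_smplx_preprocessed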

PROX

Download the following contents for the PROX dataset:

  • cam2world, calibration and recordings from the official PROX dataset
  • keypoints_openpose and mask_joint from here
  • and organize the contents as below:
PROX
โ”œโ”€โ”€ cam2world
โ”œโ”€โ”€ calibration
โ”œโ”€โ”€ recordings
โ”œโ”€โ”€ keypoints_openpose
โ”œโ”€โ”€ mask_joint

EgoBody

Download the following contents for the EgoBody dataset:

  • kinect_color, data_splits.csv, calibrations, kinect_cam_params, smplx_camera_wearer_*, smplx_interactee_* from the official EgoBody dataset
  • keypoints_cleaned, mask_joint and egobody_rohm_info.csv from here
  • and organize the contents as below:
EgoBody
โ”œโ”€โ”€ kinect_color
โ”œโ”€โ”€ data_splits.csv
โ”œโ”€โ”€ smplx_camera_wearer_train
โ”œโ”€โ”€ smplx_camera_wearer_test
โ”œโ”€โ”€ smplx_camera_wearer_val
โ”œโ”€โ”€ smplx_interactee_train
โ”œโ”€โ”€ smplx_interactee_test
โ”œโ”€โ”€ smplx_interactee_val
โ”œโ”€โ”€ calibrations
โ”œโ”€โ”€ kinect_cam_params
โ”œโ”€โ”€ keypoints_cleaned
โ”œโ”€โ”€ mask_joint
โ”œโ”€โ”€ egobody_rohm_info.csv

egobody_rohm_info.csv lists the EgoBody recordings that we use for the evaluation of RoHM.

SMPL-X body model

Download the SMPL-X body model from here. Note that the latest version is 1.1, while we use version 1.0 in the implementation.

Download the SMPL-X vertex segmentation file smplx_vert_segmentation.json from here.
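
As a quick sanity check that the body model is in place (following the folder layout shown below), you can load the neutral SMPL-X model with the smplx Python package; this snippet is only a suggestion and assumes smplx and torch are available in the rohm environment.

import torch
import smplx

# Load the neutral SMPL-X model from data/body_models/smplx_model/smplx/.
model = smplx.create('data/body_models/smplx_model', model_type='smplx', gender='neutral')
output = model(betas=torch.zeros(1, 10), return_verts=True)
print(output.vertices.shape)  # SMPL-X has 10475 vertices: torch.Size([1, 10475, 3])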

Other data (checkpoints, input noise for evaluation, etc.)

Download the model checkpoints from here and unzip. Download eval_noise_smplx (pre-computed motion noise for evaluation on AMASS) from here and unzip.

Organize all downloaded data as below:

RoHM
โ”œโ”€โ”€ data
โ”‚   โ”œโ”€โ”€ body_models
โ”‚   โ”‚   โ”œโ”€โ”€ smplx_model
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ smplx
โ”‚   โ”œโ”€โ”€ checkpoints
โ”‚   โ”œโ”€โ”€ eval_noise_smplx
โ”‚   โ”œโ”€โ”€ smplx_vert_segmentation.json
โ”œโ”€โ”€ datasets
โ”‚   โ”œโ”€โ”€ AMASS_smplx_preprocessed
โ”‚   โ”œโ”€โ”€ PROX
โ”‚   โ”œโ”€โ”€ EgoBody

Training

RoHM is trained on the AMASS dataset.

TrajNet Training

Train the vanilla TrajNet with a curriculum training scheme in three stages, with increasing noise ratios:

python train_trajnet.py --config=cfg_files/train_cfg/trajnet_train_vanilla_stage1.yaml 
python train_trajnet.py --config=cfg_files/train_cfg/trajnet_train_vanilla_stage2.yaml --pretrained_model_path=PATH/TO/MODEL
python train_trajnet.py --config=cfg_files/train_cfg/trajnet_train_vanilla_stage3.yaml --pretrained_model_path=PATH/TO/MODEL

For stages 2 and 3, set pretrained_model_path to the trained checkpoint from the previous stage. To obtain the reported checkpoint, we train for 800k/400k/450k steps for stages 1/2/3, respectively.

TrajNet fine-tuning with TrajControl:

python train_trajnet.py --config=cfg_files/train_cfg/trajnet_ft_trajcontrol.yaml --pretrained_backbone_path=PATH/TO/MODEL

Set pretrained_backbone_path to the pre-trained checkpoint of the vanilla TrajNet; we train for 400k steps to obtain the reported checkpoint.

PoseNet Training

Train PoseNet with a curriculum training scheme in two stages, with increasing noise ratios:

python train_posenet.py --config=cfg_files/train_cfg/posenet_train_stage1.yaml
python train_posenet.py --config=cfg_files/train_cfg/posenet_train_stage2.yaml --pretrained_model_path=PATH/TO/MODEL

For stage 2, set pretrained_model_path to the trained checkpoint from the previous stage. To obtain the reported checkpoint, we train for 300k/200k steps for stages 1/2, respectively.

Test and evaluate on AMASS

Test on AMASS

Test on AMASS with different configurations (corresponding to Tab. 1 in the paper) and save the reconstructed results to test_results/results_amass_full. Note that running the given configurations with the same random seed cannot guarantee exactly the same numbers across different machines; however, the stochasticity is quite small.

  • Input noise level 3, and mask out 10% of the frames (masking out both trajectory and local body pose):
python test_amass_full.py --config=cfg_files/test_cfg/amass_occ_0.1_noise_3.yaml
  • Input noise level 3, and mask out lower body joints:
python test_amass_full.py --config=cfg_files/test_cfg/amass_occ_leg_noise_3.yaml
  • Input noise level 5, and mask out lower body joints:
python test_amass_full.py --config=cfg_files/test_cfg/amass_occ_leg_noise_5.yaml
  • Input noise level 7, and mask out lower body joints:
python test_amass_full.py --config=cfg_files/test_cfg/amass_occ_leg_noise_7.yaml

Evaluate on AMASS

Calculate the evaluation metrics and visualize/render the reconstructed results on AMASS.

  • Input noise level 3, and mask out 10% of the frames (masking out both trajectory and local pose):
python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_0.1_noise_3.yaml --saved_data_path=PATH/TO/TEST/RESULTS
  • Input noise level 3, and mask out lower body joints:
python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_leg_noise_3.yaml --saved_data_path=PATH/TO/TEST/RESULTS
  • Input noise level 5, and mask out lower body joints:
python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_leg_noise_5.yaml --saved_data_path=PATH/TO/TEST/RESULTS
  • Input noise level 7, and mask out lower body joints:
python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_leg_noise_7.yaml --saved_data_path=PATH/TO/TEST/RESULTS

Other flags for visualization and rendering:

  • --visualize=True: visualize input/output/GT motions with open3d (with both skeletons and body meshes)
  • --render=True: render the input/output/GT motions with pyrender and save rendered results to --render_save_path
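
For example, to both visualize and render the noise-level-3 lower-body-occlusion results in one run (all paths below are placeholders):

python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_leg_noise_3.yaml --saved_data_path=PATH/TO/TEST/RESULTS --visualize=True --render=True --render_save_path=PATH/TO/RENDER/OUTPUT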

Test and evaluate on PROX/EgoBody

Corresponds to the experiment setups in Tab. 2 and Tab. 3 in the paper.

Initialization

To obtain the initial (noisy and partially visible) motions, we use the following options:

  • For RGB-based reconstruction on PROX, we obtain the initial body pose from CLIFF, body shape from PIXIE, and global translation / orientation from MeTRAbs.
  • For RGBD-based reconstruction on PROX, we obtain the initial motion from per-frame optimization by adapted code from LEMO.
  • For RGB-based reconstruction on EgoBody, we obtain the initial motion from VPoser-t using the code from HuMoR.

TODO: we might later provide our preprocessed initial & final motion sequences, or scripts to obtain the initializations (currently undergoing license review). For now, we provide:

  • A sample sequence of initialized motion on PROX from LEMO (see the 'Additional Data' section in LEMO).
  • A sample sequence of initialized motion on EgoBody from the EgoBody dataset (download rohm_init_egobody_rgb.zip from the EgoBody dataset).
  • With the sample init motions above, or your own customized input, please prepare your initial SMPL-X sequences following the data format below:
RoHM
โ”œโ”€โ”€ data
โ”‚   โ”œโ”€โ”€ init_motions
โ”‚   โ”‚   โ”œโ”€โ”€ init_egobody_rgb
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ RECORDING_NAME
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ body_idx_0/1
โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ results
โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ frame_xxxxx
โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ 000.pkl
โ”‚   โ”‚   โ”œโ”€โ”€ init_prox_rgb
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ RECORDING_NAME
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ results
โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ s00x_frame_xxxxx_...
โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ 000.pkl
โ”‚   โ”‚   โ”œโ”€โ”€ init_prox_rgbd
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ same as in init_prox_rgb...

where 000.pkl follows the same format as in the PROX and EgoBody datasets for saving SMPL-X parameters. Note that for the following scripts, the initial motions should have the z-axis up for PROX and the y-axis up for EgoBody.
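
If you are preparing your own initialization, a minimal way to check a per-frame file is to load one 000.pkl and print the stored SMPL-X parameters; the exact key set depends on the fitting pipeline that produced it, and the path below is just a placeholder.

import pickle
import numpy as np

# Placeholder path; point this at one 000.pkl from your init_motions tree.
pkl_path = 'data/init_motions/init_prox_rgb/RECORDING_NAME/results/FRAME_NAME/000.pkl'
with open(pkl_path, 'rb') as f:
    # If the file was pickled with Python 2, try pickle.load(f, encoding='latin1').
    params = pickle.load(f)

# Print each stored SMPL-X parameter (e.g. transl, global_orient, body_pose, betas) and its shape.
for key, value in params.items():
    try:
        print(key, np.asarray(value).shape)
    except Exception:
        print(key, type(value))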

Test on PROX/EgoBody

  • Test on PROX with RGB-D input (initialization sequences obtained by per-frame optimization), and results will be saved to test_results/results_prox_rgbd:
python test_prox_egobody.py --config=cfg_files/test_cfg/prox_rgbd.yaml --recording_name=RECORDING_NAME
  • Test on PROX with RGB input (initialization sequences obtained by regressors), and results will be saved to test_results/results_prox_rgb:
python test_prox_egobody.py --config=cfg_files/test_cfg/prox_rgb.yaml --recording_name=RECORDING_NAME
  • Test on EgoBody with RGB input (initialization sequences obtained by VPoser-t as in HuMoR), and results will be saved to test_results/results_egobody_rgb:
python test_prox_egobody.py --config=cfg_files/test_cfg/egobody_rgb.yaml --recording_name=RECORDING_NAME

Evaluate on PROX/EgoBody

Calculate the evaluation metrics and visualize/render the reconstructed results on PROX/EgoBody.

  • Evaluate on PROX with RGB-D input:
python eval_prox_egobody.py --config=cfg_files/eval_cfg/prox_rgbd.yaml --saved_data_dir=PATH/TO/TEST/RESULTS --recording_name=RECORDING_NAME
  • Evaluate on PROX with RGB input:
python eval_prox_egobody.py --config=cfg_files/eval_cfg/prox_rgb.yaml --saved_data_dir=PATH/TO/TEST/RESULTS --recording_name=RECORDING_NAME
  • Evaluate on EgoBody with RGB input:
python eval_prox_egobody.py --config=cfg_files/eval_cfg/egobody_rgb.yaml --saved_data_dir=PATH/TO/TEST/RESULTS --recording_name=RECORDING_NAME

Note: recording_name can be set to:

  • a sequence recording name: evaluation is done over that particular sequence.
  • 'all': the evaluation is done over all sequences in the subset (used to report numbers in the paper).
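
For example, to evaluate the PROX RGB-D results over all sequences at once:

python eval_prox_egobody.py --config=cfg_files/eval_cfg/prox_rgbd.yaml --saved_data_dir=PATH/TO/TEST/RESULTS --recording_name=all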

Other flags for visualization and rendering:

  • --visualize=True: visualize input/output/GT motions with open3d
    • --vis_option=mesh: visualize body
    • --vis_option=skeleton: visualize skeleton
  • --render=True: render the input/output/GT motions with pyrender and save rendered results to --render_save_path

Customized Input

If you want to run RoHM on your customized input:

  • Step 1: prepare the initial SMPL-X sequences following the data format as in data/init_motions
  • Step 2: prepare the joint occlusion mask following the data format as in datasets/PROX/mask_joint
    • If you have the 3D scene mesh, render a depth map of the scene from the camera view and determine whether each 3D joint is occluded by comparing depth values (we use utils/get_occlusion_mask.py to obtain the occlusion masks on the PROX dataset)
    • If you do not have the 3D scene mesh, you can use confidence scores from OpenPose or other 2D body detection methods and mark joints with low confidence as occluded (see the sketch after this list)
  • Step 3: Customized canonicalization depending on the coordinate system:
    • The current implementation supports canonicalization for initial sequences with the y axis (EgoBody) or z axis (PROX/AMASS) up; the canonicalized sequences always have the z axis up
    • If your initial input sequences do not follow this, you first need to apply a proper transformation to obtain sequences with the z/y axis up
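
For the 2D-confidence-based option in Step 2, the sketch below shows the basic idea for a single OpenPose BODY_25 detection file; the confidence threshold and the output format are assumptions, so adapt the result to the per-joint mask format used in datasets/PROX/mask_joint.

import json
import numpy as np

def occlusion_mask_from_openpose(json_path, conf_thresh=0.3):
    # Returns a boolean array of shape (25,) for the OpenPose BODY_25 layout,
    # where True means the joint is treated as occluded.
    with open(json_path) as f:
        data = json.load(f)
    if not data['people']:
        # No person detected in this frame: mark all joints as occluded.
        return np.ones(25, dtype=bool)
    # pose_keypoints_2d is a flat list of (x, y, confidence) triplets.
    keypoints = np.array(data['people'][0]['pose_keypoints_2d']).reshape(-1, 3)
    return keypoints[:, 2] < conf_thresh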

License

The majority of RoHM is licensed under CC-BY-NC (including the released checkpoints); however, portions of the project are available under separate license terms.

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{zhang2024rohm,
   title={RoHM: Robust Human Motion Reconstruction via Diffusion},
   author={Zhang, Siwei and Bhatnagar, Bharat Lal and Xu, Yuanlu and Winkler, Alexander and Kadlecek, Petr and Tang, Siyu and Bogo, Federica},
   booktitle={CVPR},
   year={2024}
 }
