ECCV 2018 Person Re-Identification Paper Reading

Pose-Normalized Image Generation for Person Re-Identification

[1]

Overview: The authors use a GAN that takes pose information to transfer a real person image into a new image with a target pose. With the inputs normalized to the same pose, the feature extractor can learn pose-invariant features.

Motivation:

  • Make the learned features pose-insensitive. Since the inputs are normalized to the same pose, the feature extractor can learn discriminative features regardless of pose.
  • Use eight canonical poses, obtained by k-means clustering on the specific dataset, to make the normalized poses more representative.

Model:

  • A GAN to generate a pose-specific image with the same identity as the input (see the loss sketch after this list). The loss function for the generator:
    $$ \mathcal{L}_{G_p} = \mathcal{L}_{GAN} + \lambda_1 \cdot \mathcal{L}_{L_1}, $$
    and
    $$ \mathcal{L}_{GAN} = \mathbb{E}_{\mathbf{I}_j \sim p_{data}(\mathbf{I}_j)}\left[ \log D_p(\mathbf{I}_j) + \log\big( 1 - D_p( G_p(\mathbf{I}_j, \mathbf{I}_{P_j}) ) \big) \right] $$
    $$ \mathcal{L}_{L_1} = \mathbb{E}_{\mathbf{I}_j \sim p_{data}(\mathbf{I}_j)}\left[ \| \mathbf{I}_j - \hat{\mathbf{I}}_j \|_1 \right] $$
    The discriminator loss is the same as in conventional GANs.
  • Use the k-means algorithm to obtain the 8 most representative poses (see the clustering sketch below), and then train another network to handle person images in these 8 normalized poses.
  • At evaluation time, fuse nine features: one from the original image and eight from the pose-normalized images. The fusion is an element-wise max over these features (see the fusion sketch below).
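
A minimal PyTorch sketch of the generator objective written above, purely illustrative: `G_p`, `D_p`, the argument layout, and the value of `lambda_1` are assumptions rather than the authors' released code, and `D_p` is assumed to output raw real/fake logits.

```python
import torch
import torch.nn.functional as F

def generator_loss(G_p, D_p, img, target_pose, lambda_1=10.0):
    """Sketch of L_{G_p} = L_GAN + lambda_1 * L_{L1} from the note above.
    All names and the lambda_1 value are placeholders (assumptions)."""
    fake = G_p(img, target_pose)   # generated image \hat{I}_j
    logits = D_p(fake)             # D_p assumed to return raw logits
    # Non-saturating adversarial term: push D_p to score the fake as real.
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # L1 term between the generated image and the input, as in L_{L1} above.
    l1 = F.l1_loss(fake, img)
    return adv + lambda_1 * l1
```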
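
A minimal sketch of how the eight canonical poses could be obtained with k-means, assuming pose keypoints have already been extracted for the training set; the array layout and file name are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical input: one row per training image, each row the flattened
# (x, y) coordinates of the detected body keypoints.
poses = np.load("train_poses.npy")  # placeholder file name

kmeans = KMeans(n_clusters=8, random_state=0).fit(poses)
canonical_poses = kmeans.cluster_centers_  # the 8 canonical (normalized) poses
```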
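
And a sketch of the test-time fusion, assuming one extractor for the original image and another for the pose-normalized images (the two base networks discussed below); all names are placeholders.

```python
import torch

def fuse_features(orig_extractor, pose_extractor, original_img, normalized_imgs):
    """Element-wise max over nine feature vectors: one from the original image
    plus eight from the pose-normalized images (names are placeholders)."""
    feats = [orig_extractor(original_img)]
    feats += [pose_extractor(img) for img in normalized_imgs]  # 8 normalized poses
    stacked = torch.stack(feats, dim=0)  # shape: (9, feature_dim)
    return stacked.max(dim=0).values     # fused descriptor used for matching
```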

Question:

  • There is no explicit loss for the discriminator to judge the generated pose. The discriminator only takes the generated image and tells whether it is real (from the dataset) or fake, so the generator could ignore the input pose information and learn like a regular GAN generator.
  • Because of the $\mathcal{L}_{L_1}$ term in the generator loss, the generator could be optimized to simply reproduce the input image. Combined with the point above, the pose information could be ignored entirely.
  • Since there are two base networks whose features are fused at the end, how is it guaranteed that the two learned features share the same semantic meaning?
  • More evaluation of the model's individual components is needed, as well as more example images generated by PN-GAN.

To Be Continued…

Reference


  1. X. Qian, Y. Fu, T. Xiang, W. Wang, J. Qiu, Y. Wu, Y.-G. Jiang, X. Xue. Pose-Normalized Image Generation for Person Re-identification. ECCV 2018.