
Training on custom data #12

Closed
waleedrazakhan92 opened this issue Jan 20, 2023 · 24 comments

@waleedrazakhan92

Hello, can you share the details on how to train the model on custom data? Having gone through the paper, I believe we need:

  1. Input images (custom images)
  2. Masks (produced with face parsing, skin only)
  3. Depth masks (the same masks as above, but without the nose and mouth)
  4. Depth maps (how are these produced?)
  5. Albedo maps (using SfSNet), but you mention in your paper that you convert them to grayscale first. Did I get that correctly?
  6. Lighting directions. What are these and how do I obtain them?

Is there anything else needed to train the model?
If not, then can you please direct me towards how to find the missing data (4, 5, and 6) to train on my custom images?

@andrewhou1
Owner

To produce the depth maps, you can use https://github.com/zqbai-jeremy/DFNRMVS

The depth masks correspond to any pixel that has a valid face depth.

Yes, the albedo is converted first to grayscale.

The lighting directions are also produced by SfSNet: I use the first-order coefficients (2-4), normalize them, and treat that as the lighting direction.
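For concreteness, here is a minimal sketch of the depth-mask and grayscale-albedo steps; the file paths and the assumption that invalid pixels hold zero depth are illustrative only, not taken from the actual training code:

import cv2
import numpy as np

# Hypothetical inputs: a per-pixel face depth map (e.g. from DFNRMVS) and an
# RGB albedo estimate from SfSNet. The paths and the zero-means-invalid depth
# convention are assumptions for illustration.
depth = np.load('face_depth.npy')              # HxW float depth map
albedo_rgb = cv2.imread('sfsnet_albedo.png')   # HxWx3 albedo estimate

# Depth mask: any pixel that has a valid face depth.
depth_mask = (depth > 0).astype(np.uint8)

# The albedo is converted to grayscale before training.
albedo_gray = cv2.cvtColor(albedo_rgb, cv2.COLOR_BGR2GRAY)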

@waleedrazakhan92
Author

@andrewhou1 thank you for the quick response. Also, if I want to change the resolution of the model from 256 to 512 or 1024, what changes do I need to make in the model, besides self.img_height and self.img_width, to incorporate this change in size?

@andrewhou1
Owner

andrewhou1 commented Jan 21, 2023

Yes, those definitely need to be changed. Also, change all instances of 256 to your new resolution. There's also this line:

sample_increments = torch.reshape(torch.tensor(np.arange(0.025, 0.825, 0.005)), (self.num_sample_points, 1, 1, 1))

You may want to increase self.num_sample_points given the larger resolution and adjust np.arange accordingly to match.
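For example, a minimal sketch of how that line might look at 1024 resolution if you keep the same sampling density along each ray (a 4x increase over 256, as discussed further below); the exact counts are assumptions based on the np.arange call quoted above:

import numpy as np
import torch

# At 256 resolution, np.arange(0.025, 0.825, 0.005) yields 160 sample points.
# To keep the same per-ray sampling density at 1024 (4x the resolution),
# quadruple the count over the same [0.025, 0.825) range and set
# self.num_sample_points to match.
num_sample_points = 640  # assumed to be 160 in the 256-resolution model

# linspace with endpoint=False produces exactly num_sample_points values,
# so the reshape below always matches.
sample_increments = torch.reshape(
    torch.tensor(np.linspace(0.025, 0.825, num_sample_points, endpoint=False)),
    (num_sample_points, 1, 1, 1))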

@waleedrazakhan92
Author

@andrewhou1 I have made the suggested changes from 256 to 1024 wherever I can, and the model now takes 1024 input. I need advice and suggestions on a few more things to keep the performance high.

Can you explain what increasing self.num_sample_points does and, more importantly, how much should I increase it by?

Also, by changing the resolution, the h4_out shape changes from [1, 155, 16, 16] to [1, 155, 64, 64] (for 1024 resolution). Does the selection of indices for identity_features and lighting_features also need to change?

identity_features = h4_out[:, 0:128, :, :]
lighting_features = h4_out[:, 128:155, :, :]
LF_shape = list(lighting_features.size())

And also in:

LF_avg_pool = self.AvgPool_LF(lighting_features)
SL_lin1 = F.leaky_relu(self.linear_SL1(LF_avg_pool.permute(0, 2, 3, 1)), 0.2)
SL_lin2 = self.linear_SL2(SL_lin1)

The average pooling size also needs to change from (16, 16) to (64, 64), which now seems like quite a big window to average pool over. Do you suggest changing the average pooling size or the linear_SL input and output sizes for the model to keep its performance?

@andrewhou1
Owner

self.num_sample_points is the number of points that are sampled along each ray to determine if the original point on the face is under a cast shadow. If the points are sampled too sparsely, they may miss an occluding surface (such as the nose) and incorrectly determine a point to be well illuminated. This results in white stripes in the cast shadows, so self.num_sample_points should be set sufficiently high. If you want to maintain the same sampling frequency as I had at 256 resolution, increase the sampling rate by 4x for 1024, and change the np.arange portion to match. This is an experimental parameter: you can also lower the sampling rate and observe the effect on performance, but you should not need to set it any higher than 4x its current setting.

For the other two tensors, I believe 64x64 should be fine. If the performance seems noticeably worse and you want to change to 32x32 or 16x16, you would need to add one or two more downsampling and upsampling blocks respectively.
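For reference, a minimal sketch of keeping the lighting-feature pooling consistent at the larger feature map; the assumption that self.AvgPool_LF is an nn.AvgPool2d with a (16, 16) kernel in the 256-resolution model is based only on the code quoted above:

import torch.nn as nn

# Assuming the 256-resolution model pools the 16x16 lighting feature map down
# to a single value per channel with a (16, 16) average pool, the equivalent
# at 1024 resolution pools over the full 64x64 map.
AvgPool_LF = nn.AvgPool2d(kernel_size=(64, 64))  # assign to self.AvgPool_LF in the model

# An adaptive pool avoids hard-coding the spatial size altogether:
# AvgPool_LF = nn.AdaptiveAvgPool2d(output_size=(1, 1))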

@waleedrazakhan92
Author

waleedrazakhan92 commented Jan 21, 2023

> To produce the depth maps, you can use https://github.com/zqbai-jeremy/DFNRMVS
>
> The depth masks correspond to any pixel that has a valid face depth.
>
> Yes, the albedo is converted first to grayscale.
>
> The lighting directions are also produced by SfSNet: I use the first-order coefficients (2-4), normalize them, and treat that as the lighting direction.

Hi, so I've been trying to get lighting directions from the SfSNet model. I couldn't get the MATLAB version to work, but I found a working PyTorch version: https://github.com/Mannix1994/SfSNet-Pytorch. I'm getting the outputs as expected; however, I still wanted to be clear about the lighting directions. In the code there is an explanation of light_out (https://github.com/Mannix1994/SfSNet-Pytorch/blob/c2c1ed96b20dab66c5f84fe41ccb5d08aaa2291a/SfSNet_test.py#L66-L72), which I understand is the output determining the light direction. You mentioned that we need the first-order coefficients and normalize them to get the lighting directions. In the code they get 27 outputs (9 for each channel), which they reshape to form a 3-channel shading image.
So how do I normalize this output to form the training lighting inputs like the ones you have provided in the training dataset?

@andrewhou1
Owner

So among those 27 outputs, you can reshape them into a 9x3 matrix, where each column is the SH for one color channel. Then simply average across the three columns to get a single 9x1 vector. You can use this to determine your lighting directions.

@waleedrazakhan92
Author

waleedrazakhan92 commented Jan 24, 2023

@andrewhou1 but wouldn't that give me a 9x1 vector? In the training lightings .mat files there are just three values per image, so I'm still unsure about the exact process for getting the format and the values you've provided for training.
Can you please share the process, or a piece of code that you used, with which I can obtain the exact values in the exact format for the same image?

@andrewhou1
Owner

Right, so then you can use the 2nd, 3rd, and 4th values and normalize them as a vector.
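Putting the last few replies together, a minimal sketch of the full conversion from SfSNet's 27 lighting outputs to a unit lighting direction; the function name and the channel-major reshape are assumptions and should be checked against the output ordering in SfSNet-Pytorch:

import numpy as np

def lighting_direction_from_sh(light_out):
    """Convert SfSNet's 27-value SH lighting output into a unit 3-vector.

    Assumes light_out holds 9 spherical-harmonic coefficients per color
    channel in channel-major order (all of R, then G, then B); verify this
    against the reshape in SfSNet_test.py.
    """
    sh = np.asarray(light_out, dtype=np.float64).reshape(3, 9)  # one row per channel
    sh_avg = sh.mean(axis=0)       # average the channels -> 9 coefficients
    direction = sh_avg[1:4]        # 2nd, 3rd, and 4th values (first-order terms)
    return direction / np.linalg.norm(direction)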

@waleedrazakhan92
Author

@andrewhou1 can you tell me how long (in hours) it took to train the final model?

@andrewhou1
Owner

At 256 resolution it took about 1 day to train. At 1024 resolution, however, it would increase dramatically (maybe up to 4x) if you increase the sampling rate proportionally.

@waleedrazakhan92
Author

@andrewhou1 also, do the shapes of both of these tensors have anything to do with the batch size? When I try to change the batch size, there is a shape mismatch error involving torch.tensor([[[0.0]], [[0.0]], [[0.0]]]) and torch.reshape(tmp_incident_light_z, (3, 1, 1, 1)):

tmp_incident_light_z = torch.maximum(tmp_incident_light[:, 2], torch.tensor([[[0.0]], [[0.0]], [[0.0]]]).cuda())
incident_light = torch.cat((tmp_incident_light[:, 0:2], torch.reshape(tmp_incident_light_z, (3, 1, 1, 1))), 1)

@andrewhou1
Owner

Yes, it does. So if the batch size is n, then the torch.tensor call should have n of those 0.0s, and torch.reshape(tmp_incident_light_z, (n, 1, 1, 1)) should be used.

@waleedrazakhan92
Author

Thank you, so I replace these lines with:
tmp_incident_light_z = torch.maximum(tmp_incident_light[:, 2], torch.zeros(self.batch_size, 1, 1).float().cuda())

incident_light = torch.cat((tmp_incident_light[:, 0:2], torch.reshape(tmp_incident_light_z, (self.batch_size, 1, 1, 1))), 1)

This is the correct way, right?

@andrewhou1
Owner

Right, that should be correct.
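As an aside, a batch-size-agnostic variant is also possible, sketched below under the assumption that tmp_incident_light has shape (batch_size, 3, 1, 1); torch.clamp replaces the explicit zeros tensor and the reshape infers the batch dimension:

import torch

# Clamp the z component at zero instead of comparing against a hand-built
# zeros tensor, then let reshape infer the batch dimension with -1.
tmp_incident_light_z = torch.clamp(tmp_incident_light[:, 2], min=0.0)
incident_light = torch.cat(
    (tmp_incident_light[:, 0:2],
     torch.reshape(tmp_incident_light_z, (-1, 1, 1, 1))),
    dim=1)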

@waleedrazakhan92
Author

waleedrazakhan92 commented Jan 25, 2023

  1. One more question: you mention that you upscaled the results from SfSNet from 128 to 256 resolution. As for the lighting directions, did you use the same values that were computed at 128 resolution when training your model at 256 resolution?

If that's the case, then if I just upscale the images again to 512 and use the same lighting direction values you provided to train the model, would that be okay?

  2. Also, does self.batch_size need to be the same for both training and testing? For example, if I trained the model with batch size 2, do I also have to set self.batch_size to 2 during testing?

@andrewhou1
Owner

Correct, the lighting directions are independent of resolution.

@yafeim

yafeim commented Feb 3, 2023

Hello @andrewhou1, I notice that the SfSNet albedo images do not align well with the original images. How do you solve this problem? Thanks.

@andrewhou1
Owner

Thanks for your interest in our work!

Did you crop the original images first using our provided cropping code? They should align if the images are cropped.

@yafeim

yafeim commented Feb 4, 2023

[Two attached images: 0_orig (the cropped original) and 0 (the provided albedo)]

Thanks for your quick reply. The first attached image is what I got from the cropping code, and the second is the provided albedo in MP_data. They are still misaligned. Can you advise? Thanks.

@yafeim

yafeim commented Feb 4, 2023

Also, I got an AssertionError from "assert img.shape[0] == img.shape[1] == 256" when applying the cropping logic to image "10587.jpg". Was that image discarded?

@andrewhou1
Owner

Hmmm, that's interesting. I inspected that training image on my end, and the image matches the grayscale albedo (with the chin reflected). Did you install the separate dependencies for the cropping code? They're different from the dependencies for running the relighting model. If you did, you can try changing borderType=cv2.BORDER_DEFAULT to cv2.BORDER_REFLECT.
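For reference, a minimal sketch of that change, assuming the cropping code pads the image via cv2.copyMakeBorder (the call site, variable names, and padding widths here are hypothetical):

import cv2

# Hypothetical padding call from the cropping code, where pad is whatever
# border width the cropping logic uses.
# Before:
#   padded = cv2.copyMakeBorder(img, pad, pad, pad, pad, borderType=cv2.BORDER_DEFAULT)
# After:
padded = cv2.copyMakeBorder(img, pad, pad, pad, pad, borderType=cv2.BORDER_REFLECT)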

@andrewhou1
Owner

Also, yes, 10587.jpg was discarded.

@yafeim

yafeim commented Feb 4, 2023

Oh, I see. I think the problem was that I installed a different version of OpenCV using pip. Now I am getting aligned crops. Thanks a lot.
