
Have you achieved the similar results to the reported ones in the original paper? #4

Open
yzhang2016 opened this issue Apr 9, 2020 · 15 comments

Comments

@yzhang2016

I followed the paper for re-implementation. I got good results on DF and FS, but poor results on F2F and NT. Did you encounter a similar situation?

@neverUseThisName
Owner

No, I did not dive deep into this.

@jerry4h

jerry4h commented May 18, 2020

My implementation in my repo fails to generalize; could I ask you for some advice?

@yzhang2016
Author

My implementation did not generalize well on F2F and NT when training on the constructed BI database.

@jerry4h

jerry4h commented May 18, 2020

When training on my constructed BI database, the loss drops quickly from 1000 to 10 within the first 2000 iterations, without freezing the pretrained HRNet-18w parameters. During evaluation, the model seems to have learned only my blending fingerprint, so my training leads to heavy overfitting. I'm puzzled about that. Did you apply data augmentation or any trick not mentioned in the paper? I think my implementation strictly follows the original paper. Thanks.
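Since whether the pretrained backbone is frozen comes up above, here is a minimal PyTorch sketch of freezing everything except a newly added head. The module names (`backbone`, `head`) are illustrative stand-ins, not taken from any of the repos discussed here:

```python
import torch
import torch.nn as nn

def freeze_backbone(model: nn.Module, trainable_prefixes=("head",)):
    """Freeze every parameter except those whose name starts with one
    of the trainable prefixes (e.g. a newly added prediction head)."""
    for name, p in model.named_parameters():
        p.requires_grad = any(name.startswith(pre) for pre in trainable_prefixes)
    # Return only the trainable parameters, to hand to the optimizer.
    return [p for p in model.parameters() if p.requires_grad]

# Toy stand-in for "pretrained backbone + new head".
model = nn.Sequential()
model.add_module("backbone", nn.Linear(8, 8))
model.add_module("head", nn.Linear(8, 1))
trainable = freeze_backbone(model)
```

Passing only `trainable` to the optimizer (rather than `model.parameters()`) avoids updating frozen weights through weight decay.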

@yzhang2016
Author

For data augmentation, I add random noise and blurring.
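The exact noise and blur parameters are not stated, so the following is only a sketch of that augmentation in pure NumPy, with arbitrary strengths and a simple 3x3 mean filter standing in for a proper Gaussian blur:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, noise_std=5.0, noise_prob=0.5, blur_prob=0.5):
    """Randomly add Gaussian noise and/or a 3x3 mean blur to a uint8
    image. Parameter values here are illustrative guesses, not the
    commenter's actual settings."""
    out = img.astype(np.float32)
    if rng.random() < noise_prob:
        out += rng.normal(0.0, noise_std, size=out.shape)
    if rng.random() < blur_prob:
        # 3x3 mean filter built from shifted copies (wraps at borders;
        # a real pipeline would use a Gaussian kernel with padding).
        shifts = [np.roll(np.roll(out, dy, axis=0), dx, axis=1)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        out = np.mean(shifts, axis=0)
    return np.clip(out, 0, 255).astype(np.uint8)

img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
aug = augment(img)
```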

@yzhang2016
Author

Did your implementation's results match the ones reported in the paper?

@wshenx

wshenx commented Jun 2, 2020

Hi, @yzhang2016 @jerry4h. My implementation also heavily overfits the BI dataset. I also notice that, in my experiments, the noise difference in fake images from the BI dataset is much more obvious than in those from DF, which may explain the overfitting. The tool I used is the one introduced in Fig. 2 of the original paper. My frames are all extracted from c23 videos. Could the reason be that the videos are compressed, so details are lost? I have not downloaded the raw videos because they take much more disk space.
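For context, the BI construction discussed here reduces at its core to alpha-blending a donor face into a background face with a soft mask, I = M * F + (1 - M) * B. A minimal NumPy sketch of that step (the hard square mask is a placeholder; the paper uses feathered, deformed convex-hull masks):

```python
import numpy as np

def blend(foreground, background, mask):
    """Alpha-blend two images: I = M * F + (1 - M) * B.
    mask has shape (H, W) with values in [0, 1]; images are (H, W, 3)."""
    m = mask[..., None].astype(np.float32)
    out = m * foreground.astype(np.float32) + (1.0 - m) * background.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)

H = W = 8
fg = np.full((H, W, 3), 200, dtype=np.uint8)
bg = np.full((H, W, 3), 50, dtype=np.uint8)
mask = np.zeros((H, W), dtype=np.float32)
mask[2:6, 2:6] = 1.0  # hard inner region; a real BI mask is feathered
out = blend(fg, bg, mask)
```

Any noise-level mismatch between F and B survives this blend, which is consistent with the "blending fingerprint" the commenters describe.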

@yzhang2016
Author

I don't think this is caused by not using the raw videos. In most face forgery detection works, high-quality videos (c23) are used for training. In my case, the distribution of the generated BI database seems closer to that of DF and FS than to that of F2F and NT.

@wshenx

wshenx commented Jun 3, 2020

Thank you for your reply. I notice that in the limitation section of the paper the authors say "We test our framework on the HQ version (a light compression) and the LQ version (a heavy compression) of FF++ dataset and the overall AUC are 87.35% and 61.6% respectively." It seems that compression level matters. Nevertheless, I'll keep trying on c23 images.

@skJack

skJack commented Aug 12, 2020

Could you tell me your accuracy on c23 Deepfakes and Face2Face?

@LoveSiameseCat

Hi, @yzhang2016. When I re-implemented Face X-ray, I also ran into the same overfitting on the generated data as @jerry4h. When I trained on the BI dataset, the model only caught the blending fingerprint on the BI evaluation set, but failed to detect the blending boundaries of the Deepfakes in the FF++ c23 dataset. I checked some examples in my BI dataset, and the generated fake faces are hard for me to distinguish, so I think the generated data is okay; the reason for poor generalization may be that the blending operation is far from the synthesis process in FF++. Can you share the detailed parameters of your experiment? I also noticed you used random noise and blurring for data augmentation; was this applied to the foreground face or to the whole generated image? Hoping for your reply, thank you.
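For anyone reproducing the target that the model should learn instead of the fingerprint: the paper defines the Face X-ray ground truth from the soft blending mask M as B = 4 * M * (1 - M), which is 1 exactly at the blending boundary (M = 0.5) and 0 deep inside either region:

```python
import numpy as np

def face_xray(mask):
    """Face X-ray ground truth from a soft blending mask M in [0, 1]:
    B = 4 * M * (1 - M). Peaks at 1 where M = 0.5 (the blending
    boundary) and vanishes where M is 0 or 1."""
    m = mask.astype(np.float32)
    return 4.0 * m * (1.0 - m)

m = np.array([0.0, 0.25, 0.5, 0.75, 1.0], dtype=np.float32)
b = face_xray(m)  # [0.0, 0.75, 1.0, 0.75, 0.0]
```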

@AugustasMacys

AugustasMacys commented Jan 28, 2021

@yzhang2016, @ChineseboyLuo, @jerry4h Hi guys, do you mind sharing your neural network architecture, specifically the init and forward functions (it was called NNb in the paper)? It would be very appreciated and helpful.
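Not the authors' code, but a hedged sketch of the overall shape described in the paper: a backbone (HRNet there; a toy conv stack here as a stand-in) feeding a 1x1 conv that predicts the X-ray map, then global average pooling and a fully connected layer for the real/fake logit. Every layer size below is an illustrative guess:

```python
import torch
import torch.nn as nn

class NNbSketch(nn.Module):
    """Rough approximation of the paper's NNb: feature extractor ->
    1x1 conv predicting the Face X-ray map -> global average pooling
    -> FC layer producing the real/fake logit."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for HRNet, not HRNet itself
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.xray_head = nn.Conv2d(feat_ch, 1, kernel_size=1)  # X-ray map
        self.fc = nn.Linear(1, 1)  # classifier on the pooled map

    def forward(self, x):
        feat = self.backbone(x)
        xray = torch.sigmoid(self.xray_head(feat))  # (N, 1, H/4, W/4)
        pooled = xray.mean(dim=(2, 3))              # global average pooling
        logit = self.fc(pooled)                     # (N, 1) real/fake logit
        return xray, logit

model = NNbSketch()
xray, logit = model(torch.randn(2, 3, 64, 64))
```

Training would combine a pixel-wise loss on `xray` against the B = 4M(1 - M) target with a classification loss on `logit`.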

@byx-123

byx-123 commented Feb 18, 2021

Hello, may I ask you some questions about Face X-ray? I've followed your GitHub ID; could you tell me your email?

@gleonato

gleonato commented Sep 9, 2021

Hi folks!
Have you had the chance to overcome the generalization problem and reach results similar to the original paper?

@AugustasMMatches

Hi,

@gleonato

I did not manage to get similar results on the Deepfake Detection Challenge. However, what I noticed is that how you generate the blended faces is very important for this paper: if your generated fakes differ from the test data, the model will not generalize.
