Replies: 1 comment
Did you find a solution?
I want to use a custom patch embedding layer for ViT or DeiT. Basically, I want to do video classification with ViT and treat each frame of the video as a patch. I use a pretrained backbone like EffNet to generate features from the frames and send the list of features to the ViT, so I need to skip the patch embedding layer entirely. Any idea how to do this? I want to keep the rest of the pretrained weights and only change the patch embedding layer.
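Roughly what I have in mind is the sketch below (assumptions: per-frame EffNet features are already pooled to one vector per frame, the feature dimension and frame count are made-up numbers, and timm's `VisionTransformer` calls `self.patch_embed(x)` directly in `forward_features`, which I believe it does but may vary by timm version):

```python
import torch
import torch.nn as nn
import timm

# Hypothetical setup: 16 frames per clip, EfficientNet features pooled to one
# 1280-dim vector per frame -> input of shape (batch, num_frames, feat_dim).
num_frames, feat_dim = 16, 1280

vit = timm.create_model('vit_base_patch16_224', pretrained=True)
embed_dim = vit.embed_dim  # 768 for the base model

# Swap the patch embedding for a projection from the backbone feature dim to the
# ViT embedding dim (nn.Identity() would do if feat_dim already equals embed_dim).
vit.patch_embed = nn.Linear(feat_dim, embed_dim)

# The pretrained pos_embed covers 196 patch tokens + 1 cls token, so with
# num_frames tokens a fresh positional embedding of the right length is needed.
vit.pos_embed = nn.Parameter(torch.zeros(1, num_frames + 1, embed_dim))
nn.init.trunc_normal_(vit.pos_embed, std=0.02)

frame_feats = torch.randn(2, num_frames, feat_dim)  # (B, T, feat_dim) from EffNet
logits = vit(frame_feats)                           # blocks/norm/head keep their pretrained weights
print(logits.shape)                                 # torch.Size([2, 1000])
```

Is replacing `patch_embed` like this the intended way, or is there a cleaner hook for it?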
Also, is there any way to load the pretrained weights while changing the patch size or the number of patches? I tried to do so, but it gives a size-mismatch error for the pos_embed layer.
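This is the kind of fix I imagine for the pos_embed mismatch: interpolate the pretrained positional grid to the new number of patches before loading. The `resize_pos_embed` helper below is my own sketch (not a timm function) and assumes a square patch grid; I believe newer timm releases can also resample pos_embed for you when you pass a different `img_size` to `create_model`, but I'm not sure of the exact API. Note that changing the patch size itself would additionally change the `patch_embed.proj` kernel shape, which this does not handle.

```python
import torch
import torch.nn.functional as F
import timm

def resize_pos_embed(pos_embed, new_num_patches, num_prefix_tokens=1):
    """Bicubic-interpolate a (1, N + prefix, dim) ViT pos_embed to a new square grid."""
    prefix = pos_embed[:, :num_prefix_tokens]   # cls (and any other prefix) tokens
    grid = pos_embed[:, num_prefix_tokens:]     # per-patch position embeddings
    dim = grid.shape[-1]
    old_size = int(grid.shape[1] ** 0.5)
    new_size = int(new_num_patches ** 0.5)
    grid = grid.reshape(1, old_size, old_size, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_size, new_size), mode='bicubic', align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_size * new_size, dim)
    return torch.cat([prefix, grid], dim=1)

# Same patch size, more patches: 224 -> 448 input means a 14x14 -> 28x28 patch grid.
src = timm.create_model('vit_base_patch16_224', pretrained=True)
dst = timm.create_model('vit_base_patch16_224', img_size=448, pretrained=False)

state = src.state_dict()
state['pos_embed'] = resize_pos_embed(state['pos_embed'], new_num_patches=28 * 28)
dst.load_state_dict(state)  # no more size mismatch on pos_embed
```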