Replies: 1 comment
Did you find a solution?
I want to use a custom patch embedding layer for ViT or DeiT. Basically, I want to do video classification with ViT and treat each frame of the video as a patch. I use a pretrained backbone like EffNet to generate features from the frames and send the list of features to the ViT, so I need to skip the patch embedding layer entirely. Any idea how to do this? I want to keep the rest of the pretrained weights and only change the patch embedding layer.
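Roughly what I have in mind is the sketch below (assumptions: per-frame EffNet features are already pooled to one vector per frame, the feature dimension and frame count are made-up numbers, and timm's `VisionTransformer` calls `self.patch_embed(x)` directly in `forward_features`, which I believe it does but may vary by timm version):

```python
import torch
import torch.nn as nn
import timm

# Hypothetical setup: 16 frames per clip, EfficientNet features pooled to one
# 1280-dim vector per frame -> input of shape (batch, num_frames, feat_dim).
num_frames, feat_dim = 16, 1280

vit = timm.create_model('vit_base_patch16_224', pretrained=True)
embed_dim = vit.embed_dim  # 768 for the base model

# Swap the patch embedding for a projection from the backbone feature dim to the
# ViT embedding dim (nn.Identity() would do if feat_dim already equals embed_dim).
vit.patch_embed = nn.Linear(feat_dim, embed_dim)

# The pretrained pos_embed covers 196 patch tokens + 1 cls token, so with
# num_frames tokens a fresh positional embedding of the right length is needed.
vit.pos_embed = nn.Parameter(torch.zeros(1, num_frames + 1, embed_dim))
nn.init.trunc_normal_(vit.pos_embed, std=0.02)

frame_feats = torch.randn(2, num_frames, feat_dim)  # (B, T, feat_dim) from EffNet
logits = vit(frame_feats)                           # blocks/norm/head keep their pretrained weights
print(logits.shape)                                 # torch.Size([2, 1000])
```

Is replacing `patch_embed` like this the intended way, or is there a cleaner hook for it?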
Also, is there any way to load the pretrained weights while changing the patch size or the number of patches? I tried to do so, but it gives a size-mismatch error for the pos_embed layer.
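This is the kind of fix I imagine for the pos_embed mismatch: interpolate the pretrained positional grid to the new number of patches before loading. The `resize_pos_embed` helper below is my own sketch (not a timm function) and assumes a square patch grid; I believe newer timm releases can also resample pos_embed for you when you pass a different `img_size` to `create_model`, but I'm not sure of the exact API. Note that changing the patch size itself would additionally change the `patch_embed.proj` kernel shape, which this does not handle.

```python
import torch
import torch.nn.functional as F
import timm

def resize_pos_embed(pos_embed, new_num_patches, num_prefix_tokens=1):
    """Bicubic-interpolate a (1, N + prefix, dim) ViT pos_embed to a new square grid."""
    prefix = pos_embed[:, :num_prefix_tokens]   # cls (and any other prefix) tokens
    grid = pos_embed[:, num_prefix_tokens:]     # per-patch position embeddings
    dim = grid.shape[-1]
    old_size = int(grid.shape[1] ** 0.5)
    new_size = int(new_num_patches ** 0.5)
    grid = grid.reshape(1, old_size, old_size, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_size, new_size), mode='bicubic', align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_size * new_size, dim)
    return torch.cat([prefix, grid], dim=1)

# Same patch size, more patches: 224 -> 448 input means a 14x14 -> 28x28 patch grid.
src = timm.create_model('vit_base_patch16_224', pretrained=True)
dst = timm.create_model('vit_base_patch16_224', img_size=448, pretrained=False)

state = src.state_dict()
state['pos_embed'] = resize_pos_embed(state['pos_embed'], new_num_patches=28 * 28)
dst.load_state_dict(state)  # no more size mismatch on pos_embed
```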