Hi, guys!

This is a PyTorch conversion of the Paddle implementation of the paper Paint Transformer: Feed Forward Neural Painting with Stroke Prediction (ICCV 2021, oral). Here are the project site 🌐 and the official code 💻.
Target Image | From PyTorch | From Paddle
---|---|---
Try running

```
python inference.py
```

and check the output frames in `./samples/outputs/bingbing/`.
Go to `./samples/outputs/` and run

```
python frames2mp4.py
```

to merge the frames into a video.
Run

```
python paddle2pytorch.py
```

to convert the trained Paddle checkpoints into PyTorch format. BTW, remember to check the checkpoint paths.
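For reference, a minimal NumPy sketch of the kind of conversion such a script performs. Note that Paddle stores `nn.Linear` weights as `(in_features, out_features)` while PyTorch uses `(out_features, in_features)`, so those weights must be transposed; the parameter names below are hypothetical, not the checkpoint's real keys:

```python
import numpy as np

def paddle_to_pytorch_state_dict(paddle_state, linear_weight_keys=()):
    """Convert a Paddle state dict (name -> ndarray) into PyTorch layout.

    Paddle's nn.Linear keeps weights as (in_features, out_features);
    PyTorch's nn.Linear expects (out_features, in_features), so any key
    listed in linear_weight_keys gets transposed.
    """
    torch_state = {}
    for name, value in paddle_state.items():
        arr = np.asarray(value)
        if name in linear_weight_keys and arr.ndim == 2:
            arr = arr.T  # flip (in, out) -> (out, in)
        torch_state[name] = arr
    return torch_state

# toy example with a hypothetical linear layer
paddle_state = {
    "linear_param.weight": np.zeros((128, 256)),  # (in, out) in Paddle
    "linear_param.bias": np.zeros(256),
}
converted = paddle_to_pytorch_state_dict(
    paddle_state, linear_weight_keys={"linear_param.weight"}
)
print(converted["linear_param.weight"].shape)  # (256, 128)
```

The real script additionally has to load the `.pdparams` file and save a `.pth`, but the key/weight mapping is the core of it.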
Put your images in `./samples/inputs/`, and modify line 12 in `inference.py`.
```python
if __name__ == "__main__":
    ## files
    input_path = "samples/inputs/darling.jpg"  # line 12
    output_dir = "samples/outputs/"
```
Just follow Section 1.
Notice that this code was debugged with PyTorch 1.9.0, where the transformer input shape is `(B, N, C)`, i.e., "batch first". Earlier PyTorch versions do not define the `batch_first` argument, so you may need to modify `network.py` as follows.
- In `network.py` line 39, change

```python
self.transformer = nn.Transformer(hidden_dim, n_heads, n_enc_layers, n_dec_layers, batch_first=True)
```

➡️

```python
self.transformer = nn.Transformer(hidden_dim, n_heads, n_enc_layers, n_dec_layers)
```
- In lines 72-74, change

```python
src = (pos_embedding + feat.view(b, c, -1).permute(2, 0, 1)).permute(1, 0, 2)
tgt = self.query_pos_embedding.unsqueeze(1).repeat(1, b, 1).permute(1, 0, 2)
hidden_state = self.transformer(src, tgt)
```

➡️

```python
src = (pos_embedding + feat.view(b, c, -1).permute(2, 0, 1))
tgt = self.query_pos_embedding.unsqueeze(1).repeat(1, b, 1)
hidden_state = self.transformer(src, tgt).permute(1, 0, 2)
```
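As a sanity check on the layout change, here is a small NumPy sketch (toy sizes, no torch required) showing that the sequence-first tensors are just the batch-first ones with the first two axes swapped, which is why the old-API version only needs one extra `.permute(1, 0, 2)` on the transformer output:

```python
import numpy as np

# toy sizes standing in for batch, hidden_dim, and H*W of the feature map
b, c, hw = 2, 4, 9
feat = np.arange(b * c * hw, dtype=np.float32).reshape(b, c, hw)

# PyTorch >= 1.9 with batch_first=True: src has shape (B, N, C)
src_batch_first = feat.reshape(b, c, -1).transpose(2, 0, 1).transpose(1, 0, 2)

# older PyTorch (sequence first): src has shape (N, B, C)
src_seq_first = feat.reshape(b, c, -1).transpose(2, 0, 1)

# swapping the first two axes converts one layout into the other
print(np.array_equal(src_seq_first.transpose(1, 0, 2), src_batch_first))  # True
```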
We have found that `torch.nn.functional.affine_grid` and `paddle.nn.functional.affine_grid` output slightly different results when fed the same input. Fortunately, this does not seem to have a negative effect on the final results.
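If you want to check which convention either framework is following, a NumPy reference for the 2-D case can help. This is a sketch following torch's documented shapes (`theta` of shape `(N, 2, 3)`, output `(N, H, W, 2)` with normalized coordinates in `[-1, 1]`); the `align_corners` handling below mirrors torch's documented behaviour and is one common source of small discrepancies between frameworks:

```python
import numpy as np

def affine_grid_2d(theta, H, W, align_corners=False):
    """NumPy reference for a 2-D affine sampling grid.

    theta: (N, 2, 3) affine matrices; returns grid of shape (N, H, W, 2)
    whose last dim is (x, y) in normalized [-1, 1] coordinates.
    """
    if align_corners:
        # pixel centers at the corners map exactly to -1 and 1
        xs = np.linspace(-1.0, 1.0, W)
        ys = np.linspace(-1.0, 1.0, H)
    else:
        # pixel centers sit half a pixel inside the [-1, 1] range
        xs = (2.0 * np.arange(W) + 1.0) / W - 1.0
        ys = (2.0 * np.arange(H) + 1.0) / H - 1.0
    gx, gy = np.meshgrid(xs, ys)                          # each (H, W)
    base = np.stack([gx, gy, np.ones_like(gx)], axis=-1)  # (H, W, 3)
    # grid[n, h, w, :] = theta[n] @ base[h, w]
    return np.einsum("nij,hwj->nhwi", theta, base)

theta = np.array([[[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]]])  # identity transform
grid = affine_grid_2d(theta, H=2, W=2)
print(grid.shape)  # (1, 2, 2, 2)
```

Comparing a grid like this against both frameworks' outputs makes it easy to see whether a difference is just the `align_corners` convention or genuine numerical drift.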
MAYBE come ... 😄