This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

change fft image parametrization to use only one op for the batch #200

Merged
merged 1 commit into from
Oct 2, 2019


michaelpetrov
Contributor

The current Lucid FFT image parametrization creates a separate irfft op for each image in a batch. Creating a single FFT op and sending all images through it is much faster. Each irfft op in the graph incurs a CPU cost of roughly 7 ms per forward pass and 8 ms per backward pass, and these per-op costs are paid in series on every sess.run call!

This seems to be related to a per-op recomputation of an execution plan, which is done on every session.run call. The behavior is described here: tensorflow/tensorflow#6541 (comment)
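To illustrate the idea of one op for the whole batch, here is a minimal sketch that uses NumPy as a stand-in. It is not the Lucid or TensorFlow code from this PR. The function names, the batch size, and the image size are assumptions chosen for the example. It only shows that a single inverse FFT call over a batched spectrum gives the same images as one call per image.

```python
import numpy as np


def irfft2_per_image(spectra):
    """One inverse-FFT call per image, mirroring a per-image op layout."""
    return np.stack([np.fft.irfft2(s) for s in spectra])


def irfft2_batched(spectra):
    """A single inverse-FFT call for the whole batch.

    np.fft.irfft2 transforms the last two axes by default, so the
    leading batch axis passes through untouched.
    """
    return np.fft.irfft2(spectra)


# Hypothetical sizes, chosen only for the demonstration.
batch, h, w = 16, 32, 32
rng = np.random.default_rng(0)

# Real-FFT spectrum layout: the last axis holds w // 2 + 1 frequencies.
shape = (batch, h, w // 2 + 1)
spectra = rng.normal(size=shape) + 1j * rng.normal(size=shape)

looped = irfft2_per_image(spectra)
batched = irfft2_batched(spectra)

print(looped.shape, batched.shape)
print("same result:", np.allclose(looped, batched))
```

The numerical output is the same either way. The saving described above comes from having fewer ops in the graph, which this NumPy sketch does not measure.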

Results on a V100 for a neuron visualization with a batch of 32 and 128 steps:
https://colab.research.google.com/drive/1jfQP9E6M8piGaHxy75Tl23VeTx-QscL9#scrollTo=VFN7L_QNuwOI
Time with current parametrization: 45.98 s
Time with updated parametrization: 12.89 s

TF timelines that show the issue with a batch of 16 items:
before (250 ms):
[image: Screenshot 2019-10-01 11 53 46]

after (30 ms):
[image: Screenshot 2019-10-01 12 24 57]
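As a rough sanity check, the numbers quoted in this description can be combined in a few lines. This is a back-of-envelope sketch, not a measurement. It assumes the per-op costs simply add up across the 16 irfft ops in the "before" timeline, and the variable names are made up for the example.

```python
# Per-op CPU costs quoted in the description (approximate).
forward_ms = 7
backward_ms = 8

# The "before" timeline used a batch of 16, i.e. 16 separate irfft ops.
timeline_batch = 16
estimated_overhead_ms = timeline_batch * (forward_ms + backward_ms)

# End-to-end timings reported for the V100 run (batch of 32, 128 steps).
time_before_s = 45.98
time_after_s = 12.89
speedup = time_before_s / time_after_s

print(f"estimated per-op overhead for batch of 16: {estimated_overhead_ms} ms")
print(f"end-to-end speedup: {speedup:.2f}x")
```

The estimate of 240 ms is close to the 250 ms "before" timeline, which fits the claim that the per-op costs dominate. The end-to-end run comes out at about 3.6x faster.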

@michaelpetrov michaelpetrov requested review from colah and gabgoh October 1, 2019 21:40
@ludwigschubert
Contributor

@michaelpetrov this is awesome, thank you! IMHO can be merged. :-)

@coveralls

Pull Request Test Coverage Report for Build 608

  • 11 of 11 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.03%) to 69.616%

Totals Coverage Status
Change from base Build 606: -0.03%
Covered Lines: 1833
Relevant Lines: 2633

💛 - Coveralls

@michaelpetrov michaelpetrov merged commit 67d3e73 into master Oct 2, 2019
@michaelpetrov michaelpetrov deleted the fast-fft-param branch October 2, 2019 01:21