This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Compiling on release mode #1189

Closed
philipturner opened this issue Jan 9, 2022 · 1 comment

Due to the errors with linking S4TF to an arbitrary Swift executable (#1185 (comment)), I am currently very constrained in how I can test code that imports S4TF. For now, my only option is to replace the Swift package tests with the custom code I want to execute. Having to rebuild S4TF repeatedly is a bottleneck in my workflow.

I profiled S4TF build times on Google Colab (dual-core x64) and found some interesting results. Running swift test almost always re-compiles your code, even if you previously compiled it via swift build. The one exception is when both swift build and swift test are in debug mode, in which case the redundant re-compilation is avoided. This speedup does not apply when both are in release mode with -Onone, which is otherwise the option that compiles most quickly.
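
For reference, here is a minimal sketch of the command lines these build modes likely correspond to. The post does not show the exact invocations, so treating "release (-Onone)" as `-c release` with `-Onone` forwarded to the compiler via `-Xswiftc` is an assumption, and `swiftpm_command` is a hypothetical helper that only builds the argument list without running anything.

```python
def swiftpm_command(subcommand, configuration="debug", optimize=True):
    """Build the argv for `swift build` or `swift test` (hypothetical helper).

    Assumption: unoptimized release means `-c release -Xswiftc -Onone`.
    """
    if subcommand not in ("build", "test"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    if configuration not in ("debug", "release"):
        raise ValueError(f"unsupported configuration: {configuration}")

    argv = ["swift", subcommand, "-c", configuration]
    # Debug builds are already unoptimized, so the flag is only added for release.
    if configuration == "release" and not optimize:
        argv += ["-Xswiftc", "-Onone"]
    return argv


if __name__ == "__main__":
    # Print the four pre-build/test combinations as shell-style command lines.
    for config, optimize in (("release", False), ("debug", True)):
        for subcommand in ("build", "test"):
            print(" ".join(swiftpm_command(subcommand, config, optimize)))
```

The resulting list could be passed to `subprocess.run` in a Colab cell, but that step is left out here because it depends on the installed toolchain.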

  • Pre-build as release (-Onone) (excluding tests): 1 min 51 sec
    • Build tests as release (-Onone): 2 min 29 sec (everything)
      • Extrapolated time if excluding tests: 1 min 50 sec
    • Build tests as debug: 3 min 50 sec
      • Extrapolated time if excluding tests: 3 min 0 sec
  • Pre-build as debug (excluding tests): 3 min 0 sec
    • Build tests as release (-Onone): 2 min 48 sec (everything)
      • Extrapolated time if excluding tests: 2 min 7 sec
    • Build tests as debug: 57 sec
      • Extrapolated time if excluding tests: 0 sec
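
The figures above can be combined with some quick arithmetic. The sketch below adds each pre-build time to the following test build to get an end-to-end total, and subtracts the extrapolated time from the full test build to estimate how long the tests alone take. Reading the "extrapolated" figure as the part of the test build spent re-compiling non-test code is my interpretation, and the names `PREBUILD`, `TEST_BUILD`, and `summarize` are illustrative only.

```python
def seconds(minutes, secs):
    """Convert a 'M min S sec' measurement to seconds."""
    return minutes * 60 + secs


def fmt(total):
    """Format seconds back into the 'M min S sec' style used in the list."""
    m, s = divmod(total, 60)
    return f"{m} min {s} sec"


# Pre-build times (excluding tests), keyed by pre-build mode.
PREBUILD = {"release": seconds(1, 51), "debug": seconds(3, 0)}

# (pre-build mode, test mode) -> (full test build, extrapolated time excluding tests)
TEST_BUILD = {
    ("release", "release"): (seconds(2, 29), seconds(1, 50)),
    ("release", "debug"): (seconds(3, 50), seconds(3, 0)),
    ("debug", "release"): (seconds(2, 48), seconds(2, 7)),
    ("debug", "debug"): (seconds(0, 57), seconds(0, 0)),
}


def summarize():
    """Return the end-to-end total and tests-only estimate per combination."""
    rows = {}
    for (pre, test), (full, without_tests) in TEST_BUILD.items():
        rows[(pre, test)] = {
            "total": PREBUILD[pre] + full,
            "tests_only": full - without_tests,
        }
    return rows


if __name__ == "__main__":
    for (pre, test), row in summarize().items():
        print(f"pre-build {pre:7} + tests {test:7}: "
              f"total {fmt(row['total'])}, tests alone {fmt(row['tests_only'])}")
```

Under this reading, the tests themselves account for roughly 39 to 57 seconds in every combination, and the rest of each test build is re-compilation of code that had already been built.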

If I can find a way to import S4TF outside of its tests, compiling with unoptimized release seems to be the wisest option. That would take around 2 minutes. I could add a special command to Swift-Colab that caches the Swift package build products folder. When you restart the runtime (I do that often), it would link against the build products instead of re-compiling. It would also cache the x10 binaries so you only download them from the network once. This Colab command would be implemented once there is a Swift toolchain that both runs S4TF and has the Python LLDB API.
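
A rough sketch of what the caching half of that idea might look like is shown below. It is not Swift-Colab's actual implementation: `save_build_cache` and `restore_build_cache` are hypothetical helpers, the cache location is whatever persistent directory the caller chooses, and the only fact relied on is that SwiftPM writes build products to a `.build` folder inside the package by default. Whether SwiftPM then reuses the restored products without re-compiling has not been verified here, and linking against them directly (as proposed above) would need more than this.

```python
import shutil
from pathlib import Path


def _replace_tree(source, target):
    """Copy source to target, removing any previous copy of target first."""
    if not source.is_dir():
        return False
    if target.exists():
        shutil.rmtree(target)
    # symlinks=True keeps links such as .build/release as links, not copies.
    shutil.copytree(source, target, symlinks=True)
    return True


def save_build_cache(package_dir, cache_dir):
    """Copy <package_dir>/.build into <cache_dir>/.build (hypothetical helper)."""
    return _replace_tree(Path(package_dir) / ".build", Path(cache_dir) / ".build")


def restore_build_cache(package_dir, cache_dir):
    """Copy <cache_dir>/.build back into <package_dir>/.build, if a cache exists."""
    return _replace_tree(Path(cache_dir) / ".build", Path(package_dir) / ".build")
```

Both functions return `False` when there is nothing to copy, so a notebook cell could fall back to a full build when no cache has been saved yet.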

I previously heard that there were some performance concerns with not compiling S4TF with full optimization. There are tight loops where using debug mode could cause a bottleneck, but where do these loops happen? If they are in CTensorFlow, then it doesn't matter how S4TF is compiled because CTensorFlow is pre-compiled in the x10 binary.

When I tried compiling S4TF in fully optimized release mode, I got the compiler crash caused by BatchNorm, which is currently unsolved. The crash logs are in the Colab notebooks attached below. This crash did not happen in release when the -Onone flag was set - does that behavior reveal anything new about the bug?
crash_no_tests.ipynb.zip
crash_with_tests.ipynb.zip

I am compiling using the 2021-11-12 toolchain instead of the newest toolchain (2022-01-06). Newer toolchains (starting with 2021-12-23 or earlier) introduce a bug that prevents S4TF from compiling even in debug mode (#1184 (comment)).

philipturner commented Jan 9, 2022

After implementing @BradLarson's suggested workaround, I have confirmed that the cause of the BatchNorm failure is duplication of debug symbols and nothing else.

The workaround makes no difference to the -Onone case: S4TF already compiles successfully in release mode (with -Onone) even in the current state of fan/resurrection.

CC: @dan-zheng you seem to have the most knowledge of this bug. Do you have any suggestions for how to completely bypass the bug? (e.g. use a struct instead of a tuple)
