Skip to content

Commit

Permalink
Small doc update for torch_compile_debug (pytorch#95809)
Browse files Browse the repository at this point in the history
Updates the troubleshooting documentation with the folder structure of the debug directory
Pull Request resolved: pytorch#95809
Approved by: https://github.com/msaroufim
  • Loading branch information
mlazos authored and pytorchmergebot committed Mar 2, 2023
1 parent 1fd1199 commit 184fb9f
Showing 1 changed file with 68 additions and 12 deletions.
80 changes: 68 additions & 12 deletions docs/source/dynamo/troubleshooting.rst
Original file line number Diff line number Diff line change
Expand Up @@ -447,33 +447,89 @@ calling ``torch._dynamo.utils.compile_times()`` after executing
Torch._Dynamo. By default, this returns a string representation of the
compile times spent in each TorchDynamo function by name.

TorchInductor Debug Tracing
---------------------------
TorchInductor Debugging using TORCH_COMPILE_DEBUG
-------------------------------------------------

TorchInductor has a builtin stats and trace function for displaying time
spent in each compilation phase, output code, output graph visualization
and IR dump. This is a debugging tool designed to make it easier to
understand and troubleshoot the internals of TorchInductor.

Setting the environment variable ``TORCH_COMPILE_DEBUG=1`` will cause a
debug trace directory to be created and printed:
Let's run an example with the following test program (repro.py):

::

import torch

@torch.compile()
def test_model(x):
model = torch.nn.Sequential(
torch.nn.Linear(10, 10),
torch.nn.LayerNorm(10),
torch.nn.ReLU(),
)
return model(x)


y = test_model(torch.ones(10, 10))

Setting the environment variable ``TORCH_COMPILE_DEBUG=1`` will cause a
debug trace directory to be created, by default this directory will be in the current directory and named torch_compile_debug
(this can be overridden in the torchdynamo configuration field ``debug_dir_root`` and also the env var TORCH_COMPILE_DEBUG_DIR).
Inside this directory, each run will have a separate folder named with the timestamp and process id of the run:
::

$ env TORCH_COMPILE_DEBUG=1 python repro.py
torch._inductor.debug: [WARNING] model_forward_0 debug trace: /tmp/torchinductor_jansel/rh/crhwqgmbqtchqt3v3wdeeszjb352m4vbjbvdovaaeqpzi7tdjxqr.debug
$ cd torch_compile_debug
$ ls
run_2023_03_01_08_20_52_143510-pid_180167

Here is an `example debug directory
output <https://gist.github.com/jansel/f4af078791ad681a0d4094adeb844396>`__
In the run folder there will be a torchdynamo directory which contains debug logs, and an aot_torchinductor
folder which contains a subfolder for each compiled kernel with inductor debug artifacts.

::

$ cd
run_2023_03_01_08_20_52_143510-pid_180167
$ ls
aot_torchinductor torchdynamo

Moving further into the aot_torchinductor directory, the \*.log files are logs from the aot autograd phase of compilation, model__0_forward_1.0 contains the inductor debug artifacts.

::

$ cd aot_torchinductor
$ ls
aot_model___0_debug.log model__0_forward_1.0
$ cd model__0_forward_1.0
$ ls
debug.log fx_graph_readable.py fx_graph_runnable.py fx_graph_transformed.py ir_post_fusion.txt ir_pre_fusion.txt output_code.py

Here is a summary of the contents:
- fx_graph_readable.py and fx_graph_runnable.py are the readable and runnable versions of the fx_graph received by inductor.
- fx_graph_transformed.py is the fx graph after inductor has run all fx passes.
- ir\*.txt is the inductor ir pre and post fusion.
- output_code.py is the compiled triton kernel for the subgraph.

Here are `example debug directory contents
<https://gist.github.com/jansel/f4af078791ad681a0d4094adeb844396>`__
for the test program:

::

torch.nn.Sequential(
torch.nn.Linear(10, 10),
torch.nn.LayerNorm(10),
torch.nn.ReLU(),
)
import torch

@torch.compile()
def test_model(x):
model = torch.nn.Sequential(
torch.nn.Linear(10, 10),
torch.nn.LayerNorm(10),
torch.nn.ReLU(),
)
return model(x)


y = test_model(torch.ones(10, 10))

Each file in that debug trace can be enabled and disabled through
``torch._inductor.config.trace.*``. The profile and the diagram are both
Expand Down

0 comments on commit 184fb9f

Please sign in to comment.