This frontend lets a user describe a set of operations for nvFuser to fuse into one or more kernels. It is intended as an integration point for PyTorch or for standalone applications.
import torch
from nvfuser import FusionDefinition, DataType

with FusionDefinition() as fd:
    t0 = fd.define_tensor(symbolic_sizes=[-1, 1, -1],
                          contiguous=[True, True, True],
                          dtype=DataType.Float)
    t1 = fd.define_tensor(3)
    c0 = fd.define_constant(3.0)

    t2 = fd.ops.add(t0, t1)
    t3 = fd.ops.mul(t2, c0)
    t4 = fd.ops.sum(t3, [-1], False, DataType.Float)

    fd.add_output(t4)

input1 = torch.ones(2, 1, 8, device='cuda')
input2 = torch.ones(2, 4, 8, device='cuda')

nvf_out = fd.execute([input1, input2])[0]
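As a sanity check on what this fusion computes, the same arithmetic can be written out in plain Python with no GPU or nvFuser dependency. This is only an illustrative sketch: the helper name `reference_fusion` is hypothetical, and the nested-list loop merely mimics how the size-1 middle dimension of the first input is broadcast against the second.

```python
def reference_fusion(a, b, scale=3.0):
    """Compute sum((a + b) * scale, axis=-1) on nested lists.

    `a` has shape [N, 1, K] and `b` has shape [N, M, K]; the size-1
    dimension of `a` is broadcast across the M rows of `b`.
    """
    out = []
    for a_batch, b_batch in zip(a, b):
        a_row = a_batch[0]  # the single row that gets broadcast
        out.append([
            sum((x + y) * scale for x, y in zip(a_row, b_row))
            for b_row in b_batch
        ])
    return out


# Same shapes as the example inputs: ones(2, 1, 8) and ones(2, 4, 8).
input1 = [[[1.0] * 8] for _ in range(2)]
input2 = [[[1.0] * 8 for _ in range(4)] for _ in range(2)]
expected = reference_fusion(input1, input2)
print(expected)
```

Every entry is (1 + 1) * 3 summed over 8 elements, so the result is a 2 x 4 grid of 48.0. The fused output should hold the same values.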
# Look up a previously defined fusion by its fusion id instead of redefining it.
fid = 0
fd = FusionDefinition(fid)

input1 = torch.ones(2, 1, 8, device='cuda')
input2 = torch.ones(2, 4, 8, device='cuda')

nvf_out = fd.execute([input1, input2])[0]
execute([inputs]): Allows you to execute the currently defined fusion with a list of given inputs and returns a list of tensors.
id(): Returns the fusion id for a given definition.
print(): Prints the FusionDefinition as a Python function.
print_ir(): Prints the low-level IR for the currently defined fusion.
All intermediate tensors are created by operations. Constant tensors do not exist.
There are three ways to define tensors, each described below.
This interface tells nvFuser that the tensor has a given number of symbolic dimensions that are not necessarily contiguous in memory. The user can also specify a data type; the default type is Float.
t0 = fd.define_tensor(3)
t1 = fd.define_tensor(3, DataType.Half)
The sizes parameter defines the number of dimensions and the size of each dimension. The strides parameter must have the same number of dimensions as the sizes parameter.
nvFuser translates the concrete sizes and strides into the symbolic sizes and contiguity information that the third interface takes directly. This lets the user take a PyTorch tensor, query its sizes and strides, and apply them in the definition.
t0 = fd.define_tensor(sizes=[2, 4, 6], strides=[24, 6, 1], dtype=DataType.Half)
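To make the translation from concrete sizes and strides concrete, here is a simplified pure-Python sketch. The helper names `contiguous_strides` and `to_symbolic` are hypothetical, and the rules encoded here are assumptions for illustration: a size of 1 is kept as a broadcast dimension, every other size becomes -1, and a dimension counts as contiguous when its stride equals the stride times the size of the dimension to its right. nvFuser's actual logic may treat broadcast dimensions and other edge cases differently.

```python
def contiguous_strides(sizes):
    """Return the row-major (fully contiguous) strides for `sizes`."""
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides


def to_symbolic(sizes, strides):
    """Derive (symbolic_sizes, contiguous) from concrete sizes and strides.

    Assumed rules, not nvFuser's implementation: size 1 stays 1, any other
    size becomes -1, and contiguity is checked against the right neighbor.
    """
    if len(sizes) != len(strides):
        raise ValueError("sizes and strides must have the same length")
    symbolic_sizes = [1 if s == 1 else -1 for s in sizes]
    contiguous = []
    last = len(sizes) - 1
    for i in range(len(sizes)):
        if i == last:
            contiguous.append(strides[i] == 1)  # innermost dimension
        else:
            contiguous.append(strides[i] == strides[i + 1] * sizes[i + 1])
    return symbolic_sizes, contiguous


sizes = [2, 4, 6]
print(contiguous_strides(sizes))       # [24, 6, 1]
print(to_symbolic(sizes, [24, 6, 1]))  # ([-1, -1, -1], [True, True, True])
print(to_symbolic(sizes, [1, 2, 8]))   # ([-1, -1, -1], [False, False, False])
```

The last call uses the strides of a column-major layout, where no dimension is contiguous with the one to its right under these rules.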
The list of symbolic sizes defines the number of dimensions: -1 is given for each dimension unless it is a broadcast dimension, which is defined with a 1. The contiguity information is viewed from right to left. A True entry indicates that the current dimension is contiguous with the dimension to its right.
t0 = fd.define_tensor(symbolic_sizes=[-1, 1, -1], contiguous=[True, True, True], dtype=DataType.Float)
All intermediate scalars, except for constants, are created by operations.
The only thing the user has to define for a scalar is its type.
s0 = fd.define_scalar(dtype=DataType.Half)
Constants can be of types Bool, ComplexDouble, Double, or Int. The definition takes only the constant value, and the type is inferred from it.
c0 = fd.define_constant(3.0)
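The inference can be pictured as a mapping from the Python value's type to one of the four constant types. The sketch below is an assumption about that mapping (bool to Bool, int to Int, float to Double, complex to ComplexDouble), and `infer_constant_dtype` is a hypothetical helper, not part of the nvFuser API.

```python
def infer_constant_dtype(value):
    """Return the name of the constant type assumed for a Python value."""
    # bool must be tested first because bool is a subclass of int.
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Double"
    if isinstance(value, complex):
        return "ComplexDouble"
    raise TypeError(f"unsupported constant type: {type(value).__name__}")


for constant in (True, 3, 3.0, 1 + 2j):
    print(repr(constant), "->", infer_constant_dtype(constant))
```

Under this assumed mapping, the `3.0` in the definition above would be a Double constant, while `3` would be an Int.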
Operators are added with the following notation:
output = fd.ops.foo(arg1, ... )
You can see a supported list of operations with the following query:
python -c "from nvfuser import FusionDefinition; help(FusionDefinition.Operators)"
The FusionDefinition add_output method is used to indicate that an intermediate is an output of the fusion.
add_output(output: Tensor)
# or
add_output(output: Scalar)
Query a list of supported operations:
python -c "from nvfuser import FusionDefinition; help(FusionDefinition.Operators)"
View the fusion definitions that are executed by setting an environment variable:
export PYTORCH_NVFUSER_DUMP=python_definition
Example Output:
def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(symbolic_sizes=[-1, 1, -1], contiguous=[True, True, True], dtype=DataType.Float)
    T1 = fd.define_tensor(symbolic_sizes=[-1, -1, -1], contiguous=[False, False, False], dtype=DataType.Float)
    S2 = fd.define_constant(3.00000)
    T3 = fd.ops.add(T0, T1)
    T4 = fd.ops.mul(T3, S2)
    T5 = fd.ops.sum(T4, axes=[-1], keepdim=False, dtype=DataType.Float)
    fd.add_output(T5)
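When a shell `export` is inconvenient, for example in a notebook or a test script, the same variable can be set from Python. This is a minimal sketch that assumes nvFuser reads the variable from the process environment, so it is set before any nvfuser import or fusion execution; whether later changes are picked up is not covered by the text above.

```python
import os

# Assumption: the setting must be in place before nvFuser is first used
# in this process, so set it at the very top of the script.
os.environ["PYTORCH_NVFUSER_DUMP"] = "python_definition"

# Child processes started from this script inherit the setting as well.
print(os.environ["PYTORCH_NVFUSER_DUMP"])
```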