Tentatively eliminate graph break overhead #3741
Conversation
Can you include similar changes to the C++ runtime as well?
@@ -174,6 +173,8 @@ def __init__(
self.cudagraph: Optional[torch.cuda.CUDAGraph] = None
self._caller_stream: Optional[torch.cuda.Stream] = None
self._engine_stream: Optional[torch.cuda.Stream] = None
self.output_tensors: Optional[List[torch.Tensor]] = None
self.sync_stream = True
Just inherit stream from PyTorch / input tensors
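If I'm reading the suggestion right, something like the sketch below could replace the sync_stream flag by reusing whatever stream the caller and inputs are already on. The function and argument names here are illustrative assumptions, not the PR's actual API:

import torch

def resolve_engine_stream(contiguous_inputs):
    # Prefer the device of the first CUDA input; fall back to the current device.
    device = next(
        (t.device for t in contiguous_inputs if t.is_cuda),
        torch.device("cuda", torch.cuda.current_device()),
    )
    # Reuse the stream the caller is already enqueueing work on, so the engine
    # launch needs no extra cross-stream synchronization or a separate sync flag.
    return torch.cuda.current_stream(device=device)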
@@ -381,16 +405,17 @@ def setup_input_tensors(

# For shape tensors, we use CPU pointers and for data tensors, we use GPU pointers
# as per TensorRT requirements
if self.engine.is_shape_inference_io(input_name):
if self.is_shape_inference_io[i]:
Probably better to make this a dictionary and key on names, instead of implicitly relying on input order to stay the same over time
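Roughly what I have in mind (attribute and variable names are assumptions based on the surrounding diff, not the PR's code):

from typing import Dict

def build_shape_inference_map(engine, input_names) -> Dict[str, bool]:
    # Cache the shape-inference flag per input name once, at engine setup time.
    return {name: engine.is_shape_inference_io(name) for name in input_names}

# In setup_input_tensors, key on the name instead of the positional index:
#     if self.is_shape_inference_io_map[input_name]:
#         ...  # bind a CPU pointer for the shape tensor, per TensorRT requirements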
input_name, tuple(contiguous_inputs[i].shape)
)
if shape_changed:
self.context.set_input_shape(
Can we safely assume execution context holds shape between inference calls?
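For reference, the guard pattern I assume this diff is going for looks roughly like the sketch below (the cache attribute name is illustrative); it is only valid if the execution context really does retain the last set shapes between inference calls:

def set_shape_if_changed(self, input_name, shape):
    # Skip the TensorRT call when the shape matches what was set last time;
    # this assumes IExecutionContext keeps input shapes across inference calls.
    if self._cached_input_shapes.get(input_name) != tuple(shape):
        self.context.set_input_shape(input_name, tuple(shape))
        self._cached_input_shapes[input_name] = tuple(shape)
        return True
    return False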
@@ -994,6 +994,10 @@ def preserve_module_specs( | |||
) as f: | |||
f.write(trt_module.get_layer_info()) | |||
|
|||
# Only set the requires_unique_output flag for the last TRT Module when user has access to the output tensor | |||
if trt_module and settings.use_python_runtime: | |||
trt_module.set_requires_unique_output(True) |
How is this going to work with serialization in C++?
Also, make the name clearer, something like trt_module.module_is_output_operator or trt_module.requires_unowned_output_tensor.
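Something along these lines for the rename, purely illustrative since the final name and the C++ serialization story are still open:

def set_requires_unowned_output_tensor(self, enabled: bool) -> None:
    # When enabled, the module returns freshly allocated output tensors instead
    # of reusing its internal buffers, because the caller now owns the outputs.
    self.requires_unowned_output_tensor = enabled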
Yeah, once we agree that all the changes in PyTorch are valid, I can make the changes accordingly.