-
Notifications
You must be signed in to change notification settings - Fork 807
[SYCL] Postpone creation of HostKernel copy #20240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
[SYCL] Postpone creation of HostKernel copy #20240
Conversation
Create possibility to delay creation of copy of HostKernel till it became used out of submit stack, i.e. by scheduler. Do type erasure for kernel lambda via vptr in HostKernelRefBase.
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES | ||
// This function is needed for host-side compilation to keep kernels | ||
// instantitated. This is important for debuggers to be able to associate | ||
// kernel code instructions with source code lines. | ||
// NOTE: InstatiateKernelOnHost() should not be called. | ||
void InstantiateKernelOnHost() override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be empty. HostKernelRef<...>
instantiates HostKernel<...>
on line 256, and its InstantiateKernelOnHost
already does the right thing (outside preview). And for preview we need a mechanism that doesn't require copy-paste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it can be empty. I try to describe the reason in comment.
And for preview we need a mechanism that doesn't require copy-paste.
Are we talking about GetInstantiateKernelOnHostPtr()
call, right? It required template parameter, so unclear what can we done other then adding the call to templated constructor.
virtual char *getPtr() override { | ||
return const_cast<char *>(reinterpret_cast<const char *>(&MKernel)); | ||
} | ||
virtual std::shared_ptr<HostKernelBase> takeOrCopyOwnership() const override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this should std::unique_ptr
because it has no overhead and one can always easily create shared
via unique_ptr::release
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally agree. Overhead is the creation/destructor of unique_ptr
, meanwhile caller needs shared_ptr
. (And for shared_ptr
we return 2 pointers vs 1 for unique_ptr
, so it's hard to judge). Is that a chance that someday caller would need unique_ptr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overhead is the creation/destructor of unique_ptr
is exactly zero with optimizations enabled: https://godbolt.org/z/fcaos1Wr7
That is not true for std::shared_ptr
(which not only has extra memory alloc but also involves atomics).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, good point. Done.
sycl/include/sycl/queue.hpp
Outdated
|
||
std::shared_ptr<detail::HostKernelBase> HostKernel = std::make_shared< | ||
detail::HostKernel<KernelType, TransformedArgType, Dims>>(KernelFunc); | ||
HostKernelRef<KernelType, TransformedArgType, Dims> HostKernel(KernelFunc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we could do
const HostKernelRefBase &TypeErasedKernel = HostKernerlRef<...>{KernelFunc};
(https://godbolt.org/z/h9v9s3TrG), but getPtr()
isn't marked as const
😞
const KernelType &MKernel; | ||
|
||
public: | ||
HostKernelRef(const KernelType &Kernel) : MKernel(Kernel) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we delete copy ctor here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to create HostKernelRef from constant reference, as in sycl/include/sycl/queue.hpp, so we can't.
HostKernelRef<KernelType, KernelTypeUniversalRef, TransformedArgType, Dims>
HostKernel(std::forward<KernelTypeUniversalRef>(KernelFunc));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean add HostKernelRef(const HostKernelRef&) = delete;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explicitly delete copy ctor from HostKernelRefBase
?
Create possibility to delay creation of copy of HostKernel till it became used out of submit stack, i.e. by scheduler. Do type erasure for kernel lambda via vptr in HostKernelRefBase.