possible kernel proliferation with scalar parameters #663
Comments
That's an interesting issue. I understand that the example is just there to show where the problem is ;) I hope nobody would write something like this (the loop part). I think that:
|
Hi, is there any news on this? This issue is becoming more severe for me, as some unit tests currently generate > 10000 kernels instead of maybe 10 or so. I would vote for anything that can add a scalar argument to the kernel :). I prefer the scalar argument solution. The problem with scalar arguments, however, is that we have to be more careful regarding the uniqueness of the argument: if my_iter adds a scalar argument every time we do k<<*(my_iter), we might end up with many copies of the same argument. So my_iter would have to store the index of the argument somehow (in the same way that buffer objects are used as unique identifiers, scalar arguments also need a unique identifier...). Also, we have to ensure that the value is correctly set after the kernel is built. Currently, adding a kernel arg will not set its value, as is done with buffers. |
Oh, I forgot my idea on how to solve the unique name problem. The best way to make the argument names unique is to tell the kernel which variables belong together. So we can tell the kernel "now we are going to use variables used by group 1", and this gives the kernel the ability to tell whether a variable has already been registered or not (the first time we enter a group, every request for a kernel variable needs to create it; in subsequent calls we know they are there). For this we might need to add the following kernel functions:
With this, we could easily create a registered_iterator which wraps another iterator with the following semantics:
usage:
|
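The grouping idea from the comment above can be sketched without any OpenCL at all. The class below is only a toy model: `toy_kernel`, `begin_group` and `request_scalar` are hypothetical names made up for this illustration, not existing Boost.Compute functions, and the functions originally proposed in the comment are not reproduced here. It only shows the bookkeeping: the first pass through a group creates the arguments, later passes reuse the same slots and just refresh the values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model of a kernel that tracks scalar arguments per group.
// All names here are hypothetical; this is not the Boost.Compute API.
class toy_kernel {
public:
    // Announce which group of variables the following requests belong to.
    void begin_group(int id) {
        group_ = id;
        cursor_ = 0;
    }

    // Request the next scalar of the current group and return its argument name.
    // First visit of the group: a new argument is created.
    // Later visits: the existing argument is reused and only its value is updated.
    std::string request_scalar(float value) {
        assert(group_ >= 0 && "begin_group must be called first");
        std::vector<std::size_t>& slots = groups_[group_];
        if (cursor_ == slots.size()) {
            slots.push_back(values_.size());
            values_.push_back(value);
        } else {
            values_[slots[cursor_]] = value;
        }
        return "_s" + std::to_string(slots[cursor_++]);
    }

    std::size_t arg_count() const { return values_.size(); }
    float value(std::size_t i) const { return values_[i]; }

private:
    int group_ = -1;
    std::size_t cursor_ = 0;
    std::map<int, std::vector<std::size_t>> groups_;
    std::vector<float> values_;
};

// Uses the same two logical variables over and over and reports
// how many kernel arguments exist afterwards.
std::size_t args_after_repeated_use(int repetitions) {
    toy_kernel k;
    for (int i = 0; i != repetitions; ++i) {
        k.begin_group(1);
        k.request_scalar(0.5f * i); // changes every iteration
        k.request_scalar(2.0f);     // stays the same
    }
    return k.arg_count();
}
```

In this model the argument count stays at two no matter how often the group is entered, which is the property the unique-identifier discussion is after. How the updated values would reach a real, already built kernel is left open here.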
I am sorry for the shameless plug, but you could use vexcl with the boost.compute backend (see http://vexcl.readthedocs.io/en/latest/interop.html). This will solve the problem with scalar kernel arguments, since vexcl by default treats scalar variables as kernel parameters.

void add_scalar(bc::vector<float>& v, float scalar){
    auto unary = bc::bind(bc::plus<float>(), bc::placeholders::_1, scalar); // _1 + scalar
    bc::transform(v.begin(), v.end(), v.begin(), unary);
}
...
bc::vector<float> v(10000, 1.0);
for(int i = 0; i != 1000; ++i)
    add_scalar(v, 0.5*i);

becomes

for(int i = 0; i != 1000; ++i)
    v += 0.5 * i;

which results in a single kernel being generated and applied many times. The problem with unique variables is solved by tagging the variables: http://vexcl.readthedocs.io/en/latest/expressions.html#tagged-terminals |
Any news on this? This is a real showstopper for me. The biggest issue I have is that all kinds of indexing into a buffer lead to recompilation of the kernel. E.g. when I use the row-major matrix layout, I index the buffer as
This is not an easy issue to fix, and my first few home-rolled attempts failed in the planning stage. The only solution I see is that when we get an iterator, we have to pre-register all its constants. This means that, for example, the strided iterator registers the index and gets back a unique argument name, which is then used instead of the constant when the kernel source is generated via op<<. In practice this would mean that we have a strided_iterator and a kernel_strided_iterator. The latter is constructed from the strided_iterator and the kernel, and does all the registering and saving of the argument name in its constructor. We then only use the kernel_strided_iterator during generation of the kernel code. |
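A rough sketch of the strided_iterator / kernel_strided_iterator split described in the comment above, written against a toy kernel class rather than the real meta_kernel. Everything except the two iterator names is an assumption for illustration: `toy_kernel`, `add_scalar_arg`, `index_expr` and `generate_source` are made-up names, and the real strided iterator in Boost.Compute has a different interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a meta kernel: collects source text and scalar arguments.
class toy_kernel {
public:
    // Registers a scalar value and returns the generated argument name.
    std::string add_scalar_arg(std::size_t value) {
        std::string name = "_arg" + std::to_string(values_.size());
        values_.push_back(value);
        return name;
    }
    void append(const std::string& text) { source_ += text; }
    const std::string& source() const { return source_; }
    const std::vector<std::size_t>& values() const { return values_; }

private:
    std::string source_;
    std::vector<std::size_t> values_;
};

// Host-side iterator: knows its constants but nothing about any kernel.
struct strided_iterator {
    std::string buffer_name;
    std::size_t stride;
};

// Kernel-side iterator: registers the stride once in its constructor and
// afterwards only ever emits the argument name, never the literal value.
class kernel_strided_iterator {
public:
    kernel_strided_iterator(const strided_iterator& it, toy_kernel& k)
        : buffer_name_(it.buffer_name), stride_arg_(k.add_scalar_arg(it.stride)) {
        assert(!buffer_name_.empty());
    }

    // Source text for element i of the strided range.
    std::string index_expr(const std::string& i) const {
        return buffer_name_ + "[" + i + " * " + stride_arg_ + "]";
    }

private:
    std::string buffer_name_;
    std::string stride_arg_;
};

// Generates the source of a kernel that zeroes a strided range.
std::string generate_source(std::size_t stride) {
    toy_kernel k;
    kernel_strided_iterator it(strided_iterator{"data", stride}, k);
    k.append(it.index_expr("get_global_id(0)") + " = 0;");
    return k.source();
}
```

Because the stride only appears as `_arg0` in the generated text, `generate_source(3)` and `generate_source(7)` produce identical source in this model, so a source-keyed kernel cache would hit for both.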
To be honest, I just don't have enough free time to think about this. Currently, I'm trying to hunt down and fix remaining bugs and missing wrappers for API calls introduced in 2.1 etc.
Yeah, I think for iterators it may be possible to resolve it by changing how they are processed by |
If we have a working proposal, I can invest the time to add and test the changes in the meta_kernel as well as some of the iterators. I have been stuck on this for some time, so I can just as well work on it. |
OK, I'll try to find time to work on this after I adjust Boost.Compute to changes introduced in OpenCL 2.1 and 2.2. |
I worked a bit on a solution myself, see lines 581-607. This is a solution where functions with arguments are transformed into functions with a registered variable name. Line 609 gives an example of such a function with a bind_second. Basically, a call to bind_second(f,t) (where t is the argument) is replaced by bind_second(f, "variable_name_of_t"), and t is registered as a kernel argument. |
I have found a hacky solution. By using a permutation iterator combined with a constant iterator for the index, it is possible to have an array which is indexed at zero (aka a scalar). The performance for this should be fine, but this may be worse than an actual scalar parameter. Here is an example:
The kernel generated for this looks like:
|
Consider the following function
This will generate a kernel in every iteration and compile it on the device, leading to way more kernel compilations than required (essentially making kernel caching useless). The issue, as far as I understand the meta_kernel code, is that a function can only register buffers, but not constants. Thus the only way to generate the kernel is to stringify the scalar value. A solution would be to add the ability to register constants as additional kernel arguments, the same way as it is done with buffers.
An alternative solution that does not require deep changes to the internals would be to use a bc::array<float,1> for everything that is not a constant. But this is very tedious and error prone, as the user must be very careful with scalar arguments, especially as, for example, bc::accumulate returns a float and not an array<float,1> (while the internal implementation actually uses an array<float,1>...).
A way to make the latter solution more viable would be to implement a small wrapper class scalar, which is a bit of syntactic sugar around bc::array<float,1>.
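To make the effect concrete, here is a small stand-alone model of it. It does not use Boost.Compute or OpenCL: `kernel_cache` is a hypothetical stand-in that counts one "compilation" per distinct source string, and the two source generators are simplified guesses at what a stringified versus a parameterised kernel might look like.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <set>
#include <sstream>
#include <string>

// Toy stand-in for a program cache: one "compilation" per distinct source string.
struct kernel_cache {
    std::set<std::string> sources;
    void run(const std::string& source) { sources.insert(source); }
    std::size_t compiled() const { return sources.size(); }
};

// Variant A: the scalar is stringified into the kernel source,
// so every new value yields a new source string.
std::string source_with_literal(float scalar) {
    std::ostringstream s;
    s << "__kernel void add(__global float* v) "
      << "{ v[get_global_id(0)] += " << std::showpoint << scalar << "f; }";
    return s.str();
}

// Variant B: the scalar is a kernel argument, so the source never changes.
std::string source_with_argument() {
    return "__kernel void add(__global float* v, const float scalar) "
           "{ v[get_global_id(0)] += scalar; }";
}

// Mimics the loop from the issue (scalar = 0.5 * i) and returns
// the number of distinct kernels the cache had to compile.
std::size_t count_compiled_kernels(bool scalar_as_argument, int iterations) {
    assert(iterations >= 0);
    kernel_cache cache;
    for (int i = 0; i != iterations; ++i) {
        float scalar = 0.5f * i;
        cache.run(scalar_as_argument ? source_with_argument()
                                     : source_with_literal(scalar));
    }
    return cache.compiled();
}
```

In this model, `count_compiled_kernels(false, 1000)` ends up with 1000 distinct sources, while `count_compiled_kernels(true, 1000)` needs only one, which mirrors the proliferation described above.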