Quick Concepts
The GPU requires a lot (relatively speaking) of preparation to work effectively. It can take years to master, and we prefer not to make everyone do that.
Note: You can see what GPU.js is up to, at any point in time, by calling `kernel.toString(...args)` and seeing exactly how your kernel is talking with the GPU. You will quickly see that a single simple function in GPU.js requires many hundreds of lines.
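For instance, here is a rough sketch of how you might inspect that output; the kernel itself is made up purely for illustration, and the exact contents of the string depend on your environment and GPU.js version:

```js
const { GPU } = require('gpu.js'); // in the browser, use the bundled script instead

const gpu = new GPU();
const kernel = gpu.createKernel(function(a) {
  return a[this.thread.x] * 2;
}, { output: [4] });

kernel([1, 2, 3, 4]);
// Dump the generated source the kernel actually runs:
console.log(kernel.toString([1, 2, 3, 4]));
```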
The concept, then, is matching performant javascript to a counterpart that would work on the graphics processor as a pseudo lambda function.
To make multi-threaded number calculations much faster in javascript, we need to define what a "thread" is. Here we'll demonstrate it.
In javascript, say we have a simple function that calculates the matrix product (row-by-column dot products) of two matrices:
```js
function multiplyMatrixes(a, b) {
  const width = a.length;
  const height = b.length;
  const result = new Array(height);
  for (let y = 0; y < height; y++) {
    const row = new Float32Array(width);
    for (let x = 0; x < width; x++) {
      let sum = 0;
      for (let i = 0; i < width; i++) {
        sum += a[y][i] * b[i][x];
      }
      row[x] = sum;
    }
    result[y] = row;
  }
  return result;
}

// multiplyMatrixes(a, b);
```
Here the "thread" is actionable calculations, specifically this part:
```js
let sum = 0;
for (let i = 0; i < width; i++) {
  sum += a[y][i] * b[i][x];
}
row[x] = sum;
```
See how the loops that calculate `x` and `y` are not included?
This is because on the GPU:
- These values are calculated in tandem, or at the same time
- `x` may exist as many different numbers simultaneously, but in different threads, executing the same code
It is also important to note that:
- The GPU isn't actually much faster, if at all, than the CPU, but it carries out the instruction set at the same time across many threads, thus saving, in effect, time.
- Each thread isn't aware of what is going on in the other threads. It is simply calculating a value, and returning it.
This is much like how your computer calculates everything it needs to display a single pixel: pixels are calculated values from red, green, blue, and alpha, and each pixel is only aware of the visuals it is "touching".
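To make the "thread" idea concrete before the GPU.js version below, here is a rough conceptual sketch in plain javascript (not GPU.js API): one function invocation per output coordinate, with `x` and `y` supplied from outside rather than looped over inside.

```js
// Conceptual only: each call to `thread` computes one output value,
// knowing nothing about any other call.
function thread(a, b, width, x, y) {
  let sum = 0;
  for (let i = 0; i < width; i++) {
    sum += a[y][i] * b[i][x];
  }
  return sum;
}

// On the CPU we walk the coordinates ourselves, one at a time;
// on the GPU these invocations happen at the same time.
function runAllThreads(a, b, width, height) {
  const result = [];
  for (let y = 0; y < height; y++) {
    const row = new Float32Array(width);
    for (let x = 0; x < width; x++) {
      row[x] = thread(a, b, width, x, y);
    }
    result.push(row);
  }
  return result;
}
```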
If we want to perform this same calculation on the GPU, we omit the loops, because the kernel function itself operates as the loop, or thread, and we essentially have the same calculation. In GPU.js the `x` and `y` (also `z`, and likely more to come in the future) are obtained by using `this.thread.x` or `this.thread.y` (or `this.thread.z`). Here is the GPU.js version of the same function above:
```js
const multiplyMatrixes = gpu.createKernel(function(a, b) {
  let sum = 0;
  for (let i = 0; i < this.constants.width; i++) {
    sum += a[this.thread.y][i] * b[i][this.thread.x];
  }
  return sum;
})
  .setOutput([width, height])
  .setConstants({ width });

// multiplyMatrixes(a, b);
```
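For completeness, here is a minimal sketch of the surrounding setup this snippet assumes; the 512x512 size and the random data are illustrative only, and in practice `width` and `height` would come from your own data:

```js
const { GPU } = require('gpu.js'); // or the browser build
const gpu = new GPU();

const width = 512;  // illustrative size
const height = 512;

// Build `a` and `b` as height x width arrays of numbers.
const a = [];
const b = [];
for (let y = 0; y < height; y++) {
  a.push(Array.from({ length: width }, () => Math.random()));
  b.push(Array.from({ length: width }, () => Math.random()));
}

// ...create the kernel as above, then:
const result = multiplyMatrixes(a, b); // height x width result
```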
Traditionally, the 4 color values associated with a pixel can each store 8 bits of information. Because there are 4 of them, someone somewhere thought: "Hey, I can probably fit a 32 bit (or 4 * 8 bits) floating point value there!" After a bit of playing around with "bit shifting" (also referred to in some cases as "packing") a 32 bit floating point value, with some precision (we'll call it "unsigned", because some precision is lost), into these 4 color channels, as well as the inverse operation, referred to in some cases as "unpacking", GPGPUs (or General Purpose Graphics Processing Units) were born.
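As a rough illustration of the "packing" idea, here is a plain javascript sketch of spreading a 32 bit float across four 8 bit "channels" and back. This is only an analogy: the real packing happens in shader code on the GPU, and in the "unsigned" case it loses some precision, unlike this lossless byte reinterpretation.

```js
// Pack a 32 bit float into 4 bytes, then unpack it again.
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);

view.setFloat32(0, 3.14159);
const channels = [
  view.getUint8(0), // "red"
  view.getUint8(1), // "green"
  view.getUint8(2), // "blue"
  view.getUint8(3), // "alpha"
];

// Unpack: write the 4 bytes back and read the float out.
channels.forEach((byte, i) => view.setUint8(i, byte));
console.log(view.getFloat32(0)); // ~3.14159
```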
As time and technology progressed, we were eventually able to store a whole 32 bit value into these color channels; we'll call this type of storage "single", as in a "single 32 bit floating point precision value".
GPU.js provides a means of using both "single" and "unsigned" precision via the kernel setting "precision". It may be used like this:
```js
const gpu = new GPU();
const kernel = gpu.createKernel(function() {
  // ... some maths...
  return value;
}, {
  output: [10],
  precision: 'single',
});
```
By default (as of writing, pre-release of v2), we use "single" first; if it isn't available and precision is not defined as a setting, we fall back to "unsigned".
There is no easy way to get around making something work for the GPU; it isn't magic. Even if you use GPU.js (sometimes referred to as magic or voodoo), you are still performing these steps. Included in this list would be:
- transpiling javascript for use on the GPU
  - reading javascript into a common format, in this case a Mozilla abstract syntax tree
  - inferring types from any value, or derivation of any value, in the parsed javascript
  - translating from the Mozilla abstract syntax tree to a string value of a language understood by the GPU, generally GLSL (a C-like shading language), but likely more to come
  - adding required utility functions and environment corrections to said translated string
  - compiling the entire translated string
- uploading values (arguments or constants) needed to calculate the result of a kernel
- calculating said kernel output
- downloading the value from the kernel output (this step can be skipped by using the kernel setting `pipeline: true`; see the sketch after this list)
  - this is generally regarded as the most time consuming part of calculating values from a GPU
  - if you find yourself here, please ask yourself:
    - "Are the values I need really needed?"
    - "Can I offload the values I need to the GPU, and/or return the values I think I need from the GPU less often?"
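Here is a rough sketch of what `pipeline: true` enables: keeping intermediate results on the GPU and only downloading once at the end. The kernel names, shapes, and data below are made up purely for illustration.

```js
const { GPU } = require('gpu.js');
const gpu = new GPU();

// Hypothetical kernels, just to illustrate chaining:
const addOne = gpu.createKernel(function(v) {
  return v[this.thread.x] + 1;
}, { output: [1024], pipeline: true });

const double = gpu.createKernel(function(v) {
  return v[this.thread.x] * 2;
}, { output: [1024] });

const input = new Float32Array(1024); // illustrative data
const texture = addOne(input);        // result stays on the GPU
const result = double(texture);       // downloaded once, at the end
```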
NOTE: You can be assured that "magic" (aka black magic, voodoo, or "clever" APIs that are not well tested) is in fact evil, and the "magic" referred to above is simply in reference to the efficient and effective API, the unit tests proving the API is sufficient, and the speed at which GPU.js is able to calculate values.
As a utility for converting your CPU code to the GPU, which can be somewhat overwhelming if one is not accustomed to thinking in "threads", https://github.com/gpujs/matrix-log.js was built. It is a tool for manually converting a function from CPU to GPU, and for learning, by way of visual representation, either environment's dependencies when executing. Oftentimes, after finally converting your code to the GPU, you (perhaps it is just me) may feel that the GPU version is in fact the way it should have been written to begin with, and that the CPU code is actually quite "clever" (aka, in some cases, magic), as it joins our finite and primitive human understanding of math to the rule of reading a book: top left to bottom right.
The term "fallback" is used to describe how GPU.js utilizes GPU technologies. The intent is that GPU.js will execute what you have built with it no matter what, using the best means possible.
For web browsers:
- When WebGL2 is available, it is used
- When WebGL2 is not available WebGL1 will be used
- When WebGL1 is not available CPU will be used
For NodeJS:
- When HeadlessGL is available, it is used
- When HeadlessGL is not available CPU will be used
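The fallback happens automatically, but GPU.js also exposes support flags and a `mode` setting for forcing a specific backend. The exact flag names below are assumptions to verify against the main README for your version:

```js
const { GPU } = require('gpu.js');

// Static flags reporting what the current environment supports
// (assumed names; check the GPU.js README for your version):
console.log(GPU.isWebGL2Supported);
console.log(GPU.isWebGLSupported);
console.log(GPU.isHeadlessGLSupported);

// Forcing a specific backend instead of relying on the automatic fallback:
const gpu = new GPU({ mode: 'webgl2' }); // or 'webgl', 'headlessgl', 'cpu'
```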