Draw calls performance bottleneck #1026
Comments
Sounds great 👍
This also requires changing the shaders from using a uniform color to one color per vertex. Triangles from different shapes get packed into the same buffer, so their colors must be kept separate.
I absolutely love https://love2d.org/wiki/SpriteBatch.
@crumblingstatue Can you open a new issue about it? Thanks!
Alright, I opened #1041.
I've found the text renderer horrible. The minimal overhead is about 23 calls/frame (rusttype's gpu_cache example). However, Piston doesn't batch it at all and does many context switches, like enabling and disabling scissors. This resulted in 1000 calls/frame (and due to the Text implementation, it can increase further with more characters). That is over 40 times as many calls, which is not really affordable.
@ishitatsuyuki Yeah, text rendering is really bad right now.
What's the current state of this issue, especially in regard to text rendering?
Texture rendering is now significantly faster for the OpenGL backend, but the glyph cache implementation must be changed to take advantage of this optimization.
This issue is meant to help people understand what causes a performance bottleneck in piston-graphics, and what the plan is to fix it.
For each draw call, the CPU needs to send data to the GPU. The GPU is often very fast at rendering. If this capacity is not fully used, the GPU sits and waits for more input from the CPU.
GPUs are designed for handling massive amounts of data with a limited amount of variation. What the GPU does is controlled through a shader language. For OpenGL the shader language is GLSL.
When you render a rectangle, this is what happens:
Steps 1-4 happen repeatedly when drawing many objects in each frame.
In the Gfx backend the draw commands are collected upfront and given to the driver at the same time. However, from the graphics driver's side, the instructions seem similar to the ones generated by the OpenGL backend (except for changes made to the draw state).
The 1st step is done by piston-graphics's design. Reasons to triangulate on the CPU:
- `f64` precision of matrix transformations, which are not that frequently supported on the GPU

Some questions one might ask:
Before making changes to the design, one might consider using the strengths it offers to fix the problem. It seems the largest overhead is the number of draw calls, and since reducing the number of draw calls will lead to less overhead, we should look for ways to do that first. This happens in the 2nd step, not the 1st!
Batch, batch, batch!
The key insight here is that since piston-graphics triangulates on the CPU, we could pack multiple shapes into the same buffer in the backend. This leads to fewer draw calls when:
One downside is that many backend instances lead to higher memory usage. Based on experience so far, most applications only use one instance, so I do not think this is a problem.
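A minimal sketch of the batching idea, assuming hypothetical names (`Vertex`, `Batch`, `rect_triangles`) that are not part of piston-graphics or any backend. It packs already triangulated shapes into one buffer with a color per vertex, and only counts a draw call when the buffer is flushed. A real backend would upload the buffer and issue the draw at the point marked in `flush`.

```rust
/// One vertex in the shared buffer. The color is stored per vertex so that
/// triangles from different shapes can sit next to each other.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vertex {
    pub pos: [f32; 2],
    pub color: [f32; 4],
}

/// Hypothetical batcher (not a piston-graphics type): collects triangles
/// and counts how many times the buffer would be submitted.
pub struct Batch {
    vertices: Vec<Vertex>,
    capacity: usize,
    pub draw_calls: usize,
}

impl Batch {
    /// `capacity` is the maximum number of vertices held before a flush.
    pub fn new(capacity: usize) -> Self {
        assert!(capacity >= 3, "buffer must hold at least one triangle");
        Batch { vertices: Vec::with_capacity(capacity), capacity, draw_calls: 0 }
    }

    /// Append a triangulated shape (three positions per triangle).
    /// Flushes first if the next triangle would not fit.
    pub fn push_triangles(&mut self, positions: &[[f32; 2]], color: [f32; 4]) {
        for tri in positions.chunks_exact(3) {
            if self.vertices.len() + 3 > self.capacity {
                self.flush();
            }
            for &pos in tri {
                self.vertices.push(Vertex { pos, color });
            }
        }
    }

    /// Submit everything collected so far as a single draw call.
    pub fn flush(&mut self) {
        if self.vertices.is_empty() {
            return;
        }
        // A real backend would upload `self.vertices` and draw here.
        self.draw_calls += 1;
        self.vertices.clear();
    }

    /// Number of vertices currently waiting in the buffer.
    pub fn len(&self) -> usize {
        self.vertices.len()
    }
}

/// Two triangles covering an axis-aligned rectangle.
pub fn rect_triangles(x: f32, y: f32, w: f32, h: f32) -> [[f32; 2]; 6] {
    [
        [x, y], [x + w, y], [x, y + h],
        [x + w, y], [x + w, y + h], [x, y + h],
    ]
}
```

With a large enough buffer, pushing 100 rectangles and then calling `flush` once counts a single draw call instead of 100. A real backend would also have to flush whenever the texture or draw state changes.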
For example, in Conrod a lot of solid colored shapes are rendered, then some textured shapes (text), and then more solid colored shapes, etc. Currently the `CharacterCache` backends rasterize glyphs using Freetype, with each character in a separate texture. This means we can reduce the number of draw calls for solid shapes, but not for text.

In the case of text, we could try two different approaches:
Number one seems sensible to test first because it would benefit from the same reduction of draw calls. However, it requires some changes:
- `Character` should take `&'a T` to the texture, separating offset and size from texture storage internally in the glyph cache
- `CharacterCache::character` to return `Character<'a, T>`
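The shape of these changes could look roughly like the sketch below. Everything here is an assumption for illustration: `AtlasTexture`, `GlyphInfo`, `GlyphCache`, the `insert` helper, and the exact fields and signature are hypothetical and do not reproduce the actual `Character` or `CharacterCache` definitions. The point is only that the cache owns one texture, stores offset and size per glyph separately, and hands out a `Character<'a, T>` that borrows the shared texture.

```rust
use std::collections::HashMap;

/// Stand-in for a backend texture (hypothetical).
pub struct AtlasTexture {
    pub id: u32,
}

/// Per-glyph data kept inside the cache, separate from texture storage.
#[derive(Clone, Copy)]
struct GlyphInfo {
    offset: [f64; 2],
    size: [f64; 2],
}

/// A character that borrows the texture instead of owning one.
pub struct Character<'a, T> {
    pub offset: [f64; 2],
    pub size: [f64; 2],
    pub texture: &'a T,
}

/// Hypothetical glyph cache with a single shared texture.
pub struct GlyphCache<T> {
    texture: T,
    glyphs: HashMap<char, GlyphInfo>,
}

impl<T> GlyphCache<T> {
    pub fn new(texture: T) -> Self {
        GlyphCache { texture, glyphs: HashMap::new() }
    }

    /// Record where a glyph lives inside the shared texture.
    pub fn insert(&mut self, ch: char, offset: [f64; 2], size: [f64; 2]) {
        self.glyphs.insert(ch, GlyphInfo { offset, size });
    }

    /// Look up a glyph; the result borrows the cache's texture.
    pub fn character<'a>(&'a self, ch: char) -> Option<Character<'a, T>> {
        self.glyphs.get(&ch).map(|info| Character {
            offset: info.offset,
            size: info.size,
            texture: &self.texture,
        })
    }
}
```

Because every returned `Character` points at the same texture, a backend could in principle keep consecutive glyphs in one batch instead of switching textures per character.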
Alternative: Retained API
By organizing graphic primitives into a tree structure, one can traverse it and optimize the draw calls.
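As a rough illustration of what such a traversal could do, here is a sketch under stated assumptions: the `Node`, `Material`, and `DrawCall` types are hypothetical and not an existing or proposed Piston API. It walks the tree depth-first and merges consecutive primitives that share a material into one draw call, which preserves drawing order.

```rust
/// What a primitive is drawn with (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Material {
    Solid,
    Textured(u32),
}

/// A retained scene: leaves are primitives, groups hold children.
pub enum Node {
    Leaf { material: Material, triangles: usize },
    Group(Vec<Node>),
}

/// One submission to the GPU.
#[derive(Debug, PartialEq, Eq)]
pub struct DrawCall {
    pub material: Material,
    pub triangles: usize,
}

/// Traverse the tree in drawing order and merge neighbors that share
/// a material, so each run of compatible primitives becomes one call.
pub fn collect_draw_calls(root: &Node) -> Vec<DrawCall> {
    let mut calls = Vec::new();
    visit(root, &mut calls);
    calls
}

fn visit(node: &Node, calls: &mut Vec<DrawCall>) {
    match node {
        Node::Leaf { material, triangles } => {
            // Extend the previous call when the material matches.
            if let Some(last) = calls.last_mut() {
                if last.material == *material {
                    last.triangles += *triangles;
                    return;
                }
            }
            calls.push(DrawCall { material: *material, triangles: *triangles });
        }
        Node::Group(children) => {
            for child in children {
                visit(child, calls);
            }
        }
    }
}
```

This sketch only merges neighbors. Reordering primitives to form longer runs would need knowledge about overlap, which it does not attempt.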
While this would be very interesting to work on, there are some major obstacles/unknowns:
Summary of plan
`Character` and `CharacterCache`
I believe this plan requires minimum effort and the least amount of breaking changes. We keep the same overall design of piston-graphics and the existing benefits.