<!DOCTYPE html>
<meta charset="utf-8">
<link rel="icon" type="image/png" href="../favicon.png">
<!-- Markdeep: https://casual-effects.com/markdeep/ -->
**Ray Tracing: GPU Edition**
[Arman Uguray][]
<br>
Draft
<br>
!!! WARNING
This is a living document for a work in progress. Please bear in mind that the contents will
change frequently and go through many edits before the final version.
Introduction
====================================================================================================
_Ray Tracing_ is a rendering method in Computer Graphics that simulates the flow of light. It can
faithfully recreate a variety of optical phenomena and can be used to render photorealistic images.
_Path tracing_ is an application of this approach used to compute _Global Illumination_. Its
core idea is to repeatedly trace millions of random rays through the scene and bounce them off
objects based on surface properties. The algorithm is remarkably simple and relatively easy
to implement when applied to a small number of material and geometry types. Peter
Shirley's [_Ray Tracing In One Weekend_][RTIOW] (RTIOW) is a great introduction to building the
foundation for a hobby renderer.
A challenge with path tracing is its high computational cost. Rendering takes a long time, and it
only gets worse as scenes become more complex. This has historically made path
tracing unsuitable for real-time applications. Fortunately -- like many problems in Computer
Graphics -- the algorithm lends itself very well to parallelism. It is possible to achieve a
significant speedup by distributing the work across many processor cores.
The GPU (Graphics Processing Unit) is a type of processor designed to run the same set of operations
over large amounts of data in parallel. This parallelism has been instrumental to achieving
realistic visuals in real-time applications like video games. GPUs have been traditionally used to
accelerate scanline rasterization but have since become programmable and capable of running
a variety of parallel workloads. Notably, modern GPUs are now equipped with hardware cores dedicated
to ray tracing.
GPUs aren't without limitations. Programming a GPU requires a different approach than a typical CPU
program. Taking full advantage of a GPU often involves careful tuning based on its architecture and
capabilities, which can vary widely across vendors and models. Rendering fully path-traced scenes
at real-time rates remains elusive even on the highest-end GPUs. This is an active and vibrant
area of Computer Graphics research.
This book is an introduction to GPU programming, taught by building a simple GPU-accelerated path
tracer. We'll focus on building a renderer that can produce high-quality, correct images using a
fairly simple design. It won't be full-featured and its performance will be limited; however, it will expose
you to several fundamental GPU programming concepts. By the end, the renderer you'll have built can
serve as a great starting point for extensions and experiments with more advanced GPU techniques. We will
avoid most optimizations in favor of simplicity but the renderer will be able to achieve interactive
frame rates on a decent GPU when targeting simple scenes.[^ch1] The accompanying code intentionally
avoids hardware ray tracing APIs that are present on newer GPU models, instead focusing on
implementing the same functionality on a programmable GPU unit using a shading language.
This book follows a similar progression to [_Ray Tracing In One Weekend_][RTIOW]. It covers some of
the same material but I highly recommend completing _RTIOW_ before embarking on building
the GPU version. Doing so will teach you the path tracing algorithm in a much more approachable
way and it will make you appreciate both the advantages and challenges of moving to a GPU-based
architecture.
If you run into any problems with your implementation, have general questions or corrections, or
would like to share your own ideas or work, check out [the GitHub Discussions forum][discussions].
[^ch1]: A BVH-accelerated implementation can render a version of the RTIOW cover scene with ~32,000
spheres, 16 ray bounces per pixel, and a resolution of 2048x1536 on a 2022 _Apple M1 Max_ in 15
milliseconds. The same renderer performs very poorly on a 2019 _Intel UHD Graphics 630_, which takes
more than 200ms to render a single sample.
GPU APIs
--------
Interfacing with a GPU and writing programs for it typically requires the use of a special API. This
interface depends on your operating system and GPU vendor. You often have various options depending
on the capabilities you want. For example, an application that wants to get the most juice out of a
NVIDIA GPU for general purpose computations may choose to target CUDA. A developer who prefers
broad hardware compatibility for a graphical mobile game may choose OpenGL ES or Vulkan. Direct3D
(D3D) is the main graphics API on Microsoft platforms while Metal is the preferred framework on
Apple systems. Vulkan, D3D12, and Metal all support an API specifically to accelerate ray
tracing.
You can implement this book using any API or framework that you prefer, though I generally assume
you are working with a graphics API. In my examples I use an API based on [WebGPU][webgpu],
which I think maps well to all modern graphics APIs, so the code
examples should be easy to adapt to other APIs. I avoid using ray tracing APIs (such as
[DXR][dxr] or [Vulkan Ray Tracing][vkrt]) to show you how to implement similar functionality on
your own.
<!-- TODO: Maybe this is better to list in a references section near the bottom -->
If you're looking to implement this in CUDA, you may also be interested in Roger Allen's
[blog post][rtiow-cuda] titled _Accelerated Ray Tracing in One Weekend in CUDA_.
Example Code
------------
Like _RTIOW_, you'll find code examples throughout the book. I use [Rust][] as
the implementation language but you can choose any language that supports your GPU API of choice. I avoid
most esoteric aspects of Rust to keep the code easily understandable to a large audience. On the few
occasions where I had to resort to a potentially unfamiliar Rust-ism, I provide a C example to add
clarity.
I provide the finished source code for this book on [GitHub][gt-project] as a reference but I
encourage you to type in your own code. I decided to also provide a minimal source template that you
can use as a starting point if you want to follow along in Rust. The template provides a small
amount of setup code for the windowing logic to help get you started.
### A note on Rust, Libraries, and APIs
I chose Rust for this project because of its ease of use and portability. It is also the language
that I tend to be most productive in.
An important aspect of Rust is that a lot of common functionality is provided by libraries outside
its standard library. I tried to avoid external dependencies as much as possible except for the
following:
* I use *[wgpu][]* to interact with the GPU. This is a native graphics API based on
WebGPU. It's portable and allows the example code to run on Vulkan, Metal, Direct3D 11/12, OpenGL
ES 3.1, as well as WebGPU and WebGL via WebAssembly.
wgpu also has [native bindings in other languages](https://github.com/gfx-rs/wgpu-native).
* I use [*winit*](https://docs.rs/winit/latest/winit/) which is a portable windowing library. It's
used to display the rendered image in real-time and to make the example code interactive.
* For ease of Rust development I use [*anyhow*](https://docs.rs/anyhow/latest/anyhow/) and
[*bytemuck*](https://docs.rs/bytemuck/latest/bytemuck/). *anyhow* is a popular error handling
utility and integrates seamlessly. *bytemuck* provides a safe abstraction for the equivalent of
`reinterpret_cast` in C++, which normally requires [`unsafe`][rust-unsafe] Rust. It's used to
bridge CPU data types with their GPU equivalents.
* Lastly, I use [*pollster*](https://docs.rs/pollster/latest/pollster/) to execute asynchronous
wgpu API functions (which is only called from a single line).
[wgpu][] is the most important dependency as it defines how the example code interacts with the
GPU. Every GPU API is different but their abstractions for the general concepts used in this book
are fairly similar. I will highlight these differences occasionally where they matter.
A large portion of the example code runs on the GPU. Every graphics API defines a programming
language -- a so-called **shading language** -- for authoring GPU programs. wgpu is based on WebGPU,
so my GPU code examples are written in the *WebGPU Shading Language* (WGSL)[^ch1.2.1].
<!-- Have GLSL examples too? -->
I also recommend keeping the following references handy while you're developing:
* wgpu API documentation (version 0.19.1): https://docs.rs/wgpu/0.19.1/wgpu
* WebGPU specification: https://www.w3.org/TR/webgpu
* WGSL specification: https://www.w3.org/TR/WGSL
With all of that out of the way, let's get started!
[^ch1.2.1]: wgpu also supports shaders in the
[SPIR-V](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html) binary format. You could
in theory write your shaders in a shading language that can compile to SPIR-V (such as OpenGL's GLSL
and Direct3D's HLSL) as long as you avoid any language features that can't be expressed in WGSL.
Windowing and GPU Setup
====================================================================================================
The first thing to decide is how you want to view your image. One option is to write the output from
the GPU to a file. I think a more fun option is to display the image inside an application window.
I prefer this approach because it allows you to see your rendering as it resolves over time and it
will allow you to make your application interactive later on. The downside is that it requires a
little bit of wiring.
First, your program needs a way to interact with your operating system to create and manage a
window. Next, you need a way to coordinate your GPU workloads to output a sequence of images at the
right time for your OS to composite them inside the window and send them to your display.
Every operating system with a graphical UI provides a native *windowing API* for this purpose.
Graphics APIs typically define some way to integrate with a windowing system. You'll have various
libraries to choose from depending on your OS and programming language. You mainly need to make sure
that the windowing API or UI toolkit you choose can integrate with your graphics API.
In my examples I use *winit*, a Rust windowing library that integrates smoothly with wgpu. I put
together a [project template][gt-template] that sets up the library boilerplate for the window
handling. You're welcome to use it as a starting point.
The setup code isn't a lot, so I'll briefly go over the important pieces in this chapter.
The Event Loop
--------------
The first thing the template does is create a window and associate it with an *event loop*. The OS
sends the application a message when an important "event" occurs that the application should act on,
such as a mouse click or the window getting resized. Your application can wait for these events
and handle them as they arrive by looping indefinitely:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
use {
    anyhow::{Context, Result},
    winit::{
        event::{Event, WindowEvent},
        event_loop::{ControlFlow, EventLoop},
        window::{Window, WindowBuilder},
    },
};

const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;

fn main() -> Result<()> {
    let event_loop = EventLoop::new()?;
    let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
    let window = WindowBuilder::new()
        .with_inner_size(window_size)
        .with_resizable(false)
        .with_title("GPU Path Tracer".to_string())
        .build(&event_loop)?;

    // TODO: initialize renderer

    event_loop.run(|event, control_handle| {
        control_handle.set_control_flow(ControlFlow::Poll);
        match event {
            Event::WindowEvent { event, .. } => match event {
                WindowEvent::CloseRequested => control_handle.exit(),
                WindowEvent::RedrawRequested => {
                    // TODO: draw frame
                    window.request_redraw();
                }
                _ => (),
            },
            _ => (),
        }
    })?;
    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-initial]: <kbd>[main.rs]</kbd> Creating a window and handling window events]
This code creates a window titled "GPU Path Tracer" and kicks off an event loop.
`event_loop.run()` internally waits for window events and notifies your application by calling the
lambda function that it gets passed as an argument.
The lambda function only handles a few events for now. The most important one is `RedrawRequested`,
which is the signal to render and present a new frame. At the end of that handler we call
`window.request_redraw()`, which schedules another `RedrawRequested` event once all pending events
have been processed. Each redraw therefore requests the next one, so the application keeps drawing
repeatedly until someone closes the window.
Running this code should bring up an empty window like this:
![Figure [empty-window]: Empty Window](../images/img-01-empty-window.png)
GPU and Surface Initialization
------------------------------
The next thing the template does is establish a connection to the GPU and configure a surface. The
surface manages a set of *textures* that allow the GPU to render inside the window.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
async fn connect_to_gpu(window: &Window) -> Result<(wgpu::Device, wgpu::Queue, wgpu::Surface)> {
    use wgpu::TextureFormat::{Bgra8Unorm, Rgba8Unorm};

    // Create an "instance" of wgpu. This is the entry-point to the API.
    let instance = wgpu::Instance::default();

    // Create a drawable "surface" that is associated with the window.
    let surface = instance.create_surface(window)?;

    // Request a GPU that is compatible with the surface. If the system has multiple GPUs then
    // pick the high performance one.
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            force_fallback_adapter: false,
            compatible_surface: Some(&surface),
        })
        .await
        .context("failed to find a compatible adapter")?;

    // Connect to the GPU. "device" represents the connection to the GPU and allows us to create
    // resources like buffers, textures, and pipelines. "queue" represents the command queue that
    // we use to submit commands to the GPU.
    let (device, queue) = adapter
        .request_device(&wgpu::DeviceDescriptor::default(), None)
        .await
        .context("failed to connect to the GPU")?;

    // Configure the texture memory that backs the surface. Our renderer will draw to a surface
    // texture every frame.
    let caps = surface.get_capabilities(&adapter);
    let format = caps
        .formats
        .into_iter()
        .find(|it| matches!(it, Rgba8Unorm | Bgra8Unorm))
        .context("could not find preferred texture format (Rgba8Unorm or Bgra8Unorm)")?;
    let size = window.inner_size();
    let config = wgpu::SurfaceConfiguration {
        usage: wgpu::TextureUsages::RENDER_ATTACHMENT,
        format,
        width: size.width,
        height: size.height,
        present_mode: wgpu::PresentMode::AutoVsync,
        alpha_mode: caps.alpha_modes[0],
        view_formats: vec![],
        desired_maximum_frame_latency: 3,
    };
    surface.configure(&device, &config);

    Ok((device, queue, surface))
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-connect]: <kbd>[main.rs]</kbd> The connect_to_gpu function]
The code that sets this all up is a bit wordy. I'll quickly go over the important bits:
1. What the first ~20 lines do is request a connection to a GPU that is compatible with the
window. The bit about `wgpu::PowerPreference::HighPerformance` is a hint to the API that we want
the higher-powered GPU if the current system has more than one available.
2. The rest of the function configures the dimensions, pixel format, and presentation mode of the
surface. `Rgba8Unorm` and `Bgra8Unorm` are common pixel formats that store each color component
(red, green, blue, and alpha) as an 8-bit unsigned integer. The "unorm" part stands for "unsigned
normalized", which means that our rendering code can represent the component values as a real
number in the range `[0.0, 1.0]`. We set the size to simply span the entire window.
The bit about `wgpu::PresentMode::AutoVsync` tells the surface to synchronize the presentation of
each frame with the display's refresh rate. The surface will manage an internal queue of textures
for us and we will render to them as they become available. This prevents a visual artifact known
as "tearing" (which can happen when frames get presented faster than the display refresh rate) by
setting up the renderer to be *v-sync locked*. We will discuss some of the implications of this
later on.
The last bit that I'll highlight here is `wgpu::TextureUsages::RENDER_ATTACHMENT`. This just
indicates that we are going to use the GPU's rendering function to draw directly into the surface
textures.
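To make the "unorm" encoding concrete, here is a small standalone sketch (not part of the renderer)
of how an 8-bit unorm component maps between its byte representation and the `[0.0, 1.0]` range:

```Rust
// An "unsigned normalized" (unorm) 8-bit value encodes a real number in
// [0.0, 1.0] as an integer in [0, 255]. The GPU performs a conversion like
// this whenever a shader writes a float color to an Rgba8Unorm texture.
fn f32_to_unorm8(x: f32) -> u8 {
    (x.clamp(0.0, 1.0) * 255.0).round() as u8
}

fn unorm8_to_f32(v: u8) -> f32 {
    v as f32 / 255.0
}

fn main() {
    assert_eq!(f32_to_unorm8(0.0), 0);
    assert_eq!(f32_to_unorm8(1.0), 255);
    assert_eq!(f32_to_unorm8(0.5), 128); // 127.5 rounds away from zero
    assert_eq!(unorm8_to_f32(255), 1.0);
}
```

Note that the exact rounding behavior of the GPU's conversion is defined by the graphics API; the
sketch above uses Rust's `round` for simplicity.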
After setting all this up, the function returns three objects: a `wgpu::Device` that represents the
connection to the GPU, a `wgpu::Queue` which we'll use to issue commands to the GPU, and a
`wgpu::Surface` that we'll use to present frames to the window. We will talk a lot about the first
two when we start putting together our renderer in the next chapter.
You may have noticed that the function declaration begins with `async`. This marks the function as
*asynchronous* which means that it doesn't return its result immediately. This is only necessary
because the API functions that we invoke (`wgpu::Instance::request_adapter` and
`wgpu::Adapter::request_device`) are asynchronous functions. The `.await` keyword is syntactic sugar
that makes the asynchronous calls appear like regular (synchronous) function calls. What happens
under the hood is somewhat complex but I wouldn't worry about this too much since this is the one
and only bit of asynchronous code that we will encounter. If you want to learn more about it, I
recommend checking out the [Rust Async Book](https://rust-lang.github.io/async-book/).
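If you are curious what `.await` and an executor like pollster boil down to, here is a toy,
busy-polling executor. This is purely illustrative -- the names `block_on` and `answer` are made up
for this sketch, and a real executor parks the thread between polls instead of spinning:

```Rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A no-op waker: our toy future is always ready, so nothing ever needs to
// be woken up.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn no_op(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Drive a future to completion by polling it in a loop. This is the job an
// executor (like pollster) performs on our behalf.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

async fn answer() -> u32 {
    42
}

fn main() {
    assert_eq!(block_on(answer()), 42);
}
```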
### Completing Setup
Finally, the `main` function needs a couple of updates: first, we make it `async` so that we can
"await" on `connect_to_gpu`. Technically, the `main` function of a program cannot be async, and
running an async function requires some additional utilities. There are various alternatives but I
chose to use a library called `pollster`. The library provides a special macro (called `main`) that
takes care of everything. Again, this is the only asynchronous code that we'll encounter so don't
worry about what it does.
The second change to the main function is where it handles the `RedrawRequested` event. For every
new frame, we first request the next available texture from the surface that we just created. The
queue has a limited number of textures available. If the CPU outpaces the GPU (i.e. the GPU takes
longer than a display refresh cycle to finish its tasks), then calling
`surface.get_current_texture()` can block until a texture becomes available.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
#[pollster::main]
async fn main() -> Result<()> {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
    let event_loop = EventLoop::new()?;
    let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
    let window = WindowBuilder::new()
        .with_inner_size(window_size)
        .with_resizable(false)
        .with_title("GPU Path Tracer".to_string())
        .build(&event_loop)?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
    let (device, queue, surface) = connect_to_gpu(&window).await?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
    // TODO: initialize renderer

    event_loop.run(|event, control_handle| {
        control_handle.set_control_flow(ControlFlow::Poll);
        match event {
            Event::WindowEvent { event, .. } => match event {
                WindowEvent::CloseRequested => control_handle.exit(),
                WindowEvent::RedrawRequested => {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
                    // Wait for the next available frame buffer.
                    let frame: wgpu::SurfaceTexture = surface
                        .get_current_texture()
                        .expect("failed to get current texture");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
                    // TODO: draw frame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
                    frame.present();
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
                    window.request_redraw();
                }
                _ => (),
            },
            _ => (),
        }
    })?;
    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-setup-complete]: <kbd>[main.rs]</kbd> Putting together the initial main function]
Once a frame texture becomes available, the example issues a request to display it as soon as
possible by calling `frame.present()`. All of our rendering work will be scheduled before this call.
That was a lot of boilerplate -- this is sometimes necessary to interact with OS resources. With all
of this in place, we can start building a real-time renderer.
### A note on error handling in Rust
If you're new to Rust, some of the patterns above may look unfamiliar. One of these is error
handling using the `Result` type. I use this pattern frequently enough that it's worth a quick
explainer.
A `Result` is a variant type that can hold either a success (`Ok`) value or an error (`Err`) value.
The types of the `Ok` and `Err` variants are generic:
<script type="preformatted">
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub enum Result<T, E> {
    Ok(T),
    Err(E),
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Figure [rust-result]: The `Result` type]
</script>
`T` and `E` can be any type. It's common for a library to define its own error types to represent
various error conditions.
The idea is that a function returns a `Result` if it has a failure mode. A caller must check the
status of the `Result` to unpack the return value or recover from an error.
In a C program, a common way to handle an error is to return early from the calling function,
perhaps returning an entirely new error. For example:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C
bool function_with_result(Foo* out_result);

int main() {
    Foo foo;
    if (!function_with_result(&foo)) {
        return -1;
    }
    // ...do something with `foo`...
    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rust provides the `?` operator to automatically unpack a `Result` and return early if it holds an
error. A Rust version of the C program above could be written like this:
<script type="preformatted">
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
fn function_with_result() -> Result<Foo, FooError> {...}

fn caller() -> Result<(), FooError> {
    let foo: Foo = function_with_result()?;
    // ...do something with `foo`...
    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
</script>
If `function_with_result()` returns an error, the `?` operator will cause `caller` to return and
propagate the error value. This works as long as `caller` and `function_with_result` either return
the same error type or types with a known conversion. There are various other ways to handle an
error:
<script type="preformatted">
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
fn function_with_result() -> Result<Foo, FooError> {...}

fn caller() -> Result<(), BarError> {
    ...
    if let Err(e) = function_with_result() {
        println!("got a foo error: {:?}", e);
        return Err(BarError::from_foo(e));
    }
    ...
    let foo = function_with_result().map_err(BarError::from_foo)?;
    ...
    let Ok(foo) = function_with_result() else {
        panic!("Didn't work the second time");
    };
    ...
    let foo = match function_with_result() {
        Ok(foo) => foo,
        Err(e) => panic!("failed again"),
    };
    ...
    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
</script>
I like to keep things simple in my code examples and use the `?` operator. Instead of defining
custom error types and conversions, I use a catch-all `Error` type from a library called *anyhow*.
You'll often see the examples include `anyhow::Result` (an alias for `Result<T, anyhow::Error>`)
and `anyhow::Context`. The latter is a useful trait for adding an error message while converting to
an `anyhow::Error`:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
fn caller() -> anyhow::Result<()> {
    let foo: Foo = function_with_result().context("failed to get foo")?;
    // ...do something with `foo`...
    Ok(())
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can read more about the `Result` type in [its module
documentation](https://doc.rust-lang.org/std/result/index.html).
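The "known conversion" that lets the `?` operator propagate across different error types is usually
supplied by implementing the `From` trait. Here is a complete, self-contained sketch (the
`FooError` and `BarError` names are placeholders, as above):

```Rust
#[derive(Debug)]
struct FooError;

#[derive(Debug)]
struct BarError;

// With this impl, `?` can convert a FooError into a BarError automatically.
impl From<FooError> for BarError {
    fn from(_: FooError) -> BarError {
        BarError
    }
}

fn function_with_result(fail: bool) -> Result<u32, FooError> {
    if fail { Err(FooError) } else { Ok(7) }
}

fn caller(fail: bool) -> Result<u32, BarError> {
    // Unpacks the Ok value, or returns early with the converted error.
    let foo = function_with_result(fail)?;
    Ok(foo + 1)
}

fn main() {
    assert_eq!(caller(false).unwrap(), 8);
    assert!(caller(true).is_err());
}
```

This is essentially what *anyhow* automates: `anyhow::Error` can be constructed from any standard
error type, so `?` just works without hand-written conversions.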
Drawing Pixels
====================================================================================================
At this stage, we have code that brings up a window, connects to the GPU, and sets up a queue of
textures that is synchronized with the display. In Computer Graphics, the term "texture" is
generally used in the context of *texture mapping*, which is a technique to apply detail to geometry
using data stored in memory. A very common application is to map color data from the pixels of a 2D
image onto the surface of a 3D polygon.
Texture mapping is so essential to real-time graphics that all modern GPUs are equipped with
specialized hardware to speed up texture operations. It's not uncommon for a modern video game to
use texture assets that take up hundreds of megabytes. Processing all of that data involves a lot
of memory traffic which is a big performance bottleneck for a GPU. This is why GPUs come with
dedicated texture memory caches, sampling hardware, compression schemes and other features to
improve texture data throughput.
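A quick back-of-the-envelope calculation shows why: an uncompressed texture with a 4-byte pixel
format (like the `Rgba8Unorm` format from the previous chapter) costs `width * height * 4` bytes,
and the numbers add up quickly:

```Rust
// Memory footprint of an uncompressed texture with 4 bytes per pixel
// (e.g. 8 bits each for red, green, blue, and alpha).
fn texture_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

fn main() {
    // An 800x600 render target: ~1.9 MB.
    assert_eq!(texture_bytes(800, 600), 1_920_000);
    // A single 4K texture is already ~32 MB, so a game shipping many
    // textures at this size quickly reaches hundreds of megabytes.
    assert_eq!(texture_bytes(3840, 2160), 33_177_600);
}
```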
We are going to use the texture hardware to store the output of our renderer. In wgpu, a *texture
object* represents texture memory that can be used in three main ways: texture mapping, shader
storage, or as a *render target*[^ch3-cit1]. A surface texture is a special kind of texture that can
only be used as a render target.
Not all native APIs have this restriction. For instance, both Metal and Vulkan allow their version
of a surface texture -- a *frame buffer* (Metal) or *swap chain* (Vulkan) texture -- to be
configured for other usages, though this sometimes comes with a warning about impaired performance
and is not guaranteed to be supported by the hardware.
wgpu doesn't provide any other option, so I'm going to start by implementing a render pass. This is
a fundamental and very widely used function of the GPU, so it's worth learning about.
[^ch3-cit1]: See [`wgpu::TextureUsages`](https://docs.rs/wgpu/0.17.0/wgpu/struct.TextureUsages.html).
The render Module
---------------------
I like to separate the rendering code from all the windowing code, so I'll start by creating a file
named `render.rs`. Every Rust file makes up a *module* (with the same name) which serves as a
namespace for all functions and types that are declared in it. Here I'll add a data structure called
`PathTracer`. This will hold all GPU resources and eventually implement our path tracing algorithm:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
    device: wgpu::Device,
    queue: wgpu::Queue,
}

impl PathTracer {
    pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
        device.on_uncaptured_error(Box::new(|error| {
            panic!("Aborting due to an error: {}", error);
        }));

        // TODO: initialize GPU resources

        PathTracer {
            device,
            queue,
        }
    }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render-initial]: <kbd>[render.rs]</kbd> The PathTracer structure]
We start out with an associated function called `PathTracer::new` which will serve as the
constructor and eventually initialize all GPU resources. The `PathTracer` takes ownership of the
`wgpu::Device` and `wgpu::Queue` that we created earlier and it will hold on to them for the rest of
the application's life.
`wgpu::Device` represents a connection to the GPU. It is responsible for creating resources like
texture, buffer, and pipeline objects. It also defines some methods for error handling.
The first thing I do is set up an "uncaptured error" handler. If you look at the [declarations
](https://docs.rs/wgpu/0.17.0/wgpu/struct.Device.html) of resource creation methods you'll notice
that none of them return a `Result`. This doesn't mean that they always succeed; in fact, all of
these operations can fail. This is because wgpu closely mirrors the WebGPU API, which uses a
concept called *error scopes* to detect and respond to errors.
Whenever there's an error that I don't handle using an error scope it will trigger the uncaptured
error handler, which will print out an error message and abort the program[^ch3.1-cit1]. For now,
I won't set up any error scopes in `PathTracer::new` and I'll abort the program if the API fails to
create the initial resources.
Next, let's declare the `render` module and initialize a `PathTracer` in the `main` function:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight
mod render;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
const WIDTH: u32 = 800;
const HEIGHT: u32 = 600;
#[pollster::main]
async fn main() -> Result<()> {
    let event_loop = EventLoop::new();
    let window_size = winit::dpi::PhysicalSize::new(WIDTH, HEIGHT);
    let window = WindowBuilder::new()
        .with_inner_size(window_size)
        .with_resizable(false)
        .with_title("GPU Path Tracer".to_string())
        .build(&event_loop)?;

    let (device, queue, surface) = connect_to_gpu(&window).await?;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Rust highlight
    let renderer = render::PathTracer::new(device, queue);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
    event_loop.run(move |event, _, control_handle| {
        control_handle.set_control_flow(ControlFlow::Poll);
        match event {
            Event::WindowEvent { event, .. } => match event {
                WindowEvent::CloseRequested => control_handle.exit(),
                WindowEvent::RedrawRequested => {
                    // Wait for the next available frame buffer.
                    let frame: wgpu::SurfaceTexture = surface
                        .get_current_texture()
                        .expect("failed to get current texture");
                    // TODO: draw frame
                    frame.present();
                    window.request_redraw();
                }
                _ => (),
            },
            _ => (),
        }
    });
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [main-renderer-init]: <kbd>[main.rs]</kbd> Initializing a Renderer]
Now that we have the skeleton in place, it's time to paint some pixels on the screen.
[^ch3.1-cit1]: This is actually the default behavior, so I didn't really need to call
`on_uncaptured_error`.
Display Pipeline
----------------
Before setting up the render pass let's first talk about how it works. Traditionally, graphics
systems have been modeled after an abstraction called the *graphics pipeline*.[#Hughes13] At a
very high level, the input to the pipeline is a mathematical model that describes what to draw
-- such as geometry, materials, and light -- and the output is a 2D grid of pixels. This
transformation is processed in a series of standard *pipeline stages* which form the basis of the
rendering abstraction provided by GPUs and graphics APIs. wgpu uses the term *render pipeline* which
is what I'll use going forward.
The input to the render pipeline is a polygon stream represented by points in 3D space and their
associated data. The polygons are described in terms of geometric primitives (points, lines, and
triangles) which consist of *vertices*. The *vertex stage* transforms each vertex from the input
stream into a 2D coordinate space that corresponds to the viewport. After some additional processing
(such as clipping and culling) the assembled primitives are passed on to the *rasterizer*.
The rasterizer applies a process called scan conversion to determine the pixels that are covered by
each primitive and breaks them up into per-pixel *fragments*. The output of the vertex
stage (vertex positions, texture coordinates, vertex colors, etc.) gets interpolated between the
vertices of the primitive, and the interpolated values are assigned to each fragment. Fragments are
then passed on to the *fragment stage* which computes an output (such as the pixel or sample color)
for each fragment. Shading techniques such as texture mapping and lighting are usually performed
in this stage. The output then goes through several other operations before getting written to the
render target as pixels.[^ch3-footnote1]
![Figure [render-pipeline]: Vertex and Fragment stages of the render pipeline
](../images/fig-01-render-pipeline.svg)
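To make the rasterizer's interpolation step concrete, here's a small CPU-side sketch (plain Rust, not part of the renderer) that blends a per-vertex attribute using barycentric weights, the same kind of weighted average the hardware computes for every covered pixel:

```rust
// Interpolate a per-vertex attribute (e.g. one color channel) using
// barycentric weights (w0, w1, w2), where w0 + w1 + w2 == 1. The
// rasterizer derives weights like these for every fragment it emits.
fn interpolate(attr: [f32; 3], w: [f32; 3]) -> f32 {
    attr[0] * w[0] + attr[1] * w[1] + attr[2] * w[2]
}

fn main() {
    let reds = [1.0, 0.0, 0.0]; // red channel at the three vertices
    // A fragment sitting exactly on the first vertex reproduces it.
    assert_eq!(interpolate(reds, [1.0, 0.0, 0.0]), 1.0);
    // At the centroid, all three vertices contribute equally.
    let center = interpolate(reds, [1.0 / 3.0; 3]);
    assert!((center - 1.0 / 3.0).abs() < 1e-6);
}
```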
What I just described is very much a data pipeline: a data stream goes through a series of
transformations in stages. The input to each stage is defined in terms of smaller elements (e.g.
vertices and pixel-fragments) that can be processed in parallel. This is the fundamental principle
behind the GPU.
Early commercial GPUs implemented the graphics pipeline entirely in fixed-function hardware. Modern
GPUs still use fixed-function stages (and at much greater data rates) but virtually all of them
allow you to program the vertex and fragment stages with custom logic using *shader programs*.
[^ch3-footnote1]: I glossed over a few pipeline stages (such as geometry and tessellation) and
important steps like multi-sampling, blending, and the scissor/depth/stencil tests. These play an
important role in many real-time graphics applications but we won't make use of them in our path
tracer.
### Compiling Shaders
Let's put together a render pipeline that draws a red triangle. We'll define a vertex shader that
outputs the 3 corner vertices and a fragment shader that outputs a solid color. We'll write
these shaders in the WebGPU Shading Language (WGSL).
Go ahead and create a file called `shaders.wgsl` to host all of our WGSL code (I put it next to the
Rust files under `src/`). Before we can run this code on the GPU we need to compile it into a
form that can be executed on the GPU. We start by creating a *shader module*:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
    device: wgpu::Device,
    queue: wgpu::Queue,
}

impl PathTracer {
    pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
        device.on_uncaptured_error(Box::new(|error| {
            panic!("Aborting due to an error: {}", error);
        }));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
        let shader_module = compile_shader_module(&device);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust delete
        // TODO: initialize GPU resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
        PathTracer {
            device,
            queue,
        }
    }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn compile_shader_module(device: &wgpu::Device) -> wgpu::ShaderModule {
    use std::borrow::Cow;

    let code = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/src/shaders.wgsl"));
    device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: None,
        source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(code)),
    })
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render-shader-module]: <kbd>[render.rs]</kbd> Creating the shader module]
The `compile_shader_module` function loads the file we just created into a string using the
`include_str!` macro. This bundles the contents of `shaders.wgsl` into the program binary at build
time. This is followed by a call to `wgpu::Device::create_shader_module` to compile the WGSL source
code.[^ch3-footnote2]
Let's define the vertex and fragment functions, which I'm calling `display_vs` and `display_fs`:
<script type="preformatted">
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WGSL highlight
alias TriangleVertices = array<vec2f, 3>;
var<private> vertices: TriangleVertices = TriangleVertices(
    vec2f(-0.5, -0.5),
    vec2f( 0.5, -0.5),
    vec2f( 0.0,  0.5),
);

@vertex fn display_vs(@builtin(vertex_index) vid: u32) -> @builtin(position) vec4f {
    return vec4f(vertices[vid], 0.0, 1.0);
}

@fragment fn display_fs() -> @location(0) vec4f {
    return vec4f(1.0, 0.0, 0.0, 1.0);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [vertex-and-fragment-shaders]: <kbd>[shaders.wgsl]</kbd> Vertex and Fragment shaders]
</script>
I'm using the "vs" and "fs" suffixes as shorthand for "vertex stage" and "fragment stage". Together,
these two functions form our "display pipeline" (the "display" part will become more clear later).
The `@vertex` and `@fragment` annotations are WGSL keywords that mark these two functions as entry
points to each pipeline stage program.
Since graphics workloads generally involve a high amount of linear algebra, GPUs natively support
SIMD operations over vectors and matrices. All shading languages define built-in types for vectors
and matrices of up to 4 dimensions (4x4 in the case of matrices). The `vec4f` and `vec2f` types
that appear in the code represent 4D and 2D vectors of 32-bit floating point numbers.
`display_vs` returns the vertex position as a `vec4f`. This position is defined relative to a
coordinate space called the *Normalized Device Coordinate Space*. In NDC, the center of the viewport
marks the origin $(0, 0, 0)$. The $x$-axis spans horizontally from $(-1, 0, 0)$ on the left edge of
the viewport to $(1, 0, 0)$ on the right edge while the $y$-axis spans vertically from $(0,-1,0)$ at
the bottom to $(0,1,0)$ at the top. The $z$-axis is directly perpendicular to the viewport, going
*through* the origin.
![Figure [ndc]: Our triangle in Normalized Device Coordinates](../images/fig-02-ndc.svg)
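As an aside, mapping between window pixels and NDC is a simple affine transform. Here's a plain-Rust sketch for our 800×600 window (illustrative only; the GPU's viewport transform performs the inverse of this mapping):

```rust
// Map a pixel coordinate (origin at the top-left, y pointing down) to
// normalized device coordinates (origin at the center, y pointing up).
fn pixel_to_ndc(x: f32, y: f32, width: f32, height: f32) -> (f32, f32) {
    (2.0 * x / width - 1.0, 1.0 - 2.0 * y / height)
}

fn main() {
    // The center of an 800x600 viewport maps to the NDC origin...
    assert_eq!(pixel_to_ndc(400.0, 300.0, 800.0, 600.0), (0.0, 0.0));
    // ...and the top-left pixel corner maps to (-1, 1).
    assert_eq!(pixel_to_ndc(0.0, 0.0, 800.0, 600.0), (-1.0, 1.0));
}
```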
`display_vs` takes a *vertex index* as its parameter. The vertex function gets invoked for every
input vertex across different GPU threads. `vid` identifies the individual vertex that is assigned
to the *invocation*. The number of vertices and where they exist within the topology of the input
geometry is up to us to define. Since we want to draw a triangle, we'll later issue a *draw call*
with 3 vertices and `display_vs` will get invoked exactly 3 times with vertex indices ranging from
$0$ to $2$.
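Conceptually, the draw call behaves like a parallel loop over vertex indices. Here's a CPU-side sketch of those 3 invocations in plain Rust (illustrative only; the `display_vs` below mirrors the WGSL function, it is not renderer code):

```rust
// The same triangle vertices that shaders.wgsl declares.
const VERTICES: [(f32, f32); 3] = [(-0.5, -0.5), (0.5, -0.5), (0.0, 0.5)];

// A CPU-side stand-in for the vertex stage: the GPU invokes this once
// per vertex index, producing a 4D homogeneous position for each.
fn display_vs(vid: usize) -> [f32; 4] {
    let (x, y) = VERTICES[vid];
    [x, y, 0.0, 1.0] // z = 0 (viewport-aligned), w = 1 (a position)
}

fn main() {
    // A draw call with 3 vertices yields 3 invocations, vid = 0..3.
    let positions: Vec<[f32; 4]> = (0..3).map(display_vs).collect();
    assert_eq!(positions.len(), 3);
    assert_eq!(positions[2], [0.0, 0.5, 0.0, 1.0]);
}
```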
Since our 2D triangle is viewport-aligned, we can set the $z$ coordinate to $0$. The 4th
coordinate is known as a *homogeneous coordinate* used for projective transformations. Don't worry
about this coordinate for now -- just know that for a vector that represents a *position* we set
this coordinate to $1$. We can declare the $x$ and $y$ coordinates for the 3 vertices as an array
of `vec2f` and simply return the element that corresponds to `vid`. I enumerate the vertices in
counter-clockwise order which matches the winding order we'll specify when we create the pipeline.
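We can sanity-check the winding order on the CPU: in a y-up coordinate system like NDC, the signed area of a 2D triangle is positive exactly when its vertices are enumerated counter-clockwise. A small Rust sketch (not part of the renderer) using the same three vertices as the shader:

```rust
// Twice the signed area of a 2D triangle, via the cross product of two
// edge vectors; a positive result means counter-clockwise winding.
fn signed_area_x2(a: (f32, f32), b: (f32, f32), c: (f32, f32)) -> f32 {
    (b.0 - a.0) * (c.1 - a.1) - (b.1 - a.1) * (c.0 - a.0)
}

fn main() {
    // The vertices from shaders.wgsl, in declaration order.
    let (a, b, c) = ((-0.5, -0.5), (0.5, -0.5), (0.0, 0.5));
    assert!(signed_area_x2(a, b, c) > 0.0); // counter-clockwise
    assert!(signed_area_x2(a, c, b) < 0.0); // reversed: clockwise
}
```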
`display_fs` takes no inputs and returns a `vec4f` that represents the fragment color. The 4
dimensions represent the red, green, blue, and alpha channels of the destination pixel. `display_fs`
gets invoked for all pixel fragments that result from our triangle and the invocations are executed
in parallel across many GPU threads, just like the vertex function. To paint the triangle solid red,
we simply return `vec4f(1., 0., 0., 1.)` for all fragments.
[^ch3-footnote2]: The `Cow::Borrowed` bit is a Rust idiom that creates a "copy-on-write borrow".
This allows the API to take ownership of the WGSL string if necessary. This is not really an
important detail for us.
### Creating the Pipeline Object
Before we can run the shaders, we need to assemble them into a *pipeline state object*. This is
where we specify the data layout of the render pipeline and link the shaders into a runnable binary
program. Let's add a new function called `create_display_pipeline`:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...

fn compile_shader_module(device: &wgpu::Device) -> wgpu::ShaderModule {
    use std::borrow::Cow;

    let code = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/src/shaders.wgsl"));
    device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: None,
        source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(code)),
    })
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
fn create_display_pipeline(
    device: &wgpu::Device,
    shader_module: &wgpu::ShaderModule,
) -> wgpu::RenderPipeline {
    device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
        label: Some("display"),
        layout: None,
        primitive: wgpu::PrimitiveState {
            topology: wgpu::PrimitiveTopology::TriangleList,
            front_face: wgpu::FrontFace::Ccw,
            polygon_mode: wgpu::PolygonMode::Fill,
            ..Default::default()
        },
        vertex: wgpu::VertexState {
            module: shader_module,
            entry_point: "display_vs",
            buffers: &[],
        },
        fragment: Some(wgpu::FragmentState {
            module: shader_module,
            entry_point: "display_fs",
            targets: &[Some(wgpu::ColorTargetState {
                format: wgpu::TextureFormat::Bgra8Unorm,
                blend: None,
                write_mask: wgpu::ColorWrites::ALL,
            })],
        }),
        depth_stencil: None,
        multisample: wgpu::MultisampleState::default(),
        multiview: None,
    })
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [display-pipeline]: <kbd>[render.rs]</kbd> The `create_display_pipeline` function]
This code describes a render pipeline that draws a list of triangle primitives. The vertex winding
order is set to counter-clockwise which defines the orientation of the triangle's *front
face*.[^ch3-footnote3]
We request that the interior of each polygon be completely filled (rather than drawing just the
edges or vertices). We specify that `display_vs` is the main function of the vertex stage and that
we're not providing any vertex data from the CPU (since we declared our vertices in the shader
code). Similarly, we set up a fragment stage with `display_fs` as the entry point and a single
color target.[^ch3-footnote4] I set the pixel format of the render target to `Bgra8Unorm` since
that happens to be widely supported on all of my devices. What's important is that you assign a
pixel format that matches the surface configuration in your windowing setup and that your GPU device
supports this as a *render attachment* format.
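`Bgra8Unorm` stores each channel as an 8-bit unsigned integer that the GPU interprets as a *normalized* value in $[0, 1]$. Roughly, the conversion from a fragment shader output works like this (a plain-Rust sketch of the format's semantics, not renderer code):

```rust
// Convert one floating point color channel to an 8-bit unorm value:
// clamp to [0, 1], scale to [0, 255], and round to the nearest integer.
fn to_unorm8(channel: f32) -> u8 {
    (channel.clamp(0.0, 1.0) * 255.0).round() as u8
}

// Pack an RGBA color into the byte order of a Bgra8Unorm texel.
fn pack_bgra8(r: f32, g: f32, b: f32, a: f32) -> [u8; 4] {
    [to_unorm8(b), to_unorm8(g), to_unorm8(r), to_unorm8(a)]
}

fn main() {
    // Our fragment shader's solid red, vec4f(1.0, 0.0, 0.0, 1.0):
    assert_eq!(pack_bgra8(1.0, 0.0, 0.0, 1.0), [0, 0, 255, 255]);
}
```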
Let's instantiate the pipeline and store it in the `PathTracer` object. Pipeline creation is
expensive so we want to create the pipeline state object once and hold on to it. We'll reference it
later when drawing a frame:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
pub struct PathTracer {
    device: wgpu::Device,
    queue: wgpu::Queue,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
    display_pipeline: wgpu::RenderPipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}

impl PathTracer {
    pub fn new(device: wgpu::Device, queue: wgpu::Queue) -> PathTracer {
        device.on_uncaptured_error(Box::new(|error| {
            panic!("Aborting due to an error: {}", error);
        }));

        let shader_module = compile_shader_module(&device);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
        let display_pipeline = create_display_pipeline(&device, &shader_module);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
        PathTracer {
            device,
            queue,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
            display_pipeline,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
        }
    }

    ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [display-pipeline-init]: <kbd>[render.rs]</kbd> Initializing the display pipeline]
[^ch3-footnote3]: The GPU can automatically discard triangles that are oriented away from the
viewport. This is a feature called *back face culling* which our code doesn't make use of.
[^ch3-footnote4]: The `fragment` field of `wgpu::RenderPipelineDescriptor` is optional
(notice the *Some* in `Some(wgpu::FragmentState {...})` ?). A render pipeline that only outputs to
the depth or stencil buffers doesn't have to specify a fragment shader or any color attachments. An
example of this is *shadow mapping*: a shadow map is a texture that stores the distances between a
light source and geometry samples from the scene; it can be produced by a depth-only render-pass
from the point of view of the light source. The shadow map is later sampled from a render pass from
the camera's point of view to determine whether a rasterized point is visible from the light or in
shadow.
The Render Pass
---------------
We now have the pieces in place to issue a draw command to the GPU. The general abstraction modern
graphics APIs define for this is called a "command buffer" (or "command list" in D3D12). You can
think of the command buffer as a memory location that holds the serialized list of GPU commands
representing the sequence of actions we want the GPU to take. To draw a triangle we'll *encode*
a draw command into the command buffer and then *submit* the command buffer to the GPU for
execution.
With wgpu, the encoding is abstracted by an object called `wgpu::CommandEncoder`, which we'll use to
record our draw command. Once we are done, we will call `wgpu::CommandEncoder::finish()` to produce
a finalized `wgpu::CommandBuffer` which we can submit to the GPU via the `wgpu::Queue` that we
created at start up.
Let's add a new `PathTracer` function called `render_frame`. This function will take a texture as
its parameter (our *render target*) and tell the GPU to draw to it using the pipeline object we
created earlier:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
...

impl PathTracer {
    ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust highlight
    pub fn render_frame(&self, target: &wgpu::TextureView) {
        let mut encoder = self
            .device
            .create_command_encoder(&wgpu::CommandEncoderDescriptor {
                label: Some("render frame"),
            });

        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("display pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: target,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
                    store: wgpu::StoreOp::Store,
                },
            })],
            ..Default::default()
        });

        render_pass.set_pipeline(&self.display_pipeline);

        // Draw 1 instance of a polygon with 3 vertices.
        render_pass.draw(0..3, 0..1);

        // End the render pass by consuming the object.
        drop(render_pass);

        let command_buffer = encoder.finish();
        self.queue.submit(Some(command_buffer));
    }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rust
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Listing [render_frame-stub]: <kbd>[render.rs]</kbd> The `render_frame` function]