forked from taskflow/taskflow
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathForEachCUDA.html
133 lines (133 loc) · 14.6 KB
/
ForEachCUDA.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>cudaFlow Algorithms » Parallel Iterations | Taskflow QuickStart</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400i,600,600i%7CSource+Code+Pro:400,400i,600" />
<link rel="stylesheet" href="m-dark+documentation.compiled.css" />
<link rel="icon" href="favicon.ico" type="image/vnd.microsoft.icon" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="theme-color" content="#22272e" />
</head>
<body>
<header><nav id="navigation">
<div class="m-container">
<div class="m-row">
<span id="m-navbar-brand" class="m-col-t-8 m-col-m-none m-left-m">
<a href="https://taskflow.github.io"><img src="taskflow_logo.png" alt="" />Taskflow</a> <span class="m-breadcrumb">|</span> <a href="index.html" class="m-thin">QuickStart</a>
</span>
<div class="m-col-t-4 m-hide-m m-text-right m-nopadr">
<a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<path id="m-doc-search-icon-path" d="m6 0c-3.31 0-6 2.69-6 6 0 3.31 2.69 6 6 6 1.49 0 2.85-0.541 3.89-1.44-0.0164 0.338 0.147 0.759 0.5 1.15l3.22 3.79c0.552 0.614 1.45 0.665 2 0.115 0.55-0.55 0.499-1.45-0.115-2l-3.79-3.22c-0.392-0.353-0.812-0.515-1.15-0.5 0.895-1.05 1.44-2.41 1.44-3.89 0-3.31-2.69-6-6-6zm0 1.56a4.44 4.44 0 0 1 4.44 4.44 4.44 4.44 0 0 1-4.44 4.44 4.44 4.44 0 0 1-4.44-4.44 4.44 4.44 0 0 1 4.44-4.44z"/>
</svg></a>
<a id="m-navbar-show" href="#navigation" title="Show navigation"></a>
<a id="m-navbar-hide" href="#" title="Hide navigation"></a>
</div>
<div id="m-navbar-collapse" class="m-col-t-12 m-show-m m-col-m-none m-right-m">
<div class="m-row">
<ol class="m-col-t-6 m-col-m-none">
<li><a href="pages.html">Handbook</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
</ol>
<ol class="m-col-t-6 m-col-m-none" start="3">
<li><a href="annotated.html">Classes</a></li>
<li><a href="files.html">Files</a></li>
<li class="m-show-m"><a href="#search" class="m-doc-search-icon" title="Search" onclick="return showSearch()"><svg style="height: 0.9rem;" viewBox="0 0 16 16">
<use href="#m-doc-search-icon-path" />
</svg></a></li>
</ol>
</div>
</div>
</div>
</div>
</nav></header>
<main><article>
<div class="m-container m-container-inflatable">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<h1>
<span class="m-breadcrumb"><a href="cudaFlowAlgorithms.html">cudaFlow Algorithms</a> »</span>
Parallel Iterations
</h1>
<nav class="m-block m-default">
<h3>Contents</h3>
<ul>
<li><a href="#CUDAForEachIncludeTheHeader">Include the Header</a></li>
<li><a href="#ForEachCUDAIndexBasedParallelFor">Index-based Parallel Iterations</a></li>
<li><a href="#ForEachCUDAIteratorBasedParallelIterations">Iterator-based Parallel Iterations</a></li>
<li><a href="#ForEachCUDAMiscellaneousItems">Miscellaneous Items</a></li>
</ul>
</nav>
<p><a href="classtf_1_1cudaFlow.html" class="m-doc">tf::<wbr />cudaFlow</a> provides two template methods, <a href="classtf_1_1cudaFlow.html#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each</a> and <a href="classtf_1_1cudaFlow.html#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each_index</a>, for creating tasks to perform parallel iterations over a range of items.</p><section id="CUDAForEachIncludeTheHeader"><h2><a href="#CUDAForEachIncludeTheHeader">Include the Header</a></h2><p>You need to include the header file, <code>taskflow/cuda/algorithm/for_each.hpp</code>, for creating a parallel-iteration task.</p></section><section id="ForEachCUDAIndexBasedParallelFor"><h2><a href="#ForEachCUDAIndexBasedParallelFor">Index-based Parallel Iterations</a></h2><p>Index-based parallel-for performs parallel iterations over a range <code>[first, last)</code> with the given <code>step</code> size. The task created by <a href="classtf_1_1cudaFlow.html#a34f1ea89e5651faa6e8af522a42556ac" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each_index(I first, I last, I step, C callable)</a> represents a kernel of parallel execution for the following loop:</p><pre class="m-code"><span class="c1">// positive step: first, first+step, first+2*step, ...</span>
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o"><</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">+=</span><span class="n">step</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="n">i</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// negative step: first, first-step, first-2*step, ...</span>
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">></span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">+=</span><span class="n">step</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="n">i</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span></pre><p>Each iteration <code>i</code> is independent of each other and is assigned one kernel thread to run the callable. Since the callable runs on GPU, it must be declared with a <code>__device__</code> specifier. The following example creates a kernel that assigns each entry of <code>gpu_data</code> to 1 over the range [0, 100) with step size 1.</p><pre class="m-code"><span class="n">taskflow</span><span class="p">.</span><span class="n">emplace</span><span class="p">([</span><span class="o">&</span><span class="p">](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span><span class="o">&</span><span class="w"> </span><span class="n">cf</span><span class="p">){</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ... create other gpu tasks</span>
<span class="w"> </span><span class="c1">// assigns each element in gpu_data to 1 over the range [0, 100) with step size 1</span>
<span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">for_each_index</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="n">gpu_data</span><span class="p">]</span><span class="w"> </span><span class="n">__device__</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">idx</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">gpu_data</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="p">});</span><span class="w"></span></pre></section><section id="ForEachCUDAIteratorBasedParallelIterations"><h2><a href="#ForEachCUDAIteratorBasedParallelIterations">Iterator-based Parallel Iterations</a></h2><p>Iterator-based parallel-for performs parallel iterations over a range specified by two STL-styled iterators, <code>first</code> and <code>last</code>. The task created by <a href="classtf_1_1cudaFlow.html#a1a681f6223853b6445dcfdad07e4d0fd" class="m-doc">tf::<wbr />cudaFlow::<wbr />for_each(I first, I last, C callable)</a> represents a parallel execution of the following loop:</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o"><</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">i</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span></pre><p>The two iterators, <code>first</code> and <code>last</code>, are typically two raw pointers to the first element and the next to the last element in the range in GPU memory space. The following example creates a <code>for_each</code> kernel that assigns each element in <code>gpu_data</code> to 1 over the range <code>[gpu_data, gpu_data + 1000)</code>.</p><pre class="m-code"><span class="n">taskflow</span><span class="p">.</span><span class="n">emplace</span><span class="p">([</span><span class="o">&</span><span class="p">](</span><span class="n">tf</span><span class="o">::</span><span class="n">cudaFlow</span><span class="o">&</span><span class="w"> </span><span class="n">cf</span><span class="p">){</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ... create gpu tasks</span>
<span class="w"> </span><span class="c1">// assigns each element to 1 over the range [gpu_data, gpu_data + 1000)</span>
<span class="w"> </span><span class="n">cf</span><span class="p">.</span><span class="n">for_each</span><span class="p">(</span><span class="n">gpu_data</span><span class="p">,</span><span class="w"> </span><span class="n">gpu_data</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1000</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="w"> </span><span class="n">__device__</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="o">&</span><span class="w"> </span><span class="n">item</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"> </span>
<span class="p">});</span><span class="w"></span></pre><p>Each iteration is independent of each other and is assigned one kernel thread to run the callable. Since the callable runs on GPU, it must be declared with a <code>__device__</code> specifier.</p></section><section id="ForEachCUDAMiscellaneousItems"><h2><a href="#ForEachCUDAMiscellaneousItems">Miscellaneous Items</a></h2><p>The parallel-iteration algorithms are also available in <a href="classtf_1_1cudaFlowCapturer.html#a0b2f1bcd59f0b42e0f823818348b4ae7" class="m-doc">tf::<wbr />cudaFlowCapturer::<wbr />for_each</a> and <a href="classtf_1_1cudaFlowCapturer.html#aeb877f42ee3a627c40f1c9c84e31ba3c" class="m-doc">tf::<wbr />cudaFlowCapturer::<wbr />for_each_index</a>.</p></section>
</div>
</div>
</div>
</article></main>
<div class="m-doc-search" id="search">
<a href="#!" onclick="return hideSearch()"></a>
<div class="m-container">
<div class="m-row">
<div class="m-col-m-8 m-push-m-2">
<div class="m-doc-search-header m-text m-small">
<div><span class="m-label m-default">Tab</span> / <span class="m-label m-default">T</span> to search, <span class="m-label m-default">Esc</span> to close</div>
<div id="search-symbolcount">…</div>
</div>
<div class="m-doc-search-content">
<form>
<input type="search" name="q" id="search-input" placeholder="Loading …" disabled="disabled" autofocus="autofocus" autocomplete="off" spellcheck="false" />
</form>
<noscript class="m-text m-danger m-text-center">Unlike everything else in the docs, the search functionality <em>requires</em> JavaScript.</noscript>
<div id="search-help" class="m-text m-dim m-text-center">
<p class="m-noindent">Search for symbols, directories, files, pages or
modules. You can omit any prefix from the symbol or file path; adding a
<code>:</code> or <code>/</code> suffix lists all members of given symbol or
directory.</p>
<p class="m-noindent">Use <span class="m-label m-dim">↓</span>
/ <span class="m-label m-dim">↑</span> to navigate through the list,
<span class="m-label m-dim">Enter</span> to go.
<span class="m-label m-dim">Tab</span> autocompletes common prefix, you can
copy a link to the result using <span class="m-label m-dim">⌘</span>
<span class="m-label m-dim">L</span> while <span class="m-label m-dim">⌘</span>
<span class="m-label m-dim">M</span> produces a Markdown link.</p>
</div>
<div id="search-notfound" class="m-text m-warning m-text-center">Sorry, nothing was found.</div>
<ul id="search-results"></ul>
</div>
</div>
</div>
</div>
</div>
<script src="search-v2.js"></script>
<script src="searchdata-v2.js" async="async"></script>
<footer><nav>
<div class="m-container">
<div class="m-row">
<div class="m-col-l-10 m-push-l-1">
<p>Taskflow handbook is part of the <a href="https://taskflow.github.io">Taskflow project</a>, copyright © <a href="https://tsung-wei-huang.github.io/">Dr. Tsung-Wei Huang</a>, 2018–2022.<br />Generated by <a href="https://doxygen.org/">Doxygen</a> 1.8.14 and <a href="https://mcss.mosra.cz/">m.css</a>.</p>
</div>
</div>
</div>
</nav></footer>
</body>
</html>