.. _using_modes:

==========================================
Configuration Settings and Compiling Modes
==========================================

Configuration
=============
The ``config`` module contains several *attributes* that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.
*As a rule, the attributes in the* ``config`` *module should not be modified inside the user code.*
Theano's code comes with default values for these attributes, but you can
override them from your ``.theanorc`` file, and override those values in turn
with the :envvar:`THEANO_FLAGS` environment variable.
The order of precedence is:
1. an assignment to ``theano.config.<property>``
2. an assignment in :envvar:`THEANO_FLAGS`
3. an assignment in the ``.theanorc`` file (or the file indicated by :envvar:`THEANORC`)
You can display the current/effective configuration at any time by printing
``theano.config``. For example, to see a list of all active configuration
variables, type this from the command line:
.. code-block:: bash

    python -c 'import theano; print(theano.config)' | less
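For instance, here is a minimal sketch of the precedence rules in action
(the flag value shown is illustrative, and assumes the script is launched
with ``THEANO_FLAGS='floatX=float32'``):

.. code-block:: python

    # Run as: THEANO_FLAGS='floatX=float32' python script.py
    import theano

    print(theano.config.floatX)       # 'float32', taken from THEANO_FLAGS
    theano.config.floatX = 'float64'  # an assignment in code has precedence
    print(theano.config.floatX)       # now 'float64'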
For more detail, see :ref:`Configuration <libdoc_config>` in the library.
Exercise
========
Consider the logistic regression:
.. testcode::

    import numpy
    import theano
    import theano.tensor as T
    rng = numpy.random

    N = 400
    feats = 784
    D = (rng.randn(N, feats).astype(theano.config.floatX),
         rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
    training_steps = 10000

    # Declare Theano symbolic variables
    x = T.matrix("x")
    y = T.vector("y")
    w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
    b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
    x.tag.test_value = D[0]
    y.tag.test_value = D[1]

    # Construct Theano expression graph
    p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))             # Probability of having a one
    prediction = p_1 > 0.5                              # The prediction that is done: 0 or 1
    xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)   # Cross-entropy
    cost = xent.mean() + 0.01 * (w ** 2).sum()          # The cost to optimize
    gw, gb = T.grad(cost, [w, b])

    # Compile expressions to functions
    train = theano.function(
        inputs=[x, y],
        outputs=[prediction, xent],
        updates=[(w, w - 0.01 * gw), (b, b - 0.01 * gb)],
        name="train")
    predict = theano.function(inputs=[x], outputs=prediction,
                              name="predict")

    # 'node' avoids shadowing the symbolic variable x above
    if any([node.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm']
            for node in train.maker.fgraph.toposort()]):
        print('Used the cpu')
    elif any([node.op.__class__.__name__ in ['GpuGemm', 'GpuGemv']
              for node in train.maker.fgraph.toposort()]):
        print('Used the gpu')
    else:
        print('ERROR, not able to tell if theano used the cpu or the gpu')
        print(train.maker.fgraph.toposort())

    for i in range(training_steps):
        pred, err = train(D[0], D[1])

    print("target values for D")
    print(D[1])
    print("prediction on D")
    print(predict(D[0]))
.. testoutput::
    :hide:
    :options: +ELLIPSIS

    Used the cpu
    target values for D
    ...
    prediction on D
    ...
Modify and execute this example to run on CPU (the default) with ``floatX=float32``, and
time the execution using the shell command ``time python file.py``. Save your code,
as it will be useful later on.
.. Note::

    * Apply the Theano flag ``floatX=float32`` (through ``theano.config.floatX``) in your code.
    * Cast inputs before storing them into a shared variable.
    * Avoid the automatic upcast to *float64* that occurs when an *int32* is
      combined with a *float32*:

      * Insert a manual cast in your code, or use *[u]int{8,16}*.
      * Insert a manual cast around the mean operator (the division by the
        length is an *int64*).
      * Note that a new casting mechanism is being developed.

    A minimal casting sketch follows this note.
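For example, here is a minimal sketch of the mean-operator point, assuming
``floatX=float32`` (the variable names are illustrative):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    # mean() divides by the number of elements (an int64), which can
    # upcast the result to float64; an explicit cast keeps it in floatX.
    m = T.cast(x.mean(), theano.config.floatX)
    f = theano.function([x], m)
    print(f(numpy.ones((2, 3), dtype=theano.config.floatX)).dtype)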
:download:`Solution<modes_solution_1.py>`
-------------------------------------------
Mode
====
Every time :func:`theano.function <function.function>` is called,
the symbolic relationships between the input and output Theano *variables*
are optimized and compiled. The way this compilation occurs
is controlled by the value of the ``mode`` parameter.
Theano defines the following modes by name:
- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations, so the GPU is disabled.
- ``'FAST_RUN'``: Apply all optimizations and use C implementations where possible.
- ``'DebugMode'``: Verify the correctness of all optimizations, and compare the C and Python
  implementations. This mode can take much longer than the other modes, but it can identify
  several kinds of problems.
- ``'ProfileMode'`` (deprecated): Same optimizations as ``FAST_RUN``, but print some profiling information.
The default mode is typically ``FAST_RUN``, but it can be controlled via
the configuration variable :attr:`config.mode`,
which can be overridden by passing the ``mode`` keyword argument to
:func:`theano.function <function.function>`.
================= =============================================================== ===============================================================================
short name        Full constructor                                                What does it do?
================= =============================================================== ===============================================================================
``FAST_COMPILE``  ``compile.mode.Mode(linker='py', optimizer='fast_compile')``    Python implementations only, quick and cheap graph transformations.
``FAST_RUN``      ``compile.mode.Mode(linker='cvm', optimizer='fast_run')``       C implementations where available, all available graph transformations.
``DebugMode``     ``compile.debugmode.DebugMode()``                               Both implementations where available, all available graph transformations.
``ProfileMode``   ``compile.profilemode.ProfileMode()``                           Deprecated. C implementations where available, all available graph transformations, print profile information.
================= =============================================================== ===============================================================================
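For instance, here is a short sketch of overriding the mode per function
(the expressions are illustrative):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dscalar('x')
    # The mode keyword overrides config.mode for this function only.
    f_fast = theano.function([x], x ** 2, mode='FAST_RUN')
    f_cheap = theano.function([x], x ** 2, mode='FAST_COMPILE')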
.. Note::

    For debugging purposes, there also exists a ``MonitorMode`` (which has no
    short name). It can be used to step through the execution of a function:
    see :ref:`the debugging FAQ<faq_monitormode>` for details.
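A minimal sketch of stepping through a function with it (the
``inspect_inputs`` callback is an illustrative example; see the FAQ for the
full pattern):

.. code-block:: python

    import theano
    import theano.tensor as T

    def inspect_inputs(i, node, fn):
        # Called before each apply node runs; shows the node and its inputs.
        print(i, node, [inp[0] for inp in fn.inputs])

    x = T.dscalar('x')
    f = theano.function([x], 5 * x,
                        mode=theano.compile.MonitorMode(pre_func=inspect_inputs))
    f(3)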
Linkers
=======
A mode is composed of two things: an optimizer and a linker. Some modes,
such as ``ProfileMode`` and ``DebugMode``, add logic around the optimizer and
linker, and use their own linker.

You can select which linker to use with the Theano flag :attr:`config.linker`.
Here is a table comparing the different linkers.
============= ========= ================= ========= ===
linker        gc [#gc]_ Raise error by op Overhead  Definition
============= ========= ================= ========= ===
cvm           yes       yes               "++"      As c|py, but the runtime algo to execute the code is in C
cvm_nogc      no        yes               "+"       As cvm, but without gc
c|py [#cpy1]_ yes       yes               "+++"     Try C code. If none exists for an op, use Python
c|py_nogc     no        yes               "++"      As c|py, but without gc
c             no        yes               "+"       Use only C code (if none available for an op, raise an error)
py            yes       yes               "+++"     Use only Python code
ProfileMode   no        no                "++++"    (Deprecated) Compute some extra profiling info
DebugMode     no        yes               VERY HIGH Make many checks on what Theano computes
============= ========= ================= ========= ===
.. [#gc] Garbage collection of intermediate results during computation.
   Otherwise, the memory used by the ops' outputs is kept between
   Theano function calls so that it does not need to be reallocated,
   which lowers the overhead (makes it faster...).

.. [#cpy1] Default
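For example, here is a sketch of selecting a linker explicitly through a
``Mode`` instance (equivalently, set the ``linker`` flag):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.vector('x')
    # Same optimizer as FAST_RUN, but with the garbage-collection-free
    # variant of the cvm linker.
    mode = theano.compile.mode.Mode(linker='cvm_nogc', optimizer='fast_run')
    f = theano.function([x], x + 1, mode=mode)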
For more detail, see :ref:`Mode<libdoc_compile_mode>` in the library.
.. _using_debugmode:
Using DebugMode
===============
While normally you should use the ``FAST_RUN`` or ``FAST_COMPILE`` modes,
it is useful at first (especially when you are defining new kinds of
expressions or new optimizations) to run your code using DebugMode
(available via ``mode='DebugMode'``). DebugMode is designed to
run several self-checks and assertions that can help diagnose
possible programming errors leading to incorrect output. Note that
``DebugMode`` is much slower than ``FAST_RUN`` or ``FAST_COMPILE``, so
use it only during development (not when you launch 1000 processes on a
cluster!).
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_modes.test_modes_1
DebugMode is used as follows:
.. testcode::

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    f = theano.function([x], 10 * x, mode='DebugMode')
    f([5])
    f([0])
    f([7])
If any problem is detected, DebugMode will raise an exception according to
what went wrong, either at call time (``f([5])``) or compile time
(``f = theano.function([x], 10 * x, mode='DebugMode')``). These exceptions
should *not* be ignored; talk to your local Theano guru or email the
users list if you cannot make the exception go away.
Some kinds of errors can only be detected for certain input value combinations.
In the example above, there is no way to guarantee that a future call to, say,
``f([-1])``, won't cause a problem. DebugMode is not a silver bullet.
.. TODO: repair the following link
If you instantiate DebugMode using the constructor (see :class:`DebugMode`)
rather than the keyword ``DebugMode``, you can configure its behaviour via
constructor arguments. The keyword version of DebugMode (which you get by using ``mode='DebugMode'``)
is quite strict.
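For instance, here is a sketch using the constructor, assuming the
``check_py_code`` option (one of the constructor's check-toggling arguments):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    # Skip the Python-implementation cross-check; other checks still run.
    f = theano.function([x], 10 * x,
                        mode=theano.compile.debugmode.DebugMode(check_py_code=False))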
For more detail, see :ref:`DebugMode<debugmode>` in the library.
.. _using_profilemode:
ProfileMode
===========
.. note::

    ProfileMode is deprecated. Use :attr:`config.profile` instead.
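The non-deprecated path is the per-function profiler; a minimal sketch,
assuming the ``profile`` argument to :func:`theano.function <function.function>`:

.. code-block:: python

    # Alternatively, run the whole script with THEANO_FLAGS='profile=True'.
    import theano
    import theano.tensor as T

    x = T.vector('x')
    f = theano.function([x], x * 2, profile=True)
    f([1, 2, 3])
    f.profile.summary()  # prints timing per op and apply node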
Besides checking for errors, another important task is to profile your
code. For this, Theano uses a special mode called ProfileMode, which is
passed as an argument to :func:`theano.function <function.function>`.
Using the ProfileMode is a three-step process.
.. note::

    To switch the default accordingly, set the Theano flag
    :attr:`config.mode` to ProfileMode. In that case, when the Python
    process exits, it will automatically print the profiling
    information to the standard output.
The memory profile of the output of each ``apply`` node can be enabled with the
Theano flag :attr:`config.ProfileMode.profile_memory`.
For more detail, see :ref:`ProfileMode <profilemode>` in the library.
Creating a ProfileMode Instance
-------------------------------
First create a ProfileMode instance:
>>> import theano
>>> profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
The ProfileMode constructor takes an optimizer and a linker as input.
Which optimizer and linker to use depends on the
application. For example, a user wanting to profile only the Python
implementation should use the ``gof.PerformLinker`` (or "py" for
short). On the other hand, a user wanting to profile their graph using C
implementations wherever possible should use the ``gof.OpWiseCLinker``
(or "c|py"). For testing the speed of your code, we recommend
the ``fast_run`` optimizer and the ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------
Once the ProfileMode instance is created, simply compile your graph as you
normally would, by specifying the ``mode`` parameter.
>>> import theano.tensor as T
>>> v1, v2 = T.vectors(2)
>>> o = v1 + v2
>>> f = theano.function([v1, v2], [o], mode=profmode)
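A short sketch of exercising the compiled function before asking for the
summary (the input values are illustrative):

>>> import numpy
>>> data = numpy.arange(3, dtype=theano.config.floatX)
>>> out, = f(data, data)
>>> profmode.print_summary()  # doctest: +SKIP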
Retrieving Timing Information
-----------------------------
Once your graph is compiled, simply run the program or operation you wish to
profile, then call ``profmode.print_summary()``. This will provide you with
the desired timing information, indicating where your graph is spending most
of its time. This is best shown through an example. Let's use our logistic
regression example.
Compiling the module with ``ProfileMode`` and calling ``profmode.print_summary()``
generates the following output:
.. code-block:: python

    """
    ProfileMode.print_summary()
    ---------------------------

    local_time 0.0749197006226 (Time spent running thunks)

    Apply-wise summary: <fraction of local_time spent at this position> (<Apply position>, <Apply Op name>)
        0.069  15  _dot22
        0.064   1  _dot22
        0.053   0  InplaceDimShuffle{x,0}
        0.049   2  InplaceDimShuffle{1,0}
        0.049  10  mul
        0.049   6  Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
        0.049   3  InplaceDimShuffle{x}
        0.049   4  InplaceDimShuffle{x,x}
        0.048  14  Sum{0}
        0.047   7  sub
        0.046  17  mul
        0.045   9  sqr
        0.045   8  Elemwise{sub}
        0.045  16  Sum
        0.044  18  mul
        ... (remaining 6 Apply instances account for 0.25 of the runtime)

    Op-wise summary: <fraction of local_time spent on this kind of Op> <Op name>
        0.139 * mul
        0.134 * _dot22
        0.092 * sub
        0.085 * Elemwise{Sub{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1779f10>}}[(0, 0)]
        0.053 * InplaceDimShuffle{x,0}
        0.049 * InplaceDimShuffle{1,0}
        0.049 * Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
        0.049 * InplaceDimShuffle{x}
        0.049 * InplaceDimShuffle{x,x}
        0.048 * Sum{0}
        0.045 * sqr
        0.045 * Sum
        0.043 * Sum{1}
        0.042 * Elemwise{Mul{output_types_preference=<theano.scalar.basic.transfer_type object at 0x17a0f50>}}[(0, 1)]
        0.041 * Elemwise{Add{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1736a50>}}[(0, 0)]
        0.039 * Elemwise{Second{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1736d90>}}[(0, 1)]
        ... (remaining 0 Ops account for 0.00 of the runtime)
    (*) Op is running a c implementation
    """
This output has two components. The first section, the
*Apply-wise summary*, gives timing information for the worst
offending ``Apply`` nodes: the individual op applications
within your graph that took the longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution times of all ``Apply`` nodes running
the same op are grouped together, and the total execution time per op
is shown (so if you use ``dot`` twice, you will see only one entry
there, corresponding to the sum of the time spent in each of them).
Finally, notice that the ``ProfileMode`` also shows which ops were running a C
implementation.
For more detail, see :ref:`ProfileMode<profilemode>` in the library.