From 4a64cdb37c8378dc3db95533ee94b421520aff3d Mon Sep 17 00:00:00 2001
From: Matko Bosnjak
Date: Tue, 12 Apr 2016 22:22:40 +0100
Subject: [PATCH] Typo fix

---
 optimization-2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/optimization-2.md b/optimization-2.md
index 13ecb414..c08fdc91 100644
--- a/optimization-2.md
+++ b/optimization-2.md
@@ -155,7 +155,7 @@ $$
 = \left( 1 - \sigma(x) \right) \sigma(x)
 $$
 
-As we see, the gradient turns out to simplify and becomes surprisingly simple. For example, the sigmoid expression receives the input 1.0 and computes the ouput 0.73 during the forward pass. The derivation above shows that the *local* gradient would simply be (1 - 0.73) * 0.73 ~= 0.2, as the circuit computed before (see the image above), except this way it would be done with a single, simple and efficient expression (and with less numerical issues). Therefore, in any real practical application it would be very useful to group these operations into a single gate. Lets see the backprop for this neuron in code:
+As we see, the gradient turns out to simplify and becomes surprisingly simple. For example, the sigmoid expression receives the input 1.0 and computes the output 0.73 during the forward pass. The derivation above shows that the *local* gradient would simply be (1 - 0.73) * 0.73 ~= 0.2, as the circuit computed before (see the image above), except this way it would be done with a single, simple and efficient expression (and with less numerical issues). Therefore, in any real practical application it would be very useful to group these operations into a single gate. Lets see the backprop for this neuron in code:
 
 ```python
 w = [2,-3,-3] # assume some random weights and data
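
For context on the paragraph touched by this patch, the sketch below works through the grouped sigmoid gate numerically. It reuses the `w = [2,-3,-3]` line visible in the hunk, but the inputs `x = [-1, -2]` and the variable names are illustrative assumptions, not taken from the patched file; treat it as a minimal standalone example of the (1 - f) * f local gradient, not as the notes' own listing.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# forward pass for a 2D neuron f(w, x) = sigmoid(w0*x0 + w1*x1 + w2)
w = [2, -3, -3]   # assume some random weights and data (as in the hunk)
x = [-1, -2]      # illustrative inputs, assumed for this sketch

dot = w[0]*x[0] + w[1]*x[1] + w[2]  # = 1.0
f = sigmoid(dot)                    # ~= 0.73

# backward pass: the whole sigmoid is treated as a single gate,
# so its local gradient is simply (1 - f) * f ~= 0.2
ddot = (1 - f) * f                           # gradient on the dot product
dx = [w[0] * ddot, w[1] * ddot]              # backprop into x
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot]  # backprop into w (bias sees ddot directly)

print(f, ddot)  # roughly 0.73 and 0.20, matching the numbers in the paragraph
```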