How to Compute the Derivative of a Sigmoid Function
A fully worked example
This is the sigmoid function:
$$ s(x) = \frac{1}{1 + e^{-x}} $$
And we can write this in Python:
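Here's a minimal sketch using only the standard `math` module. The function name `sigmoid` and the choice to take a single float are assumptions for illustration.

```python
import math


def sigmoid(x):
    """Sigmoid s(x) = 1 / (1 + e^(-x)) for a single float x."""
    return 1.0 / (1.0 + math.exp(-x))


print(sigmoid(0.0))  # 0.5
```

Note that `math.exp(-x)` overflows for very negative inputs (around `x < -709`), so this sketch is only meant for moderate ranges like the one used in this post.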
Let's plot a range of inputs \( x \) between -10 and 10.
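A plot needs a plotting library, so as a stand-in here's a rough sketch that just prints \( s(x) \) at the integers from -10 to 10. The `sigmoid` helper is defined in the snippet so it runs on its own, and the names `xs` and `ys` are illustrative.

```python
import math


def sigmoid(x):
    """Sigmoid s(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


xs = list(range(-10, 11))          # integers -10, -9, ..., 10
ys = [sigmoid(x) for x in xs]      # sigmoid evaluated at each input

# Print a small table: input on the left, sigmoid value on the right.
for x, y in zip(xs, ys):
    print(f"{x:>4}  {y:.6f}")
```

The values start near 0 on the left, pass through 0.5 at \( x = 0 \), and approach 1 on the right.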
Nice, so we can see that the sigmoid function is bounded between 0 and 1.
Alright, now let's put on our calculus hats...
Here's how you compute the derivative of a sigmoid function
First, let's rewrite the original equation to make it easier to work with.
$$ s(x) = \frac{1}{1+e^{-x}} = (1)(1+e^{-x})^{-1} = (1+e^{-x})^{-1} $$
Now we take the derivative:
$$ \begin{aligned}
\frac{d}{dx}s(x) &= \frac{d}{dx}\left((1+e^{-x})^{-1}\right) \\
\frac{d}{dx}s(x) &= -1\left((1+e^{-x})^{(-1-1)}\right) \frac{d}{dx}(1+ e^{-x}) \\
\frac{d}{dx}s(x) &= -1\left((1+e^{-x})^{(-2)}\right) \left(\frac{d}{dx}(1) + \frac{d}{dx}(e^{-x})\right) \\
\frac{d}{dx}s(x) &= -1\left((1+e^{-x})^{(-2)}\right) \left(0 + e^{-x}\left(\frac{d}{dx}(-x)\right)\right) \\
\frac{d}{dx}s(x) &= -1\left((1+e^{-x})^{(-2)}\right) (e^{-x})(-1)
\end{aligned} $$
Nice! We computed the derivative of a sigmoid! Okay, let's simplify a bit.
$$ \begin{aligned}
\frac{d}{dx}s(x) &= \left((1+e^{-x})^{(-2)}\right) (e^{-x}) \\
\frac{d}{dx}s(x) &= \frac{1}{(1+e^{-x})^{2}} (e^{-x}) \\
\frac{d}{dx}s(x) &= \frac{e^{-x}}{(1+e^{-x})^{2}}
\end{aligned} $$
Okay! That looks pretty good to me. Let's write Python code for the derivative of the sigmoid we computed.
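A minimal sketch of that formula, again using only the standard `math` module. The name `sigmoid_derivative` is an assumption for illustration.

```python
import math


def sigmoid_derivative(x):
    """Hand-derived derivative: e^(-x) / (1 + e^(-x))^2."""
    e = math.exp(-x)               # compute e^(-x) once and reuse it
    return e / (1.0 + e) ** 2


print(sigmoid_derivative(0.0))  # 0.25
```

At \( x = 0 \) we have \( e^{0} = 1 \), so the formula gives \( 1 / (1+1)^2 = 0.25 \).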
Let's plot the sigmoid and the derivative we computed by hand to see if it looks reasonable.
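Besides eyeballing a plot, one way to check that the hand-computed derivative is reasonable is to compare it against a numerical slope. Here's a rough sketch, assuming a central finite difference with step `h = 1e-5`. The helper `numerical_derivative` and the step size are my choices for illustration, not part of the derivation.

```python
import math


def sigmoid(x):
    """Sigmoid s(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Hand-derived derivative: e^(-x) / (1 + e^(-x))^2."""
    e = math.exp(-x)
    return e / (1.0 + e) ** 2


def numerical_derivative(f, x, h=1e-5):
    """Central difference approximation of f'(x) (assumed step size h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


xs = list(range(-10, 11))  # integers -10, -9, ..., 10

# Largest disagreement between the formula and the numerical slope.
max_gap = max(
    abs(sigmoid_derivative(x) - numerical_derivative(sigmoid, x)) for x in xs
)

# Table of x, s(x), and the hand-derived s'(x).
for x in xs:
    print(f"{x:>4}  {sigmoid(x):.6f}  {sigmoid_derivative(x):.6f}")
print("largest gap vs. numerical slope:", max_gap)
```

If the algebra is right, the gap should be tiny compared to the derivative values themselves.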
Nice! It looks like a derivative: it peaks at \( x=0 \), where the sigmoid is steepest, and goes to zero as \( x \) goes to \( \pm \infty \).
But wait... there's more!
If you've been reading some of the neural net literature, you've probably come across text that says the derivative of a sigmoid \( s(x) \) is equal to \( s'(x) = s(x)(1-s(x)) \).
Note that \( \frac{d}{dx}s(x) \) and \( s'(x) \) are the same thing, just different notation.
Also note that Andrew Ng writes \( f'(z) = f(z)(1 - f(z)) \), where \( f(z) \) is the sigmoid function, which is exactly what we are doing here.
So your next question should be: is the derivative we calculated earlier equivalent to \( s'(x) = s(x)(1-s(x)) \)?
So, using Andrew Ng's notation...
Does the derivative of a sigmoid \( f(z) \) equal \( f(z)(1-f(z)) \)?
Swapping in our notation, we can ask the equivalent question:
Does the derivative of a sigmoid \( s(x) \) equal \( s(x)(1-s(x)) \)?
Okay we left off with...
$$ \frac{d}{dx}s(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} $$
This part is not intuitive... but let's add and subtract 1 in the numerator (this does not change the value of the expression).
$$ \begin{aligned}
\frac{d}{dx}s(x) &= \frac{e^{-x} + 1 - 1}{(1+e^{-x})^{2}} \\
\frac{d}{dx}s(x) &= \frac{1 + e^{-x} - 1}{(1+e^{-x})^{2}} \\
\frac{d}{dx}s(x) &= \frac{1 + e^{-x}}{(1+e^{-x})^{2}} - \frac{1}{(1+e^{-x})^{2}} \\
\frac{d}{dx}s(x) &= \frac{1}{1+e^{-x}} - \frac{1}{(1+e^{-x})^{2}} \\
&= \frac{1}{1+e^{-x}} - \left(\frac{1}{1+e^{-x}}\right) \left(\frac{1}{1+e^{-x}}\right) \\
&= \frac{1}{1+e^{-x}} \left(1 - \frac{1}{1+e^{-x}}\right)
\end{aligned} $$
Hmmm.... look at that! There are actually two sigmoid functions there... Recall that the sigmoid function is \( s(x) = \frac{1}{1 + e^{-x}} \). Let's replace them with \( s(x) \).
$$ s'(x) = \frac{d}{dx}s(x) = s(x) (1 - s(x)) $$
Just like Prof Ng said... :)
And for a sanity check, do they both show the same function?
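One way to run that check without a plot is to evaluate both forms on a grid and compare them numerically. Here's a rough sketch, assuming inputs from -10 to 10 in steps of 0.1 and a relative tolerance of `1e-9`. The function names, the grid, and the tolerance are all illustrative choices.

```python
import math


def sigmoid(x):
    """Sigmoid s(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def derivative_by_hand(x):
    """First form: e^(-x) / (1 + e^(-x))^2."""
    e = math.exp(-x)
    return e / (1.0 + e) ** 2


def derivative_from_sigmoid(x):
    """Second form: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


xs = [i / 10 for i in range(-100, 101)]  # -10.0, -9.9, ..., 10.0

# True only if the two forms agree (up to floating point) at every grid point.
all_match = all(
    math.isclose(derivative_by_hand(x), derivative_from_sigmoid(x), rel_tol=1e-9)
    for x in xs
)
print("both forms agree:", all_match)
```

The two forms are compared with a tolerance rather than `==` because they round differently in floating point even though they are algebraically identical.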
Yes! They perfectly match!
So there you go.
Hopefully this satisfies your mathematical curiosity about why the derivative of a sigmoid \( s(x) \) is equal to \( s'(x) = s(x)(1-s(x)) \).