What is the derivative of ReLU?

Note: You can run (and even edit) this code in the browser thanks to PyScript! If you get an error, refresh the page and run blocks in order.

Short Summary

The rectified linear unit (ReLU) is defined as:

$$ f(x)=\text{max}(0,x) $$

The derivative of ReLU is:

$$ f'(x) = \begin{cases} 1, & \text{if}\ x>0 \\ 0, & \text{otherwise} \end{cases} $$

If you want a more complete explanation, then let's read on!

In neural networks, one of the most commonly used activation functions is the rectified linear unit, abbreviated ReLU. The ReLU is defined as:

$$ f(x)=\text{max}(0,x) $$

Derivative of ReLU

Just looking at the equation \( f(x)=\text{max}(0,x) \), it was not immediately clear to me what the derivative should be. That is, what is the derivative of the max() function?

However, the derivative becomes clearer if we graph things out.

Let's create a range of x values from -4 to +4, incremented by 1, and then compute the response for each \( x \) by passing it through the ReLU \( f(\cdot) \).
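
Here's a minimal sketch of what that step might look like (not necessarily the original notebook code), assuming NumPy is available in the PyScript environment:

```python
# Sketch: define ReLU and evaluate it on integer x values from -4 to +4.
# Assumes NumPy is available (e.g., in the PyScript environment).
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0, x)

x = np.arange(-4, 5, 1)  # -4, -3, ..., 3, 4
y = relu(x)
print(list(zip(x, y)))
```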


Now we will plot \( x \) along the x-axis, and the responses \( y = f(x) \) along the y-axis to get a sense of how the ReLU function looks.
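
A sketch of the plotting step, assuming Matplotlib is available and reusing `x` and `y` from the previous block:

```python
# Sketch: plot x against y = relu(x) to see the shape of the ReLU.
import matplotlib.pyplot as plt

plt.plot(x, y, marker="o")
plt.xlabel("x")
plt.ylabel("f(x) = max(0, x)")
plt.title("ReLU")
plt.grid(True)
plt.show()
```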


A "derivative" is just the slope of the graph at certain point. So what is the slope of the graph at the point x=2?

Looking at the segment around x=2, we can see that the slope is 1. In fact, this holds for every x>0: the slope is 1.

What is the slope of the graph when x=-2? Visually, we see that there is no change in \( y \), so the slope is 0. In fact, for all negative numbers, the slope is 0.

Now what about x=0? Technically, the derivative is undefined there: at x=0, there are many possible lines (slopes) we could fit through the point. So what do we do here?

In practice, we simply choose a slope to use at x=0. A common choice is to set the derivative to 0 at x=0. It could be some other value, but most implementations use 0 (this has the nice property of encouraging many activations to be 0, i.e., sparsity in the feature map).

So we can adopt this convention and define the derivative of ReLU as follows:
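
In code, a sketch of this convention (reusing NumPy from the earlier block) might look like:

```python
# Sketch: derivative of ReLU using the convention f'(0) = 0.
# Returns 1 where x > 0, and 0 everywhere else (including x = 0).
def relu_derivative(x):
    return np.where(x > 0, 1.0, 0.0)
```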


Let's plot the graph again, but this time we'll show both the ReLU function and its derivative. We'll also sample finer-spaced points to better approximate the function.
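
A sketch of that combined plot, reusing `relu` and `relu_derivative` from the earlier blocks:

```python
# Sketch: plot ReLU and its derivative together on a finer grid of x values.
import numpy as np
import matplotlib.pyplot as plt

x_fine = np.linspace(-4, 4, 801)

plt.plot(x_fine, relu(x_fine), label="ReLU: f(x) = max(0, x)")
plt.plot(x_fine, relu_derivative(x_fine), "--", label="derivative f'(x)")
plt.xlabel("x")
plt.legend()
plt.grid(True)
plt.show()
```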


We can now see the ReLU function and its derivative for various values of x, where the derivative is simply the slope of the ReLU. We can also see that the derivative is not well defined at x=0, so we define the derivative at x=0 to be 0.