
Questions tagged [relu]

The tag has no usage guidance.

1 vote · 0 answers · 25 views

Maxout activation function vs ReLU (Number of weights)

From what I understand, the Maxout function works quite differently from ReLU. The ReLU function is $\max(0, x)$, where the input $x$ is $W^T x + b$. The Maxout function has many $W$s, and it is $\max(W_1^T x + b_1, W_2^T x + b_2, \ldots$ ...
asked by kite
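
To illustrate the parameter-count difference the question above is getting at, here is a minimal numpy sketch (the dimensions d and k are made up for illustration): a ReLU unit keeps one weight vector and one bias, while a maxout unit with k pieces keeps k of each.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 3                      # input dimension and number of maxout pieces (illustrative)
x = rng.normal(size=d)

# ReLU unit: one weight vector and one bias -> d + 1 parameters
w, b = rng.normal(size=d), 0.1
relu_out = max(0.0, w @ x + b)

# Maxout unit: k weight vectors and k biases -> k * (d + 1) parameters
W, bs = rng.normal(size=(k, d)), rng.normal(size=k)
maxout_out = np.max(W @ x + bs)

print(relu_out, maxout_out)
```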
4 votes · 1 answer · 57 views

Neural Network with 1 hidden layer with ReLU modeling capabilities

I've been doing Andrew Ng's Machine Learning Specialization on Coursera. There's a lab in which he uses 1 hidden layer with ReLU to show how it enables models to stitch together linear segments to ...
asked by loot
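
Not the Coursera lab itself, but a minimal sketch of the idea in the excerpt: with one hidden ReLU layer, each hidden unit adds one kink, so the network output is a piecewise-linear function stitched together from segments. The weights below are hand-picked.

```python
import numpy as np

def one_hidden_relu(x, W1, b1, w2, b2):
    """y = w2 . relu(W1 * x + b1) + b2 -- piecewise linear in the scalar input x."""
    h = np.maximum(0.0, W1 * x + b1)      # hidden activations, one kink per unit
    return w2 @ h + b2

# Three hidden units -> kinks at x = 0, 1, 2
W1 = np.array([1.0, 1.0, 1.0])
b1 = np.array([0.0, -1.0, -2.0])
w2 = np.array([1.0, -2.0, 2.0])
b2 = 0.0

xs = np.linspace(-1.0, 3.0, 9)
ys = [one_hidden_relu(x, W1, b1, w2, b2) for x in xs]
print(np.round(ys, 2))                    # slope changes at each kink: 0, +1, -1, +1
```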
0 votes · 0 answers · 67 views

How can we use ReLU activation in a Normalizing Flow model? More generally, is differentiable almost everywhere enough for a normalizing flow?

In some works, normalizing flow models are considered with ReLU activations. For example, using a planar flow, $f = f_n \circ \cdots \circ f_1$, where each $f_i$ has the ...
asked by travelingbones
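
For context on the question above, this is the planar-flow layer of Rezende & Mohamed that such questions usually refer to (notation assumed here, not quoted from the linked work). With $h = \text{ReLU}$, the derivative $h'$ exists everywhere except where $w^\top z + b = 0$, a measure-zero set, which is what "differentiable almost everywhere" is meant to cover:

$$f_i(z) = z + u\, h(w^\top z + b), \qquad \frac{\partial f_i}{\partial z} = I + u\, h'(w^\top z + b)\, w^\top, \qquad \det\frac{\partial f_i}{\partial z} = 1 + h'(w^\top z + b)\, u^\top w.$$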
1 vote · 0 answers · 34 views

When or why would we want to use one of the smooth approximation functions for the ReLU? [duplicate]

Learning about the ReLU, I keep finding variants or approximations that attempt to smooth out the function (e.g. squareplus). Why (and when) is smoothing desirable?
asked by blueberryfields
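
For reference, the smooth stand-ins usually meant in this kind of question are softplus and squareplus; a minimal numpy sketch of both next to ReLU (the squareplus hyperparameter b = 4 is just an illustrative choice):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softplus(x):
    # log(1 + e^x), written in a numerically stable way
    return np.logaddexp(0.0, x)

def squareplus(x, b=4.0):
    # (x + sqrt(x^2 + b)) / 2 -- smooth everywhere, approaches ReLU as b -> 0
    return 0.5 * (x + np.sqrt(x * x + b))

xs = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(xs))
print(softplus(xs))
print(squareplus(xs))
```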
2 votes · 1 answer · 86 views

ReLU Variance in Practice Disagrees with Theory

Let $X$ be a normally distributed random variable with $\mu=0$ and $\sigma=1$. Theory tells us that $\mathbb{V}[\text{ReLU}(X)] = \frac{1}{2}\mathbb{V}[X] = \frac{1}{2}$. Thus, we should have that $\...
asked by krc
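
A quick empirical check of the quantity in the excerpt (a minimal sketch, assuming $X \sim N(0,1)$): the exact results are $\mathbb{E}[\text{ReLU}(X)^2] = \tfrac{1}{2}$, while $\mathbb{V}[\text{ReLU}(X)] = \tfrac{1}{2} - \tfrac{1}{2\pi} \approx 0.34$, and mixing up these two quantities is a common source of the apparent disagreement.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
r = np.maximum(0.0, x)                                  # ReLU(X)

print("E[ReLU(X)^2] ~", np.mean(r ** 2))                # ~ 0.5
print("Var[ReLU(X)] ~", np.var(r))                      # ~ 0.34
print("1/2 - 1/(2*pi) =", 0.5 - 1.0 / (2.0 * np.pi))    # exact variance, ~ 0.3408
```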
1 vote · 1 answer · 47 views

How are groups created in maxout units when dividing the set of inputs 𝑧 into groups of 𝑘 values?

I don't get $G^{(i)}$, the set of indices into the inputs for group $i$, $\{(i-1)k+1, \ldots, ik\}$, when creating a maxout unit/function, this thing that outputs the maximum element of each group: $$g(z)...
asked by Revolucion for Monica
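
A small sketch of the indexing the question asks about: the book's formula $\{(i-1)k+1, \ldots, ik\}$ is 1-based, but in 0-based numpy it simply means splitting $z$ into consecutive chunks of $k$ values and taking the maximum of each chunk.

```python
import numpy as np

z = np.array([0.3, -1.2, 0.7, 2.0, 0.1, -0.5])   # 6 inputs
k = 3                                            # group size -> 2 groups

# Group i covers k consecutive inputs; reshaping into rows of length k
# reproduces exactly the index sets {(i-1)k+1, ..., ik}.
groups = z.reshape(-1, k)
g = groups.max(axis=1)                           # maxout output, one value per group
print(groups)
print(g)                                         # [0.7  2. ]
```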
1 vote · 0 answers · 168 views

Is ReLU activation function unsuitable for input layer if the input data has high inter-example correlation?

After making a neural network using ReLU as the activation function throughout, I had a look at the input layer activations and noticed that about 10% of the neurons are dead on initialization (never ...
asked by Museful
0 votes · 0 answers · 200 views

With 2 ReLU-activated layers, if the 2nd layer has all weights initialized to < 0, is the network always stillborn?

I've built my own neural network library with Keras-like syntax. I noticed that when using 2 consecutive ReLU-activated layers, and the 2nd of those layers has its weights initialized to negative ...
asked by Tim de Jong
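
A minimal numpy sketch of the mechanism described in the question (toy shapes, not the asker's library): the first ReLU layer's outputs are non-negative, so if every weight of the second layer is negative and its biases are zero, all of its pre-activations are ≤ 0, every second-layer ReLU outputs 0, and no gradient flows back through it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))                                # toy batch: 8 samples, 4 features

W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = -np.abs(rng.normal(size=(5, 3))), np.zeros(3)     # every weight < 0

h1 = np.maximum(0.0, x @ W1 + b1)    # >= 0 elementwise
z2 = h1 @ W2 + b2                    # <= 0 elementwise (negative weights * non-negative inputs)
h2 = np.maximum(0.0, z2)             # identically 0 -> zero gradients for the 2nd layer

print(np.all(h2 == 0.0))             # True: the second layer starts out dead
```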
0 votes · 1 answer · 115 views

Autoencoder accuracy with standardized data

I want to make an autoencoder over data that I originally standardized (that is, the data is now normally distributed ~ N(0,1)). The activation function I use in the linear autoencoder is ReLU. ...
asked by josf
0 votes · 1 answer · 526 views

Derivative of a neural network with respect to its input

I have a neural network like this: $x = \text{input}$, $z_1 = W_{1x}\cdot x + b_1$, $h_1 = \text{relu}(z_1)$, $z_2 = W_2\cdot h_1 + W_{2x}\cdot x + b_2$, $h_2 = \text{relu}(z_2)$, $y = W_3\cdot h_2 + W_{3x}\cdot x + b_3$; input ...
asked by Gaweiliex
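
For the architecture as written in the excerpt, applying the chain rule (with $D_i$ the diagonal matrix of ReLU derivatives; this is a standard derivation sketch, not the accepted answer) gives

$$\frac{\partial y}{\partial x} = W_3 D_2 \left( W_2 D_1 W_{1x} + W_{2x} \right) + W_{3x}, \qquad D_i = \operatorname{diag}\!\left(\mathbb{1}[z_i > 0]\right).$$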
2 votes · 0 answers · 31 views

Can predicting negative regression values increase the chance of dying ReLUs?

Since ReLUs have zero gradient for negative inputs, we know that if a neuron outputs a negative value, the corresponding ReLU activation will cause it to die. Therefore, if I use a neural network ...
asked by desert_ranger
2 votes · 0 answers · 24 views

How is ReLU used in neural networks if it doesn't squish the weighted sum into the interval of (0, 1)? [duplicate]

So as far as I understand from watching 3Blue1Brown's video on neural networks, all neurons operate on numbers ranging from 0 to 1. Since a weighted sum can be larger than that, a sigmoid function is ...
asked by Yeepsta
1 vote · 1 answer · 39 views

If all computed features (or components) in neural network nodes are positive numbers, is using ReLU meaningful?

I am trying to understand the following issue. The reason we use activation functions such as sigmoid, tanh, or ReLU in neural networks is to obtain a nonlinear combination of the input features (x's). My ...
asked by levitatmas
10 votes · 5 answers · 11k views

Can a neural network work with negative and zero inputs?

As the title suggests, I have several features which have values of either -1, 0, or 1. If I feed this data into a neural network where I use ReLU as the activation ...
asked by spectre
1 vote · 1 answer · 1k views

What are the benefits of using SoftPlus over ReLU activation functions?

All the discussions online seem to be centered around the benefits of ReLU activations over SoftPlus. The general consensus seems to be that the use of SoftPlus is discouraged since the computation of ...
asked by InternetUser0947
237 votes · 9 answers · 293k views

What are the advantages of ReLU over sigmoid function in deep neural networks?

The state of the art for non-linearities is to use rectified linear units (ReLU) instead of the sigmoid function in deep neural networks. What are the advantages? I know that training a network when ReLU is ...
asked by RockTheStar
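
A minimal numpy sketch of the vanishing-gradient argument that usually comes up for this question (purely illustrative inputs): the sigmoid's derivative is at most 0.25 and decays to zero for large $|x|$, while the ReLU derivative is exactly 1 for every positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

sig_grad = sigmoid(xs) * (1.0 - sigmoid(xs))   # <= 0.25, ~0 for large |x|
relu_grad = (xs > 0).astype(float)             # 1 wherever the input is positive

print(sig_grad)    # [~4.5e-05  0.105  0.25  0.105  ~4.5e-05]
print(relu_grad)   # [0. 0. 0. 1. 1.]
```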