
Questions tagged [relu]

The tag has no usage guidance.

1 vote · 0 answers · 25 views

Maxout activation function vs ReLU (Number of weights)

From what I understand, the Maxout function works quite differently from ReLU. The ReLU function is $\max(0, x)$, where the input $x$ is $W^T x + b$. The Maxout function has many $W$s, and it is $\max(W_1^T x + b_1, W_2^T x + b_2, \ldots$ ...
asked by kite
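
To illustrate the parameter-count difference the question above is getting at, here is a minimal numpy sketch (the dimensions d and k are made up for illustration): a ReLU unit keeps one weight vector and one bias, while a maxout unit with k pieces keeps k of each.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 3                      # input dimension and number of maxout pieces (illustrative)
x = rng.normal(size=d)

# ReLU unit: one weight vector and one bias -> d + 1 parameters
w, b = rng.normal(size=d), 0.1
relu_out = max(0.0, w @ x + b)

# Maxout unit: k weight vectors and k biases -> k * (d + 1) parameters
W, bs = rng.normal(size=(k, d)), rng.normal(size=k)
maxout_out = np.max(W @ x + bs)

print(relu_out, maxout_out)
```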
4 votes · 1 answer · 57 views

Neural Network with 1 hidden layer with ReLU modeling capabilities

I've been doing Andrew Ng's Machine Learning Specialization on Coursera. There's a lab in which he uses 1 hidden layer with ReLU to show how it enables models to stitch together linear segments to ...
asked by loot
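
Not the Coursera lab itself, but a minimal sketch of the idea in the excerpt: with one hidden ReLU layer, each hidden unit adds one kink, so the network output is a piecewise-linear function stitched together from segments. The weights below are hand-picked.

```python
import numpy as np

def one_hidden_relu(x, W1, b1, w2, b2):
    """y = w2 . relu(W1 * x + b1) + b2 -- piecewise linear in the scalar input x."""
    h = np.maximum(0.0, W1 * x + b1)      # hidden activations, one kink per unit
    return w2 @ h + b2

# Three hidden units -> kinks at x = 0, 1, 2
W1 = np.array([1.0, 1.0, 1.0])
b1 = np.array([0.0, -1.0, -2.0])
w2 = np.array([1.0, -2.0, 2.0])
b2 = 0.0

xs = np.linspace(-1.0, 3.0, 9)
ys = [one_hidden_relu(x, W1, b1, w2, b2) for x in xs]
print(np.round(ys, 2))                    # slope changes at each kink: 0, +1, -1, +1
```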
0 votes · 0 answers · 67 views

How can we use ReLU activation in a Normalizing Flow model? More generally, is differentiable almost everywhere enough for a normalizing flow?

In some works, normalizing flow models are considered with ReLU activations. For example, using a planar flow, $f = f_n \circ \cdots \circ f_1$, where each $f_i$ has the ...
asked by travelingbones
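
For context on the question above, this is the planar-flow layer of Rezende & Mohamed that such questions usually refer to (notation assumed here, not quoted from the linked work). With $h = \text{ReLU}$, the derivative $h'$ exists everywhere except where $w^\top z + b = 0$, a measure-zero set, which is what "differentiable almost everywhere" is meant to cover:

$$f_i(z) = z + u\, h(w^\top z + b), \qquad \frac{\partial f_i}{\partial z} = I + u\, h'(w^\top z + b)\, w^\top, \qquad \det\frac{\partial f_i}{\partial z} = 1 + h'(w^\top z + b)\, u^\top w.$$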
1 vote · 0 answers · 34 views

When or why would we want to use one of the smooth approximation functions for the ReLU? [duplicate]

Learning about the ReLU, I keep finding variants or approximations that attempt to smooth out the function (e.g. squareplus). Why (and when) is smoothing desirable?
asked by blueberryfields
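
For reference, the smooth stand-ins usually meant in this kind of question are softplus and squareplus; a minimal numpy sketch of both next to ReLU (the squareplus hyperparameter b = 4 is just an illustrative choice):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softplus(x):
    # log(1 + e^x), written in a numerically stable way
    return np.logaddexp(0.0, x)

def squareplus(x, b=4.0):
    # (x + sqrt(x^2 + b)) / 2 -- smooth everywhere, approaches ReLU as b -> 0
    return 0.5 * (x + np.sqrt(x * x + b))

xs = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(xs))
print(softplus(xs))
print(squareplus(xs))
```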
2 votes · 1 answer · 86 views

ReLU Variance in Practice Disagrees with Theory

Let $X$ be a normally distributed random variable with $\mu=0$ and $\sigma=1$. Theory tells us that $\mathbb{V}[\text{ReLU}(X)] = \frac{1}{2}\mathbb{V}[X] = \frac{1}{2}$. Thus, we should have that $\...
asked by krc
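
A quick empirical check of the quantity in the excerpt (a minimal sketch, assuming $X \sim N(0,1)$): the exact results are $\mathbb{E}[\text{ReLU}(X)^2] = \tfrac{1}{2}$, while $\mathbb{V}[\text{ReLU}(X)] = \tfrac{1}{2} - \tfrac{1}{2\pi} \approx 0.34$, and mixing up these two quantities is a common source of the apparent disagreement.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
r = np.maximum(0.0, x)                                  # ReLU(X)

print("E[ReLU(X)^2] ~", np.mean(r ** 2))                # ~ 0.5
print("Var[ReLU(X)] ~", np.var(r))                      # ~ 0.34
print("1/2 - 1/(2*pi) =", 0.5 - 1.0 / (2.0 * np.pi))    # exact variance, ~ 0.3408
```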
1 vote · 1 answer · 47 views

How are groups created in maxout units when dividing the set of inputs 𝑧 into groups of 𝑘 values?

I don't get $G^{(i)}$, the set of indices into the inputs for group $i$, $\{(i-1)k+1, \ldots, ik\}$, when creating a maxout unit/function, this thing that outputs the maximum element of each group: $$g(z)...
asked by Revolucion for Monica
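
A small sketch of the indexing the question asks about: the book's formula $\{(i-1)k+1, \ldots, ik\}$ is 1-based, but in 0-based numpy it simply means splitting $z$ into consecutive chunks of $k$ values and taking the maximum of each chunk.

```python
import numpy as np

z = np.array([0.3, -1.2, 0.7, 2.0, 0.1, -0.5])   # 6 inputs
k = 3                                            # group size -> 2 groups

# Group i covers k consecutive inputs; reshaping into rows of length k
# reproduces exactly the index sets {(i-1)k+1, ..., ik}.
groups = z.reshape(-1, k)
g = groups.max(axis=1)                           # maxout output, one value per group
print(groups)
print(g)                                         # [0.7  2. ]
```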
1 vote · 0 answers · 168 views

Is ReLU activation function unsuitable for input layer if the input data has high inter-example correlation?

After making a neural network using ReLU as the activation function throughout, I had a look at the input layer activations and noticed that about 10% of the neurons are dead on initialization (never ...
asked by Museful
0 votes · 0 answers · 200 views

With 2 ReLU-activated layers, if the 2nd layer has all weights initialized to < 0, is the network always stillborn?

I've built my own neural network library with Keras-like syntax. I noticed that when using 2 consecutive ReLU-activated layers, and the 2nd of those layers has its weights initialized to negative ...
asked by Tim de Jong
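
A minimal numpy sketch of the mechanism described in the question (toy shapes, not the asker's library): the first ReLU layer's outputs are non-negative, so if every weight of the second layer is negative and its biases are zero, all of its pre-activations are ≤ 0, every second-layer ReLU outputs 0, and no gradient flows back through it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))                                # toy batch: 8 samples, 4 features

W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = -np.abs(rng.normal(size=(5, 3))), np.zeros(3)     # every weight < 0

h1 = np.maximum(0.0, x @ W1 + b1)    # >= 0 elementwise
z2 = h1 @ W2 + b2                    # <= 0 elementwise (negative weights * non-negative inputs)
h2 = np.maximum(0.0, z2)             # identically 0 -> zero gradients for the 2nd layer

print(np.all(h2 == 0.0))             # True: the second layer starts out dead
```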
0 votes · 1 answer · 115 views

Autoencoder accuracy with standardized data

I want to make an autoencoder over data that I originally standardized (that is, the data is now normally distributed ~ N(0,1)). The activation function I use in the linear autoencoder is ReLU. ...
asked by josf
0 votes · 1 answer · 526 views

Derivative of a neural network with respect to its input

I have a neural network like this: $x = \text{input}$, $z_1 = W_{1x}\cdot x + b_1$, $h_1 = \text{relu}(z_1)$, $z_2 = W_2\cdot h_1 + W_{2x}\cdot x + b_2$, $h_2 = \text{relu}(z_2)$, $y = W_3\cdot h_2 + W_{3x}\cdot x + b_3$; input ...
asked by Gaweiliex
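
For the architecture as written in the excerpt, applying the chain rule (with $D_i$ the diagonal matrix of ReLU derivatives; this is a standard derivation sketch, not the accepted answer) gives

$$\frac{\partial y}{\partial x} = W_3 D_2 \left( W_2 D_1 W_{1x} + W_{2x} \right) + W_{3x}, \qquad D_i = \operatorname{diag}\!\left(\mathbb{1}[z_i > 0]\right).$$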
2 votes · 0 answers · 31 views

Can predicting negative regression values increase the chance of dying ReLUs?

Since ReLUs have zero gradient for negative inputs, we know that if a neuron outputs a negative value, the corresponding ReLU activation will cause it to die. Therefore, if I use a neural network ...
asked by desert_ranger
2 votes · 0 answers · 24 views

How is ReLU used in neural networks if it doesn't squish the weighted sum into the interval of (0, 1)? [duplicate]

So as far as I understand from watching 3Blue1Brown's video on neural networks, all neurons operate on numbers ranging from 0 to 1. Since a weighted sum can be larger than that, a sigmoid function is ...
asked by Yeepsta
1 vote · 1 answer · 39 views

If all computed features (or components) in neural network nodes are positive numbers, is using ReLU meaningful?

I am trying to understand the following issue. The reason we use activation functions such as sigmoid, tanh, or ReLU in neural networks is to obtain a nonlinear combination of the input features (x's). My ...
asked by levitatmas
10 votes · 5 answers · 11k views

Can a neural network work with negative and zero inputs?

As the title suggests, I have several features which have values of either -1, 0, or 1. If I feed this data into a neural network where I use ReLU as the activation ...
asked by spectre
1 vote · 1 answer · 1k views

What are the benefits of using SoftPlus over ReLU activation functions?

All the discussions online seem to be centered around the benefits of ReLU activations over SoftPlus. The general consensus seems to be that the use of SoftPlus is discouraged since the computation of ...
asked by InternetUser0947
237 votes · 9 answers · 293k views

What are the advantages of ReLU over sigmoid function in deep neural networks?

The state of the art for non-linearities is to use rectified linear units (ReLU) instead of the sigmoid function in deep neural networks. What are the advantages? I know that training a network when ReLU is ...
asked by RockTheStar
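
A minimal numpy sketch of the vanishing-gradient argument that usually comes up for this question (purely illustrative inputs): the sigmoid's derivative is at most 0.25 and decays to zero for large $|x|$, while the ReLU derivative is exactly 1 for every positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

sig_grad = sigmoid(xs) * (1.0 - sigmoid(xs))   # <= 0.25, ~0 for large |x|
relu_grad = (xs > 0).astype(float)             # 1 wherever the input is positive

print(sig_grad)    # [~4.5e-05  0.105  0.25  0.105  ~4.5e-05]
print(relu_grad)   # [0. 0. 0. 1. 1.]
```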