All Questions

1 vote
0 answers
22 views

Why is the threshold term incorporated into the weight vector in linear classifiers?

In the context of linear classifiers, such as the perceptron or logistic regression, I understand that the decision boundary is defined by a linear combination of input features and weights, plus a ...
Narges Ghanbari's user avatar
2 votes
1 answer
25 views

Using threshold and bias at the same time in NN

I'm using a NN with sigmoid binary activation, and for the threshold I'm using 0.5. So if the output is < 0.5, it is classified as 0, and if the output is >= 0.5, it is classified as 1. But I'm using bias too at the same ...
Arias231's user avatar
2 votes
1 answer
412 views

Derivative error with respect to bias in binary cross entropy

I will do research using a NN with 1 hidden layer, with binary cross entropy as the loss and sigmoid as the activation function. I found the derivative formula from Sadowski, 2016 (link: ...
Andryan's user avatar
  • 47
0 votes
0 answers
100 views

Training on biased dataset, when the bias is quantitatively known

I have a machine learning model (a neural network here) which minimizes MSE loss. The model should follow an unbiased distribution. Nevertheless, the training set is biased, but fortunately by a known ...
Daniel Wiczew's user avatar
1 vote
1 answer
193 views

How does SGD training error decrease in subsequent epochs with non-iid samples when it is recommended that samples in subsequent epochs be iid?

I have been reading the Deep Learning book by Ian Goodfellow and on pg. 277, they mention: It is also crucial that the minibatches be selected randomly. Computing an unbiased estimate of the expected ...
Kunj Mehta's user avatar
5 votes
3 answers
186 views

Confusion about the training procedure while using transfer learning

Suppose that we have a trained CNN with 5 conv layers and 3 fully connected layers. We take the first 5 conv layers as they are (with their parameter settings: like kernel size, activation etc) and ...
Mas A's user avatar
  • 243
3 votes
1 answer
195 views

Why is the bias neuron in a neural network always initialised to 1?

I'm just starting with neural networks, and this Towards Data Science article mentions that the bias neuron is always initialized to 1. My question is why the bias neuron in neural networks is ...
user3046211's user avatar
0 votes
0 answers
25 views

How to explain huge bias on unseen data?

I've trained a CNN to do a binary classification based on 2D radar spectra. I've tried different dataset sizes (reaching 200,000 samples per class) and always make sure that the classes are ...
user132792's user avatar
2 votes
2 answers
3k views

Do Neural Networks suffer from high bias or high variance

For most ML models we say they suffer from high bias or high variance, then we correct for it. However, in DL do neural networks suffer from the same concept in the sense that they initially have high ...
Jack Armstrong's user avatar
12 votes
3 answers
7k views

Batch normalization and the need for bias in neural networks

I've read that batch normalization eliminates the need for a bias vector in neural networks, since it introduces a shift parameter that functions similarly to a bias. As far as I'm aware though, a ...
Bas Krahmer's user avatar
1 vote
0 answers
321 views

Can a model perform better on test data than on training data

I am training deep neural networks on a classification problem. While choosing the number of epochs, I get the graph below. So my question is that this case falls under neither high bias nor high ...
Onki's user avatar
  • 255
0 votes
0 answers
22 views

Statistical proof to exclude less frequent records from data during analysis

I am working on reviewing the results of an automated task. To give you an idea, the data that I have to review looks as shown below. Let's say from the downstream analytics perspective, ...
The Great's user avatar
  • 3,342
1 vote
0 answers
38 views

What does the famous bias-variance figure actually represent?

The figure below is generally used to explain the bias-variance tradeoff. But something that is not clear and not explained anywhere is: what do the dots represent? Do they represent: 1. predictions on ...
mach's user avatar
  • 1,825
0 votes
0 answers
807 views

How to predict new data in Matlab neural network regression when output vs. target is not diagonal

In the ideal case, we expect the output vs. target plot to be diagonal. In Matlab, using the neural network regression app, the plot comes with the non-diagonal best fit (i.e., output=m x target+...
Md. Ferdous Wahid's user avatar
2 votes
2 answers
350 views

Network learns bias during the first iterations if parameter initialization is not good

Andrej Karpathy in his blog post "A Recipe for Training Neural Networks" states that initialization is important for convergence. I get that but when he says: init well. Initialize the final layer ...
Amir Hossein F's user avatar
2 votes
1 answer
3k views

Maximization bias in reinforcement learning

In Richard S. Sutton and Andrew G. Barto's book on reinforcement learning, on page 156, it says: Maximization bias occurs when we estimate the value function while taking the max over it (that is what Q ...
yi li's user avatar
  • 131
2 votes
0 answers
151 views

Neural Network Bias updating during BackProp [closed]

Does it make sense to say that when I update the weights in a positive way in a neural network, the bias is also updated in a positive way, and that therefore the trend of weight and bias for the ...
Rubio95R's user avatar
  • 191
1 vote
1 answer
86 views

Free lunch Autoencoder? Data dimensionality reduction

I came across autoencoders, and saw one example where no activation is used - it's simply a linear transform to a lower dimension and then back up $$ B(Ax+a)+b=x$$ with $x\in \mathbb{R}^d$ and $A\in \...
Nic's user avatar
  • 113
2 votes
0 answers
644 views

Time-series predictions constant offset from reference values

I am currently trying to solve a regression problem using neural networks. I want to detect movement patterns in images over time (video) and output a continuous value for different medical indices. ...
Unknown User's user avatar
10 votes
2 answers
2k views

Is low bias in a sample a synonym for high variance?

Is the following true? Low bias = high variance; high bias = low variance. I understand high and low bias, but then how is variance different? Or are the above synonyms?
alwayscurious's user avatar
4 votes
1 answer
971 views

Multilayer perceptron: Hyperparameters vs Parameters and Cross validation (nested or not)

I'm a bit confused about the k-fold cross validation (inner and external) done for the model performance evaluation. I've read that when you are trying to validate your model, you need to do it ...
Nikaido's user avatar
  • 812
4 votes
1 answer
1k views

Relationship between bias, variance, and regularization

In Goodfellow et al.'s Deep Learning, the authors write on page 222: "... the model family being trained either (1) excluded the true data-generating process - corresponding to underfitting and ...
Vivek Subramanian's user avatar
2 votes
1 answer
1k views

Can't update bias using gradient descent, because derivative of loss function with respect to bias has different dimensions

I want to update a bias in my Neural Network using the gradient descent optimization algorithm. Unfortunately, the bias has different dimensions than the derivative of the loss function with respect ...
mzmyslowski's user avatar
2 votes
0 answers
192 views

Visible Layer Bias in Restricted Boltzmann Machines

In Neural Networks the bias term of the hidden units can be considered a threshold for the node to fire. This is how neurons basically work in the brain as well. In RBMs, also a bias for the visible ...
Chris's user avatar
  • 525
3 votes
0 answers
502 views

Does batch normalization bias the gradient?

With a training set of size $N$, let $L_i(\theta)$ denote the loss of the $i^\text{th}$ training example when the model has parameters $\theta$. Then the training loss $L(\theta)$ is equal to the mean ...
Charles Staats's user avatar
30 votes
7 answers
42k views

Deep learning : How do I know which variables are important?

In terms of neural network lingo (y = Weight * x + bias) how would I know which variables are more important than others? I have a neural network with 10 inputs, 1 hidden layer with 20 nodes, and 1 ...
user1367204's user avatar
  • 1,001
2 votes
3 answers
995 views

Random initialization/order in neural network — bias or variance?

I'm puzzled about how to describe differences occurring between neural networks trained on the same data and with the same configuration. They differ only in the initial weights (different seeds used ...
johannes's user avatar
4 votes
1 answer
3k views

Autoencoder with tied weights: bias?

For some unsupervised learning problem, I need to train an autoencoder, so that I only have to store the encoder afterwards. However, I am not sure how, or if, the bias weights can be tied. To make ...
Cantfindname's user avatar
1 vote
1 answer
271 views

Large Neural Networks have zero bias in the bias-variance tradeoff? [closed]

We know that neural networks are universal approximators. That means that they can approximate ANY function (with a large enough number of hidden neurons). Error can be broken down into two ...
user avatar
1 vote
1 answer
7k views

How do I incorporate the biases in my feed-forward neural network

I'm trying to implement a FFNN. I'm doing this as an exercise to understand how biases play a role in the classification. I trained a NN using a package in R with the inputs being 1..100 and the ...
tolgap's user avatar
  • 161
4 votes
1 answer
10k views

Are bias weights essential in the output layer, if one wants a universal function approximator (or non-linearly separable problem solver)?

I am learning about ELMs (Extreme Learning Machines), which appear to have no bias weights at the output layer. Besides that, just to clarify, the kind of ELM I am referring to are topologically no ...
dawid's user avatar
  • 255
4 votes
1 answer
2k views

What do the bias units represent in a restricted Boltzmann machine?

I'm reading up on RBMs and this is not obvious to me. I'm imagining RBMs being used for something like the Netflix Prize (since that was one of the papers I read on it). So you have a bunch of ...
phone's user avatar
  • 43