Questions tagged [training-error]
The training-error tag has no usage guidance.
73 questions
4 votes · 1 answer · 68 views
How do machine learning topics fit into a traditional undergraduate statistics course on estimation?
I'm currently teaching an undergraduate introduction to statistics course but, as required by the program director, need to add some machine learning material to it. I'm wondering what is the appropriate ...
0 votes · 0 answers · 25 views
Model training loss always converges to 1.35
I'm trying to create a multi-class classification model using RNNs. The input data has a sequence length of 90 and consists of 5 features, normalized to the [0,1] range.
Here's the network ...
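The post's own code is truncated above. Purely for illustration, a generic PyTorch sketch of a network of that shape (all layer choices, hidden sizes, and the number of classes are assumptions, not the asker's):

```python
import torch

class RNNClassifier(torch.nn.Module):
    """Illustrative stand-in: sequences of length 90 with 5 features -> class logits."""
    def __init__(self, n_features=5, hidden=64, n_classes=4):  # n_classes is assumed
        super().__init__()
        self.rnn = torch.nn.GRU(n_features, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, n_classes)

    def forward(self, x):          # x: (batch, 90, 5), already scaled to [0, 1]
        _, h = self.rnn(x)         # h: (1, batch, hidden) = final hidden state
        return self.head(h[-1])    # logits: (batch, n_classes)

model = RNNClassifier()
logits = model(torch.rand(8, 90, 5))
loss = torch.nn.functional.cross_entropy(logits, torch.randint(0, 4, (8,)))
```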
0 votes · 0 answers · 5 views
Custom Model For Approximating Sin Function Using Backpropagation [duplicate]
I have a very simple custom model that I am experimenting with; it takes one input and produces one output. The model equation is y = sin(ax + b), where a and b are single learnable ...
2 votes · 1 answer · 44 views
Training loss reaches zero, then suddenly increases, then decreases to zero
I get the following loss behavior when training a multilayer perceptron with mean squared error loss on some synthetic data, using default Adam with the default learning rate. (I am working with 1-dimensional data.)
I ...
1 vote · 0 answers · 13 views
Reference Request: Rate at which Training Error goes to 0
Looking for references on the rate at which the training error goes to 0. As an example, suppose that we have a linear model $y = X \beta_0 + \epsilon,$ where $\beta_0$ is bounded and each row of $X$ is generated ...
1 vote · 1 answer · 47 views
Why does the performance on the training set go down as the number of samples increases?
To my knowledge there are two types of learning curves: those that show the progression of performance as the number of epochs increases, and those that show the performance progression as the ...
1 vote · 0 answers · 136 views
What is a good way to understand the difference between training RSS/MSE and test RSS for polynomial models?
I am trying to understand the difference between training RSS and test RSS for smoothing splines, e.g.
$$\hat{g}=\arg \min_g \left(\Sigma_{i=1}^{n}(y_{i} - g(x_i))^2 + \lambda \int [g^{(m)}(x)]^{2}dx \...
1 vote · 0 answers · 60 views
Peaks in error during training - Regression problem with DL model (LSTM)
During training I get unusual behavior from my model: peaks show up in both the validation and training errors that I cannot explain. I use MSE as the loss, and similar behavior appears in other ...
3 votes · 0 answers · 378 views
Shuffling data significantly decreases the performance of linear regression
I'm trying to build a simple linear regression model $y=ax+b$ using pytorch, where $y$ is the number of cells increased on Day $n$ and $x$ is the number of cells ...
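For context, a minimal PyTorch sketch of the single-parameter fit the question describes; the synthetic data and training choices here are placeholders, not the original poster's:

```python
import torch

torch.manual_seed(0)
x = torch.linspace(0, 10, 100).unsqueeze(1)        # e.g. cell count on day n-1
y = 1.8 * x + 0.5 + 0.1 * torch.randn_like(x)      # e.g. cell increase on day n

model = torch.nn.Linear(1, 1)                      # y = a*x + b
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

for epoch in range(200):
    perm = torch.randperm(x.size(0))               # shuffle rows, keeping (x, y) pairs aligned
    loss = loss_fn(model(x[perm]), y[perm])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Shuffling the rows jointly like this does not change the least-squares solution, which is why a large performance drop after shuffling usually points to a data-handling issue rather than to the model itself.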
4 votes · 1 answer · 2k views
Is it possible to have a higher train error than a test error in machine learning?
Usually it is called over-fitting when the test error is higher than the training error. Does that imply that it is called under-fitting when the training error is higher than the test error? Also ...
0 votes · 0 answers · 77 views
PRESS statistic and k-fold cross-validation
How is the PRESS statistic calculated in a k-fold cross-validation? I know how it is done in the leave-one-out scenario. Is it still summed over all training samples, just that there are now k-many ...
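One common generalisation (an interpretation, not necessarily the asker's definition) keeps the sum over all n samples but replaces the n leave-one-out fits with k out-of-fold fits; a scikit-learn sketch:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

# Out-of-fold predictions: each sample is predicted by the model
# trained without the fold that contains it.
y_oof = cross_val_predict(LinearRegression(), X, y, cv=KFold(n_splits=5))

# PRESS-style statistic: still summed over all n samples, but with
# k held-out folds instead of n leave-one-out fits.
press = np.sum((y - y_oof) ** 2)
print(press)
```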
1 vote · 0 answers · 150 views
How to handle problem of different random seeds giving drastically different test scores in machine learning model?
For a rigorous empirical analysis, I am training a model with three different seeds - 0, 1 and 2. In each case, I found that the model obtained through early stopping (lowest validation loss) had an ...
1 vote · 0 answers · 235 views
Training loss, validation loss and WER decrease, then increase [duplicate]
I am trying to use Hugging Face Datasets for speech recognition with transformers, following this tutorial; epochs=30, steps=400, train_batch_size=16. Training loss, validation loss and WER decrease, and ...
4 votes · 0 answers · 161 views
Initial jump in validation loss, how to explain?
A lot of times I see the following behavior of the training and validation loss during training of deep nets:
At the very beginning (first couple of batches), the training loss descends, but the ...
2 votes · 0 answers · 81 views
Strange behaviour of training accuracy and loss function
Firstly, I want to mention that I am not looking for suggestions to improve the training accuracy of my NN. The only purpose of this question is to know what might be causing the peculiar behaviour ...
1 vote · 1 answer · 186 views
How to find the optimum when using regularization?
Using regularization increases the training error, and possibly the validation error as well.
How do I find the optimum? Still just the optimum of the bias² and variance, like here:
Source: https://dziganto....
0 votes · 0 answers · 50 views
How are parameters in graphical models learnt?
This is a request for a good reference. I wanted to have a better understanding of graphical models and I am reading "Pattern recognition and machine learning" by Bishop, chap. 8 (Graphical ...
0 votes · 0 answers · 23 views
Explaining and Addressing the Bias-Variance Tradeoff
Let's say we have a model whose training error is 7% and the validation error is 10%. What does this mean in terms of the bias-variance tradeoff? I know that high validation error and low training ...
1 vote · 2 answers · 9k views
Training loss decreases, then suddenly increases, then decreases lower than the first time
I get the following loss behavior when training a multilayer perceptron with mean squared error loss on some synthetic data, using Adam with learning rate 1e-1.
As far as I can say from reading, for ...
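A minimal PyTorch reproduction of that kind of setup (the architecture and synthetic data are guesses for illustration; only the loss, optimizer, and learning rate come from the question):

```python
import torch

torch.manual_seed(0)
x = torch.rand(256, 1)
y = torch.sin(6 * x) + 0.05 * torch.randn_like(x)   # illustrative synthetic target

model = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)  # the large learning rate from the question
loss_fn = torch.nn.MSELoss()

losses = []
for step in range(2000):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())   # with lr=1e-1, occasional spikes in this curve are common
```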
0 votes · 1 answer · 624 views
Why is the expected value of the optimism of the training error for a linear model equal to $\frac{2}{n}d\sigma^{2}$?
In the book Elements of Statistical Learning (2nd edition), on page 229, they express the expected optimism of the training error as:
$$
\omega=\frac{2}{N} \sum_{i=1}^{N} \operatorname{Cov}\left(\hat{y}_{i}, y_{i}\...
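For reference, a hedged reconstruction of the step behind the $\frac{2}{N}d\sigma^{2}$ result, using the standard linear-smoother argument (my sketch of the usual derivation, not a quote from the book):

$$\sum_{i=1}^{N} \operatorname{Cov}(\hat{y}_i, y_i) = \operatorname{tr}\big(\operatorname{Cov}(Hy, y)\big) = \sigma^2 \operatorname{tr}(H) = d\,\sigma^2, \qquad \text{so} \qquad \omega = \frac{2}{N}\sum_{i=1}^{N}\operatorname{Cov}(\hat{y}_i, y_i) = \frac{2}{N}\,d\,\sigma^2,$$

where $\hat{y} = Hy$ is the linear fit, $H$ is the hat matrix with $\operatorname{tr}(H) = d$, and the noise is i.i.d. with variance $\sigma^2$.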
0 votes · 0 answers · 101 views
Slow convergence when the error is small when solving a convex optimization problem
I implemented an algorithm to solve trace-norm regularized least square problem. I noticed something that when the error becomes small, the convergence becomes very slow. I "manually" fixed ...
1 vote · 1 answer · 665 views
Which is more important in model selection, smallest difference between Train MSE and Test MSE, or the lowest Test MSE?
For a dataset that is not too large, I am trying a couple of models for prediction. I get the following train and test MSE for them:
Model 1: Train MSE = 100, Test MSE = 104
Model 2: Train MSE = 30,...
2 votes · 1 answer · 518 views
Tensorflow loss and accuracy show weird values during training [closed]
I am doing some testing with tensorflow, and I bumped into some very weird behaviour.
Here is my code
...
0 votes · 0 answers · 369 views
Why does my training loss decrease with number of samples?
I'm training a convolutional U-Net-like structure for image restoration, with SSIM loss and ADAM.
I have ~1500 training samples and ~350 validation samples partitioned at random. My batches have just ...
0 votes · 1 answer · 156 views
Accuracy is 100% but model.predict is totally wrong! What could be the problem? (Autoencoder NN)
It's an autoencoder model that receives 4 different vectors: { (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1) }.
The encoder transforms the vectors into vectors of size 2, which go into an "NLPN channel" ...
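A rough Keras sketch of the architecture described (everything except the 4-to-2 bottleneck and the four one-hot inputs is an assumption); note that 100% accuracy here is measured on the same four vectors the model memorised:

```python
import numpy as np
import tensorflow as tf

x = np.eye(4, dtype="float32")                                   # the four one-hot vectors

inputs = tf.keras.Input(shape=(4,))
code = tf.keras.layers.Dense(2)(inputs)                          # encoder: 4 -> 2
outputs = tf.keras.layers.Dense(4, activation="softmax")(code)   # decoder: 2 -> 4

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
autoencoder.fit(x, x, epochs=500, verbose=0)

print(autoencoder.predict(x).round(2))   # compare the argmax of each row with the identity matrix
```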
6 votes · 1 answer · 4k views
In linear regression, we have 0 training error if data dimension is high, but are there similar results for other supervised learning problems?
P.S. I just posted this question on MathOverflow, as I didn't seem to get an answer here.
Let's consider a supervised learning problem where $\{(x_1,y_1) \dots (x_n,y_n)\} \subset \mathbb{R}^p \times \...
5 votes · 1 answer · 152 views
Is overfitting an issue if all I care about is training error?
I am working on a project where we perform non-response adjustment by weighting survey respondents by their probability of response. In order to do this, we need to estimate each respondent's ...
2 votes · 0 answers · 30 views
How does the Loss of a Neural Network affect training?
Suppose I have a neural network, say an input layer taking in vectors $i \in \mathbb{R}^d$, a hidden relu layer, and then a softmax output layer with a cross entropy loss (with no biases added for ...
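A small PyTorch sketch of the architecture the question sets up (the dimensions are placeholders; the bias-free layers, ReLU hidden layer, and softmax cross-entropy loss follow the description):

```python
import torch

d, hidden, classes = 10, 32, 3   # dimensions are assumptions for illustration

# Input -> ReLU hidden layer -> output layer, with biases disabled as described.
model = torch.nn.Sequential(
    torch.nn.Linear(d, hidden, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden, classes, bias=False),
)

x = torch.randn(16, d)
targets = torch.randint(0, classes, (16,))
# cross_entropy applies the softmax internally to the logits.
loss = torch.nn.functional.cross_entropy(model(x), targets)
loss.backward()
```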
2 votes · 1 answer · 248 views
Can I still use an overfitted model with high test accuracy?
Below is the training statistics output from training a Keras/TF model. You can see val_accuracy peaks at Epoch 4 with 0.6633. After that accuracy(train) continues to go up but val_accuracy becomes ...
1 vote · 0 answers · 111 views
Relation between test and train error with gradient descent iterates
My question is about establishing an inequality between population error and expected training error (i.e, expected training error < population error) for a model trained with gradient descent on a ...
3 votes · 0 answers · 91 views
In practice, do we distinguish between "in-sample" and "training" error?
In Elements of Statistical Learning, it distinguishes between "in-sample" and "training" error (Frankly, I found the chapter on errors to be very confusing, especially with how ...
3 votes · 3 answers · 6k views
Does zero training error mean zero bias?
In machine learning, high bias prevents a model from properly fitting even the training set.
So, does a model have zero training error if and only if it has zero bias?
2 votes · 2 answers · 5k views
Learning curve vs training (loss) curve?
In machine learning, there are two commonly used plots to identify overfitting.
One is the learning curve, which plots the training + test error (y-axis) over the training set size (x-axis).
The other ...
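For the first kind of plot, a scikit-learn sketch of error versus training-set size (the estimator and data here are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=500, random_state=0)

# Learning curve: training and cross-validated score as a function of
# how many samples the model is trained on.
sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

print(1 - train_scores.mean(axis=1))   # training error vs. training set size
print(1 - test_scores.mean(axis=1))    # validation error vs. training set size
```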
0 votes · 0 answers · 12 views
How to pull in-sample fitted models towards out-of-sample optimum during training?
A common problem in prediction problems is getting a model fit on in-sample data to predict out-of-sample data with decent accuracy and precision. Assuming that we break a part of the in-sample data ...
2 votes · 1 answer · 2k views
Finding the training- and test-error for a glm-model
I have a data set with approx. 200,000 observations and 10 predictors, with a continuous target. I have divided this data into a training set and a test set (70%/30%).
I want to compare a glm-model ...
0 votes · 2 answers · 2k views
Am I overfitting even though my model performs well on the test set?
I have a dataset with 1289 observations and around 2000 features. I split my dataset into a 70/30 training and test set. I use GridSearchCV from scikit-learn to perform 5 fold cross validation on the ...
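A minimal sketch of that workflow in scikit-learn (the estimator and parameter grid are placeholders, not the asker's):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1289, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 5-fold CV runs on the 70% training split only; the 30% test split stays untouched.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid={"max_depth": [3, 5, None]},
                      cv=5)
search.fit(X_train, y_train)

print(search.best_score_)            # mean cross-validated score on the training split
print(search.score(X_test, y_test))  # score on the held-out test split
```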
1 vote · 1 answer · 1k views
Validation loss decreasing faster than training loss
I have two different scenarios that I ran across and I can't seem to wrap my head around what caused them.
In this scenario, my validation loss, in orange, initially fell faster than my training loss,...
1 vote · 1 answer · 563 views
Building a neural network with two training paths in Keras [closed]
I am trying to build a NN in Keras with two different output paths where the first path informs the second. The first path passes its loss to the end of the second path, like so:
Pass through layer A ...
1 vote · 1 answer · 171 views
Is there a clear relationship between number of training examples and over/underfitting when you do not know the model complexity?
It seems that without knowing the model complexity, it is difficult to state for certain what is the relationship between the number of training examples and over/underfitting.
As a concrete example, ...
1 vote · 0 answers · 134 views
Handling look-ahead bias while training
A lot of real-life transactional/CRM applications are not careful in creating snapshots of the data (historical data) for the model to learn from.
Imagine a typical example of predicting the ACV ...
2 votes · 0 answers · 338 views
What does over- and underestimation of the test error in cross-validation mean?
I know it's a naive question, but I am having a hard time understanding what is meant by under/overestimation of the test error in cross-validation.
For example, the following snippet is taken ...
1 vote · 1 answer · 120 views
Machine Learning - Training/Validation Sets
I have a very general question that I can't seem to get a straight answer on.
Machine Learning - I understand how it works - you have your dataset for which you want to answer either a prediction or ...
4 votes · 3 answers · 2k views
Machine Learning - How to Sample Test and Training Data for Rare Events
Suppose I have a data set with 1000 observations. I want to train and test a Classification Model to predict a target variable as true or false. However, in my observation set, true occurs only say 10%...
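One common starting point is a stratified split so the roughly 10% positive rate is preserved in both sets; a scikit-learn sketch (the data here is simulated):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (rng.random(1000) < 0.10).astype(int)   # ~10% positives, as in the question

# stratify=y keeps the class ratio the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print(y_train.mean(), y_test.mean())   # both close to 0.10
```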
0 votes · 0 answers · 101 views
Random Forest Underperforms Median on Training Set for Toy Regression Problem
I have found that random forests is failing on a toy regression problem. My prior impression of random forests is that it is very robust, so I expected that, on the training set, it should always ...
0 votes · 0 answers · 629 views
Is the use of Nested Cross Validation and train- test CV necessary or an overkill?
I have lately been rather obsessed with the proper way of selecting a model (including tuning hyperparameters) and then assessing model performance.
I have read various posts and the approach I ...
0 votes · 0 answers · 238 views
Time series data validation error is significantly lower than training error
I have a time series dataset that covers daily observations (closing price) for several stocks, and I would like to build models to forecast the closing prices for the future 7 days using their ...
1 vote · 2 answers · 111 views
Correct way of getting the generalization performance of a model using the whole dataset
Standard practice is to split data into a train/test set, then use the train set for hyperparameter tuning / model selection, using for example cross-validation over the whole training set. Finally, ...
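A sketch of the nested variant often suggested for this (the outer CV estimates generalisation, the inner CV does the tuning); the estimator and grid are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Inner loop: hyperparameter tuning by 3-fold CV.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# Outer loop: 5-fold CV around the whole tuning procedure,
# so every observation contributes to the performance estimate.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```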
0 votes · 0 answers · 838 views
Custom TF 2.0 training loop performing considerably worse than keras fit_generator - can't understand why
In trying to better understand tensorflow 2.0, I am trying to write a custom training loop to replicate the work of the keras fit_generator function. In my head, I have replicated the steps ...
4 votes · 0 answers · 640 views
Reinforcement Learning - When to stop training?
I have built a deep reinforcement learning based portfolio optimisation agent. At a high level it is using macro economic data, valuations of the assets and a few technical indicators as the features. ...
1 vote · 1 answer · 2k views
Multiple cross-validation and multiple train-test splits
Suppose we have only four observations in a dataset. Let's call them a, b, c and d.
If we perform k-fold cross-validation with k=2, we get the following:
We get two groups of data, (a,...
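A tiny scikit-learn sketch of what a k=2 split of {a, b, c, d} produces (whether to shuffle is an assumption):

```python
from sklearn.model_selection import KFold

data = ["a", "b", "c", "d"]

# k = 2: each observation appears in the test fold exactly once.
for train_idx, test_idx in KFold(n_splits=2, shuffle=True, random_state=0).split(data):
    print("train:", [data[i] for i in train_idx],
          "test:", [data[i] for i in test_idx])
```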