
Questions tagged [training-error]

4 votes
1 answer
68 views

How do machine learning topics fit into a traditional undergraduate statistics course on estimation?

I'm currently teaching an undergraduate introduction to statistics course, but, as required by the program director, I need to add some machine learning material to it. I'm wondering what is the appropriate ...
ExcitedSnail's user avatar
  • 3,050
0 votes
0 answers
25 views

Model training loss always converges to 1.35

I'm trying to create a multi-class classification model using RNNs. The input data has a sequence length of 90 and consists of 5 features, normalized to the [0,1] range. Here's the network ...
Mangi222's user avatar
0 votes
0 answers
5 views

Custom Model For Approximating Sin Function Using Backpropagation [duplicate]

I have a very simple custom model that I am experimenting with; it takes one input and produces one output. The model equation is y = sin(ax + b), where a and b are single learnable ...
mohammad's user avatar
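
A minimal sketch of the setup this question describes, with assumed synthetic data and hyperparameters (nothing here is the asker's actual code): two learnable scalars a and b fitted by backpropagation through y = sin(ax + b).

```python
import torch

# Hedged sketch: fit y = sin(a*x + b) with two learnable scalars.
# The target coefficients and optimizer settings are assumptions.
a = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
optimizer = torch.optim.Adam([a, b], lr=0.01)

x = torch.linspace(-3, 3, 200)
y = torch.sin(1.5 * x + 0.4)  # synthetic target: true a=1.5, b=0.4

for step in range(2000):
    optimizer.zero_grad()
    loss = torch.mean((torch.sin(a * x + b) - y) ** 2)
    loss.backward()
    optimizer.step()
```

Because sin is periodic, this objective has many local minima, which is the usual caveat with such toy fits.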
2 votes
1 answer
44 views

Training loss reaches zero, then suddenly increases, then decreases to zero

I get the following loss behavior when training a multilayer perceptron with mean squared error loss on some synthetic data, using default Adam with the default learning rate. (I am working with 1-dimensional data.) I ...
Rahim Brahimi's user avatar
1 vote
0 answers
13 views

Reference Request: Rate at which Training Error goes to 0

Looking for references on the rate at which the training error goes to 0. As an example, suppose that we have a linear model $y = X \beta_0 + \epsilon$, where $\beta_0$ is bounded and each row of $X$ is generated ...
Alan Chung's user avatar
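
As a hedged aside on the baseline this question builds on (assuming a fixed design, $p < n$, and homoscedastic noise of variance $\sigma^2$), the expected training error of ordinary least squares is

$$\mathbb{E}\left[\tfrac{1}{n}\,\|y - X\hat{\beta}\|^2\right] = \frac{n-p}{n}\,\sigma^2,$$

so in this classical regime it does not go to 0 at all unless $p/n \to 1$; exact-zero training error belongs to the interpolating ($p \ge n$) regime.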
1 vote
1 answer
47 views

Why does the performance on the training set go down as the number of samples increases?

To my knowledge there are two types of learning curves: those that show the progression of performance as the number of epochs increases, and those that show the performance progression as the ...
Valo's user avatar
  • 53
1 vote
0 answers
136 views

What is a good way to understand the difference between training RSS/MSE and test RSS for polynomial models?

I am trying to understand the difference between training RSS and test RSS for smoothing splines, e.g. $$\hat{g}=\arg\min_g \left(\sum_{i=1}^{n}(y_{i} - g(x_i))^2 + \lambda \int [g^{(m)}(x)]^{2}\,dx \...
therickster's user avatar
1 vote
0 answers
60 views

Peaks in error during training - Regression problem with DL model (LSTM)

During training I get unusual behavior from my model: peaks show up in both the validation and training errors that I cannot explain. I use MSE as the loss, and similar behavior appears in other ...
JeanAR's user avatar
  • 11
3 votes
0 answers
378 views

Shuffling data significantly decreases the performance of linear regression

I'm trying to build a simple linear regression model $y=ax+b$ using PyTorch, where $y$ is the number of cells increased on Day $n$ and $x$ is the number of cells ...
Jack's user avatar
  • 71
4 votes
1 answer
2k views

Is it possible to have a higher train error than a test error in machine learning?

Usually it is called over-fitting when the test error is higher than the training error. Does that imply that it is called under-fitting when the training error is higher than the test error? Also ...
Just a stat student's user avatar
0 votes
0 answers
77 views

PRESS statistic and k-fold cross-validation

How is the PRESS statistic calculated in a k-fold cross-validation? I know how it is done in the leave-one-out scenario. Is it still summed over all training samples, just that there are now k-many ...
dinaue's user avatar
  • 1
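
A hedged sketch of the reading this question suggests (the sum still runs over all training samples, but each sample is predicted by the model fit on the k−1 folds that exclude it); the data and model below are placeholders, not a definitive definition of k-fold PRESS.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Hedged sketch: k-fold analogue of PRESS as the sum of squared
# held-out residuals over all n samples. X, y are synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)

press = 0.0
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    fit = LinearRegression().fit(X[train_idx], y[train_idx])
    press += np.sum((y[test_idx] - fit.predict(X[test_idx])) ** 2)
print(press)
```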
1 vote
0 answers
150 views

How to handle the problem of different random seeds giving drastically different test scores in a machine learning model?

For a rigorous empirical analysis, I am training a model with three different seeds: 0, 1, and 2. In each case, I found that the model obtained through early stopping (lowest validation loss) had an ...
Dhruv Mullick's user avatar
1 vote
0 answers
235 views

Training loss, validation loss and WER decrease, then increase [duplicate]

I am trying to use Hugging Face Datasets for speech recognition with transformers, following this tutorial; epochs=30, steps=400, train_batch_size=16. Training loss, validation loss and WER decrease, and ...
user1680859's user avatar
4 votes
0 answers
161 views

Initial jump in validation loss, how to explain?

A lot of times I see the following behavior of the training and validation loss during training of deep nets: At the very beginning (first couple of batches), the training loss descends, but the ...
hirschme's user avatar
  • 1,140
2 votes
0 answers
81 views

Strange behaviour of training accuracy and loss function

Firstly, I want to mention that I am not looking for suggestions to improve the training accuracy of my NN. The only purpose of this question is to know what might be causing the peculiar behaviour ...
Ranjan's user avatar
  • 121
1 vote
1 answer
186 views

How to find the optimum when using regularization?

Using regularization increases the training error, and possibly the validation error as well. How do I find the optimum? Is it still just the optimum of the bias² and variance, like here: Source: https://dziganto....
Ben's user avatar
  • 3,493
0 votes
0 answers
50 views

How are parameters in graphical models learnt?

This is a request for a good reference. I want to have a better understanding of graphical models and I am reading "Pattern Recognition and Machine Learning" by Bishop, chap. 8 (Graphical ...
Thomas's user avatar
  • 952
0 votes
0 answers
23 views

Explaining and Addressing the Bias-Variance Tradeoff

Let's say we have a model whose training error is 7% and the validation error is 10%. What does this mean in terms of the bias-variance tradeoff? I know that high validation error and low training ...
MLNewbie's user avatar
  • 141
1 vote
2 answers
9k views

Training loss decreases, then suddenly increases, then decreases lower than the first time

I get the following loss behavior when training a multilayer perceptron with mean squared error loss on some synthetic data using Adam with learning rate 1e-1. As far as I can tell from reading, for ...
Dmitry Kabanov's user avatar
0 votes
1 answer
624 views

Why is the expected value of the optimism of the training error for a linear model equal to $\frac{2}{ n}d\sigma ^{2}$?

In the book Elements of statistical learning 2 on page 229, they express the expected optimism of the training error as: $$ \omega=\frac{2}{N} \sum_{i=1}^{N} \operatorname{Cov}\left(\hat{y}_{i}, y_{i}\...
Gloomy's user avatar
  • 121
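
For reference on the question above, a hedged sketch of the standard argument for the linear case (assuming a linear smoother $\hat{y} = Hy$ with hat matrix $H = X(X^{\top}X)^{-1}X^{\top}$, $d = \operatorname{tr}(H)$ fitted parameters, and i.i.d. noise of variance $\sigma^2$):

$$\sum_{i=1}^{N} \operatorname{Cov}(\hat{y}_i, y_i) = \operatorname{tr}\big(\operatorname{Cov}(Hy, y)\big) = \sigma^2 \operatorname{tr}(H) = d\,\sigma^2, \qquad \text{so} \qquad \omega = \frac{2}{N}\,d\,\sigma^2.$$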
0 votes
0 answers
101 views

Slow convergence when the error is small when solving a convex optimization problem

I implemented an algorithm to solve a trace-norm-regularized least squares problem. I noticed that when the error becomes small, the convergence becomes very slow. I "manually" fixed ...
rando's user avatar
  • 328
1 vote
1 answer
665 views

Which is more important in model selection, smallest difference between Train MSE and Test MSE, or the lowest Test MSE?

For a dataset that is not too large, I am trying a couple of models for prediction. I get the following train and test MSE for them: Model 1: Train MSE = 100, Test MSE = 104; Model 2: Train MSE = 30, ...
sam's user avatar
  • 13
2 votes
1 answer
518 views

TensorFlow loss and accuracy show weird values during training [closed]

I am doing some testing with TensorFlow, and I bumped into some very weird behaviour. Here is my code ...
Dave's user avatar
  • 149
0 votes
0 answers
369 views

Why does my training loss decrease with number of samples?

I'm training a convolutional U-Net-like structure for image restoration, with SSIM loss and ADAM. I have ~1500 training samples and ~350 validation samples partitioned at random. My batches have just ...
Andrew Kay's user avatar
0 votes
1 answer
156 views

Accuracy is 100% but model.predict is totally wrong! What could be the problem? (Autoencoder NN)

It's an autoencoder model that receives 4 different vectors { (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1) }. The encoder transforms the vectors to vectors of size 2, which go into an "NLPN channel" ...
Adar Cohen's user avatar
6 votes
1 answer
4k views

In linear regression, we have 0 training error if data dimension is high, but are there similar results for other supervised learning problems?

P.S. I just posted this question on MathOverflow, as I didn't seem to get an answer here. Let's consider a supervised learning problem where $\{(x_1,y_1) \dots (x_n,y_n)\} \subset \mathbb{R}^p \times \...
Mathmath's user avatar
  • 761
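
For reference, a hedged restatement of the linear-regression fact this question starts from: if $p \ge n$ and the $n \times p$ design matrix $X$ has full row rank, an exact interpolant exists, so

$$\operatorname{rank}(X) = n \;\Longrightarrow\; \exists\,\hat{\beta}: X\hat{\beta} = y \;\Longrightarrow\; \frac{1}{n}\sum_{i=1}^{n}\big(y_i - x_i^{\top}\hat{\beta}\big)^2 = 0.$$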
5 votes
1 answer
152 views

Is overfitting an issue if all I care about is training error

I am working on a project where we perform non-response adjustment by weighting survey respondents by their probability of response. In order to do this, we need to estimate each respondent's ...
astel's user avatar
  • 1,598
2 votes
0 answers
30 views

How does the Loss of a Neural Network affect training

Suppose I have a neural network, say an input layer taking in vectors $i \in \mathbb{R}^d$, a hidden relu layer, and then a softmax output layer with a cross entropy loss (with no biases added for ...
IntegrateThis's user avatar
2 votes
1 answer
248 views

Can I still use an overfitted model with high test accuracy?

Below is the training statistics output from training a Keras/TF model. You can see val_accuracy peaks at Epoch 4 with 0.6633. After that, accuracy (train) continues to go up, but val_accuracy becomes ...
etang's user avatar
  • 1,027
1 vote
0 answers
111 views

Relation between test and train error with gradient descent iterates

My question is about establishing an inequality between the population error and the expected training error (i.e., expected training error < population error) for a model trained with gradient descent on a ...
sgg's user avatar
  • 11
3 votes
0 answers
91 views

In practice, do we distinguish between "in-sample" and "training" error?

In Elements of Statistical Learning, it distinguishes between "in-sample" and "training" error (Frankly, I found the chapter on errors to be very confusing, especially with how ...
24n8's user avatar
  • 1,147
3 votes
3 answers
6k views

Does zero training error mean zero bias?

In machine learning, high bias prevents a model from properly fitting even the training set. So, does a model have zero training error if and only if it has zero bias?
kennysong's user avatar
  • 1,081
2 votes
2 answers
5k views

Learning curve vs training (loss) curve?

In machine learning, there are two commonly used plots to identify overfitting. One is the learning curve, which plots the training + test error (y-axis) over the training set size (x-axis). The other ...
kennysong's user avatar
  • 1,081
0 votes
0 answers
12 views

How to pull in-sample fitted models towards out-of-sample optimum during training?

A common problem in prediction problems is getting a model fit on in-sample data to predict out-of-sample data with decent accuracy and precision. Assuming that we break a part of the in-sample data ...
develarist's user avatar
  • 4,049
2 votes
1 answer
2k views

Finding the training- and test-error for a glm-model

I have a data set with approx. 200,000 observations and 10 predictors, with a continuous target. I have divided this data into a training set and a test set (70%/30%). I want to compare a glm model ...
AnnieFrannie's user avatar
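
A hedged sketch of the usual recipe this question describes; a Gaussian GLM (plain least squares via LinearRegression) stands in for the asker's model, and the data are synthetic placeholders for the 200,000-row set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hedged sketch: fit on the 70% training split, then report MSE
# on both the training and the held-out 30% test split.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + rng.normal(size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
glm = LinearRegression().fit(X_tr, y_tr)
print("train MSE:", mean_squared_error(y_tr, glm.predict(X_tr)))
print("test MSE:", mean_squared_error(y_te, glm.predict(X_te)))
```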
0 votes
2 answers
2k views

Am I overfitting even though my model performs well on the test set?

I have a dataset with 1289 observations and around 2000 features. I split my dataset into a 70/30 training and test set. I use GridSearchCV from scikit-learn to perform 5 fold cross validation on the ...
Ajay Sundaresan's user avatar
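
A hedged sketch of the workflow this question describes; the estimator, grid, and data below are placeholders (the asker's actual model is unknown): a 70/30 split, then 5-fold GridSearchCV run on the training portion only, scored once on the untouched test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hedged sketch: tune with CV inside the training split so the
# test set measures generalization, not selection optimism.
X, y = make_classification(n_samples=1289, n_features=100, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, None]},
    cv=5,
).fit(X_tr, y_tr)
print(search.best_params_, "test accuracy:", search.score(X_te, y_te))
```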
1 vote
1 answer
1k views

Validation loss decreasing faster than training loss

I have two different scenarios that I ran across and I can't seem to wrap my head around what caused them. In this scenario, my validation loss, in orange, initially fell faster than my training loss,...
ayak's user avatar
  • 85
1 vote
1 answer
563 views

Building a neural network with two training paths in Keras [closed]

I am trying to build a NN in Keras with two different output paths where the first path informs the second. The first path passes its loss to the end of the second path, like so: Pass through layer A ...
Alex's user avatar
  • 497
1 vote
1 answer
171 views

Is there a clear relationship between number of training examples and over/underfitting when you do not know the model complexity?

It seems that without knowing the model complexity, it is difficult to state for certain what is the relationship between the number of training examples and over/underfitting. As a concrete example, ...
Fraïssé's user avatar
  • 1,630
1 vote
0 answers
134 views

Handling look-ahead bias while training

A lot of real-life transactional/CRM applications are not careful in creating snapshots of the data (historical data) for the model to learn from. Imagine a typical example of predicting the ACV ...
keshr3106's user avatar
2 votes
0 answers
338 views

What does over and underestimation of test error in Cross Validation mean?

I know it's a naive question, but I am having a hard time understanding what is meant by under/over-estimation of the test error in cross-validation. For example, the following snippet is taken ...
Saif Ali Khan's user avatar
1 vote
1 answer
120 views

Machine Learning - Training/Validation Sets

I have a very general question that I can't seem to get a straight answer on. Machine Learning - I understand how it works - you have your dataset for which you want to answer either a prediction or ...
user2813606's user avatar
4 votes
3 answers
2k views

Machine Learning - How to Sample Test and Training Data for Rare Events

Suppose I have a data set with 1000 observations. I want to train and test a Classification Model to predict a target variable as true or false. However, in my observation set, true occurs only say 10%...
Fritz45's user avatar
  • 251
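
One common answer to the question above is a stratified split; a hedged sketch follows, with X and y as synthetic stand-ins for the asker's 1000 observations and ~10% positive rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hedged sketch: stratify=y preserves the rare-class ratio in
# both the training and test partitions.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.10).astype(int)  # ~10% true

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(y_train.mean(), y_test.mean())  # both close to 0.10
```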
0 votes
0 answers
101 views

Random Forest Underperforms Median on Training Set for Toy Regression Problem

I have found that random forests fail on a toy regression problem. My prior impression of random forests is that they are very robust, so I expected that, on the training set, they should always ...
Matt Munson's user avatar
0 votes
0 answers
629 views

Is the use of Nested Cross Validation and train-test CV necessary, or overkill?

I have been relatively obsessed lately in the proper way of selecting a model (including tuning hyper parameters) and then assessing model performance. I have read various posts and the approach I ...
ALEX.VAMVAS's user avatar
0 votes
0 answers
238 views

Time series data validation error is significantly lower than training error

I have a time series dataset that covers daily observations (closing price) for several stocks, and I would like to build models to forecast the closing prices for the future 7 days using their ...
Lynn's user avatar
  • 1
1 vote
2 answers
111 views

Correct way of getting the generalization performance of a model using the whole dataset

Standard practice is to split data into a train/test set, then use the train set for hyperparameter tuning / model selection, using for example cross-validation over the whole training set. Finally, ...
hirschme's user avatar
  • 1,140
0 votes
0 answers
838 views

Custom TF 2.0 training loop performing considerably worse than keras fit_generator - can't understand why

In trying to better understand TensorFlow 2.0, I am trying to write a custom training loop to replicate the work of the Keras fit_generator function. In my head, I have replicated the steps ...
actuary_meets_data's user avatar
4 votes
0 answers
640 views

Reinforcement Learning - When to stop training?

I have built a deep reinforcement learning based portfolio optimisation agent. At a high level it is using macro economic data, valuations of the assets and a few technical indicators as the features. ...
Vivek YS's user avatar
1 vote
1 answer
2k views

Multiple cross-validation and multiple train-test splits

Suppose we have only four observations in a dataset. Let's call them a, b, c and d. If we perform k-fold cross-validation with k=2, we would get the following: two groups of data, (a,...
hellowolrd's user avatar
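
A hedged sketch of the 2-fold split this question describes: four observations a, b, c, d partitioned into two folds, each used once as the held-out half.

```python
from sklearn.model_selection import KFold

# Hedged sketch: with k=2 and no shuffling, the folds are the
# first and second halves of the data.
data = ["a", "b", "c", "d"]
for train_idx, test_idx in KFold(n_splits=2).split(data):
    print([data[i] for i in train_idx], "->", [data[i] for i in test_idx])
# ['c', 'd'] -> ['a', 'b']
# ['a', 'b'] -> ['c', 'd']
```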