All Questions
Tagged with deep-learning or neural-networks
9,974 questions
1
vote
0
answers
8
views
Understanding the Definition of Conditional Optimal Transport Path
In this tutorial about flow matching, the authors define a probability path interpolating between $p_0 = N(0, I)$ and a target distribution $q$ as follows (Equation 2.2 of the paper):
$$p_t(x) = \int{...
1
vote
0
answers
32
views
Preprocessing and model selection strategies
I am working on a fault detection problem where each sample is a time series labeled with a specific type of fault. I am using a CNN model and a validation set for hyperparameter tuning. Currently, I ...
1
vote
1
answer
12
views
Varying sequence lengths between classes in LSTM
I am working on a project where the goal is to predict whether students in an online course will drop out of the course. The course is divided into 20 course weeks. For each week, I have certain kinds ...
1
vote
0
answers
18
views
Can the closed-form solution for ridge regression be used in training neural networks?
Is it established that the closed-form solution of ridge regression can be used during the training of neural networks? If so:
What are the potential benefits of using it?
In what scenarios would ...
2
votes
0
answers
30
views
Does the ill-conditioning of the design matrix affect the ill-conditioning of the Hessian in the context of DL?
I know that when we use the square loss as our cost function in DL, the ill-conditioning of the Hessian is directly tied to that of the design matrix. Does this apply to other cost functions?
If so, ...
0
votes
0
answers
10
views
What is the best way to set weights for weighted MSE in multi output regression? [duplicate]
I am working on a regression task where the goal is to predict 6 scalar output values from a given input. The input consists of decaying signal data, and the outputs are the parameters of the signal ...
1
vote
2
answers
39
views
How does a single layer/single unit with Adam optimizer network work?
I'm very new to ML and I'm trying to mess around with Linear Regression. I tested sklearn's LinearRegression model and then wanted to compare the results to a very simple neural network.
I created a ...
3
votes
0
answers
21
views
Reward and Penalty Design in reinforcement learning
I hope you're all doing well.
I am currently working on a reinforcement learning problem to solve an optimization problem in wireless networks and I'm having trouble with designing the reward and ...
1
vote
1
answer
20
views
Learnability of boolean formulae by neural networks using back propagation?
I've been researching neural networks and boolean formulae. From my efforts, it doesn't seem that neural networks can generally learn boolean formulae using back propagation. This makes sense ...
0
votes
0
answers
16
views
"Inflating" learning rates in diminishing gradient areas for NN training
In neural net training, nowadays tanh and sigmoid activation functions in hidden layers are avoided as they tend to "saturate" easily. Meaning, if the x value plugged into tanh/sigmoid is ...
2
votes
2
answers
37
views
Predicting the probability distribution of a deterministic dataset
In classical machine learning regression, we often assume the target variable $y$, given an input $x$, follows a probability distribution, allowing us to model and predict not just the expected value ...
0
votes
0
answers
16
views
Why do VAEs work?
I am currently reading into Variational Autoencoders, and although I kind of understand the mathematical background described in the original paper (Auto-encoding Variational Bayes), I am struggling ...
0
votes
0
answers
16
views
Geometric structure of the output of a random weight neural network fed with random data
Take a model with random weights:
...
0
votes
0
answers
22
views
Train a neural network to predict the false positive rate of a segmentation model
I am trying to train a neural network to infer the false positive rate of an image segmentation model on the basis of the input image and the threshold.
To do so, I am considering a dataset organized ...
0
votes
0
answers
16
views
Embeddings in time series prediction
Increasingly, I’ve noted that embeddings are used in pure prediction ML tasks. For example, instead of predicting whether user i will purchase item i and thereby adding thousands or millions of inputs ...
3
votes
1
answer
166
views
Questions on backpropagation in a neural net
I understand how to symbolically apply back propagation, calculate the formulas with pen and paper. When it comes to actually using these derivations on data, I have 2 questions:
Suppose certain ...
1
vote
1
answer
27
views
Reason for softmax approximation in Ian Goodfellow's deep learning book
In section 6.2.2.2 (equation 6.31) they state:
Overall, unregularized maximum likelihood will drive the model to learn parameters that drive the softmax to predict the fraction of counts of each ...
0
votes
0
answers
14
views
Why do my test loss and test evaluation metrics fluctuate?
I am fine-tuning the resnet18 model with additional classifiers. What I observed during the training process, is that test loss and other test evaluation metrics (AP, AUC) seem to fluctuate as you can ...
0
votes
1
answer
28
views
GNNs with higher order adjacency matrices
Usually, the adjacency matrix stores information about direct connections of nodes in a graph.
The information from the k-th neighbours is passed on at the k-th layer of GNNs, as described in the original ...
0
votes
0
answers
17
views
Why does the image classification model perform worse when augmenting only minority class
I have a problem of data imbalance (1:10 ratio) for image classification tasks.
To cope with it, I tried different imbalance training strategies, including weighted loss function, different loss ...
0
votes
0
answers
11
views
How to Cluster Highly Skewed DALY Data When Transformations and Clustering Algorithms Don't Help?
I’m working with a dataset of Disability-Adjusted Life Years (DALY) values with a high degree of skewness, as shown in the histogram below, where most values cluster near zero with a long tail ...
1
vote
1
answer
40
views
How does temporal data leakage happen?
Assume I use a moving window to slice daily stock closing price history, using the past 7 days to predict the next day. For each training instance, I'm strictly using historical data to predict future ...
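The windowing described in this excerpt can be sketched as follows. This is only an illustration, assuming the price history is a 1-D NumPy array; the function name `make_windows`, the toy data, and the 80/20 chronological split are hypothetical choices, not taken from the question.

```python
import numpy as np

def make_windows(prices, window=7):
    """Return (X, y): X[i] holds `window` consecutive prices, y[i] the price on the following day."""
    prices = np.asarray(prices, dtype=float)
    n = len(prices) - window
    X = np.stack([prices[i:i + window] for i in range(n)])
    y = prices[window:]
    return X, y

prices = np.arange(20, dtype=float)  # toy stand-in for a daily closing-price history
X, y = make_windows(prices)

# Chronological split (an assumption): earlier windows for training, later ones for testing.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```

Within each pair the inputs strictly precede the target. Note that neighbouring windows overlap, so how the pairs are divided between training and test sets matters in addition to how each pair is built.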
0
votes
0
answers
10
views
What to do when training and validation loss both keep decreasing and seem correlated?
I'm training a neural network (a custom architecture, but mostly a variant of a CNN) with an MSE loss. The trained network works well for my task (predicting one vector from another vector) - like, ...
0
votes
0
answers
26
views
Threshold moving with k-fold cross validation
I have an issue of class imbalance in my data (1:10 ratio), so I have implemented strategies, including Weighted Random Sampler, Weighted loss function, Threshold moving, Data augmentation (not always ...
1
vote
1
answer
55
views
Paper Discussion - Hamiltonian Neural Networks
I'm writing a paper on the discoveries made by Sam Greydanus et al.: https://arxiv.org/abs/1906.01563v1
I was wondering if anyone could help me provide some insights on this:
They are highlighting how ...
2
votes
1
answer
51
views
How to use differential-entropy as pre-processing?
I am currently working on implementing the model EEG_DMNet. For pre-processing it calls for using differential entropy like
$$
h(X) = -\int_{-\infty}^{\infty} p(x) \log p(x) \, dx
$$
Assuming the Data ...
2
votes
1
answer
117
views
Is the deep set operation just elementwise addition?
I have been reading the Deep Sets, 2017, Zaheer et al
which has a fair bit of complex looking math.
But I look in the code repo.
and it seems like it is just elementwise addition?
Which is fine and ...
2
votes
1
answer
34
views
Is it possible to train Neural networks for time series forecasting using elastic distances (such as dtw) as a loss function?
Normally, elastic distances are used as ways to tell how similar two time series are. Examples of these are dynamic time warping, move-split-merge, and many more. I have read some research such as ...
1
vote
1
answer
39
views
Understanding deep learning notation and properties of expectation - neighbor2neighbor
I am trying to follow the proof of Theorem 1 here but can't fully understand the notation the authors use so that I am not able to completely understand the steps. This misunderstanding keeps coming ...
6
votes
1
answer
274
views
Model performs well on train and cross-validation sets but inaccurate in the test set. How to solve? [duplicate]
I've been working on a CNN binary classification model, and the model performs pretty well on both the training set and the cross-validation set (both practically 1.0 acc). However, I also ...
1
vote
0
answers
16
views
Why do all these adaptive methods in neural network training require a $g_t^2$ term?
All the adaptive learning methods (AdaGrad, AdaDelta, RMSprop, Adam, and later variants) require $g_t^2$, which is the elementwise square of the gradient.
Why is this needed? I ...
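To make the role of $g_t^2$ concrete, here is a minimal RMSprop-style sketch (no momentum or bias correction, so it is not Adam). The function name `rmsprop_step`, the hyperparameter values, and the toy gradient are assumptions for illustration only.

```python
import numpy as np

def rmsprop_step(theta, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop-style update; v is a running average of grad**2 (elementwise)."""
    v = beta * v + (1.0 - beta) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(v) + eps)
    return theta, v

theta = np.array([1.0, 1.0])
v = np.zeros_like(theta)
grad = np.array([100.0, 0.01])  # toy gradient with very different scales per coordinate

theta_new, v = rmsprop_step(theta, grad, v)
step = theta - theta_new  # per-coordinate step actually taken
```

In this toy case both coordinates move by nearly the same amount even though the raw gradients differ by four orders of magnitude, because each coordinate is divided by the square root of its own running average of $g_t^2$.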
0
votes
0
answers
7
views
Why does internal covariate shift affect neural networks when the loss landscape remains the same?
Why does internal covariate shift affect neural networks when the loss landscape remains the same? When reading about internal covariate shift and how batch normalization doesn't really solve it, it ...
1
vote
0
answers
26
views
How to Interpret Reconstruction Error for Anomaly Detection with Autoencoders?
I have an autoencoder based on a neural network. This model was trained using SCADA data. I got decent results in anomaly detection, with around 85% in the main metrics (recall, accuracy, precision, ...
1
vote
0
answers
51
views
The gradient method based attack does not seem to make sense for neural networks because the training error is non-convex
There are several gradient-based attack methods. Let $J$ be the training error, then for instance the projected gradient attack is,
$$
\widetilde{x} = \Pi( x + \epsilon \nabla_x J(\theta, x, y) )
$$
...
0
votes
0
answers
16
views
Given that a flattening operation is used, why is it that a CNN does not have the same problem as vectorizing an image?
So a key motivation for using a CNN is that if we were to vectorize an image and then feed it into a standard multilayer perceptron, that vectorized image would lose spatial information between pixels.
...
0
votes
0
answers
31
views
Hyperparameter Tuning for Multiple Time Series
I am developing a time-series model utilizing NeuralProphet for forecasting the demand of products by day. I have grouped the products into a number of clusters by features such as average demand, ...
18
votes
2
answers
6k
views
What are the works of Hopfield and Hinton that enable machine learning with neural networks, as noted in the physics Nobel award statement?
On October 9, 2024, the Nobel Foundation announced the Nobel Prize in Physics 2024 with the following statement of merit:
for foundational discoveries and inventions that enable machine learning with ...
1
vote
0
answers
9
views
How does one measure the fall in performance of Object Detection algorithms with degraded training images? [closed]
I'm supposed to be working on a project that involves studying the effect of different kinds of training images on Object Detection algorithms. We (the team) are supposed to measure the difference in ...
0
votes
0
answers
20
views
Does it make sense to use Batch Normalization in autoencoders for image reconstruction?
I’m working on an autoencoder for image reconstruction, and I would like to hear your opinions on using Batch Normalization (BN) in this type of architecture.
What are the advantages of applying BN ...
4
votes
2
answers
130
views
Why is computation of $P(X)$ hard in the posterior?
Let $X$ denote the input space of dimension $d$ and $Z$ denote the output space of size $k$. The posterior is defined as
$$P(Z|X) = \frac{P(X|Z)P(Z)}{P(X)}$$
where $P(Z)$ is the prior and $P(X|Z)$ ...
0
votes
1
answer
36
views
Modeling approaches for conditional probability distribution, applied to Propensity Score estimation for IPW (causal inference)
I'm trying to understand and ideally implement the Inverse Probability Weighting approach to estimate a causal effect. My resources so far have been Pearl's Primer and the book "What If?".
...
1
vote
0
answers
25
views
Maxout activation function vs ReLU (Number of weights)
From what I understood, Maxout function works quite differently from ReLU.
ReLU function is max(0, x), so the input x is (W_T x + b)
Maxout function has many Ws, and it is max(W1_T x + b1, W2_T x + b2,...
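The two definitions in this excerpt can be compared with a short sketch. It assumes a single unit with input dimension `d` and a Maxout unit with `k` affine pieces; the function names, `d`, `k`, and the random values are all hypothetical.

```python
import numpy as np

def relu_unit(x, w, b):
    """ReLU unit: max(0, w^T x + b), using one weight vector and one bias."""
    return max(0.0, float(w @ x + b))

def maxout_unit(x, W, b):
    """Maxout unit: max over k affine pieces; W has shape (k, d), b has shape (k,)."""
    return float(np.max(W @ x + b))

d, k = 4, 3
rng = np.random.default_rng(0)
x = rng.normal(size=d)
w, b = rng.normal(size=d), 0.1
W, bs = rng.normal(size=(k, d)), rng.normal(size=k)

relu_params = w.size + 1          # d weights + 1 bias
maxout_params = W.size + bs.size  # k * (d + 1)
```

Under these assumptions a Maxout unit carries `k` times as many parameters as a ReLU unit. A Maxout unit with two pieces, one of them fixed at zero, computes the same value as a ReLU unit.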
0
votes
0
answers
45
views
Are linear RBFs universal function approximators?
This paper shows that radial basis function neural networks (RBF NNs, i.e., NNs with one hidden layer with RBF activation) are universal function approximators (UVA). How about linear RBFs ...
0
votes
0
answers
29
views
Advice on fine-tuning an email classifier for a Pharma company
I'm an intern working on implementing a binary email classifier for a client (Pharmaceutical company) and I need some advice on fine-tuning the model.
The model I'm using is Longformer (because it has ...
0
votes
0
answers
23
views
What advantage do LSTMs provide for Apple's language identification over other architectures?
Why do we use LSTMs over other architectures for character-based language identification (LID) from short strings of text when the LSTM's power comes from its long-range dependency memory?
For example,...
1
vote
0
answers
19
views
Why is the expectation term in the VAE loss not implemented in practice?
According to the VAE paper: https://arxiv.org/pdf/1906.02691 (eq 2.10, page 21)
The VAE loss contains the expectation term. If I understand correctly, we need to sample more than 1 result from an input x and ...
0
votes
0
answers
14
views
Why does a convolutional neural network use a 2D filter?
Given an input of shape C,H,W, where C is the number of channels, the filter is of size X,Y and slides across each channel individually. Why isn't the filter of size C,X,Y, sliding across the entire 3D shape?
I know of ...
1
vote
1
answer
62
views
Should classical/traditional ML techniques such as polynomial regression/decision trees/random forests SIGNIFICANTLY outperform RNN in timeseries? [closed]
I have a dataset of numerous years of buoy wave height measurements including features such as measured significant wave height, numerical model predictions, peak wave period, mean wave period, and ...
0
votes
0
answers
11
views
How to bound generalization error of a shallow NN?
I have a trained one-hidden-layer NN with ReLU. There are 2N+d parameters in total (N is the number of nodes and d is the number of dimensions). The sample size to train this NN is exactly N.
Now, in ...
2
votes
1
answer
54
views
Why does having a smaller set of weights help with generalization?
When I studied machine learning for the first time, I learned that we need to use l2 regularization to improve generalization. The reason is based on the polynomial regression experiment in Chris ...