
All Questions

1 vote
0 answers
8 views

Understanding the Definition of Conditional Optimal Transport Path

In this tutorial about flow matching, the authors define a probability path interpolating between $p_0 = N(0, I)$ and a target distribution $q$ as follows (Equation 2.2 of the paper): $$p_t(x) = \int{...
asked by pushinproto
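
For context, the marginal path being asked about typically has the mixture form below; this is a sketch in the standard conditional-OT notation from the flow matching literature, and the tutorial's Eq. 2.2 may use a slightly different conditional variance:

```latex
% Marginal path as a mixture of per-sample conditional paths:
p_t(x) = \int p_t(x \mid x_1)\, q(x_1)\, \mathrm{d}x_1,
\qquad
p_t(x \mid x_1) = \mathcal{N}\!\big(x \mid t\,x_1,\ (1-t)^2 I\big),
% which recovers p_0 = \mathcal{N}(0, I) at t = 0 and concentrates on q as t -> 1.
```
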
1 vote
0 answers
32 views

Preprocessing and model selection strategies

I am working on a fault detection problem where each sample is a time series labeled with a specific type of fault. I am using a CNN model and a validation set for hyperparameter tuning. Currently, I ...
asked by S.H.W (77)
1 vote
1 answer
12 views

Varying sequence lengths between classes in LSTM

I am working on a project where the goal is to predict whether students in an online course will drop out of the course. The course is divided into 20 course weeks. For each week, I have certain kinds ...
asked by Computeraar
1 vote
0 answers
18 views

Can the closed-form solution for ridge regression be used in training neural networks?

Is it established that the closed-form solution of ridge regression can be used during the training of neural networks? If so: What are the potential benefits of using it? In what scenarios would ...
asked by FadiBenz
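
As a reference point for this question, a minimal numpy sketch of where the closed form can apply directly, namely the final linear layer on top of fixed hidden activations (the function name and the lam value are illustrative):

```python
import numpy as np

def ridge_last_layer(H, y, lam=1e-2):
    """Closed-form ridge solution w = (H^T H + lam*I)^{-1} H^T y.

    H: (n, d) activations of the last hidden layer, y: (n,) targets.
    Only the final *linear* layer admits this solution; hidden layers
    are nonlinear in their weights and still need gradient updates.
    """
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)
```
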
2 votes
0 answers
30 views

Does the ill-conditioning of the design matrix affect the ill-conditioning of the Hessian in the context of DL?

I know that when we use the square loss as our cost function in DL, the ill-conditioning of the Hessian is directly tied to that of the design matrix. Does this apply to other cost functions? If so, ...
asked by FadiBenz
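
For the square-loss case the question starts from, the tie is a one-line computation (stated for a linear model; in deep nets it holds exactly only for the last layer):

```latex
\mathcal{L}(w) = \tfrac{1}{n}\lVert Xw - y \rVert_2^2
\;\Longrightarrow\;
\nabla_w^2 \mathcal{L} = \tfrac{2}{n}\, X^\top X,
\qquad
\kappa\!\big(\nabla_w^2 \mathcal{L}\big) = \kappa(X)^2 .
```
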
0 votes
0 answers
10 views

What is the best way to set weights for weighted MSE in multi output regression? [duplicate]

I am working on a regression task where the goal is to predict 6 scalar output values from a given input. The input consists of decaying signal data, and the outputs are the parameters of the signal ...
asked by COTHE (11)
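
A minimal numpy sketch of a per-output weighted MSE; the inverse-variance default in the comment is one common, but by no means canonical, choice:

```python
import numpy as np

def weighted_mse(y_true, y_pred, w):
    # y_true, y_pred: (n, 6); w: (6,) per-output weights.
    return np.mean(w * (y_true - y_pred) ** 2)

# One common starting point: inverse variance of each target, so all
# six outputs contribute on a comparable scale (illustrative):
#   w = 1.0 / y_train.var(axis=0)
```
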
1 vote
2 answers
39 views

How does a single-layer, single-unit network with the Adam optimizer work?

I'm very new to ML and I'm trying to mess around with Linear Regression. I tested sklearn's LinearRegression model and then wanted to compare the results to a very simple neural network. I created a ...
asked by TamerM (111)
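
A self-contained numpy sketch of what such a network amounts to: a single linear unit trained with full-batch Adam on MSE, which should end up close to sklearn's LinearRegression coefficients (the data here is synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # synthetic features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

# One linear unit y_hat = X @ w + b, MSE loss, full-batch Adam.
w, b = np.zeros(3), 0.0
mw, vw, mb, vb = np.zeros(3), np.zeros(3), 0.0, 0.0
lr, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8
for t in range(1, 5001):
    err = X @ w + b - y
    gw, gb = 2 * X.T @ err / len(y), 2 * err.mean()
    mw = b1 * mw + (1 - b1) * gw; vw = b2 * vw + (1 - b2) * gw**2
    mb = b1 * mb + (1 - b1) * gb; vb = b2 * vb + (1 - b2) * gb**2
    w -= lr * (mw / (1 - b1**t)) / (np.sqrt(vw / (1 - b2**t)) + eps)
    b -= lr * (mb / (1 - b1**t)) / (np.sqrt(vb / (1 - b2**t)) + eps)
# w, b should now closely match sklearn's LinearRegression fit.
```
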
3 votes
0 answers
21 views

Reward and Penalty Design in reinforcement learning

I hope you're all doing well. I am currently working on a reinforcement learning problem to solve an optimization problem in wireless networks, and I'm having trouble designing the reward and ...
asked by Mehran Varshosaz
1 vote
1 answer
20 views

Learnability of Boolean formulae by neural networks using backpropagation?

I've been researching neural networks and Boolean formulae. From my efforts, it doesn't seem that neural networks can generally learn Boolean formulae using backpropagation. This makes sense ...
asked by yters (143)
0 votes
0 answers
16 views

"Inflating" learning rates in diminishing gradient areas for NN training

In neural net training nowadays, tanh and sigmoid activation functions in hidden layers are avoided as they tend to "saturate" easily: if the x value plugged into tanh/sigmoid is ...
asked by Omrii (101)
2 votes
2 answers
37 views

Predicting the probability distribution of a deterministic dataset

In classical machine learning regression, we often assume the target variable $y$, given an input $x$, follows a probability distribution, allowing us to model and predict not just the expected value ...
asked by juekai (121)
0 votes
0 answers
16 views

Why do VAEs work?

I am currently reading into Variational Autoencoders, and although I kind of understand the mathematical background described in the original paper (Auto-encoding Variational Bayes), I am struggling ...
asked by Marco Rosinus Serrano
0 votes
0 answers
16 views

Geometric structure of the output of a random weight neural network fed with random data

Take a model with random weights: ...
asked by Random Network
0 votes
0 answers
22 views

Train a neural network to predict the false positive rate of a segmentation model

I am trying to train a neural network to infer the false positive rate of an image segmentation model on the basis of the input image and the threshold. To do so, I am considering a dataset organized ...
asked by Giorgio Bianchi
0 votes
0 answers
16 views

Embeddings in time series prediction

Increasingly, I’ve noted that embeddings are used in pure prediction ML tasks. For example, instead of predicting whether user i will purchase item j and thereby adding thousands or millions of inputs ...
asked by jbuddy_13 (3,520)
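
A minimal torch sketch of the pattern being described: a learned embedding table replaces thousands of one-hot input columns with a small dense vector per ID (all sizes illustrative):

```python
import torch.nn as nn

n_users, n_items, dim = 100_000, 50_000, 32
user_emb = nn.Embedding(n_users, dim)  # 32-dim vector per user ID
item_emb = nn.Embedding(n_items, dim)  # instead of 150k one-hot cols
# Downstream layers consume the two 32-dim vectors (e.g. concatenated
# or dot-producted), not one binary column per user/item.
```
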
3 votes
1 answer
166 views

Questions on backpropagation in a neural net

I understand how to apply backpropagation symbolically and calculate the formulas with pen and paper. When it comes to actually using these derivations on data, I have two questions: Suppose certain ...
asked by Baron Yugovich
1 vote
1 answer
27 views

Reason for softmax approximation in Ian Goodfellow's deep learning book

In section 6.2.2.2 (equation 6.31) they state: Overall, unregularized maximum likelihood will drive the model to learn parameters that drive the softmax to predict the fraction of counts of each ...
asked by Philipp (11)
0 votes
0 answers
14 views

Why do my test loss and test evaluation metrics fluctuate?

I am fine-tuning the resnet18 model with additional classifiers. What I observed during the training process is that the test loss and other test evaluation metrics (AP, AUC) seem to fluctuate, as you can ...
asked by Yuju Ahn
0 votes
1 answer
28 views

GNNs with higher order adjacency matrices

Usually, the adjacency matrix stores information about direct connections of nodes in a graph. The information from k-th neighbours is passed on at the k-th layer of a GNN, as described in the original ...
asked by ignoramus
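
A small numpy illustration of the premise: powers of the adjacency matrix encode k-hop connectivity, so aggregating with A^k in a single layer reaches k-th neighbours directly, at the cost of a much denser matrix:

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])            # path graph 0 - 1 - 2
A2 = np.linalg.matrix_power(A, 2)    # (A^2)[i, j] = number of
print(A2)                            # length-2 walks from i to j
```
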
0 votes
0 answers
17 views

Why does the image classification model perform worse when augmenting only the minority class?

I have a problem of data imbalance (1:10 ratio) for image classification tasks. To cope with it, I tried different imbalance training strategies, including weighted loss function, different loss ...
asked by Yuju Ahn
0 votes
0 answers
11 views

How to Cluster Highly Skewed DALY Data When Transformations and Clustering Algorithms Don't Help?

I’m working with a dataset of Disability-Adjusted Life Years (DALY) values with a high degree of skewness, as shown in the histogram below, where most values cluster near zero with a long tail ...
asked by shekoofeh momahhed
1 vote
1 answer
40 views

How does temporal data leakage happen?

Assume I use a moving window to slice daily stock closing price history data, using the past 7 days to predict the next day. For each training instance, I'm strictly using historical data to predict future ...
asked by yang (149)
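
A minimal numpy sketch of the windowing described, plus comments on where leakage usually actually enters (the helper name is mine):

```python
import numpy as np

def make_windows(prices, window=7):
    # X[i] = days i..i+window-1, y[i] = day i+window, so each
    # instance sees only the past. Leakage typically sneaks in
    # elsewhere: fitting a scaler on the full series before the
    # train/test split, or random shuffling that places overlapping,
    # near-duplicate windows in both train and test.
    X = np.lib.stride_tricks.sliding_window_view(prices[:-1], window)
    y = prices[window:]
    return X, y
```
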
0 votes
0 answers
10 views

What to do when training and validation loss both keep decreasing and seem correlated?

I'm training a neural network (a custom architecture, but mostly a variant of a CNN) with an MSE loss. The trained network works well for my task (predicting one vector from another vector) - like, ...
asked by davewy (101)
0 votes
0 answers
26 views

Threshold moving with k-fold cross validation

I have an issue of class imbalance in my data (1:10 ratio), so I have implemented strategies, including Weighted Random Sampler, Weighted loss function, Threshold moving, Data augmentation (not always ...
asked by Yuju Ahn
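
One reasonable way to combine the two, sketched with sklearn (the function name and F1 objective are illustrative choices, not the only ones): tune the threshold on pooled out-of-fold probabilities, then refit on all data.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

def tune_threshold(model, X, y, n_splits=5):
    # Collect out-of-fold probabilities so the threshold is never
    # chosen on data the model was fit on.
    oof = np.zeros(len(y))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for tr, va in skf.split(X, y):
        oof[va] = clone(model).fit(X[tr], y[tr]).predict_proba(X[va])[:, 1]
    grid = np.linspace(0.05, 0.95, 19)
    return max(grid, key=lambda t: f1_score(y, oof >= t))
```
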
1 vote
1 answer
55 views

Paper Discussion - Hamiltonian Neural Networks

I'm writing a paper on the discoveries made by Sam Greydanus et al. (https://arxiv.org/abs/1906.01563v1) and was wondering if anyone could provide some insights on this: they are highlighting how ...
asked by Ole Askeland
2 votes
1 answer
51 views

How to use differential-entropy as pre-processing?

I am currently working on implementing the model EEG_DMNet. For pre-processing it calls for using differential entropy like $$ h(X) = -\int_{-\infty}^{\infty} p(x) \log p(x) \, dx $$ Assuming the Data ...
asked by Sebastian Krafft
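
Assuming the band-filtered EEG segments are treated as Gaussian, as is common for DE features, the integral collapses to a closed form; a minimal numpy sketch:

```python
import numpy as np

def differential_entropy(x):
    # For X ~ N(mu, sigma^2), h(X) = 0.5 * log(2*pi*e*sigma^2),
    # so the DE feature of a segment reduces to a function of its
    # sample variance; no numerical integration is needed.
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))
```
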
2 votes
1 answer
117 views

Is the deep set operation just elementwise addition?

I have been reading Deep Sets (Zaheer et al., 2017), which has a fair bit of complex-looking math. But looking in the code repo, it seems like it is just elementwise addition? Which is fine and ...
asked by Frames Catherine White
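
A minimal numpy sketch of the invariant form from the paper, f(X) = rho(sum_i phi(x_i)): the pooling is indeed an elementwise sum, but it acts on learned embeddings, with another network applied after pooling.

```python
import numpy as np

def deep_set(xs, phi, rho):
    # phi embeds each element; the sum is the permutation-invariant
    # pooling; rho processes the pooled representation. The two
    # networks around the sum carry the expressive power.
    return rho(np.sum([phi(x) for x in xs], axis=0))
```
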
2 votes
1 answer
34 views

Is it possible to train Neural networks for time series forecasting using elastic distances (such as dtw) as a loss function?

Normally, elastic distances are used as ways to tell how similar two time series are. Examples are dynamic time warping, move-split-merge, and many more. I have read some research such as ...
asked by Mike Bukowski
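
Plain DTW is not differentiable, but its soft-min relaxation (soft-DTW, Cuturi & Blondel, 2017) is, and it has been used as a forecasting loss; a minimal numpy sketch of the recursion for two 1-D series:

```python
import numpy as np

def soft_dtw(a, b, gamma=1.0):
    # Replace DTW's min with softmin_g(r) = -g * log(sum(exp(-r/g))),
    # making the alignment cost smooth in the inputs.
    n, m = len(a), len(b)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            r = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            R[i, j] = cost - gamma * np.log(np.sum(np.exp(-r / gamma)))
    return R[n, m]
```
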
1 vote
1 answer
39 views

Understanding deep learning notation and properties of expectation - neighbor2neighbor

I am trying to follow the proof of Theorem 1 here but can't fully understand the notation the authors use so that I am not able to completely understand the steps. This misunderstanding keeps coming ...
asked by Dr. John (125)
6 votes
1 answer
274 views

Model performs well on train and cross-validation sets but inaccurate in the test set. How to solve? [duplicate]

I've been working on a CNN binary classification model, and the model performs pretty well on both the training set and the cross-validation set (both practically 1.0 accuracy). However, I also ...
asked by Efe FRK (71)
1 vote
0 answers
16 views

Why do all these adaptive methods in neural network training require a $g_t^2$ term?

All the adaptive learning methods, AdaGrad, AdaDelta, RMSprop, Adam, and later variants, require $g_t^2$, which is the gradient multiplied by itself elementwise. Why is this needed? I ...
asked by Shamisen Expert
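
A minimal numpy sketch of the role $g_t^2$ plays, using RMSprop as the example: the running elementwise second moment acts as a cheap diagonal preconditioner, scaling each parameter's step by its recent gradient magnitude.

```python
import numpy as np

def rmsprop_step(w, g, v, lr=1e-3, beta=0.9, eps=1e-8):
    # v: exponential moving average of g*g (elementwise). Dividing by
    # sqrt(v) gives each coordinate its own effective step size:
    # larger for weakly-updated weights, smaller for large-gradient
    # ones. AdaGrad and Adam use the same second-moment idea.
    v = beta * v + (1 - beta) * g * g
    return w - lr * g / (np.sqrt(v) + eps), v
```
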
0 votes
0 answers
7 views

Why does internal covariate shift affect neural networks when the loss landscape remains the same?

Why does internal covariate shift affect neural networks when the loss landscape remains the same? When reading about internal covariate shift and how batch normalization doesn't really solve it, it ...
asked by richardjoseph
1 vote
0 answers
26 views

How to Interpret Reconstruction Error for Anomaly Detection with Autoencoders?

I have an autoencoder based on a neural network. This model was trained using SCADA data. I got decent results in anomaly detection, with around 85% in the main metrics (recall, accuracy, precision, ...
asked by hgropelli
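
One common recipe for turning reconstruction errors into decisions, sketched below (the percentile choice is illustrative): calibrate a threshold on errors from normal validation data, then flag anything above it.

```python
import numpy as np

def anomaly_threshold(val_errors, q=99.0):
    # val_errors: per-sample reconstruction errors computed on
    # *normal* validation data. A high percentile (or mean + k*std)
    # of this distribution serves as the alarm threshold; q is then
    # tuned against the desired recall/precision trade-off.
    return np.percentile(val_errors, q)
```
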
1 vote
0 answers
51 views

Gradient-based attacks do not seem to make sense for neural networks because the training error is non-convex

There are several gradient-based attack methods. Let $J$ be the training error; then, for instance, the projected gradient attack is $$ \widetilde{x} = \Pi\big( x + \epsilon \nabla_x J(\theta, x, y) \big) $$ ...
asked by Shamisen Expert
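
For reference, a minimal numpy sketch of one step of the commonly used L-infinity sign variant of PGD; the point it illustrates is that the attack only needs a local ascent direction in input space, so non-convexity of $J$ in the weights does not matter:

```python
import numpy as np

def pgd_step(x, grad_x, x0, epsilon, alpha):
    # Ascend the loss w.r.t. the *input*, then project back onto the
    # L-infinity ball of radius epsilon around the clean input x0.
    x_new = x + alpha * np.sign(grad_x)
    return np.clip(x_new, x0 - epsilon, x0 + epsilon)
```
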
0 votes
0 answers
16 views

Given that a flattening operation is used, why is it that a CNN does not have the same problem as vectorizing an image?

So a key motivation for using CNN is that if we were to vectorize an image and then feed it into a standard multilayer perceptron, that vectorized image would lose spatial information between pixels. ...
asked by Shamisen Expert
0 votes
0 answers
31 views

Hyperparameter Tuning for Multiple Time Series

I am developing a time-series model utilizing NeuralProphet for forecasting the demand of products by day. I have grouped the products into a number of clusters by features such as average demand, ...
asked by GJKamClark
18 votes
2 answers
6k views

What are the works of Hopfield and Hinton that enable machine learning with neural networks, as noted in the physics Nobel award statement?

On October 9, 2024, the Nobel Foundation announced the Nobel Prize in Physics 2024 with the following statement of merit: for foundational discoveries and inventions that enable machine learning with ...
1 vote
0 answers
9 views

How does one measure the fall in performance of Object Detection algorithms with degraded training images? [closed]

I'm supposed to be working on a project that involves studying the effect of different kinds of training images on Object Detection algorithms. We (the team) are supposed to measure the difference in ...
asked by Trainers
0 votes
0 answers
20 views

Does it make sense to use Batch Normalization in autoencoders for image reconstruction?

I’m working on an autoencoder for image reconstruction, and I would like to hear your opinions on using Batch Normalization (BN) in this type of architecture. What are the advantages of applying BN ...
asked by Dazckel (81)
4 votes
2 answers
130 views

Why is the computation of $P(X)$ in the posterior hard?

Let $X$ denote the input space of dimension $d$ and $Z$ denote the output space of size $k$. The posterior is defined as $$P(Z|X) = \frac{P(X|Z)P(Z)}{P(X)}$$ where $P(Z)$ is the prior and $P(X|Z)$ ...
asked by Rma (163)
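
For reference, the quantity in the denominator is the evidence, obtained by marginalizing the joint over every configuration of $Z$, which is what becomes expensive for high-dimensional or continuous latent spaces:

```latex
P(X) = \sum_{z \in Z} P(X \mid Z = z)\, P(Z = z)
\qquad \text{or} \qquad
P(X) = \int P(X \mid z)\, p(z)\, \mathrm{d}z .
```
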
0 votes
1 answer
36 views

Modeling approaches for conditional probability distribution, applied to Propensity Score estimation for IPW (causal inference)

I'm trying to understand and ideally implement the Inverse Probability Weighting approach to estimate a causal effect. My resources so far have been Pearl's Primer and the book "What If?". ...
asked by ThighCrush
1 vote
0 answers
25 views

Maxout activation function vs ReLU (Number of weights)

From what I understand, the maxout function works quite differently from ReLU. The ReLU function is $\max(0, x)$, where the input $x$ is $W^T x + b$. The maxout function has many $W$s, and it is $\max(W_1^T x + b_1,\, W_2^T x + b_2, \ldots)$ ...
asked by kite (11)
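
A minimal numpy sketch of the parameter-count difference: a maxout unit keeps k affine maps where a ReLU unit keeps one, so it has roughly k times the weights.

```python
import numpy as np

def relu_unit(x, w, b):
    return np.maximum(0.0, w @ x + b)   # one weight vector w: (d,)

def maxout_unit(x, W, b):
    # W: (k, d), b: (k,) -- pointwise max over k affine maps, so
    # k times the weights of a single ReLU unit.
    return np.max(W @ x + b)
```
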
0 votes
0 answers
45 views

Are linear RBFs universal function approximators?

This paper shows that radial basis function neural networks (RBF NNs, i.e., NNs with one hidden layer with RBF activations) are universal function approximators (UFA). How about linear RBFs ...
asked by Simon (191)
0 votes
0 answers
29 views

Advice on fine-tuning an email classifier for a Pharma company

I'm an intern working on implementing a binary email classifier for a client (Pharmaceutical company) and I need some advice on fine-tuning the model. The model I'm using is Longformer (because it has ...
asked by Bhashwar Sengupta
0 votes
0 answers
23 views

What advantage do lstms provide for Apple's language identification over other architectures?

Why do we use LSTMs over other architectures for character-based language identification (LID) from short strings of text, when the LSTM's power comes from its long-range dependency memory? For example, ...
asked by karak87rt0
1 vote
0 answers
19 views

Why is the expectation term in the VAE loss not implemented in practice?

According to the VAE paper (https://arxiv.org/pdf/1906.02691, eq. 2.10, page 21), the VAE loss contains an expectation term. If I understand correctly, we need to sample more than one result from an input x and ...
asked by Manh (125)
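
A minimal torch sketch of how the expectation is typically handled in practice: a single reparameterized sample per datapoint (L = 1), with the minibatch average providing the variance reduction (the Gaussian-decoder MSE reconstruction term is an illustrative choice):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, mu, logvar, decoder):
    # E_{q(z|x)}[log p(x|z)] is estimated from ONE reparameterized
    # sample; averaging over the minibatch keeps the estimator's
    # variance low enough that looping over L > 1 samples is rarely
    # worth the extra compute.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    recon = F.mse_loss(decoder(z), x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```
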
0 votes
0 answers
14 views

Why does a convolutional neural network use a 2D filter?

Given an input of shape (C, H, W), where C is the number of channels, the filter is of size (X, Y) and slides across each channel individually. Why isn't the filter of size (C, X, Y), sliding across the entire 3D shape? I know of ...
asked by Fine-Tuning
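
A quick torch check of the premise: a standard Conv2d kernel does span all input channels, and "2D" refers only to the two spatial dimensions it slides over; the per-channel behaviour the question describes is depthwise convolution:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)
print(conv.weight.shape)   # torch.Size([8, 3, 5, 5]): each of the
                           # 8 filters covers all 3 input channels
dw = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=5, groups=3)
print(dw.weight.shape)     # torch.Size([3, 1, 5, 5]): one spatial
                           # filter per channel (depthwise)
```
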
1 vote
1 answer
62 views

Should classical/traditional ML techniques such as polynomial regression/decision trees/random forests SIGNIFICANTLY outperform RNN in timeseries? [closed]

I have a dataset of numerous years of buoy wave height measurements including features such as measured significant wave height, numerical model predictions, peak wave period, mean wave period, and ...
asked by Donald M.
0 votes
0 answers
11 views

How to bound generalization error of a shallow NN?

I have a trained one-hidden-layer NN with ReLU. There are 2N+d parameters in total (N is the number of nodes and d is the dimension). The sample size to train this NN is exactly N. Now, in ...
asked by J M (1)
2 votes
1 answer
54 views

Why does having a smaller set of weights help with generalization?

When I studied machine learning for the first time, I learned that we need to use L2 regularization to improve generalization. The reason is based on the polynomial regression experiment in Chris ...
asked by Fraïssé (1,630)
