
All Questions

1 vote
0 answers
8 views

Understanding the Definition of Conditional Optimal Transport Path

In this tutorial about flow matching, the authors define a probability path interpolating between $p_0 = N(0, I)$ and a target distribution $q$ as follows (Equation 2.2 of the paper): $$p_t(x) = \int{...
asked by pushinproto
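
For context, the marginal path being asked about typically has the mixture form below; this is a sketch in the standard conditional-OT notation from the flow matching literature, and the tutorial's Eq. 2.2 may use a slightly different conditional variance:

```latex
% Marginal path as a mixture of per-sample conditional paths:
p_t(x) = \int p_t(x \mid x_1)\, q(x_1)\, \mathrm{d}x_1,
\qquad
p_t(x \mid x_1) = \mathcal{N}\!\big(x \mid t\,x_1,\ (1-t)^2 I\big),
% which recovers p_0 = \mathcal{N}(0, I) at t = 0 and concentrates on q as t -> 1.
```
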
1 vote
0 answers
32 views

Preprocessing and model selection strategies

I am working on a fault detection problem where each sample is a time series labeled with a specific type of fault. I am using a CNN model and a validation set for hyperparameter tuning. Currently, I ...
asked by S.H.W (77)
1 vote
1 answer
12 views

Varying sequence lengths between classes in LSTM

I am working on a project where the goal is to predict whether students in an online course will drop out of the course. The course is divided into 20 course weeks. For each week, I have certain kinds ...
asked by Computeraar
1 vote
0 answers
18 views

Can the closed-form solution for ridge regression be used in training neural networks?

Is it established that the closed-form solution of ridge regression can be used during the training of neural networks? If so: What are the potential benefits of using it? In what scenarios would ...
asked by FadiBenz
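
As a reference point for this question, a minimal numpy sketch of where the closed form can apply directly, namely the final linear layer on top of fixed hidden activations (the function name and the lam value are illustrative):

```python
import numpy as np

def ridge_last_layer(H, y, lam=1e-2):
    """Closed-form ridge solution w = (H^T H + lam*I)^{-1} H^T y.

    H: (n, d) activations of the last hidden layer, y: (n,) targets.
    Only the final *linear* layer admits this solution; hidden layers
    are nonlinear in their weights and still need gradient updates.
    """
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)
```
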
2 votes
0 answers
30 views

Does the ill-conditioning of the design matrix affect the ill-conditioning of the Hessian in the context of DL?

I know that when we use the square loss as our cost function in DL, the ill-conditioning of the Hessian is directly tied to that of the design matrix. Does this apply to other cost functions? If so, ...
asked by FadiBenz
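
For the square-loss case the question starts from, the tie is a one-line computation (stated for a linear model; in deep nets it holds exactly only for the last layer):

```latex
\mathcal{L}(w) = \tfrac{1}{n}\lVert Xw - y \rVert_2^2
\;\Longrightarrow\;
\nabla_w^2 \mathcal{L} = \tfrac{2}{n}\, X^\top X,
\qquad
\kappa\!\big(\nabla_w^2 \mathcal{L}\big) = \kappa(X)^2 .
```
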
0 votes
0 answers
10 views

What is the best way to set weights for weighted MSE in multi output regression? [duplicate]

I am working on a regression task where the goal is to predict 6 scalar output values from a given input. The input consists of decaying signal data, and the outputs are the parameters of the signal ...
asked by COTHE (11)
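
A minimal numpy sketch of a per-output weighted MSE; the inverse-variance default in the comment is one common, but by no means canonical, choice:

```python
import numpy as np

def weighted_mse(y_true, y_pred, w):
    # y_true, y_pred: (n, 6); w: (6,) per-output weights.
    return np.mean(w * (y_true - y_pred) ** 2)

# One common starting point: inverse variance of each target, so all
# six outputs contribute on a comparable scale (illustrative):
#   w = 1.0 / y_train.var(axis=0)
```
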
1 vote
2 answers
39 views

How does a single-layer, single-unit network with the Adam optimizer work?

I'm very new to ML and I'm trying to mess around with Linear Regression. I tested sklearn's LinearRegression model and then wanted to compare the results to a very simple neural network. I created a ...
asked by TamerM (111)
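
A self-contained numpy sketch of what such a network amounts to: a single linear unit trained with full-batch Adam on MSE, which should end up close to sklearn's LinearRegression coefficients (the data here is synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # synthetic features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

# One linear unit y_hat = X @ w + b, MSE loss, full-batch Adam.
w, b = np.zeros(3), 0.0
mw, vw, mb, vb = np.zeros(3), np.zeros(3), 0.0, 0.0
lr, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8
for t in range(1, 5001):
    err = X @ w + b - y
    gw, gb = 2 * X.T @ err / len(y), 2 * err.mean()
    mw = b1 * mw + (1 - b1) * gw; vw = b2 * vw + (1 - b2) * gw**2
    mb = b1 * mb + (1 - b1) * gb; vb = b2 * vb + (1 - b2) * gb**2
    w -= lr * (mw / (1 - b1**t)) / (np.sqrt(vw / (1 - b2**t)) + eps)
    b -= lr * (mb / (1 - b1**t)) / (np.sqrt(vb / (1 - b2**t)) + eps)
# w, b should now closely match sklearn's LinearRegression fit.
```
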
3 votes
0 answers
21 views

Reward and Penalty Design in reinforcement learning

I hope you're all doing well. I am currently working on a reinforcement learning problem to solve an optimization problem in wireless networks, and I'm having trouble designing the reward and ...
asked by Mehran Varshosaz
1 vote
1 answer
20 views

Learnability of Boolean formulae by neural networks using backpropagation?

I've been researching neural networks and Boolean formulae. From my efforts, it doesn't seem that neural networks can generally learn Boolean formulae using backpropagation. This makes sense ...
asked by yters (143)
0 votes
0 answers
16 views

"Inflating" learning rates in diminishing gradient areas for NN training

In neural net training nowadays, tanh and sigmoid activation functions in hidden layers are avoided as they tend to "saturate" easily: if the x value plugged into tanh/sigmoid is ...
asked by Omrii (101)
2 votes
2 answers
37 views

Predicting the probability distribution of a deterministic dataset

In classical machine learning regression, we often assume the target variable $y$, given an input $x$, follows a probability distribution, allowing us to model and predict not just the expected value ...
asked by juekai (121)
0 votes
0 answers
16 views

Why do VAEs work?

I am currently reading into Variational Autoencoders, and although I kind of understand the mathematical background described in the original paper (Auto-encoding Variational Bayes), I am struggling ...
asked by Marco Rosinus Serrano
0 votes
0 answers
16 views

Geometric structure of the output of a random weight neural network fed with random data

Take a model with random weights: ...
asked by Random Network
0 votes
0 answers
22 views

Train a neural network to predict the false positive rate of a segmentation model

I am trying to train a neural network to infer the false positive rate of an image segmentation model on the basis of the input image and the threshold. To do so, I am considering a dataset organized ...
asked by Giorgio Bianchi
0 votes
0 answers
16 views

Embeddings in time series prediction

Increasingly, I’ve noted that embeddings are used in pure prediction ML tasks. For example, instead of predicting whether user i will purchase item j and thereby adding thousands or millions of inputs ...
asked by jbuddy_13 (3,520)
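
A minimal torch sketch of the pattern being described: a learned embedding table replaces thousands of one-hot input columns with a small dense vector per ID (all sizes illustrative):

```python
import torch.nn as nn

n_users, n_items, dim = 100_000, 50_000, 32
user_emb = nn.Embedding(n_users, dim)  # 32-dim vector per user ID
item_emb = nn.Embedding(n_items, dim)  # instead of 150k one-hot cols
# Downstream layers consume the two 32-dim vectors (e.g. concatenated
# or dot-producted), not one binary column per user/item.
```
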
3 votes
1 answer
166 views

Questions on backpropagation in a neural net

I understand how to apply backpropagation symbolically and calculate the formulas with pen and paper. When it comes to actually using these derivations on data, I have two questions: Suppose certain ...
asked by Baron Yugovich
1 vote
1 answer
27 views

Reason for softmax approximation in Ian Goodfellow's deep learning book

In section 6.2.2.2 (equation 6.31) they state: Overall, unregularized maximum likelihood will drive the model to learn parameters that drive the softmax to predict the fraction of counts of each ...
asked by Philipp (11)
0 votes
0 answers
14 views

Why do my test loss and test evaluation metrics fluctuate?

I am fine-tuning the resnet18 model with additional classifiers. What I observed during the training process is that the test loss and other test evaluation metrics (AP, AUC) seem to fluctuate, as you can ...
asked by Yuju Ahn
0 votes
1 answer
28 views

GNNs with higher order adjacency matrices

Usually, the adjacency matrix stores information about direct connections of nodes in a graph. The information from k-th neighbours is passed on at the k-th layer of a GNN, as described in the original ...
asked by ignoramus
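
A small numpy illustration of the premise: powers of the adjacency matrix encode k-hop connectivity, so aggregating with A^k in a single layer reaches k-th neighbours directly, at the cost of a much denser matrix:

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])            # path graph 0 - 1 - 2
A2 = np.linalg.matrix_power(A, 2)    # (A^2)[i, j] = number of
print(A2)                            # length-2 walks from i to j
```
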
0 votes
0 answers
17 views

Why does the image classification model perform worse when augmenting only the minority class?

I have a problem of data imbalance (1:10 ratio) for image classification tasks. To cope with it, I tried different imbalance training strategies, including weighted loss function, different loss ...
asked by Yuju Ahn
0 votes
0 answers
11 views

How to Cluster Highly Skewed DALY Data When Transformations and Clustering Algorithms Don't Help?

I’m working with a dataset of Disability-Adjusted Life Years (DALY) values with a high degree of skewness, as shown in the histogram below, where most values cluster near zero with a long tail ...
asked by shekoofeh momahhed
1 vote
1 answer
40 views

How does temporal data leakage happen?

Assume I use a moving window to slice daily stock closing price history data, using the past 7 days to predict the next day. For each training instance, I'm strictly using historical data to predict future ...
asked by yang (149)
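
A minimal numpy sketch of the windowing described, plus comments on where leakage usually actually enters (the helper name is mine):

```python
import numpy as np

def make_windows(prices, window=7):
    # X[i] = days i..i+window-1, y[i] = day i+window, so each
    # instance sees only the past. Leakage typically sneaks in
    # elsewhere: fitting a scaler on the full series before the
    # train/test split, or random shuffling that places overlapping,
    # near-duplicate windows in both train and test.
    X = np.lib.stride_tricks.sliding_window_view(prices[:-1], window)
    y = prices[window:]
    return X, y
```
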
0 votes
0 answers
10 views

What to do when training and validation loss both keep decreasing and seem correlated?

I'm training a neural network (a custom architecture, but mostly a variant of a CNN) with an MSE loss. The trained network works well for my task (predicting one vector from another vector) - like, ...
asked by davewy (101)
0 votes
0 answers
26 views

Threshold moving with k-fold cross validation

I have an issue of class imbalance in my data (1:10 ratio), so I have implemented strategies, including Weighted Random Sampler, Weighted loss function, Threshold moving, Data augmentation (not always ...
asked by Yuju Ahn
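
One reasonable way to combine the two, sketched with sklearn (the function name and F1 objective are illustrative choices, not the only ones): tune the threshold on pooled out-of-fold probabilities, then refit on all data.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

def tune_threshold(model, X, y, n_splits=5):
    # Collect out-of-fold probabilities so the threshold is never
    # chosen on data the model was fit on.
    oof = np.zeros(len(y))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for tr, va in skf.split(X, y):
        oof[va] = clone(model).fit(X[tr], y[tr]).predict_proba(X[va])[:, 1]
    grid = np.linspace(0.05, 0.95, 19)
    return max(grid, key=lambda t: f1_score(y, oof >= t))
```
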
1 vote
1 answer
55 views

Paper Discussion - Hamiltonian Neural Networks

I'm writing a paper on the discoveries made by Sam Greydanus et al. (https://arxiv.org/abs/1906.01563v1) and was wondering if anyone could provide some insights on this: they are highlighting how ...
asked by Ole Askeland
2 votes
1 answer
51 views

How to use differential-entropy as pre-processing?

I am currently working on implementing the model EEG_DMNet. For pre-processing it calls for using differential entropy like $$ h(X) = -\int_{-\infty}^{\infty} p(x) \log p(x) \, dx $$ Assuming the Data ...
asked by Sebastian Krafft
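
Assuming the band-filtered EEG segments are treated as Gaussian, as is common for DE features, the integral collapses to a closed form; a minimal numpy sketch:

```python
import numpy as np

def differential_entropy(x):
    # For X ~ N(mu, sigma^2), h(X) = 0.5 * log(2*pi*e*sigma^2),
    # so the DE feature of a segment reduces to a function of its
    # sample variance; no numerical integration is needed.
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))
```
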
2 votes
1 answer
117 views

Is the deep set operation just elementwise addition?

I have been reading Deep Sets (Zaheer et al., 2017), which has a fair bit of complex-looking math. But looking in the code repo, it seems like it is just elementwise addition? Which is fine and ...
asked by Frames Catherine White
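
A minimal numpy sketch of the invariant form from the paper, f(X) = rho(sum_i phi(x_i)): the pooling is indeed an elementwise sum, but it acts on learned embeddings, with another network applied after pooling.

```python
import numpy as np

def deep_set(xs, phi, rho):
    # phi embeds each element; the sum is the permutation-invariant
    # pooling; rho processes the pooled representation. The two
    # networks around the sum carry the expressive power.
    return rho(np.sum([phi(x) for x in xs], axis=0))
```
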
2 votes
1 answer
34 views

Is it possible to train Neural networks for time series forecasting using elastic distances (such as dtw) as a loss function?

Normally, elastic distances are used as ways to tell how similar two time series are. Examples are dynamic time warping, move-split-merge, and many more. I have read some research such as ...
asked by Mike Bukowski
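
Plain DTW is not differentiable, but its soft-min relaxation (soft-DTW, Cuturi & Blondel, 2017) is, and it has been used as a forecasting loss; a minimal numpy sketch of the recursion for two 1-D series:

```python
import numpy as np

def soft_dtw(a, b, gamma=1.0):
    # Replace DTW's min with softmin_g(r) = -g * log(sum(exp(-r/g))),
    # making the alignment cost smooth in the inputs.
    n, m = len(a), len(b)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            r = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            R[i, j] = cost - gamma * np.log(np.sum(np.exp(-r / gamma)))
    return R[n, m]
```
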
1 vote
1 answer
39 views

Understanding deep learning notation and properties of expectation - neighbor2neighbor

I am trying to follow the proof of Theorem 1 here but can't fully understand the notation the authors use so that I am not able to completely understand the steps. This misunderstanding keeps coming ...
asked by Dr. John (125)
6 votes
1 answer
274 views

Model performs well on train and cross-validation sets but inaccurate in the test set. How to solve? [duplicate]

I've been working on a CNN binary classification model, and the model performs pretty well on both the training set and the cross-validation set (both practically 1.0 accuracy). However, I also ...
asked by Efe FRK (71)
1 vote
0 answers
16 views

Why do all these adaptive methods in neural network training require a $g_t^2$ term?

All the adaptive learning methods, AdaGrad, AdaDelta, RMSprop, Adam, and later variants, require $g_t^2$, which is the gradient multiplied by itself elementwise. Why is this needed? I ...
asked by Shamisen Expert
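
A minimal numpy sketch of the role $g_t^2$ plays, using RMSprop as the example: the running elementwise second moment acts as a cheap diagonal preconditioner, scaling each parameter's step by its recent gradient magnitude.

```python
import numpy as np

def rmsprop_step(w, g, v, lr=1e-3, beta=0.9, eps=1e-8):
    # v: exponential moving average of g*g (elementwise). Dividing by
    # sqrt(v) gives each coordinate its own effective step size:
    # larger for weakly-updated weights, smaller for large-gradient
    # ones. AdaGrad and Adam use the same second-moment idea.
    v = beta * v + (1 - beta) * g * g
    return w - lr * g / (np.sqrt(v) + eps), v
```
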
0 votes
0 answers
7 views

Why does internal covariate shift affect neural networks when the loss landscape remains the same?

Why does internal covariate shift affect neural networks when the loss landscape remains the same? When reading about internal covariate shift and how batch normalization doesn't really solve it, it ...
asked by richardjoseph
1 vote
0 answers
26 views

How to Interpret Reconstruction Error for Anomaly Detection with Autoencoders?

I have an autoencoder based on a neural network. This model was trained using SCADA data. I got decent results in anomaly detection, with around 85% in the main metrics (recall, accuracy, precision, ...
asked by hgropelli
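
One common recipe for turning reconstruction errors into decisions, sketched below (the percentile choice is illustrative): calibrate a threshold on errors from normal validation data, then flag anything above it.

```python
import numpy as np

def anomaly_threshold(val_errors, q=99.0):
    # val_errors: per-sample reconstruction errors computed on
    # *normal* validation data. A high percentile (or mean + k*std)
    # of this distribution serves as the alarm threshold; q is then
    # tuned against the desired recall/precision trade-off.
    return np.percentile(val_errors, q)
```
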
1 vote
0 answers
51 views

Gradient-based attacks do not seem to make sense for neural networks because the training error is non-convex

There are several gradient-based attack methods. Let $J$ be the training error; then, for instance, the projected gradient attack is $$ \widetilde{x} = \Pi\big( x + \epsilon \nabla_x J(\theta, x, y) \big) $$ ...
asked by Shamisen Expert
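
For reference, a minimal numpy sketch of one step of the commonly used L-infinity sign variant of PGD; the point it illustrates is that the attack only needs a local ascent direction in input space, so non-convexity of $J$ in the weights does not matter:

```python
import numpy as np

def pgd_step(x, grad_x, x0, epsilon, alpha):
    # Ascend the loss w.r.t. the *input*, then project back onto the
    # L-infinity ball of radius epsilon around the clean input x0.
    x_new = x + alpha * np.sign(grad_x)
    return np.clip(x_new, x0 - epsilon, x0 + epsilon)
```
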
0 votes
0 answers
16 views

Given that a flattening operation is used, why is it that a CNN does not have the same problem as vectorizing an image?

So a key motivation for using CNN is that if we were to vectorize an image and then feed it into a standard multilayer perceptron, that vectorized image would lose spatial information between pixels. ...
asked by Shamisen Expert
0 votes
0 answers
31 views

Hyperparameter Tuning for Multiple Time Series

I am developing a time-series model utilizing NeuralProphet for forecasting the demand of products by day. I have grouped the products into a number of clusters by features such as average demand, ...
asked by GJKamClark
18 votes
2 answers
6k views

What are the works of Hopfield and Hinton that enable machine learning with neural networks, as noted in the physics Nobel award statement?

On October 9, 2024, the Nobel Foundation announced the Nobel Prize in Physics 2024 with the following statement of merit: for foundational discoveries and inventions that enable machine learning with ...
1 vote
0 answers
9 views

How does one measure the fall in performance of Object Detection algorithms with degraded training images? [closed]

I'm supposed to be working on a project that involves studying the effect of different kinds of training images on Object Detection algorithms. We (the team) are supposed to measure the difference in ...
asked by Trainers
0 votes
0 answers
20 views

Does it make sense to use Batch Normalization in autoencoders for image reconstruction?

I’m working on an autoencoder for image reconstruction, and I would like to hear your opinions on using Batch Normalization (BN) in this type of architecture. What are the advantages of applying BN ...
asked by Dazckel (81)
4 votes
2 answers
130 views

Why is the computation of $P(X)$ in the posterior hard?

Let $X$ denote the input space of dimension $d$ and $Z$ denote the output space of size $k$. The posterior is defined as $$P(Z|X) = \frac{P(X|Z)P(Z)}{P(X)}$$ where $P(Z)$ is the prior and $P(X|Z)$ ...
asked by Rma (163)
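
For reference, the quantity in the denominator is the evidence, obtained by marginalizing the joint over every configuration of $Z$, which is what becomes expensive for high-dimensional or continuous latent spaces:

```latex
P(X) = \sum_{z \in Z} P(X \mid Z = z)\, P(Z = z)
\qquad \text{or} \qquad
P(X) = \int P(X \mid z)\, p(z)\, \mathrm{d}z .
```
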
0 votes
1 answer
36 views

Modeling approaches for conditional probability distribution, applied to Propensity Score estimation for IPW (causal inference)

I'm trying to understand and ideally implement the Inverse Probability Weighting approach to estimate a causal effect. My resources so far have been Pearl's Primer and the book "What If?". ...
asked by ThighCrush
1 vote
0 answers
25 views

Maxout activation function vs ReLU (Number of weights)

From what I understand, the maxout function works quite differently from ReLU. The ReLU function is $\max(0, x)$, where the input $x$ is $W^T x + b$. The maxout function has many $W$s, and it is $\max(W_1^T x + b_1,\, W_2^T x + b_2, \ldots)$ ...
asked by kite (11)
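
A minimal numpy sketch of the parameter-count difference: a maxout unit keeps k affine maps where a ReLU unit keeps one, so it has roughly k times the weights.

```python
import numpy as np

def relu_unit(x, w, b):
    return np.maximum(0.0, w @ x + b)   # one weight vector w: (d,)

def maxout_unit(x, W, b):
    # W: (k, d), b: (k,) -- pointwise max over k affine maps, so
    # k times the weights of a single ReLU unit.
    return np.max(W @ x + b)
```
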
0 votes
0 answers
45 views

Are linear RBFs universal function approximators?

This paper shows that radial basis function neural networks (RBF NNs, i.e., NNs with one hidden layer with RBF activations) are universal function approximators (UFA). How about linear RBFs ...
asked by Simon (191)
0 votes
0 answers
29 views

Advice on fine-tuning an email classifier for a Pharma company

I'm an intern working on implementing a binary email classifier for a client (Pharmaceutical company) and I need some advice on fine-tuning the model. The model I'm using is Longformer (because it has ...
asked by Bhashwar Sengupta
0 votes
0 answers
23 views

What advantage do lstms provide for Apple's language identification over other architectures?

Why do we use LSTMs over other architectures for character-based language identification (LID) from short strings of text, when the LSTM's power comes from its long-range dependency memory? For example, ...
asked by karak87rt0
1 vote
0 answers
19 views

Why is the expectation term in the VAE loss not implemented in practice?

According to the VAE paper (https://arxiv.org/pdf/1906.02691, eq. 2.10, page 21), the VAE loss contains an expectation term. If I understand correctly, we need to sample more than one result from an input x and ...
asked by Manh (125)
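
A minimal torch sketch of how the expectation is typically handled in practice: a single reparameterized sample per datapoint (L = 1), with the minibatch average providing the variance reduction (the Gaussian-decoder MSE reconstruction term is an illustrative choice):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, mu, logvar, decoder):
    # E_{q(z|x)}[log p(x|z)] is estimated from ONE reparameterized
    # sample; averaging over the minibatch keeps the estimator's
    # variance low enough that looping over L > 1 samples is rarely
    # worth the extra compute.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    recon = F.mse_loss(decoder(z), x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```
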
0 votes
0 answers
14 views

Why does a convolutional neural network use a 2D filter?

Given an input of shape (C, H, W), where C is the number of channels, the filter is of size (X, Y) and slides across each channel individually. Why isn't the filter of size (C, X, Y), sliding across the entire 3D shape? I know of ...
asked by Fine-Tuning
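
A quick torch check of the premise: a standard Conv2d kernel does span all input channels, and "2D" refers only to the two spatial dimensions it slides over; the per-channel behaviour the question describes is depthwise convolution:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)
print(conv.weight.shape)   # torch.Size([8, 3, 5, 5]): each of the
                           # 8 filters covers all 3 input channels
dw = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=5, groups=3)
print(dw.weight.shape)     # torch.Size([3, 1, 5, 5]): one spatial
                           # filter per channel (depthwise)
```
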
1 vote
1 answer
62 views

Should classical/traditional ML techniques such as polynomial regression/decision trees/random forests SIGNIFICANTLY outperform RNN in timeseries? [closed]

I have a dataset of numerous years of buoy wave height measurements including features such as measured significant wave height, numerical model predictions, peak wave period, mean wave period, and ...
asked by Donald M.
0 votes
0 answers
11 views

How to bound generalization error of a shallow NN?

I have a trained one-hidden-layer NN with ReLU. There are 2N+d parameters in total (N is the number of nodes and d is the dimension). The sample size to train this NN is exactly N. Now, in ...
asked by J M (1)
2 votes
1 answer
54 views

Why does having a smaller set of weights help with generalization?

When I studied machine learning for the first time, I learned that we need to use L2 regularization to improve generalization. The reason is based on the polynomial regression experiment in Chris ...
asked by Fraïssé (1,630)
