Newest 'sequence-analysis' Questions

0 votes

0 answers

39 views

Is my time course analysis with DESeq2 valid?

As a pure behavioural ecologist who has stumbled into the world of gene expression analysis and is a novice at it, I am asking for help in validating whether my model is correct for the type ...

Jason Rissanen

41

asked Nov 14 at 13:12

0 votes

1 answer

39 views

How to approximate the point a sequence is converging to?

I have created a poker solver as part of my Master's Thesis. This solver uses Counterfactual Regret Minimization (CFR) to compute a Nash Equilibrium of Hold'em or Omaha Poker. The solver uses existing ...

Timon Groen

1

asked Jun 5 at 14:39

0 votes

0 answers

23 views

Random sequence generator algorithm: non-informative prior distribution

I want to conduct a Bayesian statistical analysis of a sequence generation phenomenon. The sequences generated contain elements from a known alphabet. Working on that, I have tried to define the prior ...

Guilhem Nespoulous

1

asked Feb 20 at 15:49

0 votes

1 answer

55 views

Optimization of fault diagnosis sequence using probability and cost [closed]

I am developing an algorithm to optimize the fault diagnosis sequence using probability and cost. For example, I have 3 possible diagnosis actions: option 1: probability which can detect the root ...

stat_man

3

asked Feb 13 at 16:52

0 votes

1 answer

97 views

How to look at between-group differences for a single gene using RNA seq data

I have an RNA seq dataset, but I am only interested in the expression of a single pre-specified gene and in comparing it between 2 groups (patient phenotypes). Some have suggested (without a reference) ...

Kristoffer N

1

asked Dec 18, 2023 at 12:26

0 votes

1 answer

112 views

Why use sliding window input features in sequence modeling?

I was reading through the DNABERT paper and found that their input features were k-mers. This is equivalent to using rolling/sliding window features in the other common family of sequential problems, ...

Avatrin

102

asked Sep 8, 2023 at 19:55

1 vote

1 answer

46 views

Statistical significance in known population

I am working with a data set with the sequence identity (a value in [0,1] representing the conservation between sequences) of many genes for many bacterial strains. I would like to be able to draw ...

Rachel

13

asked Apr 24, 2023 at 18:34

1 vote

1 answer

76 views

How to test differences (over time and between treatments) of a specific species in DNA metabarcoding sequencing data?

I have DNA metabarcoding sequencing data in the following format:

plot  Time_point  reads_species_A  Reads_species_B  reads_species_C
1     T1          0                245              65
2     T1          48               455              0
3     T1          15               5                10
1     T3          153              23               564
2     T3          ...

RobH

113

asked Feb 27, 2023 at 17:12

1 vote

2 answers

437 views

Weird Cook's distance results using DESeq2

I'm currently trying to assess fold change when comparing two different sample types using the DESeq2 package and I'm getting weird Cook's distance values which are causing major problems. The two ...

Miguel

11

asked Feb 15, 2023 at 16:10

1 vote

1 answer

143 views

Multichannel distance from a reference sequence

I am working with a large dataset and applying multichannel sequence analysis to two life course domains. I would like to adapt a solution suggested in the post below to multichannel sequences, but I ...

Léa Pessin

13

asked Dec 1, 2022 at 18:19

0 votes

1 answer

1k views

A method for clustering 1D signals?

I have samples from 150 different genes containing the following information: the sequence of the gene; the signal strength along the length of the gene (the signal can be negative or positive). I have ...

Ender

5

asked Mar 17, 2022 at 8:40

2 votes

0 answers

417 views

How to calculate the evaluation metrics on streaming data for online ML algorithms

I am working on a binary classification problem where I need to develop an online ML model that can work on streaming data. However, I am not sure how I can use the evaluation metrics for ...

Amhs_11

333

asked Aug 30, 2021 at 19:13

1 vote

1 answer

145 views

Viewing automated cost matrix for DHD in TraMineR

I'm using social sequence analysis, and comparing between different distance methods for my data. I'm wondering if there is a way to view/call the automatic substitution cost matrix that the dynamic ...

Siobhan

13

asked Jun 9, 2021 at 0:53

0 votes

1 answer

31 views

Determining a p-value for a test statistic that depends on other test statistics

Sorry for the confusing wording of the title. If someone has a better way to word it, please feel free to change it. Background: For those unfamiliar with bioinformatics data, I have data from a ...

The_Questioner

119

asked May 9, 2021 at 15:07

0 votes

0 answers

775 views

What does it mean if a simple linear neural network performs better than an LSTM on sequential data?

I'm working on a genetic data project, where one data sample is represented as sequences of integers (of length 2000) and it needs to be classified into one of 4 classes, so I guess it is similar to ...

Ronny

1

asked Apr 25, 2021 at 16:02

1 vote

1 answer

58 views

Why shouldn't you mix variable size inputs in the same minibatch?

I am trying to build a CNN-LSTM architecture in tf.keras that classifies sequences of varying sizes. My training data is highly variable and I would have to crop/pad sequences in order to create ...

Tom

53

asked Mar 24, 2021 at 22:54

0 votes

0 answers

22 views

Choosing a model for input: categorised, weighted sequence, output: binary variable

What would be an appropriate model for predicting a binary target variable, given a weighted sequence? Sequences will be reasonably short, typically between ~ 1 and 5 elements. I have in the order of ...

Ian

101

asked Dec 17, 2020 at 16:01

1 vote

0 answers

29 views

What are the classifiers that can be used for sequence data?

I've been going through classifiers like Naive Bayes, Decision Tree, etc. I have sequence data like so ...

vbnr

111

asked Sep 9, 2020 at 4:06

0 votes

0 answers

402 views

Training and testing transformer model from scratch

As you know, transformers are one of the strongest models in the field of NLP and machine translation. I know there are many resources, but I still could not find a good tutorial teaching how to use ...

Kadaj13

395

asked Jul 18, 2020 at 12:15

2 votes

1 answer

618 views

Sequence comparison metrics

I know about edit distance, Longest Common Subsequence and their normalized versions for measuring the similarity between sequences. But are there any similarity measures other than these?

Shivanisrivarshini

49

asked Jun 7, 2020 at 5:29

0 votes

0 answers

27 views

Predicting the Winner of a sequence of numbers

I have various series of numbers of different lengths (ranging from 4 to 10) such as the example below: [1.5, 5.0, 6.0, 6.0, 8.0] [1.4, 6.0, 7.5, 9.0, 50.0, 100.0, 200.0] For each one of them I ...

Peterlytics

1

asked Mar 10, 2020 at 8:17

1 vote

0 answers

84 views

Analyse a set of sequences of varying length with PCA?

Task description: I have a dataset with strings indicating the sequence of the screens a user visits when making a purchase on an app. A string could be: "1,2,1,2,3,3,4,5,6,7,3,2,5,6". Another string ...

Mikkel Miqlliot Lehmann

11

asked Dec 16, 2019 at 16:47

0 votes

1 answer

181 views

Binary Sequence Prediction Model with Time-dependent features

I have a very long sequence of binary items (0 or 1). Each item is associated with a timestamp. For example: ...

hans glick

55

asked Oct 18, 2019 at 14:01

1 vote

0 answers

98 views

Likelihood Matrix from a Random Forest?

I'm going through the supporting material of a paper (https://science.sciencemag.org/content/360/6384/81), trying to reproduce their results (see below, note that HVG=highly variable gene). The data ...

Jo Fisher

11

asked Sep 24, 2019 at 15:50

1 vote

0 answers

39 views

Ideas for determining the optimal sequence of calls and emails to maximize the probability of a sales lead converting to a sale?

I have a large data set of sales leads that are in the form of a lead_id, a sequence of binary integers that denote the order of emails and phone calls made to a sales lead, and the binary outcome of ...

statsquestions

11

asked Jul 26, 2019 at 19:41

1 vote

1 answer

99 views

Have many error likelihoods; how to combine them to get a confidence or p-value?

I'm working in bioinformatics and it's been a long time since I dusted off my statistics. Basically I'm working on variant calling which amounts to sequencing a large number of sequence reads and ...

lonestar21

111

asked Jul 19, 2019 at 23:27

3 votes

0 answers

300 views

Which distance metric to use to cluster categorical sequences (clickstreams or clickpaths)?

For my research, I want to cluster website visitors based on their clickstreams to understand different information behavior patterns (i.e., customer/visitor journeys). The data can be characterized ...

MLud

51

asked Jun 24, 2019 at 12:30

0 votes

1 answer

53 views

ATGC sequence of gene expression data [closed]

I am not a pro in genetics so please excuse my non-technical language. I need a dataset which contains the gene expression as well as the associated ATGC sequence with each gene expression value. For ...

Statistical_Research

156

asked May 27, 2019 at 7:36

1 vote

1 answer

575 views

Handling missing data in Sequence Analysis (TraMineR) within the observation window

I'm using sequence analysis. I have a question about how to deal with missing data within the observation window. The starting point of the analysis is when respondents leave secondary school (t0). I ...

Robin

11

asked Apr 4, 2019 at 13:51

4 votes

1 answer

431 views

Unsupervised clustering of sequence of events to subsequences

I have a big dataset of M sequences of [1 - N] events, where each event has multiple properties (start date, end date, location, ...

Dimgold

318

asked Mar 25, 2019 at 8:09

3 votes

2 answers

115 views

Localized distance function on sequential binary data

I am trying to find a good distance function for sequential data that is all binary. For now, I am using edit distance; however, I have some more domain-specific knowledge that I would like to ...

Maximal

213

asked Jan 15, 2019 at 11:09

2 votes

2 answers

931 views

Sequence prediction based on non-sequential inputs

I have a dataset with timestamps and event values (true or false -- these are based on sensor data which detect room occupancy). I'd like to build a model that would take a timestamp as an input and ...

de1pher

163

asked Dec 24, 2018 at 22:20

1 vote

0 answers

168 views

Probability of Finding Two Matching Subsequences in a Sequence

I'm currently studying DNA sequencing and am trying to find a formula which gives the probability that a subsequence of length $k$ appears twice in a sequence of $L$ bases (characters); this is pretty ...

BodneyC

11

asked Aug 15, 2018 at 12:03

0 votes

1 answer

36 views

Estimation of partitioning error in next-generation sequencing experiments

[Edited: explanation of the partitioning error] I would like to estimate how the initial number of molecules (or the level of gene expression) affects reproducibility between technical replicates of ...

hibernicah

120

asked Aug 5, 2018 at 19:33

7 votes

0 answers

805 views

Traditional state-space models and LSTMs

I am trying to understand the nature of LSTMs in relation to intuitions from traditional state-space models (e.g., Kalman filtering). The code below aims to simulate a simple univariate linear state-...

user46098

71

asked Jul 8, 2018 at 18:53

2 votes

0 answers

42 views

Selection of differentially expressed genes

I don't have any statistical background and have some questions. I see that some research papers select differentially expressed genes based on fold change and p-value. And in some other papers I ...

beginner

175

asked May 28, 2018 at 9:10

2 votes

1 answer

145 views

Calculating p-values for ratios of binomial variables

I have a problem that I will express in 2 ways: a math-y way and a biology way. Hopefully this will make it more clear. Math-y way: I have N observations of a pair of binomial variables, call them ...

Nathan Crook

21

asked May 9, 2018 at 23:54

4 votes

2 answers

2k views

Clustering customers by their orders sequence patterns

I have a dataset with clients' orders. Example: ...

Andrey

61

asked Mar 29, 2018 at 12:50

2 votes

1 answer

91 views

Sequence prediction: ambiguity in training set

There is a set of sequences (train set), where each element is one or multiple tags: A, B -- A -- Z -- Z, A B -- A -- Z -- D ... Given a new sequence: ...

Denis Kulagin

223

asked Mar 4, 2018 at 14:54

3 votes

1 answer

417 views

TraMineR: predicting class membership of new sequences

Question summary: This question pertains to analyzing dissimilarities between discrete state sequences (e.g., using TraMineR), more specifically to classifying new, ...

Maxim.K

560

asked Feb 12, 2018 at 16:29

3 votes

1 answer

268 views

Occurrence of at least 1 HT and HH in sequences of 4 coin flips not equally likely

I was reading this interesting article on hot hands and streaks in sports. The article revolves around the 16 possible sequences of 4 coin flips (H = heads, T = tails): ...

beta

253

asked Sep 30, 2017 at 16:05

0 votes

1 answer

1k views

Recurrent Neural Network model with more than one input

I know RNNs (with LSTMs or GRUs) are now one of the most promising options for modelling sequential data, when ordering of the data matters. However, sometimes there are also some categorical ...

PDRX

103

asked Aug 20, 2017 at 20:48

2 votes

1 answer

73 views

Testing pdist() for statistical significance

Using pdist() in the PST package, two probabilistic suffix trees (PSTs) can be compared to each other. The function will output ...

histelheim

3,063

asked Aug 4, 2017 at 21:18

2 votes

0 answers

458 views

Building RNNs on mixed sequential and non-sequential data

I have a data set that is a bunch of windows; for each window I want to perform regression. The windows themselves have four sequential features that span back about 24 time steps. Additionally each ...

Brock

21

asked Jul 26, 2017 at 23:37

7 votes

1 answer

23k views

Optimum number of epochs and neurons for an LSTM network

I wanted to know if there's a way to select an optimum number of epochs and neurons to forecast a certain time series using LSTM, the motive being automation of the forecasting problem, i.e. the ...

Ankush Raut

71

asked Jun 22, 2017 at 12:19

5 votes

2 answers

10k views

Sliding window for time series modelling

I am modelling a univariate time series in the form shown. Suppose the time interval in the series is daily, namely each y was collected every day. I want to use the sliding window method to ...

LUSAQX

463

asked May 30, 2017 at 4:06

5 votes

1 answer

572 views

Predicting the observations in a POMDP with a recurrent neural network

I use neural networks for online sequence prediction. The performance of LSTM in this case, however, is not nearly as good as I expected. Maybe someone can help me understand where the problem lies. ...

wehnsdaefflae

546

asked May 28, 2017 at 9:53

0 votes

0 answers

105 views

Is there any sequence dissimilarity measure that has an intuitive interpretation like the dissimilarity index has?

I want to know how much the sequences in my sample differ from a given ideal-typical sequence. Is there any intuitive way of interpreting the dissimilarity measures for sequences? If it would be the ...

Kenji

858

asked May 4, 2017 at 9:33

0 votes

1 answer

101 views

Is using "Normal Approximation to binomial distribution" to test mutation enrichment in genomic region correct?

We are analyzing cancer patient mutation data. We defined a set of regions on the human genome as binding events (for those interested in the subject, it is a transcription factor binding ...

MorTunco

55

asked Mar 24, 2017 at 11:06

1 vote

1 answer

159 views

Can I use chi-squared test of independence to compare base compositions of human genome?

I need to prove that a given region on the genome has the same base composition proportions (20% A, 15% T, 19% C, etc…) as that genome. I thought about doing a chi-squared test of independence ...

MorTunco

55

asked Mar 19, 2017 at 12:40
