All Questions

0 votes
0 answers
19 views

Get document ID from LDA output R

I'm trying to do LDA over two very large corpora of documents. I need to compare the LDA output (planning to use the Kullback-Leibler similarity measure) across time for each pair of documents. ...
JF96
  • 169
1 vote
1 answer
77 views

Error in tm package while topic modelling

I am running into an error while trying to make a corpus object from the tm package in R. The data have been scraped from a website and I have included the full code below so you can run and see how ...
I_like_insights
1 vote
1 answer
1k views

DocumentTermMatrix/LDA produces non-zero entry error when there are no empty documents

I'm trying my first LDA model in R and got this error: Error in LDA(Corpus_clean_dtm, k, method = "Gibbs", control = list(nstart = nstart, : Each row of the input matrix needs to contain ...
byc
  • 141
0 votes
1 answer
496 views

sLDA for predicting categorical response instead of continuous in R

I have a collection of documents that might have latent topics associated with them. Each document likely relates to one or more topics. I have a master file of all possible "topics"/...
MiscRas
3 votes
0 answers
290 views

How do I create DocumentTermMatrix directly from list/vector of terms?

How do I create a DocumentTermMatrix directly from a list/vector of terms? I'd like to calculate LDA for my corpus using bigrams instead of words, so I do the following: Convert each document to words ...
expert
  • 30k
0 votes
0 answers
339 views

DocumentTermMatrix() returns 0 terms in tm package

I have an object like this: str(apps) chr [1:17517] "35 44 33 40 33 40 44 38 33 37 37" ... In each row, the numbers are separated by spaces. corpus<-Corpus(VectorSource(apps)) dtm<-...
ysfseu
  • 676
3 votes
1 answer
2k views

Plot the evolution of an LDA topic across time

I'd like to plot how the proportion of a particular topic changes over time, but I've been having some trouble isolating a single topic and plotting over time, especially for plotting multiple groups ...
mlinegar
  • 1,399
0 votes
1 answer
356 views

Manually Specifying a Topic Model in R

I have a corpus of text with each line in the CSV file uniquely specifying a "topic" I am interested in. If I were to run a topic model on this corpus using an LDA or Gibbs method from either the ...
william
0 votes
1 answer
2k views

LDA with tm package in R using bigrams

I have a CSV with every row as a document. I need to perform LDA on it. I have the following code: library(tm) library(SnowballC) library(topicmodels) library(RWeka) X = read.csv('doc.csv',sep="...
dulla
  • 136
21 votes
4 answers
37k views

How does removeSparseTerms in R work?

I am using the removeSparseTerms method in R, and it requires a threshold value as input. I also read that the higher the value, the more terms are retained in the returned matrix. ...
London guy
1 vote
0 answers
329 views

Error using lexicalize() and lda.collapsed.gibbs.sampler() in R

I am new to topic modeling and was testing the lda.collapsed.gibbs.sampler() method by trying to "characterize" some 98 CVs. I first tried to do it using a corpus (as it is easier to do filtering etc)...
Krish
  • 11
0 votes
1 answer
2k views

Text Analysis Using LDA and tm in R

Hey guys, I'm having a little bit of trouble conducting LDA because, for some reason, once I get ready to conduct the analysis I get errors. I'll do my best to go through what I am doing; unfortunately I will ...
theamateurdataanalyst
4 votes
1 answer
5k views

In R tm package, build corpus FROM Document-Term-Matrix

It's straightforward to build a document-term matrix from a corpus with the tm package. I'd like to build a corpus from a document-term-matrix. Let M be the number of documents in a document set. ...
sinwav
  • 724
0 votes
2 answers
1k views

Work-around to clear blank entries in a document term matrix?

I have some R code that I've used in the past to produce topic models. Everything was working fine until I updated all of my R packages in the hopes of fixing a slightly unrelated problem. Now, code ...
beniam
  • 99
1 vote
1 answer
982 views

R topic modeling - lda command 'lexicalize' giving unexpected results

I am using the 'lda' package in R to perform a topic model analysis of a corpus (let's call it 'corpusB'). I am preparing the corpus for the analysis by first using the command 'lexicalize', which ...
user3197869
2 votes
1 answer
1k views

lda.collapsed.gibbs.sampler initial not working in R

I'm totally new to R and I'm currently working with the tm and lda packages to analyze a log. The lda.collapsed.gibbs.sampler can take an "initial" parameter, and in the documentation it's ...
J. Born
  • 101
1 vote
2 answers
1k views

Number of documents for Latent Dirichlet Allocation (LDA)

Thanks for taking the time to look at this question. I recently scraped some text from the web and saved the output as one .txt file of about 300 pages. I am trying to implement LDA to build topics ...
user2928284
20 votes
3 answers
28k views

LDA with topicmodels, how can I see which topics different documents belong to?

I am using LDA from the topicmodels package, and I have run it on about 30,000 documents, acquired 30 topics, and got the top 10 words for the topics; they look very good. But I would like to see ...
d12n
  • 861