All Questions
18 questions
0 votes, 0 answers, 19 views
Get document ID from LDA output R
I'm trying to run LDA over two very large corpora of documents.
I need to compare the LDA output (planning to use the Kullback-Leibler divergence) across time for each pair of documents. ...
1 vote, 1 answer, 77 views
Error in tm package while topic modelling
I am running into an error while trying to make a corpus object from the tm package in R.
The data have been scraped from a website and I have included the full code below so you can run and see how ...
1 vote, 1 answer, 1k views
DocumentTermMatrix/LDA produces non-zero entry error when there are no empty documents
I'm trying my first LDA model in R and got the following error:
Error in LDA(Corpus_clean_dtm, k, method = "Gibbs", control = list(nstart = nstart, : Each row of the input matrix needs to contain ...
0 votes, 1 answer, 496 views
sLDA for predicting categorical response instead of continuous in R
I have a collection of documents, that might have latent topics associated with them. It is likely that each document might relate to one or more topics. I have a master file of all possible "topics"/...
3 votes, 0 answers, 290 views
How do I create DocumentTermMatrix directly from list/vector of terms?
I'd like to run LDA on my corpus using bigrams instead of single words, so I do the following:
Convert each document to words ...
0 votes, 0 answers, 339 views
DocumentTermMatrix() returns 0 terms in tm package
I have an object like this:
str(apps)
chr [1:17517] "35 44 33 40 33 40 44 38 33 37 37" ...
In each row, the numbers are separated by spaces.
corpus<-Corpus(VectorSource(apps))
dtm<-...
3 votes, 1 answer, 2k views
Plot the evolution of an LDA topic across time
I'd like to plot how the proportion of a particular topic changes over time, but I've been having some trouble isolating a single topic and plotting over time, especially for plotting multiple groups ...
0 votes, 1 answer, 356 views
Manually Specifying a Topic Model in R
I have a corpus of text with each line in the csv file uniquely specifying a "topic" I am interested in. If I were to run a topic model on this corpus using an LDA or Gibbs method from either the ...
0 votes, 1 answer, 2k views
LDA with tm package in R using bigrams
I have a csv in which every row is a document. I need to perform LDA on it. I have the following code:
library(tm)
library(SnowballC)
library(topicmodels)
library(RWeka)
X = read.csv('doc.csv',sep="...
21 votes, 4 answers, 37k views
How does the removeSparseTerms in R work?
I am using the removeSparseTerms function in R, and it requires a threshold value as input. I also read that the higher the value, the more terms are retained in the returned matrix.
...
1 vote, 0 answers, 329 views
Error using lexicalize() and lda.collapsed.gibbs.sampler() in R
I am new to topic modeling and was testing the lda.collapsed.gibbs.sampler() method by trying to "characterize" some 98 CVs. I first tried to do it using a corpus (as it is easier to do filtering etc)...
0 votes, 1 answer, 2k views
Text Analysis Using LDA and tm in R
Hey guys, I'm having a bit of trouble conducting LDA because, for some reason, once I get ready to run the analysis I get errors. I'll do my best to go through what I am doing; unfortunately I will ...
4 votes, 1 answer, 5k views
In R tm package, build corpus FROM Document-Term-Matrix
It's straightforward to build a document-term matrix from a corpus with the tm package.
I'd like to build a corpus from a document-term-matrix.
Let M be the number of documents in a document set.
...
0 votes, 2 answers, 1k views
Work-around to clear blank entries in a document term matrix?
I have some R code that I've used in the past to produce topic models. Everything was working fine until I updated all of my R packages in the hopes of fixing a slightly unrelated problem. Now, code ...
1 vote, 1 answer, 982 views
R topic modeling - lda command 'lexicalize' giving unexpected results
I am using the 'lda' package in R to perform a topic model analysis of a corpus (let's call it 'corpusB'). I am preparing the corpus for the analysis by first using the command 'lexicalize', which ...
2 votes, 1 answer, 1k views
lda.collapsed.gibbs.sampler initial not working in R
I'm totally new to R and I'm currently working with the tm and lda packages to analyze a log.
The lda.collapsed.gibbs.sampler function can take an "initial" parameter, and in the documentation it's ...
1 vote, 2 answers, 1k views
Number of documents for Latent Dirichlet Allocation (LDA)
Thanks for taking the time to look at this question. I recently scraped some text from the web and saved the output as one .txt file of about 300 pages. I am trying to implement LDA to build topics ...
20 votes, 3 answers, 28k views
LDA with topicmodels, how can I see which topics different documents belong to?
I am using LDA from the topicmodels package. I have run it on about 30,000 documents, extracted 30 topics, and got the top 10 words for each topic; they look very good. But I would like to see ...