Newest 'tm' Questions

0 votes

1 answer

32 views

How to extract terms and probabilities from tmResult$terms in topic modeling?

I would like to create separate word clouds for each of the 8 topics in my LDA model. I extracted the top 40 words across the 8 topics: an object of length 320 containing the top words and their occurrence probabilities. I ...

NoaMi

41

asked Aug 8 at 12:16

0 votes

0 answers

22 views

How to obtain and save trigrams from text mining program TM - in text or csv format

I'm hoping to identify trigrams and phrases in a corpus using TM and save the output as a text or csv file. I haven't found a way to do this in Quanteda: How to save n-gram output This reproducible ...

bgreen

87

asked Jul 16 at 21:51

0 votes

0 answers

33 views

Errors attaching metadata to corpus

I am trying to generate a corpus with two documents: one is responses of participants characterized as "supporters" and one is responses of "non-supporters". I've entered this as ...

Nicolette

1

asked Jun 14 at 20:00

1 vote

0 answers

56 views

How do I remove list() from a Corpus?

I have three text files. After preprocessing them using Corpus and the tm package, the resulting text includes the phrases "list(language = "portuguese")" and "list()". ...

Carla

23

asked May 25 at 14:51

0 votes

0 answers

19 views

Get document ID from LDA output R

I'm trying to do LDA over two very large corpora of documents. I need to compare the LDA output (planning to use the Kullback-Leibler similarity measure) across time for each pair of documents. ...

JF96

169

asked May 1 at 18:02

0 votes

1 answer

45 views

Undo stemming after tm::stemDocument()?

I have a list of stemmed words in R. Now, I want to undo my stemming in order to receive a list of all the "complete" words in R. This is the code I used for stemming my wordlist: library(tm)...

lili4491li

11

asked Dec 27, 2023 at 17:18

-1 votes

1 answer

24 views

Error while creating the TDM - "No applicable method for 'meta' applied to an object of class "character""

While creating a TermDocumentMatrix with the tm package, I am getting an error. I used the following code: int_vc <- VCorpus(int_vc) int_vc <- tm_map(int_vc, tolower) int_vc <- tm_map(int_vc, ...

yem

29

asked Oct 20, 2023 at 9:45

2 votes

0 answers

91 views

Unable to edit metadata in corpus

I have the following corpus: library(jsonlite) library(tm) query = "https://www.ebi.ac.uk/ebisearch/ws/rest/pride?query=submitter_country:Norway&size=1000&fields=submitter_keywords&...

Illimar Rekand

103

asked Oct 18, 2023 at 13:54

0 votes

1 answer

120 views

Searching for specific words in Corpus with R (tm package)

I have a Corpus (tm package), containing a collection of 1.300 different text documents [Content: documents: 1.300]. My goal is now to search the frequency of a specific wordlist in each of those ...

Li4991

81

asked Sep 2, 2023 at 12:41

1 vote

0 answers

26 views

Issue with stemCompletion in R

Dear stack overflow community, I have an issue when trying to complete a stemmed Corpus in R using the function stemCompletion within the tm package (https://cran.r-project.org/web/packages/tm/tm.pdf)....

d4rkneo

11

asked Apr 27, 2023 at 9:30

0 votes

1 answer

188 views

How to extract entities names with SpacyR with personalized data?

Good afternoon, I am trying to sort a large corpus of normative texts of different lengths, and to tag the parts of speech (POS). For that purpose, I was using the tm and udpipe libraries, and given ...

Sergio A. Gottret Rios

11

asked Jan 30, 2023 at 0:39

1 vote

1 answer

77 views

Error in tm package while topic modelling

I am running into an error while trying to make a corpus object from the tm package in R. The data have been scraped from a website and I have included the full code below so you can run and see how ...

I_like_insights

91

asked Dec 29, 2022 at 14:50

3 votes

1 answer

151 views

Find overlap in terms between a pair of documents

I have a sparse term-document matrix produced by tm's TermDocumentMatrix. I am trying to write a function that takes two document names and k as its arguments, finds all terms that occur in both ...

dimitriy

9,440

asked Nov 20, 2022 at 4:37

2 votes

1 answer

425 views

Using R to analyse pubmed articles. Trying to create wordcloud but also association with year of publication

MOST RECENT EDIT: I have successfully created my required data frames containing pmid,year and abstract as columns from a literature search on pubmed. I then split this data frame into many separate ...

Aidi

23

asked Oct 29, 2022 at 21:35

1 vote

1 answer

90 views

Text analysis with dictionary of words: NGramTokenizer not working

I am trying to look for a list of keywords in a text. Some of these keywords are n-grams. However, the TermDocumentMatrix will only find single words. I already had a look at several similar questions ...

gitcanzo

129

asked Sep 29, 2022 at 14:40

1 vote

1 answer

33 views

DocumentTermMatrix misses some words

I am using DocumentTermMatrix to find a list of keywords in a long text. Most of the words in my list are correctly found, but there are a couple that are missing. Now, I would love to post here a ...

gitcanzo

129

asked Sep 28, 2022 at 9:52

0 votes

1 answer

310 views

localtime() returns a pointer to a structure with uninitialized members

char datetime[DATETIME_LEN]; time_t timer; struct tm* tm_info; timer = time(NULL); tm_info = localtime(&timer); // debug: tm_info: 0xcccccccccccccccc {tm_sec=??? tm_min=??? tm_hour=??? ...} if (...

cxↄ

1,330

asked Sep 23, 2022 at 14:53

0 votes

1 answer

160 views

is package tm suitable for extracting scores from text data?

I have many cognitive assessment data stored as txt files. Each file looks like this: patient number xxxxxx score A (98) (95)ile% score B (100) (97)ile% test C score D (76) ...

Ian Wang

157

asked Aug 13, 2022 at 13:54

0 votes

0 answers

317 views

Unused argument error using stopwords in R

I am trying to clean and process post data from twitter. The initial corpus produces the following after cleaning: text_corpus[[1]]$content [1] "I actually would love if my Mad Scientist was ...

Macy

11

asked May 19, 2022 at 7:53

1 vote

1 answer

181 views

Remove Words with less than Certain Character Lengths plus Noise Reduction before Tokenization

I have the following data frame report <- data.frame(Text = c("unit 1 crosses the street", "driver 2 was speeding and saw driver# 1", "year 2019 was the ...

S Das

3,391

asked Apr 22, 2022 at 15:46

2 votes

1 answer

235 views

Remove Numbers, Punctuations, White Spaces before Tokenization

I have the following data frame report <- data.frame(Text = c("unit 1 crosses the street", "driver 2 was speeding and saw driver# 1", "year 2019 was the ...

S Das

3,391

asked Apr 22, 2022 at 15:20

0 votes

1 answer

85 views

row_sums vs findFreqTerms for subsetting TermDocMatrix to include words with a given min frequency

my question is straightforward. I have a (binary) TDM and I want to reduce the number of rows to include only those rows that appear in at least two documents: I thought that these two methods would ...

KArrow'sBest

160

asked Apr 22, 2022 at 10:29

2 votes

1 answer

211 views

Turkish characters problem while plotting graphs in R igraph

I have a dataset which includes Tweets in Turkish language. I'm trying to do text mining with tm package and plot the networks with igraph R packages. library(tm) #build corpus corpus <- iconv(...

Naim Cinar

67

asked Apr 7, 2022 at 12:05

2 votes

1 answer

103 views

How to add target variable whether to see sentence belongs to data 1 or data 2?

I am working on a project. I would like to summarize it with a similar case. I need to collect n tweets with different hashtags. Here is similar code: library(tm) #tweets from first hastag ...

Narimanoglu

61

asked Apr 6, 2022 at 13:11

0 votes

0 answers

60 views

Restore original data from document term matrix in R

I want to know if there is a way to go back to my original database (df) after I have made it a document term matrix. Here is an example of what I want to do. df <- data.frame(group=c("A",...

Sergio Parra

23

asked Mar 1, 2022 at 20:09

0 votes

0 answers

78 views

filtered content of corpus by custom function with R

I want to analyze texts filtered by a custom function (a function with parameters) using R. I used the readlines function to extract my text and I get a large list with 258 lists. Then, using VCorpus(...

Sari

5

asked Feb 13, 2022 at 8:49

0 votes

2 answers

296 views

How to create a document term incidence matrix from long format text data?

I've got data that look like this: ID word 1 blue 1 red 1 green 1 yellow 2 blue 2 purple 2 orange 2 green But I want to transform them into a binary incidence matrix denoting whether or not ...

nlplearner

115

asked Feb 2, 2022 at 0:56

1 vote

1 answer

264 views

How can I extract bigrams from text without removing the hash symbol?

I am using the following function (based on https://rpubs.com/sprishi/twitterIBM) to extract bigrams from text. However, I want to keep the hash symbol for analysis purposes. The function to clean ...

Chamil Rathnayake

129

asked Jan 8, 2022 at 17:18

0 votes

1 answer

69 views

TermDocumentMatrix Error after Cleaning Corpus

My problem is that I want to pass my corpus to the tm function termdocumentmatrix() and it fails with the error: Error in UseMethod("meta", x): no applicable method for 'meta' applied to an ...

Mauras

1

asked Dec 15, 2021 at 12:35

1 vote

2 answers

2k views

How to remove these special characters in r in a set of string : â€™s, â€¦

I have this string which contains special characters. I am not able to remove these characters from the main data frame; however, when I prepared a separate object by dft and then I use the following ...

Sachin

145

asked Dec 9, 2021 at 14:22

0 votes

1 answer

196 views

Unable to remove these characters from the data in a string in r

I am trying to remove the special characters from the following string with the help of the following code, but I am not getting the result: library(tm) v <- "rt shibxwarrior hodl trust processsome ...

Sachin

145

asked Dec 9, 2021 at 11:22

0 votes

1 answer

79 views

Text Mining: Cluster Analysis phrases. ERROR: cannot take a sample larger than the population

I'm working on a dataset of thousands of sentences. The dataset is structured by a column and k rows. I have to find some similarities between them, and I'm doing a cluster analysis. I created a corpus and ...

GIORIGO

59

asked Nov 19, 2021 at 15:50

-1 votes

1 answer

493 views

Extract table from unstructured text file in r

I have a text file namely data.txt containing multiple tables in the following format. // // TABLE ET_ARCMAT // ARCID MATID VALTO VALFR ...

abdul samad

61

asked Oct 10, 2021 at 3:16

0 votes

0 answers

128 views

Custom dictionary for word removing in R

I'd like to create a custom dictionary of words to be removed from a Corpus. I'm using the tm_map command. I'd like to start from a .txt file (like word1,word2,word3; file.txt), import it in R and ...

user15499252

asked Oct 6, 2021 at 13:06

1 vote

0 answers

23 views

Why does the clean.text() function change word frequencies?

I am doing text analysis and reading articles into R. When I use the clean.text() function from TextReg to clean the text of a corpus and then look up word frequencies using term_stats() from tm, the ...

user6542495

11

asked Sep 7, 2021 at 16:40

2 votes

1 answer

75 views

Some words won't be stemmed using tm ("easier" or "easiest")

I have a large questionnaire dataset where some of the features need to be stemmed, with the goal being to assign a topic to each response. However, I'm having trouble stemming some words using the ...

Chris Oosthuizen

99

asked Aug 7, 2021 at 17:43

0 votes

2 answers

359 views

subscript out of bounds error in document-term matrix

I am doing text mining on the following data, but I get the following error at the end: Error in `[.simple_triplet_matrix`(dtm, 1:10, 1:10) : subscript out of bounds. Can you help me address this error? ...

Cina

10.2k

asked Jul 28, 2021 at 19:05

2 votes

4 answers

2k views

Have mktime() ignore DST and local time zone in C++

Our system receives data from a vendor in ASCII format "20210715083015". This time is US Eastern time, and is already adjusted for Daylight Savings. Our backend needs this time in ...

Flyboy Wilson

49

asked Jul 21, 2021 at 21:10

1 vote

2 answers

388 views

Convert Corpus from quanteda to tm

My data mycorpus is in a quanteda-corpus (corpus-function from quanteda) which I need to convert to a corpus under the tm package. I know about quanteda's convert-function. This, though, only converts ...

arndtupb

62

asked Jul 21, 2021 at 14:08

1 vote

1 answer

36 views

tm package removeWords function concatenate words in R

I am cleaning the sample data using removeWords from the tm package, but the removeWords function concatenates the words after removal. It should be "environmental dead frog" "environmental dead ...

Dhinesh G

23

asked Jul 20, 2021 at 7:41

1 vote

2 answers

420 views

search for word/phrase from column in R

I have data that looks like this: > head(df) ID Comment 1 1 I ate dinner. 2 2 ...

user11015000

159

asked Jul 1, 2021 at 17:27

1 vote

1 answer

79 views

Lost one document during tokenization

I lost one row of data in the tokenization process. There are three documents in this data set structure(list(ID = c("N12277Y", "N12284X", "N12291W"), corrected = c("...

karyn-h

133

asked Jun 24, 2021 at 22:15

0 votes

1 answer

27 views

R TM package produces strange results with Inspect command

I'm having a little trouble with the inspect function from the tm package in R. I have a sample 2-row data.table as defined below: dt <- data.table(doc_id = c(1, 2), text = c('the driver of the 1st ...

AlexP

637

asked Jun 14, 2021 at 14:58

2 votes

1 answer

44 views

combining words in tm R is not achieving desired result

I am trying to combine a few words so that they count as one. In this example I want val and valuatin to be counted as valuation. The code I have been using to try and do this is below: #load in ...

user11015000

159

asked Jun 11, 2021 at 5:23

0 votes

1 answer

1k views

Cosine Similarity Matrix in R

I have a document term matrix, "mydtm" that I have created in R, using the 'tm' package. I am attempting to depict the similarities between each of the 557 documents contained within the dtm/...

Luke Hansen

17

asked Jun 2, 2021 at 19:47

0 votes

1 answer

186 views

Dealing with several text columns in a labeled data set while running NLP in R

Hope all of you guys are healthy and well. I am new to the world of NLP and my question may sound stupid, so I apologize in advance. I would like to perform NLP on some text data which is labeled and ...

Alex

245

asked Apr 29, 2021 at 21:11

0 votes

1 answer

65 views

R Tm package dictionary matching leads to higher frequency than actual words of text

I have been using the code below to load text as a corpus and using the tm package to clean the text. As a next step I am loading a dictionary and cleaning it as well. Then I am matching the words ...

user15721704

1

asked Apr 21, 2021 at 14:32

0 votes

1 answer

495 views

How to break a corpus into paragraphs using custom delimiters

I am scraping the New York Times webpages to do some natural language processing on them. I want to split the webpage into paragraphs when using corpus in order to do frequency counts on words that ...

SLE

85

asked Apr 9, 2021 at 14:20

0 votes

1 answer

120 views

Issue with adding breaks to a tm_object

I am having troubles with adding fixed breaks to a tm_map. I tried the same code as another topic at this forum (Customize how R tmap legend values are printed) on a different dataset, but the ...

Jelmer Visser

29

asked Jan 8, 2021 at 9:46

0 votes

2 answers

308 views

Calculating term frequencies in a big corpus efficiently regardless of document boundaries

I have a corpus of almost 2m documents. I want to calculate the term frequencies of the terms in the whole corpus, regardless of document boundaries. A naive approach would be combining all the ...

Rafs

796

asked Dec 18, 2020 at 12:37
