R: Text Analysis - tm Package - stemCompletion error

Question

Machine: Windows 7, 64-bit
R version: R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"

I am stemming some text for an analysis. Everything works up until 'stemCompletion'. For more context, see below:

Packages:

tm
SnowballC
rJava
RWeka
RWekajars
NLP

Sample list of words

test <- as.vector(c('win', 'winner', 'wins', 'wins', 'winning'))

Convert to Corpus

Test_Corpus <- Corpus(VectorSource(test))

Text manipulations

# Lower-case the text, then strip punctuation and digits
Test_Corpus <- tm_map(Test_Corpus, content_transformer(tolower))
Test_Corpus <- tm_map(Test_Corpus, removePunctuation)
Test_Corpus <- tm_map(Test_Corpus, removeNumbers)

Stemming using tm_map under the tm package

Test_stem <- tm_map(Test_Corpus, stemDocument, language = 'english')

Below is the result from stemming above, which is all correct so far:

win
winner
win
win
win
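As a quick check on this step, here is a minimal sketch that assumes the SnowballC package listed above is installed. It applies the Snowball English stemmer to the plain character vector with SnowballC::wordStem(), without building a corpus, and should give the same stems as shown above:

```r
# Assumes the SnowballC package is installed
library(SnowballC)

test <- c("win", "winner", "wins", "wins", "winning")

# Stem each word with the Snowball English stemmer
stems <- wordStem(test, language = "english")
print(stems)
```

A plain vector like this is also convenient for the completion step, because stemCompletion() expects a character vector of stems as its first argument.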

Now comes the issue. I try to use Test_Corpus as a dictionary to complete the stems back to full words with the following code:

Test_complete <- tm_map(Test_stem, stemCompletion, Test_Corpus)

Below are the warning messages I get:

Warning messages:

1: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
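The warnings come from the grep(sprintf("^%s", w), dictionary, value = TRUE) call quoted in each message. That call expects w to be a single stem. If w holds more than one string, sprintf() returns one pattern per element and grep() uses only the first. This presumably happens here because tm_map hands stemCompletion a whole document rather than a character vector of words. The base-R sketch below reproduces only that mechanism. The variable names are illustrative and are not taken from tm:

```r
dictionary <- c("win", "winner", "wins", "wins", "winning")

# One stem gives one pattern, which is what the grep() call expects
w <- "winn"
single <- grep(sprintf("^%s", w), dictionary, value = TRUE)

# Several stems give several patterns
w_vec <- c("winner", "win")
patterns <- sprintf("^%s", w_vec)  # "^winner" "^win"

# grep() warns and matches only the first pattern, "^winner"
multi <- withCallingHandlers(
  grep(patterns, dictionary, value = TRUE),
  warning = function(cond) {
    message("Warning caught: ", conditionMessage(cond))
    invokeRestart("muffleWarning")
  }
)

print(single)
print(multi)
```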

I have tried several fixes suggested in previous posts by people with the same problem, with no luck:

Updating Java
Using content_transformer
Using PlainTextDocument

I'm not sure your formatting is doing what you think it is. Indent for code blocks (including comments) and try to avoid overuse of headers. — Nathan Tuggy, Commented Feb 20, 2015 at 1:19

saldaihani · Accepted Answer · 2015-02-20 03:17:46Z

0

I think you need to save your Test_Corpus as a dictionary before the stemming process. You could try something like Test_Corpus <- corpus, then start the stemming, and use corpus later on in Test_complete <- tm_map(corpus, stemCompletion).
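Along the lines of this answer, here is a minimal sketch that assumes tm and SnowballC are installed. It keeps an unstemmed copy of the words as the dictionary and stems a plain character vector. It then calls stemCompletion() directly on that vector instead of through tm_map, because stemCompletion() takes a character vector of stems plus a dictionary. The names dictionary, stems and completed are illustrative choices, not from the original post:

```r
# Assumes the tm and SnowballC packages are installed
library(tm)
library(SnowballC)

test <- c("win", "winner", "wins", "wins", "winning")

# Keep an untouched copy of the original words to act as the dictionary
dictionary <- test

# Stem a plain character vector (same stems as the corpus pipeline)
stems <- wordStem(test, language = "english")

# Complete each stem against the dictionary; "prevalent" picks the
# most frequent dictionary word that starts with the stem
completed <- stemCompletion(stems, dictionary = dictionary, type = "prevalent")
print(completed)

# If a corpus is needed again, wrap the completed words back up
Test_complete <- Corpus(VectorSource(unname(completed)))
```

The result is a character vector named by the stems, so it is easy to inspect before turning it back into a corpus.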


So changing the name of the corpus at the stemming step does the same thing, right?
– Jacob Johnston
Commented Feb 20, 2015 at 20:12
