Machine: Windows 7 - 64 bit
R Version: R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
I am stemming some text for an analysis, and everything works up until 'stemCompletion'. For more context, please see below.
Packages:
- tm
- SnowballC
- rJava
- RWeka
- RWekajars
- NLP
Sample list of words
test <- as.vector(c('win', 'winner', 'wins', 'wins', 'winning'))
Convert to Corpus
Test_Corpus <- Corpus(VectorSource(test))
Text manipulations
Test_Corpus <- tm_map(Test_Corpus, content_transformer(tolower))
Test_Corpus <- tm_map(Test_Corpus, removePunctuation)
Test_Corpus <- tm_map(Test_Corpus, removeNumbers)
Stemming using tm_map under the tm package
>Test_stem <- tm_map(Test_Corpus, stemDocument, language = 'english' )
Below is the result from stemming above, which is all correct so far:
- win
- winner
- win
- win
- win
Now comes the issue! I try to use Test_Corpus as a dictionary to complete the stems back to full words, using the following code:
>Test_complete <- tm_map(Test_stem, stemCompletion, Test_Corpus)
Below are the warning messages that I am getting:
Warning messages:
1: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
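For what it is worth, the same warning text can be reproduced in base R without tm. This is only a minimal sketch: the names `w` and `dictionary` are taken from the warning above, but the values assigned to them here are assumptions for illustration, not what tm actually passes internally.

```r
# Dictionary of complete words (the same sample words as above)
dictionary <- c('win', 'winner', 'wins', 'wins', 'winning')

# Assumed input: several stems handed over at once instead of one at a time
w <- c('win', 'winner')

# grep() accepts a single pattern, so a pattern vector of length > 1
# raises a warning; capture its message instead of printing it
warn_msg <- tryCatch(
  {
    grep(sprintf("^%s", w), dictionary, value = TRUE)
    NA_character_  # reached only if no warning was raised
  },
  warning = function(cond) conditionMessage(cond)
)
print(warn_msg)

# With a single stem the same call runs without a warning
single <- grep(sprintf("^%s", "winner"), dictionary, value = TRUE)
print(single)
```

If this is what is happening, it would suggest that each call receives more than one string as `w`, rather than a single stem.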
I have tried several things suggested in previous posts, and I have seen that other people with the same problem tried them with no luck either. Below is a list of those things:
- Updated Java
- Used content_transformer
- Used PlainTextDocument
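For comparison, here is a minimal sketch of calling stemCompletion directly on a character vector of stems instead of going through tm_map. It assumes the tm and SnowballC packages are installed, and it uses the original word vector as the dictionary. The names `stems` and `completed` are made up for this example, and it is not a confirmed fix for the corpus-based call above.

```r
library(tm)         # provides stemCompletion()
library(SnowballC)  # provides wordStem()

# Same sample words as above
test <- c('win', 'winner', 'wins', 'wins', 'winning')

# Stem the plain character vector (no Corpus involved)
stems <- wordStem(test, language = 'english')

# Complete each stem against the original words used as the dictionary
completed <- stemCompletion(stems, dictionary = test)
print(completed)
```

Here stemCompletion receives a character vector of single stems, which is the input its documentation describes.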