
I am performing spam detection and want to visualize the spam and ham keywords in separate word clouds. Here's how I load my .csv file:

import pandas as pd

data = pd.read_csv("spam.csv", encoding='latin-1')
data = data.rename(columns = {"v1":"label", "v2":"message"})
data = data.replace({"spam":"1","ham":"0"})

data.head()

Here's my code for WordCloud. I need help with spam_words. I cannot generate the right graph.

import matplotlib.pyplot as plt
from wordcloud import WordCloud 

spam_words = ' '.join(list(data[data['label'] == 1 ]['message']))
spam_wc = WordCloud(width = 512, height = 512).generate(spam_words)

plt.figure(figsize = (10,8), facecolor = 'k')
plt.imshow(spam_wc)
plt.axis('off')
plt.tight_layout(pad = 0)
plt.show()
  • Specifically what is wrong with your current output? Also, you posted the names of your csv files, but it would help if you posted the first few lines of the actual data.
    Commented Apr 7, 2018 at 17:22
  • Hello @Peter, I want the spam_words variable to only take in messages that were labelled spam. Currently, it is taking in all the messages and showing me a combined wordcloud of spam and ham.
    – Prashant
    Commented Apr 7, 2018 at 17:27
  • @PeterLeimbigler I would like to know if you need more information about the question.
    – Prashant
    Commented Apr 7, 2018 at 18:11

1 Answer


The issue is a type mismatch: the current code replaces "spam" and "ham" with the one-character strings "1" and "0", but you then filter the DataFrame by comparing against the integer 1. In Python, "1" == 1 is False, so that comparison never matches a row labelled spam. Change the replace line to this:

data = data.replace({"spam": 1, "ham": 0})
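
To see the difference, here is a minimal sketch on a small made-up DataFrame. The four rows below are hypothetical stand-ins for spam.csv, whose contents are not shown in the question, and the variable names as_strings, as_ints, and ham_words are only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for spam.csv; the real file is not shown in the question.
# dtype=object matches what read_csv gives text columns in pandas 1.x/2.x.
data = pd.DataFrame(
    {
        "label": ["ham", "spam", "ham", "spam"],
        "message": [
            "see you at lunch",
            "win a free prize now",
            "call me later",
            "free entry claim your prize",
        ],
    },
    dtype=object,
)

# Original approach: labels become the strings "1" and "0",
# so comparing against the integer 1 matches no rows.
as_strings = data.replace({"spam": "1", "ham": "0"})
matches_with_strings = int((as_strings["label"] == 1).sum())

# Fixed approach: labels become the integers 1 and 0,
# so the same comparison selects only the spam rows.
as_ints = data.replace({"spam": 1, "ham": 0})
spam_words = " ".join(as_ints.loc[as_ints["label"] == 1, "message"])
ham_words = " ".join(as_ints.loc[as_ints["label"] == 0, "message"])

print(matches_with_strings)
print(spam_words)
print(ham_words)
```

With the integer labels, spam_words and ham_words can each be passed to WordCloud(...).generate(...) as in the question to get the two separate clouds.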