Source Link
phiver
  • 23.6k
  • 14
  • 47
  • 58

There are several ways to retrieve your vector.

The simple way, not recommended except for a quick test:

my_doc <- inspect(dtm[dtm$dimnames$Docs == "181288",])

Doing it this way limits you to what inspect does, and inspect only shows a maximum of 10 documents.
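A minimal sketch of pulling one document's full term vector without relying on inspect's printout: subset the sparse dtm to the one document first, then convert only that row to a dense matrix. It assumes tm 0.7 or later (where DataframeSource expects doc_id and text columns); the three toy documents and their IDs are hypothetical stand-ins for your data.

```r
library(tm)

# Hypothetical toy corpus; doc_id values stand in for your real document IDs
docs_df <- data.frame(
  doc_id = c("181288", "182465", "190001"),
  text   = c("apple banana apple", "banana cherry", "cherry durian"),
  stringsAsFactors = FALSE
)
dtm <- DocumentTermMatrix(VCorpus(DataframeSource(docs_df)))

# Subset the sparse dtm to one document, then densify only that single row
one_doc <- as.matrix(dtm[Docs(dtm) == "181288", ])
my_vector <- one_doc[1, ]        # named numeric vector: term -> count

# Keep only the terms that actually occur in this document
my_vector[my_vector > 0]
```

Because only one row is converted with as.matrix, this stays cheap even when the full dtm is far too large to densify.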

A better way is to create a selection vector of document IDs and use it to filter the dtm. This keeps the sparse matrix format; you can then transform what you need into a data.frame for further manipulation.

my_selection <- c("181288", "182465") 

# selection in case of dtm
my_dtm_selection <- dtm[dtm$dimnames$Docs %in% my_selection, ]

# selection in case of tdm
my_tdm_selection <- tdm[, tdm$dimnames$Docs %in% my_selection]

# create data.frame with document names as first column, followed by the terms
my_df_selection <- data.frame(docs = Docs(my_dtm_selection), as.matrix(my_dtm_selection))
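The selection steps above can be tried end to end on a small reproducible corpus. This is a sketch under the same assumptions (tm 0.7 or later, hypothetical toy documents and IDs); it also shows that selecting rows of a dtm and columns of a tdm gives the same counts, just transposed.

```r
library(tm)

# Hypothetical toy corpus with document IDs taken from the doc_id column
docs_df <- data.frame(
  doc_id = c("181288", "182465", "190001"),
  text   = c("apple banana apple", "banana cherry", "cherry durian"),
  stringsAsFactors = FALSE
)
corpus <- VCorpus(DataframeSource(docs_df))
dtm <- DocumentTermMatrix(corpus)   # documents are rows
tdm <- TermDocumentMatrix(corpus)   # documents are columns

my_selection <- c("181288", "182465")

# Filter rows of the dtm, or columns of the tdm; both stay sparse
my_dtm_selection <- dtm[dtm$dimnames$Docs %in% my_selection, ]
my_tdm_selection <- tdm[, tdm$dimnames$Docs %in% my_selection]

# Same counts either way once the tdm is transposed
same_counts <- all(as.matrix(my_dtm_selection) == t(as.matrix(my_tdm_selection)))

# Document names as the first column, followed by one column per term
my_df_selection <- data.frame(docs = Docs(my_dtm_selection),
                              as.matrix(my_dtm_selection))
```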

As for your second question: yes, the matrix is almost empty, or, better put, most of its cells are zero. But if you have a lot of documents and terms, the number of non-zero cells may still be larger than you think.
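To put a number on "almost empty", you can compare the cells a dtm actually stores with the cells a dense matrix of the same shape would need. A dtm is a slam simple_triplet_matrix, which keeps only the non-zero counts in its v component. A minimal sketch on hypothetical toy data, assuming tm 0.7 or later:

```r
library(tm)

# Hypothetical toy corpus: 3 documents over 4 distinct terms
docs_df <- data.frame(
  doc_id = c("181288", "182465", "190001"),
  text   = c("apple banana apple", "banana cherry", "cherry durian"),
  stringsAsFactors = FALSE
)
dtm <- DocumentTermMatrix(VCorpus(DataframeSource(docs_df)))

n_cells   <- nDocs(dtm) * nTerms(dtm)  # cells a dense matrix would need
n_nonzero <- length(dtm$v)             # cells the sparse dtm actually stores
sparsity  <- 1 - n_nonzero / n_cells   # share of cells that are zero
```

On a real corpus the sparsity is usually far above this toy value of 0.5, while n_nonzero can still be a large absolute number; the Sparsity line tm prints for a dtm should report the same ratio.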
