Bag of Tricks For Text Classification
Table 1: Test accuracy [%] on sentiment datasets. fastText has been run with the same parameters for all the datasets: it has 10 hidden units and we evaluate it with and without bigrams. For char-CNN, we show the best reported numbers without data augmentation.
Table 2: Training time for a single epoch on sentiment analysis datasets compared to char-CNN and
VDCNN.
worse than VDCNN. Note that we can increase the accuracy slightly by using more n-grams: for example, with trigrams the performance on Sogou goes up to 97.1%. Finally, Figure 3 shows that our method is competitive with the methods presented in Tang et al. (2015). We tune the hyper-parameters on the validation set and observe that using n-grams up to 5 leads to the best performance. Unlike Tang et al. (2015), fastText does not use pre-trained word embeddings, which may explain the 1% difference in accuracy.

Methods using convolutions are several orders of magnitude slower than fastText. While it is possible to get a 10× speed-up for char-CNN by using more recent CUDA implementations of convolutions, fastText takes less than a minute to train on these datasets. The GRNNs method of Tang et al. (2015) takes around 12 hours per epoch on a CPU with a single thread. Our speed-up compared to neural-network-based methods increases with the size of the dataset, going up to at least a 15,000× speed-up.
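The n-gram features discussed above can be illustrated with a short sketch. This is not the authors' C++ implementation: the function name and the bucket size are illustrative assumptions, showing only the hashing trick that keeps the n-gram vocabulary bounded while adding word-order information to the bag of words.

```python
def ngram_hashes(tokens, n_max=2, bucket=2_000_000):
    """Map word n-grams (n = 2..n_max) to hashed feature ids.

    Illustrative sketch only: `bucket` (the number of hash bins)
    is an assumed value, not the paper's setting. Unigrams are
    assumed to be indexed separately via a regular vocabulary.
    """
    ids = []
    for n in range(2, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            # Hashing avoids storing an explicit n-gram vocabulary.
            ids.append(hash(gram) % bucket)
    return ids

# Example: "the cat sat" yields the bigrams "the cat" and "cat sat".
features = ngram_hashes("the cat sat".split(), n_max=2)
```

Raising `n_max` (up to 5 in the tuning described above) adds longer n-grams at the cost of more hashed features per sentence.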
Table 4: Examples from the validation set of the YFCC100M dataset obtained with fastText with 200 hidden units and bigrams. We show a few correct and incorrect tag predictions.