Confusion matrix vs error curves of “plot.randomForest” function in R

by davo.biainili   Last Updated August 14, 2019 01:19 AM

I am building a random forest for a classification problem, using the randomForest package in R. The error curves produced by the plot.randomForest function do not agree with the confusion matrix I get when I predict the training data itself (rather than a test set). My intuition was that predicting the training set would yield misclassification rates in the confusion matrix similar to the curves drawn by plot.randomForest, but the two tell different stories.

I am not sure why this happens, but my gut feeling is that the curves from plot.randomForest are based on the out-of-bag (OOB) error, which is why they indicate lower accuracy than the confusion matrix (this is only a conjecture and may well be wrong). I would appreciate it if someone could tell me what, if anything, I am missing.

Here is a reproducible example using the iris data.

library(datasets)
library(gmodels)
library(randomForest)

data(iris)

set.seed(123)
# Fit a random forest on all four predictors
rf.train = randomForest(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                        data = iris,
                        ntree = 50,
                        importance = TRUE)

plot(rf.train, main = "Error Rate vs Number of Trees In the Forest")
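If my conjecture is right, the curves above are drawn from the err.rate component of the fitted object, which (as far as I understand) stores the OOB and per-class error rates tree by tree, so the last row should match where the curves end at 50 trees:

# Last row of the OOB / per-class error rates that (I believe) the plot is drawn from
tail(rf.train$err.rate, 1)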

# Predict the training data itself (resubstitution)
predictions = predict(rf.train, newdata = iris)
mydata_with_predictions = cbind(iris, predictions)

# Confusion matrix: actual vs. predicted species on the training data
CrossTable(mydata_with_predictions$Species,
           mydata_with_predictions$predictions,
           prop.chisq = FALSE,
           prop.t = FALSE)
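For comparison, the fitted object also stores its own confusion matrix, which I understand is computed from the OOB predictions rather than from re-predicting the training data, so I would expect it to look worse than the resubstitution table above:

# OOB-based confusion matrix stored in the fitted randomForest object
rf.train$confusion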

