I am training a model on an imbalanced dataset (the positive class makes up about 5-20% of the observations) and trying out different algorithms in R using the caret package. I have 57 predictors and around 2000-3000 observations in my training dataset.
So far, I have tried several models and produced ROC and precision-recall (PR) plots, with their AUCs, for these models:
I see a lot of criticism of using stepwise logistic regression in R, and I do understand that there are indeed a lot of problems with it. At the same time, I see that it is doing rather well here, and I am not sure how to interpret that. Could it be that I am doing something wrong when training the other models?
I am using repeated 5-fold cross-validation:
    library(caret)

    # Repeated 5-fold CV, scored by ROC AUC on class probabilities
    objControl <- trainControl(method = 'repeatedcv', number = 5, repeats = 5,
                               summaryFunction = twoClassSummary,
                               classProbs = TRUE)

    # verbose and train.fraction are not train() arguments; they are passed on to gbm
    gbm_fit <- train(training[, predictors, drop = FALSE], training[[bm_name]],
                     method = 'gbm', verbose = TRUE,
                     trControl = objControl, metric = "ROC",
                     preProc = c("center", "scale"),
                     train.fraction = 0.5)
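For reference, here is a minimal base-R sketch of what I understand repeated 5-fold cross-validation with a ROC AUC per resample to be doing. It is only an illustration under assumptions: the data are simulated (not my real dataset), the model is a plain `glm` rather than any of the models above, and `auc_roc` and `stratified_folds` are hypothetical helper names I made up for this example, not caret functions.

```r
# Simulated stand-in data with a minority positive class (on the order of 10%)
set.seed(42)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-2.5 + 1.0 * x1 + 0.5 * x2))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Rank-based (Mann-Whitney) estimate of the area under the ROC curve
auc_roc <- function(score, label) {
  r     <- rank(score)               # ties get average ranks
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Assign fold ids within each class so every fold keeps the class ratio
stratified_folds <- function(label, k) {
  fold <- integer(length(label))
  for (cls in unique(label)) {
    idx <- which(label == cls)
    fold[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  fold
}

k       <- 5
repeats <- 5
aucs    <- numeric(0)
for (r in seq_len(repeats)) {
  fold <- stratified_folds(dat$y, k)   # fresh random split for each repeat
  for (f in seq_len(k)) {
    # Fit on the other k - 1 folds, score on the held-out fold
    fit <- glm(y ~ x1 + x2, data = dat[fold != f, ], family = binomial())
    p   <- predict(fit, newdata = dat[fold == f, ], type = "response")
    aucs <- c(aucs, auc_roc(p, dat$y[fold == f]))
  }
}

# One AUC per resample (k * repeats values); their spread matters as much as the mean
c(mean = mean(aucs), sd = sd(aucs))
```

The point of keeping all 25 per-resample values is that, with few positives in each held-out fold, the AUC can vary noticeably between resamples, so small differences between models in the mean may not be meaningful.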
Any guidance is highly appreciated.