--- title: "Version 4.0 New Features" author: "Zach Deane-Mayer" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Version 4.0 New Features} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", echo = TRUE, warning = FALSE, message = FALSE ) set.seed(42L) ``` caretEnsemble 4.0.0 introduces many new features! Let's quickly go over them. # Multiclass support caretEnsemble now fully supports multiclass problems: ```{r} model_list <- caretEnsemble::caretList( x = iris[, 1L:4L], y = iris[, 5L], methodList = c("rpart", "rf") ) print(summary(model_list)) ``` # Greedy Optimizer in caretEnsemble The new version uses a greedy optimizer by default, ensuring the ensemble is never worse than the worst single model: ```{r} ens <- caretEnsemble::caretEnsemble(model_list) print(summary(ens)) ``` # Enhanced S3 Methods caretStack (and by extension, caretEnsemble) now supports various S3 methods: ```{r} print(ens) print(summary(ens)) ``` ```{r, fig.alt="A dot and whisker plot of ROC for glmnet, rpart, and an ensemble. The ensemble has the highest ROC and is slighly better than the glmnet. The rpart model is bad."} plot(ens) ``` ```{r, fig.alt="A 4-panel plot for glmnet, rpart, and an ensemble. The ensemble has the highest ROC and is slighly better than the glmnet. The rpart model is bad. The glmnet has the highest weight, and the residuals look biased."} ggplot2::autoplot(ens) ``` # Improved Default trainControl A new default trainControl constructor makes it easier to build appropriate controls for caretLists. These controls include explicit indexes based on the target, return stacked predictions, and use probability estimates for classification models. ```{r} class_control <- caretEnsemble::defaultControl(iris$Species) print(ls(class_control)) ``` ```{r} reg_control <- caretEnsemble::defaultControl(iris$Sepal.Length) print(ls(reg_control)) ``` # Mixed Resampling Strategies Models with different resampling strategies can now be ensembled: ```{r} y <- iris[, 1L] x <- iris[, 2L:3L] flex_list <- caretEnsemble::caretList( x = x, y = y, methodList = c("rpart", "rf"), trControl = caretEnsemble::defaultControl(y, number = 3L) ) flex_list$glm_boot <- caret::train( x = x, y = y, method = "glm", trControl = caretEnsemble::defaultControl(y, method = "boot", number = 25L) ) flex_ens <- caretEnsemble::caretEnsemble(flex_list) print(flex_ens) ``` # Mixed Model Types caretEnsemble now allows ensembling of mixed lists of classification and regression models: ```{r} X <- iris[, 1L:4L] target_class <- iris[, 5L] target_reg <- as.integer(iris[, 5L] == "virginica") ctrl_class <- caretEnsemble::defaultControl(target_class) ctrl_reg <- caretEnsemble::defaultControl(target_reg) model_class <- caret::train(iris[, 1L:4L], target_class, method = "rf", trControl = ctrl_class) model_reg <- caret::train(iris[, 1L:4L], target_reg, method = "rf", trControl = ctrl_reg) mixed_list <- caretEnsemble::as.caretList(list(class = model_class, reg = model_reg)) mixed_ens <- caretEnsemble::caretEnsemble(mixed_list) print(mixed_ens) ``` # Transfer Learning caretStack now supports transfer learning for ensembling models trained on different datasets: ```{r} train_idx <- sample.int(nrow(iris), 100L) train_data <- iris[train_idx, ] new_data <- iris[-train_idx, ] model_list <- caretEnsemble::caretList( x = train_data[, 1L:4L], y = train_data[, 5L], methodList = c("rpart", "rf") ) transfer_ens <- caretEnsemble::caretEnsemble( model_list, new_X = new_data[, 1L:4L], new_y = new_data[, 5L] ) print(transfer_ens) ``` We can also predict on new data: ```{r} preds <- predict(transfer_ens, newdata = head(new_data)) knitr::kable(preds, format = "markdown") ``` # Permutation Importance Permutation importance is now the default method for variable importance in caretLists and caretStacks: ```{r} importance <- caret::varImp(transfer_ens) print(round(importance, 2L)) ``` Note that the ensemble uses rpart to classify the easy class (setosa) and then uses the rf to distinguish between the 2 more difficult classes. This completes our demonstration of the key new features in caretEnsemble 4.0. These enhancements provide greater flexibility, improved performance, and easier usage for ensemble modeling in R.