Stacking Ensemble with Keras and Sklearn

Berkedilekoglu
Jan 25, 2023

Hello folks! In this article, I will explain how to use a stacking classifier to combine the predictions of a deep learning model with those of other models and improve the overall performance of the ensemble. We can do this easily with Keras and scikit-learn.

  1. What is a Stacking Ensemble?

Stacking ensemble is a technique in which multiple models are trained to make predictions, and the predictions of these models are then combined to make a final prediction. This is often done by training a “meta-model” to take as input the predictions of the base models and output a final prediction. The idea behind stacking is that the combination of predictions from multiple models can lead to better performance than any single model alone.

The idea behind the stacking scheme

As the figure above shows, a stacking ensemble uses the predictions of different models to train a final classifier. One important point to note: if we use a Keras model within the sklearn stacking classifier, we need to specify the shape of its input. As the figure illustrates, the data fed into our neural network no longer has the same dimension as our training data; it is built from the base models' predictions.
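To make that concrete, here is a minimal, hand-rolled sketch of the stacking idea; the dataset and base models are placeholders, not from this article. We collect out-of-fold class probabilities from each base model and stack them column-wise, which is essentially the input the meta-model is trained on:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Toy multiclass problem: 200 samples, 10 features, 3 classes
X, y = make_classification(n_samples=200, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

base_models = [LogisticRegression(max_iter=1000), KNeighborsClassifier()]

# Out-of-fold class probabilities from each base model, concatenated
# column-wise: this is (roughly) what the meta-model is trained on.
meta_features = np.hstack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")
    for m in base_models
])

print(X.shape)              # (200, 10) <- original feature space
print(meta_features.shape)  # (200, 6)  <- 2 models x 3 classes

Note how the meta-model's input width is determined by the number of base models and classes, not by the original feature count.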

2. Packages for Stacking Ensembles with Keras and Sklearn

To use TensorFlow models with the scikit-learn library, we need a wrapper. Older versions of TensorFlow shipped such a wrapper inside the library itself, but it has since been deprecated and removed.

# The old wrapper, no longer available in recent TensorFlow releases
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

If you use the old wrapper, you may run into problems. One of the most common is the “The estimator should be a classifier” error, which occurs when the wrapper does not properly expose the TensorFlow model as a scikit-learn compatible estimator. To avoid this, use the KerasClassifier from the scikeras package instead. Its wrapper classes are designed for TensorFlow 2.x and later and should prevent compatibility issues with scikit-learn. You can install scikeras with pip:

# To install scikeras
!pip install scikeras[tensorflow]

# To use wrapper
from scikeras.wrappers import KerasClassifier

For more information, please see the documentation!

3. Code Example for Stacking Classifier

3.1 Stacking Classifier with ML libraries

After creating the wrapper, we can define our model's parameters and make it ready to use within sklearn. These parameters include the model architecture, the optimizer, the loss function, the metrics, and any other hyperparameters the model might have. Once they are defined, we can pass the wrapped model to sklearn's StackingClassifier and use it just like any other scikit-learn model.

Now let's define a StackingClassifier in Python. I will use CatBoostClassifier and KNN as estimators and XGBoost as the final estimator:

from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

knn = KNeighborsClassifier()
xgb = XGBClassifier()
cat = CatBoostClassifier()

# Base estimators; their predictions become the final estimator's input
clf = [('knn', knn), ('cat', cat)]
stack_model = StackingClassifier(estimators=clf, final_estimator=xgb)
stack_model.fit(X_train, y_train)
predictions = stack_model.predict_proba(X_test)

As the example shows, each model is defined in advance and passed to the StackingClassifier. Therefore, if you are using a Keras model, you need to define it in advance through a wrapper. This step matters because it lets the StackingClassifier treat the Keras model like any scikit-learn estimator, including its input and output shapes and how it produces predictions. Once the Keras model is wrapped and defined, it can be used alongside the other models in the StackingClassifier just like any other scikit-learn model.

3.2 Define Keras Model with KerasClassifier

from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from scikeras.wrappers import KerasClassifier

outputs = 3  # let's say we have 3 classes

def my_model(input_shape):
    activation_func = 'relu'

    model = keras.Sequential([
        Input(shape=(input_shape,)),

        Dense(units=128, activation=activation_func),
        Dropout(rate=0.15),
        BatchNormalization(),
        Dense(units=64, activation=activation_func),
        Dropout(rate=0.15),
        BatchNormalization(),

        Dense(outputs, activation='softmax'),
    ])

    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer=Adam(learning_rate=0.001),
        metrics=['accuracy'])

    return model

input_shape = X_train.shape[1]  # number of features in the training data
keras_model = KerasClassifier(model=my_model(input_shape), batch_size=128, epochs=280, verbose=0)

By using the KerasClassifier, we have defined our model and, as you can see, we keep control over all of its parameters. We can therefore also use the KerasClassifier within scikit-learn to fine-tune the model, which means we can rely on scikit-learn's built-in hyperparameter tuning and cross-validation utilities to optimize the performance of our Keras model.
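As a quick illustration, here is a minimal sketch of such tuning; the grid values are placeholders, and it assumes my_model, X_train, and y_train from above. One detail worth noting: scikeras routes parameters prefixed with model__ to the model-building function, so here we pass the function itself rather than an already-built model:

from sklearn.model_selection import GridSearchCV
from scikeras.wrappers import KerasClassifier

# Pass the build function itself; scikeras forwards `model__*`
# parameters (here: input_shape) to it when building the network.
tunable_model = KerasClassifier(
    model=my_model,
    model__input_shape=X_train.shape[1],
    batch_size=128,
    epochs=50,
    verbose=0,
)

# Placeholder grid: tune the usual fit-time hyperparameters
param_grid = {
    'batch_size': [64, 128],
    'epochs': [50, 100],
}

search = GridSearchCV(tunable_model, param_grid, cv=3, scoring='accuracy')
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)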

3.3 Stacking Classifier with ML libraries and Keras Model

Now we will define the StackingClassifier with a neural network as the final estimator. The important thing here is the input shape. There are two possibilities:

  1. If our Keras model is the final estimator, we must make sure its input shape is specified correctly so that the StackingClassifier can use the base models’ predictions to train it. For instance, assume we have tabular data of shape (sampleNumber, featureNumber). If our classification problem has 3 classes and we have 3 estimators such as XGBoost, KNN, and CatBoost, each classifier predicts 3 probabilities per sample, and when these are concatenated our new input shape will be 3 x 3 = 9. In general, it is the number of estimators x the number of classes.
  2. If our Keras model is one of the stacked (base) estimators, its input shape should be the same as the feature dimension of the input data.

To better understand this, let’s create a model that includes both possibilities. Let’s say “keras_model_estimator” is used as the estimator inside the stacked ensemble and “keras_model_final_estimator” is the final estimator of our StackingClassifier. Our other stacked estimators are XGBoost, KNN and CatBoost.

from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from scikeras.wrappers import KerasClassifier
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam

outputs = 3  # let's say we have 3 classes

def my_model(input_shape):
    activation_func = 'relu'

    model = keras.Sequential([
        Input(shape=(input_shape,)),

        Dense(units=128, activation=activation_func),
        Dropout(rate=0.15),
        BatchNormalization(),
        Dense(units=64, activation=activation_func),
        Dropout(rate=0.15),
        BatchNormalization(),

        Dense(outputs, activation='softmax'),
    ])

    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer=Adam(learning_rate=0.001),
        metrics=['accuracy'])

    return model

knn = KNeighborsClassifier()
xgb = XGBClassifier()
cat = CatBoostClassifier()

# Base Keras model: sees the raw features, like the other estimators
input_shape_estimator = X_train.shape[1]  # with respect to the training data
keras_model_estimator = KerasClassifier(model=my_model(input_shape_estimator),
                                        batch_size=128, epochs=280, verbose=0)

clf = [('keras_estimator', keras_model_estimator), ('xgb', xgb), ('knn', knn), ('cat', cat)]

# Final Keras model: sees the concatenated base predictions,
# i.e. number of estimators x number of classes = 4 x 3 = 12 features
input_shape_final = len(clf) * outputs
keras_model_final_estimator = KerasClassifier(model=my_model(input_shape_final),
                                              batch_size=128, epochs=280, verbose=0)

stack_model = StackingClassifier(estimators=clf, final_estimator=keras_model_final_estimator)
stack_model.fit(X_train, y_train)
predictions = stack_model.predict_proba(X_test)
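As an optional sanity check (assuming the fitted stack_model from above), StackingClassifier exposes the meta-features it feeds to the final estimator through its transform method, so we can verify that the dimension matches the formula:

# The meta-features the final estimator is trained on
meta = stack_model.transform(X_test)
print(meta.shape)  # (n_samples, 12) -> 4 estimators x 3 classes
assert meta.shape[1] == len(clf) * outputs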

Conclusion

In this article, we have seen how to use Keras models with sklearn's StackingClassifier. The same approach can also be applied to regression. With a stacking ensemble, we can achieve better results than any individual model, and the key is to combine models that make errors in different places: the more diversity in the models you use, the more effective your ensembling strategy will be. Additionally, after setting up your StackingClassifier, you can fine-tune each model to achieve better results.

References:

  1. https://www.adriangb.com/scikeras/stable/migration.html
  2. https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/
  3. https://www.scitepress.org/papers/2017/62912/62912.pdf
