.. _sec_tabularcustommodel:

Adding a custom model to AutoGluon
==================================

**Tip**: If you are new to AutoGluon, review :ref:`sec_tabularquick` to learn the basics of the AutoGluon API.

This tutorial describes how to add a custom model to AutoGluon that can be trained, hyperparameter-tuned, and ensembled alongside the default models (`default model documentation <../../api/autogluon.tabular.models.html#module-autogluon.tabular.models>`__).

In this example, we create a custom Random Forest model for use in AutoGluon. All models in AutoGluon inherit from the AbstractModel class (`AbstractModel source code <../../_modules/autogluon/core/models/abstract/abstract_model.html>`__) and must follow its API to work alongside other models.

Note that while this tutorial provides a basic model implementation, it does not cover many of the aspects used in the fully implemented models. To best understand how to implement more advanced functionality, refer to the `source code <../../api/autogluon.tabular.models.html#module-autogluon.tabular.models>`__ of the following models:

=================================================== =====================================================================================================================================================================
Functionality                                       Reference Implementation
=================================================== =====================================================================================================================================================================
Respecting time limit / early stopping logic        `LGBModel <../../_modules/autogluon/tabular/models/lgb/lgb_model.html#LGBModel>`__ and `RFModel <../../_modules/autogluon/tabular/models/rf/rf_model.html#RFModel>`__
Respecting memory usage limit                       LGBModel and RFModel
Sample weight support                               LGBModel
Validation data and eval_metric usage               LGBModel
GPU training support                                LGBModel
Save / load logic of non-serializable models        `NNFastAiTabularModel <../../_modules/autogluon/tabular/models/fastainn/tabular_nn_fastai.html#NNFastAiTabularModel>`__
Advanced problem type support (Softclass, Quantile) RFModel
Text feature type support                           `TextPredictorModel <../../_modules/autogluon/tabular/models/text_prediction/text_prediction_v1_model.html#TextPredictorModel>`__
Image feature type support                          `ImagePredictorModel <../../_modules/autogluon/tabular/models/image_prediction/image_predictor.html#ImagePredictorModel>`__
Lazy import of package dependencies                 LGBModel
Custom HPO logic                                    LGBModel
=================================================== =====================================================================================================================================================================

Implementing a custom model
---------------------------

Here we define the custom model we will be working with for the rest of the tutorial. The most important methods that must be implemented are ``_fit`` and ``_preprocess``. To compare with the official AutoGluon Random Forest implementation, see the `RFModel <../../_modules/autogluon/tabular/models/rf/rf_model.html#RFModel>`__ source code. Follow along with the code comments to better understand how the code works.
.. code:: python

    import numpy as np
    import pandas as pd

    from autogluon.core.models import AbstractModel
    from autogluon.features.generators import LabelEncoderFeatureGenerator

    class CustomRandomForestModel(AbstractModel):
        def __init__(self, **kwargs):
            # Simply pass along kwargs to parent, and init our internal `_feature_generator` variable to None
            super().__init__(**kwargs)
            self._feature_generator = None

        # The `_preprocess` method takes the input data and transforms it to the internal representation usable by the model.
        # `_preprocess` is called by `preprocess` and is used during model fit and model inference.
        def _preprocess(self, X: pd.DataFrame, is_train=False, **kwargs) -> np.ndarray:
            print(f'Entering the `_preprocess` method: {len(X)} rows of data (is_train={is_train})')
            X = super()._preprocess(X, **kwargs)
            if is_train:
                # X will be the training data.
                self._feature_generator = LabelEncoderFeatureGenerator(verbosity=0)
                self._feature_generator.fit(X=X)
            if self._feature_generator.features_in:
                # This converts categorical features to numeric via stateful label encoding.
                X = X.copy()
                X[self._feature_generator.features_in] = self._feature_generator.transform(X=X)
            # Add a fillna call to handle missing values.
            # Some algorithms will be able to handle NaN values internally (LightGBM).
            # In those cases, you can simply pass the NaN values into the inner model.
            # Finally, convert to numpy for optimized memory usage and because sklearn RF works with raw numpy input.
            return X.fillna(0).to_numpy(dtype=np.float32)

        # The `_fit` method takes the input training data (and optionally the validation data) and trains the model.
        def _fit(self,
                 X: pd.DataFrame,  # training data
                 y: pd.Series,  # training labels
                 # X_val=None,  # val data (unused in RF model)
                 # y_val=None,  # val labels (unused in RF model)
                 # time_limit=None,  # time limit in seconds (ignored in tutorial)
                 **kwargs):  # kwargs includes many other potential inputs, refer to AbstractModel documentation for details
            print('Entering the `_fit` method')

            # First we import the required dependencies for the model. Note that we do not import them outside of the method.
            # This enables AutoGluon to be highly extensible and modular.
            # For an example of best practices when importing model dependencies, refer to LGBModel.
            from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

            # Valid self.problem_type values include ['binary', 'multiclass', 'regression', 'quantile', 'softclass']
            if self.problem_type in ['regression', 'softclass']:
                model_cls = RandomForestRegressor
            else:
                model_cls = RandomForestClassifier

            # Make sure to call preprocess on X near the start of `_fit`.
            # This is necessary because the data is converted via preprocess during predict, and needs to be in the same format as during fit.
            X = self.preprocess(X, is_train=True)
            # This fetches the user-specified (and default) hyperparameters for the model.
            params = self._get_model_params()
            print(f'Hyperparameters: {params}')
            # self.model should be set to the trained inner model, so that internally during predict we can call `self.model.predict(...)`
            self.model = model_cls(**params)
            self.model.fit(X, y)
            print('Exiting the `_fit` method')

        # The `_set_default_params` method defines the default hyperparameters of the model.
        # User-specified parameters will override these values on a key-by-key basis.
        def _set_default_params(self):
            default_params = {
                'n_estimators': 300,
                'n_jobs': -1,
                'random_state': 0,
            }
            for param, val in default_params.items():
                self._set_default_param_value(param, val)

        # The `_get_default_auxiliary_params` method defines various model-agnostic parameters such as maximum memory usage and valid input column dtypes.
        # Most users who build custom models will only need to specify the valid/invalid dtypes of the model here.
        def _get_default_auxiliary_params(self) -> dict:
            default_auxiliary_params = super()._get_default_auxiliary_params()
            extra_auxiliary_params = dict(
                # the total set of raw dtypes is: ['int', 'float', 'category', 'object', 'datetime']
                # object feature dtypes include raw text and image paths, which should only be handled by specialized models
                # datetime raw dtypes are generally converted to int in upstream pre-processing,
                # so models generally shouldn't need to explicitly support datetime dtypes.
                valid_raw_types=['int', 'float', 'category'],
                # Other options include `valid_special_types`, `ignored_type_group_raw`, and `ignored_type_group_special`.
                # Refer to AbstractModel for more details on available options.
            )
            default_auxiliary_params.update(extra_auxiliary_params)
            return default_auxiliary_params
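
Notice that ``time_limit`` is commented out in the ``_fit`` signature above because this tutorial ignores it. Production models should respect it (see LGBModel and RFModel in the reference table). As a rough, hypothetical sketch only, and not how RFModel actually implements it, a time-aware ``_fit`` could grow the forest in chunks using scikit-learn's ``warm_start`` and stop once the budget is nearly exhausted:

.. code:: python

    import time

    class TimeAwareCustomRandomForestModel(CustomRandomForestModel):
        # Hypothetical subclass for illustration; handles classification only, for brevity.
        def _fit(self, X: pd.DataFrame, y: pd.Series, time_limit: float = None, **kwargs):
            time_start = time.time()
            from sklearn.ensemble import RandomForestClassifier
            X = self.preprocess(X, is_train=True)
            params = self._get_model_params()
            n_estimators_final = params.pop('n_estimators', 300)
            # `warm_start=True` makes each successive `fit` call add trees to the existing forest.
            self.model = RandomForestClassifier(warm_start=True, **params)
            n_estimators_now = 0
            chunk_size = 50
            while n_estimators_now < n_estimators_final:
                n_estimators_now = min(n_estimators_now + chunk_size, n_estimators_final)
                self.model.n_estimators = n_estimators_now
                self.model.fit(X, y)
                if time_limit is not None:
                    time_elapsed = time.time() - time_start
                    time_per_tree = time_elapsed / n_estimators_now
                    # Stop adding trees if the next chunk would likely exceed the budget.
                    if time_elapsed + chunk_size * time_per_tree > time_limit:
                        break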

Loading the data
----------------

Next we will load the data. For this tutorial we will use the adult income dataset because it has a mix of integer, float, and categorical features.

.. code:: python

    from autogluon.tabular import TabularDataset

    train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')  # can be a local CSV file as well, returns a pandas DataFrame
    test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')  # another pandas DataFrame
    label = 'class'  # specifies which column we want to predict
    train_data = train_data.sample(n=1000, random_state=0)  # subsample for faster demo

    train_data.head(5)

.. parsed-literal::
    :class: output
           age  workclass  fnlwgt  education     education-num  marital-status      occupation       relationship   race   sex     capital-gain  capital-loss  hours-per-week  native-country  class
    6118    51  Private     39264  Some-college             10  Married-civ-spouse  Exec-managerial  Wife           White  Female             0             0              40   United-States   >50K
    23204   58  Private     51662  10th                      6  Married-civ-spouse  Other-service    Wife           White  Female             0             0               8   United-States  <=50K
    29590   40  Private    326310  Some-college             10  Married-civ-spouse  Craft-repair     Husband        White  Male               0             0              44   United-States  <=50K
    18116   37  Private    222450  HS-grad                   9  Never-married       Sales            Not-in-family  White  Male               0          2339              40   El-Salvador    <=50K
    33964   62  Private    109190  Bachelors                13  Married-civ-spouse  Exec-managerial  Husband        White  Male           15024             0              40   United-States   >50K
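
Before any cleaning, it can be useful to confirm what raw dtypes the loader produced. This quick check is plain pandas and not part of the original tutorial:

.. code:: python

    # The integer columns (age, fnlwgt, ...) load as int64, while the categorical
    # features (workclass, education, ...) and the label load as object (strings).
    print(train_data.dtypes.value_counts())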

Training a custom model without TabularPredictor
------------------------------------------------

Below we will demonstrate how to train the model outside of `TabularPredictor <../../api/autogluon.predictor.html#module-0>`__. This is useful for debugging and minimizing the amount of code you need to understand while implementing the model.

This process is similar to what happens internally when calling fit on ``TabularPredictor``, but is simplified and minimal. If the data were already cleaned (all numeric), we could call fit directly on it, but the adult dataset is not.

Clean labels
~~~~~~~~~~~~

The first step in making the input data valid for the model is to clean the labels. Currently they are strings, but we need to convert them to numeric values (0 and 1) for binary classification. Luckily, AutoGluon already implements logic both to detect that this is binary classification (via ``infer_problem_type``) and to map the labels to 0 and 1 (``LabelCleaner``):

.. code:: python

    # Separate features and labels
    X = train_data.drop(columns=[label])
    y = train_data[label]
    X_test = test_data.drop(columns=[label])
    y_test = test_data[label]

    from autogluon.core.data import LabelCleaner
    from autogluon.core.utils import infer_problem_type

    # Construct a LabelCleaner to neatly convert labels to float/integers during model training/inference, can also use to inverse_transform back to original.
    problem_type = infer_problem_type(y=y)  # Infer problem type (or else specify directly)
    label_cleaner = LabelCleaner.construct(problem_type=problem_type, y=y)
    y_clean = label_cleaner.transform(y)

    print(f'Labels cleaned: {label_cleaner.inv_map}')
    print(f'inferred problem type as: {problem_type}')
    print('Cleaned label values:')
    y_clean.head(5)

.. parsed-literal::
    :class: output

    Labels cleaned: {' <=50K': 0, ' >50K': 1}
    inferred problem type as: binary
    Cleaned label values:

.. parsed-literal::
    :class: output

    6118     1
    23204    0
    29590    0
    18116    0
    33964    1
    Name: class, dtype: uint8

Clean features
~~~~~~~~~~~~~~

Next, we need to clean the features. Currently, features like 'workclass' are object dtypes (strings), but we actually want to use them as categorical features. Most models won't accept string inputs, so we need to convert the strings to numbers.

AutoGluon contains an entire module dedicated to cleaning, transforming, and generating features, called `autogluon.features <../../api/autogluon.features.html>`__. Here we will use the same feature generator used internally by ``TabularPredictor`` to convert the object dtypes to categorical and minimize memory usage.

.. code:: python

    from autogluon.common.utils.log_utils import set_logger_verbosity
    from autogluon.features.generators import AutoMLPipelineFeatureGenerator
    set_logger_verbosity(2)  # Set logger so more detailed logging is shown for tutorial

    feature_generator = AutoMLPipelineFeatureGenerator()
    X_clean = feature_generator.fit_transform(X)

    X_clean.head(5)

.. parsed-literal::
    :class: output

    Fitting AutoMLPipelineFeatureGenerator...
        Available Memory:                    31594.43 MB
        Train Data (Original)  Memory Usage: 0.59 MB (0.0% of available memory)
        Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
        Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
                Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
        Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
        Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                Fitting CategoryMemoryMinimizeFeatureGenerator...
        Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
        Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
        Types of features in processed data (raw dtype, special dtypes):
            ('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('int', ['bool']) : 1 | ['sex']
        0.1s = Fit runtime
        14 features in original data used to generate 14 features in processed data.
        Train Data (Processed) Memory Usage: 0.07 MB (0.0% of available memory)

.. parsed-literal::
    :class: output
           age  fnlwgt  education-num  sex  capital-gain  capital-loss  hours-per-week  workclass  education  marital-status  occupation  relationship  race  native-country
    6118    51   39264             10    0             0             0              40          3         14               1           4             5     4              24
    23204   58   51662              6    0             0             0               8          3          0               1           8             5     4              24
    29590   40  326310             10    1             0             0              44          3         14               1           3             0     4              24
    18116   37  222450              9    1             0          2339              40          3         11               3          12             1     4               6
    33964   62  109190             13    1         15024             0              40          3          9               1           4             0     4              24

`AutoMLPipelineFeatureGenerator <../../api/autogluon.features.html#automlpipelinefeaturegenerator>`__ does not fill missing values for numeric features, nor does it rescale the values of numeric features or one-hot encode categoricals. If a model requires these operations, you'll need to add them to your ``_preprocess`` method, and you may find some of the FeatureGenerator classes useful for this.
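
As an illustration (a sketch, not part of the tutorial's model): a model that prefers scaled numeric input could layer a standard scikit-learn transformer on top of the parent's ``_preprocess``, fitting it on training data only:

.. code:: python

    from sklearn.preprocessing import StandardScaler

    class ScaledCustomModel(CustomRandomForestModel):
        # Hypothetical subclass for illustration.
        def _preprocess(self, X: pd.DataFrame, is_train=False, **kwargs) -> np.ndarray:
            # The parent returns a numeric numpy array with NaNs already filled.
            X = super()._preprocess(X, is_train=is_train, **kwargs)
            if is_train:
                # Fit the scaler on training data only; reuse the fitted state at inference time.
                self._scaler = StandardScaler().fit(X)
            return self._scaler.transform(X)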

Fit model
~~~~~~~~~

We are now ready to fit the model with the cleaned features and labels.

.. code:: python

    custom_model = CustomRandomForestModel()
    # We could also specify hyperparameters to override defaults
    # custom_model = CustomRandomForestModel(hyperparameters={'max_depth': 10})
    custom_model.fit(X=X_clean, y=y_clean)  # Fit custom model

    # To save to disk and load the model, do the following:
    # load_path = custom_model.path
    # custom_model.save()
    # del custom_model
    # custom_model = CustomRandomForestModel.load(path=load_path)

.. parsed-literal::
    :class: output

    Warning: No name was specified for model, defaulting to class name: CustomRandomForestModel
    No path specified. Models will be saved in: "AutogluonModels/ag-20221213_014228/CustomRandomForestModel/"
    Warning: No path was specified for model, defaulting to: AutogluonModels/ag-20221213_014228/
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
        2 unique label values: [1, 0]
        If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping: class 1 = 1, class 0 = 0
    Model CustomRandomForestModel's eval_metric inferred to be 'accuracy' because problem_type='binary' and eval_metric was not specified during init.

.. parsed-literal::
    :class: output

    Entering the `_fit` method
    Entering the `_preprocess` method: 1000 rows of data (is_train=True)
    Hyperparameters: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0}
    Exiting the `_fit` method

.. parsed-literal::
    :class: output

    <__main__.CustomRandomForestModel at 0x7f90e2983a00>

Predict with trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that the model is fit, we can make predictions on new data. Remember to perform the same data and label transformations on the new data as were applied to the training data.

.. code:: python

    # Prepare test data
    X_test_clean = feature_generator.transform(X_test)
    y_test_clean = label_cleaner.transform(y_test)

    X_test.head(5)

.. parsed-literal::
    :class: output
       age  workclass         fnlwgt  education     education-num  marital-status      occupation       relationship  race   sex     capital-gain  capital-loss  hours-per-week  native-country
    0   31  Private           169085  11th                      7  Married-civ-spouse  Sales            Wife          White  Female             0             0              20  United-States
    1   17  Self-emp-not-inc  226203  12th                      8  Never-married       Sales            Own-child     White  Male               0             0              45  United-States
    2   47  Private            54260  Assoc-voc                11  Married-civ-spouse  Exec-managerial  Husband       White  Male               0          1887              60  United-States
    3   21  Private           176262  Some-college             10  Never-married       Exec-managerial  Own-child     White  Female             0             0              30  United-States
    4   17  Private           241185  12th                      8  Never-married       Prof-specialty   Own-child     White  Male               0             0              20  United-States

Get raw predictions from the test data:

.. code:: python

    y_pred = custom_model.predict(X_test_clean)
    print(y_pred[:5])

.. parsed-literal::
    :class: output

    Entering the `_preprocess` method: 9769 rows of data (is_train=False)
    [0, 0, 1, 0, 0]

Note that these predictions are for the positive class (whichever class was mapped to 1). To get more interpretable results, do the following:

.. code:: python

    y_pred_orig = label_cleaner.inverse_transform(y_pred)
    y_pred_orig.head(5)

.. parsed-literal::
    :class: output

    0     <=50K
    1     <=50K
    2      >50K
    3     <=50K
    4     <=50K
    dtype: object

Score with trained model
~~~~~~~~~~~~~~~~~~~~~~~~

By default, the model has an eval_metric specific to the problem_type. For binary classification, it uses accuracy. We can get the accuracy score of the model by doing the following:

.. code:: python

    score = custom_model.score(X_test_clean, y_test_clean)
    print(f'Test score ({custom_model.eval_metric.name}) = {score}')

.. parsed-literal::
    :class: output

    Entering the `_preprocess` method: 9769 rows of data (is_train=False)
    Test score (accuracy) = 0.8424608455317842
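
If you need class probabilities rather than hard predictions (for metrics like AUC), ``AbstractModel`` also exposes ``predict_proba``. A brief usage sketch with the model fit above:

.. code:: python

    # For binary problems this returns a 1-D array of positive-class probabilities.
    y_pred_proba = custom_model.predict_proba(X_test_clean)
    print(y_pred_proba[:5])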

Training a bagged custom model without TabularPredictor
-------------------------------------------------------

Some of the more advanced functionality in AutoGluon, such as bagging, can be applied easily to any model that inherits from AbstractModel. You can bag your custom model in a couple of lines of code. This is a quick way to get quality improvements on nearly any model:

.. code:: python

    from autogluon.core.models import BaggedEnsembleModel
    bagged_custom_model = BaggedEnsembleModel(CustomRandomForestModel())
    # Parallel folding currently doesn't work with a class not defined in a separate module, because of an underlying pickle serialization issue
    # You don't need the following line if you put your custom model in a separate file and import it.
    bagged_custom_model.params['fold_fitting_strategy'] = 'sequential_local'
    bagged_custom_model.fit(X=X_clean, y=y_clean, k_fold=10)  # Perform 10-fold bagging
    bagged_score = bagged_custom_model.score(X_test_clean, y_test_clean)
    print(f'Test score ({bagged_custom_model.eval_metric.name}) = {bagged_score} (bagged)')
    print(f'Bagging increased model accuracy by {round(bagged_score - score, 4) * 100}%!')

.. parsed-literal::
    :class: output

    Warning: No name was specified for model, defaulting to class name: CustomRandomForestModel
    No path specified. Models will be saved in: "AutogluonModels/ag-20221213_014230/CustomRandomForestModel/"
    Warning: No path was specified for model, defaulting to: AutogluonModels/ag-20221213_014230/
    Warning: No name was specified for model, defaulting to class name: BaggedEnsembleModel
    No path specified. Models will be saved in: "AutogluonModels/ag-20221213_014230/BaggedEnsembleModel/"
    Warning: No path was specified for model, defaulting to: AutogluonModels/ag-20221213_014230/
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
        2 unique label values: [1, 0]
        If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping: class 1 = 1, class 0 = 0
    Fitting 10 child models (S1F1 - S1F10) | Fitting with SequentialLocalFoldFittingStrategy
    Model S1F1's eval_metric inferred to be 'accuracy' because problem_type='binary' and eval_metric was not specified during init.

.. parsed-literal::
    :class: output

    Entering the `_fit` method
    Entering the `_preprocess` method: 900 rows of data (is_train=True)
    Hyperparameters: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0}
    Exiting the `_fit` method
    Entering the `_preprocess` method: 100 rows of data (is_train=False)

    ... (the same problem-type inference and fit/preprocess output repeats for the remaining child models S1F2 - S1F10) ...

    Entering the `_preprocess` method: 9769 rows of data (is_train=False)

    ... (repeated once per child model, 10 times in total, while scoring on the test data) ...

    Test score (accuracy) = 0.8436892210052206 (bagged)
    Bagging increased model accuracy by 0.12%!

Note that the bagged model trained 10 CustomRandomForestModels on different splits of the training data. When making a prediction, the bagged model averages the predictions from these 10 models.
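
To make the averaging concrete, here is a minimal, framework-free sketch of what bagged prediction does at test time (illustrative only; AutoGluon's real logic lives inside ``BaggedEnsembleModel``):

.. code:: python

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold

    def bagged_predict_proba(X, y, X_test, k_fold=10):
        # X, y, and X_test are assumed to be numpy arrays here.
        child_preds = []
        for train_idx, _ in KFold(n_splits=k_fold, shuffle=True, random_state=0).split(X):
            # Each child model sees a different (k-1)/k portion of the training data.
            child = RandomForestClassifier(n_estimators=300, random_state=0)
            child.fit(X[train_idx], y[train_idx])
            child_preds.append(child.predict_proba(X_test)[:, 1])
        # The bagged prediction is the mean of the child model predictions.
        return np.mean(child_preds, axis=0)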

Training a custom model with TabularPredictor
---------------------------------------------

While not using `TabularPredictor <../../api/autogluon.predictor.html#module-0>`__ allows us to simplify the amount of code we need to worry about while developing and debugging our model, eventually we want to leverage TabularPredictor to get the most out of our model.

The code to train the model from the raw data is very simple when using TabularPredictor. There is no need to specify a LabelCleaner, FeatureGenerator, or a validation set; all of that is handled internally.

Here we train three CustomRandomForestModel models with different hyperparameters.

.. code:: python

    from autogluon.tabular import TabularPredictor

    # custom_hyperparameters = {CustomRandomForestModel: {}}  # Train 1 CustomRandomForestModel with default hyperparameters
    custom_hyperparameters = {CustomRandomForestModel: [{}, {'max_depth': 10}, {'max_features': 0.9, 'max_depth': 20}]}  # Train 3 CustomRandomForestModels with different hyperparameters
    predictor = TabularPredictor(label=label).fit(train_data, hyperparameters=custom_hyperparameters)

.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20221213_014239/"
    Beginning AutoGluon training ...
    AutoGluon will save models to "AutogluonModels/ag-20221213_014239/"
    AutoGluon Version:  0.6.1b20221213
    Python Version:     3.8.10
    Operating System:   Linux
    Platform Machine:   x86_64
    Platform Version:   #1 SMP Tue Nov 30 00:17:50 UTC 2021
    Train Data Rows:    1000
    Train Data Columns: 14
    Label Column: class
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
        2 unique label values: [' >50K', ' <=50K']
        If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
        Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
        To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
        Available Memory:                    31555.11 MB
        Train Data (Original)  Memory Usage: 0.59 MB (0.0% of available memory)
        Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
        Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
                Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
        Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
        Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                Fitting CategoryMemoryMinimizeFeatureGenerator...
        Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
        Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
        Types of features in processed data (raw dtype, special dtypes):
            ('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('int', ['bool']) : 1 | ['sex']
        0.1s = Fit runtime
        14 features in original data used to generate 14 features in processed data.
        Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.09s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
        To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
    Custom Model Type Detected: <class '__main__.CustomRandomForestModel'>
    Custom Model Type Detected: <class '__main__.CustomRandomForestModel'>
    Custom Model Type Detected: <class '__main__.CustomRandomForestModel'>
    Fitting 3 L1 models ...
    Fitting model: CustomRandomForestModel ...

.. parsed-literal::
    :class: output

    Entering the `_fit` method
    Entering the `_preprocess` method: 800 rows of data (is_train=True)
    Hyperparameters: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0}

.. parsed-literal::
    :class: output

        0.84 = Validation score (accuracy)
        1.2s = Training runtime
        0.06s = Validation runtime
    Fitting model: CustomRandomForestModel_2 ...

.. parsed-literal::
    :class: output

    Exiting the `_fit` method
    Entering the `_preprocess` method: 200 rows of data (is_train=False)
    Entering the `_fit` method
    Entering the `_preprocess` method: 800 rows of data (is_train=True)
    Hyperparameters: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0, 'max_depth': 10}

.. parsed-literal::
    :class: output

        0.845 = Validation score (accuracy)
        1.19s = Training runtime
        0.06s = Validation runtime
    Fitting model: CustomRandomForestModel_3 ...

.. parsed-literal::
    :class: output

    Exiting the `_fit` method
    Entering the `_preprocess` method: 200 rows of data (is_train=False)
    Entering the `_fit` method
    Entering the `_preprocess` method: 800 rows of data (is_train=True)
    Hyperparameters: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0, 'max_features': 0.9, 'max_depth': 20}

.. parsed-literal::
    :class: output

        0.835 = Validation score (accuracy)
        1.19s = Training runtime
        0.06s = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...

.. parsed-literal::
    :class: output

    Exiting the `_fit` method
    Entering the `_preprocess` method: 200 rows of data (is_train=False)

.. parsed-literal::
    :class: output

        0.855 = Validation score (accuracy)
        0.11s = Training runtime
        0.0s = Validation runtime
    AutoGluon training complete, total runtime = 4.65s ... Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221213_014239/")

Predictor leaderboard
~~~~~~~~~~~~~~~~~~~~~

Here we show the stats of each of the models trained. Notice that a WeightedEnsemble model was also trained. This model tries to combine the predictions of the other models to get a better validation score via ensembling.

.. code:: python

    predictor.leaderboard(test_data, silent=True)

.. parsed-literal::
    :class: output

    Entering the `_preprocess` method: 9769 rows of data (is_train=False)
    Entering the `_preprocess` method: 9769 rows of data (is_train=False)
    Entering the `_preprocess` method: 9769 rows of data (is_train=False)

.. parsed-literal::
    :class: output
       model                      score_test  score_val  pred_time_test  pred_time_val  fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0  CustomRandomForestModel_2    0.846044      0.845        0.138547       0.062872  1.193779                 0.138547                0.062872           1.193779            1       True          2
    1  CustomRandomForestModel      0.840414      0.840        0.141398       0.061999  1.199242                 0.141398                0.061999           1.199242            1       True          1
    2  WeightedEnsemble_L2          0.839390      0.855        0.421377       0.188749  3.698206                 0.003341                0.000844           0.113758            2       True          4
    3  CustomRandomForestModel_3    0.828744      0.835        0.138091       0.063035  1.191428                 0.138091                0.063035           1.191428            1       True          3
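
The leaderboard re-scored every model on the test data. To score just the best model directly, ``TabularPredictor`` also provides ``evaluate`` (standard API; the exact return format may vary by AutoGluon version):

.. code:: python

    # Scores the predictor's best model on labeled test data using its eval_metric.
    print(predictor.evaluate(test_data))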

Predict with fit predictor
~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we predict with the fit predictor. This will automatically use the best model (the one with the highest score_val) to predict.

.. code:: python

    y_pred = predictor.predict(test_data)
    # y_pred = predictor.predict(test_data, model='CustomRandomForestModel_3')  # If we want a specific model to predict
    y_pred.head(5)

.. parsed-literal::
    :class: output

    Entering the `_preprocess` method: 9769 rows of data (is_train=False)
    Entering the `_preprocess` method: 9769 rows of data (is_train=False)
    Entering the `_preprocess` method: 9769 rows of data (is_train=False)

.. parsed-literal::
    :class: output

    0     <=50K
    1     <=50K
    2      >50K
    3     <=50K
    4     <=50K
    Name: class, dtype: object

Hyperparameter tuning a custom model with TabularPredictor
----------------------------------------------------------

We can easily hyperparameter-tune custom models by specifying a hyperparameter search space in place of exact values. Here we hyperparameter-tune the custom model for 20 seconds:

.. code:: python

    from autogluon.core.space import Categorical, Int, Real

    custom_hyperparameters_hpo = {CustomRandomForestModel: {
        'max_depth': Int(lower=5, upper=30),
        'max_features': Real(lower=0.1, upper=1.0),
        'criterion': Categorical('gini', 'entropy'),
    }}
    # Hyperparameter tune CustomRandomForestModel for 20 seconds
    predictor = TabularPredictor(label=label).fit(train_data,
                                                  hyperparameters=custom_hyperparameters_hpo,
                                                  hyperparameter_tune_kwargs='auto',  # enables HPO
                                                  time_limit=20)

.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20221213_014245/"
    Warning: hyperparameter tuning is currently experimental and may cause the process to hang.
    Beginning AutoGluon training ... Time limit = 20s
    AutoGluon will save models to "AutogluonModels/ag-20221213_014245/"
    AutoGluon Version:  0.6.1b20221213
    Python Version:     3.8.10
    Operating System:   Linux
    Platform Machine:   x86_64
    Platform Version:   #1 SMP Tue Nov 30 00:17:50 UTC 2021
    Train Data Rows:    1000
    Train Data Columns: 14
    Label Column: class
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
        2 unique label values: [' >50K', ' <=50K']
        If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
        Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
        To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
        Available Memory:                    31553.64 MB
        Train Data (Original)  Memory Usage: 0.59 MB (0.0% of available memory)
        Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
        Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
                Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
        Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
        Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                Fitting CategoryMemoryMinimizeFeatureGenerator...
        Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
        Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
        Types of features in processed data (raw dtype, special dtypes):
            ('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('int', ['bool']) : 1 | ['sex']
        0.1s = Fit runtime
        14 features in original data used to generate 14 features in processed data.
        Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.1s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
        To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
    Custom Model Type Detected: <class '__main__.CustomRandomForestModel'>
    Fitting 1 L1 models ...
    Hyperparameter tuning model: CustomRandomForestModel ...
        Tuning model for up to 17.91s of the 19.9s of remaining time.

.. parsed-literal::
    :class: output

    Entering the `_fit` method
    Entering the `_preprocess` method: 800 rows of data (is_train=True)
    Hyperparameters: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0, 'max_depth': 5, 'max_features': 0.1, 'criterion': 'gini'}
    Exiting the `_fit` method
    Entering the `_preprocess` method: 200 rows of data (is_train=False)
    Entering the `_fit` method
    Entering the `_preprocess` method: 800 rows of data (is_train=True)
    Hyperparameters: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0, 'max_depth': 20, 'max_features': 0.7436704297351775, 'criterion': 'gini'}
    Exiting the `_fit` method
    Entering the `_preprocess` method: 200 rows of data (is_train=False)

    ... (the same fit/preprocess output repeats for each remaining HPO trial, each with different sampled hyperparameters) ...

.. parsed-literal::
    :class: output

    Stopping HPO to satisfy time limit...
    Fitted model: CustomRandomForestModel/T1 ...
        0.805 = Validation score (accuracy)
        0.57s = Training runtime
        0.06s = Validation runtime
    Fitted model: CustomRandomForestModel/T2 ...
        0.835 = Validation score (accuracy)
        0.59s = Training runtime
        0.06s = Validation runtime

    ... (similar summaries follow for trials T3 - T25; the per-trial validation scores are listed in the leaderboard below) ...

.. parsed-literal::
    :class: output

    Entering the `_preprocess` method: 200 rows of data (is_train=False)

    ... (repeated once per trial while computing validation predictions) ...

.. parsed-literal::
    :class: output

    Fitting model: WeightedEnsemble_L2 ... Training model for up to 19.9s of the -0.74s of remaining time.

.. parsed-literal::
    :class: output

    Entering the `_preprocess` method: 200 rows of data (is_train=False)

.. parsed-literal::
    :class: output

        0.86 = Validation score (accuracy)
        0.16s = Training runtime
        0.0s = Validation runtime
    AutoGluon training complete, total runtime = 21.52s ... Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221213_014245/")

Predictor leaderboard (HPO)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The leaderboard for the HPO run shows models with the suffix ``/Tx`` in their names, indicating the HPO trial in which they were fit.

.. code:: python

    leaderboard_hpo = predictor.leaderboard(silent=True)
    leaderboard_hpo

.. parsed-literal::
    :class: output

    model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
    0 WeightedEnsemble_L2 0.860 0.124356 1.326905 0.000749 0.157713 2 True 26
    1 CustomRandomForestModel/T4 0.855 0.060825 0.591918 0.060825 0.591918 1 True 4
    2 CustomRandomForestModel/T21 0.850 0.061973 0.589022 0.061973 0.589022 1 True 21
    3 CustomRandomForestModel/T16 0.850 0.062684 0.590520 0.062684 0.590520 1 True 16
    4 CustomRandomForestModel/T17 0.850 0.062782 0.577274 0.062782 0.577274 1 True 17
    5 CustomRandomForestModel/T11 0.850 0.065084 0.581048 0.065084 0.581048 1 True 11
    6 CustomRandomForestModel/T15 0.845 0.060246 0.568683 0.060246 0.568683 1 True 15
    7 CustomRandomForestModel/T10 0.845 0.060620 0.582077 0.060620 0.582077 1 True 10
    8 CustomRandomForestModel/T7 0.845 0.061639 0.587917 0.061639 0.587917 1 True 7
    9 CustomRandomForestModel/T24 0.845 0.061696 0.586394 0.061696 0.586394 1 True 24
    10 CustomRandomForestModel/T25 0.845 0.062128 0.568804 0.062128 0.568804 1 True 25
    11 CustomRandomForestModel/T8 0.845 0.062556 0.605307 0.062556 0.605307 1 True 8
    12 CustomRandomForestModel/T23 0.845 0.062659 0.591935 0.062659 0.591935 1 True 23
    13 CustomRandomForestModel/T19 0.845 0.062919 0.566890 0.062919 0.566890 1 True 19
    14 CustomRandomForestModel/T13 0.840 0.062589 0.601684 0.062589 0.601684 1 True 13
    15 CustomRandomForestModel/T9 0.840 0.064390 0.608347 0.064390 0.608347 1 True 9
    16 CustomRandomForestModel/T20 0.835 0.060931 0.590263 0.060931 0.590263 1 True 20
    17 CustomRandomForestModel/T2 0.835 0.061561 0.586263 0.061561 0.586263 1 True 2
    18 CustomRandomForestModel/T12 0.835 0.061966 0.584415 0.061966 0.584415 1 True 12
    19 CustomRandomForestModel/T14 0.835 0.062724 0.579637 0.062724 0.579637 1 True 14
    20 CustomRandomForestModel/T5 0.835 0.063126 0.597193 0.063126 0.597193 1 True 5
    21 CustomRandomForestModel/T22 0.830 0.061154 0.575902 0.061154 0.575902 1 True 22
    22 CustomRandomForestModel/T6 0.830 0.061339 0.570987 0.061339 0.570987 1 True 6
    23 CustomRandomForestModel/T3 0.825 0.063802 0.594073 0.063802 0.594073 1 True 3
    24 CustomRandomForestModel/T1 0.805 0.060655 0.565793 0.060655 0.565793 1 True 1
    25 CustomRandomForestModel/T18 0.805 0.063372 0.565666 0.063372 0.565666 1 True 18
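
As an aside, ``leaderboard`` returns a plain pandas DataFrame, so the trial results can be summarized with ordinary pandas operations. The snippet below is a small illustrative sketch (not part of the original run) that filters the leaderboard to the trial models via their ``/T`` name suffix:

.. code:: python

    # Illustrative sketch: summarize the HPO trials from the leaderboard DataFrame.
    # `leaderboard_hpo` was produced by `predictor.leaderboard(silent=True)` above.
    trial_rows = leaderboard_hpo[leaderboard_hpo['model'].str.contains('/T')]
    print(f'Number of trials:   {len(trial_rows)}')
    print(f'Best val accuracy:  {trial_rows["score_val"].max():.3f}')
    print(f'Worst val accuracy: {trial_rows["score_val"].min():.3f}')
    print(f'Mean fit time:      {trial_rows["fit_time"].mean():.2f}s')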

Getting the hyperparameters of a trained model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let’s get the hyperparameters of the model with the highest validation score (excluding the weighted ensemble).

.. code:: python

    best_model_name = leaderboard_hpo[leaderboard_hpo['stack_level'] == 1]['model'].iloc[0]

    predictor_info = predictor.info()
    best_model_info = predictor_info['model_info'][best_model_name]

    print(best_model_info)
    print(f'Best Model Hyperparameters ({best_model_name}):')
    print(best_model_info['hyperparameters'])

.. parsed-literal::
    :class: output

    {'name': 'CustomRandomForestModel/T4', 'model_type': 'CustomRandomForestModel', 'problem_type': 'binary', 'eval_metric': 'accuracy', 'stopping_metric': 'accuracy', 'fit_time': 0.5919175148010254, 'num_classes': 2, 'quantile_levels': None, 'predict_time': 0.06082510948181152, 'val_score': 0.855, 'hyperparameters': {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0, 'max_depth': 26, 'max_features': 0.4459435365634299, 'criterion': 'entropy'}, 'hyperparameters_fit': {}, 'hyperparameters_nondefault': ['max_depth', 'max_features', 'criterion', 'n_estimators', 'n_jobs', 'random_state'], 'ag_args_fit': {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': ['int', 'float', 'category'], 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None}, 'num_features': 14, 'features': ['age', 'fnlwgt', 'education-num', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'native-country'], 'feature_metadata': , 'memory_size': 4331673, 'compile_time': None}
    Best Model Hyperparameters (CustomRandomForestModel/T4):
    {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0, 'max_depth': 26, 'max_features': 0.4459435365634299, 'criterion': 'entropy'}

Training a custom model alongside other models with TabularPredictor
---------------------------------------------------------------------

Finally, we will train the custom model (with tuned hyperparameters) alongside the default AutoGluon models. All this requires is getting the hyperparameter dictionary of the default models via ``get_hyperparameter_config`` and adding ``CustomRandomForestModel`` as a key.

.. code:: python

    from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config

    # Now we can add the custom model with tuned hyperparameters to be trained alongside the default models:
    custom_hyperparameters = get_hyperparameter_config('default')
    custom_hyperparameters[CustomRandomForestModel] = best_model_info['hyperparameters']
    print(custom_hyperparameters)

.. parsed-literal::
    :class: output

    {'NN_TORCH': {}, 'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'], 'CAT': {}, 'XGB': {}, 'FASTAI': {}, 'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}], 'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}], 'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}], <class '__main__.CustomRandomForestModel'>: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0, 'max_depth': 26, 'max_features': 0.4459435365634299, 'criterion': 'entropy'}}

.. code:: python

    # Train the default models plus a single tuned CustomRandomForestModel
    predictor = TabularPredictor(label=label).fit(train_data, hyperparameters=custom_hyperparameters)
    # We could even use the custom model in a multi-layer stack ensemble:
    # predictor = TabularPredictor(label=label).fit(train_data, hyperparameters=custom_hyperparameters, presets='best_quality')
    predictor.leaderboard(test_data, silent=True)

.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20221213_014308/"
    Beginning AutoGluon training ...
    AutoGluon will save models to "AutogluonModels/ag-20221213_014308/"
    AutoGluon Version: 0.6.1b20221213
    Python Version: 3.8.10
    Operating System: Linux
    Platform Machine: x86_64
    Platform Version: #1 SMP Tue Nov 30 00:17:50 UTC 2021
    Train Data Rows: 1000
    Train Data Columns: 14
    Label Column: class
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
    2 unique label values: [' >50K', ' <=50K']
    If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
    Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class. To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    Available Memory: 31532.41 MB
    Train Data (Original) Memory Usage: 0.59 MB (0.0% of available memory)
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    Stage 1 Generators:
    Fitting AsTypeFeatureGenerator...
    Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
    Stage 2 Generators:
    Fitting FillNaFeatureGenerator...
    Stage 3 Generators:
    Fitting IdentityFeatureGenerator...
    Fitting CategoryFeatureGenerator...
    Fitting CategoryMemoryMinimizeFeatureGenerator...
    Stage 4 Generators:
    Fitting DropUniqueFeatureGenerator...
    Types of features in original data (raw dtype, special dtypes):
    ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
    ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    Types of features in processed data (raw dtype, special dtypes):
    ('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    ('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
    ('int', ['bool']) : 1 | ['sex']
    0.1s = Fit runtime
    14 features in original data used to generate 14 features in processed data.
    Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.09s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
    Custom Model Type Detected: <class '__main__.CustomRandomForestModel'>
    Fitting 14 L1 models ...
    Fitting model: KNeighborsUnif ...
    0.725 = Validation score (accuracy)
    0.6s = Training runtime
    0.01s = Validation runtime
    Fitting model: KNeighborsDist ...
    0.71 = Validation score (accuracy)
    0.6s = Training runtime
    0.01s = Validation runtime
    Fitting model: LightGBMXT ...
    0.85 = Validation score (accuracy)
    1.31s = Training runtime
    0.01s = Validation runtime
    Fitting model: LightGBM ...
    0.84 = Validation score (accuracy)
    0.92s = Training runtime
    0.01s = Validation runtime
    Fitting model: RandomForestGini ...
    0.845 = Validation score (accuracy)
    1.12s = Training runtime
    0.06s = Validation runtime
    Fitting model: RandomForestEntr ...
    0.835 = Validation score (accuracy)
    1.13s = Training runtime
    0.06s = Validation runtime
    Fitting model: CatBoost ...
    0.86 = Validation score (accuracy)
    2.48s = Training runtime
    0.01s = Validation runtime
    Fitting model: ExtraTreesGini ...
    0.82 = Validation score (accuracy)
    1.12s = Training runtime
    0.06s = Validation runtime
    Fitting model: ExtraTreesEntr ...
    0.82 = Validation score (accuracy)
    1.12s = Training runtime
    0.06s = Validation runtime
    Fitting model: NeuralNetFastAI ...
    No improvement since epoch 7: early stopping
    0.86 = Validation score (accuracy)
    3.29s = Training runtime
    0.01s = Validation runtime
    Fitting model: XGBoost ...
    0.85 = Validation score (accuracy)
    0.3s = Training runtime
    0.01s = Validation runtime
    Fitting model: NeuralNetTorch ...
    0.85 = Validation score (accuracy)
    3.42s = Training runtime
    0.01s = Validation runtime
    Fitting model: LightGBMLarge ...
    0.815 = Validation score (accuracy)
    1.11s = Training runtime
    0.01s = Validation runtime
    Fitting model: CustomRandomForestModel ...

.. parsed-literal::
    :class: output

    Entering the `_fit` method
    Entering the `_preprocess` method: 800 rows of data (is_train=True)
    Hyperparameters: {'n_estimators': 300, 'n_jobs': -1, 'random_state': 0, 'max_depth': 26, 'max_features': 0.4459435365634299, 'criterion': 'entropy'}

.. parsed-literal::
    :class: output

    0.855 = Validation score (accuracy)
    0.6s = Training runtime
    0.07s = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...

.. parsed-literal::
    :class: output

    Exiting the `_fit` method
    Entering the `_preprocess` method: 200 rows of data (is_train=False)

.. parsed-literal::
    :class: output

    0.88 = Validation score (accuracy)
    0.38s = Training runtime
    0.0s = Validation runtime
    AutoGluon training complete, total runtime = 20.19s ... Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221213_014308/")

.. parsed-literal::
    :class: output

    Entering the `_preprocess` method: 9769 rows of data (is_train=False)

.. parsed-literal::
    :class: output

    model score_test score_val pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
    0 CatBoost 0.852902 0.860 0.016674 0.005907 2.476585 0.016674 0.005907 2.476585 1 True 7
    1 WeightedEnsemble_L2 0.852083 0.880 0.369239 0.149117 5.801734 0.004375 0.000788 0.382812 2 True 15
    2 LightGBMXT 0.850752 0.850 0.017225 0.005951 1.313833 0.017225 0.005951 1.313833 1 True 3
    3 XGBoost 0.850036 0.850 0.039560 0.007545 0.303070 0.039560 0.007545 0.303070 1 True 11
    4 NeuralNetFastAI 0.841437 0.860 0.158014 0.014076 3.293159 0.158014 0.014076 3.293159 1 True 10
    5 LightGBM 0.841335 0.840 0.013739 0.005840 0.918098 0.013739 0.005840 0.918098 1 True 4
    6 RandomForestGini 0.839492 0.845 0.144891 0.062550 1.118562 0.144891 0.062550 1.118562 1 True 5
    7 RandomForestEntr 0.838162 0.835 0.142712 0.063799 1.126035 0.142712 0.063799 1.126035 1 True 6
    8 NeuralNetTorch 0.836524 0.850 0.059845 0.013888 3.421460 0.059845 0.013888 3.421460 1 True 12
    9 CustomRandomForestModel 0.835091 0.855 0.142135 0.065328 0.599762 0.142135 0.065328 0.599762 1 True 14
    10 LightGBMLarge 0.832122 0.815 0.069972 0.006558 1.109222 0.069972 0.006558 1.109222 1 True 13
    11 ExtraTreesGini 0.831303 0.820 0.145072 0.064440 1.119756 0.145072 0.064440 1.119756 1 True 8
    12 ExtraTreesEntr 0.829358 0.820 0.152757 0.063709 1.121407 0.152757 0.063709 1.121407 1 True 9
    13 KNeighborsUnif 0.744600 0.725 0.032326 0.006918 0.603907 0.032326 0.006918 0.603907 1 True 1
    14 KNeighborsDist 0.710922 0.710 0.035227 0.006434 0.602240 0.035227 0.006434 0.602240 1 True 2
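
With the fit complete, the predictor is used like any other ``TabularPredictor``; the custom model needs no special treatment at inference time. Below is a minimal usage sketch (not part of the original run), reusing the ``test_data`` loaded earlier in the tutorial:

.. code:: python

    # Minimal inference sketch: the ensemble (including the custom model) serves predictions.
    y_pred = predictor.predict(test_data)         # hard class predictions from the best model
    y_proba = predictor.predict_proba(test_data)  # class probabilities

    # Predict with a specific model instead of the best one, e.g. the custom model:
    y_pred_custom = predictor.predict(test_data, model='CustomRandomForestModel')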

Wrapping up
-----------

That’s all it takes to add a custom model to AutoGluon. If you create a custom model, consider submitting a PR so that we can add it officially to AutoGluon!

For more tutorials, refer to :ref:`sec_tabularquick` and :ref:`sec_tabularadvanced`.

For a tutorial on advanced custom models, refer to :ref:`sec_tabularcustommodeladvanced`.
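
One final practical note (an assumption based on standard Python pickling behavior, not something demonstrated in this run): the saved predictor stores the custom model via pickle, so the ``CustomRandomForestModel`` class must be defined, or importable from the same module path it was trained under, in any session that later loads the predictor.

.. code:: python

    # Sketch: loading the saved predictor in a fresh session.
    # Define or import CustomRandomForestModel *before* calling load(),
    # otherwise unpickling the custom model may fail with an AttributeError.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor.load("AutogluonModels/ag-20221213_014308/")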