.. _sec_tabularcustommetric:

Adding a custom metric to AutoGluon
===================================


**Tip**: If you are new to AutoGluon, review :ref:`sec_tabularquick`
to learn the basics of the AutoGluon API.

This tutorial describes how to add a custom evaluation metric to
AutoGluon. AutoGluon uses the metric to compute validation scores and to
guide model ensembling, hyperparameter tuning, and more.

In this example, we show a variety of evaluation metrics and how to
convert them to an AutoGluon Scorer, which can then be passed to
AutoGluon models and predictors.

First, we will randomly generate 10 ground truth labels and predictions,
and show how to calculate metric scores from them.

.. code:: python

    import numpy as np
    y_true = np.random.randint(low=0, high=2, size=10)
    y_pred = np.random.randint(low=0, high=2, size=10)
    
    print(f'y_true: {y_true}')
    print(f'y_pred: {y_pred}')


.. parsed-literal::
    :class: output

    y_true: [0 0 1 1 1 0 1 0 0 1]
    y_pred: [1 1 1 0 1 1 1 1 1 0]


Ensuring Metric is Serializable
-------------------------------

For your custom metric to be serializable (able to be pickled), you must
define it in a separate Python file and import it. If you do not,
AutoGluon will crash during fit when it tries to parallelize model
training with Ray. In the example below, you would create a new Python
file such as ``my_metrics.py`` with ``ag_accuracy_scorer`` defined in it,
and then use it via ``from my_metrics import ag_accuracy_scorer``.

If your metric is not serializable, you will get many errors similar to:
``_pickle.PicklingError: Can't pickle``. Refer to
https://github.com/awslabs/autogluon/issues/1637 for an example.

The custom metrics in this tutorial are **not** serializable, for ease of
demonstration. If the ``best_quality`` preset were used, fit would crash.
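The standard-library ``pickle`` module shows why a separate, importable
file matters: a module-level function is pickled *by reference* (its
module and qualified name), so the process that unpickles it must be able
to import the module that defines it. The sketch below only illustrates
standard ``pickle`` behavior; Ray and AutoGluon may serialize objects
differently, and ``is_picklable`` is a hypothetical helper, not an
AutoGluon API.

.. code:: python

    import pickle
    import statistics


    def is_picklable(obj) -> bool:
        """Hypothetical helper: True if the standard pickle module can serialize obj."""
        try:
            pickle.dumps(obj)
        except (pickle.PicklingError, AttributeError, TypeError):
            return False
        return True


    # statistics.mean lives in an importable module, so pickle stores only a
    # reference to it (module name + function name), not its code.
    payload = pickle.dumps(statistics.mean)
    print(b'statistics' in payload and b'mean' in payload)  # True
    print(pickle.loads(payload) is statistics.mean)         # True

    # A lambda has no importable name, so pickle cannot store a reference to it.
    print(is_picklable(lambda y_true, y_pred: 0.0))         # False

A function defined in a notebook or script lives in ``__main__``, which a
separate worker process may not be able to import; moving the metric into
``my_metrics.py`` gives it an importable name.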

Custom Accuracy Metric
----------------------

We will start with calculating accuracy. A prediction is correct if the
predicted value is the same as the true value, otherwise it is wrong.
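This definition translates directly into NumPy: compare predictions to
labels elementwise and take the mean. The minimal sketch below hardcodes
the labels and predictions printed above (as ``labels`` and ``preds``, so
your random ``y_true``/``y_pred`` are left untouched); for those values it
gives the same ``0.3`` that sklearn reports below.

.. code:: python

    import numpy as np

    # Values copied from the output printed at the top of this tutorial.
    labels = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0, 1])
    preds = np.array([1, 1, 1, 0, 1, 1, 1, 1, 1, 0])

    # A prediction is correct when it equals the true label;
    # accuracy is the fraction of correct predictions.
    accuracy = float((labels == preds).mean())
    print(accuracy)  # 0.3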

.. code:: python

    import sklearn.metrics
    sklearn.metrics.accuracy_score(y_true, y_pred)


.. parsed-literal::
    :class: output

    0.3


There are a variety of limitations with the above logic. For example,
without outside knowledge of the metric, it is unknown:

1. What the optimal value is (1)
2. Whether higher values are better (True)
3. Whether the metric requires predicted labels or probabilities (labels)

Now, let’s convert this evaluation metric to an AutoGluon Scorer to
address these limitations.

We do this by calling ``autogluon.core.metrics.make_scorer``.

.. code:: python

    from autogluon.core.metrics import make_scorer
    ag_accuracy_scorer = make_scorer(name='accuracy',
                                     score_func=sklearn.metrics.accuracy_score,
                                     optimum=1,
                                     greater_is_better=True)

When creating the Scorer, we need to specify a name for the Scorer. This
does not need to be any particular value, but is used when printing
information about the Scorer during training.

Next, we specify the ``score_func``. This is the function we want to
wrap, in this case, sklearn’s ``accuracy_score`` function.

We then need to specify the ``optimum`` value. This is necessary when
calculating ``error`` (also known as ``regret``) as opposed to
``score``. ``error`` is defined as ``sign * optimum - score``, where
``sign=1`` if ``greater_is_better=True``, else ``sign=-1``. The optimum
is also useful for identifying when a score is optimal and cannot be
improved. Because the best possible value from
``sklearn.metrics.accuracy_score`` is ``1``, we specify ``optimum=1``.

Finally, we need to specify ``greater_is_better``. In this case,
``greater_is_better=True`` because the best value returned is 1 and the
worst value returned is 0. It is very important to set this value
correctly; otherwise AutoGluon will optimize for the **worst** model
instead of the best.

**Advanced Note**: ``optimum`` must correspond to the optimal value from
the original metric callable (in this case
``sklearn.metrics.accuracy_score``). Hypothetically, if a metric
callable was ``greater_is_better=False`` with an optimal value of
``-2``, you should specify ``optimum=-2, greater_is_better=False``. In
this case, if ``raw_metric_value=-0.5``, the Scorer would return
``score=0.5`` to enforce the higher-is-better convention
(``score = sign * raw_metric_value``). The Scorer’s error would be
``error=1.5`` because ``sign (-1) * optimum (-2) - score (0.5) = 1.5``.

Once created, the AutoGluon Scorer can be called in the same fashion as
the original metric to compute ``score``.

.. code:: python

    # score
    ag_accuracy_scorer(y_true, y_pred)


.. parsed-literal::
    :class: output

    0.3


For convenience, ``.score`` is an alias for calling the Scorer
directly:

.. code:: python

    ag_accuracy_scorer.score(y_true, y_pred)


.. parsed-literal::
    :class: output

    0.3


To get the error instead of the score:

.. code:: python

    # error, error=sign*optimum-score -> error=1*1-score -> error=1-score
    ag_accuracy_scorer.error(y_true, y_pred)
    
    # Can also convert score to error:
    # score = ag_accuracy_scorer(y_true, y_pred)
    # error = ag_accuracy_scorer.convert_score_to_error(score)


.. parsed-literal::
    :class: output

    0.7


Note that ``score`` is in higher-is-better form, while ``error`` is in
lower-is-better form. An error of 0 corresponds to a perfect
prediction.
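To make these conventions concrete, here is a minimal sketch of a
hypothetical ``ToyScorer`` class. It is *not* AutoGluon’s Scorer, which
handles more cases; it only mirrors the formulas described above
(``score = sign * raw_metric_value`` and ``error = sign * optimum - score``)
and reuses the method names shown above (``score``, ``error``,
``convert_score_to_error``). The last scorer reproduces the numbers from
the Advanced Note.

.. code:: python

    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class ToyScorer:
        """Hypothetical, simplified stand-in for an AutoGluon Scorer (not the real class)."""

        name: str
        score_func: Callable[..., float]
        optimum: float
        greater_is_better: bool

        @property
        def sign(self) -> int:
            # -1 flips lower-is-better metrics so that a higher score is always better.
            return 1 if self.greater_is_better else -1

        def __call__(self, y_true, y_pred) -> float:
            # score = sign * raw_metric_value, always in higher-is-better form.
            return self.sign * self.score_func(y_true, y_pred)

        # .score is an alias for calling the scorer directly.
        score = __call__

        def convert_score_to_error(self, score: float) -> float:
            # error = sign * optimum - score, always lower-is-better (0 is perfect).
            return self.sign * self.optimum - score

        def error(self, y_true, y_pred) -> float:
            return self.convert_score_to_error(self(y_true, y_pred))


    def accuracy(y_true, y_pred) -> float:
        return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


    def mse(y_true, y_pred) -> float:
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


    acc_scorer = ToyScorer('accuracy', accuracy, optimum=1, greater_is_better=True)
    mse_scorer = ToyScorer('mean_squared_error', mse, optimum=0, greater_is_better=False)
    # Advanced Note example: a lower-is-better metric with optimum -2 that returns -0.5.
    note_scorer = ToyScorer('hypothetical', lambda y_true, y_pred: -0.5,
                            optimum=-2, greater_is_better=False)

    labels = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
    preds = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0]
    print(acc_scorer(labels, preds))        # 0.3
    print(acc_scorer.error(labels, preds))  # approximately 0.7

    print(mse_scorer([1.0, 2.0], [1.5, 2.0]))        # -0.125
    print(mse_scorer.error([1.0, 2.0], [1.5, 2.0]))  # 0.125

    print(note_scorer([], []), note_scorer.error([], []))  # 0.5 1.5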

Custom Mean Squared Error Metric
--------------------------------

Next, let’s show examples of how to convert regression metrics into
Scorers.

First, we generate random ground truth labels and predictions; this time
they are floats instead of integers.

.. code:: python

    y_true = np.random.rand(10)
    y_pred = np.random.rand(10)
    
    print(f'y_true: {y_true}')
    print(f'y_pred: {y_pred}')


.. parsed-literal::
    :class: output

    y_true: [0.02699553 0.64990002 0.42448745 0.36962433 0.28254704 0.15919657
     0.96311214 0.11365288 0.46360337 0.77840413]
    y_pred: [0.38242437 0.69829472 0.22758035 0.31218035 0.20107508 0.20939278
     0.0752841  0.63571492 0.93852683 0.2689341 ]


A common regression metric is Mean Squared Error:

.. code:: python

    sklearn.metrics.mean_squared_error(y_true, y_pred)


.. parsed-literal::
    :class: output

    0.17258006743548032


.. code:: python

    ag_mean_squared_error_scorer = make_scorer(name='mean_squared_error',
                                               score_func=sklearn.metrics.mean_squared_error,
                                               optimum=0,
                                               greater_is_better=False)

In this case, ``optimum=0`` because this is an error metric.

Additionally, ``greater_is_better=False`` because sklearn reports mean
squared error as a non-negative value, where lower is better.

A very important point about AutoGluon Scorers is that internally, they
always report scores in ``greater_is_better=True`` form. This means that
if the original metric is ``greater_is_better=False``, AutoGluon’s
Scorer flips the sign of its value. Therefore, the ``score`` for mean
squared error is represented as a negative value.

This is done to ensure consistency between different metrics.
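Because every score is in higher-is-better form, code that picks the best
model can always take the maximum, whatever the underlying metric. The
sketch below uses made-up validation values for three hypothetical models
and hypothetical helpers ``to_score`` and ``best_model`` (mirroring
``score = sign * raw_metric_value``). It also shows why a wrong
``greater_is_better`` makes selection pick the *worst* model.

.. code:: python

    def to_score(raw_metric_value: float, greater_is_better: bool) -> float:
        """Convert a raw metric value to higher-is-better form (hypothetical helper)."""
        sign = 1 if greater_is_better else -1
        return sign * raw_metric_value


    def best_model(raw_values: dict, greater_is_better: bool) -> str:
        """Return the model with the highest score, as a higher-is-better selector would."""
        return max(raw_values, key=lambda m: to_score(raw_values[m], greater_is_better))


    # Made-up validation results for three hypothetical models.
    val_accuracy = {'model_a': 0.80, 'model_b': 0.86, 'model_c': 0.83}
    val_mse = {'model_a': 0.30, 'model_b': 0.12, 'model_c': 0.21}

    print(best_model(val_accuracy, greater_is_better=True))  # model_b
    print(best_model(val_mse, greater_is_better=False))      # model_b (lowest MSE)

    # Declaring MSE as greater_is_better=True flips the ranking: the worst model wins.
    print(best_model(val_mse, greater_is_better=True))       # model_a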

.. code:: python

    # score
    ag_mean_squared_error_scorer(y_true, y_pred)


.. parsed-literal::
    :class: output

    -0.17258006743548032


.. code:: python

    # error, error=sign*optimum-score -> error=-1*0-score -> error=-score
    ag_mean_squared_error_scorer.error(y_true, y_pred)


.. parsed-literal::
    :class: output

    0.17258006743548032


We can also specify metrics outside of sklearn. For example, below is a
minimal implementation of mean squared error:

.. code:: python

    def mse_func(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        return ((y_true - y_pred) ** 2).mean()
    
    mse_func(y_true, y_pred)


.. parsed-literal::
    :class: output

    0.17258006743548032


The only requirements are that the function takes two arguments,
``y_true`` and ``y_pred`` (or ``y_pred_proba``), as NumPy arrays, and
returns a float value.
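If a metric needs an extra parameter, one option for fitting it into this
two-argument form is ``functools.partial``. The sketch below uses the
pinball (quantile) loss purely as an illustration; ``pinball_loss`` and
``pinball_loss_q90`` are hypothetical names. Such a callable could then be
passed as ``score_func`` with ``optimum=0, greater_is_better=False``,
defined in an importable module as described in the serializability
section.

.. code:: python

    import functools

    import numpy as np


    def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, quantile: float = 0.5) -> float:
        """Mean quantile (pinball) loss: 0 is optimal, lower is better."""
        diff = y_true - y_pred
        # Under-predictions are weighted by `quantile`, over-predictions by `1 - quantile`.
        return float(np.mean(np.maximum(quantile * diff, (quantile - 1) * diff)))


    # Fix `quantile` so the callable takes exactly (y_true, y_pred).
    pinball_loss_q90 = functools.partial(pinball_loss, quantile=0.9)

    labels = np.array([1.0, 2.0, 3.0])
    preds = np.array([1.5, 2.0, 2.0])
    print(pinball_loss_q90(labels, preds))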

With the same code as before, we can create an AutoGluon Scorer.

.. code:: python

    ag_mean_squared_error_custom_scorer = make_scorer(name='mean_squared_error',
                                                      score_func=mse_func,
                                                      optimum=0,
                                                      greater_is_better=False)
    ag_mean_squared_error_custom_scorer(y_true, y_pred)


.. parsed-literal::
    :class: output

    -0.17258006743548032


Custom ROC AUC Metric
---------------------

Here we show an example of a thresholding metric, ``roc_auc``. A
thresholding metric cares about the relative ordering of predictions,
but not their absolute values.
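One way to see why only the ordering matters: ROC AUC equals the fraction
of (positive, negative) pairs in which the positive example receives the
higher score, with ties counting as one half. The NumPy sketch below uses
a hypothetical helper ``pairwise_roc_auc`` (not an sklearn or AutoGluon
function) to show that a monotonic transformation of the scores leaves
AUC unchanged, while thresholding the scores into hard labels can change
it.

.. code:: python

    import numpy as np


    def pairwise_roc_auc(y_true, y_score) -> float:
        """ROC AUC as the fraction of correctly ordered (positive, negative) pairs.

        Assumes both classes are present in y_true.
        """
        y_true = np.asarray(y_true)
        y_score = np.asarray(y_score, dtype=float)
        pos = y_score[y_true == 1]
        neg = y_score[y_true == 0]
        # Compare every positive score with every negative score.
        diff = pos[:, None] - neg[None, :]
        correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
        return float(correct / (len(pos) * len(neg)))


    labels = np.array([0, 1, 0, 1])
    scores = np.array([0.2, 0.3, 0.6, 0.9])

    print(pairwise_roc_auc(labels, scores))                       # 0.75
    print(pairwise_roc_auc(labels, np.exp(5 * scores)))           # 0.75: same ordering
    print(pairwise_roc_auc(labels, (scores >= 0.5).astype(int)))  # 0.5: ordering lost

For the labels and probabilities printed below, 13 of the 25 pairs are
correctly ordered, which matches the ``0.52`` reported by sklearn.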

.. code:: python

    y_true = np.random.randint(low=0, high=2, size=10)
    y_pred_proba = np.random.rand(10)
    
    print(f'y_true:       {y_true}')
    print(f'y_pred_proba: {y_pred_proba}')


.. parsed-literal::
    :class: output

    y_true:       [1 1 1 0 0 0 1 1 0 0]
    y_pred_proba: [0.06899752 0.3314323  0.40070054 0.34922714 0.01315847 0.35570796
     0.22464644 0.4920395  0.46101915 0.31727665]


.. code:: python

    sklearn.metrics.roc_auc_score(y_true, y_pred_proba)


.. parsed-literal::
    :class: output

    0.52


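We need to specify ``needs_threshold=True`` so that AutoGluon passes
continuous prediction scores (such as predicted probabilities) to the
metric instead of predicted class labels; otherwise downstream models
would not use the metric properly.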

.. code:: python

    # Score functions that need decision values
    ag_roc_auc_scorer = make_scorer(name='roc_auc',
                                    score_func=sklearn.metrics.roc_auc_score,
                                    optimum=1,
                                    greater_is_better=True,
                                    needs_threshold=True)
    ag_roc_auc_scorer(y_true, y_pred_proba)


.. parsed-literal::
    :class: output

    0.52


Using Custom Metrics in TabularPredictor
----------------------------------------

Now that we have created several custom Scorers, let’s use them for
training and evaluating models.

For this tutorial, we will be using the Adult Income dataset.

.. code:: python

    from autogluon.tabular import TabularDataset
    
    train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')  # can be local CSV file as well, returns Pandas DataFrame
    test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')  # another Pandas DataFrame
    label = 'class'  # specifies which column we want to predict
    train_data = train_data.sample(n=1000, random_state=0)  # subsample for faster demo
    
    train_data.head(5)


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>age</th>
          <th>workclass</th>
          <th>fnlwgt</th>
          <th>education</th>
          <th>education-num</th>
          <th>marital-status</th>
          <th>occupation</th>
          <th>relationship</th>
          <th>race</th>
          <th>sex</th>
          <th>capital-gain</th>
          <th>capital-loss</th>
          <th>hours-per-week</th>
          <th>native-country</th>
          <th>class</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>6118</th>
          <td>51</td>
          <td>Private</td>
          <td>39264</td>
          <td>Some-college</td>
          <td>10</td>
          <td>Married-civ-spouse</td>
          <td>Exec-managerial</td>
          <td>Wife</td>
          <td>White</td>
          <td>Female</td>
          <td>0</td>
          <td>0</td>
          <td>40</td>
          <td>United-States</td>
          <td>&gt;50K</td>
        </tr>
        <tr>
          <th>23204</th>
          <td>58</td>
          <td>Private</td>
          <td>51662</td>
          <td>10th</td>
          <td>6</td>
          <td>Married-civ-spouse</td>
          <td>Other-service</td>
          <td>Wife</td>
          <td>White</td>
          <td>Female</td>
          <td>0</td>
          <td>0</td>
          <td>8</td>
          <td>United-States</td>
          <td>&lt;=50K</td>
        </tr>
        <tr>
          <th>29590</th>
          <td>40</td>
          <td>Private</td>
          <td>326310</td>
          <td>Some-college</td>
          <td>10</td>
          <td>Married-civ-spouse</td>
          <td>Craft-repair</td>
          <td>Husband</td>
          <td>White</td>
          <td>Male</td>
          <td>0</td>
          <td>0</td>
          <td>44</td>
          <td>United-States</td>
          <td>&lt;=50K</td>
        </tr>
        <tr>
          <th>18116</th>
          <td>37</td>
          <td>Private</td>
          <td>222450</td>
          <td>HS-grad</td>
          <td>9</td>
          <td>Never-married</td>
          <td>Sales</td>
          <td>Not-in-family</td>
          <td>White</td>
          <td>Male</td>
          <td>0</td>
          <td>2339</td>
          <td>40</td>
          <td>El-Salvador</td>
          <td>&lt;=50K</td>
        </tr>
        <tr>
          <th>33964</th>
          <td>62</td>
          <td>Private</td>
          <td>109190</td>
          <td>Bachelors</td>
          <td>13</td>
          <td>Married-civ-spouse</td>
          <td>Exec-managerial</td>
          <td>Husband</td>
          <td>White</td>
          <td>Male</td>
          <td>15024</td>
          <td>0</td>
          <td>40</td>
          <td>United-States</td>
          <td>&gt;50K</td>
        </tr>
      </tbody>
    </table>
    </div>


.. code:: python

    from autogluon.tabular import TabularPredictor
    
    predictor = TabularPredictor(label=label).fit(train_data, hyperparameters='toy')
    
    predictor.leaderboard(test_data, silent=True)


.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20221117_031844/"
    Beginning AutoGluon training ...
    AutoGluon will save models to "AutogluonModels/ag-20221117_031844/"
    AutoGluon Version:  0.6.0b20221117
    Python Version:     3.8.10
    Operating System:   Linux
    Platform Machine:   x86_64
    Platform Version:   #1 SMP Tue Nov 30 00:17:50 UTC 2021
    Train Data Rows:    1000
    Train Data Columns: 14
    Label Column: class
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
    	2 unique label values:  [' >50K', ' <=50K']
    	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
    	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
    	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    	Available Memory:                    31608.91 MB
    	Train Data (Original)  Memory Usage: 0.59 MB (0.0% of available memory)
    	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    	Stage 1 Generators:
    		Fitting AsTypeFeatureGenerator...
    			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
    	Stage 2 Generators:
    		Fitting FillNaFeatureGenerator...
    	Stage 3 Generators:
    		Fitting IdentityFeatureGenerator...
    		Fitting CategoryFeatureGenerator...
    			Fitting CategoryMemoryMinimizeFeatureGenerator...
    	Stage 4 Generators:
    		Fitting DropUniqueFeatureGenerator...
    	Types of features in original data (raw dtype, special dtypes):
    		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
    		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    	Types of features in processed data (raw dtype, special dtypes):
    		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
    		('int', ['bool']) : 1 | ['sex']
    	0.1s = Fit runtime
    	14 features in original data used to generate 14 features in processed data.
    	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.09s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    	To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
    Fitting 4 L1 models ...
    Fitting model: LightGBM ...
    	0.77	 = Validation score   (accuracy)
    	1.13s	 = Training   runtime
    	0.01s	 = Validation runtime
    Fitting model: CatBoost ...
    	0.86	 = Validation score   (accuracy)
    	0.71s	 = Training   runtime
    	0.0s	 = Validation runtime
    Fitting model: XGBoost ...
    	0.84	 = Validation score   (accuracy)
    	1.47s	 = Training   runtime
    	0.01s	 = Validation runtime
    Fitting model: NeuralNetTorch ...
    	0.83	 = Validation score   (accuracy)
    	0.97s	 = Training   runtime
    	0.01s	 = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...
    	0.88	 = Validation score   (accuracy)
    	0.13s	 = Training   runtime
    	0.0s	 = Validation runtime
    AutoGluon training complete, total runtime = 4.57s ... Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221117_031844/")


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>WeightedEnsemble_L2</td>
          <td>0.852493</td>
          <td>0.88</td>
          <td>0.260513</td>
          <td>0.025901</td>
          <td>3.272966</td>
          <td>0.003017</td>
          <td>0.000743</td>
          <td>0.132241</td>
          <td>2</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>1</th>
          <td>XGBoost</td>
          <td>0.847784</td>
          <td>0.84</td>
          <td>0.022974</td>
          <td>0.006565</td>
          <td>1.466220</td>
          <td>0.022974</td>
          <td>0.006565</td>
          <td>1.466220</td>
          <td>1</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>2</th>
          <td>CatBoost</td>
          <td>0.844406</td>
          <td>0.86</td>
          <td>0.011331</td>
          <td>0.004841</td>
          <td>0.706402</td>
          <td>0.011331</td>
          <td>0.004841</td>
          <td>0.706402</td>
          <td>1</td>
          <td>True</td>
          <td>2</td>
        </tr>
        <tr>
          <th>3</th>
          <td>NeuralNetTorch</td>
          <td>0.829461</td>
          <td>0.83</td>
          <td>0.223191</td>
          <td>0.013753</td>
          <td>0.968103</td>
          <td>0.223191</td>
          <td>0.013753</td>
          <td>0.968103</td>
          <td>1</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>4</th>
          <td>LightGBM</td>
          <td>0.780940</td>
          <td>0.77</td>
          <td>0.008924</td>
          <td>0.005397</td>
          <td>1.130286</td>
          <td>0.008924</td>
          <td>0.005397</td>
          <td>1.130286</td>
          <td>1</td>
          <td>True</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    </div>


We can pass our custom metrics into ``predictor.leaderboard`` via the
``extra_metrics`` argument:

.. code:: python

    predictor.leaderboard(test_data, extra_metrics=[ag_roc_auc_scorer, ag_accuracy_scorer], silent=True)


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>roc_auc</th>
          <th>accuracy</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>WeightedEnsemble_L2</td>
          <td>0.852493</td>
          <td>0.901063</td>
          <td>0.852493</td>
          <td>0.88</td>
          <td>0.174045</td>
          <td>0.025901</td>
          <td>3.272966</td>
          <td>0.002734</td>
          <td>0.000743</td>
          <td>0.132241</td>
          <td>2</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>1</th>
          <td>XGBoost</td>
          <td>0.847784</td>
          <td>0.894112</td>
          <td>0.847784</td>
          <td>0.84</td>
          <td>0.022521</td>
          <td>0.006565</td>
          <td>1.466220</td>
          <td>0.022521</td>
          <td>0.006565</td>
          <td>1.466220</td>
          <td>1</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>2</th>
          <td>CatBoost</td>
          <td>0.844406</td>
          <td>0.863760</td>
          <td>0.844406</td>
          <td>0.86</td>
          <td>0.010578</td>
          <td>0.004841</td>
          <td>0.706402</td>
          <td>0.010578</td>
          <td>0.004841</td>
          <td>0.706402</td>
          <td>1</td>
          <td>True</td>
          <td>2</td>
        </tr>
        <tr>
          <th>3</th>
          <td>NeuralNetTorch</td>
          <td>0.829461</td>
          <td>0.885435</td>
          <td>0.829461</td>
          <td>0.83</td>
          <td>0.138213</td>
          <td>0.013753</td>
          <td>0.968103</td>
          <td>0.138213</td>
          <td>0.013753</td>
          <td>0.968103</td>
          <td>1</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>4</th>
          <td>LightGBM</td>
          <td>0.780940</td>
          <td>0.861131</td>
          <td>0.780940</td>
          <td>0.77</td>
          <td>0.006706</td>
          <td>0.005397</td>
          <td>1.130286</td>
          <td>0.006706</td>
          <td>0.005397</td>
          <td>1.130286</td>
          <td>1</td>
          <td>True</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    </div>


We can also pass our custom metric into the Predictor itself by
specifying it during initialization via the ``eval_metric`` parameter:

.. code:: python

    predictor_custom = TabularPredictor(label=label, eval_metric=ag_roc_auc_scorer).fit(train_data, hyperparameters='toy')
    
    predictor_custom.leaderboard(test_data, silent=True)


.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20221117_031849/"
    Beginning AutoGluon training ...
    AutoGluon will save models to "AutogluonModels/ag-20221117_031849/"
    AutoGluon Version:  0.6.0b20221117
    Python Version:     3.8.10
    Operating System:   Linux
    Platform Machine:   x86_64
    Platform Version:   #1 SMP Tue Nov 30 00:17:50 UTC 2021
    Train Data Rows:    1000
    Train Data Columns: 14
    Label Column: class
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
    	2 unique label values:  [' >50K', ' <=50K']
    	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
    	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
    	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    	Available Memory:                    31447.33 MB
    	Train Data (Original)  Memory Usage: 0.59 MB (0.0% of available memory)
    	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    	Stage 1 Generators:
    		Fitting AsTypeFeatureGenerator...
    			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
    	Stage 2 Generators:
    		Fitting FillNaFeatureGenerator...
    	Stage 3 Generators:
    		Fitting IdentityFeatureGenerator...
    		Fitting CategoryFeatureGenerator...
    			Fitting CategoryMemoryMinimizeFeatureGenerator...
    	Stage 4 Generators:
    		Fitting DropUniqueFeatureGenerator...
    	Types of features in original data (raw dtype, special dtypes):
    		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
    		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    	Types of features in processed data (raw dtype, special dtypes):
    		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
    		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
    		('int', ['bool']) : 1 | ['sex']
    	0.1s = Fit runtime
    	14 features in original data used to generate 14 features in processed data.
    	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.08s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
    	This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
    	To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
    Fitting 4 L1 models ...
    Fitting model: LightGBM ...
    	0.85	 = Validation score   (roc_auc)
    	0.11s	 = Training   runtime
    	0.01s	 = Validation runtime
    Fitting model: CatBoost ...
    	0.8693	 = Validation score   (roc_auc)
    	0.03s	 = Training   runtime
    	0.01s	 = Validation runtime
    Fitting model: XGBoost ...
    	0.8585	 = Validation score   (roc_auc)
    	0.05s	 = Training   runtime
    	0.01s	 = Validation runtime
    Fitting model: NeuralNetTorch ...
    	0.8504	 = Validation score   (roc_auc)
    	0.46s	 = Training   runtime
    	0.01s	 = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...
    	0.8753	 = Validation score   (roc_auc)
    	0.41s	 = Training   runtime
    	0.0s	 = Validation runtime
    AutoGluon training complete, total runtime = 1.22s ... Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221117_031849/")


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>WeightedEnsemble_L2</td>
          <td>0.897780</td>
          <td>0.875313</td>
          <td>0.047681</td>
          <td>0.018646</td>
          <td>0.603993</td>
          <td>0.002764</td>
          <td>0.001121</td>
          <td>0.412235</td>
          <td>2</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>1</th>
          <td>XGBoost</td>
          <td>0.894331</td>
          <td>0.858534</td>
          <td>0.022444</td>
          <td>0.006461</td>
          <td>0.047564</td>
          <td>0.022444</td>
          <td>0.006461</td>
          <td>0.047564</td>
          <td>1</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>2</th>
          <td>CatBoost</td>
          <td>0.887425</td>
          <td>0.869325</td>
          <td>0.010631</td>
          <td>0.005301</td>
          <td>0.031831</td>
          <td>0.010631</td>
          <td>0.005301</td>
          <td>0.031831</td>
          <td>1</td>
          <td>True</td>
          <td>2</td>
        </tr>
        <tr>
          <th>3</th>
          <td>NeuralNetTorch</td>
          <td>0.884256</td>
          <td>0.850375</td>
          <td>0.137678</td>
          <td>0.013720</td>
          <td>0.460357</td>
          <td>0.137678</td>
          <td>0.013720</td>
          <td>0.460357</td>
          <td>1</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>4</th>
          <td>LightGBM</td>
          <td>0.870968</td>
          <td>0.849980</td>
          <td>0.011843</td>
          <td>0.005764</td>
          <td>0.112363</td>
          <td>0.011843</td>
          <td>0.005764</td>
          <td>0.112363</td>
          <td>1</td>
          <td>True</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    </div>


That’s all it takes to create and use custom metrics in AutoGluon!

If you create a custom metric, consider `submitting a
PR <https://github.com/awslabs/autogluon/pulls>`__ so that we can add it
officially to AutoGluon!

For a tutorial on implementing custom models in AutoGluon, refer to
:ref:`sec_tabularcustommodel`.

For more tutorials, refer to :ref:`sec_tabularquick` and
:ref:`sec_tabularadvanced`.