TabularPredictor.evaluate_predictions

TabularPredictor.evaluate_predictions(y_true, y_pred, sample_weight=None, decision_threshold=None, display: bool = False, auxiliary_metrics=True, detailed_report=False, **kwargs) → dict

Evaluate the provided predictions or prediction probabilities against ground truth labels. Evaluation is based on the eval_metric previously specified during init, or default metrics if none was specified.
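
A minimal usage sketch, assuming a binary classification dataset; the file names and the 'class' label column are illustrative placeholders, not part of this API:

  from autogluon.tabular import TabularDataset, TabularPredictor

  # Illustrative data; substitute your own files and label column.
  train_data = TabularDataset('train.csv')
  test_data = TabularDataset('test.csv')

  predictor = TabularPredictor(label='class', eval_metric='roc_auc').fit(train_data)

  # For 'roc_auc', y_pred must be prediction probabilities, not predicted labels.
  y_pred_proba = predictor.predict_proba(test_data)
  perf = predictor.evaluate_predictions(
      y_true=test_data['class'],
      y_pred=y_pred_proba,
      display=True,
  )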

Parameters:
  • y_true (np.array or pd.Series) – The ordered collection of ground-truth labels.

  • y_pred (pd.Series or pd.DataFrame) – The ordered collection of prediction probabilities or predictions. Obtainable via the output of predictor.predict_proba. Caution: For certain types of eval_metric (such as ‘roc_auc’), y_pred must be predicted-probabilities rather than predicted labels.

  • sample_weight (pd.Series, default = None) – Sample weight for each row of data. If None, uniform sample weights are used.

  • decision_threshold (float, default = None) – The decision threshold to use when converting prediction probabilities to predictions. This will impact the scores of metrics such as f1 and accuracy. If None, defaults to predictor.decision_threshold. Ignored unless problem_type=’binary’. Refer to the predictor.decision_threshold docstring for more information. See the sketch after this parameter list for an example.

  • display (bool, default = False) – If True, performance results are printed.

  • auxiliary_metrics (bool, default = True) – Should we compute other (problem_type specific) metrics in addition to the default metric?

  • detailed_report (bool, default = False) – Should we compute more detailed versions of the auxiliary_metrics? (requires auxiliary_metrics = True)
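
As referenced under decision_threshold, a sketch comparing the default threshold against a custom one on a binary problem (reusing the predictor, test_data, and 'class' column assumed in the sketch above):

  # The threshold only matters for problem_type='binary': it controls how
  # probabilities are converted to labels, affecting metrics such as f1 and accuracy.
  y_pred_proba = predictor.predict_proba(test_data)
  default_scores = predictor.evaluate_predictions(
      y_true=test_data['class'],
      y_pred=y_pred_proba,
  )  # uses predictor.decision_threshold
  custom_scores = predictor.evaluate_predictions(
      y_true=test_data['class'],
      y_pred=y_pred_proba,
      decision_threshold=0.3,  # illustrative value
  )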

Returns:

  Returns dict where keys = metrics, values = performance along each metric.

  NOTE: Metric scores are always shown in higher-is-better form. This means that metrics such as log_loss and root_mean_squared_error will have their signs FLIPPED, and their values will be negative.
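
To illustrate the sign convention, a sketch with a hypothetical regression predictor, continuing from the imports in the first sketch above (the file names and 'target' label column are assumptions; the default regression eval_metric is root_mean_squared_error):

  reg_train_data = TabularDataset('reg_train.csv')  # hypothetical regression dataset
  reg_test_data = TabularDataset('reg_test.csv')
  reg_predictor = TabularPredictor(label='target').fit(reg_train_data)

  perf = reg_predictor.evaluate_predictions(
      y_true=reg_test_data['target'],
      y_pred=reg_predictor.predict(reg_test_data),
  )
  # Error metrics are reported in higher-is-better form,
  # so this value is negative (or zero for a perfect fit).
  print(perf['root_mean_squared_error'])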