Components: shift

autogluon.eda.visualization.shift

XShiftSummary

Summarize the results of the XShiftDetector.

XShiftSummary

class autogluon.eda.visualization.shift.XShiftSummary(headers: bool = False, namespace: Optional[str] = None, **kwargs)

Summarize the results of the XShiftDetector. The summary is rendered as Markdown in Jupyter. It contains the detection status (True if a shift was detected), the details of the hypothesis test (test statistic and p-value), and the feature importances for the detection.

autogluon.eda.analysis.shift

XShiftDetector

Detect a change in covariate (X) distribution between training and test, which we call XShift.

XShiftDetector

class autogluon.eda.analysis.shift.XShiftDetector(classifier_class: Any = <class 'autogluon.tabular.predictor.predictor.TabularPredictor'>, compute_fi: bool = True, pvalue_thresh: float = 0.01, eval_metric: str = 'roc_auc', sample_label: str = '__label__', classifier_kwargs: Optional[dict] = None, classifier_fit_kwargs: Optional[dict] = None, num_permutations: int = 1000, test_size_2st: float = 0.3, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)

Detect a change in covariate (X) distribution between training and test, which we call XShift. It can tell you whether your training set is not representative of your test set distribution. This is done with a Classifier Two-Sample Test (C2ST).
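
The idea behind a classifier two-sample test can be sketched in plain Python. This is a toy illustration, not AutoGluon's implementation: the function name `c2st_accuracy`, the nearest-mean classifier, the one-dimensional data, and the use of accuracy as the statistic are all assumptions made to keep the example self-contained.

```python
import random


def c2st_accuracy(train_values, test_values, holdout_fraction=0.3, seed=0):
    """Held-out accuracy of a toy classifier that tells two 1-D samples apart."""
    rng = random.Random(seed)
    # Pool both samples and label each value by origin (0 = train, 1 = test).
    pooled = [(x, 0) for x in train_values] + [(x, 1) for x in test_values]
    rng.shuffle(pooled)
    # Hold out a fraction of the pooled rows to score the classifier.
    n_holdout = max(1, int(len(pooled) * holdout_fraction))
    holdout, fit = pooled[:n_holdout], pooled[n_holdout:]

    def mean_of(label):
        values = [x for x, y in fit if y == label]
        return sum(values) / len(values)

    # "Fit" a nearest-mean classifier (a stand-in for a real predictor).
    mean_train, mean_test = mean_of(0), mean_of(1)

    def predict(x):
        return 0 if abs(x - mean_train) <= abs(x - mean_test) else 1

    # The statistic is the accuracy on the held-out rows.
    return sum(predict(x) == y for x, y in holdout) / len(holdout)


data_rng = random.Random(42)
reference = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
same = [data_rng.gauss(0.0, 1.0) for _ in range(200)]      # no shift
shifted = [data_rng.gauss(5.0, 1.0) for _ in range(200)]   # strong shift

print("no shift:", c2st_accuracy(reference, same))
print("shifted: ", c2st_accuracy(reference, shifted))
```

When the two samples come from the same distribution, the held-out accuracy should stay near chance level; when they differ, the classifier separates them and the accuracy rises well above it.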

State attributes

xshift_results.detection_status: bool
True if a shift was detected
xshift_results.test_statistic: float
Classifier Two-Sample Test (C2ST) statistic. It measures how well a classifier distinguishes between samples from the training and test sets. If the classifier can accurately separate the samples, the input distributions differ significantly, which indicates covariate shift. A C2ST value close to 0.5 means that the classifier struggles to differentiate between the sets, indicating minimal covariate shift. In contrast, a value significantly different from 0.5 suggests the presence of covariate shift, warranting further investigation and potential adjustments to the model or data preprocessing.
xshift_results.pvalue: float
p-value computed with a permutation test
xshift_results.pvalue_threshold: float
the p-value threshold used as the decision boundary
xshift_results.feature_importance: DataFrame
the feature importance DataFrame, if computed
xshift_results.shift_features: list
list of features whose contribution is statistically significant; only present if xshift_results.detection_status is True
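
How a permutation test turns a test statistic into a p-value, and how that p-value is compared with a threshold, can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's code: the helper names `permutation_pvalue` and `accuracy`, the add-one correction, and the `<=` comparison against the threshold are hypothetical choices for illustration.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def permutation_pvalue(predictions, labels, statistic, num_permutations=1000, seed=0):
    """Share of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(predictions, labels)
    shuffled = list(labels)  # copy so the caller's labels are left untouched
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        if statistic(predictions, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction (an assumption here) keeps the p-value above zero.
    return (at_least_as_extreme + 1) / (num_permutations + 1)


# Sample-origin labels: 0 = training row, 1 = test row.
labels = [0] * 20 + [1] * 20
perfect_predictions = list(labels)   # classifier separates the samples
constant_predictions = [0] * 40      # classifier learned nothing

pvalue_threshold = 0.01
for name, predictions in [("perfect", perfect_predictions), ("constant", constant_predictions)]:
    pvalue = permutation_pvalue(predictions, labels, accuracy)
    detected = pvalue <= pvalue_threshold  # comparison operator is an assumption
    print(name, "p-value:", pvalue, "detected:", detected)
```

A classifier that separates the two samples yields a statistic that shuffled labels almost never match, so the p-value falls below the threshold; a classifier that learned nothing yields a p-value near 1.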

Parameters

classifier_class (an AutoGluon predictor, such as autogluon.tabular.TabularPredictor (default)) – The predictor that will be fit on the training set and used to predict the test set
compute_fi (bool, default = True) – Whether to compute the feature importances; this can be computationally intensive
pvalue_thresh (float, default = 0.01) – The p-value threshold used to decide whether a shift is detected
eval_metric (str, default = 'roc_auc') – The metric used for the C2ST; it must be one of the binary metrics from autogluon.core.metrics
sample_label (str, default = '__label__') – The label used internally for the classifier two-sample test. The only reason to change it is the off chance that the default value is already a column in the data.
classifier_kwargs (Optional[dict], default = None) – The kwargs passed to the classifier, an instance of classifier_class
classifier_fit_kwargs (Optional[dict], default = None) – The kwargs passed to the fit call of the classifier, an instance of classifier_class
num_permutations (int, default = 1000) – The number of permutations used for any permutation-based method
test_size_2st (float, default = 0.3) – The size of the test set in the train/test split used by the two-sample test