Components: shift

autogluon.eda.visualization.shift

XShiftSummary

Summarize the results of the XShiftDetector.

XShiftSummary

class autogluon.eda.visualization.shift.XShiftSummary(headers: bool = False, namespace: Optional[str] = None, **kwargs)

Summarize the results of the XShiftDetector. The results are rendered as markdown in Jupyter and contain the detection status (True if a shift is detected), the details of the hypothesis test (test statistic and p-value), and the feature importances for the detection.
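To make the shape of such a summary concrete, here is a minimal sketch of how the three pieces of information could be laid out as markdown. It is not the XShiftSummary implementation: the function name `render_shift_summary` and the dictionary keys (`detection_status`, `test_statistic`, `pvalue`, `feature_importance`) are hypothetical names chosen for this illustration.

```python
def render_shift_summary(results: dict) -> str:
    """Format shift-detection results as a markdown string.

    The keys read from `results` are hypothetical; they only mirror the
    three items the summary is described as containing.
    """
    lines = [
        f"**Covariate shift detected:** {results['detection_status']}",
        f"- test statistic: {results['test_statistic']:.4f}",
        f"- p-value: {results['pvalue']:.4f}",
    ]
    importances = results.get("feature_importance")
    if importances:
        # List the features that best separate train from test first.
        lines += ["", "| feature | importance |", "|---|---|"]
        ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
        for name, score in ranked:
            lines.append(f"| {name} | {score:.4f} |")
    return "\n".join(lines)


example = {
    "detection_status": True,
    "test_statistic": 0.93,
    "pvalue": 0.001,
    "feature_importance": {"age": 0.1, "income": 0.4},
}
print(render_shift_summary(example))
```

In a notebook, a string like this could be passed to `IPython.display.Markdown` for rendering.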

autogluon.eda.analysis.shift

XShiftDetector

Detect a change in covariate (X) distribution between training and test, which we call XShift.

XShiftDetector

class autogluon.eda.analysis.shift.XShiftDetector(classifier_class: Any = <class 'autogluon.tabular.predictor.predictor.TabularPredictor'>, compute_fi: bool = True, pvalue_thresh: float = 0.01, eval_metric: str = 'roc_auc', sample_label: str = 'i2vkyc0p64', classifier_kwargs: Optional[dict] = None, classifier_fit_kwargs: Optional[dict] = None, num_permutations: int = 1000, test_size_2st: float = 0.3, parent: Union[None, autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)

Detect a change in covariate (X) distribution between training and test, which we call XShift. It can tell you if your training set is not representative of your test set distribution. This is done with a classifier two-sample test (C2ST).
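The idea behind a classifier two-sample test can be sketched in a few lines of standard-library Python. This is an illustration of the general technique, not the code XShiftDetector runs: it uses a nearest-centroid classifier on one-dimensional data instead of a TabularPredictor, accuracy instead of the configurable eval_metric, and a label-permutation p-value. The function name `c2st_pvalue` and its arguments are assumptions made for the example.

```python
import random
from statistics import mean


def c2st_pvalue(train, test, test_size=0.3, num_permutations=1000, seed=0):
    """Classifier two-sample test on 1-D samples (illustrative only)."""
    rng = random.Random(seed)
    # Label each row by origin: 0 = training set, 1 = test set.
    rows = [(x, 0) for x in train] + [(x, 1) for x in test]
    rng.shuffle(rows)
    n_holdout = max(1, int(len(rows) * test_size))
    holdout, fit_rows = rows[:n_holdout], rows[n_holdout:]

    # "Fit" a nearest-centroid classifier, standing in for a real predictor.
    c0 = mean(x for x, y in fit_rows if y == 0)
    c1 = mean(x for x, y in fit_rows if y == 1)
    preds = [int(abs(x - c1) < abs(x - c0)) for x, _ in holdout]
    labels = [y for _, y in holdout]

    def accuracy(y_true):
        return mean(int(p == y) for p, y in zip(preds, y_true))

    observed = accuracy(labels)
    # Null distribution: shuffle the hold-out labels, keep predictions fixed.
    exceed = 0
    for _ in range(num_permutations):
        permuted = labels[:]
        rng.shuffle(permuted)
        if accuracy(permuted) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    pvalue = (exceed + 1) / (num_permutations + 1)
    return observed, pvalue


gen = random.Random(42)
train = [gen.gauss(0.0, 1.0) for _ in range(200)]
shifted_test = [gen.gauss(3.0, 1.0) for _ in range(200)]
stat, pvalue = c2st_pvalue(train, shifted_test)
print(f"accuracy={stat:.3f} pvalue={pvalue:.4f} shift_detected={pvalue <= 0.01}")
```

If the classifier can tell the two sets apart much better than chance, the p-value is small and the sets are unlikely to come from the same distribution.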

Parameters
classifier_class: an AutoGluon predictor, such as autogluon.tabular.TabularPredictor (default)

The predictor that will be fit on the training set and used to predict on the test set

compute_fi: bool, default = True

Set to True to compute the feature importances; this can be computationally intensive

pvalue_thresh: float, default = 0.01

The p-value threshold used to decide whether a shift is detected

eval_metric: str, default = 'roc_auc'

The metric used for the C2ST; it must be one of the binary metrics from autogluon.core.metrics

sample_label: str, default = 'i2vkyc0p64'

The label used internally for the classifier two-sample test. The only reason to change it is the unlikely case that the default value is already a column name in the data.

classifier_kwargs: dict, default = {}

The keyword arguments passed to the classifier, an instance of classifier_class

classifier_fit_kwargs: dict, default = {}

The keyword arguments passed to the fit call of the classifier, an instance of classifier_class

num_permutations: int, default = 1000

The number of permutations used for any permutation-based method

test_size_2st: float, default = 0.3

The fraction of the data held out as the test set in the train/test split of the two-sample test
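As a hedged usage sketch, the detector and its summary could be wired together as shown below. Everything beyond the two class names and their documented parameters is an assumption not stated on this page: in particular, the `autogluon.eda.auto.analyze` entry point and its `train_data`, `test_data`, `label`, `anlz_facets`, `viz_facets` and `return_state` arguments should be checked against the installed AutoGluon version. The helper name `run_shift_check` and the `DETECTOR_KWARGS` dictionary are made up for the example.

```python
# Detector settings mirroring the documented defaults, except that feature
# importances are switched off to keep the run cheap.
DETECTOR_KWARGS = {
    "compute_fi": False,
    "pvalue_thresh": 0.01,
    "eval_metric": "roc_auc",
    "num_permutations": 1000,
    "test_size_2st": 0.3,
}


def run_shift_check(train_data, test_data, label):
    """Run XShiftDetector and render XShiftSummary on two DataFrames.

    Assumes an AutoGluon release that still ships the autogluon.eda package.
    """
    # Imported lazily so this snippet loads even without AutoGluon installed.
    import autogluon.eda.auto as auto
    from autogluon.eda.analysis.shift import XShiftDetector
    from autogluon.eda.visualization.shift import XShiftSummary

    return auto.analyze(
        train_data=train_data,
        test_data=test_data,
        label=label,
        anlz_facets=[XShiftDetector(**DETECTOR_KWARGS)],
        viz_facets=[XShiftSummary()],
        return_state=True,
    )
```

Calling `run_shift_check(df_train, df_test, "target")` with two pandas DataFrames and the name of the label column would then fit the classifier and display the summary in a notebook; the DataFrame names and column name here are placeholders.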