Components: interaction#

autogluon.eda.visualization.interaction#

`CorrelationVisualization`	Display feature correlations matrix.
`CorrelationSignificanceVisualization`	Display feature correlations significance matrix.
`FeatureInteractionVisualization`	Feature interaction visualization.
`FeatureDistanceAnalysisVisualization`	Feature distance visualization.
`PDPInteractions`	Display Partial Dependence Plots (PDP) with Individual Conditional Expectation (ICE)

CorrelationVisualization#

class autogluon.eda.visualization.interaction.CorrelationVisualization(headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)[source]#

Display feature correlations matrix.

This report renders correlations between variables in the form of a heatmap. The details of the report to be rendered depend on the configuration of the Correlation analysis.

Parameters

headers (bool, default = False) – if True then render headers
namespace (Optional[str], default = None) – namespace to use; can be nested like ns_a.ns_b.ns_c
fig_args (Optional[Dict[str, Any]], default = None) – kwargs to pass into chart figure

See also

Correlation
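
The nested namespace form (ns_a.ns_b.ns_c) can be pictured as a path into a nested dictionary. The sketch below is illustrative only: resolve_namespace is a hypothetical helper and a plain dict stands in for the analysis state, so it should not be read as AutoGluon's implementation.

```python
from typing import Any, Dict, Optional


def resolve_namespace(state: Dict[str, Any], namespace: Optional[str]) -> Dict[str, Any]:
    """Walk a dotted namespace such as 'ns_a.ns_b.ns_c' through nested dicts.

    Hypothetical helper for illustration; a plain dict stands in for the state.
    """
    if namespace is None:
        return state  # no namespace: results live at the top level
    node = state
    for part in namespace.split("."):
        node = node[part]  # raises KeyError if a level is missing
    return node


# Toy state with results stored three levels deep.
state = {"ns_a": {"ns_b": {"ns_c": {"correlations": "matrix goes here"}}}}

nested = resolve_namespace(state, "ns_a.ns_b.ns_c")
print(nested)
```

Passing namespace=None returns the state itself, which matches the idea that a namespace is optional.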

CorrelationSignificanceVisualization#

class autogluon.eda.visualization.interaction.CorrelationSignificanceVisualization(headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)[source]#

Display feature correlations significance matrix.

This report renders the correlation significance matrix in the form of a heatmap. The details of the report to be rendered depend on the configuration of the Correlation and CorrelationSignificance analyses.

Parameters

headers (bool, default = False) – if True then render headers
namespace (Optional[str], default = None) – namespace to use; can be nested like ns_a.ns_b.ns_c
fig_args (Optional[Dict[str, Any]], default = None) – kwargs to pass into chart figure

FeatureInteractionVisualization#

class autogluon.eda.visualization.interaction.FeatureInteractionVisualization(key: str, numeric_as_categorical_threshold: int = 20, max_categories_to_consider_render: int = 30, headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)[source]#

Feature interaction visualization.

This report renders feature interaction analysis results. The details of the report to be rendered depend on the combination of variable types in x/y/hue. key is used to link analysis and visualization; this allows multiple analyses/visualizations in one composite analysis.

Parameters

key (str) – key used to store the analysis in the state; the value is placed in the state by FeatureInteraction. If the key is not provided, then use one of the form: ‘x:A|y:B|hue:C’ (omit the corresponding x/y/hue if the value is not provided). See also autogluon.eda.analysis.interaction.FeatureInteraction
numeric_as_categorical_threshold –
headers (bool, default = False) – if True then render headers
namespace (Optional[str], default = None) – namespace to use; can be nested like ns_a.ns_b.ns_c
fig_args (Optional[Dict[str, Any]], default = None) – kwargs to pass into chart figure
kwargs – parameters to pass as a chart args

See also

FeatureInteraction
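
The default key format described in the parameter list ('x:A|y:B|hue:C', with unset parts omitted) can be sketched as follows. default_interaction_key is a hypothetical helper that only mirrors the documented format; how the library builds the key internally is not shown in this reference.

```python
from typing import Optional


def default_interaction_key(x: Optional[str] = None,
                            y: Optional[str] = None,
                            hue: Optional[str] = None) -> str:
    """Build a key of the form 'x:A|y:B|hue:C', skipping parts that are not set.

    Hypothetical helper; it mirrors the documented format and nothing more.
    """
    named = (("x", x), ("y", y), ("hue", hue))
    parts = [f"{name}:{value}" for name, value in named if value is not None]
    return "|".join(parts)


print(default_interaction_key(x="Fare", y="Age", hue="Survived"))  # x:Fare|y:Age|hue:Survived
print(default_interaction_key(x="Age"))                            # x:Age
```

Using an explicit key on both the analysis and the visualization avoids relying on this format at all.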

FeatureDistanceAnalysisVisualization#

class autogluon.eda.visualization.interaction.FeatureDistanceAnalysisVisualization(namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)[source]#

Feature distance visualization.

This component renders graphical representations of distances between features to highlight features that can be either simplified or completely removed.

Parameters

headers (bool, default = False) – if True then render headers
namespace (Optional[str], default = None) – namespace to use; can be nested like ns_a.ns_b.ns_c
fig_args (Optional[Dict[str, Any]], default = None) – kwargs to pass into chart figure
kwargs –

PDPInteractions#

class autogluon.eda.visualization.interaction.PDPInteractions(features: Union[str, List[str]], two_way: bool = False, target: Optional[Any] = None, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Dict[str, Any]]] = None, sample: Optional[Union[int, float]] = 300, headers: bool = False, **kwargs)[source]#

Display Partial Dependence Plots (PDP) with Individual Conditional Expectation (ICE)

The visualizations have two modes:
- regular PDP + ICE plots; this is the default mode of operation
- two-way PDP plots; this mode is selected by passing two features and setting two_way = True

ICE plots complement PDP by showing the relationship between a feature and the model’s output for each individual instance in the dataset. ICE lines (blue) can be overlaid on PDPs (red) to provide a more detailed view of how the model behaves for specific instances.
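
The relationship between ICE and PDP can be sketched in plain Python: each ICE curve sweeps one feature over a grid for a single row, and the PDP is the point-wise average of those curves. This is a minimal illustration of the general technique, not the plotting code behind PDPInteractions; toy_model, the rows and the grid are all made up for the example.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def ice_curves(predict: Callable[[Row], float], rows: Sequence[Row],
               feature: str, grid: Sequence[float]) -> List[List[float]]:
    """One curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        # Override only the swept feature; all other values stay as observed.
        curves.append([predict({**row, feature: value}) for value in grid])
    return curves


def partial_dependence(curves: Sequence[Sequence[float]]) -> List[float]:
    """The PDP is the point-wise average of the ICE curves."""
    return [mean(column) for column in zip(*curves)]


def toy_model(row: Row) -> float:
    # Hypothetical model: the effect of `age` depends on `fare`.
    return row["age"] * row["fare"]


rows = [{"age": 20.0, "fare": 1.0}, {"age": 40.0, "fare": 3.0}]
grid = [0.0, 10.0, 20.0]

curves = ice_curves(toy_model, rows, "age", grid)
pdp = partial_dependence(curves)
print(curves)  # [[0.0, 10.0, 20.0], [0.0, 30.0, 60.0]]
print(pdp)     # [0.0, 20.0, 40.0]
```

The two ICE curves have different slopes while the PDP shows only their average, which is the kind of per-instance detail the overlay is meant to expose.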

Parameters

features (Union[str, List[str]]) – feature or list of features to display on the plots
two_way (bool, default = False) – render two-way PDP; this mode works only when two features are specified
target (Optional[Any], default = None) – In a multiclass setting, specifies the class for which the PDPs should be computed. Ignored in binary classification or classical regression settings
namespace (Optional[str], default = None) – namespace to use; can be nested like ns_a.ns_b.ns_c
fig_args (Optional[Dict[str, Dict[str, Any]]], default = None) – kwargs to pass into chart figure
headers (bool, default = False) – if True then render headers
sample (Optional[Union[int, float]], default = 300) – sample size; if int, the value is used as the number of rows; a float must be between 0.0 and 1.0 and represents the fraction of the dataset to sample; None means no sampling. See also autogluon.eda.analysis.dataset.Sampler()
kwargs –
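
The three accepted forms of sample (int, float, None) can be summarized with a small helper. resolve_sample_size is hypothetical, and the capping and truncation rules in it are assumptions made for the sketch; the actual behaviour is defined by the Sampler component referenced above.

```python
from typing import Optional, Union


def resolve_sample_size(sample: Optional[Union[int, float]], n_rows: int) -> int:
    """Translate a `sample` argument into the number of rows to keep.

    Hypothetical helper; the capping and rounding rules are assumptions.
    """
    if sample is None:
        return n_rows  # None means no sampling
    if isinstance(sample, bool):
        raise TypeError("sample must be an int, a float or None")
    if isinstance(sample, int):
        return min(sample, n_rows)  # assumed: never request more rows than exist
    if not 0.0 <= sample <= 1.0:
        raise ValueError("a float sample must be between 0.0 and 1.0")
    return int(n_rows * sample)  # assumed: fractional rows are truncated


print(resolve_sample_size(300, 1000))   # 300
print(resolve_sample_size(0.25, 1000))  # 250
print(resolve_sample_size(None, 1000))  # 1000
```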

autogluon.eda.analysis.interaction#

`Correlation`	Correlation analysis.
`CorrelationSignificance`	Significance of correlation of all variable combinations in the DataFrame.
`FeatureInteraction`	Feature interaction analysis
`DistributionFit`	This component attempts to fit various distributions for further plotting via `FeatureInteractionVisualization`.

Correlation#

class autogluon.eda.analysis.interaction.Correlation(method: str = 'spearman', focus_field: Optional[str] = None, focus_field_threshold: float = 0.5, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)[source]#

Correlation analysis.

Note: it is recommended to apply AutoGluon standard pre-processing, which allows categorical variables to be included in the analysis. This can be done by wrapping the analysis in ApplyFeatureGenerator.

Parameters

method (str {'pearson', 'kendall', 'spearman', 'phik'}, default='spearman') –
Method of correlation:
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- phik : phi_k correlation
  The phi_k correlation matrix is derived from the chi2 value of each variable pair: the chi2 value is converted into the correlation coefficient rho of a bivariate Gaussian, assuming the given binning and number of records. The correlation coefficient value is between 0 and 1. The bivariate Gaussian's range is set to [-5, 5] by construction. See also the phik documentation.
focus_field (Optional[str], default = None) – name of the field to focus on. When a field is specified, only its correlations that are >= focus_field_threshold are kept. This is helpful when dealing with a large number of variables.
focus_field_threshold (float, default = 0.5) – a cut-off threshold when focus_field is specified
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call
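
To make the spearman method and the focus_field_threshold cut-off from the parameter list concrete, here is a minimal pure-Python sketch. Spearman correlation is computed as Pearson correlation on ranks; focus_filter is a hypothetical helper, and comparing the absolute value against the threshold is an assumption of this sketch rather than documented behaviour.

```python
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard (Pearson) correlation coefficient."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def focus_filter(corr_with_focus: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep fields whose correlation with the focus field reaches the threshold.

    Hypothetical helper; using abs() here is an assumption of this sketch.
    """
    return {name: value for name, value in corr_with_focus.items() if abs(value) >= threshold}


x = [1, 2, 3, 4, 5]
cube = [v ** 3 for v in x]   # monotonic but non-linear in x
noise = [2, 5, 1, 4, 3]      # made-up values with little relation to x

correlations = {"cube": spearman(x, cube), "noise": spearman(x, noise)}
kept = focus_filter(correlations, threshold=0.3)
print(correlations)
print(kept)  # only "cube" survives the 0.3 cut-off
```

Because ranks ignore the shape of a monotonic relationship, the cubic feature gets a perfect Spearman score while its Pearson score stays below 1.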

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> import numpy as np
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(train_data=df_train, label=target_col, anlz_facets=[
>>>     # Apply standard AutoGluon pre-processing to transform categorical variables to numbers to ensure correlation includes them.
>>>     eda.transform.ApplyFeatureGenerator(category_to_numbers=True, children=[
>>>         # `spearman` correlation is rank-based, so it captures monotonic relationships even when they are non-linear.
>>>         eda.interaction.Correlation(method='spearman', focus_field=target_col, focus_field_threshold=0.3),
>>>     ])
>>> ], viz_facets=[
>>>     viz.interaction.CorrelationVisualization(fig_args=dict(figsize=(12,8))),
>>> ])

See also

ApplyFeatureGenerator

CorrelationSignificance#

class autogluon.eda.analysis.interaction.CorrelationSignificance(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, state: Optional[AnalysisState] = None, **kwargs)[source]#

Significance of correlation of all variable combinations in the DataFrame.

See significance_matrix() for more details. This analysis requires Correlation results to be available in the state.

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(train_data=df_train, label=target_col, anlz_facets=[
>>>     # Apply standard AutoGluon pre-processing to transform categorical variables to numbers to ensure correlation includes them.
>>>     eda.transform.ApplyFeatureGenerator(category_to_numbers=True, children=[
>>>         # `spearman` correlation is rank-based, so it captures monotonic relationships even when they are non-linear.
>>>         eda.interaction.Correlation(method='spearman', focus_field=target_col, focus_field_threshold=0.3),
>>>         eda.interaction.CorrelationSignificance()
>>>     ])
>>> ], viz_facets=[
>>>     viz.interaction.CorrelationSignificanceVisualization(fig_args=dict(figsize=(12,8))),
>>> ])

See also

significance_matrix(), Correlation, ApplyFeatureGenerator
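
As a rough intuition for what "significance of a correlation" means, the sketch below runs a permutation test: it shuffles one variable many times and counts how often a shuffled correlation is at least as strong as the observed one. This is a different and much simpler technique than the significance_matrix() routine referenced above, and it is included only as an illustration; the data and the permutation_p_value helper are made up.

```python
import random
from math import sqrt
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard (Pearson) correlation coefficient."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_permutations: int = 2000, seed: int = 0) -> float:
    """Share of shuffles whose |correlation| reaches the observed |correlation|."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(pearson(a, b))
    shuffled = list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between a and b
        if abs(pearson(a, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


a = list(range(1, 13))
b = [2 * v + (1 if v % 2 else -1) for v in a]  # strongly related to a

p_value = permutation_p_value(a, b)
print(p_value)  # a small value: the observed correlation is unlikely under shuffling
```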

FeatureInteraction#

class autogluon.eda.analysis.interaction.FeatureInteraction(x: Optional[str] = None, y: Optional[str] = None, hue: Optional[str] = None, key: Optional[str] = None, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)[source]#

Feature interaction analysis

Parameters

x (Optional[str], default = None) – variable to analyse which would be placed on x-axis
y (Optional[str], default = None) – variable to analyse which would be placed on y-axis
hue (Optional[str], default = None) – variable to use as hue in x/y-analysis.
key (Optional[str], default = None) – key to use to store the analysis in the state; the value is later used by FeatureInteractionVisualization. If the key is not provided, then use one of the form: ‘x:A|y:B|hue:C’ (omit the corresponding x/y/hue if the value is not provided). See also FeatureInteractionVisualization
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call
kwargs –

Examples

>>> import pandas as pd
>>> import numpy as np
>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> df_train = pd.DataFrame(...)
>>>
>>> state = auto.analyze(
>>>     train_data=df_train, label='Survived',
>>>     anlz_facets=[
>>>         eda.dataset.RawTypesAnalysis(),
>>>         eda.interaction.FeatureInteraction(key='target_col', x='Survived'),
>>>         eda.interaction.FeatureInteraction(key='target_col_vs_age', x='Survived', y='Age')
>>>     ],
>>>     viz_facets=[
>>>         # Bar Plot with counts per each of the values in Survived
>>>         viz.interaction.FeatureInteractionVisualization(key='target_col', headers=True),
>>>         # Box Plot Survived vs Age
>>>         viz.interaction.FeatureInteractionVisualization(key='target_col_vs_age', headers=True),
>>>     ]
>>> )
>>>
>>> # Simplified shortcut for interactions: scatter plot of Fare vs Age colored based on Survived values.
>>> auto.analyze_interaction(x='Fare', y='Age', hue='Survived', train_data=df_train)

DistributionFit#

class autogluon.eda.analysis.interaction.DistributionFit(columns: Union[str, List[str]], pvalue_min: float = 0.01, keep_top_n: Optional[int] = None, distributions_to_fit: Optional[Union[str, List[str]]] = None, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)[source]#

This component attempts to fit various distributions for further plotting via FeatureInteractionVisualization.

The data specified in columns must be numeric to be considered for fitting (categorical variables are not supported).

Only the distributions whose fit p-value is above the pvalue_min threshold will be included in the results.

Note: this analysis augments FeatureInteraction; the two should be used together so that the fits can be visualized via FeatureInteractionVisualization.

Parameters

columns (Union[str, List[str]]) – column or columns to include in the analysis. Can be passed as a string or a list of strings.
pvalue_min (float, default = 0.01) – minimum p-value for a distribution fit to be included in the results.
keep_top_n (Optional[int], default = None) – how many distributions exceeding pvalue_min to include in the results. For example, if keep_top_n=3 but 10 distributions satisfy pvalue_min, only the top 3 will be included. If not specified and distributions_to_fit is not provided, then only the top 3 will be included in the results.
distributions_to_fit (Optional[Union[str, List[str]]], default = None) – list of distributions to fit. See DistributionFit.AVAILABLE_DISTRIBUTIONS for the list of supported values. See the scipy documentation for details on each distribution. If not specified, then a fit will be attempted for all supported distributions.
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call
kwargs –
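
The interplay of pvalue_min, keep_top_n and distributions_to_fit described above can be sketched as a selection step over already-computed p-values. select_fits is a hypothetical helper, the p-values are made-up numbers, and ranking by descending p-value is an assumption of this sketch; the fitting itself is not shown.

```python
from typing import Dict, List, Optional, Tuple


def select_fits(p_values: Dict[str, float],
                pvalue_min: float = 0.01,
                keep_top_n: Optional[int] = None,
                distributions_to_fit: Optional[List[str]] = None) -> List[Tuple[str, float]]:
    """Pick which fitted distributions to report.

    Hypothetical helper; ranking by descending p-value is an assumption.
    """
    if distributions_to_fit is not None:
        # Only the requested distributions are considered at all.
        p_values = {name: p for name, p in p_values.items() if name in distributions_to_fit}
    if keep_top_n is None and distributions_to_fit is None:
        keep_top_n = 3  # documented fallback when neither option is given
    passing = [(name, p) for name, p in p_values.items() if p > pvalue_min]
    passing.sort(key=lambda item: item[1], reverse=True)
    return passing if keep_top_n is None else passing[:keep_top_n]


# Made-up p-values for illustration only.
toy_p_values = {"lognorm": 0.40, "gamma": 0.25, "beta": 0.10, "fisk": 0.001}

print(select_fits(toy_p_values))
# [('lognorm', 0.4), ('gamma', 0.25), ('beta', 0.1)]
print(select_fits(toy_p_values, distributions_to_fit=["beta", "fisk"]))
# [('beta', 0.1)]
```

In the second call fisk is dropped because its p-value falls below pvalue_min, not because of any top-n limit.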

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(
>>>     train_data=df_train, label=target_col,
>>>     anlz_facets=[
>>>         eda.dataset.RawTypesAnalysis(),
>>>         eda.interaction.DistributionFit(columns=['Fare', 'Age'], distributions_to_fit=['lognorm', 'beta', 'gamma', 'fisk']),
>>>         eda.interaction.FeatureInteraction(key='age-chart', x='Age'),
>>>
>>>     ],
>>>     viz_facets=[
>>>         viz.interaction.FeatureInteractionVisualization(key='age-chart', headers=True),
>>>     ]
>>> )