Components: interaction

autogluon.eda.visualization.interaction

CorrelationVisualization

Display feature correlations matrix.

CorrelationSignificanceVisualization

Display feature correlations significance matrix.

FeatureInteractionVisualization

Feature interaction visualization.

CorrelationVisualization

class autogluon.eda.visualization.interaction.CorrelationVisualization(headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)[source]

Display feature correlations matrix.

This report renders correlations between variables as a heatmap. The details of the rendered report depend on the configuration of the Correlation analysis.

Parameters
headers: bool, default = False

if True then render headers

namespace: Optional[str], default = None

namespace to use; can be nested like ns_a.ns_b.ns_c

fig_args: Optional[Dict[str, Any]], default = None

kwargs to pass into chart figure

See also

Correlation
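
The dotted namespace form can be read as a path into nested state. The following is a minimal sketch, assuming the analysis state behaves like nested dictionaries; resolve_namespace is a hypothetical helper written for illustration and is not part of the AutoGluon API.

```python
from typing import Any, Dict, Optional


def resolve_namespace(state: Dict[str, Any], namespace: Optional[str]) -> Dict[str, Any]:
    """Walk a nested dict along a dotted path such as 'ns_a.ns_b.ns_c'.

    Hypothetical helper for illustration only; AutoGluon's own lookup may differ.
    """
    if namespace is None:
        return state  # no namespace: use the top-level state
    node = state
    for part in namespace.split("."):
        node = node[part]  # raises KeyError if a level is missing
    return node


# Illustrative state with results stored three levels deep.
state = {"ns_a": {"ns_b": {"ns_c": {"correlations": "..."}}}}
nested = resolve_namespace(state, "ns_a.ns_b.ns_c")
```

With namespace=None the helper returns the top-level state unchanged, which mirrors the documented default.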

CorrelationSignificanceVisualization

class autogluon.eda.visualization.interaction.CorrelationSignificanceVisualization(headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)[source]

Display feature correlations significance matrix.

This report renders the correlation significance matrix as a heatmap. The details of the rendered report depend on the configuration of the Correlation and CorrelationSignificance analyses.

Parameters
headers: bool, default = False

if True then render headers

namespace: Optional[str], default = None

namespace to use; can be nested like ns_a.ns_b.ns_c

fig_args: Optional[Dict[str, Any]], default = None

kwargs to pass into chart figure

FeatureInteractionVisualization

class autogluon.eda.visualization.interaction.FeatureInteractionVisualization(key: str, numeric_as_categorical_threshold: int = 20, max_categories_to_consider_render: int = 30, headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)[source]

Feature interaction visualization.

This report renders feature interaction analysis results. The details of the rendered report depend on the combination of variable types in x/y/hue. key links the analysis to its visualization, which makes it possible to have multiple analyses/visualizations in one composite analysis.
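
How the variable types in x/y/hue could drive the chart choice is sketched below. It covers only the three combinations named in the FeatureInteraction examples (bar plot, box plot, scatter plot). describe_chart and its type labels are hypothetical, and the sketch assumes that a low-cardinality column such as Survived is treated as categorical.

```python
from typing import Optional


def describe_chart(x_type: Optional[str], y_type: Optional[str], hue_type: Optional[str]) -> str:
    """Map a combination of variable types to a chart description.

    Hypothetical dispatcher: only the combinations mentioned in the
    FeatureInteraction examples are covered; everything else is left open.
    Types are 'category', 'numeric' or None (variable not provided).
    """
    if x_type == "category" and y_type is None:
        return "bar plot of counts per value of x"
    if x_type == "category" and y_type == "numeric":
        return "box plot of y per value of x"
    if x_type == "numeric" and y_type == "numeric":
        suffix = " colored by hue" if hue_type is not None else ""
        return "scatter plot of x vs y" + suffix
    return "not covered by this sketch"


# Survived alone, Survived vs Age, and Fare vs Age colored by Survived.
charts = [
    describe_chart("category", None, None),
    describe_chart("category", "numeric", None),
    describe_chart("numeric", "numeric", "category"),
]
```

The real component may support more combinations and chart types than the three shown here.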

Parameters
key: str

key used to store the analysis in the state; the value is placed in the state by FeatureInteraction. If the key is not provided, then use one of the form 'x:A|y:B|hue:C' (omit the corresponding x/y/hue part if the value is not provided). See also autogluon.eda.analysis.interaction.FeatureInteraction

numeric_as_categorical_threshold: int, default = 20

max_categories_to_consider_render: int, default = 30

headers: bool, default = False

if True then render headers

namespace: Optional[str], default = None

namespace to use; can be nested like ns_a.ns_b.ns_c

fig_args: Optional[Dict[str, Any]], default = None

kwargs to pass into chart figure

kwargs

parameters to pass as a chart args

autogluon.eda.analysis.interaction

Correlation

Correlation analysis.

CorrelationSignificance

Significance of correlation of all variable combinations in the DataFrame.

FeatureInteraction

Feature interaction analysis

DistributionFit

This component attempts to fit various distributions for further plotting via FeatureInteractionVisualization.

Correlation

class autogluon.eda.analysis.interaction.Correlation(method: str = 'spearman', focus_field: Optional[str] = None, focus_field_threshold: float = 0.5, parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)[source]

Correlation analysis.

Note: it is recommended to apply AutoGluon standard pre-processing, which allows categorical variables to be included in the analysis. This can be done by wrapping the analysis in ApplyFeatureGenerator.

Parameters
method: str {'pearson', 'kendall', 'spearman', 'phik'}, default = 'spearman'
Method of correlation:
  • pearson : standard correlation coefficient

  • kendall : Kendall Tau correlation coefficient

  • spearman : Spearman rank correlation

  • phik : phi_k correlation

    Correlation matrix of a bivariate Gaussian derived from the chi2-value. The chi2-value is converted into the correlation coefficient of a bivariate Gaussian with correlation value rho, assuming the given binning and number of records. The correlation coefficient value is between 0 and 1. The bivariate Gaussian's range is set to [-5, 5] by construction. See also the phik documentation.

focus_field: Optional[str], default = None

name of the field to focus on. When a field is specified, the correlations are filtered to keep only those that are >= focus_field_threshold. This is helpful when dealing with a large number of variables.

focus_field_threshold: float, default = 0.5

a cut-off threshold when focus_field is specified

parent: Optional[AbstractAnalysis], default = None

parent Analysis

children: Optional[List[AbstractAnalysis]], default = None

wrapped analyses; these will receive sampled args during fit call
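
The effect of focus_field and focus_field_threshold can be illustrated with a plain-Python sketch. filter_by_focus is a hypothetical helper, the matrix values are made up, and comparing on the absolute value is an assumption; the library may apply the threshold differently.

```python
from typing import Dict, List

CorrMatrix = Dict[str, Dict[str, float]]


def filter_by_focus(corr: CorrMatrix, focus_field: str, threshold: float = 0.5) -> CorrMatrix:
    """Keep only fields whose correlation with focus_field reaches the threshold.

    Assumption: the comparison uses the absolute value, so strong negative
    correlations are kept as well. The library may compare differently.
    """
    keep: List[str] = [f for f in corr if abs(corr[focus_field][f]) >= threshold]
    return {row: {col: corr[row][col] for col in keep} for row in keep}


# Made-up correlation matrix, for illustration only.
corr = {
    "target": {"target": 1.0, "a": 0.7, "b": -0.6, "c": 0.1},
    "a": {"target": 0.7, "a": 1.0, "b": -0.2, "c": 0.3},
    "b": {"target": -0.6, "a": -0.2, "b": 1.0, "c": 0.0},
    "c": {"target": 0.1, "a": 0.3, "b": 0.0, "c": 1.0},
}
focused = filter_by_focus(corr, focus_field="target", threshold=0.5)
```

Here field c is dropped from both rows and columns because its correlation with target (0.1) is below the threshold, which keeps the heatmap small when there are many variables.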

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> import numpy as np
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(train_data=df_train, label=target_col, anlz_facets=[
>>>     # Apply standard AutoGluon pre-processing to transform categorical variables to numbers to ensure correlation includes them.
>>>     eda.transform.ApplyFeatureGenerator(category_to_numbers=True, children=[
>>>         # We use `spearman` correlation to capture monotonic non-linear interactions because it is based on ranks.
>>>         eda.interaction.Correlation(method='spearman', focus_field=target_col, focus_field_threshold=0.3),
>>>     ])
>>> ], viz_facets=[
>>>     viz.interaction.CorrelationVisualization(fig_args=dict(figsize=(12,8))),
>>> ])
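
Why a rank-based method picks up non-linear but monotonic relationships can be shown without any library. The sketch below implements Pearson and Spearman by hand (standard textbook formulas, not AutoGluon code; the ranks helper assumes there are no ties) and compares them on y = x ** 3.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; assumes no ties to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [float(v) for v in range(1, 11)]
y = [v ** 3 for v in x]  # monotonic but non-linear

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)
```

Because cubing preserves the order of the values, the ranks of x and y are identical and spearman_r is 1, while pearson_r stays below 1 since the relationship is not linear.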

CorrelationSignificance

class autogluon.eda.analysis.interaction.CorrelationSignificance(parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, state: Optional[autogluon.eda.state.AnalysisState] = None, **kwargs)[source]

Significance of correlation of all variable combinations in the DataFrame.

See significance_matrix() for more details. This analysis requires Correlation results to be available in the state.

Note: it is recommended to apply AutoGluon standard pre-processing, which allows categorical variables to be included in the analysis. This can be done by wrapping the analysis in ApplyFeatureGenerator.

See also

significance_matrix()
Correlation
ApplyFeatureGenerator
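
To give an intuition for what "significance of correlation" measures, the sketch below runs a plain permutation test on Pearson's r. This is a swapped-in technique chosen for brevity: it is not the method used by significance_matrix(), and permutation_pvalue is a hypothetical helper.

```python
import math
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_pvalue(x: Sequence[float], y: Sequence[float],
                       n_permutations: int = 2000, seed: int = 0) -> float:
    """Estimate how often shuffled data is at least as correlated as the original."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real association between x and y
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


x = [float(v) for v in range(30)]
y = [2.0 * v + (-1.0) ** int(v) for v in x]  # strong linear trend plus a small wiggle
p_value = permutation_pvalue(x, y)
```

A small p_value means the observed correlation is unlikely to arise by chance; a significance matrix reports this kind of evidence for every pair of variables.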

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(train_data=df_train, label=target_col, anlz_facets=[
>>>     # Apply standard AutoGluon pre-processing to transform categorical variables to numbers to ensure correlation includes them.
>>>     eda.transform.ApplyFeatureGenerator(category_to_numbers=True, children=[
>>>         # We use `spearman` correlation to capture monotonic non-linear interactions because it is based on ranks.
>>>         eda.interaction.Correlation(method='spearman', focus_field=target_col, focus_field_threshold=0.3),
>>>         eda.interaction.CorrelationSignificance()
>>>     ])
>>> ], viz_facets=[
>>>     viz.interaction.CorrelationSignificanceVisualization(fig_args=dict(figsize=(12,8))),
>>> ])

FeatureInteraction

class autogluon.eda.analysis.interaction.FeatureInteraction(x: Optional[str] = None, y: Optional[str] = None, hue: Optional[str] = None, key: Optional[str] = None, parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)[source]

Feature interaction analysis

Parameters
x: Optional[str], default = None

variable to analyse which would be placed on x-axis

y: Optional[str], default = None

variable to analyse which would be placed on y-axis

hue: Optional[str], default = None

variable to use as hue in x/y-analysis.

key: Optional[str], default = None

key used to store the analysis in the state; the value is later used by FeatureInteractionVisualization. If the key is not provided, then use one of the form 'x:A|y:B|hue:C' (omit the corresponding x/y/hue part if the value is not provided). See also FeatureInteractionVisualization

parent: Optional[AbstractAnalysis], default = None

parent Analysis

children: Optional[List[AbstractAnalysis]], default = None

wrapped analyses; these will receive sampled args during fit call

kwargs
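
The documented default key format 'x:A|y:B|hue:C' can be sketched as follows. default_key is a hypothetical re-implementation based only on the description of the key parameter, not the library's own code.

```python
from typing import Optional


def default_key(x: Optional[str] = None, y: Optional[str] = None, hue: Optional[str] = None) -> str:
    """Build a key of the form 'x:A|y:B|hue:C', skipping parts that are None.

    Hypothetical re-implementation of the documented format.
    """
    parts = [
        f"{name}:{value}"
        for name, value in (("x", x), ("y", y), ("hue", hue))
        if value is not None
    ]
    return "|".join(parts)


full_key = default_key(x="Fare", y="Age", hue="Survived")
partial_key = default_key(x="Survived", y="Age")
```

Under this reading, an analysis created without an explicit key for x='Survived', y='Age' would be referenced from the visualization with key='x:Survived|y:Age'.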

Examples

>>> import pandas as pd
>>> import numpy as np
>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> df_train = pd.DataFrame(...)
>>>
>>> state = auto.analyze(
>>>     train_data=df_train, label='Survived',
>>>     anlz_facets=[
>>>         eda.dataset.RawTypesAnalysis(),
>>>         eda.interaction.FeatureInteraction(key='target_col', x='Survived'),
>>>         eda.interaction.FeatureInteraction(key='target_col_vs_age', x='Survived', y='Age')
>>>     ],
>>>     viz_facets=[
>>>         # Bar Plot with counts per each of the values in Survived
>>>         viz.interaction.FeatureInteractionVisualization(key='target_col', headers=True),
>>>         # Box Plot Survived vs Age
>>>         viz.interaction.FeatureInteractionVisualization(key='target_col_vs_age', headers=True),
>>>     ]
>>> )
>>>
>>> # Simplified shortcut for interactions: scatter plot of Fare vs Age colored based on Survived values.
>>> auto.analyze_interaction(x='Fare', y='Age', hue='Survived', train_data=df_train)

DistributionFit

class autogluon.eda.analysis.interaction.DistributionFit(columns: Union[str, List[str]], pvalue_min: float = 0.01, keep_top_n: Optional[int] = None, distributions_to_fit: Optional[Union[str, List[str]]] = None, parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)[source]

This component attempts to fit various distributions for further plotting via FeatureInteractionVisualization.

The data specified in columns must be numeric to be considered for fitting (categorical variables are not supported).

Only distributions whose fit p-value is above the pvalue_min threshold are included in the results.

Note: this analysis augments FeatureInteraction and should be used together with it so that the results can be visualized via FeatureInteractionVisualization.

Parameters
columns: Union[str, List[str]]

columns to include in the analysis. Can be passed as a string or a list of strings.

pvalue_min: float, default = 0.01

min p-value required to include a distribution fit in the results.

keep_top_n: Optional[int], default = None

how many distributions exceeding pvalue_min to include in the results. For example, if keep_top_n=3 but 10 distributions satisfy pvalue_min, only the top 3 are included. If not specified and distributions_to_fit is not provided, only the top 3 are included in the results.

distributions_to_fit: Optional[Union[str, List[str]]], default = None

list of distributions to fit. See DistributionFit.AVAILABLE_DISTRIBUTIONS for the list of supported values, and the scipy documentation for the details of each distribution. If not specified, all supported distributions are attempted.

parent: Optional[AbstractAnalysis], default = None

parent Analysis

children: Optional[List[AbstractAnalysis]], default = None

wrapped analyses; these will receive sampled args during fit call

kwargs
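
How pvalue_min, keep_top_n and distributions_to_fit interact can be sketched with a small selection function. select_fits is a hypothetical helper, the p-values are made up, and ranking fits by p-value (highest first) with a strict "greater than" comparison is an assumption about how "top" and "above" are meant.

```python
from typing import Dict, List, Optional, Tuple


def select_fits(
    pvalues: Dict[str, float],
    pvalue_min: float = 0.01,
    keep_top_n: Optional[int] = None,
    distributions_given: bool = False,
) -> List[Tuple[str, float]]:
    """Filter fitted distributions by p-value and keep the best ones.

    distributions_given stands in for "distributions_to_fit was provided".
    Assumption: fits are ranked by p-value, highest first.
    """
    if keep_top_n is None and not distributions_given:
        keep_top_n = 3  # documented fallback when neither option is set
    passing = [(name, p) for name, p in pvalues.items() if p > pvalue_min]
    passing.sort(key=lambda item: item[1], reverse=True)
    return passing if keep_top_n is None else passing[:keep_top_n]


# Made-up p-values, for illustration only.
pvalues = {"lognorm": 0.40, "gamma": 0.25, "beta": 0.08, "fisk": 0.02, "norm": 0.001}

default_selection = select_fits(pvalues)
explicit_selection = select_fits(pvalues, distributions_given=True)
```

With the defaults, norm is rejected by pvalue_min and only the three best of the remaining fits are kept; when the distributions are listed explicitly and keep_top_n is left unset, all four passing fits are returned.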

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(
>>>     train_data=df_train, label=target_col,
>>>     anlz_facets=[
>>>         eda.dataset.RawTypesAnalysis(),
>>>         eda.interaction.DistributionFit(columns=['Fare', 'Age'], distributions_to_fit=['lognorm', 'beta', 'gamma', 'fisk']),
>>>         eda.interaction.FeatureInteraction(key='age-chart', x='Age'),
>>>
>>>     ],
>>>     viz_facets=[
>>>         viz.interaction.FeatureInteractionVisualization(key='age-chart', headers=True),
>>>     ]
>>> )