Components: interaction
autogluon.eda.visualization.interaction
CorrelationVisualization | Display feature correlations matrix.
CorrelationSignificanceVisualization | Display feature correlations significance matrix.
FeatureInteractionVisualization | Feature interaction visualization.
CorrelationVisualization
class autogluon.eda.visualization.interaction.CorrelationVisualization(headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)
Display feature correlations matrix.
This report renders correlations between variables as a heatmap. The details of the rendered report depend on the configuration of the Correlation analysis.
- Parameters
- headers: bool, default = False
if True then render headers
- namespace: Optional[str], default = None
namespace to use; can be nested like ns_a.ns_b.ns_c
- fig_args: Optional[Dict[str, Any]], default = None
kwargs to pass into chart figure
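The dotted namespace notation (ns_a.ns_b.ns_c) can be read as a path into nested state. The sketch below only illustrates that idea on a plain nested dict; `resolve_namespace` and the state layout are hypothetical and are not AutoGluon's implementation.

```python
from typing import Any, Dict, Optional


def resolve_namespace(state: Dict[str, Any], namespace: Optional[str]) -> Dict[str, Any]:
    """Follow a dotted path such as 'ns_a.ns_b.ns_c' through nested dicts."""
    if namespace is None:
        # No namespace: work with the top-level state.
        return state
    current = state
    for part in namespace.split("."):
        current = current[part]  # raises KeyError if a level is missing
    return current


# Hypothetical state layout, for illustration only.
state = {"ns_a": {"ns_b": {"ns_c": {"correlations": "matrix goes here"}}}}
nested = resolve_namespace(state, "ns_a.ns_b.ns_c")
print(nested)
```

With namespace=None the helper returns the top-level state unchanged, which mirrors the documented default.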
See also
CorrelationSignificanceVisualization
class autogluon.eda.visualization.interaction.CorrelationSignificanceVisualization(headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)
Display feature correlations significance matrix.
This report renders the correlation significance matrix as a heatmap. The details of the rendered report depend on the configuration of the Correlation and CorrelationSignificance analyses.
- Parameters
- headers: bool, default = False
if True then render headers
- namespace: Optional[str], default = None
namespace to use; can be nested like ns_a.ns_b.ns_c
- fig_args: Optional[Dict[str, Any]], default = None
kwargs to pass into chart figure
See also
FeatureInteractionVisualization
class autogluon.eda.visualization.interaction.FeatureInteractionVisualization(key: str, numeric_as_categorical_threshold: int = 20, max_categories_to_consider_render: int = 30, headers: bool = False, namespace: Optional[str] = None, fig_args: Optional[Dict[str, Any]] = None, **kwargs)
Feature interaction visualization.
This report renders feature interaction analysis results. The details of the rendered report depend on the combination of variable types in x/y/hue. key links the analysis to its visualization, which allows multiple analyses/visualizations in one composite analysis.
- Parameters
- key: str
key used to store the analysis in the state; the value is placed in the state by FeatureInteraction. If the key is not provided, then use one of the form: 'x:A|y:B|hue:C' (omit the corresponding x/y/hue if the value is not provided). See also
autogluon.eda.analysis.interaction.FeatureInteraction
- numeric_as_categorical_threshold: int, default = 20
- max_categories_to_consider_render: int, default = 30
- headers: bool, default = False
if True then render headers
- namespace: Optional[str], default = None
namespace to use; can be nested like ns_a.ns_b.ns_c
- fig_args: Optional[Dict[str, Any]], default = None
kwargs to pass into chart figure
- kwargs
parameters to pass as chart args
See also
autogluon.eda.analysis.interaction
Correlation | Correlation analysis.
CorrelationSignificance | Significance of correlation of all variable combinations in the DataFrame.
FeatureInteraction | Feature interaction analysis
DistributionFit | This component attempts to fit various distributions for further plotting via FeatureInteractionVisualization.
Correlation
class autogluon.eda.analysis.interaction.Correlation(method: str = 'spearman', focus_field: Optional[str] = None, focus_field_threshold: float = 0.5, parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)
Correlation analysis.
Note: it is recommended to apply AutoGluon standard pre-processing, which allows categorical variables to be included in the analysis. This can be done by wrapping the analysis in ApplyFeatureGenerator.
- Parameters
- method: str {'pearson', 'kendall', 'spearman', 'phik'}, default = 'spearman'
Method of correlation:
- pearson: standard correlation coefficient
- kendall: Kendall Tau correlation coefficient
- spearman: Spearman rank correlation
- phik: phi_k correlation. Correlation matrix of a bivariate Gaussian derived from the chi2-value. The chi2-value is converted into the correlation coefficient of a bivariate Gaussian with correlation value rho, assuming the given binning and number of records. The correlation coefficient value is between 0 and 1. The bivariate Gaussian's range is set to [-5,5] by construction. See also the phik documentation.
- focus_field: Optional[str], default = None
field name to focus on. Specifying a field keeps only the correlations with that field that are >= focus_field_threshold. This is helpful when dealing with a large number of variables.
- focus_field_threshold: float, default = 0.5
a cut-off threshold when focus_field is specified
- parent: Optional[AbstractAnalysis], default = None
parent Analysis
- children: Optional[List[AbstractAnalysis]], default = None
wrapped analyses; these will receive sampled args during fit call
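The interplay of focus_field and focus_field_threshold can be sketched as follows. This is a minimal illustration, not AutoGluon's implementation: the helper `filter_by_focus` and the nested-dict correlation matrix are hypothetical, and comparing the absolute value against the threshold is an assumption (the parameter description only says ">=").

```python
from typing import Dict

Matrix = Dict[str, Dict[str, float]]


def filter_by_focus(corr: Matrix, focus_field: str, threshold: float = 0.5) -> Matrix:
    """Keep the focus field plus the fields correlated with it at or above the threshold."""
    keep = [
        field
        for field, value in corr[focus_field].items()
        # Assumption: strength is judged on the absolute value.
        if field == focus_field or abs(value) >= threshold
    ]
    # Restrict both rows and columns to the retained fields.
    return {a: {b: corr[a][b] for b in keep} for a in keep}


# Hypothetical correlation values, for illustration only.
corr = {
    "target": {"target": 1.0, "age": 0.6, "fare": -0.7, "id": 0.1},
    "age": {"target": 0.6, "age": 1.0, "fare": 0.2, "id": 0.05},
    "fare": {"target": -0.7, "age": 0.2, "fare": 1.0, "id": 0.0},
    "id": {"target": 0.1, "age": 0.05, "fare": 0.0, "id": 1.0},
}
filtered = filter_by_focus(corr, focus_field="target", threshold=0.5)
print(list(filtered))  # ['target', 'age', 'fare']
```

Dropping weakly related fields in this way keeps the heatmap readable when the dataset has many columns.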
See also
Examples
>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> import numpy as np
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(train_data=df_train, label=target_col, anlz_facets=[
>>>     # Apply standard AutoGluon pre-processing to transform categorical variables to numbers to ensure correlation includes them.
>>>     eda.transform.ApplyFeatureGenerator(category_to_numbers=True, children=[
>>>         # We use `spearman` correlation to capture non-linear interactions because it is based on rank order.
>>>         eda.interaction.Correlation(method='spearman', focus_field=target_col, focus_field_threshold=0.3),
>>>     ])
>>> ], viz_facets=[
>>>     viz.interaction.CorrelationVisualization(fig_args=dict(figsize=(12,8)), **common_args),
>>> ])
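The comment in the example says that spearman is based on rank order. A small standard-library sketch shows what that means in practice: Spearman correlation is Pearson correlation computed on ranks, so a monotonic but non-linear relationship still scores close to 1. The helper names below are illustrative only and are not part of AutoGluon.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard (linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic, but not linear
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

On this data the rank-based coefficient is effectively 1 while the linear coefficient is noticeably lower, which is why a rank-based method can suit targets that depend on a feature in a curved but monotonic way.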
CorrelationSignificance
class autogluon.eda.analysis.interaction.CorrelationSignificance(parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, state: Optional[autogluon.eda.state.AnalysisState] = None, **kwargs)
Significance of correlation of all variable combinations in the DataFrame.
See significance_matrix() for more details. This analysis requires Correlation results to be available in the state.
Note: it is recommended to apply AutoGluon standard pre-processing, which allows categorical variables to be included in the analysis. This can be done by wrapping the analysis in ApplyFeatureGenerator.
See also
significance_matrix()
Correlation
ApplyFeatureGenerator
Examples
>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(train_data=df_train, label=target_col, anlz_facets=[
>>>     # Apply standard AutoGluon pre-processing to transform categorical variables to numbers to ensure correlation includes them.
>>>     eda.transform.ApplyFeatureGenerator(category_to_numbers=True, children=[
>>>         # We use `spearman` correlation to capture non-linear interactions because it is based on rank order.
>>>         eda.interaction.Correlation(method='spearman', focus_field=target_col, focus_field_threshold=0.3),
>>>         eda.interaction.CorrelationSignificance()
>>>     ])
>>> ], viz_facets=[
>>>     viz.interaction.CorrelationSignificanceVisualization(fig_args=dict(figsize=(12,8))),
>>> ])
FeatureInteraction
class autogluon.eda.analysis.interaction.FeatureInteraction(x: Optional[str] = None, y: Optional[str] = None, hue: Optional[str] = None, key: Optional[str] = None, parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)
Feature interaction analysis
- Parameters
- x: Optional[str], default = None
variable to analyse which would be placed on x-axis
- y: Optional[str], default = None
variable to analyse which would be placed on y-axis
- hue: Optional[str], default = None
variable to use as hue in x/y-analysis.
- key: Optional[str], default = None
key under which the analysis is stored in the state; the value is later used by FeatureInteractionVisualization. If the key is not provided, then use one of the form: 'x:A|y:B|hue:C' (omit the corresponding x/y/hue if the value is not provided). See also
FeatureInteractionVisualization
- parent: Optional[AbstractAnalysis], default = None
parent Analysis
- children: Optional[List[AbstractAnalysis]], default = None
wrapped analyses; these will receive sampled args during fit call
- kwargs
Examples
>>> import pandas as pd
>>> import numpy as np
>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> df_train = pd.DataFrame(...)
>>>
>>> state = auto.analyze(
>>>     train_data=df_train, label='Survived',
>>>     anlz_facets=[
>>>         eda.dataset.RawTypesAnalysis(),
>>>         eda.interaction.FeatureInteraction(key='target_col', x='Survived'),
>>>         eda.interaction.FeatureInteraction(key='target_col_vs_age', x='Survived', y='Age')
>>>     ],
>>>     viz_facets=[
>>>         # Bar Plot with counts per each of the values in Survived
>>>         viz.interaction.FeatureInteractionVisualization(key='target_col', headers=True),
>>>         # Box Plot Survived vs Age
>>>         viz.interaction.FeatureInteractionVisualization(key='target_col_vs_age', headers=True),
>>>     ]
>>> )
>>>
>>> # Simplified shortcut for interactions: scatter plot of Fare vs Age colored based on Survived values.
>>> auto.analyze_interaction(x='Fare', y='Age', hue='Survived', train_data=df_train)
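When key is omitted, the parameter description gives the fallback form 'x:A|y:B|hue:C', with missing parts left out. The helper below is a hypothetical sketch of how such a key could be assembled from that description; it is not taken from the AutoGluon source.

```python
from typing import Optional


def default_key(x: Optional[str] = None, y: Optional[str] = None, hue: Optional[str] = None) -> str:
    """Build a key of the form 'x:A|y:B|hue:C', skipping parts that are not provided."""
    parts = [
        f"{name}:{value}"
        for name, value in (("x", x), ("y", y), ("hue", hue))
        if value is not None
    ]
    return "|".join(parts)


print(default_key(x="Fare", y="Age", hue="Survived"))  # x:Fare|y:Age|hue:Survived
print(default_key(x="Survived"))                       # x:Survived
```

A FeatureInteractionVisualization would then be pointed at the same string through its key argument, so that analysis and visualization refer to the same entry in the state.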
DistributionFit
class autogluon.eda.analysis.interaction.DistributionFit(columns: Union[str, List[str]], pvalue_min: float = 0.01, keep_top_n: Optional[int] = None, distributions_to_fit: Optional[Union[str, List[str]]] = None, parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)
This component attempts to fit various distributions for further plotting via FeatureInteractionVisualization.
The data specified in columns must be numeric to be considered for fitting (categorical variables are not supported).
Only the distributions with statistical significance above the pvalue_min threshold will be included in the results.
Note: this analysis augments FeatureInteraction and should be used together with it so that the results can be visualized via FeatureInteractionVisualization.
- Parameters
- columns: Union[str, List[str]]
columns to include in the analysis. Can be passed as a string or a list of strings.
- pvalue_min: float, default = 0.01
min pvalue to consider including distribution fit in the results.
- keep_top_n: Optional[int], default = None
how many distributions exceeding pvalue_min to include in the results. For example, if keep_top_n=3 but 10 distributions satisfy pvalue_min, only the top 3 will be included. If not specified and distributions_to_fit is not provided, then only the top 3 will be included in the results.
- distributions_to_fit: Optional[Union[str, List[str]]], default = None
list of distributions to fit. See DistributionFit.AVAILABLE_DISTRIBUTIONS for the list of supported values. See the scipy documentation for details of each distribution. If not specified, fitting is attempted for all supported distributions.
- parent: Optional[AbstractAnalysis], default = None
parent Analysis
- children: Optional[List[AbstractAnalysis]], default = None
wrapped analyses; these will receive sampled args during fit call
- kwargs
Examples
>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df_train = pd.DataFrame(...)
>>>
>>> auto.analyze(
>>>     train_data=df_train, label=target_col,
>>>     anlz_facets=[
>>>         eda.dataset.RawTypesAnalysis(),
>>>         eda.interaction.DistributionFit(columns=['Fare', 'Age'], distributions_to_fit=['lognorm', 'beta', 'gamma', 'fisk']),
>>>         eda.interaction.FeatureInteraction(key='age-chart', x='Age'),
>>>     ],
>>>     viz_facets=[
>>>         viz.interaction.FeatureInteractionVisualization(key='age-chart', headers=True),
>>>     ]
>>> )
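The way pvalue_min, keep_top_n and distributions_to_fit combine can be sketched on a table of already computed p-values. This is a simplified illustration under stated assumptions, not the library's code: `select_fits` is a hypothetical helper, the p-values are made up, and ranking the "top" fits by descending p-value is an assumption, since the parameter description does not name the ranking criterion.

```python
from typing import Dict, List, Optional, Tuple, Union


def select_fits(
    pvalues: Dict[str, float],
    pvalue_min: float = 0.01,
    keep_top_n: Optional[int] = None,
    distributions_to_fit: Optional[Union[str, List[str]]] = None,
) -> List[Tuple[str, float]]:
    """Pick the distribution fits to report, following the documented parameter rules."""
    if isinstance(distributions_to_fit, str):
        distributions_to_fit = [distributions_to_fit]
    # Documented default: top 3 when neither keep_top_n nor distributions_to_fit is given.
    if keep_top_n is None and distributions_to_fit is None:
        keep_top_n = 3
    candidates = pvalues if distributions_to_fit is None else {
        name: p for name, p in pvalues.items() if name in distributions_to_fit
    }
    # Keep only fits whose p-value is above the threshold.
    passing = [(name, p) for name, p in candidates.items() if p > pvalue_min]
    # Assumption: "top" means the highest p-value first.
    passing.sort(key=lambda item: item[1], reverse=True)
    return passing if keep_top_n is None else passing[:keep_top_n]


# Made-up p-values, for illustration only.
pvalues = {"lognorm": 0.40, "gamma": 0.25, "beta": 0.12, "fisk": 0.03, "norm": 0.001}
print(select_fits(pvalues))
print(select_fits(pvalues, distributions_to_fit=["fisk", "norm"]))
```

With the defaults only three fits are returned even though four pass the threshold; naming the distributions explicitly lifts that cap but still drops any fit at or below pvalue_min.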