Reference: Base APIs

This section highlights the base APIs used by the EDA framework. The processing consists of the following parts:

1. Analysis graph construction: in this stage, a nested graph of analyses is constructed.

analysis = BaseAnalysis(
    # State
    state=state,
    # Arguments
    train_data=train_data, test_data=test_data, val_data=val_data, model=model, label=label,
    # Nested analyses
    children=[
        Sampler(sample=sample, children=[
            DatasetSummary(),
            MissingValuesAnalysis(),
            RawTypesAnalysis(),
            SpecialTypesAnalysis(),
            ApplyFeatureGenerator(category_to_numbers=True, children=[
                FeatureDistanceAnalysis()
            ]),
        ]),
    ],
)

2. Fitting via the .fit() call: this call executes the operations in the graph and produces a state. The state is a nested dictionary with no prescribed structure, and all components share the same namespace. If multiple components are fitted with different parameters, their outputs can be put into separate sub-spaces via the Namespace component. The resulting state can then be passed on for further processing by the next analysis or rendered.

state = analysis.fit()
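To make the shared-namespace behaviour concrete, here is a toy model of the traversal written without AutoGluon. It assumes a depth-first order (a node fits, then its children in order) and a single plain dict as the state; ToyAnalysis and its key/value parameters are hypothetical and only mimic the behaviour described above, not the library's implementation.

```python
from typing import Any, Dict, List, Optional


class ToyAnalysis:
    """Hypothetical stand-in for an analysis node; not AutoGluon's class."""

    def __init__(self, key: Optional[str] = None, value: Any = None,
                 children: Optional[List["ToyAnalysis"]] = None) -> None:
        self.key = key
        self.value = value
        self.children = children or []

    def fit(self, state: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # All nodes write into the same dict, i.e. they share one namespace.
        state = {} if state is None else state
        if self.key is not None:
            state[self.key] = self.value
        # Depth-first: this node first, then its children in order.
        for child in self.children:
            child.fit(state)
        return state


root = ToyAnalysis(children=[
    ToyAnalysis("summary", {"rows": 100}),
    ToyAnalysis(children=[ToyAnalysis("correlation", "pearson")]),
    # Same key as above: the later result overwrites the earlier one.
    ToyAnalysis("correlation", "spearman"),
])
state = root.fit()
print(state)  # {'summary': {'rows': 100}, 'correlation': 'spearman'}
```

The overwritten "correlation" entry shows the problem that separate sub-spaces solve: without them, two components of the same kind fitted with different parameters clobber each other's output.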

3. Rendering: in this stage, we construct a graph of components (a combination of layout components and visual components) and then pass the previously generated state as an input argument to the render() call.

viz = SimpleVerticalLinearLayout(
    facets=[
        DatasetStatistics(headers=True),
        DatasetTypeMismatch(headers=True),
        MarkdownSectionComponent("### Feature Distance"),
        FeatureDistanceAnalysisVisualization(),
    ],
)
viz.render(state)
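The rendering stage can be pictured with a similar toy model: a layout holds an ordered list of facets and renders each of them against the same state. ToyLayout, ToyMarkdown and ToyStatistics are hypothetical names used only for illustration; returning strings stands in for the widgets a real layout would display.

```python
from typing import Any, Dict, List, Protocol


class Facet(Protocol):
    def render(self, state: Dict[str, Any]) -> str: ...


class ToyMarkdown:
    """Hypothetical static section header, similar in spirit to a markdown component."""

    def __init__(self, text: str) -> None:
        self.text = text

    def render(self, state: Dict[str, Any]) -> str:
        return self.text


class ToyStatistics:
    """Hypothetical facet that reads a value produced during fit()."""

    def render(self, state: Dict[str, Any]) -> str:
        return f"rows: {state['summary']['rows']}"


class ToyLayout:
    """Renders facets top to bottom, passing each one the same state."""

    def __init__(self, facets: List[Facet]) -> None:
        self.facets = facets

    def render(self, state: Dict[str, Any]) -> List[str]:
        return [facet.render(state) for facet in self.facets]


layout = ToyLayout(facets=[ToyMarkdown("### Dataset"), ToyStatistics()])
output = layout.render({"summary": {"rows": 100}})
print("\n".join(output))
```

Because the layout only forwards the state, facets stay independent of each other and can be reordered or nested in other layouts freely.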

Please note: components may depend on each other’s output, so all the prerequisites for calling fit() on a component must be checked in can_handle(). Components can share information in two ways: 1) via the state; 2) by sharing values or shadowing arguments (e.g., the Sampler component overrides the train_data, test_data and val_data arguments with sampled versions in the scope of its children’s fit() calls).
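The second mechanism, argument shadowing, can be modelled with plain dicts: a node's effective arguments are the arguments gathered from the root down to that node, with lower levels overriding upper ones, and a sampler-like node overrides the dataset arguments only for its children. Both function names and the take-the-first-n-rows "sampling" are hypothetical simplifications; a real sampler would draw rows from DataFrames.

```python
from typing import Any, Dict, List


def effective_args(chain: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge argument dicts from root to leaf; lower levels shadow upper ones."""
    merged: Dict[str, Any] = {}
    for level in chain:
        merged.update(level)
    return merged


def sampled_overrides(args: Dict[str, Any], n: int) -> Dict[str, Any]:
    """Hypothetical sampler: shadows each provided dataset argument with its first n rows."""
    return {
        name: args[name][:n]
        for name in ("train_data", "test_data", "val_data")
        if args.get(name) is not None
    }


root_args = {"train_data": list(range(10)), "test_data": list(range(5)), "label": "y"}
child_args = effective_args([root_args, sampled_overrides(root_args, n=3)])

print(child_args["train_data"])      # [0, 1, 2]
print(len(root_args["train_data"]))  # 10: the parent's scope is untouched
```

Since the overrides live in a separate dict, the parent and any siblings outside the sampler still see the full datasets.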

autogluon.eda.analysis.base

AbstractAnalysis
Namespace	Creates a nested namespace in state.

AbstractAnalysis

class autogluon.eda.analysis.base.AbstractAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, state: Optional[AnalysisState] = None, **kwargs)

all_keys_must_be_present(state: AnalysisState, *keys) → bool

Checks if all the keys are present in the state

Parameters

state (AnalysisState) – state object to perform check on
keys – list of the keys to check

Returns

True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: AnalysisState, *keys) → bool

Checks if at least one key is present in the state

Parameters

state (AnalysisState) – state object to perform check on
keys – list of the keys to check

Returns

True if at least one key from the keys list is present in the state
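Both helpers reduce to membership tests over the state's keys. A minimal sketch of the documented semantics, assuming the state behaves like a mapping; these stand-alone functions and the example keys are illustrative, not the library's code.

```python
from typing import Any, Mapping


def all_keys_must_be_present(state: Mapping[str, Any], *keys: str) -> bool:
    # True only if every requested key exists in the state.
    return all(key in state for key in keys)


def at_least_one_key_must_be_present(state: Mapping[str, Any], *keys: str) -> bool:
    # True if any requested key exists in the state.
    return any(key in state for key in keys)


state = {"summary": {}, "correlation": {}}
print(all_keys_must_be_present(state, "summary", "raw_types"))          # False
print(at_least_one_key_must_be_present(state, "summary", "raw_types"))  # True
```

Note that with no keys at all, all() returns True and any() returns False in this sketch; whether the library handles that edge case the same way is not stated here.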

static available_datasets(args: AnalysisState) → Generator[Tuple[str, DataFrame], None, None]

Generator which iterates only through the datasets provided in arguments

Parameters: args (AnalysisState) – arguments passed into the call. These differ from self.args in that they are assembled from the parents and shadowed by the children (this allows reused parameters to be isolated in upper-level argument declarations).
Returns: generator of tuples containing the dataset name (train_data, test_data or tuning_data) and the dataset itself
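The generator can be sketched as a filter over a fixed list of dataset argument names. The name list (taken from the description above), its order, and the skip-if-None rule are assumptions; plain lists stand in for pandas DataFrames so the sketch runs without dependencies.

```python
from typing import Any, Iterator, Mapping, Tuple

DATASET_ARGS = ("train_data", "test_data", "tuning_data")  # assumed names and order


def available_datasets(args: Mapping[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (name, dataset) pairs only for the dataset arguments that were provided."""
    for name in DATASET_ARGS:
        dataset = args.get(name)
        if dataset is not None:
            yield name, dataset


args = {"train_data": [1, 2, 3], "test_data": None, "label": "y"}
for name, dataset in available_datasets(args):
    print(name, len(dataset))  # train_data 3
```

Non-dataset arguments such as label are never yielded, so an analysis can loop over whatever datasets the caller supplied without special-casing missing ones.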

abstract can_handle(state: AnalysisState, args: AnalysisState) → bool

Checks if state and args have all the required parameters for fitting. See also the at_least_one_key_must_be_present() and all_keys_must_be_present() helpers for constructing more complex logic.

Parameters

state (AnalysisState) – state to be updated by this fit function
args (AnalysisState) – analysis properties assembled from root of analysis hierarchy to this component (with lower levels shadowing upper level args).

Returns

True if all the prerequisites for fitting are present
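A custom analysis declares its prerequisites in can_handle(), and the work is only done when they are met. The toy base class below models that contract with abc; ToyAbstractAnalysis, the _fit() hook, FeatureCount and the "summary"/"label" keys are all hypothetical, and the assumption that fit() silently skips the work when can_handle() returns False is a simplification of the real AbstractAnalysis.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ToyAbstractAnalysis(ABC):
    """Hypothetical base class modelling the can_handle()/fit() contract."""

    def fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
        # Do the work only when every prerequisite is present.
        if self.can_handle(state, args):
            self._fit(state, args)
        return state

    @abstractmethod
    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        """Return True if state and args contain everything _fit() needs."""

    @abstractmethod
    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        """Compute results and write them into state."""


class FeatureCount(ToyAbstractAnalysis):
    """Needs another component's 'summary' output and the 'label' argument."""

    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        return "summary" in state and "label" in args

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        # Every column except the label counts as a feature.
        state["n_features"] = state["summary"]["columns"] - 1


print(FeatureCount().fit({}, {"label": "y"}))  # {}
print(FeatureCount().fit({"summary": {"columns": 4}}, {"label": "y"}))
# {'summary': {'columns': 4}, 'n_features': 3}
```

Keeping the check separate from the computation lets a component that depends on another component's output simply do nothing when that output is absent, instead of failing with a KeyError.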

fit(**kwargs) → AnalysisState

Fit the analysis tree.

Parameters: kwargs – fit arguments
Returns: state produced by fit

Namespace

class autogluon.eda.analysis.base.Namespace(namespace: Optional[str] = None, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)

Creates a nested namespace in state. All the components within children will have the relative root of the state moved into this subspace. To instruct visualization facets to use a specific subspace, use the namespace argument (see the example).

Parameters

namespace (Optional[str], default = None) – namespace to use; use root if not specified
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will use this namespace as the relative root of the state during the fit call
kwargs –

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> auto.analyze(
...     train_data=..., label=...,
...     anlz_facets=[
...         # Puts output into the root namespace
...         eda.interaction.Correlation(),
...         # Puts output into the focus namespace
...         eda.Namespace(namespace='focus', children=[
...             eda.interaction.Correlation(focus_field='Fare', focus_field_threshold=0.3),
...         ])
...     ],
...     viz_facets=[
...         # Renders correlations from the root namespace
...         viz.interaction.CorrelationVisualization(),
...         # Renders correlations from the focus namespace
...         viz.interaction.CorrelationVisualization(namespace='focus'),
...     ]
... )
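Conceptually, a namespace hands its children a nested dict instead of the root state. The toy model below reproduces the shape of the example above with plain dicts; ToyCorrelation and ToyNamespace are hypothetical classes and the nested-dict layout is an assumption about how sub-spaces could be stored, not a description of AnalysisState internals.

```python
from typing import Any, Dict, List, Optional


class ToyCorrelation:
    """Hypothetical analysis writing a single result key."""

    def __init__(self, method: str = "pearson") -> None:
        self.method = method

    def fit(self, state: Dict[str, Any]) -> None:
        state["correlation"] = {"method": self.method}


class ToyNamespace:
    """Moves the relative root of the state for its children into state[namespace]."""

    def __init__(self, namespace: Optional[str] = None,
                 children: Optional[List[Any]] = None) -> None:
        self.namespace = namespace
        self.children = children or []

    def fit(self, state: Dict[str, Any]) -> None:
        # No namespace given: children keep writing into the root.
        sub_state = state if self.namespace is None else state.setdefault(self.namespace, {})
        for child in self.children:
            child.fit(sub_state)


state: Dict[str, Any] = {}
for facet in [ToyCorrelation(), ToyNamespace("focus", [ToyCorrelation("spearman")])]:
    facet.fit(state)
print(state)
# {'correlation': {'method': 'pearson'}, 'focus': {'correlation': {'method': 'spearman'}}}
```

Both correlation results survive side by side, which is what lets the two CorrelationVisualization facets in the example render different outputs.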

autogluon.eda.visualization.base

AbstractVisualization

AbstractVisualization

class autogluon.eda.visualization.base.AbstractVisualization(namespace: Optional[str] = None, **kwargs)

all_keys_must_be_present(state: AnalysisState, *keys) → bool

Checks if all the keys are present in the state

Parameters

state (AnalysisState) – state object to perform check on
keys – list of the keys to check

Returns

True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: AnalysisState, *keys) → bool

Checks if at least one key is present in the state

Parameters

state (AnalysisState) – state object to perform check on
keys – list of the keys to check

Returns

True if at least one key from the keys list is present in the state

abstract can_handle(state: AnalysisState) → bool

Checks if state has all the required parameters for visualization. See also at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.

Parameters: state (AnalysisState) – fitted state
Returns: True if all the prerequisites for rendering are present

render(state: AnalysisState) → None

Render component.

Parameters: state (AnalysisState) – state to render
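A visualization following this contract can be pictured as: resolve its namespace, check can_handle() against that sub-state, and only then draw. The toy class below models that flow with plain dicts; ToyCorrelationVisualization, its "correlation" key and the order of the steps are assumptions, and it returns a string (or None) for testability, whereas the documented render() returns None and displays its output.

```python
from typing import Any, Dict, Optional


class ToyCorrelationVisualization:
    """Hypothetical visualization: resolves its namespace, checks prerequisites, renders."""

    def __init__(self, namespace: Optional[str] = None) -> None:
        self.namespace = namespace

    def can_handle(self, state: Dict[str, Any]) -> bool:
        return "correlation" in state

    def render(self, state: Dict[str, Any]) -> Optional[str]:
        # Use the sub-state for the namespace, or the root when none is given.
        sub_state = state if self.namespace is None else state.get(self.namespace, {})
        if not self.can_handle(sub_state):
            return None  # prerequisites missing: nothing to draw
        return f"correlation ({sub_state['correlation']['method']})"


state = {"correlation": {"method": "pearson"}, "focus": {"correlation": {"method": "spearman"}}}
print(ToyCorrelationVisualization().render(state))         # correlation (pearson)
print(ToyCorrelationVisualization("focus").render(state))  # correlation (spearman)
print(ToyCorrelationVisualization("other").render(state))  # None
```

Checking can_handle() on the resolved sub-state rather than the root means the same facet class can be pointed at any namespace without changes to its own logic.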