Reference: Base APIs

This section describes the base APIs used by the EDA framework. Processing consists of three stages:

1. Analysis graph construction. In this stage a nested graph of analyses is constructed.

analysis = BaseAnalysis(
    # State
    state=state,
    # Arguments
    train_data=train_data, test_data=test_data, val_data=val_data, model=model, label=label,
    # Nested analyses
    children=[
        Sampler(sample=sample, children=[
            DatasetSummary(),
            MissingValuesAnalysis(),
            RawTypesAnalysis(),
            SpecialTypesAnalysis(),
            ApplyFeatureGenerator(category_to_numbers=True, children=[
                FeatureDistanceAnalysis()
            ]),
        ]),
    ],
)

2. .fit() call. This call executes the operations in the graph and produces a state. The state is a nested dictionary without any prescribed structure, and all components share the same namespace. If multiple components are fitted with different parameters, their outputs can be placed into separate sub-spaces via the Namespace component. The resulting state can either be passed to a subsequent analysis for further processing or be rendered.

state = analysis.fit()

3. Rendering. In this stage we construct a component graph (a combination of layout components and visual components) and then pass the state generated previously as an input argument to the render() call.

viz = SimpleVerticalLinearLayout(
    facets=[
        DatasetStatistics(headers=True),
        DatasetTypeMismatch(headers=True),
        MarkdownSectionComponent("### Feature Distance"),
        FeatureDistanceAnalysisVisualization(),
    ],
)
viz.render(state)

Please note: components may depend on each other’s output; all the prerequisites for fitting a component must be checked in can_handle(). Components can share information in two ways: 1) through the state; 2) through shared values/shadowed arguments (e.g., the Sampler component modifies the train_data, test_data and val_data arguments in the scope of its children’s fit() calls).
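The two sharing mechanisms described above can be illustrated with a minimal, pure-Python sketch. It does not use AutoGluon: ToyAnalysis, ToySampler and RowCount are hypothetical stand-ins, the state is a plain dictionary, and the rule that children are skipped when the parent's can_handle() returns False is an assumption of the sketch, not something this section states about the library.

```python
from typing import Any, Dict, List, Optional


class ToyAnalysis:
    """Hypothetical stand-in for an analysis component (not the AutoGluon class)."""

    def __init__(self, children: Optional[List["ToyAnalysis"]] = None, **args: Any) -> None:
        self.children = children or []
        self.args = args

    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        return True

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        pass

    def fit(
        self,
        state: Optional[Dict[str, Any]] = None,
        inherited: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        state = {} if state is None else state
        # Arguments declared on this node shadow those inherited from its parents.
        # The merged dict is a fresh copy, so changes stay within this subtree.
        args = {**(inherited or {}), **self.args}
        if self.can_handle(state, args):
            self._fit(state, args)
            for child in self.children:
                child.fit(state, args)
        return state


class ToySampler(ToyAnalysis):
    """Shadows train_data with its first `sample` rows, for its children only."""

    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        return "train_data" in args and "sample" in args

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        args["train_data"] = args["train_data"][: args["sample"]]


class RowCount(ToyAnalysis):
    """Appends the number of rows it sees to the shared state."""

    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        return "train_data" in args

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        state.setdefault("row_counts", []).append(len(args["train_data"]))


rows = list(range(10))
tree = ToyAnalysis(
    train_data=rows,
    children=[
        ToySampler(sample=3, children=[RowCount()]),  # sees the sampled rows
        RowCount(),                                   # sees the full dataset
    ],
)
state = tree.fit()
print(state)
```

Both RowCount instances write into the same state, but only the one nested under ToySampler sees the shadowed train_data; the original list is left untouched.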

autogluon.eda.analysis.base

AbstractAnalysis

Methods

Namespace

Creates a nested namespace in state.

AbstractAnalysis

class autogluon.eda.analysis.base.AbstractAnalysis(parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, state: Optional[autogluon.eda.state.AnalysisState] = None, **kwargs)[source]

Methods

all_keys_must_be_present(state, *keys)

Checks if all the keys are present in the state

at_least_one_key_must_be_present(state, *keys)

Checks if at least one key is present in the state

available_datasets(args)

Generator which iterates only through the datasets provided in arguments

can_handle(state, args)

Checks if state and args have all the required parameters for fitting.

fit(**kwargs)

Fit the analysis tree.

all_keys_must_be_present(state: autogluon.eda.state.AnalysisState, *keys) → bool

Checks if all the keys are present in the state

Parameters
state: AnalysisState

state object to perform check on

keys:

list of the keys to check

Returns
True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: autogluon.eda.state.AnalysisState, *keys) → bool

Checks if at least one key is present in the state

Parameters
state: AnalysisState

state object to perform check on

keys:

list of the keys to check

Returns
True if at least one key from the keys list is present in the state

static available_datasets(args: autogluon.eda.state.AnalysisState) → Generator[Tuple[str, pandas.core.frame.DataFrame], None, None][source]

Generator which iterates only through the datasets provided in arguments

Parameters
args: AnalysisState

arguments passed into the call. These differ from self.args in that they are assembled from the parents and shadowed by the children (this makes it possible to isolate reused parameters in upper-level argument declarations).

Returns
tuple of dataset name (train_data, test_data or tuning_data) and dataset itself
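As a rough illustration of the behavior described above, the following sketch yields only the datasets that are actually present in the arguments. It is not the library's implementation: iter_available_datasets and DATASET_ARGS are hypothetical names, the arguments are a plain dictionary rather than an AnalysisState, and the list of dataset names is an assumption that combines the names mentioned in this section.

```python
from typing import Any, Dict, Iterator, Sequence, Tuple

# Assumed set of dataset argument names; adjust to whatever the caller uses.
DATASET_ARGS = ("train_data", "test_data", "tuning_data", "val_data")


def iter_available_datasets(
    args: Dict[str, Any], names: Sequence[str] = DATASET_ARGS
) -> Iterator[Tuple[str, Any]]:
    """Yield (name, dataset) pairs for datasets that are present and not None."""
    for name in names:
        dataset = args.get(name)
        if dataset is not None:
            yield name, dataset


# test_data is explicitly None and val_data is absent, so both are skipped;
# label is not a dataset name and is ignored.
args = {"train_data": [[1, 2], [3, 4]], "test_data": None, "label": "y"}
found = list(iter_available_datasets(args))
print(found)
```

Iterating this way lets a component apply the same logic to every supplied dataset without checking each argument by hand.
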
abstract can_handle(state: autogluon.eda.state.AnalysisState, args: autogluon.eda.state.AnalysisState) → bool[source]

Checks if state and args have all the required parameters for fitting. See also the at_least_one_key_must_be_present() and all_keys_must_be_present() helpers for constructing more complex logic.

Parameters
state: AnalysisState

state to be updated by this fit function

args: AnalysisState

analysis properties assembled from the root of the analysis hierarchy down to this component (with lower levels shadowing upper-level args).

Returns
True if all the pre-requisites for fitting are present

fit(**kwargs) → autogluon.eda.state.AnalysisState[source]

Fit the analysis tree.

Parameters
kwargs

fit arguments

Returns
state produced by fit
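The key-check helpers and can_handle() documented above can be combined as in the following minimal sketch. The functions here are hypothetical free-standing equivalents that work on plain mappings, not the AbstractAnalysis methods themselves, and the particular requirement (training data and a label, plus at least one hold-out dataset) is only an example.

```python
from typing import Any, Mapping


def all_keys_present(mapping: Mapping[str, Any], *keys: str) -> bool:
    """True only if every key is in the mapping (vacuously True for no keys)."""
    return all(key in mapping for key in keys)


def at_least_one_key_present(mapping: Mapping[str, Any], *keys: str) -> bool:
    """True if any of the keys is in the mapping (False for no keys)."""
    return any(key in mapping for key in keys)


def can_handle(state: Mapping[str, Any], args: Mapping[str, Any]) -> bool:
    """Require train_data and label, plus at least one hold-out dataset."""
    return (
        all_keys_present(args, "train_data", "label")
        and at_least_one_key_present(args, "test_data", "val_data")
    )


ready = can_handle({}, {"train_data": [], "label": "y", "val_data": []})
not_ready = can_handle({}, {"train_data": [], "label": "y"})
print(ready, not_ready)
```

The same helpers can be applied to the state instead of the args when a component depends on the output of an earlier analysis.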

Namespace

class autogluon.eda.analysis.base.Namespace(namespace: Optional[str] = None, parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)[source]

Creates a nested namespace in state. All components within children will have the relative root of their state moved into this subspace. To make a visualization facet use a specific subspace, pass the namespace argument (see the example).

Parameters
namespace: Optional[str], default = None

namespace to use; use root if not specified

parent: Optional[AbstractAnalysis], default = None

parent Analysis

children: Optional[List[AbstractAnalysis]], default None

wrapped analyses; their output is written into the namespace during the fit call

kwargs

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> auto.analyze(
...     train_data=..., label=...,
...     anlz_facets=[
...         # Puts output into the root namespace
...         eda.interaction.Correlation(),
...         # Puts output into the focus namespace
...         eda.Namespace(namespace='focus', children=[
...             eda.interaction.Correlation(focus_field='Fare', focus_field_threshold=0.3),
...         ])
...     ],
...     viz_facets=[
...         # Renders correlations from the root namespace
...         viz.interaction.CorrelationVisualization(),
...         # Renders correlations from the focus namespace
...         viz.interaction.CorrelationVisualization(namespace='focus'),
...     ]
... )
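To make the effect of a namespace on the state concrete, here is a small sketch that models the state as a plain nested dictionary. resolve_namespace is a hypothetical helper, and the "correlations" key and its placeholder values are illustrative only; the actual keys written by the components are not specified in this section.

```python
from typing import Any, Dict, Optional


def resolve_namespace(state: Dict[str, Any], namespace: Optional[str]) -> Dict[str, Any]:
    """Return the sub-dictionary for `namespace`, creating it if needed.

    A namespace of None means the root of the state.
    """
    if namespace is None:
        return state
    return state.setdefault(namespace, {})


state: Dict[str, Any] = {}

# A component fitted at the root writes directly into the state.
resolve_namespace(state, None)["correlations"] = "root result"

# The same kind of component fitted inside a 'focus' namespace writes into a
# nested dictionary, so the two results do not overwrite each other.
resolve_namespace(state, "focus")["correlations"] = "focus result"

print(state)
```

A visualization given namespace='focus' would then read from the nested dictionary instead of the root.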

autogluon.eda.visualization.base

AbstractVisualization

Methods

AbstractVisualization

class autogluon.eda.visualization.base.AbstractVisualization(namespace: Optional[str] = None, **kwargs)[source]

Methods

all_keys_must_be_present(state, *keys)

Checks if all the keys are present in the state

at_least_one_key_must_be_present(state, *keys)

Checks if at least one key is present in the state

can_handle(state)

Checks if state has all the required parameters for visualization.

render(state)

Render component.

all_keys_must_be_present(state: autogluon.eda.state.AnalysisState, *keys) → bool

Checks if all the keys are present in the state

Parameters
state: AnalysisState

state object to perform check on

keys:

list of the keys to check

Returns
True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: autogluon.eda.state.AnalysisState, *keys) → bool

Checks if at least one key is present in the state

Parameters
state: AnalysisState

state object to perform check on

keys:

list of the keys to check

Returns
True if at least one key from the keys list is present in the state

abstract can_handle(state: autogluon.eda.state.AnalysisState) → bool[source]

Checks if state has all the required parameters for visualization. See also at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.

Parameters
state: AnalysisState

fitted state

Returns
True if all the pre-requisites for rendering are present

render(state: autogluon.eda.state.AnalysisState) → None[source]

Render component.

Parameters
state: AnalysisState

state to render
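The relationship between can_handle() and render() can be sketched as follows. ToyVisualization is a hypothetical stand-in, not the AutoGluon class; the "row_count" key is invented for the example, and the choice to skip rendering silently when the prerequisites are missing is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


class ToyVisualization:
    """Hypothetical stand-in for a visual component (not the AutoGluon class)."""

    def __init__(self, namespace: Optional[str] = None) -> None:
        self.namespace = namespace
        self.output: List[str] = []  # collected instead of drawn, for the demo

    def can_handle(self, state: Dict[str, Any]) -> bool:
        return "row_count" in state

    def _render(self, state: Dict[str, Any]) -> None:
        self.output.append(f"rows: {state['row_count']}")

    def render(self, state: Dict[str, Any]) -> None:
        # Move the relative root into the namespace, if one was requested.
        if self.namespace is not None:
            state = state.get(self.namespace, {})
        # Assumption: skip quietly when the prerequisites are missing.
        if self.can_handle(state):
            self._render(state)


state = {"row_count": 10, "sampled": {"row_count": 3}}
root_view = ToyVisualization()
sampled_view = ToyVisualization(namespace="sampled")
missing_view = ToyVisualization(namespace="absent")
for view in (root_view, sampled_view, missing_view):
    view.render(state)

print(root_view.output, sampled_view.output, missing_view.output)
```

Guarding render() this way allows a layout to list many facets and display only those whose inputs were actually produced by the analysis.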