Reference: Base APIs

This section highlights the base APIs used by the EDA framework. The processing consists of three stages:

1. Analysis graph construction: in this stage a nested graph of analyses is constructed.

analysis = BaseAnalysis(
    # State
    state=state,
    # Arguments
    train_data=train_data, test_data=test_data, val_data=val_data, model=model, label=label,
    # Nested analyses
    children=[
        Sampler(sample=sample, children=[
            DatasetSummary(),
            MissingValuesAnalysis(),
            RawTypesAnalysis(),
            SpecialTypesAnalysis(),
            ApplyFeatureGenerator(category_to_numbers=True, children=[
                FeatureDistanceAnalysis()
            ]),
        ]),
    ],
)

2. Fitting: the .fit() call executes the operations in the graph and produces a state. The state is a nested dictionary without any prescribed structure, and all components share the same namespace. If multiple components are fitted with different parameters, their outputs can be put into separate sub-spaces via the Namespace component; the results can then be passed on for further processing by the next analysis or rendered.

state = analysis.fit()

3. Rendering: in this stage a graph of components (a combination of layout components and visual components) is constructed, and the previously generated state is passed as the input argument to the render() call.

viz = SimpleVerticalLinearLayout(
    facets=[
        DatasetStatistics(headers=True),
        DatasetTypeMismatch(headers=True),
        MarkdownSectionComponent("### Feature Distance"),
        FeatureDistanceAnalysisVisualization(),
    ],
)
viz.render(state)
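The rendering stage can be pictured with a minimal pure-Python sketch. MiniFacet and MiniVerticalLayout are hypothetical stand-ins, not AutoGluon classes: the layout passes the same state to each facet in order, and in this sketch a facet whose can_handle() check fails is simply skipped.

```python
from typing import Any, Dict, List


class MiniFacet:
    """Hypothetical visualization facet that needs one key in the state."""

    def __init__(self, key: str, out: List[str]) -> None:
        self.key = key  # state key this facet needs
        self.out = out  # collects rendered text instead of drawing a chart

    def can_handle(self, state: Dict[str, Any]) -> bool:
        return self.key in state

    def render(self, state: Dict[str, Any]) -> None:
        if self.can_handle(state):  # skip silently when prerequisites are missing
            self.out.append(f"{self.key}: {state[self.key]}")


class MiniVerticalLayout:
    """Hypothetical layout that renders its facets top to bottom."""

    def __init__(self, facets: List[MiniFacet]) -> None:
        self.facets = facets

    def render(self, state: Dict[str, Any]) -> None:
        for facet in self.facets:
            facet.render(state)  # every facet receives the same state


out: List[str] = []
state = {"row_count": 10, "missing_ratio": 0.0}
layout = MiniVerticalLayout([
    MiniFacet("row_count", out),
    MiniFacet("feature_distance", out),  # key absent from state
    MiniFacet("missing_ratio", out),
])
layout.render(state)
print(out)  # ['row_count: 10', 'missing_ratio: 0.0']
```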

Please note: components may depend on each other's output, so all the prerequisites to fit() a component must be checked in can_handle(). Components can share information in two ways: 1) through the state; 2) by sharing values or shadowing arguments (e.g., the Sampler component replaces the train_data, test_data and val_data arguments with sampled versions in the scope of its children's fit() calls).
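Both sharing mechanisms can be illustrated with a small pure-Python sketch. MiniAnalysis, MiniSampler, RowCount and RowCountReport are hypothetical classes written for illustration, not AutoGluon's implementation: arguments flow from the root down with lower levels shadowing upper ones, the sampler-like node shadows train_data only for its children, and RowCountReport reads a key that RowCount wrote into the shared state.

```python
from typing import Any, Dict, List, Optional


class MiniAnalysis:
    """Hypothetical, simplified analysis node (not AutoGluon's AbstractAnalysis)."""

    def __init__(self, children: Optional[List["MiniAnalysis"]] = None, **kwargs: Any) -> None:
        self.children = children or []
        self.args = kwargs  # arguments declared on this node

    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        return True

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        pass

    def _child_args(self, args: Dict[str, Any]) -> Dict[str, Any]:
        return args  # subclasses may shadow values for their children

    def fit(self, state: Optional[Dict[str, Any]] = None,
            parent_args: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        state = {} if state is None else state
        # Lower levels shadow upper-level arguments.
        args = {**(parent_args or {}), **self.args}
        if self.can_handle(state, args):
            self._fit(state, args)
            child_args = self._child_args(args)
            for child in self.children:
                child.fit(state, child_args)  # all nodes share one state dict
        return state


class MiniSampler(MiniAnalysis):
    def __init__(self, sample: int, children: Optional[List[MiniAnalysis]] = None, **kwargs: Any) -> None:
        super().__init__(children=children, **kwargs)
        self.sample = sample

    def _child_args(self, args: Dict[str, Any]) -> Dict[str, Any]:
        # Shadow train_data for the children only; the caller's args are not modified.
        return {**args, "train_data": args["train_data"][: self.sample]}


class RowCount(MiniAnalysis):
    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        state["row_count"] = len(args["train_data"])


class RowCountReport(MiniAnalysis):
    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        return "row_count" in state  # depends on RowCount's output

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        state["report"] = f"{state['row_count']} rows"


root = MiniAnalysis(
    train_data=list(range(100)),
    children=[MiniSampler(sample=10, children=[RowCount(), RowCountReport()])],
)
state = root.fit()
print(state)  # {'row_count': 10, 'report': '10 rows'}
```

In this sketch, placing RowCountReport before RowCount would cause it to be skipped, because its can_handle() check runs before the key it depends on has been written.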

autogluon.eda.analysis.base

AbstractAnalysis
Namespace	Creates a nested namespace in state.

AbstractAnalysis

class autogluon.eda.analysis.base.AbstractAnalysis(parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, state: Optional[autogluon.eda.state.AnalysisState] = None, **kwargs)

Methods

`all_keys_must_be_present`(state, *keys)	Checks if all the keys are present in the state
`at_least_one_key_must_be_present`(state, *keys)	Checks if at least one key is present in the state
`available_datasets`(args)	Generator which iterates only through the datasets provided in arguments
`can_handle`(state, args)	Checks if state and args have all the required parameters for fitting.
`fit`(**kwargs)	Fit the analysis tree.

all_keys_must_be_present(state: autogluon.eda.state.AnalysisState, *keys) → bool

Checks if all the keys are present in the state

Parameters

state: AnalysisState: state object to perform check on
keys: one or more keys to check

Returns

True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: autogluon.eda.state.AnalysisState, *keys) → bool

Checks if at least one key is present in the state

Parameters

state: AnalysisState: state object to perform check on
keys: one or more keys to check

Returns

True if at least one key from the keys list is present in the state

static available_datasets(args: autogluon.eda.state.AnalysisState) → Generator[Tuple[str, pandas.core.frame.DataFrame], None, None]

Generator which iterates only through the datasets provided in arguments

Parameters

args: AnalysisState: arguments passed into the call. These differ from self.args in that they are assembled from the parents and shadowed by children; this allows parameters reused across components to be declared once at an upper level.

Returns

tuples of the dataset name (train_data, test_data or tuning_data) and the dataset itself
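The generator's behaviour can be sketched in plain Python. This available_datasets function is a hypothetical re-implementation, not the library's source; the dataset names are an assumption based on the Returns description, and plain lists stand in for pandas DataFrames.

```python
from typing import Any, Dict, Generator, Tuple

# Assumed dataset argument names; the library's actual list may differ.
DATASET_KEYS = ("train_data", "test_data", "tuning_data")


def available_datasets(args: Dict[str, Any]) -> Generator[Tuple[str, Any], None, None]:
    """Yield (name, dataset) pairs only for datasets that were actually provided."""
    for name in DATASET_KEYS:
        dataset = args.get(name)
        if dataset is not None:  # missing or None datasets are skipped
            yield name, dataset


args = {"train_data": [1, 2, 3], "test_data": None, "label": "y"}
for name, dataset in available_datasets(args):
    print(name, len(dataset))  # prints "train_data 3" only
```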

abstract can_handle(state: autogluon.eda.state.AnalysisState, args: autogluon.eda.state.AnalysisState) → bool

Checks if state and args have all the required parameters for fitting. See also the at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.

Parameters

state: AnalysisState: state to be updated by this fit function
args: AnalysisState: analysis properties assembled from the root of the analysis hierarchy down to this component (with lower levels shadowing upper-level args).

Returns

True if all the prerequisites for fitting are present
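A can_handle() implementation typically composes the two helpers. The sketch below re-implements them as plain functions over dicts; these helper bodies and the LabelSummary class are hypothetical illustrations, not AutoGluon code, and the required keys (label plus one of the dataset arguments) are assumptions chosen for the example.

```python
from typing import Any, Mapping


# Hypothetical stand-ins for the helpers; the library's versions are methods
# and may differ in details.
def all_keys_must_be_present(state: Mapping[str, Any], *keys: str) -> bool:
    return all(key in state for key in keys)


def at_least_one_key_must_be_present(state: Mapping[str, Any], *keys: str) -> bool:
    return any(key in state for key in keys)


class LabelSummary:
    """Hypothetical analysis that needs a label and at least one dataset."""

    def can_handle(self, state: Mapping[str, Any], args: Mapping[str, Any]) -> bool:
        # label is mandatory; any one of the datasets is enough
        return all_keys_must_be_present(args, "label") and at_least_one_key_must_be_present(
            args, "train_data", "test_data", "val_data"
        )


analysis = LabelSummary()
print(analysis.can_handle({}, {"label": "y", "test_data": [0, 1]}))  # True
print(analysis.can_handle({}, {"train_data": [0, 1]}))  # False: no label
```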

fit(**kwargs) → autogluon.eda.state.AnalysisState

Fit the analysis tree.

Parameters

kwargs: fit arguments

Returns

state produced by fit

Namespace

class autogluon.eda.analysis.base.Namespace(namespace: Optional[str] = None, parent: Optional[autogluon.eda.analysis.base.AbstractAnalysis] = None, children: Optional[List[autogluon.eda.analysis.base.AbstractAnalysis]] = None, **kwargs)

Creates a nested namespace in state. All components within children will have the relative root of the state moved into this subspace. To instruct visualization facets to use a specific subspace, pass the namespace argument to them (see the example).

Parameters

namespace: Optional[str], default = None: namespace to use; use root if not specified
parent: Optional[AbstractAnalysis], default = None: parent Analysis
children: Optional[List[AbstractAnalysis]], default = None: wrapped analyses; these will be fitted with the relative root of the state moved into the namespace
kwargs

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> auto.analyze(
...     train_data=..., label=...,
...     anlz_facets=[
...         # Puts output into the root namespace
...         eda.interaction.Correlation(),
...         # Puts output into the focus namespace
...         eda.Namespace(namespace='focus', children=[
...             eda.interaction.Correlation(focus_field='Fare', focus_field_threshold=0.3),
...         ])
...     ],
...     viz_facets=[
...         # Renders correlations from the root namespace
...         viz.interaction.CorrelationVisualization(),
...         # Renders correlations from the focus namespace
...         viz.interaction.CorrelationVisualization(namespace='focus'),
...     ]
... )
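The example works because both Correlation instances write under the same key and only their relative root differs. The sketch below mimics that with plain dictionaries; relative_root, correlation_analysis and correlation_visualization are hypothetical helpers, and the state keys they use are assumptions, not the keys AutoGluon actually writes.

```python
from typing import Any, Dict, Optional


def relative_root(state: Dict[str, Any], namespace: Optional[str]) -> Dict[str, Any]:
    """Return the sub-dictionary used as root; create it if it is missing."""
    if namespace is None:
        return state  # no namespace: the root itself
    return state.setdefault(namespace, {})


def correlation_analysis(root: Dict[str, Any], focus_field: Optional[str] = None) -> None:
    # Hypothetical analysis: always writes under the same key of its relative root.
    root["correlation"] = {"focus_field": focus_field}


def correlation_visualization(state: Dict[str, Any], namespace: Optional[str] = None) -> str:
    # Hypothetical facet: reads the same key from the selected namespace.
    return f"focus_field={relative_root(state, namespace)['correlation']['focus_field']}"


state: Dict[str, Any] = {}
correlation_analysis(relative_root(state, None))
correlation_analysis(relative_root(state, "focus"), focus_field="Fare")
print(correlation_visualization(state))  # focus_field=None
print(correlation_visualization(state, namespace="focus"))  # focus_field=Fare
```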

autogluon.eda.visualization.base

AbstractVisualization

AbstractVisualization

class autogluon.eda.visualization.base.AbstractVisualization(namespace: Optional[str] = None, **kwargs)

Methods

`all_keys_must_be_present`(state, *keys)	Checks if all the keys are present in the state
`at_least_one_key_must_be_present`(state, *keys)	Checks if at least one key is present in the state
`can_handle`(state)	Checks if state has all the required parameters for visualization.
`render`(state)	Render component.

all_keys_must_be_present(state: autogluon.eda.state.AnalysisState, *keys) → bool

Checks if all the keys are present in the state

Parameters

state: AnalysisState: state object to perform check on
keys: one or more keys to check

Returns

True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: autogluon.eda.state.AnalysisState, *keys) → bool

Checks if at least one key is present in the state

Parameters

state: AnalysisState: state object to perform check on
keys: one or more keys to check

Returns

True if at least one key from the keys list is present in the state

abstract can_handle(state: autogluon.eda.state.AnalysisState) → bool

Checks if state has all the required parameters for visualization. See also at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.

Parameters

state: AnalysisState: fitted state

Returns

True if all the prerequisites for rendering are present

render(state: autogluon.eda.state.AnalysisState) → None

Render component.

Parameters

state: AnalysisState: state to render