Reference: Base APIs#

This section highlights the base APIs used by the EDA framework. The processing consists of the following parts:

1. Analysis graph construction: in this part, a nested graph of analyses is constructed.

analysis = BaseAnalysis(
    # State
    state=state,
    # Arguments
    train_data=train_data, test_data=test_data, val_data=val_data, model=model, label=label,
    # Nested analyses
    children=[
        Sampler(sample=sample, children=[
            DatasetSummary(),
            MissingValuesAnalysis(),
            RawTypesAnalysis(),
            SpecialTypesAnalysis(),
            ApplyFeatureGenerator(category_to_numbers=True, children=[
                FeatureDistanceAnalysis()
            ]),
        ]),
    ],
)

2. .fit() call. This call executes the operations in the graph and produces a state. The state is a nested dictionary without any prescribed structure, and all components share the same namespace. If multiple components are fitted with different parameters, they can be put into separate sub-spaces via the Namespace component. The resulting state can either be passed on to the next analysis for further processing or be rendered.

state = analysis.fit()

3. Rendering: in this stage we construct a graph of components (a combination of layout components and visual components) and then pass the state generated previously as an input argument to the render() call.

viz = SimpleVerticalLinearLayout(
    facets=[
        DatasetStatistics(headers=True),
        DatasetTypeMismatch(headers=True),
        MarkdownSectionComponent("### Feature Distance"),
        FeatureDistanceAnalysisVisualization(),
    ],
)
viz.render(state)

Please note: components may depend on each other's output; all the prerequisites for calling fit() on a component must be checked in can_handle(). Components can share information in two ways: 1) via state; 2) via shared values/shadow arguments (e.g., the Sampler component modifies the train_data, test_data and val_data arguments within the scope of its children's fit() calls).
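The two sharing mechanisms described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: ToyComponent, ToySampler and ToyRowCount are hypothetical names invented for this example and do not reproduce the autogluon.eda implementation.

```python
# Toy model only: hypothetical classes, NOT the autogluon.eda implementation.
from typing import Any, Dict, List, Optional


class ToyComponent:
    """Minimal stand-in for an analysis node with shared state and inherited args."""

    def __init__(self, children: Optional[List["ToyComponent"]] = None, **kwargs: Any) -> None:
        self.children = children or []
        self.kwargs = kwargs

    def fit(self, state: Dict[str, Any], **inherited: Any) -> Dict[str, Any]:
        # Arguments set on this level override the ones inherited from above.
        args = {**inherited, **self.kwargs}
        # run() may return modified ("shadowed") args that only the children see.
        args = self.run(state, args)
        for child in self.children:
            child.fit(state, **args)
        return state

    def run(self, state: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
        return args  # the default node changes nothing


class ToySampler(ToyComponent):
    """Way 2: shadows train_data with a truncated copy for its children only."""

    def run(self, state: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
        return {**args, "train_data": args["train_data"][: args["sample"]]}


class ToyRowCount(ToyComponent):
    """Way 1: shares information by writing a result into the shared state."""

    def run(self, state: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
        state.setdefault("row_counts", []).append(len(args["train_data"]))
        return args


root = ToyComponent(
    train_data=list(range(100)),
    children=[
        ToySampler(sample=10, children=[ToyRowCount()]),  # child sees the sampled data
        ToyRowCount(),  # sibling still sees the original data
    ],
)
state = root.fit({})
print(state)
```

In this sketch the shadowed train_data is visible only below the sampler, while the state dictionary is shared by every node in the graph.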

autogluon.eda.analysis.base#

`AbstractAnalysis`	Base class for analysis functionality.
`BaseAnalysis`	Simple implementation of `AbstractAnalysis`
`Namespace`	Creates a nested namespace in state.

AbstractAnalysis#

class autogluon.eda.analysis.base.AbstractAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, state: Optional[AnalysisState] = None, **kwargs)[source]#

Base class for analysis functionality.

Provides basic functionality for state/args management in the analysis hierarchy and helper methods to access frequently-used methods.

Analyses can be nested; the hierarchical relationships can be navigated via parent and children properties.

The main entry method of an analysis is the fit function. The _fit method is designed to be overridden by the component developer and should encapsulate all the outputs into the state object provided. When fit is called, the execution flow is the following:

- gather args from the parent levels of the analysis hierarchy; this is done to avoid referencing the same args on each nested component (e.g., train_data can be specified at the top and all the children will be able to access it via args on all levels, unless overridden by one of the components in the hierarchy)
- call the _fit function for each component that returned True from its can_handle call

Please note: state is shared across the whole analysis hierarchy. If two components write to the same space, each consecutive update overwrites the previous one. If the same component has to be reused and needs to write different outputs, please use the Namespace wrapper to isolate the components.
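The execution flow and the overwrite behaviour described above can be sketched with a toy hierarchy. The classes below (ToyAnalysis, ToyCountRows, ToyLabelCheck) are hypothetical and written only for illustration; in particular, skipping the children of a node whose can_handle returns False is an assumption of this sketch, not a statement about the library.

```python
# Toy model only: hypothetical classes, NOT the autogluon.eda implementation.
from typing import Any, Dict, List, Optional


class ToyAnalysis:
    def __init__(
        self,
        parent: Optional["ToyAnalysis"] = None,
        children: Optional[List["ToyAnalysis"]] = None,
        **kwargs: Any,
    ) -> None:
        self.parent = parent
        self.children = children or []
        self.kwargs = kwargs
        for child in self.children:
            child.parent = self

    def gather_args(self) -> Dict[str, Any]:
        # Collect the chain from this node up to the root ...
        chain = []
        node: Optional["ToyAnalysis"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        # ... then apply kwargs top-down so that nearer levels override.
        args: Dict[str, Any] = {}
        for level in reversed(chain):
            args.update(level.kwargs)
        return args

    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        return True

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        pass  # overridden by concrete components

    def fit(self, state: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        state = {} if state is None else state
        args = self.gather_args()
        if self.can_handle(state, args):
            self._fit(state, args)
            for child in self.children:
                child.fit(state)  # the same state object is shared by all nodes
        return state


class ToyCountRows(ToyAnalysis):
    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        state["rows"] = len(args["train_data"])


class ToyLabelCheck(ToyAnalysis):
    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        return "label" in args  # prerequisite check before _fit

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        state["label"] = args["label"]


root = ToyAnalysis(
    train_data=[1, 2, 3],
    children=[
        ToyCountRows(),  # uses train_data inherited from the root
        ToyCountRows(train_data=[1, 2, 3, 4, 5]),  # overrides train_data, overwrites "rows"
        ToyLabelCheck(),  # skipped: no label is defined anywhere in the hierarchy
    ],
)
state = root.fit()
print(state)
```

Because both ToyCountRows instances write to the same key, only the last value survives in this sketch, which is the situation the Namespace wrapper is meant to avoid.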

Parameters

parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call
state (Optional[AnalysisState], default = None) – state object to perform check on; if not provided a new state will be created during the fit call
kwargs – arguments to pass into the component

BaseAnalysis#

class autogluon.eda.analysis.base.BaseAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)[source]#

Simple implementation of AbstractAnalysis

Parameters

parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call
kwargs –

Namespace#

class autogluon.eda.analysis.base.Namespace(namespace: Optional[str] = None, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)[source]#

Creates a nested namespace in state. All the components within children will have the relative root of the state moved into this subspace. To instruct visualization facets to use a specific subspace, please use the namespace argument (see the example).

Parameters

namespace (Optional[str], default = None) – namespace to use; use root if not specified
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call
kwargs –

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> auto.analyze(
...     train_data=..., label=...,
...     anlz_facets=[
...         # Puts output into the root namespace
...         eda.interaction.Correlation(),
...         # Puts output into the focus namespace
...         eda.Namespace(namespace='focus', children=[
...             eda.interaction.Correlation(focus_field='Fare', focus_field_threshold=0.3),
...         ])
...     ],
...     viz_facets=[
...         # Renders correlations from the root namespace
...         viz.interaction.CorrelationVisualization(),
...         # Renders correlations from the focus namespace
...         viz.interaction.CorrelationVisualization(namespace='focus'),
...     ]
... )
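To make the idea of a "relative root" more concrete, the following toy sketch shows how the same component can write into the root of a nested dictionary and into a sub-dictionary without overwriting itself. The helper names fit_in_namespace and make_correlation are hypothetical and are not part of autogluon.eda; the state layout is an assumption made for illustration.

```python
# Toy model only: hypothetical helpers, NOT the autogluon.eda implementation.
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]
Component = Callable[[State], None]


def fit_in_namespace(state: State, namespace: Optional[str], components: List[Component]) -> State:
    """Run components against a sub-dictionary of state; use the root if no namespace is given."""
    sub_state = state.setdefault(namespace, {}) if namespace else state
    for component in components:
        component(sub_state)  # the component only sees its relative root
    return state


def make_correlation(threshold: float) -> Component:
    """Build a toy component that always writes to the same key."""

    def component(state: State) -> None:
        state["correlation"] = {"threshold": threshold}

    return component


state: State = {}
fit_in_namespace(state, None, [make_correlation(0.0)])  # writes to the root
fit_in_namespace(state, "focus", [make_correlation(0.3)])  # writes to state["focus"]
print(state)
```

Both results are kept because the second component writes under the "focus" key instead of the root.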

autogluon.eda.visualization.base#

`AbstractVisualization`	Base class for visualization functionality.

AbstractVisualization#

class autogluon.eda.visualization.base.AbstractVisualization(namespace: Optional[str] = None, **kwargs)[source]#

Base class for visualization functionality.

Provides basic functionality for namespace management and helper methods to access frequently-used methods.

Specifying a namespace narrows the visibility scope to a specific subspace of state. Namespaces can be specified in a nested form: ns_a.ns_b.ns_c. Please see the Namespace wrapper for how to create namespaces.

The main entry method of a visualization is the render function. When called, the execution flow is the following:

- narrow the state scope to the specified namespace
- call the _render function for each component that returned True from its can_handle call

Parameters

namespace (Optional[str], default = None) – namespace to use; can be nested like ns_a.ns_b.ns_c
kwargs –
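The render flow above (narrow the state, check can_handle, then render) can be sketched in plain Python. The function narrow_state and the class ToyVisualization are hypothetical names for this illustration only; how the library resolves nested namespaces internally is not shown in this reference, so the dotted-path lookup below is an assumption.

```python
# Toy model only: hypothetical names, NOT the autogluon.eda implementation.
from typing import Any, Dict, Optional

State = Dict[str, Any]


def narrow_state(state: State, namespace: Optional[str]) -> State:
    """Follow a dotted namespace such as 'ns_a.ns_b' into a nested dictionary."""
    if not namespace:
        return state
    scoped: Any = state
    for key in namespace.split("."):
        # A missing or non-dictionary level yields an empty scope.
        scoped = scoped.get(key, {}) if isinstance(scoped, dict) else {}
    return scoped if isinstance(scoped, dict) else {}


class ToyVisualization:
    def __init__(self, namespace: Optional[str] = None) -> None:
        self.namespace = namespace

    def can_handle(self, state: State) -> bool:
        return "correlation" in state  # prerequisite produced by an analysis

    def _render(self, state: State) -> str:
        return f"correlation threshold = {state['correlation']['threshold']}"

    def render(self, state: State) -> Optional[str]:
        scoped = narrow_state(state, self.namespace)
        if self.can_handle(scoped):
            return self._render(scoped)
        return None  # nothing to render in this scope


state: State = {
    "correlation": {"threshold": 0.0},
    "ns_a": {"ns_b": {"correlation": {"threshold": 0.3}}},
}
root_view = ToyVisualization().render(state)
nested_view = ToyVisualization(namespace="ns_a.ns_b").render(state)
missing_view = ToyVisualization(namespace="ns_x").render(state)
print(root_view, nested_view, missing_view, sep="\n")
```

In this sketch a visualization pointed at a namespace that was never populated simply renders nothing instead of raising an error.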