Reference: Base APIs

This section highlights the base APIs used by the EDA framework. The processing consists of the following parts:

1. Analysis graph construction: in this part a nested graph of analyses is constructed.

analysis = BaseAnalysis(
    # State
    state=state,
    # Arguments
    train_data=train_data, test_data=test_data, val_data=val_data, model=model, label=label,
    # Nested analyses
    children=[
        Sampler(sample=sample, children=[
            DatasetSummary(),
            MissingValuesAnalysis(),
            RawTypesAnalysis(),
            SpecialTypesAnalysis(),
            ApplyFeatureGenerator(category_to_numbers=True, children=[
                FeatureDistanceAnalysis()
            ]),
        ]),
    ],
)

2. fit() call: this call executes the operations in the graph and produces a state. The state is a nested dictionary without any prescribed structure, and all components share the same namespace. If multiple components are fitted with different parameters, their outputs can be put into separate sub-spaces via the Namespace component. The resulting state can either be passed on to the next analysis for further processing or be rendered.

state = analysis.fit()

3. Rendering: in this stage we construct a graph of components (a combination of layout components and visual components) and then pass the state generated previously as an input argument into the render() call.

viz = SimpleVerticalLinearLayout(
    facets=[
        DatasetStatistics(headers=True),
        DatasetTypeMismatch(headers=True),
        MarkdownSectionComponent("### Feature Distance"),
        FeatureDistanceAnalysisVisualization(),
    ],
)
viz.render(state)

Please note: components may depend on each other's output; all the prerequisites for fitting a component must be checked in can_handle(). Components can share information in two ways: 1) via state; 2) via shared values or shadowed arguments (e.g., the Sampler component modifies the train_data, test_data and val_data arguments in the scope of the children's fit() calls).
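The second sharing mechanism, shadowed arguments, can be pictured with a small stand-alone sketch. It does not use the AutoGluon classes: shadow_args and sample_rows are hypothetical helpers, and plain lists stand in for data frames. The only point is that a sampling component can hand its children a view of the arguments in which the dataset entries are replaced, while the parent's own arguments stay untouched.

```python
from collections import ChainMap
import random


def sample_rows(rows, n, seed=0):
    """Return at most n rows, chosen reproducibly (hypothetical helper)."""
    if len(rows) <= n:
        return list(rows)
    return random.Random(seed).sample(rows, n)


def shadow_args(parent_args, n):
    """Build the args seen by children: sampled datasets shadow the parent's."""
    overrides = {
        name: sample_rows(parent_args[name], n)
        for name in ("train_data", "test_data", "val_data")
        if parent_args.get(name) is not None
    }
    # ChainMap looks keys up in `overrides` first, then falls back to the parent
    return ChainMap(overrides, parent_args)


parent_args = {"train_data": list(range(100)), "test_data": list(range(20)), "label": "y"}
child_args = shadow_args(parent_args, n=10)

print(len(child_args["train_data"]))   # size of the sampled view seen by children
print(len(parent_args["train_data"]))  # the parent's data is unchanged
print(child_args["label"])             # non-shadowed args pass through
```

A layered lookup is used here so that nothing is copied or mutated at the parent level; how the library itself implements shadowing is not shown in this section.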

autogluon.eda.analysis.base

AbstractAnalysis

Base class for analysis functionality.

BaseAnalysis

Simple implementation of AbstractAnalysis

Namespace

Creates a nested namespace in state.

AbstractAnalysis

class autogluon.eda.analysis.base.AbstractAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, state: Optional[AnalysisState] = None, **kwargs)

Base class for analysis functionality.

Provides basic functionality for state/args management in the analysis hierarchy and helper methods for frequently used operations.

Analyses can be nested; the hierarchical relationships can be navigated via parent and children properties.

The main entry method of an analysis is the fit function. The _fit method is designed to be overridden by the component developer and should encapsulate all the outputs in the state object provided. When called, the execution flow is the following:

  • gather args from the parent levels of the analysis hierarchy; this is done to avoid referencing the same args on each nested component (e.g., train_data can be specified at the top and all the children will be able to access it via args on all levels, unless overridden by one of the components in the hierarchy)

  • call the _fit function for each component that returned True from the can_handle call

Please note: state is shared across the whole analysis hierarchy. If two components write to the same space, each consecutive update overrides the previous one. If the same component has to be reused and must write different outputs, please use the Namespace wrapper to isolate the components.
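The execution flow and the overwrite behaviour described above can be imitated with a toy model. This is a simplified sketch, not the AutoGluon implementation: ToyAnalysis, RowCount and ToyNamespace are hypothetical classes, the state is a plain dict, and the sketch assumes that children are skipped when their parent cannot handle the inputs.

```python
class ToyAnalysis:
    """Toy stand-in for an analysis component (hypothetical, for illustration)."""

    def __init__(self, children=None, **kwargs):
        self.parent = None
        self.args = kwargs
        self.children = children or []
        for child in self.children:
            child.parent = self

    def gather_args(self):
        """Merge args from the root down, so lower levels shadow upper ones."""
        chain, node = [], self
        while node is not None:
            chain.append(node.args)
            node = node.parent
        merged = {}
        for level_args in reversed(chain):
            merged.update(level_args)
        return merged

    def can_handle(self, state, args):
        return True

    def _fit(self, state, args):
        pass

    def fit(self, state=None):
        state = {} if state is None else state
        args = self.gather_args()
        if self.can_handle(state, args):
            self._fit(state, args)
            for child in self.children:
                child.fit(state)
        return state


class RowCount(ToyAnalysis):
    """Writes the number of training rows into the shared state."""

    def can_handle(self, state, args):
        return "train_data" in args

    def _fit(self, state, args):
        state["row_count"] = len(args["train_data"])


class ToyNamespace(ToyAnalysis):
    """Moves the state root seen by its children into a sub-dictionary."""

    def fit(self, state=None):
        state = {} if state is None else state
        sub_state = state.setdefault(self.args["namespace"], {})
        for child in self.children:
            child.fit(sub_state)
        return state


tree = ToyAnalysis(
    train_data=[1, 2, 3, 4],
    children=[
        RowCount(),                   # uses train_data inherited from the root
        RowCount(train_data=[1, 2]),  # shadows train_data and overwrites "row_count"
        ToyNamespace(namespace="small", children=[RowCount(train_data=[1])]),
    ],
)
state = tree.fit()
print(state)
```

The second RowCount replaces the value written by the first one because both write to the same key, whereas the one wrapped in ToyNamespace writes into its own sub-dictionary and leaves the root entry alone.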

Parameters
  • parent (Optional[AbstractAnalysis], default = None) – parent Analysis

  • children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call

  • state (Optional[AnalysisState], default = None) – state object to perform check on; if not provided a new state will be created during the fit call

  • kwargs – arguments to pass into the component

See also

Namespace

all_keys_must_be_present(state: AnalysisState, *keys) → bool

Checks if all the keys are present in the state

Parameters
  • state (AnalysisState) – state object to perform check on

  • keys – list of the keys to check

Return type

True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: AnalysisState, *keys) → bool

Checks if at least one key is present in the state

Parameters
  • state (AnalysisState) – state object to perform check on

  • keys – list of the keys to check

Return type

True if at least one key from the keys list is present in the state
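As an illustration of how these two checks are typically combined inside can_handle, here is a minimal stand-alone sketch over plain dictionaries. The function names and the keys "stats_a" and "stats_b" are hypothetical and chosen only for the example; the library methods themselves may do more than a membership test.

```python
def all_keys_present(state, *keys):
    """True only if every key is found in the state."""
    return all(key in state for key in keys)


def at_least_one_key_present(state, *keys):
    """True if any of the keys is found in the state."""
    return any(key in state for key in keys)


def can_handle(state, args):
    """Require both args, plus at least one of two earlier outputs in state."""
    return all_keys_present(args, "train_data", "label") and at_least_one_key_present(
        state, "stats_a", "stats_b"
    )


args = {"train_data": [1, 2, 3], "label": "y"}
print(can_handle({"stats_b": {}}, args))  # one prerequisite output is present
print(can_handle({}, args))               # no prerequisite output yet
```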

static available_datasets(args: AnalysisState) → Generator[Tuple[str, DataFrame], None, None]

Generator which iterates only through the datasets provided in arguments

Parameters

args (AnalysisState) – arguments passed into the call. These differ from self.args in that they are assembled from the parents and shadowed via children (this allows isolating reused parameters in upper-level argument declarations).

Return type

tuple of dataset name (train_data, test_data or tuning_data) and dataset itself
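The behaviour described for this generator can be sketched without the library as follows. This is an assumption-laden illustration: the list of dataset names is taken from the return description above, plain lists stand in for DataFrame objects, and iter_available_datasets is a hypothetical name.

```python
from typing import Any, Dict, Iterator, Tuple

# Dataset names as listed in the return description above
DATASET_KEYS = ("train_data", "test_data", "tuning_data")


def iter_available_datasets(args: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (name, dataset) only for datasets that were actually provided."""
    for name in DATASET_KEYS:
        dataset = args.get(name)
        if dataset is not None:
            yield name, dataset


args = {"train_data": [1, 2, 3], "test_data": None, "label": "y"}
for name, dataset in iter_available_datasets(args):
    print(name, len(dataset))
```

Iterating this way lets a component apply the same logic to every provided dataset without checking each argument by hand.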

abstract can_handle(state: AnalysisState, args: AnalysisState) → bool

Checks if state and args have all the required parameters for fitting. See also at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.

Parameters
  • state (AnalysisState) – state to be updated by this fit function

  • args (AnalysisState) – analysis properties assembled from root of analysis hierarchy to this component (with lower levels shadowing upper level args).

Return type

True if all the pre-requisites for fitting are present

fit(**kwargs) → AnalysisState

Fit the analysis tree.

Parameters

kwargs – fit arguments

Return type

state produced by fit

BaseAnalysis

class autogluon.eda.analysis.base.BaseAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)

Simple implementation of AbstractAnalysis

Parameters
  • parent (Optional[AbstractAnalysis], default = None) – parent Analysis

  • children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call

  • kwargs

See also

AbstractAnalysis

Namespace

class autogluon.eda.analysis.base.Namespace(namespace: Optional[str] = None, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)

Creates a nested namespace in state. All the components within children will have the relative root of the state moved into this subspace. To instruct visualization facets to use a specific subspace, use the namespace argument (see the example).

Parameters
  • namespace (Optional[str], default = None) – namespace to use; use root if not specified

  • parent (Optional[AbstractAnalysis], default = None) – parent Analysis

  • children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call

  • kwargs

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> auto.analyze(
...     train_data=..., label=...,
...     anlz_facets=[
...         # Puts output into the root namespace
...         eda.interaction.Correlation(),
...         # Puts output into the focus namespace
...         eda.Namespace(namespace='focus', children=[
...             eda.interaction.Correlation(focus_field='Fare', focus_field_threshold=0.3),
...         ])
...     ],
...     viz_facets=[
...         # Renders correlations from the root namespace
...         viz.interaction.CorrelationVisualization(),
...         # Renders correlations from the focus namespace
...         viz.interaction.CorrelationVisualization(namespace='focus'),
...     ]
... )

autogluon.eda.visualization.base

AbstractVisualization

Base class for visualization functionality.

AbstractVisualization

class autogluon.eda.visualization.base.AbstractVisualization(namespace: Optional[str] = None, **kwargs)

Base class for visualization functionality.

Provides basic functionality for namespace management and helper methods for frequently used operations.

Specifying a namespace narrows the visibility scope to a specific subspace of the state. Namespaces can be specified in a nested form: ns_a.ns_b.ns_c. Please see the Namespace wrapper for how to create namespaces.

The main entry method of a visualization is the render function. When called, the execution flow is the following:

  • narrow the state scope to the specified namespace

  • call the _render function for each component that returned True from the can_handle call
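The render flow, including the dotted namespace lookup, can be pictured with the following toy sketch. It is not the library code: resolve_namespace and ToyRowCountView are hypothetical, the state is a nested dict, and the component records text in a list instead of drawing anything. Skipping a component whose namespace is missing is an assumption of this sketch.

```python
def resolve_namespace(state, namespace=None):
    """Follow a dotted namespace such as 'ns_a.ns_b' into a nested dict.

    Returns None when any part of the path is missing.
    """
    scoped = state
    for part in (namespace.split(".") if namespace else []):
        if not isinstance(scoped, dict) or part not in scoped:
            return None
        scoped = scoped[part]
    return scoped


class ToyRowCountView:
    """Toy visual component; collects output lines rather than plotting."""

    def __init__(self, namespace=None):
        self.namespace = namespace
        self.output = []

    def can_handle(self, state):
        return "row_count" in state

    def _render(self, state):
        self.output.append(f"rows: {state['row_count']}")

    def render(self, state):
        # Step 1: narrow the state to the requested namespace
        scoped = resolve_namespace(state, self.namespace)
        # Step 2: render only if the narrowed state has what is needed
        if isinstance(scoped, dict) and self.can_handle(scoped):
            self._render(scoped)


state = {"row_count": 4, "ns_a": {"ns_b": {"row_count": 1}}}
root_view = ToyRowCountView()
nested_view = ToyRowCountView(namespace="ns_a.ns_b")
missing_view = ToyRowCountView(namespace="ns_a.ns_x")
for view in (root_view, nested_view, missing_view):
    view.render(state)
print(root_view.output, nested_view.output, missing_view.output)
```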

Parameters
  • namespace (str) – namespace to use; can be nested like ns_a.ns_b.ns_c

  • kwargs

See also

Namespace

all_keys_must_be_present(state: AnalysisState, *keys) → bool

Checks if all the keys are present in the state

Parameters
  • state (AnalysisState) – state object to perform check on

  • keys – list of the keys to check

Return type

True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: AnalysisState, *keys) → bool

Checks if at least one key is present in the state

Parameters
  • state (AnalysisState) – state object to perform check on

  • keys – list of the keys to check

Return type

True if at least one key from the keys list is present in the state

abstract can_handle(state: AnalysisState) → bool

Checks if state has all the required parameters for visualization. See also at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.

Parameters

state (AnalysisState) – fitted state

Return type

True if all the pre-requisites for rendering are present

render(state: AnalysisState) → None

Render component.

Parameters

state (AnalysisState) – state to render