Reference: Base APIs

This section highlights the base APIs used by the EDA framework. The processing consists of the following parts:

1. Analysis graph construction: in this stage, a nested graph of analyses is constructed.

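from autogluon.eda.analysis import (
    ApplyFeatureGenerator,
    BaseAnalysis,
    DatasetSummary,
    FeatureDistanceAnalysis,
    MissingValuesAnalysis,
    RawTypesAnalysis,
    Sampler,
    SpecialTypesAnalysis,
)

# Inputs (defined elsewhere): train_data, test_data, val_data are pandas DataFrames,
# model is the model to analyze, label is the target column name,
# sample is the sample size passed to Sampler
analysis = BaseAnalysis(
    # State: optional existing AnalysisState to build on (None starts empty)
    state=None,
    # Arguments visible to every analysis in the tree
    train_data=train_data, test_data=test_data, val_data=val_data, model=model, label=label,
    # Nested analyses
    children=[
        # Children of Sampler receive sampled datasets
        Sampler(sample=sample, children=[
            DatasetSummary(),
            MissingValuesAnalysis(),
            RawTypesAnalysis(),
            SpecialTypesAnalysis(),
            ApplyFeatureGenerator(category_to_numbers=True, children=[
                FeatureDistanceAnalysis()
            ]),
        ]),
    ],
)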

2. Fitting: the .fit() call executes the operations in the graph and produces a state. The state is a nested dictionary without any prescribed structure, and all components share the same namespace. If multiple components are fitted with different parameters, their outputs can be put into separate sub-spaces via the Namespace component; each sub-space can then be passed on for further processing by subsequent analyses or rendered.

state = analysis.fit()
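
To make the role of sub-spaces concrete, here is a minimal, library-independent sketch. It assumes the state behaves like a plain nested dict (AutoGluon's AnalysisState is its own class) and uses a hypothetical helper, fit_into, in place of a real component:

```python
from typing import Any, Dict, Optional

State = Dict[str, Any]


def fit_into(state: State, key: str, value: Any, namespace: Optional[str] = None) -> State:
    """Hypothetical stand-in for a component writing its result into the state.

    Without a namespace the result lands in the shared root; with a namespace it
    lands in a nested sub-dictionary, so identical keys no longer overwrite each other.
    """
    target = state if namespace is None else state.setdefault(namespace, {})
    target[key] = value
    return state


state: State = {}
# The same (hypothetical) correlation component fitted twice with different parameters
fit_into(state, "correlations", {"focus_field": None})
fit_into(state, "correlations", {"focus_field": "Fare"}, namespace="focus")
print(state)
```

Without the namespace argument, the second call would silently replace the first result under the same root key.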

3. Rendering: in this stage, a graph of components (a combination of layout components and visual components) is constructed, and the state generated by fit() is passed as the input argument to the render() call.

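from autogluon.eda.visualization import (
    DatasetStatistics,
    DatasetTypeMismatch,
    FeatureDistanceAnalysisVisualization,
    MarkdownSectionComponent,
    SimpleVerticalLinearLayout,
)

viz = SimpleVerticalLinearLayout(
    facets=[
        DatasetStatistics(headers=True),
        DatasetTypeMismatch(headers=True),
        MarkdownSectionComponent("### Feature Distance"),
        FeatureDistanceAnalysisVisualization(),
    ],
)
# Render the state produced by analysis.fit()
viz.render(state)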
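
As a rough illustration of the rendering stage, the sketch below models a vertical layout that renders its facets in order and skips any facet whose can_handle() check fails. ToyFacet and ToyVerticalLayout are hypothetical simplifications, not AutoGluon's SimpleVerticalLinearLayout, and "rendering" is reduced to collecting strings:

```python
from typing import Any, Dict, List

State = Dict[str, Any]


class ToyFacet:
    """Hypothetical visual component that renders a single state key."""

    def __init__(self, key: str) -> None:
        self.key = key

    def can_handle(self, state: State) -> bool:
        # Render only if the analysis stage produced the key this facet needs
        return self.key in state

    def render(self, state: State) -> str:
        return f"{self.key}: {state[self.key]}"


class ToyVerticalLayout:
    """Hypothetical layout: renders its facets top to bottom."""

    def __init__(self, facets: List[ToyFacet]) -> None:
        self.facets = facets

    def render(self, state: State) -> List[str]:
        return [facet.render(state) for facet in self.facets if facet.can_handle(state)]


state = {"dataset_stats": {"rows": 100}, "feature_distance": [("Age", "Fare", 0.4)]}
layout = ToyVerticalLayout([ToyFacet("dataset_stats"), ToyFacet("missing"), ToyFacet("feature_distance")])
for line in layout.render(state):
    print(line)
```

The "missing" facet is skipped because nothing in the state provides that key.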

Please note: components may depend on each other's output, so all the prerequisites for calling fit() on a component must be checked in can_handle(). Components can share information in two ways: 1) through the state; 2) by sharing or shadowing arguments (e.g., the Sampler component replaces the train_data, test_data and val_data arguments with sampled versions within the scope of its children's fit() calls).
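
Argument shadowing can be pictured with Python's collections.ChainMap: each level of the tree places its own arguments in front of those inherited from its parents, so a lookup finds the closest definition first. This is a sketch of the idea only; the helper sample_rows and the list-based "datasets" are hypothetical, and AutoGluon's Sampler may be implemented differently:

```python
from collections import ChainMap
from typing import Any, Dict, List


def sample_rows(rows: List[Any], n: int) -> List[Any]:
    """Hypothetical sampler: keeps the first n rows (a real sampler would draw randomly)."""
    return rows[:n]


# Arguments declared at the root of the analysis tree
root_args: Dict[str, Any] = {
    "train_data": list(range(1000)),
    "test_data": list(range(200)),
    "label": "target",
}

# A Sampler-like node shadows the datasets for its children only
sampler_args = {k: sample_rows(root_args[k], 10) for k in ("train_data", "test_data")}
child_view = ChainMap(sampler_args, root_args)

print(len(child_view["train_data"]))  # sampled value seen by the children
print(child_view["label"])            # inherited unchanged from the root
print(len(root_args["train_data"]))   # the root arguments are left untouched
```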

autogluon.eda.analysis.base

  • AbstractAnalysis

  • Namespace – Creates a nested namespace in state.

AbstractAnalysis

class autogluon.eda.analysis.base.AbstractAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, state: Optional[AnalysisState] = None, **kwargs)

all_keys_must_be_present(state: AnalysisState, *keys) -> bool

Checks if all the keys are present in the state

Parameters
  • state (AnalysisState) – state object to perform check on

  • keys – list of the keys to check

Returns

True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: AnalysisState, *keys) -> bool

Checks if at least one key is present in the state

Parameters
  • state (AnalysisState) – state object to perform check on

  • keys – list of the keys to check

Returns

True if at least one key from the keys list is present in the state
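
Both key-check helpers map naturally onto Python's built-in all() and any(). The sketch below shows that assumed behavior as free functions over a plain dict; it is an illustration of the documented semantics, not AutoGluon's source:

```python
from typing import Any, Mapping


def all_keys_must_be_present(state: Mapping[str, Any], *keys: str) -> bool:
    # True only when every requested key exists in the state
    return all(key in state for key in keys)


def at_least_one_key_must_be_present(state: Mapping[str, Any], *keys: str) -> bool:
    # True when any of the requested keys exists in the state
    return any(key in state for key in keys)


state = {"train_data": "...", "label": "target"}
print(all_keys_must_be_present(state, "train_data", "label"))           # both present
print(all_keys_must_be_present(state, "train_data", "model"))           # "model" missing
print(at_least_one_key_must_be_present(state, "test_data", "train_data"))
```

Note that with this sketch, an empty key list makes the first helper return True and the second return False, following all() and any().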

static available_datasets(args: AnalysisState) -> Generator[Tuple[str, DataFrame], None, None]

Generator that iterates only over the datasets provided in the arguments

Parameters

args (AnalysisState) – arguments passed into the call. These differ from self.args in that they are assembled from the parents and shadowed by children (this allows parameters reused across the tree to be declared once, in the upper-level arguments).

Yields

Tuples of the dataset name (train_data, test_data or tuning_data) and the dataset itself
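
A minimal sketch of the filtering this generator performs, assuming args behaves like a mapping and that the candidate names are the three listed above (the real method may check a different list). Plain lists stand in for DataFrames so the example has no dependencies:

```python
from typing import Any, Generator, Mapping, Tuple

# Candidate dataset argument names, taken from the description above (assumption)
DATASET_KEYS = ("train_data", "test_data", "tuning_data")


def available_datasets(args: Mapping[str, Any]) -> Generator[Tuple[str, Any], None, None]:
    """Yield (name, dataset) pairs only for datasets actually provided in args."""
    for name in DATASET_KEYS:
        dataset = args.get(name)
        if dataset is not None:
            yield name, dataset


args = {"train_data": [1, 2, 3], "test_data": None, "label": "target"}
for name, ds in available_datasets(args):
    print(name, len(ds))
```

Datasets passed as None and non-dataset arguments such as label are both skipped.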

abstract can_handle(state: AnalysisState, args: AnalysisState) -> bool

Checks if state and args have all the required parameters for fitting. See also the at_least_one_key_must_be_present() and all_keys_must_be_present() helpers for constructing more complex logic.

Parameters
  • state (AnalysisState) – state to be updated by this fit function

  • args (AnalysisState) – analysis properties assembled from root of analysis hierarchy to this component (with lower levels shadowing upper level args).

Returns

True if all the prerequisites for fitting are present
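
The sketch below shows how a can_handle() implementation might combine checks on both args and state, including a dependency on another component's output. ToyAnalysis, ToyTargetAnalysis and the "raw_type" state key are all hypothetical; only the shape of the can_handle(state, args) contract comes from this reference:

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping


class ToyAnalysis(ABC):
    """Hypothetical base class mirroring the abstract can_handle(state, args) contract."""

    @abstractmethod
    def can_handle(self, state: Mapping[str, Any], args: Mapping[str, Any]) -> bool:
        """Return True only if everything needed for fitting is available."""


class ToyTargetAnalysis(ToyAnalysis):
    """Hypothetical component needing a label, at least one dataset,
    and a 'raw_type' entry produced earlier by another component."""

    def can_handle(self, state: Mapping[str, Any], args: Mapping[str, Any]) -> bool:
        has_label = "label" in args                                        # all-keys style check
        has_dataset = any(k in args for k in ("train_data", "test_data"))  # at-least-one style check
        has_types = "raw_type" in state                                    # depends on another component
        return has_label and has_dataset and has_types


analysis = ToyTargetAnalysis()
print(analysis.can_handle({"raw_type": {}}, {"label": "y", "test_data": []}))  # prerequisites met
print(analysis.can_handle({}, {"label": "y", "test_data": []}))                # raw_type not produced yet
```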

fit(**kwargs) -> AnalysisState

Fit the analysis tree.

Parameters

kwargs – fit arguments

Returns

State produced by fit
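
As an illustration of what fitting a tree involves, here is a toy model: each node merges its own arguments over the inherited ones, checks can_handle(), records itself in the shared state, and then fits its children. ToyNode, its requires parameter and the "fitted" state key are hypothetical; AutoGluon's actual traversal may differ in detail (for example, in how skipped nodes are reported):

```python
from collections import ChainMap
from typing import Any, Dict, List, Optional, Tuple

State = Dict[str, Any]


class ToyNode:
    """Hypothetical analysis node: its own args, prerequisites, and children."""

    def __init__(self, name: str, requires: Tuple[str, ...] = (),
                 children: Optional[List["ToyNode"]] = None, **args: Any) -> None:
        self.name = name
        self.requires = requires
        self.children = children or []
        self.args = args

    def can_handle(self, state: State, args: ChainMap) -> bool:
        return all(key in args for key in self.requires)

    def fit(self, state: Optional[State] = None, parent_args: Optional[ChainMap] = None) -> State:
        state = {} if state is None else state
        # Own args are searched first, so they shadow anything inherited from above
        args = (parent_args if parent_args is not None else ChainMap()).new_child(self.args)
        if not self.can_handle(state, args):
            return state  # prerequisites missing: skip this node and its whole subtree
        state.setdefault("fitted", []).append(self.name)
        for child in self.children:
            child.fit(state, args)
        return state


tree = ToyNode("root", train_data=[1, 2, 3], children=[
    ToyNode("summary", requires=("train_data",)),
    ToyNode("target", requires=("label",)),  # skipped: no label declared above it
    ToyNode("with_label", label="y", children=[
        ToyNode("target_inner", requires=("label",)),  # label inherited from its parent
    ]),
])
print(tree.fit()["fitted"])
```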

Namespace

class autogluon.eda.analysis.base.Namespace(namespace: Optional[str] = None, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)

Creates a nested namespace in state. All the components within children will have the relative root of the state moved into this subspace. To instruct visualization facets to use a specific subspace, pass them the namespace argument (see the example).

Parameters
  • namespace (Optional[str], default = None) – namespace to use; use root if not specified

  • parent (Optional[AbstractAnalysis], default = None) – parent Analysis

  • children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; during the fit call these use the namespace subspace as the root of the state

  • kwargs

Examples

>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> auto.analyze(
...     train_data=..., label=...,
...     anlz_facets=[
...         # Puts output into the root namespace
...         eda.interaction.Correlation(),
...         # Puts output into the focus namespace
...         eda.Namespace(namespace='focus', children=[
...             eda.interaction.Correlation(focus_field='Fare', focus_field_threshold=0.3),
...         ])
...     ],
...     viz_facets=[
...         # Renders correlations from the root namespace
...         viz.interaction.CorrelationVisualization(),
...         # Renders correlations from the focus namespace
...         viz.interaction.CorrelationVisualization(namespace='focus'),
...     ]
... )

autogluon.eda.visualization.base

AbstractVisualization

class autogluon.eda.visualization.base.AbstractVisualization(namespace: Optional[str] = None, **kwargs)

all_keys_must_be_present(state: AnalysisState, *keys) -> bool

Checks if all the keys are present in the state

Parameters
  • state (AnalysisState) – state object to perform check on

  • keys – list of the keys to check

Returns

True if all the keys from the keys list are present in the state

at_least_one_key_must_be_present(state: AnalysisState, *keys) -> bool

Checks if at least one key is present in the state

Parameters
  • state (AnalysisState) – state object to perform check on

  • keys – list of the keys to check

Returns

True if at least one key from the keys list is present in the state

abstract can_handle(state: AnalysisState) -> bool

Checks if state has all the required parameters for visualization. See also at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.

Parameters

state (AnalysisState) – fitted state

Returns

True if all the prerequisites for rendering are present

render(state: AnalysisState) -> None

Render component.

Parameters

state (AnalysisState) – state to render
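
The following sketch models how a visualization's namespace argument, can_handle() and render() could fit together: the component first moves to the requested sub-space, then renders only if the keys it needs are there. ToyVisualization is hypothetical, and resolving the namespace before calling can_handle() is an assumption about ordering, not documented behavior:

```python
from typing import Any, Dict, Optional

State = Dict[str, Any]


class ToyVisualization:
    """Hypothetical visualization mirroring the namespace / can_handle / render contract."""

    def __init__(self, key: str, namespace: Optional[str] = None) -> None:
        self.key = key
        self.namespace = namespace

    def _resolve(self, state: State) -> State:
        # Move the relative root into the requested sub-space (empty if it does not exist)
        return state if self.namespace is None else state.get(self.namespace, {})

    def can_handle(self, state: State) -> bool:
        return self.key in state

    def render(self, state: State) -> Optional[str]:
        sub_state = self._resolve(state)
        if not self.can_handle(sub_state):
            return None  # nothing to draw
        return f"[{self.namespace or 'root'}] {self.key} = {sub_state[self.key]}"


state = {"correlations": "root-matrix", "focus": {"correlations": "focus-matrix"}}
print(ToyVisualization("correlations").render(state))
print(ToyVisualization("correlations", namespace="focus").render(state))
print(ToyVisualization("correlations", namespace="missing").render(state))
```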