Reference: Base APIs
This section highlights the base APIs used by the EDA framework. The processing consists of the following parts:
1. Analysis graph construction: in this part, a nested graph of analyses is constructed.
```python
analysis = BaseAnalysis(
    # State
    state=state,
    # Arguments
    train_data=train_data, test_data=test_data, val_data=val_data, model=model, label=label,
    # Nested analyses
    children=[
        Sampler(sample=sample, children=[
            DatasetSummary(),
            MissingValuesAnalysis(),
            RawTypesAnalysis(),
            SpecialTypesAnalysis(),
            ApplyFeatureGenerator(category_to_numbers=True, children=[
                FeatureDistanceAnalysis()
            ]),
        ]),
    ],
)
```
2. The fit() call: this call executes the operations in the graph and produces a state. The state is a nested dictionary without any prescribed structure, and all components share the same namespace. If multiple components are fitted with different parameters, their outputs can be put into separate sub-spaces via the Namespace component. The resulting state can then be passed on for further processing by the next analysis, or it can be rendered.
```python
state = analysis.fit()
```
3. Rendering: in this stage, we construct a graph of components (a combination of layout components and visual components) and then pass the previously generated state as an input argument into the render() call.
```python
viz = SimpleVerticalLinearLayout(
    facets=[
        DatasetStatistics(headers=True),
        DatasetTypeMismatch(headers=True),
        MarkdownSectionComponent("### Feature Distance"),
        FeatureDistanceAnalysisVisualization(),
    ],
)
viz.render(state)
```
Please note: components may depend on each other’s output, so all the prerequisites for calling fit() on a component must be checked in can_handle(). Components can share information in two ways: 1) via the state; 2) by sharing or shadowing argument values (e.g., the Sampler component replaces the train_data, test_data and val_data arguments within the scope of its children’s fit() calls).
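Argument shadowing can be modelled with the standard library alone. The sketch below is not AutoGluon's code. resolve_args is a hypothetical helper that uses collections.ChainMap to show two things: how a component can see arguments declared by its ancestors, and how a Sampler-like level can shadow train_data for its children without changing the value declared at the root.

```python
from collections import ChainMap


def resolve_args(*levels: dict) -> ChainMap:
    """Combine argument dicts ordered from the root (first) to the current component (last).

    ChainMap searches its maps left to right, so the levels are reversed:
    the innermost level shadows every level above it.
    """
    return ChainMap(*reversed(levels))


# Arguments declared at the root of the hierarchy.
root_args = {"train_data": list(range(1000)), "label": "y"}

# A hypothetical sampling step that shadows train_data for its children only.
sampler_args = {"train_data": root_args["train_data"][:100]}

child_view = resolve_args(root_args, sampler_args)
root_view = resolve_args(root_args)

print(len(child_view["train_data"]))  # 100: the sampled copy shadows the root value
print(child_view["label"])            # y: inherited unchanged from the root
print(len(root_view["train_data"]))   # 1000: the root declaration is untouched
```

The design relies on lookups rather than copies. Nothing is duplicated, and removing the inner level automatically restores the root value.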
autogluon.eda.analysis.base
AbstractAnalysis – Base class for analysis functionality.
BaseAnalysis – Simple implementation of AbstractAnalysis.
Namespace – Creates a nested namespace in state.
AbstractAnalysis
- class autogluon.eda.analysis.base.AbstractAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, state: Optional[AnalysisState] = None, **kwargs)
Base class for analysis functionality.
Provides basic functionality for state/args management in the analysis hierarchy and helper methods for frequently used operations.
Analyses can be nested; the hierarchical relationships can be navigated via parent and children properties.
The main entry method of an analysis is the fit function. The _fit method is designed to be overridden by the component developer and should encapsulate all the outputs into the provided state object. When fit is called, the execution flow is the following:
- gather args from the parent levels of the analysis hierarchy; this avoids referencing the same args on each nested component (e.g., train_data can be specified at the top, and all the children will be able to access it via args on all levels unless it is overridden by one of the components in the hierarchy);
- call the _fit function for each component that returned True from its can_handle call.
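The execution flow above can be sketched in plain Python. MiniAnalysis and RowCount are hypothetical classes written for this illustration and are not part of autogluon.eda. Args are gathered by walking up the parent chain, with the nearest level winning. A component's _fit, followed by its children's fit, runs only when can_handle returns True. Whether the real implementation skips children in exactly this way is an assumption of the sketch.

```python
from collections import ChainMap
from typing import Dict, List, Optional


class MiniAnalysis:
    """Hypothetical, simplified model of the fit() flow; not AutoGluon's implementation."""

    def __init__(self, children: Optional[List["MiniAnalysis"]] = None, **kwargs):
        self.parent: Optional["MiniAnalysis"] = None
        self.children = children or []
        self.args = kwargs
        for child in self.children:
            child.parent = self

    def _gather_args(self) -> ChainMap:
        # Collect args from this component up to the root; the nearest level wins.
        levels = []
        node = self
        while node is not None:
            levels.append(node.args)
            node = node.parent
        return ChainMap(*levels)

    def can_handle(self, state: Dict, args: ChainMap) -> bool:
        return True

    def _fit(self, state: Dict, args: ChainMap) -> None:
        pass  # overridden by concrete components

    def fit(self, state: Optional[Dict] = None) -> Dict:
        state = {} if state is None else state  # one dict shared by the whole hierarchy
        args = self._gather_args()
        if self.can_handle(state, args):
            self._fit(state, args)
            for child in self.children:
                child.fit(state)
        return state


class RowCount(MiniAnalysis):
    """Hypothetical component: records the number of rows it sees in train_data."""

    def can_handle(self, state, args):
        return "train_data" in args

    def _fit(self, state, args):
        state["row_count"] = len(args["train_data"])


root = MiniAnalysis(train_data=list(range(50)), children=[
    # The middle level shadows train_data, as a sampler would.
    MiniAnalysis(train_data=list(range(10)), children=[RowCount()]),
])
state = root.fit()
print(state)  # {'row_count': 10}
```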
Please note: state is shared across the whole analysis hierarchy. If two components write to the same space, each consecutive update overrides the previous one. If the same component has to be reused and needs to write different outputs, please use the Namespace wrapper to isolate the components.
- Parameters
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during the fit call
state (Optional[AnalysisState], default = None) – state object to perform check on; if not provided a new state will be created during the fit call
kwargs – arguments to pass into the component
- all_keys_must_be_present(state: AnalysisState, *keys) → bool
Checks if all the keys are present in the state
- Parameters
state (AnalysisState) – state object to perform check on
keys – list of the keys to check
- Returns
True if all the keys from the keys list are present in the state
- at_least_one_key_must_be_present(state: AnalysisState, *keys) → bool
Checks if at least one key is present in the state
- Parameters
state (AnalysisState) – state object to perform check on
keys – list of the keys to check
- Returns
True if at least one key from the keys list is present in the state
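The semantics of the two helpers can be illustrated over a plain dict. This is a sketch written as free functions, not the library's methods, and the state keys used below are made up for the example.

```python
from typing import Any, Mapping


def all_keys_must_be_present(state: Mapping[str, Any], *keys: str) -> bool:
    # True only when every requested key exists in the state
    return all(key in state for key in keys)


def at_least_one_key_must_be_present(state: Mapping[str, Any], *keys: str) -> bool:
    # True when any of the requested keys exists in the state
    return any(key in state for key in keys)


# Hypothetical state produced by earlier analyses.
state = {"dataset_stats": {"rows": 100}, "missing_statistics": {}}
print(all_keys_must_be_present(state, "dataset_stats", "missing_statistics"))  # True
print(all_keys_must_be_present(state, "dataset_stats", "raw_type"))            # False
print(at_least_one_key_must_be_present(state, "raw_type", "dataset_stats"))    # True
```

In this sketch, calling either helper with no keys follows Python's all()/any() conventions: the first returns True and the second returns False. The library's behaviour for that edge case is not stated here.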
- static available_datasets(args: AnalysisState) → Generator[Tuple[str, DataFrame], None, None]
Generator which iterates only through the datasets provided in arguments
- Parameters
args (AnalysisState) – arguments passed into the call. These differ from self.args in that they are assembled from the parents and shadowed via children (this allows isolating reused parameters in upper-level argument declarations).
- Returns
tuples of the dataset name (train_data, test_data or tuning_data) and the dataset itself
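A generator with this contract can be sketched as follows. The set of argument names checked is an assumption: it covers both tuning_data, mentioned above, and val_data, used in the introductory example. Plain lists stand in for DataFrames so that the sketch runs without pandas.

```python
from typing import Any, Dict, Generator, Tuple

# Assumed dataset argument names, checked in this order.
DATASET_ARGS = ("train_data", "test_data", "tuning_data", "val_data")


def available_datasets(args: Dict[str, Any]) -> Generator[Tuple[str, Any], None, None]:
    """Yield (name, dataset) pairs only for the datasets that were actually provided."""
    for name in DATASET_ARGS:
        dataset = args.get(name)
        if dataset is not None:
            yield name, dataset


args = {"train_data": [[1, 2], [3, 4]], "test_data": None, "label": "y"}
for name, dataset in available_datasets(args):
    print(name, len(dataset))  # train_data 2
```

Iterating this way lets an analysis apply the same logic to every supplied dataset without checking each argument by hand.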
- abstract can_handle(state: AnalysisState, args: AnalysisState) → bool
Checks if state and args have all the required parameters for fitting. See also the at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.
- Parameters
state (AnalysisState) – state to be updated by this fit function
args (AnalysisState) – analysis properties assembled from root of analysis hierarchy to this component (with lower levels shadowing upper level args).
- Returns
True if all the prerequisites for fitting are present
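Here is one way a can_handle implementation might combine the two helpers. SketchAnalysis is a hypothetical stand-in that reproduces only the parts of the base class needed here, and the "raw_type" state key is made up for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping


class SketchAnalysis(ABC):
    """Hypothetical stand-in for the base class; only the pieces used below."""

    def all_keys_must_be_present(self, state: Mapping[str, Any], *keys: str) -> bool:
        return all(k in state for k in keys)

    def at_least_one_key_must_be_present(self, state: Mapping[str, Any], *keys: str) -> bool:
        return any(k in state for k in keys)

    @abstractmethod
    def can_handle(self, state: Mapping[str, Any], args: Mapping[str, Any]) -> bool: ...


class NeedsTypesAnalysis(SketchAnalysis):
    def can_handle(self, state, args):
        # Requires a label, at least one dataset, and an earlier analysis output in state.
        return (
            self.all_keys_must_be_present(args, "label")
            and self.at_least_one_key_must_be_present(args, "train_data", "test_data")
            and self.all_keys_must_be_present(state, "raw_type")
        )


analysis = NeedsTypesAnalysis()
print(analysis.can_handle({"raw_type": {}}, {"label": "y", "train_data": [1]}))  # True
print(analysis.can_handle({}, {"label": "y", "train_data": [1]}))                # False
```

Keeping every prerequisite in can_handle means a component whose inputs are missing is skipped rather than failing inside _fit.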
BaseAnalysis
- class autogluon.eda.analysis.base.BaseAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)
Simple implementation of AbstractAnalysis.
- Parameters
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during the fit call
kwargs – arguments to pass into the component
Namespace
- class autogluon.eda.analysis.base.Namespace(namespace: Optional[str] = None, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)
Creates a nested namespace in state. All the components within children will have the relative root of the state moved into this subspace. To instruct visualization facets to use a specific subspace, please use the namespace argument (see the example).
- Parameters
namespace (Optional[str], default = None) – namespace to use; the root is used if not specified
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during the fit call
kwargs – arguments to pass into the component
Examples
```python
>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> auto.analyze(
...     train_data=..., label=...,
...     anlz_facets=[
...         # Puts output into the root namespace
...         eda.interaction.Correlation(),
...         # Puts output into the focus namespace
...         eda.Namespace(namespace='focus', children=[
...             eda.interaction.Correlation(focus_field='Fare', focus_field_threshold=0.3),
...         ])
...     ],
...     viz_facets=[
...         # Renders correlations from the root namespace
...         viz.interaction.CorrelationVisualization(),
...         # Renders correlations from the focus namespace
...         viz.interaction.CorrelationVisualization(namespace='focus'),
...     ]
... )
```
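Why Namespace is needed can be shown with a plain dict. This is a sketch under simplifying assumptions and not the library's implementation. sub_state and correlation_like are hypothetical helpers: a component that always writes to the same key overrides its earlier output unless each use is given its own subspace.

```python
from typing import Any, Dict, Optional


def sub_state(state: Dict[str, Any], namespace: Optional[str]) -> Dict[str, Any]:
    """Return the dict a component should write into, creating the namespace if needed."""
    if not namespace:
        return state  # no namespace: the component writes into the root
    return state.setdefault(namespace, {})


def correlation_like(target: Dict[str, Any], threshold: float) -> None:
    """Hypothetical component that always writes its result under the same key."""
    target["correlations"] = {"threshold": threshold}


# Without namespaces, the second write overrides the first one.
flat: Dict[str, Any] = {}
correlation_like(sub_state(flat, None), threshold=0.0)
correlation_like(sub_state(flat, None), threshold=0.3)
print(flat)  # {'correlations': {'threshold': 0.3}}

# With a namespace, both results survive side by side.
isolated: Dict[str, Any] = {}
correlation_like(sub_state(isolated, None), threshold=0.0)
correlation_like(sub_state(isolated, "focus"), threshold=0.3)
print(isolated)  # {'correlations': {'threshold': 0.0}, 'focus': {'correlations': {'threshold': 0.3}}}
```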
autogluon.eda.visualization.base
AbstractVisualization – Base class for visualization functionality.
AbstractVisualization
- class autogluon.eda.visualization.base.AbstractVisualization(namespace: Optional[str] = None, **kwargs)
Base class for visualization functionality.
Provides basic functionality for namespace management and helper methods for frequently used operations.
Specifying a namespace narrows the visibility scope to a specific subspace of the state. Namespaces can be specified in a nested form: ns_a.ns_b.ns_c. Please see the Namespace wrapper on how to create namespaces.
The main entry method of a visualization is the render function. When called, the execution flow is the following:
- narrow the state scope to the specified namespace;
- call the _render function for each component that returned True from its can_handle call.
- Parameters
namespace (Optional[str], default = None) – namespace to use; can be nested like ns_a.ns_b.ns_c
kwargs –
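Narrowing the state to a dotted namespace such as ns_a.ns_b.ns_c amounts to walking the nested dictionary one level per name. narrow_state is a hypothetical helper written for this illustration. Returning None when a level is missing is this sketch's own choice; the library's behaviour in that case is not described here.

```python
from typing import Any, Dict, Optional


def narrow_state(state: Dict[str, Any], namespace: Optional[str]) -> Optional[Dict[str, Any]]:
    """Follow a dotted namespace such as 'ns_a.ns_b.ns_c' down the nested state.

    Returns the state itself when no namespace is given, and None when any level
    of the path is missing (a choice made for this sketch only).
    """
    if not namespace:
        return state
    node: Any = state
    for part in namespace.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


state = {"ns_a": {"ns_b": {"ns_c": {"correlations": [0.3]}}}, "correlations": [0.9]}
print(narrow_state(state, "ns_a.ns_b.ns_c"))      # {'correlations': [0.3]}
print(narrow_state(state, None)["correlations"])  # [0.9]
print(narrow_state(state, "ns_a.missing"))        # None
```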
- all_keys_must_be_present(state: AnalysisState, *keys) → bool
Checks if all the keys are present in the state
- Parameters
state (AnalysisState) – state object to perform check on
keys – list of the keys to check
- Returns
True if all the keys from the keys list are present in the state
- at_least_one_key_must_be_present(state: AnalysisState, *keys) → bool
Checks if at least one key is present in the state
- Parameters
state (AnalysisState) – state object to perform check on
keys – list of the keys to check
- Returns
True if at least one key from the keys list is present in the state
- abstract can_handle(state: AnalysisState) → bool
Checks if state has all the required parameters for visualization. See also the at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.
- Parameters
state (AnalysisState) – fitted state
- Returns
True if all the prerequisites for rendering are present
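Rendering can be gated by can_handle so that a layout simply skips facets whose inputs are missing from the state. This is a minimal sketch under that assumption. SketchVisualization, KeyFacet and VerticalLayout are hypothetical classes (VerticalLayout loosely mirrors the role of SimpleVerticalLinearLayout), and the state keys are made up. Facets return strings here instead of drawing, so that the output is easy to inspect.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchVisualization(ABC):
    """Hypothetical, simplified facet: render() only calls _render() when can_handle() agrees."""

    @abstractmethod
    def can_handle(self, state: Dict[str, Any]) -> bool: ...

    @abstractmethod
    def _render(self, state: Dict[str, Any]) -> str: ...

    def render(self, state: Dict[str, Any]) -> List[str]:
        # Facets whose prerequisites are missing produce no output.
        return [self._render(state)] if self.can_handle(state) else []


class KeyFacet(SketchVisualization):
    """Renders a single state entry as text."""

    def __init__(self, key: str):
        self.key = key

    def can_handle(self, state):
        return self.key in state

    def _render(self, state):
        return f"{self.key}: {state[self.key]}"


class VerticalLayout:
    """Stand-in for a layout component: renders its facets top to bottom."""

    def __init__(self, facets: List[SketchVisualization]):
        self.facets = facets

    def render(self, state: Dict[str, Any]) -> List[str]:
        output: List[str] = []
        for facet in self.facets:
            output.extend(facet.render(state))
        return output


state = {"dataset_stats": "3 rows", "missing_statistics": "none"}
layout = VerticalLayout([KeyFacet("dataset_stats"), KeyFacet("feature_distance"), KeyFacet("missing_statistics")])
for line in layout.render(state):
    print(line)  # the feature_distance facet is skipped because its key is absent
```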