Reference: Base APIs#
This section highlights the base APIs used by the EDA framework. The processing consists of the following parts:
1. Analysis graph construction: in this part, a nested graph of analyses is constructed.
analysis = BaseAnalysis(
    # State
    state=state,
    # Arguments
    train_data=train_data, test_data=test_data, val_data=val_data, model=model, label=label,
    # Nested analyses
    children=[
        Sampler(sample=sample, children=[
            DatasetSummary(),
            MissingValuesAnalysis(),
            RawTypesAnalysis(),
            SpecialTypesAnalysis(),
            ApplyFeatureGenerator(category_to_numbers=True, children=[
                FeatureDistanceAnalysis()
            ]),
        ]),
    ],
)
2. .fit() call. This call executes the operations in the graph and produces a state. The state is a nested dictionary without any prescribed structure. All components share the same namespace; if multiple components are fitted with different parameters, they can be put into separate sub-spaces via the Namespace component. The resulting state can either be passed to the next analysis for further processing or be rendered.
state = analysis.fit()
3. Rendering: in this stage, we construct a graph of components (a combination of layout components and visual components) and then pass the state generated previously as an input argument to the render() call.
viz = SimpleVerticalLinearLayout(
    facets=[
        DatasetStatistics(headers=True),
        DatasetTypeMismatch(headers=True),
        MarkdownSectionComponent("### Feature Distance"),
        FeatureDistanceAnalysisVisualization(),
    ],
)
viz.render(state)
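The three stages above can be imitated with a small stand-in written in plain Python. Nothing in this sketch is AutoGluon code: ToyAnalysis, RowCount, ColumnNames and render_rows are hypothetical names, and the state is modelled as an ordinary dictionary. It only illustrates the flow of building a graph, fitting it into a shared state, and rendering that state.

```python
from typing import Any, Dict, List, Optional


class ToyAnalysis:
    """Hypothetical stand-in for an analysis node; not the AutoGluon class."""

    def __init__(self, children: Optional[List["ToyAnalysis"]] = None, **args: Any) -> None:
        self.children = children or []
        self.args = args

    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        """Overridden by concrete nodes to write results into the shared state."""

    def fit(self, state: Optional[Dict[str, Any]] = None,
            parent_args: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        state = {} if state is None else state
        # Arguments declared on this node are merged over those inherited from the parent.
        args = {**(parent_args or {}), **self.args}
        self._fit(state, args)
        for child in self.children:
            child.fit(state, args)
        return state


class RowCount(ToyAnalysis):
    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        state["row_count"] = len(args["train_data"])


class ColumnNames(ToyAnalysis):
    def _fit(self, state: Dict[str, Any], args: Dict[str, Any]) -> None:
        state["columns"] = sorted(args["train_data"][0])


def render_rows(state: Dict[str, Any]) -> List[str]:
    """Hypothetical 'visualization': turns state entries into text lines."""
    return [f"{key}: {state[key]}" for key in sorted(state)]


# 1. Build the graph, 2. fit it to obtain the state, 3. render the state.
train_data = [{"age": 31, "fare": 7.25}, {"age": 40, "fare": 71.28}]
analysis = ToyAnalysis(train_data=train_data, children=[RowCount(), ColumnNames()])
state = analysis.fit()
lines = render_rows(state)
```

Both toy children write into the same dictionary, which mirrors the statement that all components share one namespace unless they are explicitly separated.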
Please note: components may depend on each other's output; all the prerequisites for fitting a component must be checked in can_handle(). There are two ways components can share information: 1) via the state; 2) via shared values/shadowed arguments (e.g., the Sampler component modifies the train_data, test_data and val_data arguments within the scope of its children's fit() calls).
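Argument shadowing behaves much like nested scopes. The following is a rough analogy using collections.ChainMap from the standard library; the dictionary names and sample values are made up for illustration, and this is not necessarily how the library implements shadowing.

```python
from collections import ChainMap

# Arguments declared at the root of a hypothetical analysis graph.
root_args = {"train_data": list(range(1000)), "label": "target"}

# A sampling step shadows train_data for everything nested under it...
sampled_args = ChainMap({"train_data": root_args["train_data"][:100]}, root_args)

# ...while arguments it does not override are still resolved from the parent.
child_sees_rows = len(sampled_args["train_data"])
child_sees_label = sampled_args["label"]

# The root-level arguments are left untouched for components outside the sampler.
root_rows = len(root_args["train_data"])
```

The child scope sees the reduced dataset, while anything outside that scope keeps working with the original one.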
autogluon.eda.analysis.base#
Namespace: creates a nested namespace in state.
AbstractAnalysis#
- class autogluon.eda.analysis.base.AbstractAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, state: Optional[AnalysisState] = None, **kwargs)[source]#
- all_keys_must_be_present(state: AnalysisState, *keys) bool #
Checks if all the keys are present in the state
- Parameters
state (AnalysisState) – state object to perform check on
keys – list of the keys to check
- Return type
True if all the keys from the keys list are present in the state
- at_least_one_key_must_be_present(state: AnalysisState, *keys) bool #
Checks if at least one key is present in the state
- Parameters
state (AnalysisState) – state object to perform check on
keys – list of the keys to check
- Return type
True if at least one key from the keys list is present in the state
- static available_datasets(args: AnalysisState) Generator[Tuple[str, DataFrame], None, None] [source]#
Generator which iterates only through the datasets provided in arguments
- Parameters
args (AnalysisState) – arguments passed into the call. These differ from self.args in that they are assembled from the parents and shadowed by the children (this allows reused parameters to be isolated in upper-level argument declarations).
- Return type
tuple of dataset name (train_data, test_data or tuning_data) and dataset itself
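As a minimal sketch of what such a generator might look like, assuming args can be treated as a plain dictionary: the function name iter_available_datasets and the DATASET_KEYS tuple are hypothetical, the key names are taken from the return description above, and skipping None values is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Iterator, Tuple

# Assumed argument names, taken from the description above; the real list may differ.
DATASET_KEYS = ("train_data", "test_data", "tuning_data")


def iter_available_datasets(args: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (name, dataset) for each known dataset key that is present and not None."""
    for name in DATASET_KEYS:
        dataset = args.get(name)
        if dataset is not None:
            yield name, dataset


# Only train_data is actually provided; test_data is None and label is not a dataset.
args = {"train_data": [1, 2, 3], "test_data": None, "label": "target"}
found = list(iter_available_datasets(args))
```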
- abstract can_handle(state: AnalysisState, args: AnalysisState) bool [source]#
Checks if state and args have all the required parameters for fitting. See also the at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.
- Parameters
state (AnalysisState) – state to be updated by this fit function
args (AnalysisState) – analysis properties assembled from root of analysis hierarchy to this component (with lower levels shadowing upper level args).
- Return type
True if all the pre-requisites for fitting are present
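The key-checking helpers combine naturally inside can_handle(). The sketch below uses plain dictionaries and free functions in place of AnalysisState and the real methods; ToyDistanceAnalysis and the state keys raw_type and special_types are hypothetical names chosen only for illustration.

```python
from typing import Any, Dict


def all_keys_present(state: Dict[str, Any], *keys: str) -> bool:
    """Stand-in for all_keys_must_be_present(): every key must exist."""
    return all(key in state for key in keys)


def at_least_one_key_present(state: Dict[str, Any], *keys: str) -> bool:
    """Stand-in for at_least_one_key_must_be_present(): any one key suffices."""
    return any(key in state for key in keys)


class ToyDistanceAnalysis:
    """Hypothetical component; not an AutoGluon class."""

    def can_handle(self, state: Dict[str, Any], args: Dict[str, Any]) -> bool:
        # Requires both arguments, plus at least one upstream result in the state.
        return (
            all_keys_present(args, "train_data", "label")
            and at_least_one_key_present(state, "raw_type", "special_types")
        )


component = ToyDistanceAnalysis()
ready = component.can_handle({"raw_type": {}}, {"train_data": [], "label": "y"})
missing_label = component.can_handle({"raw_type": {}}, {"train_data": []})
missing_state = component.can_handle({}, {"train_data": [], "label": "y"})
```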
Namespace#
- class autogluon.eda.analysis.base.Namespace(namespace: Optional[str] = None, parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, **kwargs)[source]#
Creates a nested namespace in state. All the components within children will have relative root of the state moved into this subspace. To instruct visualization facets to use a specific subspace, please use namespace argument (see the example).
- Parameters
namespace (Optional[str], default = None) – namespace to use; use root if not specified
parent (Optional[AbstractAnalysis], default = None) – parent Analysis
children (Optional[List[AbstractAnalysis]], default = None) – wrapped analyses; these will receive sampled args during fit call
kwargs –
Examples
>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>>
>>> auto.analyze(
>>>     train_data=..., label=...,
>>>     anlz_facets=[
>>>         # Puts output into the root namespace
>>>         eda.interaction.Correlation(),
>>>         # Puts output into the focus namespace
>>>         eda.Namespace(namespace='focus', children=[
>>>             eda.interaction.Correlation(focus_field='Fare', focus_field_threshold=0.3),
>>>         ])
>>>     ],
>>>     viz_facets=[
>>>         # Renders correlations from the root namespace
>>>         viz.interaction.CorrelationVisualization(),
>>>         # Renders correlations from the focus namespace
>>>         viz.interaction.CorrelationVisualization(namespace='focus'),
>>>     ]
>>> )
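The effect of moving the relative root of the state into a subspace can be pictured with a nested dictionary. This is a conceptual sketch only: resolve_namespace is a hypothetical helper, the stored values are placeholders, and the actual state layout produced by the library may differ.

```python
from typing import Any, Dict, Optional


def resolve_namespace(state: Dict[str, Any], namespace: Optional[str]) -> Dict[str, Any]:
    """Return the sub-dictionary for `namespace` (created on demand), or the root if None."""
    if namespace is None:
        return state
    return state.setdefault(namespace, {})


state: Dict[str, Any] = {}

# An analysis without a namespace writes at the root of the state...
resolve_namespace(state, None)["correlations"] = "all fields"

# ...while the same analysis wrapped in namespace 'focus' writes into its own subspace.
resolve_namespace(state, "focus")["correlations"] = "fields correlated with Fare"
```

Because the two results live under different roots, they do not overwrite each other even though they use the same key.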
autogluon.eda.visualization.base#
AbstractVisualization#
- class autogluon.eda.visualization.base.AbstractVisualization(namespace: Optional[str] = None, **kwargs)[source]#
- all_keys_must_be_present(state: AnalysisState, *keys) bool #
Checks if all the keys are present in the state
- Parameters
state (AnalysisState) – state object to perform check on
keys – list of the keys to check
- Return type
True if all the keys from the keys list are present in the state
- at_least_one_key_must_be_present(state: AnalysisState, *keys) bool #
Checks if at least one key is present in the state
- Parameters
state (AnalysisState) – state object to perform check on
keys – list of the keys to check
- Return type
True if at least one key from the keys list is present in the state
- abstract can_handle(state: AnalysisState) bool [source]#
Checks if state has all the required parameters for visualization. See also the at_least_one_key_must_be_present() and all_keys_must_be_present() helpers to construct more complex logic.
- Parameters
state (AnalysisState) – fitted state
- Return type
True if all the pre-requisites for rendering are present
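On the visualization side, can_handle() acts as a guard before rendering. The sketch below is a plain-Python illustration under two assumptions that the text above does not confirm: that a facet reads from the sub-dictionary named by its namespace argument, and that nothing is rendered when the prerequisites are missing. ToyCorrelationView is a hypothetical class.

```python
from typing import Any, Dict, Optional


class ToyCorrelationView:
    """Hypothetical visualization facet; not an AutoGluon class."""

    def __init__(self, namespace: Optional[str] = None) -> None:
        self.namespace = namespace

    def can_handle(self, state: Dict[str, Any]) -> bool:
        # Rendering only makes sense once an analysis has stored its result.
        return "correlations" in state

    def render(self, state: Dict[str, Any]) -> Optional[str]:
        # Assumption: the facet looks at the sub-dictionary named by `namespace`.
        scoped = state if self.namespace is None else state.get(self.namespace, {})
        if not self.can_handle(scoped):
            return None  # Assumption: unmet prerequisites mean nothing is drawn.
        return f"correlations: {scoped['correlations']}"


# Only the 'focus' subspace holds a result, so only the namespaced facet renders.
state = {"focus": {"correlations": {"Fare": 0.55}}}
root_output = ToyCorrelationView().render(state)
focus_output = ToyCorrelationView(namespace="focus").render(state)
```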