Components: missing

autogluon.eda.visualization.missing

MissingValues

Renders a visualization of missingness for datasets using one of the methods specified in `graph_type`.

MissingValues

class autogluon.eda.visualization.missing.MissingValues(graph_type: str = 'matrix', headers: bool = False, namespace: Optional[str] = None, **kwargs)

Renders a visualization of missingness for datasets using one of the methods specified in `graph_type`.

This visualization depends on the MissingValuesAnalysis analysis.

See also missingno documentation

Parameters

graph_type (str, default = 'matrix') – one of the following visualization types:
- matrix - the nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion visually. This visualization comfortably accommodates up to 50 labelled variables. Past that range, labels begin to overlap or become unreadable, and by default large displays omit them.
- bar - visualizes how many rows are non-null vs. null in each column. A logarithmic scale can be enabled by specifying log=True in kwargs.
- heatmap - the correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another. Nullity correlation ranges from -1 (if one variable appears, the other definitely does not) to 0 (variables appearing or not appearing have no effect on one another) to 1 (if one variable appears, the other definitely also does). Entries marked <1 or >-1 have a correlation that is close to being exactly positive or negative, respectively, but is not quite perfectly so.
- dendrogram - the dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap. The dendrogram uses a hierarchical clustering algorithm (courtesy of scipy) to bin variables against one another by their nullity correlation (measured in terms of binary distance). At each step of the tree, the variables are split up based on which combination minimizes the distance of the remaining clusters. The more monotone the set of variables, the closer their total distance is to zero, and the closer their average distance (the y-axis) is to zero.
headers (bool, default = False) – if True then render headers
namespace (str, default = None) – namespace to use; can be nested like ns_a.ns_b.ns_c
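The nullity correlation described for the heatmap option can be illustrated with a small, library-free sketch. This is not the code used by MissingValues or missingno; it only assumes that nullity correlation is the Pearson correlation between per-column presence indicators (1 if a value is present, 0 if it is missing). The function name `nullity_correlation` and the sample rows are hypothetical.

```python
from math import sqrt


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    `rows` is a list of dicts; a value of None is treated as missing.
    Returns None when either column is always present or always missing,
    because the correlation is undefined without variation.
    """
    # 1.0 where the value is present, 0.0 where it is missing
    a = [0.0 if row[col_a] is None else 1.0 for row in rows]
    b = [0.0 if row[col_b] is None else 1.0 for row in rows]

    n = len(rows)
    mean_a = sum(a) / n
    mean_b = sum(b) / n

    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)

    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Hypothetical sample data: "age" and "income" are missing together,
# while "notes" is filled in only when they are missing.
rows = [
    {"id": 1, "age": 31, "income": 50_000, "notes": None},
    {"id": 2, "age": None, "income": None, "notes": "call back"},
    {"id": 3, "age": 45, "income": 72_000, "notes": None},
    {"id": 4, "age": None, "income": None, "notes": "no answer"},
]

together = nullity_correlation(rows, "age", "income")   # present/missing together
opposite = nullity_correlation(rows, "age", "notes")    # one appears when the other does not
undefined = nullity_correlation(rows, "id", "age")      # "id" is never missing
```

In this sample, `together` sits at the positive end of the range, `opposite` at the negative end, and `undefined` is None because a column with no missing values carries no nullity signal.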

autogluon.eda.analysis.missing

MissingValuesAnalysis

Analyze dataset's missing value counts and frequencies

MissingValuesAnalysis

class autogluon.eda.analysis.missing.MissingValuesAnalysis(parent: Optional[AbstractAnalysis] = None, children: Optional[List[AbstractAnalysis]] = None, state: Optional[AnalysisState] = None, **kwargs)

Analyze dataset’s missing value counts and frequencies

Examples

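>>> import autogluon.eda.analysis as eda
>>> import autogluon.eda.visualization as viz
>>> import autogluon.eda.auto as auto
>>> # train_data and label are left as placeholders here
>>> state = auto.analyze(
...     train_data=..., label=...,
...     anlz_facets=[
...         # collect missing value counts and frequencies
...         eda.missing.MissingValuesAnalysis(),
...     ],
...     viz_facets=[
...         viz.dataset.DatasetStatistics(),
...         # render missingness using the default graph_type='matrix'
...         viz.missing.MissingValues(),
...     ],
... )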
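
To make "missing value counts and frequencies" concrete, here is a minimal, library-free sketch of the kind of per-column summary such an analysis might compute. It is not the implementation of MissingValuesAnalysis, and it says nothing about how the library stores its results. The function name `missing_value_stats`, the choice to report only columns with at least one missing value, and the sample rows are all assumptions made for illustration.

```python
def missing_value_stats(rows):
    """Return {column: (missing_count, missing_ratio)} for a list of dict rows.

    A value of None is treated as missing. Columns without any missing
    values are left out of the result (an assumption of this sketch).
    """
    if not rows:
        return {}

    n = len(rows)
    stats = {}
    for col in rows[0].keys():
        count = sum(1 for row in rows if row[col] is None)
        if count:
            stats[col] = (count, count / n)
    return stats


# Hypothetical sample data
rows = [
    {"age": 31, "city": "Oslo", "income": None},
    {"age": None, "city": "Lima", "income": None},
    {"age": 45, "city": "Pune", "income": 72_000},
    {"age": 28, "city": "Kyiv", "income": None},
]

stats = missing_value_stats(rows)
for column, (count, ratio) in stats.items():
    print(f"{column}: {count} missing ({ratio:.0%} of rows)")
```

The count is the number of rows where the column is missing, and the ratio divides that count by the total number of rows.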