Feature Interaction Charting
This tool provides quick visualization of interactions between variables in a dataset. You specify the variables to plot via the x, y and hue (color) parameters. The tool picks the chart type automatically, based on the detected variable types, and can render one-, two- and three-way interactions.
This can be useful for exploring patterns, trends and outliers, and for identifying potentially good predictors for the task.
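To make the idea of type-based chart selection concrete, here is a minimal sketch of how such a choice could be made. It is an illustration only, not AutoGluon's actual logic: the helpers `guess_variable_kind` and `suggest_chart`, the `max_categories` threshold and the chart mapping are all assumptions introduced for this example.

```python
from typing import Optional

import pandas as pd


def guess_variable_kind(series: pd.Series, max_categories: int = 10) -> str:
    """Hypothetical helper: label a column as 'numeric' or 'category'."""
    is_numeric = pd.api.types.is_numeric_dtype(series) and not pd.api.types.is_bool_dtype(series)
    # Numeric columns with only a few distinct values (e.g. Pclass) are treated as categories
    if is_numeric and series.nunique() > max_categories:
        return "numeric"
    return "category"


def suggest_chart(x_kind: str, y_kind: Optional[str] = None) -> str:
    """Hypothetical mapping from variable kinds to a chart family."""
    charts = {
        ("numeric", None): "histogram",
        ("category", None): "count plot",
        ("numeric", "numeric"): "scatter plot",
        ("category", "numeric"): "box plot",
        ("numeric", "category"): "box plot",
        ("category", "category"): "bar plot",
    }
    return charts[(x_kind, y_kind)]


# Tiny made-up frame, used only to exercise the helpers
sample = pd.DataFrame({
    "Fare": [float(i) for i in range(50)],
    "Pclass": [1, 2, 3, 1, 2] * 10,
})
x_kind = guess_variable_kind(sample["Pclass"])  # 'category'
y_kind = guess_variable_kind(sample["Fare"])    # 'numeric'
print(suggest_chart(x_kind, y_kind))            # box plot
```

In a scheme like this, hue would not change the chart family; it would only add a color grouping on top of the chosen chart.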
Using Interaction Charts for Missing Values Filling
Let’s load the Titanic dataset:
import pandas as pd
df_train = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/titanic/train.csv')
df_test = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/titanic/test.csv')
target_col = 'Survived'
Next we will look at missing data in the variables:
import autogluon.eda.auto as auto
auto.missing_values_analysis(train_data=df_train)
Missing Values Analysis
|  | missing_count | missing_ratio |
|---|---|---|
| Age | 177 | 0.198653 |
| Cabin | 687 | 0.771044 |
| Embarked | 2 | 0.002245 |
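For reference, the same two columns can be reproduced with plain pandas. The sketch below assumes that `missing_ratio` is the missing count divided by the number of rows, which matches the figures in the table (177 / 891 ≈ 0.1987). The helper name `missing_values_table` and the sample frame are made up for illustration.

```python
import pandas as pd


def missing_values_table(df: pd.DataFrame) -> pd.DataFrame:
    """Count and ratio of missing values per column, omitting complete columns."""
    summary = pd.DataFrame({
        "missing_count": df.isna().sum(),
        "missing_ratio": df.isna().mean(),  # mean of a boolean mask = share of missing rows
    })
    return summary[summary["missing_count"] > 0]


# Tiny made-up frame, used only to show the output shape
sample = pd.DataFrame({
    "Age": [22.0, None, 35.0, None],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 80.0, 8.05],
})
table = missing_values_table(sample)
print(table)
```

Calling `missing_values_table(df_train)` on the training frame loaded above should give a comparable summary.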

It looks like there are only two null values in the Embarked feature. Let’s see which rows they are:
df_train[df_train.Embarked.isna()]
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | 62 | 1 | 1 | Icard, Miss. Amelie | female | 38.0 | 0 | 0 | 113572 | 80.0 | B28 | NaN |
| 829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0 | B28 | NaN |
We may be able to fill these by looking at the other variables. Both passengers paid a Fare of $80, are of Pclass 1 and are female. Let’s see how Fare is distributed across all Pclass and Embarked values:
auto.analyze_interaction(train_data=df_train, x='Embarked', y='Fare', hue='Pclass')

The average Fare closest to $80 is found among passengers with Embarked value C and Pclass 1. Let’s fill in the missing values with C.
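The reasoning above can be sketched in pandas: compare the mean Fare of each (Embarked, Pclass) group, then fill the gaps with the chosen port. The sample frame is made up for illustration and only reuses the Titanic column names; on the real data the same calls would be applied to `df_train`.

```python
import pandas as pd

# Tiny made-up sample with the same column names as the Titanic data
sample = pd.DataFrame({
    "Pclass": [1, 1, 1, 1, 3, 3, 1],
    "Fare": [85.0, 75.0, 60.0, 50.0, 8.0, 7.0, 80.0],
    "Embarked": ["C", "C", "S", "S", "S", "Q", None],
})

# Mean fare per (Embarked, Pclass) group; rows with a missing Embarked are skipped by groupby
mean_fare = sample.groupby(["Embarked", "Pclass"])["Fare"].mean()
print(mean_fare)

# Fill the missing port with the value chosen from the comparison
filled = sample.copy()
filled["Embarked"] = filled["Embarked"].fillna("C")
```

Working on a copy keeps the original frame intact, so the imputation can be revisited later.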
Using Interaction Charts To Learn Information About the Data
auto.analyze_interaction(x='Pclass', y='Survived', train_data=df_train, test_data=df_test)
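As a numeric cross-check on what the chart shows, the survival rate per class can also be computed directly with pandas. This is a minimal sketch: the helper name `survival_rate_by` and the sample frame are made up for illustration, and it assumes `Survived` is coded as 0/1 so that its mean is a rate.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str, target: str = "Survived") -> pd.Series:
    """Mean of a 0/1 target within each value of `column`."""
    return df.groupby(column)[target].mean()


# Tiny made-up frame, used only to show the output shape
sample = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 1, 0],
})
rates = survival_rate_by(sample, "Pclass")
print(rates)
```

On the real data, `survival_rate_by(df_train, 'Pclass')` gives the exact rates behind the chart.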

It looks like 63% of first class passengers survived, while 48% of second class and only 24% of third class passengers survived. Similar information is visible via the Fare variable:
auto.analyze_interaction(x='Fare', hue='Survived', train_data=df_train, test_data=df_test, chart_args=dict(fill=True))

auto.analyze_interaction(x='Age', hue='Survived', train_data=df_train, test_data=df_test)

The far left of the distribution on this chart possibly hints that children and infants were given priority.
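One way to probe this hint numerically is to bin Age and compare survival rates per band. The sketch below is an assumption-laden illustration: the band edges, the helper name `survival_rate_by_age_band` and the sample frame are all made up, and rows with a missing Age are simply left out.

```python
import pandas as pd


def survival_rate_by_age_band(df: pd.DataFrame, bins=(0, 12, 18, 40, 60, 120)) -> pd.Series:
    """Survival rate per age band; bands are left-closed, e.g. [0, 12)."""
    bands = pd.cut(df["Age"], bins=list(bins), right=False)
    # observed=True keeps only bands that actually occur; rows with missing Age are dropped
    return df.groupby(bands, observed=True)["Survived"].mean()


# Tiny made-up frame, used only to show the output shape
sample = pd.DataFrame({
    "Age": [2.0, 5.0, 30.0, 35.0, None, 70.0],
    "Survived": [1, 1, 0, 1, 1, 0],
})
age_rates = survival_rate_by_age_band(sample)
print(age_rates)
```

With 177 missing Age values in the training data, any rate computed this way covers only the passengers whose age is known.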
auto.analyze_interaction(x='Fare', y='Age', hue='Survived', train_data=df_train, test_data=df_test)

This chart highlights three outliers with a Fare of over $500. Let’s take a look at these:
df_train[df_train.Fare > 400]
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 258 | 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C |
| 679 | 680 | 1 | 1 | Cardeza, Mr. Thomas Drake Martinez | male | 36.0 | 0 | 1 | PC 17755 | 512.3292 | B51 B53 B55 | C |
| 737 | 738 | 1 | 1 | Lesurer, Mr. Gustave J | male | 35.0 | 0 | 0 | PC 17755 | 512.3292 | B101 | C |
As you can see, all of these passengers share the same ticket, so the Fare is most likely the price of the whole ticket; the per-person fare would be that value divided by the number of passengers travelling on it. This suggests adding a fare-per-person feature to the dataset, and the group size also lets us see whether some passengers travelled in larger groups. Let’s create these two new features and take a look at the Fare-Age relationship once again.
# Count every row per ticket; counting the Embarked column would skip rows where Embarked is missing
ticket_to_count = df_train.groupby(by='Ticket').size().to_dict()
data = df_train.copy()
data['GroupSize'] = data.Ticket.map(ticket_to_count)
data['FarePerPerson'] = data.Fare / data.GroupSize
auto.analyze_interaction(x='FarePerPerson', y='Age', hue='Survived', train_data=data)
auto.analyze_interaction(x='FarePerPerson', y='Age', hue='Pclass', train_data=data)


You can see cleaner separation between Fare, Pclass and Survived now.
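The group size above is derived from the training rows only. A hedged variation, assuming that passengers on the same ticket may also appear in the test split, counts tickets across both frames before dividing the fare. The helper name `add_group_features` and the sample frames are made up for illustration, and whether to use test rows for feature engineering is a judgment call for your own workflow.

```python
import pandas as pd


def add_group_features(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Add GroupSize and FarePerPerson to a copy of `train`, counting tickets in both frames."""
    ticket_counts = pd.concat([train["Ticket"], test["Ticket"]]).value_counts()
    out = train.copy()
    out["GroupSize"] = out["Ticket"].map(ticket_counts)
    # Assumes Fare is the price of the whole ticket, as the outlier rows suggest
    out["FarePerPerson"] = out["Fare"] / out["GroupSize"]
    return out


# Tiny made-up frames, used only to show the behaviour
train_sample = pd.DataFrame({"Ticket": ["A", "A", "B"], "Fare": [30.0, 30.0, 10.0]})
test_sample = pd.DataFrame({"Ticket": ["A"], "Fare": [30.0]})
features = add_group_features(train_sample, test_sample)
print(features)
```

On the real data this would be called as `add_group_features(df_train, df_test)`, and the resulting frame can be passed to `auto.analyze_interaction` in place of `data`.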