Missing Data: MCAR, MAR, MNAR

Thiruthuvaraj Rajasekhar
2 min read · Jan 27, 2021


In data science problems, treating missing values can be very tedious if we don’t first understand what the missing values are telling us. Whenever we see missing values, the first thing that comes to mind is whether to impute with the mean or median, or to use SimpleImputer, IterativeImputer from sklearn, or some other built-in method.

In this post, we do not focus on the built-in methods; instead, we analyze the missing values themselves. Data scientists often use three terms for this: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR).

I will take a simple example to explain these concepts:

Here I have a simple dataset with columns A, B, C, and D, with some missing values marked NA.

If we take the first two columns, we can see a linear relationship between columns A and B. Since the missingness in B depends on the observed (non-missing) values in A, this type of missing data is called Missing at Random (MAR).

In column D, let’s assume a value is NA because the customer doesn’t want to disclose their telephone information. Here the missingness depends on the unobserved value itself (or on other information we did not record), so it is difficult to recover the values for such customers from the rest of the data. This is termed Missing Not at Random (MNAR). These missing values are the hardest to fill.

Missing Completely at Random (MCAR) is the case where we fail to record information because of some unexpected mistake, for example a data-entry slip while recording a response. The missingness is unrelated to any values, observed or unobserved. This is a special, more restrictive case of MAR.
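To make the three mechanisms concrete, here is a minimal simulation sketch in plain Python. The columns, probabilities, and thresholds are invented for illustration and are not the dataset from the example above; the only thing carried over is the assumption that B is roughly linear in A.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 10_000
# Hypothetical columns: B is roughly linear in A.
A = [random.uniform(0, 10) for _ in range(n)]
B = [2 * a + random.gauss(0, 1) for a in A]


def mask_mcar(p=0.3):
    """MCAR: every value of B is dropped with the same probability p."""
    return [random.random() < p for _ in range(n)]


def mask_mar(threshold=5.0, p_high=0.5, p_low=0.1):
    """MAR: B is dropped more often when the *observed* A is large."""
    return [random.random() < (p_high if a > threshold else p_low) for a in A]


def mask_mnar(threshold=10.0, p_high=0.5, p_low=0.1):
    """MNAR: B is dropped more often when B *itself* is large."""
    return [random.random() < (p_high if b > threshold else p_low) for b in B]


def mean(values):
    return sum(values) / len(values)


def observed_mean_of_b(mask):
    """Mean of B over the rows that were not dropped."""
    return mean([b for b, dropped in zip(B, mask) if not dropped])


full_mean = mean(B)
results = {
    "MCAR": observed_mean_of_b(mask_mcar()),
    "MAR": observed_mean_of_b(mask_mar()),
    "MNAR": observed_mean_of_b(mask_mnar()),
}

print(f"mean of B with no missing values: {full_mean:.2f}")
for name, value in results.items():
    print(f"{name}: mean of observed B = {value:.2f}")
```

In this setup the observed mean of B stays close to the full mean under MCAR, while under MAR and MNAR it is pulled downward because large values are dropped more often. The difference between the last two is where the information lives: under MAR the driver (A) is still in the data, under MNAR it is not.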

MAR can be imputed with methods such as IterativeImputer when the columns are related. MCAR can be addressed by creating a new class called “Missing”. MNAR is a much more complicated case; it requires more samples to analyze before arriving at a rule-based conclusion for imputing.
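As a rough sketch of the first two suggestions, the snippet below fills a MAR-style gap with sklearn's IterativeImputer and compares it with plain mean imputation, then adds a “Missing” class to a small categorical column. The data, the missingness probabilities, and the column contents are all hypothetical; note that IterativeImputer is still marked experimental in sklearn and needs the explicit enabling import.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the IterativeImputer import)
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)

# Hypothetical numeric columns: B is roughly linear in A.
n = 500
A = rng.uniform(0, 10, size=n)
B_true = 2 * A + rng.normal(0, 1, size=n)

# MAR-style mask: B is more likely to be missing when the observed A is large.
missing = rng.random(n) < np.where(A > 5, 0.5, 0.1)
B_observed = np.where(missing, np.nan, B_true)
X = np.column_stack([A, B_observed])

# Baseline: replace every missing B with the mean of the observed B values.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# IterativeImputer models each column from the others (BayesianRidge by default),
# so it can use the A-B relationship when filling B.
X_iter = IterativeImputer(random_state=0).fit_transform(X)


def rmse(filled):
    """Root mean squared error on the rows where B was masked."""
    return float(np.sqrt(np.mean((filled[missing, 1] - B_true[missing]) ** 2)))


rmse_mean = rmse(X_mean)
rmse_iter = rmse(X_iter)
print(f"RMSE, mean imputation:      {rmse_mean:.2f}")
print(f"RMSE, iterative imputation: {rmse_iter:.2f}")

# Hypothetical categorical column: keep missingness as its own class.
D = ["yes", None, "no", None, "yes"]
D_filled = ["Missing" if value is None else value for value in D]
print(D_filled)
```

The comparison is only possible here because the true values were simulated before masking; on real data the masked values are unknown, so the imputation has to be judged indirectly, for example by its effect on a downstream model.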

