You are given some data by a collaborator and asked to build

You are given some data by a collaborator and asked to build a two-class classifier with n = 1000 observations and p = 500 features, to predict the risk of a customer defaulting on a loan. Unfortunately about 25% of the features are missing at random (and not the same 25% each time). The result is that nearly every observation has some missing features. How would you deal with this? Suppose you later learn that some of the features, like monthly income, are not missing at random but are more likely to be missing because the mortgage company has lost track of the customer. How would you deal with this issue?

Solution

In this case a large amount of the data is missing (about 25%), so first try to recover the missing values from the original source if possible.

We cannot simply delete every row that contains a missing value: nearly every observation has some missing features, so we would lose almost all of the data.
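To see how much data row deletion would cost here, below is a minimal sketch on simulated data. The matrix X and the assumption that each cell goes missing independently with probability 0.25 are made up for illustration; they are not the collaborator's actual data.

```r
set.seed(1)
n <- 1000  # observations, as in the question
p <- 500   # features, as in the question

# Simulated stand-in for the real data (assumption: plain Gaussian noise)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Knock out about 25% of the cells completely at random
X[matrix(runif(n * p) < 0.25, nrow = n)] <- NA

# Rows that would survive deletion of every row with a missing value
n_complete <- sum(complete.cases(X))

# Chance that a single row has all p features observed
p_complete <- 0.75^p

print(n_complete)
print(p_complete)
```

Under these assumptions the chance that any one row is fully observed is 0.75^500, which is effectively zero, so deleting incomplete rows would leave nothing to train on.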

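We could also replace each missing value with the mean or median of its feature, but this is not recommended: it shrinks the variance of the feature and weakens its relationship with the other features, so it is not a good way to handle this situation.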
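The drawback of mean imputation can be checked with a small sketch. The vector x is simulated and the numbers (mean 50, sd 10, 250 missing values) are arbitrary assumptions chosen only to mirror the 25% missing rate.

```r
set.seed(2)

# Simulated feature (assumption: normal with mean 50 and sd 10)
x <- rnorm(1000, mean = 50, sd = 10)

# Remove 25% of the values at random
x_miss <- x
x_miss[sample(1000, 250)] <- NA

# Fill every missing value with the mean of the observed values
x_imp <- ifelse(is.na(x_miss), mean(x_miss, na.rm = TRUE), x_miss)

sd_observed <- sd(x_miss, na.rm = TRUE)  # spread of the observed values
sd_imputed  <- sd(x_imp)                 # spread after mean imputation

print(c(observed = sd_observed, imputed = sd_imputed))
```

The imputed values all sit exactly at the mean, so they add no spread while increasing the sample size, and the standard deviation after imputation comes out smaller than that of the observed values.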

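There is a package in R called mice (Multivariate Imputation by Chained Equations) that fills in each missing value using the other features.

Install the mice package and run the following code: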

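install.packages("mice")  # one-time install
library("mice")

# DataName is your data frame; m = 5 creates five imputed datasets
x1 <- mice(DataName, m = 5, seed = 100)
x2 <- complete(x1)  # returns the first completed dataset by default
View(x2)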

Your missing values are now filled in; x2 is the completed dataset.
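For the second part of the question, where monthly income is missing because the company lost track of the customer, the fact that a value is missing may itself say something about default risk. One common option, sketched here under stated assumptions rather than as a definitive answer, is to keep a 0/1 missingness indicator as an extra feature before imputing. The data frame loans, its column names, and the missingness rates below are all hypothetical.

```r
set.seed(3)
n <- 1000

# Hypothetical toy data: column names and distributions are made up
loans <- data.frame(
  monthly_income = rlnorm(n, meanlog = 8, sdlog = 0.5),
  default        = rbinom(n, 1, 0.2)
)

# Assumption: income is more likely to be missing for defaulters,
# i.e. it is not missing at random
p_miss <- ifelse(loans$default == 1, 0.6, 0.1)
loans$monthly_income[runif(n) < p_miss] <- NA

# Record the missingness as its own feature before any imputation
loans$income_missing <- as.integer(is.na(loans$monthly_income))

# Default rate among customers with observed ("0") and missing ("1") income
default_rate_by_missing <- tapply(loans$default, loans$income_missing, mean)
print(default_rate_by_missing)
```

In this toy setup the default rate is clearly higher when income is missing, so the indicator carries signal that imputation alone would erase. The income values can still be imputed afterwards while the indicator column is kept as a predictor.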

If you have any doubts about this, please comment.
