I'm getting an error when I run my code. There is no missing data in my data frame; I checked with is.na().
library(dplyr)  # for %>% and mutate()
library(class)  # for knn()
CI <- read.csv("Census_income.csv")
CI <- CI %>%
    mutate(Sex_Indicator = ifelse(`Sex` == " Male", 1, 0))
set.seed(1)
groups = c(rep(1, 20000), rep(2, 12577))
random_groups = sample(groups, 32577)
in_train1 = (random_groups == 1)
quant_train_std = scale(CI[in_train1, c(1, 5)])
quant_test_std = scale(CI[!in_train1, c(1, 5)],
    center = attr(quant_train_std, "scaled:center"),
    scale = attr(quant_train_std, "scaled:scale"))
x_train = cbind(CI$Sex_Indicator[in_train1],
    quant_train_std)
x_test = cbind(CI$Sex_Indicator[!in_train1],
    quant_test_std)
predictions = knn(train = x_train,
    test = x_test,
    cl = CI[in_train1, 15],
    k = 25)
Error that I get when running the predictions variable:
Error in knn(train = x_train, test = x_test, cl = CI[in_train1, 15], k = 25) : no missing values are allowed
Link to csv file:
https://docs.google.com/spreadsheets/d/1N68aU812YqZZdksKocsPdOLOdtcU7YGIU_ydKm8uKow/edit?usp=sharing
Answer:
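You've got NA values in quant_train_std, not in CI itself, which is why is.na() on the data frame found nothing. Checking the scaled matrix for incomplete rows shows them:
which(complete.cases(quant_train_std) == FALSE)
[1] 19991 19992 19993 19994 19995 19996 19997 19998 19999 20000
knn() refuses any NA in train, so these ten rows are enough to trigger the error.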
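A likely cause, assuming Census_income.csv has fewer than the 32577 rows hard-coded in groups (the file is not read here, so this is an assumption): a logical index that is longer than the data frame returns an all-NA row for every out-of-range TRUE. The sketch below reproduces that on a small made-up data frame (toy, too_long and n_train are hypothetical names, not from the question) and shows one way to size the split from nrow() instead.

```r
# Toy stand-in for CI (hypothetical data; the real CSV is not read here).
toy <- data.frame(age = c(25, 40, 31, 58, 47),
                  hours = c(40, 50, 38, 45, 60))

# A logical index longer than the data frame, as happens when the
# hard-coded group sizes add up to more than nrow(CI).
too_long <- c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
bad <- toy[too_long, ]
print(bad)                          # the last row is all NA
print(which(!complete.cases(bad)))  # position of the NA row

# Sizing the split from nrow() keeps the index in bounds.
set.seed(1)
n <- nrow(toy)
n_train <- 3                        # hypothetical training-set size
in_train <- sample(rep(c(TRUE, FALSE), c(n_train, n - n_train)))
good <- toy[in_train, ]
print(all(complete.cases(good)))    # no NA rows introduced by indexing
```

In the original code, comparing nrow(CI) with length(random_groups) would confirm whether this is what happens. If they differ, building groups from nrow(CI) rather than fixed counts should remove the NA rows from quant_train_std.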
