Can someone explain why split is compared to FALSE for the test set?
library(caTools)  # provides sample.split()
split = sample.split(dataset$Salary, SplitRatio = 2/3)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
Answer:
I assume you got this code from the caTools documentation or a tutorial based on it? Try running the first line on its own and inspecting the result; it should start to make sense.
What caTools::sample.split does is return a random logical vector with one element per element of its first argument (so one per row of the dataset), containing TRUE and FALSE in roughly the given ratio. Let's take the iris dataset as an example (it has 150 rows):
split = sample.split(iris$Sepal.Length, SplitRatio = 2/3)
The result is a 150-element logical vector with roughly 2/3 TRUE and 1/3 FALSE.
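To see this for yourself, a minimal sketch along these lines should work, assuming the caTools package is installed. The set.seed call and its value are my own addition so the run is repeatable; they are not part of the original code.

```r
library(caTools)  # provides sample.split()

set.seed(123)     # assumption: arbitrary seed, only for repeatability
split <- sample.split(iris$Sepal.Length, SplitRatio = 2/3)

class(split)      # "logical"
length(split)     # 150, one flag per row of iris
head(split)       # the first few TRUE/FALSE flags
table(split)      # counts of FALSE and TRUE, roughly one third and two thirds
```

The exact counts can be slightly off from 50 and 100, because sample.split tries to keep the ratio within each distinct value of the vector you pass in and rounds per value.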
Next, the subset function extracts every row i of iris where split[i] == TRUE to create the training set, and every row i where split[i] == FALSE to create the test set.
That is why the training set uses split == TRUE and the test set uses split == FALSE: the FALSE entries mark exactly the rows that were not picked for training, so the two sets never overlap.
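Here is a short sketch of the whole thing on iris, again assuming caTools is installed and using an arbitrary seed of my own choosing. It also checks that the two subsets together cover every row exactly once.

```r
library(caTools)  # provides sample.split()

set.seed(123)     # assumption: arbitrary seed, only for repeatability
split <- sample.split(iris$Sepal.Length, SplitRatio = 2/3)

# Rows flagged TRUE go to training, rows flagged FALSE go to test
training_set <- subset(iris, split == TRUE)
test_set     <- subset(iris, split == FALSE)

# Every row lands in exactly one of the two sets
nrow(training_set) + nrow(test_set) == nrow(iris)            # TRUE
length(intersect(rownames(training_set), rownames(test_set))) # 0

# Plain logical indexing picks the same rows as subset()
identical(rownames(training_set), rownames(iris[split, ]))   # TRUE
identical(rownames(test_set), rownames(iris[!split, ]))      # TRUE
```

Since split is already logical, split == TRUE is the same as split and split == FALSE is the same as !split; the explicit comparison is just easier to read for some people.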
