I'm trying to write a very basic function that removes any rows from a data frame where a specific column is 0 or less. When I run these lines outside of the function they work, but when I run them inside the function I get this error: "Error in $<-.data.frame(*tmp*, name, value = numeric(0)) : replacement has 0 rows". Does anyone know what the problem is?
Remove_Missing=function(x,name){
x$name=as.numeric(x$name)
x=x[x$name>0,]
}
EDIT:
Example Code:
#First two lines work but those same two lines won't work if function is called
merged_data$name=as.numeric(merged_data$HETENURE)
merged_data=merged_data[merged_data$HETENURE>0,]
Remove_Missing(merged_data, HETENURE) #Call function
Data
structure(list(HRHHID = c("008906910993941", "008906910993941",
"648061954059610", "160916068405549", "160916068405549", "168069009100998"
), HRYEAR4 = c("2010", "2010", "2010", "2010", "2010", "2010"
), HETENURE = c(" 1", " 1", " 3", " 1", " 1", " 1"), HEFAMINC = c("11",
"11", "10", "13", "13", "14"), HRNUMHOU = c(" 2", " 2", " 1",
" 2", " 2", " 3"), GESTFIPS = c("01", "01", "01", "01", "01",
"01"), GTMETSTA = c("2", "2", "1", "1", "1", "1"), PEMARITL = c(" 1",
" 1", " 4", " 1", " 1", " 1"), PESEX = c(" 2", " 1", " 1", " 2",
" 1", " 2"), PEEDUCA = c("40", "45", "40", "42", "41", "39"),
PTDTRACE = c(" 1", " 1", " 1", " 1", " 1", " 1"), PEHSPNON = c(" 2",
" 2", " 2", " 2", " 2", " 2"), PEMLR = c(" 5", " 5", " 5",
" 1", " 1", " 7"), PRFTLF = c("-1", "-1", "-1", " 1", " 1",
"-1"), PRHRUSL = c("-1", "-1", "-1", " 4", " 4", "-1"), HESP1 = c("-1",
"-1", "-1", "-1", "-1", "-1"), HESP6 = c("-1", "-1", "-1",
"-1", "-1", "-1"), HESP7A = c("-1", "-1", "-1", "-1", "-1",
"-1"), HESP8 = c("-1", "-1", "-1", "-1", "-1", "-1"), HRFS12M1 = c(" 1", " 1", " 1", " 1", " 1", " 1")), row.names = c(9L, 10L, 11L,
12L, 13L, 15L), class = "data.frame")
Answer:
There are two problems and an enabling mistake here:
1. You define your function with function(x, name) but then try to reference the particular column as x$name, which should fail. That is, if name is supposed to identify (via standard evaluation) a column, then it should really be a string, and $ does not work that way. You should instead be using x[[name]] (see "The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe"). This is not reporting as a problem (though it should), however, because of the next two bugs.

2. You are calling your function as Remove_Missing(merged_data, HETENURE), but since you are not attempting to do non-standard evaluation (NSE), the use of HETENURE is wrong. What should be happening is that in your function, when its name is referenced, it should look for an object named HETENURE and not find it; it should err with Error: object 'HETENURE' not found. What I think you should be doing is Remove_Missing(merged_data, "HETENURE").

Not a bug so much as a weakness that allowed other bugs to remain undiscovered: you assigned merged_data$name <- as.numeric(...), so in your function, when x$name should have been referencing x$HETENURE and should have failed, it instead found a column named name in your data (and therefore the function's passed argument of name was never referenced/used).
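To make the difference between $ and [[ concrete, here is a minimal sketch. The toy data frame df and the helper variables via_dollar and via_brackets are made up for illustration; only the column name HETENURE comes from the question.

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(HETENURE = c(" 1", " 3", "-1"), stringsAsFactors = FALSE)
name <- "HETENURE"

# `$` takes its right-hand side literally: it looks for a column called "name"
via_dollar <- df$name          # NULL, because df has no column named "name"

# `[[` evaluates its argument first, so it looks up the column "HETENURE"
via_brackets <- df[[name]]     # the character vector " 1", " 3", "-1"

# as.numeric(NULL) is numeric(0); assigning that back with `$<-` is what
# produces the "replacement has 0 rows" error
length(as.numeric(via_dollar))
```

This is likely where the error in the question comes from when the data has no column literally called name: x$name is NULL, as.numeric(NULL) has length zero, and a zero-length vector cannot be assigned as a column.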
First, let's remove the tempting hidden-bug of the column named name:
merged_data$name <- NULL
Second, the fixed function:
Remove_Missing = function(x, name) {
  x[[name]] = as.numeric(x[[name]])
  x[x[[name]] > 0,]
}
Third, fixing the invocation and getting return data:
Remove_Missing(merged_data, "HETENURE")
# HRHHID HRYEAR4 HETENURE HEFAMINC HRNUMHOU GESTFIPS GTMETSTA PEMARITL PESEX PEEDUCA PTDTRACE PEHSPNON PEMLR PRFTLF PRHRUSL HESP1 HESP6 HESP7A HESP8 HRFS12M1 name
# 9 008906910993941 2010 1 11 2 01 2 1 2 40 1 2 5 -1 -1 -1 -1 -1 -1 1 1
# 10 008906910993941 2010 1 11 2 01 2 1 1 45 1 2 5 -1 -1 -1 -1 -1 -1 1 1
# 11 648061954059610 2010 3 10 1 01 1 4 1 40 1 2 5 -1 -1 -1 -1 -1 -1 1 3
# 12 160916068405549 2010 1 13 2 01 1 1 2 42 1 2 1 1 4 -1 -1 -1 -1 1 1
# 13 160916068405549 2010 1 13 2 01 1 1 1 41 1 2 1 1 4 -1 -1 -1 -1 1 1
# 15 168069009100998 2010 1 14 3 01 1 1 2 39 1 2 7 -1 -1 -1 -1 -1 -1 1 1
Granted, in this case nothing was filtered out (since all of your data passed the condition), so if I temporarily revise the function to condition on > 1 instead, we'll see the change:
Remove_Missing = function(x, name) {
  x[[name]] = as.numeric(x[[name]])
  x[x[[name]] > 1,]
}
Remove_Missing(merged_data, "HETENURE")
# HRHHID HRYEAR4 HETENURE HEFAMINC HRNUMHOU GESTFIPS GTMETSTA PEMARITL PESEX PEEDUCA PTDTRACE PEHSPNON PEMLR PRFTLF PRHRUSL HESP1 HESP6 HESP7A HESP8 HRFS12M1 name
# 11 648061954059610 2010 3 10 1 01 1 4 1 40 1 2 5 -1 -1 -1 -1 -1 -1 1 3
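One caveat the function above does not handle: as.numeric() returns NA for strings that are not numbers, and a logical index containing NA produces all-NA rows rather than dropping them. The following is a sketch only, assuming you want such rows removed. The function name Remove_Missing_NA_Safe and the toy data are hypothetical and not part of the original code.

```r
# Hypothetical variant; the function name and the toy data are illustrative
Remove_Missing_NA_Safe <- function(x, name) {
  # Non-numeric strings become NA; the coercion warning is silenced on purpose
  x[[name]] <- suppressWarnings(as.numeric(x[[name]]))
  # which() returns only the positions where the test is TRUE, so NA is dropped
  x[which(x[[name]] > 0), , drop = FALSE]
}

toy <- data.frame(id = 1:4,
                  HETENURE = c(" 1", "-1", "n/a", " 3"),
                  stringsAsFactors = FALSE)

# The function returns a new data frame, so assign the result to keep it
toy_clean <- Remove_Missing_NA_Safe(toy, "HETENURE")
toy_clean$id
```

Note that the function does not modify its input in place. To update your data you would assign the result back, for example merged_data <- Remove_Missing(merged_data, "HETENURE").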
