I have a DataFrame that may contain missing values, and I want to filter out all the rows that contain at least one missing value. So from this
df = DataFrame(a = [1, 2, 3, 4], b = [5, missing, 7, 8], c = [9, 10, missing, 12])
4×3 DataFrame
 Row │ a      b        c
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     1        5        9
   2 │     2  missing       10
   3 │     3        7  missing
   4 │     4        8       12
I want something like
 Row │ a      b        c
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     1        5        9
   4 │     4        8       12
Ideally, there would be a filter function where I can pass each row into a lambda and then combine collect, findfirst, and so on. However, I can't figure out how to pass lambdas to subset or @subset (from DataFramesMeta), and listing the columns by hand is not an option: I don't have only three columns, I have over 200.
CodePudding user response:
I will let @Antonello add an answer for dropmissing and dropmissing! (the in-place variant).
Here is how you could perform filtering using subset and ismissing instead:
julia> subset(df, All() .=> ByRow(!ismissing))
2×3 DataFrame
 Row │ a      b       c
     │ Int64  Int64?  Int64?
─────┼───────────────────────
   1 │     1       5       9
   2 │     4       8      12
(I am using the standard subset from DataFrames.jl)
or if you have a very wide data frame (like thousands of columns):
subset(df, AsTable(All()) => ByRow((x -> all(!ismissing, x))∘collect))
(this is a special syntax optimized for fast row-wise aggregation of wide tables)
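For reference, the dropmissing route mentioned at the top of this answer might look roughly like the sketch below. It reuses the example data from the question; the variable names clean, mask, clean_mask and df_copy are only illustrative.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3, 4], b = [5, missing, 7, 8], c = [9, 10, missing, 12])

# Non-mutating: returns a new data frame without the rows that contain a missing value.
clean = dropmissing(df)

# Same row selection through a Boolean mask of complete rows.
mask = completecases(df)
clean_mask = df[mask, :]

# In-place variant: mutates its argument, so a copy is used here to keep df intact.
df_copy = copy(df)
dropmissing!(df_copy)
```

One difference from the subset approach: by default dropmissing also narrows the column element types (here from Int64? to Int64), whereas indexing with the completecases mask keeps the columns as Int64?.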
CodePudding user response:
OK, this seems to work but I'm leaving this open for more suggestions.
DataFrame(collect(filter(r -> nothing .== findfirst(collect(ismissing.(collect(r)))), eachrow(data[:, before_qs]))))
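A shorter version of the same row-wise idea, as a sketch: filter can take a data frame directly and passes each row to the lambda, so the nested collect calls are probably not needed. The cols vector below is a hypothetical stand-in for before_qs, which is not defined in this post.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3, 4], b = [5, missing, 7, 8], c = [9, 10, missing, 12])

# Hypothetical column selection standing in for `before_qs`.
cols = [:a, :b, :c]

# Each row arrives as a DataFrameRow; keep it only if none of its values is missing.
kept = filter(row -> !any(ismissing, row), df[:, cols])
```

This returns a DataFrame directly, so no wrapping in DataFrame(collect(...)) is required.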
