Filter DataFrame by rows which have no "missing" value

Time:01-14

I have a DataFrame that may contain missing values, and I want to filter out every row that contains at least one missing value. So from this:

julia> df = DataFrame(a = [1, 2, 3, 4], b = [5, missing, 7, 8], c = [9, 10, missing, 12])
4×3 DataFrame
 Row │ a      b        c
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     1        5        9
   2 │     2  missing       10
   3 │     3        7  missing
   4 │     4        8       12

I want something like

 Row │ a      b        c
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     1        5        9
   4 │     4        8       12

Ideally, there would be a filter function that passes each row to a lambda, inside which I could use some combination of collect, findfirst and the like. However, I can't figure out how to pass lambdas to subset or @subset (from DataFramesMeta.jl), because I don't have just three columns: I have over 200.

Answer:

I will let @Antonello add an answer covering dropmissing and dropmissing! (its in-place variant).
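For reference, a minimal sketch of the `dropmissing` route, assuming DataFrames.jl 1.x and the same data as in the question (rebuilt so the snippet runs on its own). `completecases` is shown as well because it returns the Bool row mask directly, which can be handy for indexing. The names `clean`, `clean_keep_types`, `mask`, `clean_mask` and `df2` are just illustrative:

```julia
using DataFrames

# Same data as in the question
df = DataFrame(a = [1, 2, 3, 4], b = [5, missing, 7, 8], c = [9, 10, missing, 12])

# New data frame with only the rows that have no missing value in any column.
# By default the column element types are also narrowed (Int64? -> Int64).
clean = dropmissing(df)

# Same rows, but keep the Union{Missing, Int64} column types
clean_keep_types = dropmissing(df; disallowmissing=false)

# Bool vector marking the complete rows; usable for indexing directly
mask = completecases(df)
clean_mask = df[mask, :]

# In-place variant: mutates its argument (applied to a copy here so df stays intact)
df2 = copy(df)
dropmissing!(df2)
```

Both `dropmissing` and `completecases` also accept a column selector as a second argument (e.g. `dropmissing(df, [:b, :c])`) if only some columns should be checked.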

Here is how you could perform filtering using subset and ismissing instead:

julia> subset(df, All() .=> ByRow(!ismissing))
2×3 DataFrame
 Row │ a      b       c
     │ Int64  Int64?  Int64?
─────┼───────────────────────
   1 │     1       5       9
   2 │     4       8      12

(I am using the standard subset from DataFrames.jl.)
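To see what `All() .=> ByRow(!ismissing)` does, it can help to spell the broadcast out by hand. The sketch below (assuming DataFrames.jl 1.x; `conditions`, `kept` and `kept_all` are illustrative names) builds one `column => ByRow(!ismissing)` condition per column with ordinary broadcasting over `names(df)`. `subset` keeps a row only when every condition is `true` for it, which is the same as requiring that no column is missing in that row:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3, 4], b = [5, missing, 7, 8], c = [9, 10, missing, 12])

# One condition per column: "a" => ByRow(!ismissing), "b" => ..., "c" => ...
conditions = names(df) .=> ByRow(!ismissing)

# subset combines the conditions with a logical AND, row by row
kept = subset(df, conditions...)

# The All() form from the answer selects the same rows
kept_all = subset(df, All() .=> ByRow(!ismissing))
```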

Or, if you have a very wide data frame (for example, thousands of columns):

subset(df, AsTable(All()) => ByRow((x -> all(!ismissing, x))∘collect))

(this is a special syntax optimized for fast row-wise aggregation of wide tables)
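One way to check that the wide-table form keeps the same rows as `dropmissing` is to try it on a synthetic wide frame. The sketch below assumes DataFrames.jl 1.x. The table shape (10 rows × 300 columns), the column names `x1` ... `x300`, the positions of the missing values, and the names `nrows`, `ncols`, `wide` are all made up for illustration:

```julia
using DataFrames

# Hypothetical wide table: column xj holds (row index + j)
nrows, ncols = 10, 300
wide = DataFrame([collect(1:nrows) .+ j for j in 1:ncols], ["x$j" for j in 1:ncols])
allowmissing!(wide)  # make every column accept `missing`

# Put a few missing values in rows 2, 5 and 9
wide[2, "x17"] = missing
wide[5, "x3"] = missing
wide[5, "x250"] = missing
wide[9, "x300"] = missing

# Row-wise aggregation over all columns using the AsTable(...) => ByRow(f ∘ collect) form;
# all(!ismissing, x) stops at the first missing value it finds
kept = subset(wide, AsTable(All()) => ByRow((x -> all(!ismissing, x)) ∘ collect))

# Reference result for comparison
reference = dropmissing(wide)
```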

Answer:

OK, this seems to work, but I'm leaving the question open for more suggestions.

DataFrame(collect(filter(r -> nothing .== findfirst(collect(ismissing.(collect(r)))), eachrow(data[:, before_qs]))))
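The same row-wise test can be written more directly. DataFrames.jl's `filter(f, df)` passes each row to `f` as a `DataFrameRow`, so a lambda works regardless of how many columns there are, and the result is already a `DataFrame`. A minimal sketch, with `df` standing in for `data[:, before_qs]` (which is not shown in the question) and with `kept` and `kept_findfirst` as illustrative names:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3, 4], b = [5, missing, 7, 8], c = [9, 10, missing, 12])

# Keep a row only if none of its values is missing
kept = filter(row -> !any(ismissing, row), df)

# Closer to the findfirst idea above: findfirst returns `nothing`
# when no element of the collected row is missing
kept_findfirst = filter(row -> isnothing(findfirst(ismissing, collect(row))), df)
```

`isnothing(...)` replaces the `nothing .== ...` comparison, and `any(ismissing, row)` avoids building the intermediate Bool vector altogether.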