I'm working with a CSV in which one column of numbers uses commas as thousands separators (e.g. 1,000,000 instead of 1000000). Is there a way to remove the commas from the entire column at once? When I try:
replace(df2.Volume, "," => "")
it gives me back the entire column as if nothing had changed. And when I tried:
julia> parse(Int, replace("df2.Volume",","=>"") )
ERROR: ArgumentError: invalid base 10 digit 'd' in "df2.Volume"
Stacktrace:
[1] tryparse_internal(#unused#::Type{Int64}, s::String, startpos::Int64, endpos::Int64, base_::Int64, raise::Bool)
@ Base .\parse.jl:137
[2] parse(::Type{Int64}, s::String; base::Nothing)
@ Base .\parse.jl:241
[3] parse(::Type{Int64}, s::String)
@ Base .\parse.jl:241
[4] top-level scope
@ REPL[263]:1
The data is all numbers in the millions, so how can I remove these commas? I appreciate your help! Source: https://testdataframesjl.readthedocs.io/en/readthedocs/subsets/
CodePudding user response:
You can do something like:
df.Volume = [parse(Int, replace(v, ","=>"")) for v in df.Volume]
CodePudding user response:
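A column of a DataFrame in Julia is a Vector. Called on a vector, replace(df2.Volume, "," => "") looks for elements that are equal to "," rather than for commas inside each string, so it returns the column unchanged. If you want to apply a function to every element of the column, you usually need to broadcast it using the dot (.) operator.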
julia> df = DataFrame(Volume=["1,000","1,000,000","1,000,000,0000"]);
julia> df.VolumeOK = replace.(df.Volume, "," => "");
julia> df
3×2 DataFrame
 Row │ Volume          VolumeOK
     │ String          String
─────┼─────────────────────────────
   1 │ 1,000           1000
   2 │ 1,000,000       1000000
   3 │ 1,000,000,0000  10000000000
Note the dot . after replace.
You can of course go on to convert the result to Int with a broadcasted parse, for example parse.(Int, df.VolumeOK).
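To put the two steps together, here is a minimal sketch that uses only Base Julia. The vector `volume` and its values are made up for illustration and stand in for a column such as df.Volume; the same broadcasted calls should work on a DataFrame column, since it is an ordinary vector of strings.

```julia
# Hypothetical sample data standing in for a column like df.Volume
volume = ["1,000", "1,000,000", "12,345,678"]

# Without the dot, replace compares whole elements against ",",
# finds no match, and returns an unchanged copy of the vector
unchanged = replace(volume, "," => "")

# With the dot, replace is applied to each string in turn
cleaned = replace.(volume, "," => "")

# Broadcast parse over the cleaned strings to get integers
volume_int = parse.(Int, cleaned)
```

Note also that the error in the question comes from writing "df2.Volume" in quotes: that passes the literal text df2.Volume to replace and parse instead of the column itself.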
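Finally, note that you can often handle number formatting directly when reading the data with CSV.jl. For example, a semicolon-delimited file that uses a comma as the decimal mark can be read with:
CSV.read("df.csv", DataFrame; delim=';', decimal=',')
For thousands separators like the ones in the question, recent CSV.jl releases also document a groupmark keyword (e.g. groupmark=','); check that your installed version supports it before relying on it.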
