Keeping leading 0s when using read.csv()

Time:01-20

I am trying to create a tool that reads multiple CSVs from a folder and converts them to xlsx. My problem is that some variables contain leading zeros that I want to keep, but the variable names differ between files and will differ each time I use the tool.

So, is there a way to automatically detect leading 0s in any variable when reading a file with read.csv()?

I cannot apply formats after reading because I will not know in advance the names of the variables that need them. I also cannot force every column to text because other variables need to stay numeric.

CodePudding user response:

I'd do this in multiple steps:

First, I'd read in the table with everything as character:

df <- read.table(file, sep=',', header=TRUE, colClasses='character')
df

  a  b   c
1 1 01   3
2 2 10 043
3 3 30  43
4 4 40 043

Then, I'd loop through the table to check for leading zeros:

leading_zeros = sapply(df, function(x) any(startsWith(x, '0')))
leading_zeros
    a     b     c 
FALSE  TRUE  TRUE 
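
One caveat: startsWith(x, '0') also flags a lone "0" and decimals such as "0.5", so an ordinary numeric column containing them would stay text. A stricter check is sketched below. It assumes a leading zero means a "0" followed directly by another digit; has_leading_zero, demo and flags are hypothetical names used only for illustration.

```r
# Hypothetical helper: TRUE if any value is a "0" followed directly by a digit.
# Assumption: "0" and "0.5" are ordinary numbers, not zero-padded codes.
has_leading_zero <- function(x) {
  # grepl() returns FALSE for NA, so missing values never flag a column
  any(grepl("^0[0-9]", x))
}

# Illustrative data, with every column held as character
demo <- data.frame(
  id    = c("1", "2", "3"),
  code  = c("01", "10", "30"),
  ratio = c("0", "0.5", NA),
  stringsAsFactors = FALSE
)

flags <- vapply(demo, has_leading_zero, logical(1))
flags
```

Here only code is flagged, so ratio remains eligible for numeric conversion.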

Then, you can convert the columns without leading zeros to numeric:

str(df)
'data.frame':   4 obs. of  3 variables:
 $ a: chr  "1" "2" "3" "4"
 $ b: chr  "01" "10" "30" "40"
 $ c: chr  "3" "043" "43" "043"

# lapply() keeps each column a plain vector; sapply() would return a matrix
df[!leading_zeros] <- lapply(df[!leading_zeros], as.numeric)

str(df)
'data.frame':   4 obs. of  3 variables:
 $ a: num  1 2 3 4
 $ b: chr  "01" "10" "30" "40"
 $ c: chr  "3" "043" "43" "043"
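
Since the question is about a whole folder of files, the three steps can be wrapped in a function and applied to each CSV. This is only a sketch under some assumptions: read_keep_zeros is a hypothetical name, the demo folder and files are made up, and type.convert() is swapped in for as.numeric so that columns holding ordinary text are left as character instead of turning into NA.

```r
# Hypothetical helper that wraps the three steps for one file.
read_keep_zeros <- function(path) {
  # Step 1: read every column as text so no zero is dropped on import
  df <- read.csv(path, colClasses = "character")
  # Step 2: flag columns in which any value starts with "0"
  leading_zeros <- vapply(
    df, function(x) any(startsWith(x, "0"), na.rm = TRUE), logical(1)
  )
  # Step 3: convert the other columns; type.convert() (used here instead of
  # as.numeric) leaves non-numeric text such as "ab" as character
  for (nm in names(df)[!leading_zeros]) {
    df[[nm]] <- type.convert(df[[nm]], as.is = TRUE)
  }
  df
}

# Demo on a temporary folder holding two small CSV files
folder <- file.path(tempdir(), "csv_demo")
dir.create(folder, showWarnings = FALSE)
writeLines(c("a,b,c,d", "1,2,03,ab", "2,4,05,cd"), file.path(folder, "one.csv"))
writeLines(c("id,zip", "7,00501", "8,10001"), file.path(folder, "two.csv"))

files  <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(files, read_keep_zeros)
names(tables) <- basename(files)
str(tables)
```

Each element of tables could then be handed to an xlsx writer, for example writexl::write_xlsx(), assuming that package is installed.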

CodePudding user response:

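Define a special class, num2, whose conversion from character keeps a column as text when it would otherwise be numeric but some value starts with 0, and then run read.csv with that class as colClasses.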

setClass("num2")

setAs("character", "num2",
  function(from) {
    from2 <- type.convert(from, as.is = TRUE)
    if (is.numeric(from2) && any(grepl("^0", from))) from else from2
  })

DF <- read.csv(text = Lines, colClasses = "num2")
str(DF)
## 'data.frame':   2 obs. of  4 variables:
##  $ a: int  1 2
##  $ b: int  2 4
##  $ c: chr  "03" "05"
##  $ d: chr  "ab" "cd"
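
This works because, for a colClasses entry that is not one of the built-in types, read.csv converts the column with as() from the methods package, which dispatches to the function registered by setAs. As a small sketch of that mechanism, the same conversion can be called directly on character vectors. The class is redefined here only so the snippet runs on its own; plain and padded are illustrative names.

```r
library(methods)

# Same class and conversion as in the answer, repeated so this runs standalone
setClass("num2")
setAs("character", "num2", function(from) {
  from2 <- type.convert(from, as.is = TRUE)
  # keep the original text if the values are numeric but one has a leading zero
  if (is.numeric(from2) && any(grepl("^0", from))) from else from2
})

plain  <- as(c("1", "2"), "num2")    # no leading zero: converted to a number
padded <- as(c("03", "05"), "num2")  # leading zero: left as text
str(plain)
str(padded)
```

Calling as() this way is a convenient way to try the rule on a few values before reading whole files.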

Note

Sample data (Lines) used above:

Lines <- "a,b,c,d
1,2,03,ab
2,4,05,cd"