Could you please help me read this file in R:
Weekly SST data starts week centered on 3Jan1990
                Nino1 2      Nino3        Nino34        Nino4
 Week          SST SSTA     SST SSTA     SST SSTA     SST SSTA
 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.1     28.6 0.5
 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.5
 17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.5
 24JAN1990     24.4-0.4     25.5-0.4     26.5-0.1     28.4 0.3
 31JAN1990     25.1-0.1     25.8-0.2     26.7 0.1     28.4 0.3
 07FEB1990     25.8 0.2     26.1-0.1     26.8 0.2     28.4 0.4
 14FEB1990     25.9 0.0     26.4 0.0     26.9 0.2     28.5 0.5
 21FEB1990     26.1 0.0     26.7 0.2     27.1 0.3     28.9 0.8
As you can see, below each NinoXX header, there are two data columns with SST and SSTA.
Any help appreciated!!
CodePudding user response:
Kludgy hack. It would be far better to ask the originating author(s) to provide a better format.
dat <- read.fwf(textConnection("
                Nino1 2      Nino3        Nino34        Nino4
 Week          SST SSTA     SST SSTA     SST SSTA     SST SSTA
 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.1     28.6 0.5
 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.5
 17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.5
 24JAN1990     24.4-0.4     25.5-0.4     26.5-0.1     28.4 0.3
 31JAN1990     25.1-0.1     25.8-0.2     26.7 0.1     28.4 0.3
 07FEB1990     25.8 0.2     26.1-0.1     26.8 0.2     28.4 0.4
 14FEB1990     25.9 0.0     26.4 0.0     26.9 0.2     28.5 0.5
 21FEB1990     26.1 0.0     26.7 0.2     27.1 0.3     28.9 0.8"), c(15, 4,9, 4,9, 4,9, 4,4), skip = 2)
colnms <- trimws(unlist(dat[1,], use.names = FALSE))
colnms <- paste0(colnms, ave(as.character(colnms), colnms, FUN = function(z) if (length(z) == 1) "" else seq_along(z)))
dat <- data.frame(lapply(setNames(dat[-1,], colnms), type.convert, as.is = TRUE))
dat
#        Week SST1 SSTA1 SST2 SSTA2 SST3 SSTA3 SST4 SSTA4
# 1 03JAN1990 23.4  -0.4 25.1  -0.3 26.6   0.1 28.6   0.5
# 2 10JAN1990 23.4  -0.8 25.2  -0.3 26.6   0.1 28.6   0.5
# 3 17JAN1990 24.2  -0.3 25.3  -0.3 26.5  -0.1 28.6   0.5
# 4 24JAN1990 24.4  -0.4 25.5  -0.4 26.5  -0.1 28.4   0.3
# 5 31JAN1990 25.1  -0.1 25.8  -0.2 26.7   0.1 28.4   0.3
# 6 07FEB1990 25.8   0.2 26.1  -0.1 26.8   0.2 28.4   0.4
# 7 14FEB1990 25.9   0.0 26.4   0.0 26.9   0.2 28.5   0.5
# 8 21FEB1990 26.1   0.0 26.7   0.2 27.1   0.3 28.9   0.8
If you have a file instead of just the text, you would use something like this for your first step.
dat <- read.fwf(filepath, c(15, 4,9, 4,9, 4,9, 4, 4), skip = 1)
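If the file on disk also carries the "Weekly SST data ..." title line shown in the question, skip = 1 would leave the Nino* line as the first row. A minimal sketch of one way to handle that, assuming the header row is the first line that starts with "Week" followed by whitespace; the temp file, its exact spacing, and the name n_skip are made up for illustration:

```r
# Hypothetical stand-in for the real file: title line, Nino* line, header, two data rows.
# The column spacing is an assumption chosen to match the widths used in this answer.
filepath <- tempfile(fileext = ".txt")
writeLines(c(
  " Weekly SST data starts week centered on 3Jan1990",
  "                Nino1 2      Nino3        Nino34        Nino4",
  " Week          SST SSTA     SST SSTA     SST SSTA     SST SSTA",
  " 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.1     28.6 0.5",
  " 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.5"
), filepath)

# Count the lines above the "Week SST SSTA ..." header instead of hard-coding skip.
# The trailing [[:space:]] keeps the "Weekly ..." title line from matching.
n_skip <- grep("^[[:space:]]*Week[[:space:]]", readLines(filepath))[1] - 1

dat <- read.fwf(filepath, c(15, 4,9, 4,9, 4,9, 4,4), skip = n_skip)
dat[1, ]  # the header row, ready for the column-naming step
```

With a file that starts directly at the Nino* line, the same lookup gives n_skip = 1, which matches the call above.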
Walk-through:
- The widths (c(15, 4,9, ...)) were determined manually, nothing magical here. (Minor sub-note: I paired them visually as 15, then 4,9, etc.; that is not a comma-decimal notation, it merely shows that the 4 and 9 are logically assigned together; R ignores this and treats it as c(15, 4, 9, 4, 9, ...).)
- skip=2 in the first code block is half aesthetic (for the answer), half functional. That is, my first code block has a newline after the opening quote, and while read.table will silently skip that, read.fwf will not, so I have to set skip=1 to skip it. Since I also want to skip the Nino* line, I have to increment to skip=2. For production and a real file to read from, you should use skip=1.
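Since the widths are counted by hand, it can be worth checking them against one data line before reading the whole file. The following is only a sketch: it cuts a single line at the same cumulative offsets that fixed-width reading implies, and the exact spacing of the sample line is an assumption reconstructed from those widths.

```r
widths <- c(15, 4, 9, 4, 9, 4, 9, 4, 4)

# One data line, spaced to match the widths (assumed layout).
line <- " 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.1     28.6 0.5"

# Start and end position of each field.
last  <- cumsum(widths)
first <- last - widths + 1

# substring() is vectorised over first/last, so this yields one piece per field.
fields <- trimws(substring(line, first, last))
fields

# The widths should account for the whole line.
nchar(line) == sum(widths)
```

If a value shows up split across two fields, or two values land in one field, the corresponding width is off.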
If you want to programmatically preserve the Nino number, then perhaps
ninos <- trimws(unlist(read.fwf(textConnection("
                Nino1 2      Nino3        Nino34        Nino4
 Week          SST SSTA     SST SSTA     SST SSTA     SST SSTA"), c(15, 13, 13, 13, 8), skip = 1)[1,], use.names = FALSE))
ninos <- ninos[nzchar(ninos)]
colnames(dat)[-1] <- paste0(rep(ninos, each = 2), "_", colnms[-1])
dat
#        Week Nino1 2_SST1 Nino1 2_SSTA1 Nino3_SST2 Nino3_SSTA2 Nino34_SST3 Nino34_SSTA3 Nino4_SST4 Nino4_SSTA4
# 1 03JAN1990         23.4          -0.4       25.1        -0.3        26.6          0.1       28.6         0.5
# 2 10JAN1990         23.4          -0.8       25.2        -0.3        26.6          0.1       28.6         0.5
# 3 17JAN1990         24.2          -0.3       25.3        -0.3        26.5         -0.1       28.6         0.5
# 4 24JAN1990         24.4          -0.4       25.5        -0.4        26.5         -0.1       28.4         0.3
# 5 31JAN1990         25.1          -0.1       25.8        -0.2        26.7          0.1       28.4         0.3
# 6 07FEB1990         25.8           0.2       26.1        -0.1        26.8          0.2       28.4         0.4
# 7 14FEB1990         25.9           0.0       26.4         0.0        26.9          0.2       28.5         0.5
# 8 21FEB1990         26.1           0.0       26.7         0.2        27.1          0.3       28.9         0.8
Note that these names are generally not R-friendly, so you'll need backticks with many of them, e.g.,
dat$`Nino1 2_SST1`
# [1] 23.4 23.4 24.2 24.4 25.1 25.8 25.9 26.1
That can be remedied in any number of ways; over to you.
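One possible remedy, sketched here as an illustration rather than a recommendation: drop the trailing pair counter (the Nino prefix already makes each name unique) and remove any character that is not a letter, digit, or underscore. The vector nms simply repeats the column names printed above, and clean is a made-up name.

```r
# Column names as produced by the step above.
nms <- c("Week",
         "Nino1 2_SST1", "Nino1 2_SSTA1", "Nino3_SST2", "Nino3_SSTA2",
         "Nino34_SST3", "Nino34_SSTA3", "Nino4_SST4", "Nino4_SSTA4")

clean <- sub("[0-9]+$", "", nms)           # drop the trailing 1..4 counter
clean <- gsub("[^A-Za-z0-9_]", "", clean)  # drop the space in "Nino1 2"
clean

# Syntactic names pass through make.names() unchanged.
all(clean == make.names(clean))

# To apply: colnames(dat) <- clean
```

After that, columns can be accessed without backticks, e.g. dat$Nino12_SST.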
