Let's say, I have the following my_dt datatable:
| neutrons | spectrum | geography |
|---|---|---|
| 2.30 | -1.2 | KIEL |
| 2.54 | -1.6 | KIEL |
| 2.56 | -0.9 | JUNG |
| 2.31 | -0.3 | ANT |
Also I have the following named list (my_list):
> my_list
$particles
[1] "neutrons"
$station
[1] NA
$energy
[1] "spectrum"
$area
[1] "geography"
$gamma
[1] NA
The values of this list correspond to the columns names from my dataset (if they exist, if they are absent - NA).
Based on my dataset and this list, I need to check which columns exist in my_dt and rename them (based on my_list names), and for NA values - I need to create columns filled with NAs.
So, I want to obtain the following dataset:
>final_dt
| particles | station | energy | area | gamma |
|---|---|---|---|---|
| 2.30 | NA | -1.2 | KIEL | NA |
| 2.54 | NA | -1.6 | KIEL | NA |
| 2.56 | NA | -0.9 | JUNG | NA |
| 2.31 | NA | -0.3 | ANT | NA |
I try to implement this using apply family functions, but at the moment I can't obtain exactly what I want.
So, I would be grateful for any help!
CodePudding user response:
I wrote a simple code that should do the job for you:
l = list(c = 'cc', a = 'aa', b = NA) # replace this with your my_list
dt = data.frame(aa = 1:3, cc = 2:4) # replace this with my_dt
dtl = data.frame(l)
names(dt) = names(l)[na.omit(match(l, names(dt)))]
m = merge(dt, dtl[!is.element(names(dtl), names(dt))])
CodePudding user response:
This may not meet your needs, but since I had come up with this separately thought I would share just in case. You can use setnames to rename the columns based on my_list. After that, add in the missing column names with values of NA. Finally, you can use setcolorder to reorder based on your list if desired.
library(data.table)
my_vec <- unlist(my_list)
setnames(my_dt, names(my_vec[match(names(my_dt), my_vec)]))
my_dt[, (setdiff(names(my_vec), names(my_dt))) := NA]
setcolorder(my_dt, names(my_vec))
my_dt
Output
particles station energy area gamma
1: 2.30 NA -1.2 KIEL NA
2: 2.54 NA -1.6 KIEL NA
3: 2.56 NA -0.9 JUNG NA
4: 2.31 NA -0.3 ANT NA
CodePudding user response:
data.table using lapply
library(data.table)
setDT(my_dt)
setDT(my_list)
final_dt <- setnames( my_list[, lapply( .SD, function(x){
if( x %in% colnames(my_dt)){ my_dt[,x,with=F] }else{ NA } } ) ],
names(my_list) )
final_dt
particles station energy area gamma
1: 2.30 NA -1.2 KIEL NA
2: 2.54 NA -1.6 KIEL NA
3: 2.56 NA -0.9 JUNG NA
4: 2.31 NA -0.3 ANT NA
base R using sapply
setDF(my_dt)
setDF(my_list)
data.frame( sapply( my_list, function(x) if(!is.na(x)){ my_dt[,x] }else{ NA } ) )
particles station energy area gamma
1 2.30 NA -1.2 KIEL NA
2 2.54 NA -1.6 KIEL NA
3 2.56 NA -0.9 JUNG NA
4 2.31 NA -0.3 ANT NA
Data
my_dt <- structure(list(neutrons = c(2.3, 2.54, 2.56, 2.31), spectrum = c(-1.2,
-1.6, -0.9, -0.3), geography = c("KIEL", "KIEL", "JUNG", "ANT"
)), class = "data.frame", row.names = c(NA, -4L))
my_list <- list(particles = "neutrons", station = NA, energy = "spectrum",
area = "geography", gamma = NA)
