I have a table in R listing many files that I need to copy to a destination folder. The files are spread across dozens of folders, each several sub-folders deep. I have successfully used the following code to find all of the files and their locations:
```r
(fastq_files <- list.files(Illumina_output, ".fastq.gz", recursive = TRUE, include.dirs = TRUE) %>% as_tibble)
```
After appending the full path, I have a tibble that looks something like this:
| full_path |
|---|
| Q:/IlluminaOutput/2019/091119 AB NGS/Data/Intensities/BaseCalls/19-15897-HLA-091119-AB-NGS_S14_L001_R1_001.fastq.gz |
| Q:/IlluminaOutput/2019/091119 AB NGS/Data/Intensities/BaseCalls/19-15236-HLA-091119-AB-NGS_S14_L001_R2_001.fastq.gz |
| Q:/IlluminaOutput/2018/062818AB NGS/Data/Intensities/BaseCalls/18-06875-HLA-062818-NGS_S11_L001_R1_001.fastq.gz |
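Instead of appending the path by hand, `list.files()` can return full paths directly with `full.names = TRUE`. The sketch below builds a throwaway folder tree under `tempdir()` that stands in for `Illumina_output` (the layout and file names are hypothetical). Note that `pattern` is a regular expression, so the dots are escaped and the match is anchored to the end of the name.

```r
# Hypothetical stand-in for Illumina_output, built under tempdir()
root  <- file.path(tempdir(), "IlluminaOutput_demo")
calls <- file.path(root, "2019", "091119 AB NGS", "Data", "Intensities", "BaseCalls")
dir.create(calls, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(calls, c("S14_L001_R1_001.fastq.gz",
                                         "S14_L001_R2_001.fastq.gz",
                                         "SampleSheet.csv"))))

# full.names = TRUE prefixes each result with `root`, so no manual appending is needed.
# "\\.fastq\\.gz$" matches a literal ".fastq.gz" only at the end of the file name.
fastq_files <- list.files(root, pattern = "\\.fastq\\.gz$",
                          recursive = TRUE, full.names = TRUE)
fastq_files
```

Only the two `.fastq.gz` files are returned; `SampleSheet.csv` is skipped.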
Using `file.copy()` on these full paths fails with an error that the file name is too long, a known Windows limitation (I am using RStudio on Windows 10).
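The classic Windows limit (`MAX_PATH`) is 260 characters for a full path, counting the terminating null. To see which copies are likely to hit it, you can measure both the source path and the destination path that `file.copy()` would write. A sketch, assuming a hypothetical `Trusight_output` folder and one deliberately padded path for contrast:

```r
# Hypothetical source paths: one from the table above, one padded to be deliberately long
full_path <- c(
  "Q:/IlluminaOutput/2019/091119 AB NGS/Data/Intensities/BaseCalls/19-15897-HLA-091119-AB-NGS_S14_L001_R1_001.fastq.gz",
  paste0("Q:/", strrep("x", 250), "/19-00000-HLA_S1_L001_R1_001.fastq.gz")
)
Trusight_output <- "C:/TrusightOutput"   # hypothetical destination folder

# File name = everything after the last "/" (pure string work, no file system access)
file_name <- sub(".*/", "", full_path)

# Length of the source path and of the destination path file.copy() would create
src_len  <- nchar(full_path)
dest_len <- nchar(file.path(Trusight_output, file_name))

# MAX_PATH is 260 characters including the terminating null,
# so a path of 260 or more visible characters is over the classic limit
too_long <- src_len >= 260 | dest_len >= 260
full_path[too_long]
```

Only the padded second path is flagged here.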
I found that if I set the working directory to the folder containing the file, I am able to copy it. Starting with a table like this:
| file | path |
|---|---|
| 19-14889-HLA-091119-AB-NGS_S14_L001_R1_001.fastq.gz | Q:/IlluminaOutput/2019/091119 AB NGS/Data/Intensities/BaseCalls/ |
| 19-14889-HLA-091119-AB-NGS_S14_L001_R2_001.fastq.gz | Q:/IlluminaOutput/2019/091119 AB NGS/Data/Intensities/BaseCalls/ |
| 18-09772-HLA-062818-NGS_S11_L001_R1_001.fastq.gz | Q:/IlluminaOutput/2018/062818AB NGS/Data/Intensities/BaseCalls/ |
| 18-09772-HLA-062818-NGS_S11_L001_R2_001.fastq.gz | Q:/IlluminaOutput/2018/062818AB NGS/Data/Intensities/BaseCalls/ |
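A table with separate `file` and `path` columns can be derived from the full paths with `basename()` and `dirname()`. A minimal sketch in base R, assuming a character vector `full_path` shaped like the first table (the two entries are copied from the tables above for illustration). `dirname()` drops the trailing `/`, which `setwd()` does not need.

```r
# Full paths in the shape of the first table
full_path <- c(
  "Q:/IlluminaOutput/2019/091119 AB NGS/Data/Intensities/BaseCalls/19-14889-HLA-091119-AB-NGS_S14_L001_R1_001.fastq.gz",
  "Q:/IlluminaOutput/2018/062818AB NGS/Data/Intensities/BaseCalls/18-09772-HLA-062818-NGS_S11_L001_R1_001.fastq.gz"
)

# basename() keeps the last path component; dirname() keeps everything before it
file_and_path <- data.frame(
  file = basename(full_path),
  path = dirname(full_path),
  stringsAsFactors = FALSE
)
file_and_path
```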
I used the following code to successfully copy the first file:
```r
(dir <- as.character(as.vector(file_and_path[1, 2])))
setwd(dir)
(file <- as.character(as.vector(file_and_path[1, 1])))
(file.copy(file, Trusight_output) %>% as_tibble)
```
I got this to work, but I don't know how to apply these steps to every row in my table. I think I probably have to use the `lapply` function, but I'm not sure how to construct it.
Answer:
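This should do the trick, assuming that `file_and_path$file` and `file_and_path$path` are both character vectors and that `Trusight_output` is an absolute path (a relative destination would be resolved against each source folder once the working directory changes):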
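```r
# Copy `file` from folder `from` into folder `to`, working inside `from`
# so the source path passed to file.copy() stays short
f <- function(file, from, to) {
  cwd <- setwd(from)      # setwd() returns the previous working directory
  on.exit(setwd(cwd))     # restore it when f() exits, even on error
  file.copy(file, to)
}

# Call f() once per row, pairing each file with its own folder
Map(f, file = file_and_path$file, from = file_and_path$path, to = Trusight_output)
```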
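We use `Map()` here rather than `lapply()` because the function takes more than one varying argument: `Map()` is a wrapper around `mapply()` that walks over `file` and `from` in parallel (recycling the single `to` value) and returns a list. FWIW, operations like this are often better suited to PowerShell.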
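To check which copies actually happened, the list returned by `Map()` can be flattened with `unlist()`: `file.copy()` returns `FALSE` for a copy it did not perform (for example a missing source file, or an existing destination when `overwrite = FALSE`) rather than stopping. A self-contained sketch using throwaway folders under `tempdir()`; the folder names, file names and the deliberately missing file are hypothetical.

```r
# Throwaway source folders and destination under tempdir()
src_a <- file.path(tempdir(), "demo_src_a")
src_b <- file.path(tempdir(), "demo_src_b")
dest  <- file.path(tempdir(), "demo_dest")   # stands in for Trusight_output (absolute path)
unlink(dest, recursive = TRUE)               # start from an empty destination
for (d in c(src_a, src_b, dest)) dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(src_a, "a_R1_001.fastq.gz"),
                      file.path(src_b, "b_R1_001.fastq.gz")))

file_and_path <- data.frame(
  file = c("a_R1_001.fastq.gz", "b_R1_001.fastq.gz", "missing.fastq.gz"),
  path = c(src_a, src_b, src_b),
  stringsAsFactors = FALSE
)

# Same helper as in the answer
f <- function(file, from, to) {
  cwd <- setwd(from)
  on.exit(setwd(cwd))
  file.copy(file, to)
}

res <- Map(f, file = file_and_path$file, from = file_and_path$path, to = dest)

# Map() names the list after `file`; unlist() gives a named logical vector
copied <- unlist(res)
copied
# Rows whose copy did not happen (here, the deliberately missing file)
file_and_path[!copied, ]
```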
