I have a directory with one folder per patient in a study, and each patient's folder contains multiple files. I want to run the same R code on each folder and create a summary file.
For example:
setwd("~/pt1")
##pull in files
T1<-read.csv('T1.csv')
T2<-read.csv('T2.csv')
T3<-read.csv('T3.csv')
T4<-read.csv('T4.csv')
## lots of code here ##
## --> write the output summary file ##
Repeat for 80 more folders (patients).
I know how to write a function and read in and process multiple files, but I'm stuck on how to loop over multiple folders and run a function on the files within each folder.
Any help?
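The usual pattern is to wrap the per-patient code in a function that takes a folder path, then apply it to every patient folder and stack the results. A minimal sketch, assuming each patient folder holds CSV files and using a placeholder summary (file count and total rows) in place of the real analysis; `process_patient()` and the demo folder names are hypothetical.

```r
# Hypothetical per-patient function: reads every CSV in one folder and
# returns a one-row summary data frame. Replace the body with your own code.
process_patient <- function(patient_dir) {
  csv_files <- list.files(patient_dir, pattern = "\\.csv$", full.names = TRUE)
  # Read the files into a named list (T1, T2, ...) instead of separate variables
  tables <- lapply(csv_files, read.csv)
  names(tables) <- tools::file_path_sans_ext(basename(csv_files))
  # Placeholder summary: number of files and total rows for this patient
  data.frame(
    patient = basename(patient_dir),
    n_files = length(tables),
    total_rows = sum(vapply(tables, nrow, integer(1)))
  )
}

# Demo data (assumption): two fake patient folders under a temporary directory
root <- file.path(tempdir(), "study_demo_a")
for (pt in c("pt1", "pt2")) {
  dir.create(file.path(root, pt), recursive = TRUE, showWarnings = FALSE)
  for (t in c("T1", "T2")) {
    write.csv(data.frame(x = 1:3), file.path(root, pt, paste0(t, ".csv")),
              row.names = FALSE)
  }
}

# Loop over the patient folders (full paths, so no setwd() is needed)
patient_dirs <- list.dirs(root, recursive = FALSE)
summary_df <- do.call(rbind, lapply(patient_dirs, process_patient))
write.csv(summary_df, file.path(root, "summary.csv"), row.names = FALSE)
print(summary_df)
```

Keeping each patient's tables in a named list means the same function body works whether a folder has four files or forty.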
Answer:
I suggest reading about purrr::map. This video also demonstrates its use (https://www.youtube.com/watch?v=bzUmK0Y07ck&t=1598s), including an example of reading multiple files.
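Applied to this question, purrr::map can take the vector of patient folders and return one result per folder. A minimal sketch, assuming the purrr package is installed; `summarise_patient()` and the demo folders are hypothetical, and the row count stands in for the real per-patient code.

```r
# Requires the purrr package: install.packages("purrr")
library(purrr)

# Hypothetical per-patient function; replace the body with your own analysis
summarise_patient <- function(patient_dir) {
  csv_files <- list.files(patient_dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- map(csv_files, read.csv)  # one data frame per file
  data.frame(patient = basename(patient_dir),
             total_rows = sum(map_int(tables, nrow)))
}

# Demo data (assumption): patient i gets one CSV with i rows
root <- file.path(tempdir(), "study_demo_b")
for (i in 1:3) {
  pt <- paste0("pt", i)
  dir.create(file.path(root, pt), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(x = seq_len(i)), file.path(root, pt, "T1.csv"),
            row.names = FALSE)
}

# map() applies the function to every folder and returns a list of results
summaries <- map(list.dirs(root, recursive = FALSE), summarise_patient)
summary_df <- do.call(rbind, summaries)
print(summary_df)
```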
Answer:
The following is a quick approach when the number of files is limited.
library(readr) # read_csv()
library(dplyr) # bind_rows()
all_files_list <- list.files('path to folder', pattern = "\\.csv$", full.names = TRUE, recursive = TRUE) # character vector of all CSV files (full paths) in every subfolder
main_df <- data.frame() # your main data frame
for (file in all_files_list) { # loop over the file listing
  tmp_df <- read_csv(file) # read each file
  # process tmp_df
  main_df <- bind_rows(main_df, tmp_df) # append to the main data frame (assumes the same set of columns)
}
Note that all_files_list is a character vector, not a list. If it contains a large number of files, growing main_df inside the loop gets slow, and you can use the purrr package to streamline this.
Update 1:
Another option is to use the purrr package to read all files into a list of data frames:
list_of_dataframes <- purrr::map(all_files_list, readr::read_csv, id = "filename") # the "filename" column holds each file's path
You can then loop over list_of_dataframes and process each data frame as needed.
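Since the question needs one summary per patient rather than one big table, the flat file listing can first be grouped by parent folder. A base-R sketch, assuming the folder name is the patient ID; it swaps in `read.csv` and adds a filename column by hand (similar in spirit to readr's `id` argument), and the demo folders and `value` column are hypothetical.

```r
# Demo data (assumption): two patient folders, two CSV files each
root <- file.path(tempdir(), "study_demo_c")
for (pt in c("pt1", "pt2")) {
  dir.create(file.path(root, pt), recursive = TRUE, showWarnings = FALSE)
  for (t in c("T1", "T2")) {
    write.csv(data.frame(value = c(1, 2)), file.path(root, pt, paste0(t, ".csv")),
              row.names = FALSE)
  }
}

# Flat listing of every CSV under root
all_files_list <- list.files(root, pattern = "\\.csv$", full.names = TRUE,
                             recursive = TRUE)

# Group the paths by their parent folder, i.e. one group per patient
files_by_patient <- split(all_files_list, basename(dirname(all_files_list)))

# Read each patient's files, tag rows with the source file, and summarise
patient_summaries <- lapply(names(files_by_patient), function(pt) {
  dfs <- lapply(files_by_patient[[pt]], function(f) {
    df <- read.csv(f)
    df$filename <- basename(f)  # record which file each row came from
    df
  })
  combined <- do.call(rbind, dfs)
  data.frame(patient = pt, n_files = length(dfs),
             mean_value = mean(combined$value))
})
summary_df <- do.call(rbind, patient_summaries)
print(summary_df)
```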
Answer:
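You can list the patient directories with list.dirs, then loop over them:
dirs <- list.dirs(path = "[...]", recursive = FALSE) # immediate subfolders only, as full paths
for (i in seq_along(dirs)) { # seq_along() is safe even if dirs is empty
  files <- list.files(path = dirs[i], full.names = TRUE) # files inside this patient's folder
  do.the.magic.on(files) # your per-patient processing
}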
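To see what `list.dirs()` returns, here is a small sketch on a throwaway folder tree (the folder names are hypothetical). With `recursive = FALSE` you get only the immediate patient folders as full paths; the default `recursive = TRUE` also returns the root itself and any nested subfolders, which the loop would then treat as patients.

```r
# Demo tree (assumption): root/pt1, root/pt2 and a nested root/pt2/raw
root <- file.path(tempdir(), "study_demo_d")
dir.create(file.path(root, "pt1"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "pt2", "raw"), recursive = TRUE, showWarnings = FALSE)

top_level <- list.dirs(root, recursive = FALSE)  # only pt1 and pt2
everything <- list.dirs(root)                    # root, pt1, pt2 and pt2/raw

print(basename(top_level))
print(length(everything))

# seq_along() gives zero iterations for an empty vector, whereas
# 1:length(x) would give c(1, 0) and index past the end
empty_dirs <- character(0)
for (i in seq_along(empty_dirs)) stop("never reached")
print(1:length(empty_dirs))
```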
