I'm trying to open files in a for loop with read.table(). When I pass the path variable file to read.table(), the path changes: the directory part is omitted. I searched for similar issues and couldn't find a relevant case.
The code:
directories <- list.dirs('directory_path', recursive = T)
for (directory in 1:length(directories)){
list <- list("File_0", "File_1")
for(file in 1:length(list)){
directory = directories[directory]
file = paste(directory, list[file], sep = '/')
read.table(file, colClasses = c(rep("character", 2), rep("NULL", 1)),
header = T)
output_path <- paste(directory, file, sep = '/')
write.table(data, output_path, sep = '\t', quote = FALSE)
}
}
If I remove the read.table() command and instead type print(file), all the paths are printed correctly.
The content of files I wish to open:
name column_1 column_2
BME_RS00005 878 878
BME_RS00010 257 257
BME_RS00020 2511 2511
BME_RS00025 2611 2611
BME_RS00030 3886 3886
BME_RS17490 1494 1494
BME_RS00035 5922 5922
BME_RS00040 265 265
BME_RS00045 220 220
What should I change?
CodePudding user response:
I'm inferring from your code that your directory structure looks like this:
├── directory_1
│ ├── File_0
│ └── File_1
├── directory_2
│ ├── File_0
│ └── File_1
├── directory_3
│ ├── File_0
│ └── File_1
The best thing is to get all the files into one vector before iterating over them:
directories <- list.dirs(directory_path, recursive = TRUE)
files <- c("File_0", "File_1")
full_paths <- as.character(
sapply(files, function(x) paste0(directories, "/", x))
)
full_paths
# [1] "directory_1/File_0" "directory_2/File_0" "directory_3/File_0" "directory_1/File_1"
# [5] "directory_2/File_1" "directory_3/File_1"
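One caveat: with recursive = TRUE, list.dirs() also returns the top-level directory itself, so the combined vector can contain paths that do not exist. A minimal sketch, assuming the layout above and using a temporary directory as a stand-in for directory_path (the demo_root name is made up), that builds the paths with file.path() and keeps only the files that exist:

```r
# Stand-in for the real top-level directory (hypothetical layout for the demo)
directory_path <- file.path(tempdir(), "demo_root")
for (d in c("directory_1", "directory_2")) {
  dir.create(file.path(directory_path, d), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(directory_path, d, c("File_0", "File_1")))
}

directories <- list.dirs(directory_path, recursive = TRUE)
files <- c("File_0", "File_1")

# Every directory/file combination; outer() pairs each directory with each file
candidates <- as.vector(outer(directories, files, file.path))

# Drop combinations that do not exist (e.g. files directly under the root)
full_paths <- candidates[file.exists(candidates)]
full_paths
```

Filtering with file.exists() means the later read.table() calls never see a path that is only an artifact of combining every directory with every file name.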
Now that you have a vector of file paths, you can just read them in.
You could probably do the next bit with lapply, but I'm not sure what you're doing in your loop. Since you have updated the question to say you want to delete a column, just do this:
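for (infile in full_paths) {
  # "NULL" in colClasses skips the third column while reading
  df <- read.table(
    infile,
    colClasses = c(rep("character", 2), rep("NULL", 1)),
    header = TRUE
  )
  # ... do stuff here
  # No-op if column_2 was already skipped by colClasses above
  df[["column_2"]] <- NULL
  outfile <- paste0(infile, "_new")
  write.table(df, outfile, sep = '\t', quote = FALSE)
}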
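As for why the original loop fails, a likely cause is that directory = directories[directory] overwrites the loop index inside the inner loop. On the first inner pass directory is an integer, but on the second pass it is already a path string, and indexing an unnamed vector by a string returns NA. In addition, the result of read.table() is never assigned, so there is no data object to write. A small illustration with made-up directory names:

```r
directories <- c("root/directory_1", "root/directory_2")  # made-up paths

directory <- 1
directory <- directories[directory]   # first pass: "root/directory_1"
directory <- directories[directory]   # second pass: indexes by name, gives NA
is.na(directory)

# Keeping the index and the value in separate variables avoids the problem
paths <- character(0)
for (i in seq_along(directories)) {
  for (file_name in c("File_0", "File_1")) {
    paths <- c(paths, file.path(directories[i], file_name))
  }
}
paths
```

Using seq_along() instead of 1:length() also behaves sensibly when the vector is empty.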
CodePudding user response:
You may consider a different approach without any loop. This solution should get all the files you want in each directory in the "main" directory:
# first you get all the directories in the main dir
list_dir <- list.dirs("...\\directory", recursive = T)
# and files you need
list_files <- c('File_0.txt','File_1.txt')
# then you create ALL the combinations of files and directories, in a vector
files_dir <- expand.grid(list_dir, list_files)
files_dir <- paste(files_dir$Var1, files_dir$Var2,sep = '/')
# read each file if it exists; lapply returns NULL for the ones that do not
list_of_file <- lapply(files_dir, function(x) if (file.exists(x)) read.table(x, header = TRUE))
# remove the NULL (missing-file) elements
list_of_file <- list_of_file[!vapply(list_of_file, is.null, logical(1))]
# finally, do whatever you need, for example remove one specific column
# from each data.frame by name
lapply(list_of_file, function(x) { x["column_1"] <- NULL; x })
# or by column index
lapply(list_of_file, function(x) { x[,3] <- NULL; x })
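Note that lapply() returns a new list rather than modifying list_of_file in place, so the result has to be assigned to be kept. A toy sketch with made-up data frames standing in for the files read above:

```r
# Made-up data frames standing in for the files read above
list_of_file <- list(
  data.frame(name = c("a", "b"), column_1 = 1:2, column_2 = 3:4),
  NULL,  # placeholder for a file that did not exist
  data.frame(name = "c", column_1 = 5L, column_2 = 6L)
)

# Drop the NULL placeholders, then remove one column from every data frame
list_of_file <- Filter(Negate(is.null), list_of_file)
cleaned <- lapply(list_of_file, function(x) { x[["column_1"]] <- NULL; x })

names(cleaned[[1]])
```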
And if you need to save them:
# first you have to give the elements names, in this case numbers
names(list_of_file) <- seq_along(list_of_file)
# then you can save them all with mapply
# to avoid printing anything to the console, wrap it in invisible()
invisible(
mapply( write.table
,x = list_of_file
,file = paste0("...\\directory\\",names(list_of_file), ".txt")
)
)
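If numeric names are not informative enough, one option is to name each element after the path it was read from and write the output next to its source. A sketch under the assumption that list_of_file is named by source path; the paths and data below are made up, and the _new suffix is only illustrative:

```r
# Made-up input: a list of data frames named by (hypothetical) source paths
out_dir <- file.path(tempdir(), "demo_out")
dir.create(out_dir, showWarnings = FALSE)
list_of_file <- list(
  data.frame(name = "a", column_1 = 1L),
  data.frame(name = "b", column_1 = 2L)
)
names(list_of_file) <- file.path(out_dir, c("File_0", "File_1"))

# Write each data frame to "<source path>_new"
out_paths <- paste0(names(list_of_file), "_new")
invisible(Map(
  function(df, path) write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE),
  list_of_file, out_paths
))

file.exists(out_paths)
```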
