How to list.files() until n level deep subdirectory in R

Time:02-03

I have a very complex folder hierarchy; here is a simplified version of it.

       |--Folder0 --- Folder0.1
       |
home---|--Folder1 --- Folder1.1 --- Folder1.2
       |
       |--Folder2 --- Folder2.1

I want to list all the .xlsx files in the second-level folders (Folder0.1, Folder1.1, Folder2.1).

Any help is really appreciated.

Please do not rely on the folder names. I named them this way only to simplify the example; the real names are arbitrary.

CodePudding user response:

list.files(recursive=) has no option to limit the recursion depth. I suggest three approaches:

  1. Get all, split on the file-separator (typically /) and limit the listing.

    allfiles <- list.files(".", full.names = TRUE, recursive = TRUE)
    under5 <- allfiles[ lengths(strsplit(allfiles, "/")) < 6 ]
    

    Since each path likely starts with ".", which counts as one component, you may need a number one higher than you expect. Experiment with the 6 here to get the depth you need.

  2. If it's only a few deep, then construct the list of directories manually and disable recursive completely.

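    dir1 <- list.dirs(".", recursive=FALSE)
    dir2 <- list.dirs(dir1, recursive=FALSE)
    dir3 <- list.dirs(dir2, recursive=FALSE)
    dir4 <- list.dirs(dir3, recursive=FALSE)
    dirs <- c(dir1, dir2, dir3)
    # full.names = TRUE so the listing is comparable with the directory paths;
    # dir4 removes the subdirectories that list.files() returns for dir3
    allfiles <- setdiff(list.files(dirs, full.names = TRUE), c(dirs, dir4))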
    

    (Admittedly under-tested, not awesome, but it's a start of a method.)

  3. Use system("find . -maxdepth 5 -type f", intern=TRUE) or similar to produce file names.
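
As a rough illustration of option 1 applied to the question (only .xlsx files in second-level folders), here is a minimal sketch. The helper names path_depth and xlsx_at_depth are hypothetical, and the code assumes list.files() is called with relative names so that "/" separates the components.

```r
# Hypothetical helper: number of directories between the root and each file.
# For relative paths such as "a/b/file.xlsx" this returns 2.
path_depth <- function(paths) {
    lengths(strsplit(paths, "/", fixed = TRUE)) - 1L
}

# Hypothetical helper: .xlsx files sitting exactly `depth` folders below `root`.
xlsx_at_depth <- function(root = ".", depth = 2L) {
    # Relative names (full.names = FALSE) avoid the off-by-one from a leading "."
    files <- list.files(root, pattern = "[.]xlsx$", recursive = TRUE)
    file.path(root, files[path_depth(files) == depth])
}

# Usage: files inside second-level folders of the working directory
# xlsx_at_depth(".", depth = 2L)
```

This still walks the whole tree before filtering, so it only limits what is returned, not how much is scanned.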

CodePudding user response:

Calling list.files(recursive = TRUE) then filtering the result is a valid approach, but it is error-prone and doesn't scale well with the size and depth of the file system tree that you are recursing over.

I would consider calling one of the command line utilities provided by your OS, which you can do via R's system and system2. For example, on Unix, you could simply do:

system("find . -maxdepth n -type f -regex '.*[.]xlsx'", intern = TRUE)

replacing n with your desired recursion depth. An advantage of this approach is that find is highly optimized and has many (many) more options than list.files.
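
The same call can be sketched with system2, which takes the arguments as a character vector. This is only an illustration: find_xlsx is a hypothetical wrapper, and it assumes a Unix-like system whose find supports -mindepth and -maxdepth (GNU and BSD find do). Note that find counts the starting directory as depth 0, so a file inside a second-level folder is at depth 3.

```r
# Hypothetical wrapper around the Unix `find` utility.
# Assumes `find` is on the PATH and accepts -mindepth / -maxdepth.
find_xlsx <- function(path = ".", maxdepth = 3L, mindepth = 0L) {
    args <- c(
        shQuote(path),                            # quote in case of spaces
        "-mindepth", as.character(as.integer(mindepth)),
        "-maxdepth", as.character(as.integer(maxdepth)),
        "-type", "f",
        "-name", shQuote("*.xlsx")                # quoted so the shell does not expand it
    )
    # stdout = TRUE captures the output as a character vector, one path per element
    system2("find", args, stdout = TRUE)
}

# Usage: only files located directly inside second-level folders
# find_xlsx(".", maxdepth = 3L, mindepth = 3L)
```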

Another scalable option is to write your own recursive function that stops where you want. Below is one possibility, which you should probably test more than I have.

lf <- function(path = ".", maxdepth = 0L, pattern = NULL, all.files = FALSE, include.dirs = FALSE) {
    fn <- list.files(path, pattern = pattern, all.files = all.files, full.names = TRUE, no.. = TRUE)
    dn <- list.dirs(path, full.names = TRUE, recursive = FALSE)
    fn <- fn[match(fn, dn, 0L) == 0L]
    if (!all.files) {
        dn <- dn[grepl("^[^.]", basename(dn))]
    }
    if (length(dn) == 0L) {
        return(fn)
    }
    if (maxdepth < 1L) {
        if (include.dirs) {
            return(c(fn, dn))
        } else {
            return(fn)
        }
    }
    l <- lapply(dn, lf, maxdepth = maxdepth - 1L, pattern = pattern, all.files = all.files, include.dirs = include.dirs)
    if (include.dirs) {
        l <- Map(c, dn, l, USE.NAMES = FALSE) 
    }
    c(fn, unlist(l, FALSE, FALSE))
}

Then:

lf(".", maxdepth = n, pattern = "[.]xlsx$")

again replacing n with your desired recursion depth.
