Sort subgroups of lines in text file

Time:01-15

I have a server property file called filelist.txt. This filelist contains a list of files which are arranged in groups based upon their directory prefix. The ordering of the contents is important, in that the file must start and end with specific file names, and the first subgroup must appear first. For example, the file looks like:

config.txt
../../linux/a.txt
../../linux/c.txt
../../linux/d.txt
../../linux/b.txt
../../certificates/../../d.txt
../../certificates/../../a.txt
../../certificates/../../c.txt
../../certificates/../../b.txt
../../bin/b.txt
../../bin/a.txt
properties.server

What is the best way to sort within these subgroups cleanly and efficiently, while preserving the overall order of the subgroups?

I wrote this code which can filter by a subgroup and sort it:

try (Stream<String> lines = Files.lines(Paths.get("src/filelist.txt"));
     BufferedWriter bw = new BufferedWriter(new FileWriter("src/output.txt"))) {

    // Keep only the lines of the "linux" subgroup and sort them
    List<String> linez = lines.filter(l -> l.contains("linux")).sorted().collect(Collectors.toList());

    for (String line : linez) {
        bw.write(line + "\n");
    }
} catch (IOException e) {
    e.printStackTrace();
}

I could have a List<String> which will contain all of my lines, and I could filter the original file lines, sort them, and then add them to this list.

There's a couple things that I don't like:

  1. I'm not overwriting the original file, I'm writing to a new one. I'd like to see if there is a way to overwrite instead of writing to a new file.
  2. It seems a bit clumsy. What if a new directory prefix gets added? Then I'll have to edit this code again to filter and sort that new prefix group. It would also feel weird to make a separate list for each subgroup, one per filter, instead of doing something like lines.filter(...).sorted().filter(...).sorted().filter(...). But I don't think that kind of chain makes sense, because once a filter is applied, I can't unapply it after the sort. It would be really nice to be able to filter, sort in place, then apply another filter and sort, and so on.

What are my options here?

CodePudding user response:

Streams aren't the best approach (at least not for the entire job; maybe for parts of it). The following looks at each line in turn. If its path up to the last component matches that of the previously read line, the line is added to the current block. If not, the current block is sorted and appended to the results, and a new block is started. The order of the prefix groups stays unchanged, but each group ends up sorted.

Overwriting the input file is just a matter of using the same file for reading and writing; just make sure to read everything before opening it for writing. The code below reads the whole file into a list before processing it.

import java.util.ArrayDeque;
import java.nio.file.Path;
import java.nio.file.Files;
import java.io.IOException;
import java.io.PrintWriter;

public class Demo {
    public static void main(String[] args) {
        try {
            Path datafile = Path.of(args[0]);
            var lines = Files.readAllLines(datafile);
            var sorted = new ArrayDeque<String>(lines.size());

            // Add first line
            sorted.addLast(lines.get(0));

            // Iterate through all the remaining but the last line
            var block = new ArrayDeque<Path>();
            for (int i = 1; i < lines.size() - 1; i++) {
                Path p = Path.of(lines.get(i));
                if (!block.isEmpty()
                    && !p.getParent().equals(block.getLast().getParent())) {
                    // Current path has a different prefix than the current
                    // block. Sort block and add it to output
                    block.stream()
                        .sorted()
                        .forEachOrdered(sp ->
                                        sorted.addLast(sp.toString()));
                    // And reset for a new path prefix
                    block.clear();
                }
                block.addLast(p);
            }
            // Handle the last block of paths
            block.stream()
                .sorted()
                .forEachOrdered(sp -> sorted.addLast(sp.toString()));

            // Add last line
            sorted.addLast(lines.get(lines.size() - 1));

            // Overwrite the original input file
            Files.write(datafile, sorted);
        } catch (IOException e) {
            System.err.println(e);
            System.exit(1);
        }
    }
}
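
One thing to watch for with this approach: Path.getParent() returns null for a path with no directory part, so a bare file name in the middle of the list would cause a NullPointerException in the comparison. The following is only a sketch of the same block-by-block idea written as a pure function on a list of lines, using Objects.equals for a null-safe comparison. The class name NullSafeGroupSort and the helpers sortGroups and flush are hypothetical, and unlike the code above it sorts the lines as plain strings rather than as Path objects.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical class name, used for illustration only
class NullSafeGroupSort {

    // Returns a new list in which each run of consecutive lines sharing the
    // same parent directory is sorted; the first and last line stay in place.
    static List<String> sortGroups(List<String> lines) {
        if (lines.size() < 3) {
            return new ArrayList<>(lines); // nothing between first and last line
        }
        List<String> out = new ArrayList<>(lines.size());
        out.add(lines.get(0));

        List<String> block = new ArrayList<>();
        Path blockParent = null;
        for (String line : lines.subList(1, lines.size() - 1)) {
            Path parent = Path.of(line).getParent(); // null for a bare file name
            if (!block.isEmpty() && !Objects.equals(parent, blockParent)) {
                flush(block, out); // prefix changed: emit the finished block
            }
            blockParent = parent;
            block.add(line);
        }
        flush(block, out); // emit the last block

        out.add(lines.get(lines.size() - 1));
        return out;
    }

    // Sorts the block in natural String order, appends it to out, and empties it
    private static void flush(List<String> block, List<String> out) {
        block.sort(null);
        out.addAll(block);
        block.clear();
    }
}
```

Keeping the grouping logic separate from the file I/O means it can be checked on an in-memory list before it is wired up to Files.readAllLines and Files.write.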

CodePudding user response:

This is based on Shawn’s answer but simplifies the operation by directly sorting the affected groups within the original list.

try {
    Path datafile = Path.of(args[0]);
    var lines = Files.readAllLines(datafile);

    // ensure that the list is mutable
    if(lines.getClass() != ArrayList.class) lines = new ArrayList<>(lines);

    int first = 1; // skip first line
    int last = lines.size() - 1; // and last line

    if(first >= last) return;

    Path previous = Path.of(lines.get(first));
    for (int i = first + 1; i < last; i++) {
        Path p = Path.of(lines.get(i));
        if(!p.getParent().equals(previous.getParent())) {
            lines.subList(first, i).sort(null);
            first = i;
            previous = p;
        }
    }
    // Handle the last block of paths
    if(first < last) lines.subList(first, last).sort(null);

    // Overwrite the original input file
    Files.write(datafile, lines);
} catch (IOException e) {
    System.err.println(e);
    System.exit(1);
}

Note that the type of the list returned by Files.readAllLines is unspecified, but it is always an ArrayList in practice. The solution protects itself against the hypothetical case that the returned list is not mutable, but does not actually perform the copying operation in real life. So it is both formally correct and efficient.
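
The in-place sorting works because List.subList returns a view of the backing list, so sorting the view reorders the corresponding range of the original list. A minimal sketch of that behavior, with a defensive copy for immutable input (the class SubListSortDemo and the helper sortRange are hypothetical names, not part of the answer above):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used for illustration only
class SubListSortDemo {

    // Sorts the elements in [from, to) and leaves everything else in place.
    // The input is copied first, so an immutable list such as List.of(...) is fine.
    static List<String> sortRange(List<String> source, int from, int to) {
        List<String> copy = new ArrayList<>(source);
        // subList is a view: sorting it writes through to 'copy'.
        // Passing null as the comparator selects natural ordering.
        copy.subList(from, to).sort(null);
        return copy;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("first", "c", "a", "b", "last");
        System.out.println(sortRange(lines, 1, 4));
    }
}
```

Calling sort directly on a sublist of an immutable list would throw an UnsupportedOperationException, which is the case the mutability check in the answer guards against.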
