Perl, to find the common lines between text files in a Dir

Time:01-19

I have found code that reads all the text files in a directory, but I don't know how to find the lines they have in common. Please help me with the code, or point me to the areas I need to explore. There is a lot to learn, and I am short on time.

use strict;
use warnings;
use English;

my $dir = 'C:\Perl_Example\Data';

foreach my $fp (glob("$dir/*.txt"))
{
  # print the file name as a header
  printf "%s\n", $fp;

  # open each file in the directory for reading
  open my $fh, "<", $fp or die "can't open '$fp' for reading: $OS_ERROR";

  # print the file content, indented
  while (<$fh>)
  {
    printf "  %s", $_;
  }

  close $fh or die "can't close '$fp': $OS_ERROR";
}

CodePudding user response:

Here is one way to "find the common lines". Of course, there is more than one way to do that in Perl :)

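#!/usr/bin/perl
use strict;
use warnings;

# Map each line to the set of files that contain it.
my %h;

for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "$file: $!\n";
    while (<$fh>) {
        chomp;
        # One hash key per file, so a line repeated inside a single
        # file is not mistaken for a line shared between files.
        $h{$_}{$file} = 1;
    }
    close $fh;
}

for (sort keys %h) {
    my @files = sort keys %{ $h{$_} };
    if (@files > 1) {
        print "line <$_>\n";
        print "  occurs in ", join(", ", @files), "\n";
    }
}

exit 0;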

Here are the test files, named 1, 2 and 3:

% cat 1
PING YA.RU (87.250.250.242): 56 data bytes
64 bytes from 87.250.250.242: icmp_seq=0 ttl=249 time=14.615 ms
64 bytes from 87.250.250.242: icmp_seq=1 ttl=249 time=14.943 ms
64 bytes from 87.250.250.242: icmp_seq=2 ttl=249 time=14.381 ms
64 bytes from 87.250.250.242: icmp_seq=3 ttl=249 time=14.852 ms
64 bytes from 87.250.250.242: icmp_seq=4 ttl=249 time=14.791 ms
% cat 2
PING YA.RU (87.250.250.242): 56 data bytes
64 bytes from 87.250.250.242: icmp_seq=0 ttl=249 time=14.615 ms
64 bytes from 87.250.250.242: icmp_seq=3 ttl=249 time=14.852 ms
64 bytes from 87.250.250.242: icmp_seq=4 ttl=249 time=14.791 ms
% cat 3
64 bytes from 87.250.250.242: icmp_seq=3 ttl=249 time=14.852 ms
64 bytes from 87.250.250.242: icmp_seq=4 ttl=249 time=14.791 ms

And test run of the script:

% ./try.pl 1 2 3
line <64 bytes from 87.250.250.242: icmp_seq=0 ttl=249 time=14.615 ms>
  occurs in 1, 2
line <64 bytes from 87.250.250.242: icmp_seq=3 ttl=249 time=14.852 ms>
  occurs in 1, 2, 3
line <64 bytes from 87.250.250.242: icmp_seq=4 ttl=249 time=14.791 ms>
  occurs in 1, 2, 3
line <PING YA.RU (87.250.250.242): 56 data bytes>
  occurs in 1, 2
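
The run above reports every line found in at least two files. If "common" is meant as "present in every file", a variation along the following lines might do. This is only a sketch: the subroutine name common_to_all is made up for illustration, and the directory is assumed to be passed as the first command-line argument (defaulting to the current directory), with the same *.txt pattern as in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return, sorted, the lines that occur in
# every one of the given files.
sub common_to_all {
    my @files = @_;
    my %seen_in;    # line => number of distinct files containing it

    for my $file (@files) {
        open(my $fh, '<', $file) or die "$file: $!\n";
        my %in_this_file;
        while (my $line = <$fh>) {
            chomp $line;
            # Count a line once per file, even if it repeats there.
            $seen_in{$line}++ unless $in_this_file{$line}++;
        }
        close $fh;
    }

    # Keep only the lines whose file count equals the number of files.
    my @common = grep { $seen_in{$_} == @files } sort keys %seen_in;
    return @common;
}

# Assumption: directory comes from the first argument, else '.'.
my $dir = @ARGV ? $ARGV[0] : '.';
print "$_\n" for common_to_all(glob("$dir/*.txt"));
```

Called on the three test files above, common_to_all('1', '2', '3') would return only the icmp_seq=3 and icmp_seq=4 lines, since file 3 contains nothing else.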