Merging non-zeros, non-overlapping elements from two lists in Perl-CodePudding

I'm looking for a clean method in Perl to merge a collection of lists. The all have the same length, and each consists mainly of zeros, but also has short contiguous segments of non-zero entries. For example, here are two representative lists of length 25:

@flags1 = qw( 0  0  0  0 21 22 23  0  0  0  0  0  0  0  0 41 42 43  0  0  0  0  0  0  0);
@flags2 = qw(11 12 13  0  0  0  0  0  0  0  0  0  0 31 32 33  0  0  0  0  0 51 52  53 0);

The objective is to merge the elements of @flags2 into @flags1 for all the places where a contiguous clump of non-zero elements in @flags2 replaces only zero entries in @flags1. If there is an overlap with any of the non-zero elements of @flags1, the associated contiguous clump of non-zero values in @flags2 is discarded instead of being merged.

Thus, for the example above, the contiguous clump of values 31, 32, and 33 in @flags2[13..15] are discarded because one of the entries, $flags2[15] is non-zero and collides with the non-zeros value at $flags1[15]. The resulting desired merged list would be:

@merged = qw(11 12 13  0 21 22 23  0  0  0  0  0  0  0  0 41 42 43  0  0  0 51 52  53  0);

I have experimented with collecting up contiguous elements of non-zero elements into a list of lists, and then comparing them using for and if statements, but it's a mess, and I think it will be difficult for any other developer to understand the logic. If anyone could propose an more elegant solution that would be much appreciated.

CodePudding user response：

use List::Util qw( none );

my $s = 0;
while (1) {
   # Find start of next clump.
     $s while $s < @flags2 && !$flags2[$s];

   # Exit if at end of array.
   last if $s == @flags2;

   # Find end of clump.
   my $e = $s 1;
     $e while $e < @flags2 && $flags2[$e];

   # Merge in clump.
   my @clump = $s .. $e-1;
   if ( none { $_ } @flags1[ @clump ] ) {      # Or `!grep { $_ }`
      @flags1[ @clump ] = @flags2[ @clump ];
   }

   $s = $e;

   # Exit if at end of array.
   last if $s == @flags2;
}

CodePudding user response：

Your approach is workable, it just needs some organization. Let's take it a step at a time:

sub to_ranges {
  my $in = shift;
  my (@ret, $in_range);
  for my $i (0 .. $#$in) {
    if ($in->[$i]) {
      if ($in_range) { # Extend an existing range
        $ret[-1]{end} = $i;
        push @{$ret[-1]{values}}, $in->[$i];
      } else { # Start a new one
        push @ret, { start => $i, end => $i, values => [ $in->[$i] ] };
        $in_range = 1;
      }
    } else {
      $in_range = 0;
    }
  }
  # Dummy entry to make sure the output will be padded to the right length
  push @ret, { start => scalar @$in, end => scalar @$in, values => [] };
  return \@ret;
}

This turns a list into a list of "chunks", each of which knows its start, end, and the values it contains. (end is not strictly necessary but it makes things tidier).

sub from_ranges {
  my $in = shift;
  my @ret;
  for my $r (@$in) {
    push @ret, 0 while $#ret < $r->{end};
    splice @ret, $r->{start}, $r->{end} - $r->{start}   1, @{ $r->{values} };
  }
  return \@ret;
}

This does the reverse transformation: from_ranges(to_ranges(\@x)) should contain the same elements as @x.

sub overlaps_any {
  my ($r, $ll) = @_;
  for my $l (@$ll) {
    return 1 if $r->{start} >= $l->{start} && $r->{start} <= $l->{end};
    return 1 if $r->{end} >= $l->{start} && $r->{end} <= $l->{end};
  }
  return 0;
}

This is a helper that returns true if the range $r overlaps any of the ranges in @$ll.

sub merge_ranges {
  my ($ll, $rr) = @_;
  my @rr_new = grep { !overlaps_any($_, $ll) } @$rr;
  return [
    sort {
      $a->{start} <=> $b->{start}
    } @$ll, @rr_new
  ];
}

And this takes two sets of ranges, @$ll and @$rr and returns all of the ranges in @$ll plus the ranges in @$rr that don't overlap. The sort is actually only for ease of debugging; you can just return [ @$ll, @rr_new ] if you prefer.

sub merge {
  my ($ll, $rr) = @_;
  return from_ranges(
    merge_ranges(
      to_ranges($ll),
      to_ranges($rr),
    )
  );
}

put the pieces together, and it works.

ikegami has provided an overall simpler soluion, but I'll still offer up this one because maybe you have other things you need to do that would benefit from this representation.