Using PowerShell, I have 14 arrays of strings, some of which are empty. How would I get the intersection (all elements that exist in all of the arrays) of the non-empty arrays? I am trying to avoid comparing two arrays at a time.
Any ideas on how I would approach this? Thank you.
$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
My attempt to solve this (although it does not check for empty arrays):
$overlap = Compare-Object $a $b -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $c -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $d -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $e -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $f -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $g -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $h -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $i -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $j -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $k -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $l -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $m -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $n -PassThru -IncludeEqual -ExcludeDifferent
My desired result is that test and test2 appear in $overlap. This solution does not work because it does not check if the array it is comparing is empty.
CodePudding user response:
Note: The following assumes that no individual array contains the same string more than once (more work would be needed to address that).
$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
$allArrays = $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n
# Initialize a hashtable in which we'll keep
# track of unique strings and how often they occur.
$ht = @{}
# Loop over all arrays.
foreach ($arr in $allArrays) {
  # Loop over each array's elements.
  foreach ($el in $arr) {
    # Add each string and increment its occurrence count.
    $ht[$el] += 1
  }
}
# Output all strings that occurred in every non-empty array
$ht.GetEnumerator() |
Where-Object Value -eq ($allArrays | Where-Object Count -gt 0).Count |
ForEach-Object Key
The above outputs those strings that are present in all of the non-empty input arrays:
test2
test
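If an individual array might contain the same string more than once, the counting approach above would over-count that string. One possible way to handle this, sketched here with made-up sample arrays ($x, $y, $z) and a per-array $seen hashtable (both are assumptions for illustration, not part of the original answer), is to count each string at most once per array:

```powershell
# Illustrative sample data: 'test' is repeated inside $x, and $z is empty.
$x = @('test', 'test', 'other', 'test2')
$y = @('test', 'test2')
$z = @()
$allArrays = $x, $y, $z

$ht = @{}            # string -> number of arrays that contain it
$nonEmptyCount = 0   # number of non-empty arrays seen

foreach ($arr in $allArrays) {
    # Skip empty arrays so they do not affect the required count.
    if ($arr.Count -eq 0) { continue }
    $nonEmptyCount++

    # Track which strings were already counted for this one array.
    $seen = @{}
    foreach ($el in $arr) {
        if (-not $seen.ContainsKey($el)) {
            $seen[$el] = $true
            $ht[$el] += 1
        }
    }
}

# Keep the strings that were counted once for every non-empty array.
$common = $ht.GetEnumerator() |
    Where-Object Value -eq $nonEmptyCount |
    ForEach-Object Key

$common
```

Like the hashtable in the answer, $seen compares keys case-insensitively, so 'Test' and 'test' are treated as the same string. The order of the output is not guaranteed.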
CodePudding user response:
Here is a solution using a Hashset. A Hashset is a collection that stores only unique items. It has a method IntersectWith which accepts any enumerable type (such as an array) as argument. The method modifies the original Hashset so that it contains only the elements which are contained in both the Hashset and the argument passed to the method.
# Test input
$a = @() # I changed this to empty array for demonstration purposes
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
# Create an empty hashset
$overlap = [Collections.Generic.Hashset[object]]::new()
# For each of the arrays...
($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n).
    Where{ $_.Count -gt 0 }. #... except the empty ones
    ForEach{
        # If the result Hashset is still empty
        if( $overlap.Count -eq 0 ) {
            # Create the initial hashset from the first non-empty array.
            $overlap = [Collections.Generic.Hashset[object]] $_
        }
        else {
            # Hashset is already initialized, calculate the intersection with next non-empty array.
            $overlap.IntersectWith( $_ )
        }
    }
$overlap # Output
Output:
test
test2
Remarks:
To filter out empty arrays (or in general any kind of collection), we check its Count member, which gives the number of elements.
.ForEach and .Where are PowerShell intrinsic methods. These can be faster than the ForEach-Object and Where-Object commands, especially when working directly with collections (as opposed to output of another command). The automatic variable $_ represents the current object, as usual.
This code using pipeline commands is functionally the same:
$overlap = [Collections.Generic.Hashset[object]]::new()
$a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n |
    Where-Object Count -gt 0 |
    ForEach-Object {
        if( $overlap.Count -eq 0 ) {
            $overlap = [Collections.Generic.Hashset[object]] $_
        }
        else {
            $overlap.IntersectWith( $_ )
        }
    }
With the first variant, inserting a linebreak before Where and ForEach is not really necessary, but improves code readability (note that you can't insert a linebreak before .Where and .ForEach, because this confuses the PowerShell parser).
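One edge case may be worth keeping in mind: the code treats $overlap.Count -eq 0 as "not yet initialized", so if the running intersection ever becomes empty, the next non-empty array re-seeds the set. A possible way around this, sketched below with made-up sample arrays ($p, $q, $r) and a $null sentinel (both are assumptions for illustration, not part of the original answer), is to distinguish "no array seen yet" from "intersection is empty":

```powershell
# Illustrative data: no string occurs in all three arrays,
# so the intersection should be empty.
$p = @('a', 'b')
$q = @('c')
$r = @('a', 'b')

# $null means "no non-empty array has been seen yet".
$overlap = $null

foreach ($arr in $p, $q, $r) {
    # Skip empty arrays entirely.
    if ($arr.Count -eq 0) { continue }

    if ($null -eq $overlap) {
        # Seed the set from the first non-empty array.
        $overlap = [Collections.Generic.HashSet[object]] $arr
    }
    else {
        # An empty set stays empty here instead of being re-seeded.
        $overlap.IntersectWith($arr)
    }
}

$overlap.Count   # 0
```

With a Count-based check, the same input would end with the set re-seeded from $r. For the sample data in the question this makes no difference, because test and test2 are present in every non-empty array.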
CodePudding user response:
You're close. Excluding empty arrays from comparison is essential because the intersection of an empty array and any other array is an empty array, and once $overlap contains an empty array that will be the final result regardless of what subsequent arrays contain.
Here's your code with the non-empty check and rewritten using loops...
$a = @('hjiejnfnfsd', 'test', 'huiwe', 'test2')
$b = @('test', 'jnfijweofnew', 'test2')
$c = @('njwifqbfiwej', 'test', 'jnfijweofnew', 'test2')
$d = @('bhfeukefwgu', 'test', 'dasdwdv', 'test2', 'hfuweihfei')
$e = @('test', 'ddwadfedgnh', 'test2')
$f = @('test', 'test2')
$g = @('test', 'bjiewbnefw', 'test2')
$h = @('uie287278hfjf', 'test', 'huiwhiwe', 'test2')
$i = @()
$j = @()
$k = @('jireohngi', 'test', 'gu7y8732hbj', 'test2')
$l = @()
$m = @('test', 'test2')
$n = @('test', 'test2')
# Create an array of arrays $a through $n
$arrays = @(
# 'a'..'n' doesn't work in Windows PowerShell
# Define both ends of the range...
# 'a' → [String]
# 'a'[0] → [Char]
# [Int32] 'a'[0] → 97 (ASCII a)
# ...and cast each element back to a [Char]
[Char[]] ([Int32] 'a'[0]..[Int32] 'n'[0]) |
Get-Variable -ValueOnly
)
# Initialize $overlap to the first non-empty array
for ($initialOverlapIndex = 0; $initialOverlapIndex -lt $arrays.Length; $initialOverlapIndex++)
{
    if ($arrays[$initialOverlapIndex].Length -gt 0)
    {
        break;
    }
}
<#
Alternative:
$initialOverlapIndex = [Array]::FindIndex(
$arrays,
[Predicate[Array]] { param($array) $array.Length -gt 0 }
)
#>
$overlap = $arrays[$initialOverlapIndex]
for ($comparisonIndex = $initialOverlapIndex + 1; $comparisonIndex -lt $arrays.Length; $comparisonIndex++)
# Alternative: foreach ($array in $arrays | Select-Object -Skip $initialOverlapIndex)
{
    $array = $arrays[$comparisonIndex]
    if ($array.Length -gt 0)
    {
        $overlap = Compare-Object $overlap $array -PassThru -IncludeEqual -ExcludeDifferent
    }
}
$overlap
...which outputs...
test
test2
