I'm pretty sure the answer is no, but it keeps bugging me.
I have been tasked with finding duplicate files in a certain location, recursively. I can do that with no problem. But seeing as some of the files have 3 or 4 duplicates, I cannot answer the question of "How many files are originals?" without resorting to editing the results in Excel.
Code:
gci -path $path -recurse -file -erroraction silentlycontinue|
Select @{l='Original Filename';e={$_.PSChildName}},
       @{l='Compare Filename';e={$_.BaseName.replace('_','*').replace(' ','*').replace('-','*')}},
       @{l="Path";e={$_.PSParentPath.Substring(38,$_.PSParentPath.Length-38)}},
       @{l="Link";e={$_.FullName}}|
group -Property 'Compare Filename'|
Where {$_.count -ge 2}|
%{$_.group}|
Export-Csv -Path $path2 -NoTypeInformation
Path variables are irrelevant, so I will not be listing them.
EDIT: I have tested both of the provided resolutions, as well as read the marvelous explanation provided by mklement0. In the end, at least with the ~4k files I am working with, the speed of both resolutions is comparable. See below for the 'measure-command' output.
CodePudding user response:
To reliably count the number of groups (Microsoft.PowerShell.Commands.GroupInfo instances) that Group-Object outputs, use either of the following:
- Pipeline-based, as suggested by zett42; while comparatively slow, this results in streaming processing that doesn't require collecting all Group-Object output in memory first:
(1, 1, 1 | Group-Object | Measure-Object).Count # -> 1 (group)
- Concise, expression-based, as suggested by Lee Dailey; note that this involves collecting all output objects in memory first:
@(1, 1, 1 | Group-Object).Count # -> 1 (group)
# Alternative, using .Length
(1, 1, 1 | Group-Object).Length # -> 1 (group)
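Applied to the question's scenario, the same counting could look roughly like the sketch below. The file names are made up and stand in for the base names that Get-ChildItem would return, and the variable names are illustrative only; the -replace pattern is assumed to be equivalent to the chained .replace() calls in the question.

```powershell
# Hypothetical base names standing in for Get-ChildItem output.
$names = 'report_1', 'report 1', 'report-1', 'invoice', 'notes_a', 'notes a'

# Group by a normalized name: '_', ' ' and '-' are all mapped to '*'.
# @() guarantees an array even if only one group comes back.
$groups = @($names | Group-Object { $_ -replace '[_ -]', '*' })

$distinctCount   = $groups.Count                                   # all groups
$duplicatedCount = @($groups | Where-Object Count -ge 2).Count     # groups with duplicates
$uniqueCount     = @($groups | Where-Object Count -eq 1).Count     # groups of one

"Distinct: $distinctCount, duplicated: $duplicatedCount, unique: $uniqueCount"
```

With this sample input there are three distinct names, two of which have duplicates and one of which occurs only once.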
Note:
To count all original (non-duplicate) objects, i.e. those that are in a group of their own, simply append | Where-Object Count -eq 1 to Group-Object above.
The use of @(), the array-subexpression operator, is crucial in this case: It ensures that the Group-Object output is considered an array even if only a single group happens to be output.
- This ensures that it is the array's .Count property that is queried rather than a single GroupInfo instance's own .Count property - which reflects the count of members of the group, and would be 3 in the example above (try (1, 1, 1 | Group-Object).Count).
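A small sketch of the pitfall, contrasting input that yields one group with input that yields two. The variable names are illustrative only.

```powershell
# One group: the pipeline returns a single GroupInfo, not an array.
$single = 1, 1, 1 | Group-Object
# Two groups: the pipeline returns an array of GroupInfo objects.
$multi = 1, 1, 2 | Group-Object

$singleRaw     = $single.Count      # GroupInfo's own .Count: members in the group
$singleWrapped = @($single).Count   # array's .Count: number of groups
$multiRaw      = $multi.Count       # already an array, so this is the number of groups
$multiWrapped  = @($multi).Count    # @() leaves an existing array's count unchanged

"single: raw=$singleRaw wrapped=$singleWrapped; multi: raw=$multiRaw wrapped=$multiWrapped"
```

Only the single-group case gives a different answer without @(), which is why the problem tends to show up intermittently, depending on the data.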
Alternatively, using .Length instead of .Count bypasses this naming conflict: .Length and .Count are aliases of each other and are both provided as intrinsic properties even on scalars (single objects), as part of the unified handling of scalars and collections in PowerShell: That is, PowerShell presents even any single object with .Length / .Count properties that indicate the count of that object, which by definition is 1 - unless preempted by a type-native property of the same name.
The intrinsic .Length property therefore works as expected, given that GroupInfo has no .Length property.
The inverse scenario can be demonstrated with a string scalar: 'foo'.Length is 3 - the value of the type-native .Length property reflecting the character count - whereas 'foo'.Count is 1 - the intrinsic .Count property that "counts" the single object.
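The interplay between intrinsic and type-native properties can be checked with a few scalars. This is a sketch that assumes strict mode is not enabled; the variable names are illustrative only.

```powershell
$text   = 'foo'
$number = 42
$group  = 1, 1, 1 | Group-Object    # a single GroupInfo

$textLength   = $text.Length     # type-native: character count
$textCount    = $text.Count      # intrinsic: one object
$numberCount  = $number.Count    # intrinsic: Int32 has neither property natively
$numberLength = $number.Length   # intrinsic
$groupLength  = $group.Length    # intrinsic: GroupInfo has no native .Length
$groupCount   = $group.Count     # type-native: members in the group

"string: $textLength/$textCount, int: $numberLength/$numberCount, group: $groupLength/$groupCount"
```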
In the pipeline solution with Measure-Object the problem doesn't arise due to the pipeline's enumeration behavior: however many objects Group-Object outputs are sent one by one through the pipeline, and Measure-Object counts them - and in this case the value of the type-native .Count property of the always single Microsoft.PowerShell.Commands.GenericMeasureInfo instance that Measure-Object outputs is the value of interest.
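As a quick check of that behavior, the sketch below runs the pipeline variant against input that produces one group and input that produces two. The variable names are illustrative only.

```powershell
# Measure-Object always emits exactly one GenericMeasureInfo object,
# so its .Count is safe to read without wrapping anything in @().
$info = 1, 1, 1 | Group-Object | Measure-Object

$oneGroup  = $info.Count
$twoGroups = (1, 1, 2 | Group-Object | Measure-Object).Count
$isMeasureInfo = $info -is [Microsoft.PowerShell.Commands.GenericMeasureInfo]

"one group: $oneGroup, two groups: $twoGroups, GenericMeasureInfo: $isMeasureInfo"
```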


