I would like to be able to also retrieve the file owner , LastAccessTime, LastWriteTime, CreationTime. Get-Childitem has known performance issues when scaled to large directory structures.
We had some performance issue while looking for files in a folder which have more than 100000 subfolders.
Here is my script:
$Dir = get-childitem "W:\DATA" -recurse -force
$Dir | Select-Object name,fullname, LastAccessTime, LastWriteTime, CreationTime, @{N='Owner';E={$_.GetAccessControl().Owner}} | Export-Csv -path C:\Scripts\xlsx.csv -NoTypeInformation
thanks in advance,
CodePudding user response:
Give this a try, should be faster than Get-ChildItem. You could also use [SearchOption]::AllDirectories and no Collections.Queue but I'm not certain if that would consume less memory.
using namespace System.Collections
using namespace System.IO
class FileInfoProps {
[string]$Name
[string]$FullName
[datetime]$LastAccessTime
[datetime]$LastWriteTime
[datetime]$CreationTime
[string]$Owner
FileInfoprops([FileInfo]$FileInfo)
{
$this.Name = $FileInfo.Name
$this.FullName = $FileInfo.FullName
$this.LastAccessTime = $FileInfo.LastAccessTime
$this.LastWriteTime = $FileInfo.LastWriteTime
$this.CreationTime = $FileInfo.CreationTime
$this.Owner = $FileInfo.GetAccessControl().Owner
}
}
$initialDirectory = $pwd.Path
$queue = [Queue]::new()
$queue.Enqueue($initialDirectory)
$result = while ($queue.Count) {
$target = $queue.Dequeue()
foreach ($childs in [Directory]::EnumerateDirectories($target)) {
$queue.Enqueue($childs)
}
[FileInfoProps][FileInfo]$target # => Remove this line if you want only files!
[FileInfoProps[]][FileInfo[]][Directory]::GetFiles($target)
}
$result | Export-Csv ....
CodePudding user response:
Memory
PowerShell objects (PSCustomObject) are optimized for streaming (One-at-a-time processing) and therefore quiet heavy.
Using parenthesis ((...)) or assigning you stream to a variable (like: $Dir =) will choke the pipeline and pile up all the objects into memory.
To reduce memory usage, immediately pass your objects through the pipeline by chaining the concerned cmdlets with a pipe character:
Get-childitem "W:\DATA" -recurse -force |
Select-Object astAccessTime, LastWriteTime, CreationTime |
Export-Csv -path C:\Scripts\xlsx.csv -NoTypeInformation
Performance
Starting with a quote from PowerShell scripting performance considerations:
PowerShell scripts that leverage .NET directly and avoid the pipeline tend to be faster than idiomatic PowerShell. Idiomatic PowerShell typically uses cmdlets and PowerShell functions heavily, often leveraging the pipeline, and dropping down into .NET only when necessary.
In your case, the performance bottleneck is likely not in PowerShell but due to the server and the network. Meaning leveraging from .NET directly would probably not have any effect on the performance.
In fact, using the PowerShell pipeline might be even faster in this case as you do not have to wait until the last file info item is loaded into memory where the native PowerShell pipeline immediately starts processing at the first item while the next items are (slowly) provided by the server.
If you change the last cmdlet (Export-Csv) to ConvertTo-Csv you will probably see the difference where a (correctly setup) pipeline almost starts on fly and other solutions take a while before outputting any data to the console.
The numbers tell the tale
(In Dutch: "meten is weten", which literally means: measuring is knowing)
If you aren't sure what technique would give you the best performance, I recommend you to simply test it (on a subset), like:
Measure-Command {
Get-childitem "W:\DATA" -recurse -force |
Select-Object astAccessTime, LastWriteTime, CreationTime |
Export-Csv -path C:\Scripts\xlsx.csv -NoTypeInformation
} | Select-Object TotalMilliseconds
and compare the results.
