I have a script that was put together with the community help and it works great. I'm trying to alter it to work for my new requirements. I have data (See below) that needs to be captured from 10K files. Inside each file could be multiple iterations of the data I need to capture. The script I'm using captured only the file name and not the data I need. Can someone see where I'm off (Code wise) and offer another option.
Here is my Raw Data. There could be four or five places in a text member with this data and I need to capture it all. The words in Bold (Select and From) are the words I'm keying on. I need to capture those lines as well as all the lines between them.
SELECT FIL_ID_NUM, 00481002
LAU_CRE_DTS, 00482002
LAU_CD, 00483002
LAU_EFF_DT, 00484002
LAU_EFF_TIM, 00485002
LAU_EXP_DT, 00486002
LAU_EXP_TIM, 00487002
LAU_SERV_REAS1_CD, 00488002
LAU_SERV_REAS2_CD, 00489002
LAU_SERV_REAS3_CD, 00489102
LAU_CARE_IND, 00489202
SEG_UPDT_LOCK_NUM, 00489302
LAU_AGRMT_PER_NAM, 00489402
LAU_AGRMT_DSGT_NAM, 00489502
LAU_AGRMT_DIR_TXT, 00489602
LAU_TERMS_TXT 00489702
00520002
FROM TLEGAL
Here is my script....
# create a List object to collect the 'flowerbox' strings in
$Flowerbox = [System.Collections.Generic.List[string]]::new()
$treat = $false
# get a list of the .pl1m files and loop through. Collect all output in variable $result
$CBLFileList = Get-ChildItem -Path 'C:\src' -Filter '*.pl1m' -File -Recurse
$result = foreach ($CBLFile in $CBLFileList) {
Write-Host "Processing ... $($CBLFile.Name)" -ForegroundColor Green
$Flowerbox.Clear() # empty the list for the next run
$treat = $false
switch -Regex -File $CBLFile.FullName {
'SELECT' {
Write-Host "Found Select" -ForegroundColor DarkYellow
# start collecting lines from here
$treat = $true
}
'FROM ' {
Write-Host "Found From" -ForegroundColor DarkRed
if ($treat) {
# stop colecting Flowerbox lines and output what we already have
# output an object with the two properties you need
[PsCustomObject]@{
Program = $CBLFile.Name # or $CBLFile.FullName
Description = $Flowerbox -join [environment]::NewLine
}
}
$Flowerbox.Clear() # empty the list for the next run
$treat = $false
}
default {
# Add captures data here
if ($treat -and ($_ -match '/[^\s\\]/')) {
$Flowerbox.Add($Matches[1])
}
}
}
}
# now you have everything in an array of PSObjects so you can save that as Csv
$result | Export-Csv -Path 'C:\src\SQL_Desc.csv' -UseCulture -NoTypeInformation
CodePudding user response:
Your script needs only a few tweaks:
- Add the full line, reflected in
$_, to your$Flowerboxlist - Also do so in the
'SELECT'and'FROM 'branches.
# ...
$Flowerbox.Clear() # empty the list for the next run
$treat = $false
switch -Regex -CaseSensitive -File $CBLFile.FullName {
'^\s*SELECT\s' {
Write-Host "Found Select" -ForegroundColor DarkYellow
# start collecting lines from here
$treat = $true
$Flowerbox.Add($_) # Add this line to the list.
continue # So that the FROM branch needn't also be checked.
}
'^\s*FROM\s' {
Write-Host "Found From" -ForegroundColor DarkRed
if ($treat) {
$Flowerbox.Add($_) # Add this line to the list.
# stop colecting Flowerbox lines and output what we already have
# output an object with the two properties you need
[PsCustomObject]@{
Program = $CBLFile.Name # or $CBLFile.FullName
Description = $Flowerbox -join [environment]::NewLine
}
$Flowerbox.Clear() # empty the list for the next run
$treat = $false
}
}
default {
if ($treat) {
$Flowerbox.Add($_) # Add this line to the list.
}
}
}
# ...
Note that, in the interest of minimizing the risk of false positives:
-CaseSensitivewas added to theswitchstatement for case-sensitive matching.The regexes were made more robust to prevent accidental substring matching and to tolerate whitespace variations (spaces vs. tabs).
CodePudding user response:
You could try using this function to find all those files that contain those fragments starting on SELECT and ending on FROM, the function would output objects with the captured fragment (Value), Index, Path and Length. The regex is likely to be improved, but this worked for me with some test files.
NOTE: this answer requires that the files fit in memory, as the functions reads the files as a whole string.
using namespace System.IO
function Find-String {
param(
[parameter(ValueFromPipeline, Mandatory)]
[Alias('PSPath')]
[FileInfo]$Path,
[parameter(Mandatory, Position = 0)]
[regex]$Pattern,
[switch]$AllMatches
)
process {
$content = [File]::ReadAllText($Path)
$match = if($AllMatches.IsPresent) {
$Pattern.Matches($content)
}
else {
$Pattern.Match($content)
}
if($match.Success -notcontains $true) {
return
}
foreach($m in $match) {
[pscustomobject]@{
Path = $path.FullName
Value = $m.Value
Index = $m.Index
Length = $m.Length
}
}
}
}
Get-ChildItem -Path 'C:\src' -Filter '*.pl1m' -Recurse |
Find-String -Pattern '(?msi)SELECT.*?FROM.*?$' -AllMatches |
Export-Csv path\to\export.csv -NoTypeInformation
Output with one particular file looks like this:
Path : /home/user/Documents/test.txt
Value : SELECT FIL_ID_NUM, 00481002
...
...
...
FROM SOMETABLE
Index : 11
Length : 449
Path : /home/user/Documents/test.txt
Value : SELECT FIL_ID_NUM, 00481002
...
...
...
FROM SOMEOTHERTABLE
Index : 501
Length : 522
