Can someone optimize my .net RegEx for Powershell - parsing a table with errors

Time:01-14

Currently I am trying to parse a table from a Microsoft site (the GitHub version of it) to get proper PowerShell objects. I'll share the relevant part of the code so you can test it. It does parse what I want, but I want the results to be already trimmed (no leading or trailing spaces or line breaks). I also have to get the result for "CNG Key Isolation", which has a different formatting. Only for that block of data does my RegEx include line breaks, and I could not get it to work. I know I could do some parsing in PowerShell after the RegEx, but I want to get better with RegEx.

My not-yet-optimized RegEx looks like this:

(?:^##\s*(?<ServiceTitle>[^\r\n#]*)[\r\n\s]*\|\s+Name\s+\|\s+Description\s+\|(?:[\r\n\s\|\-\*]+Service name[\|\*\s]+(?<ServiceName>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Description[\|\*\s]+(?<Description>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Installation[\|\*\s]+(?<Installation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Startup type[\|\*\s]+(?<StartupType>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Recommendation[\|\*\s]+(?<Recommendation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Comments[\|\*\s]+(?<Comments>[^\|]*?)(?: ?\|))*)

You can test it here: https://regex101.com/r/xQDRCO/1

The data to parse comes from: https://raw.githubusercontent.com/MicrosoftDocs/windowsserverdocs/main/WindowsServerDocs/security/windows-services/security-guidelines-for-disabling-system-services-in-windows-server.md

It should basically take one block of data for each service and try to get

"ServiceTitle","ServiceName","Description","Installation","StartupType","Recommendation","Comments"

This should work no matter what order the fields are in or whether one of them is missing. "ServiceTitle" is special: it always has to be there.
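To make the trimming requirement concrete, here is a minimal sketch on a single made-up table row (the row and the simplified pattern are assumptions for illustration, not the full RegEx from above). The idea is to put \s* on both sides of a lazy capture group so the padding is consumed outside the group:

```powershell
# Made-up sample row in the same markdown table style as the source document
$row = '| **Service name** |   KeyIso   |'

# \s* before and after each lazy group swallows the padding,
# so the captured text comes back without surrounding whitespace
$pattern = '\|\s*\*\*(?<Name>[^|*]+?)\*\*\s*\|\s*(?<Value>[^|]*?)\s*\|'

$m     = [regex]::Match($row, $pattern)
$name  = $m.Groups['Name'].Value
$value = $m.Groups['Value'].Value

"[$name] = [$value]"
```

Since \s in .NET also matches \r and \n, the same construct should drop line breaks around a value as well. Whether that is enough for the "CNG Key Isolation" block has not been checked here.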

Here is the PowerShell code I currently tested:

$fields = "ServiceTitle","ServiceName","Description","Installation","StartupType","Recommendation","Comments"
$RequestData = Invoke-WebRequest -UseBasicParsing -Uri https://raw.githubusercontent.com/MicrosoftDocs/windowsserverdocs/main/WindowsServerDocs/security/windows-services/security-guidelines-for-disabling-system-services-in-windows-server.md
$RegExMatches = [Regex]::Matches($RequestData.content,'(?:^##\s*(?<ServiceTitle>[^\r\n#]*)[\r\n\s]*\|\s+Name\s+\|\s+Description\s+\|(?:[\r\n\s\|\-\*]+Service name[\|\*\s]+(?<ServiceName>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Description[\|\*\s]+(?<Description>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Installation[\|\*\s]+(?<Installation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Startup type[\|\*\s]+(?<StartupType>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Recommendation[\|\*\s]+(?<Recommendation>[^\|]*?)(?: ?\|)|[\r\n\s\|\-\*]+Comments[\|\*\s]+(?<Comments>[^\|]*?)(?: ?\|))*)',[System.Text.RegularExpressions.RegexOptions]::Multiline)
$FullList = @()
foreach ($entry in $RegExMatches) {
    # build one object per match, with one property per named capture group
    $ServiceAsObject = [pscustomobject]@{}
    foreach ($field in $fields) { $ServiceAsObject | Add-Member -MemberType NoteProperty -Name $field -Value $entry.Groups[$field].Value }
    $FullList += $ServiceAsObject
}
$FullList[15..17] # three items to see what problem I have with "CNG Key Isolation"

I am not using larger RegEx like that one often, so feel free to give me some feedback to improve myself.

Thank you, An-Dir

CodePudding user response:

This may not be what you are looking for, but you could do something like the following to output an array of custom objects:

$output = switch -regex ($requestdata.content -split '\r?\n') {
    '^##\s' {
        # tracking empty lines since there is one under the service title
        # start new hash table when a new service is found
        # remove ## from service title names
        $emptyLineCount = 0
        $hash = [ordered]@{}
        $hash.ServiceTitle = $_ -replace '^##\s'
    }
    '\| \*\*' {
        # split on | and surrounding spaces
        # replace ** so name is cleaner
        $key,$value = ($_ -split '\s*\|\s*' -replace '\*\*')[1,2]
        $hash[$key] = $value
    }
    '^$' {
        # when second empty line is reached in a service block, output object
        if ($hash.ServiceTitle -and ++$emptyLineCount -eq 2) {
            [pscustomobject]$hash
        }
    }
}

# Finding a service by title
$output | Where ServiceTitle -eq 'CNG Key Isolation'

Splitting the content produces an array of lines, which I find easier to process with a switch statement.
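To see why the key and value are taken from indexes 1 and 2, here is a small sketch on one made-up row (the sample line is an assumption, written in the table style of the source document):

```powershell
# Made-up row in the table style used by the source document
$line = '| **Startup type** | Manual |'

# The leading and trailing pipes produce an empty string at each end of the
# split result, so the two useful cells sit at indexes 1 and 2
$cells = $line -split '\s*\|\s*' -replace '\*\*'
$key, $value = $cells[1, 2]

"$key = $value"
```

Because the split pattern already consumes the whitespace around each pipe, the cells come back trimmed without a separate Trim() call.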

CodePudding user response:

Assuming you have all that text in your $RequestData.content, then I wouldn't try to create one large regex to parse it all out into usable objects, but instead would do:

# first split the tables from the rest of the text and work on the table lines only
$result = ($RequestData.content -split '(?m)^The following tables.*:')[-1].Trim() -split '(?m)^## ' | 
    Where-Object { $_ -match '\S' } |
    ForEach-Object {
        # split each block to parse out the title and the table data
        $title, $table = ($_.Trim() -split '(\r?\n){2}', 2).Trim()
        # now remove the markdown stuff from the data and convert it using ConvertFrom-Csv
        $data = (($table -replace '(?m)^\|--\|--\||[*]{2}|^\||\|$' -replace '\s\|\s', '|') -split '\r?\n' -ne '').Trim()  | ConvertFrom-Csv -Delimiter '|'
        # set up an ordered Hashtable to store the data
        $hash = [ordered]@{ServiceTitle = $title}
        foreach ($item in $data) {
            $hash[$item.Name] = $item.Description
        }
        # output real objects
        [PsCustomObject]$hash
    }

$result
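
Note that the property names come straight from the table, so they contain spaces ('Service name', 'Startup type'), while the field list in the question uses names like "ServiceName" and "StartupType". If those names are wanted, one possible sketch is a small helper that title-cases the label and removes the whitespace. ConvertTo-FieldName is a hypothetical name made up for this example, not an existing cmdlet:

```powershell
# Hypothetical helper (made up for this example): 'Startup type' -> 'StartupType'
function ConvertTo-FieldName {
    param([Parameter(Mandatory)][string]$Label)

    # ToTitleCase capitalizes the first letter of each word
    $titleCased = [cultureinfo]::InvariantCulture.TextInfo.ToTitleCase($Label.Trim())

    # -replace with no replacement string removes every match
    $titleCased -replace '\s+'
}

$labels     = 'Service name', 'Startup type', 'Comments'
$fieldNames = $labels | ForEach-Object { ConvertTo-FieldName $_ }
$fieldNames -join ','
```

Inside the loop above, this would be used as $hash[(ConvertTo-FieldName $item.Name)] = $item.Description.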