I have a file that contains multiple strings between parentheses, each of which is a country name.
This (USA) is a bad text (France)
with countries (Luxembourg) between () (Germany)
Whith multiple (Luxembourg) countries (USA) per line
and some lines without countries
To search (France) and find (Belgique) duplicate
countries (USA)
I want to extract all countries and display each country found on a new line.
The output I expect is the following:
USA
France
Luxembourg
Germany
Luxembourg
USA
France
Belgique
USA
Using a special tool, the BS2EDT editor on a BS2000 mainframe, the solution can be:
list-string /(?<=\()[^)]+(?=\))/,from=lettre.pays.txt
Using PowerShell, what is the shortest solution?
CodePudding user response:
The following somewhat tricky solution works on my PC:
$regex = '(?<=\()[^)]+(?=\))'
(Select-String .\lettre.pays.txt -Pattern $regex -AllMatches).Matches `
| Select-String -Pattern '.*'
The first Select-String command reads the input file and finds ALL matching strings, using the same regex as given in the question.
The .Matches property and the second Select-String command display the strings found.
It is also possible to sort the countries found by adding the Sort-Object command:
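To illustrate why -AllMatches matters here, the following is a minimal sketch that assumes the input is held in memory rather than read from lettre.pays.txt. The variable names $lines, $first and $all are illustrative only. Without -AllMatches, Select-String reports only the first match on each line.

```powershell
# Two sample lines from the question, kept in memory for the sketch.
$lines = 'This (USA) is a bad text (France)', 'countries (USA)'
$regex = '(?<=\()[^)]+(?=\))'

# Without -AllMatches: only the first match of each line is returned.
$first = ($lines | Select-String -Pattern $regex).Matches.Value

# With -AllMatches: every match of each line is returned.
$all = ($lines | Select-String -Pattern $regex -AllMatches).Matches.Value

"First match per line: $($first -join ', ')"
"All matches:          $($all -join ', ')"
```

On these two lines, the first form misses France because it is the second match on its line.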
$regex = '(?<=\()[^)]+(?=\))'
(Select-String .\lettre.pays.txt -Pattern $regex -AllMatches).Matches `
| Select-String -Pattern '.*' `
| Sort-Object
which displays the following result:
Belgique
France
France
Germany
Luxembourg
Luxembourg
USA
USA
USA
CodePudding user response:
In PowerShell you can accomplish this with Select-String, although it is definitely not shorter than what you already have:
Select-String .\lettre.pays.txt -Pattern '(?<=\()[^)]+(?=\))' -AllMatches |
ForEach-Object { $_.Matches.Value }
As for your regex, I believe (?=\)) is not needed and could be removed from the pattern.
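As a quick check of that suggestion, here is a small sketch that compares both patterns on the sample text from the question. It assumes the text is held in a here-string instead of the file, and the variable names $text, $withLookahead and $withoutLookahead are illustrative only.

```powershell
# Sample text from the question, held in memory so the sketch needs no file.
$text = @'
This (USA) is a bad text (France)
with countries (Luxembourg) between () (Germany)
Whith multiple (Luxembourg) countries (USA) per line
and some lines without countries
To search (France) and find (Belgique) duplicate
countries (USA)
'@

# Original pattern, with the trailing lookahead for ")".
$withLookahead = [regex]::Matches($text, '(?<=\()[^)]+(?=\))') | ForEach-Object { $_.Value }

# Shortened pattern, without the lookahead.
$withoutLookahead = [regex]::Matches($text, '(?<=\()[^)]+') | ForEach-Object { $_.Value }

# Compare-Object emits nothing when both lists are identical.
Compare-Object $withLookahead $withoutLookahead

# One country per line, in order of appearance.
$withoutLookahead
```

The two patterns agree on this input because every opening parenthesis has a matching closing one. They would differ on an unclosed "(": without the lookahead, the match would run on until the next ")" or the end of the input.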
