Hi, I would like to extend the code so that it splits the bracketed section of each note into two parts: parameters and configurations. The parameters are the names inside the [] of a note, to the left of the parentheses (). The configurations are the values inside those parentheses (). For notes that have parameters, I want to split them on commas. If a parameter has one or more configurations, they should be collected into a list whose elements are separated by commas: [element 1, element 2]. For parameters without any configurations, create an empty list []. If a note has no parameters, then both the Parameters and Configurations entries should be None. I want to achieve the result shown in the Expected Output below.
Code:
import re
import pandas as pd

lines = ['yes hello there', 'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k',
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']
notes = []
parameters = []      # still needs to be filled in
configurations = []  # still needs to be filled in
for i, line in enumerate(lines):
    # A note is any line containing "$$...:"; keep only the text after the colon
    if re.search(r'\$\$.*\:', line):
        notes.append(re.sub(r'\$\$.*\:', '', line).strip())
df = pd.DataFrame({
    'Note': notes,
    'Parameters': parameters,
    'Configurations': configurations
})
Expected Output:
+----+------------------------------------------------+--------------+--------------------------+
|    | Note                                           | Parameters   | Configurations           |
|----+------------------------------------------------+--------------+--------------------------|
|  0 | describes the && character in the RUN command. | None         | None                     |
|  1 | description                                    | t,mfc,np,k   | [a1],[s,expand,rr],[],[] |
+----+------------------------------------------------+--------------+--------------------------+
CodePudding user response:
This will create sublists:
notes = []
parameters = []
configurations = []
for i, line in enumerate(lines):
    # Group 1 captures the optional [...] section between "$$" and ":"
    expr = re.search(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:', line)
    if expr:
        notes.append(re.sub(r'\$\$.*?\:', '', line).strip())
        if expr[1]:
            names = []
            confs = []
            # Each match is (name, text inside the optional parentheses)
            for part in re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', expr[1]):
                names.append(part[0])
                confs.append(part[1].split(",") if part[1] else [])
            parameters.append(names)
            configurations.append(confs)
        else:
            parameters.append(None)
            configurations.append(None)
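To see what the inner `re.findall` call produces, here is a small sketch that runs it on the bracket content of the sample note only. It assumes the name pattern `[^\s(,]+` (one or more characters that are not whitespace, `(` or `,`); the variable name `inner` is just for illustration.

```python
import re

# Bracket content of the sample note
# "$$n[t(a1), mfc(s,expand,rr), np(), k]: description"
inner = "t(a1), mfc(s,expand,rr), np(), k"

# Group 1: parameter name; group 2: text inside the optional parentheses.
# When the parentheses are missing, findall reports group 2 as ''.
parts = re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', inner)
print(parts)
# [('t', 'a1'), ('mfc', 's,expand,rr'), ('np', ''), ('k', '')]

names = [name for name, _ in parts]
confs = [conf.split(",") if conf else [] for _, conf in parts]
print(names)  # ['t', 'mfc', 'np', 'k']
print(confs)  # [['a1'], ['s', 'expand', 'rr'], [], []]
```

Both `np()` and `k` end up with an empty list, because an empty pair of parentheses and a missing pair both give an empty string for group 2.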
If you need those values to be strings instead of sublists, then:
notes = []
parameters = []
configurations = []
for i, line in enumerate(lines):
    # Group 1 captures the optional [...] section between "$$" and ":"
    expr = re.search(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:', line)
    if expr:
        notes.append(re.sub(r'\$\$.*?\:', '', line).strip())
        if expr[1]:
            names = []
            confs = []
            # Each match is (name, text inside the optional parentheses)
            for part in re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', expr[1]):
                names.append(part[0])
                confs.append(f"[{part[1]}]")
            parameters.append(",".join(names))
            configurations.append(",".join(confs))
        else:
            parameters.append(None)
            configurations.append(None)
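As a rough sketch, the string variant can also be wrapped in a helper that returns one row per note, which keeps the three columns the same length. The names `parse_note`, `NOTE_RE`, `PART_RE` and `rows` are made up for this example and are not part of the original code.

```python
import re

# Hypothetical helper names; the patterns follow the ones used above.
NOTE_RE = re.compile(r'\$\$[^:\[]*?(?:\[([^:\]]*)\])?:')
PART_RE = re.compile(r'([^\s(,]+)(?:\(([^)]*)\))?')

def parse_note(line):
    """Return a row dict for a line containing '$$...:', else None."""
    match = NOTE_RE.search(line)
    if not match:
        return None
    row = {'Note': NOTE_RE.sub('', line, count=1).strip(),
           'Parameters': None,
           'Configurations': None}
    if match[1]:  # the [...] section exists and is not empty
        parts = PART_RE.findall(match[1])
        row['Parameters'] = ",".join(name for name, _ in parts)
        row['Configurations'] = ",".join(f"[{conf}]" for _, conf in parts)
    return row

lines = ['yes hello there',
         'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k',
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']

# Keep only the lines that are notes
rows = [row for row in map(parse_note, lines) if row is not None]
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` should then give the Note, Parameters and Configurations columns from the question.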
