I read a .txt file with syslog entries like these:
```
Oct 3 12:09:01 webv2 CRON[1903]: (root) CMD (sudo /usr/bin/python3 /var/www/security/py_scripts/security_stuff.py 01_report_connections 0 &)
Oct 3 12:09:01 webv2 CRON[1906]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Oct 3 12:09:03 webv2 systemd[1]: Starting Clean php session files...
...
```
into a list named `data` (length 6800):
```python
data = string.splitlines()
```
which should be filtered by a list of regexes:
```python
regexArray = [
    ['CRON:', [
        r'sec_stuff\.py report_cons',
        r'\[ -x /usr/lib/php/sessionclean \] && if \[ ! -d /run/systemd/system \]; then /usr/lib/php/sessionclean; fi',
        '...',
        '..',
        '.',
    ]],
    [...],
]
```
using a plain function:
```python
import re

def search_regexStuff(what, strings, regexString=''):
    if what == 'allgemein':  # 'allgemein' is German for "general"
        return re.findall(regexString, strings)
```
But the problem is that it finds and deletes only part of the matching lines in the `data` list.
For example, for the regex
`sec_stuff\.py report_cons`
there are 2069 matching entries, but only 1181 get deleted from `data`. The other regexes have the same problem; for
`\[ -x /usr/lib/php/sessionclean \] && if \[ ! -d /run/systemd/system \]; then /usr/lib/php/sessionclean; fi`
only 59 of 68 are found and deleted.
My goal: shrink `data` in each loop iteration with `pop` or `del` so that the following searches run faster. The remaining lines of `data` are written to another file. I can't find my mistake or see why my code doesn't work. =( Please help, thanks.
Code:
```python
for b in regexArray:
    for c in b[1]:
        regex = '.*' + b[0][:-1] + '.*' + c + '.*'
        n = -1
        for a in data:
            n += 1
            findLINE = search_regexStuff('allgemein', a, regex)
            if len(findLINE) != 0:  # returned list is not empty: the line matched
                del data[n]
                n -= 1

# Write the remaining lines, one per line; the file is closed automatically
with open('/folder/file_x.txt', 'w') as file:
    file.writelines(line + '\n' for line in data)
```
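A small reproduction of that loop on three made-up log lines (the lines and the `buggy_filter` name are hypothetical) shows two symptoms at once: after each `del`, the loop moves past the element that slid into the current position, and because `n` drifts away from the loop's real position, `del data[n]` can even remove a line that did not match.

```python
import re

def buggy_filter(data, regex):
    """Same index bookkeeping as the question's loop (deletes while iterating)."""
    n = -1
    for a in data:
        n += 1
        if re.findall(regex, a):  # non-empty list means the line matched
            del data[n]
            n -= 1
    return data

lines = ["CRON job A", "kernel: ok", "CRON job B"]
print(buggy_filter(lines, r".*CRON.*"))  # ['CRON job B']
```

The non-matching `kernel: ok` line is gone, while `CRON job B` survives.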
UPDATE (solution), thanks to @timus:
I defined an extra function that returns a new `data` list, which solves the problem:
```python
def cleanMyDataArray(data, regex):
    new_data = []
    for a in data:
        findLINE = search_regexStuff('allgemein', a, regex)
        if len(findLINE) == 0:  # no match: keep the line
            new_data.append(a)
    return new_data
```
In the main code:
```python
for b in regexArray:
    for c in b[1]:
        regex = '.*' + b[0][:-1] + '.*' + c + '.*'
        data = cleanMyDataArray(data, regex)
```
That's it.
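The same filtering can also be written as a list comprehension with the pattern compiled once; a minimal sketch, assuming that a match anywhere in the line (`re.search`) is what should remove it. `clean_lines` and the sample lines are hypothetical.

```python
import re

def clean_lines(data, regex):
    """Return a new list with only the lines that do NOT match `regex`."""
    pattern = re.compile(regex)  # compile once, outside the loop
    return [line for line in data if not pattern.search(line)]

data = ["CRON job A", "kernel: ok", "CRON job B"]
data = clean_lines(data, r"CRON")
print(data)  # ['kernel: ok']
```

Because it builds a new list instead of deleting in place, no line is skipped, and each line is checked exactly once per pattern.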
Answer:
You're making a classic mistake: you remove items from a list while iterating over it. That tends to go wrong. Also, deleting from a list with `del` is usually not very efficient, because every element after the deleted one has to be shifted.
Example: a list `numbers` from which you want to remove the even numbers. Your method
```python
numbers = [1, 2, 3, 4, 5]
n = -1
for a in numbers:
    n += 1
    if a % 2 == 0:
        del numbers[n]
        n -= 1
print(numbers)
```
results in `[1, 4, 5]`, which is obviously wrong. Why does that happen?
- Step 1: The first number `1` is odd, so nothing happens, and `n` becomes `0`.
- Step 2: The next item `2` is even, so the list gets modified to `[1, 3, 4, 5]`, and `n` stays `0`.
- Step 3: Now the iteration grabs the item with index `2`, which is `4`. Since it is even and `n` is `0 + 1 == 1`, the item `3` gets removed; the list is now `[1, 4, 5]`, and `n` stays `0`.
- Step 4: The remaining list has length `3`, so there is no item with index `3`, and the iteration stops.
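To watch the skipping happen, here is a sketch that replays the loop and records which position and item the `for` loop actually visits (`trace_visits` is a hypothetical helper; `enumerate` counts positions the same way the list iterator advances):

```python
def trace_visits(numbers):
    """Replay the buggy loop and record the (position, item) pairs it visits."""
    visited = []
    n = -1
    for pos, a in enumerate(numbers):
        visited.append((pos, a))
        n += 1
        if a % 2 == 0:
            del numbers[n]
            n -= 1
    return visited, numbers

visited, result = trace_visits([1, 2, 3, 4, 5])
print(visited)  # [(0, 1), (1, 2), (2, 4)]
print(result)   # [1, 4, 5]
```

The items `3` and `5` are never visited at all, which matches the four steps above.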
How to fix this? Create a new list, ideally with a list comprehension:
```python
numbers = [1, 2, 3, 4, 5]
numbers = [a for a in numbers if a % 2 != 0]  # keep only the odd numbers
print(numbers)
```
Result: `[1, 3, 5]`
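A variant worth knowing in case other code still holds a reference to the original list: slice assignment builds the filtered list first and then replaces the contents of the existing list object, so every name bound to it sees the result. (`alias` is only for illustration.)

```python
numbers = [1, 2, 3, 4, 5]
alias = numbers  # a second name bound to the same list object
# Build the filtered list first, then swap it into the existing object
numbers[:] = [a for a in numbers if a % 2 != 0]
print(numbers)           # [1, 3, 5]
print(alias is numbers)  # True
```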
Beyond that: you could use the pattern
```python
for b in regexArray:
    regex = b[0][:-1] + '.*(' + '|'.join(b[1]) + ').*'
```
instead of iterating over `b[1]`. But there are some odd parts in your use of regex:
- Why do you close your patterns with `.*`?
- Why do you use `re.findall` instead of `re.search`?
- Do you really use `'...'`, `'..'`, `'.'` instead of `'\.\.\.'` etc.? A pattern like `CRON.*....*` matches anything that has at least 3 characters after `CRON`. Is that what you want?
