Home > OS >  Appending a Pandas Series to a Dataframe in a loop
Appending a Pandas Series to a Dataframe in a loop

Time:01-31

I am trying to append the results of my nmap scan into a dataframe.

def vulnScan(targets):
    portInfo =[]
    columnNames = ["Port","Protocol","State","Service"]
    for target in targets:
        portsDF = pd.DataFrame(columns = columnNames)
        print("Executing: nmap -Pn " target[1])
        result = subprocess.run(['nmap','-Pn',target[1]], universal_newlines = True, stdout = subprocess.PIPE)
        for line in result.stdout.split("\n"):
            if "/" in line and "Starting" not in line:
                tableInfo = line.split(" ")
                port = tableInfo[0].split("/")[0]
                protocol = tableInfo[0].split("/")[1]
                status = tableInfo[1]
                service = tableInfo[3]
                print(port,protocol,status,service)
                newRow = pd.Series(data=[port,protocol,status,service],index=["Port","Protocol","State","Service"])
                portsDF = portsDF.append(newRow, ignore_index=True)
                print(tabulate(portsDF, headers="keys",tablefmt='psql'))
        portInfo = portInfo.append([target[0],portsDF])
    print("")
    print(tabulate(portInfo, headers="keys", tablefmt='psql'))

However as you can see from the output, the dataframe is never populated.

80 tcp
 -------- ------------ --------- ----------- 
| Port   | Protocol   | State   | Service   |
|-------- ------------ --------- -----------|
 -------- ------------ --------- ----------- 
135 tcp open msrpc
 -------- ------------ --------- ----------- 
| Port   | Protocol   | State   | Service   |
|-------- ------------ --------- -----------|
 -------- ------------ --------- ----------- 
139 tcp open netbios-ssn
 -------- ------------ --------- ----------- 
| Port   | Protocol   | State   | Service   |
|-------- ------------ --------- -----------|
 -------- ------------ --------- ----------- 
443 tcp open https
 -------- ------------ --------- ----------- 
| Port   | Protocol   | State   | Service   |
|-------- ------------ --------- -----------|
 -------- ------------ --------- ----------- 
445 tcp open microsoft-ds
 -------- ------------ --------- ----------- 
| Port   | Protocol   | State   | Service   |
|-------- ------------ --------- -----------|
 -------- ------------ --------- ----------- 

 ----------------- ------------------------------------------- 
| 0               | 1                                         |
|----------------- -------------------------------------------|
| DESKTOP-30UOSMD | Empty DataFrame                           |
|                 | Columns: [Port, Protocol, State, Service] |
|                 | Index: []                                 |
 ----------------- ------------------------------------------- 

I am not sure what I am missing as I have checked the documentation https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html and I think I am using append() correctly

Updated Section

Richards answer seems to work but has caused the portInfo list to no longer be a list, it is now of class None.

def vulnScan(targets):
    portInfo = []
    print(type(portInfo))
    columnNames = ["Port","Protocol","State","Service"]
    for target in targets:
        rows = []
        print("Executing: nmap -Pn " target[1])
        result = subprocess.run(['nmap','-Pn',target[1]], universal_newlines = True, stdout = subprocess.PIPE)
        for line in result.stdout.split("\n"):
            #This could be improved "/" indicates a row in table output
            if "/" in line and "Starting" not in line:
                tableInfo = line.split(" ")
                port = tableInfo[0].split("/")[0]
                protocol = tableInfo[0].split("/")[1]
                status = tableInfo[1]
                service = tableInfo[3]
                print(port,protocol,status,service)
                newRow = pd.Series(data=[port,protocol,status,service],index=["Port","Protocol","State","Service"])
                rows.append(newRow)

        portsDF = pd.DataFrame(rows, columns = columnNames)
        print(tabulate(portsDF, headers="keys", tablefmt='psql'))
        portInfo = portInfo.append([target[0],portsDF])
        print(type(portInfo))
        print(portInfo)

Output:

<class 'list'>
Executing: nmap -Pn 192.168.1.86
80 tcp
135 tcp open msrpc
139 tcp open netbios-ssn
443 tcp open https
445 tcp open microsoft-ds
 ---- -------- ------------ --------- -------------- 
|    |   Port | Protocol   | State   | Service      |
|---- -------- ------------ --------- --------------|
|  0 |     80 | tcp        |         |              |
|  1 |    135 | tcp        | open    | msrpc        |
|  2 |    139 | tcp        | open    | netbios-ssn  |
|  3 |    443 | tcp        | open    | https        |
|  4 |    445 | tcp        | open    | microsoft-ds |
 ---- -------- ------------ --------- -------------- 
<class 'NoneType'>
None

In the portInfo list we should have a list object with the hostname(string) and portinformation(dataframe).

CodePudding user response:

pandas.DataFrame.append is not in-place, so it returns a new objects, as the docs page you linked says. Therefore, you'd usually do something like this:

portsDF = portsDF.append(newRow, ignore_index=True)

But in this case, you're populating the dataframe in a loop, so running the above would just create a new variable local to the loop called portsDF, and not modify the original portsDF.

So, in such a case, I'd create a list and append each row to it, and then create portsDF from it, after the loop is done:

columnNames = ["Port","Protocol","State","Service"]
for target in targets:
    # New code:
    rows = []

    print("Executing: nmap -Pn " target[1])
    result = subprocess.run(['nmap','-Pn',target[1]], universal_newlines = True, stdout = subprocess.PIPE)
    for line in result.stdout.split("\n"):
        if "/" in line and "Starting" not in line:
            tableInfo = line.split(" ")
            port = tableInfo[0].split("/")[0]
            protocol = tableInfo[0].split("/")[1]
            status = tableInfo[1]
            service = tableInfo[3]
            print(port,protocol,status,service)
            
            newRow = pd.Series(data=[port,protocol,status,service],index=["Port","Protocol","State","Service"])
            # New Code:
            rows.append(newRow)
    
    # New code:
    portsDF = pd.DataFrame(rows, columns=columnNames)

CodePudding user response:

Since Series is designed to hold atomic values of same type, avoid using for multiple columns. Instead build a list of dictionaries to pass into DataFrame constructor outside loop.

Below separates the command line call for organization and exception handling. Also, target[0] may be a hashable value to use as key (instead of a list element) to identify each data frame object for a list of data frame dictionaries.

def run_cmd(t):
    print("Executing: nmap -Pn " t)
    result = subprocess.run(
        ['nmap','-Pn',t], 
        universal_newlines = True, 
        stdout = subprocess.PIPE
    ) 
        
    return result.stdout.split("\n")
         
def vulnScan(targets): 
    portInfo = []
    for target in targets: 
        rows = [] 
        output_lines = run_cmd(target[1])
        for line in output_lines:
            if "/" in line and "Starting" not in line: 
                tableInfo = line.split(" ")
                d = {
                    "port": tableInfo[0].split("/")[0],
                    "protocol": tableInfo[0].split("/")[1],
                    "status": tableInfo[1],
                    "service": tableInfo[3]
                }
                rows.append(d)

        portDF = {target[0]: pd.DataFrame(rows)}
        portInfo.append(portDF)

    return portInfo
  •  Tags:  
  • Related