Right now, I am running this line of code:
PIDdf = clean_CSV['PID'].value_counts(normalize=True).nlargest(5).mul(100).round(1).astype(str) + '%'
on this dataset
| Proto | Local Address | Foreign Address | State | PID | Process_name |
|---|---|---|---|---|---|
| TCP | [0.0.0.0:7] | 0.0.0.0:0 | LISTENING | 4112 | tcpsvcs.exe |
| TCP | 0.0.0.0:111 | 0.0.0.0:0 | LISTENING | 4 | Can not obtain ownership information |
and the code returns just this
| PID | |
|---|---|
| 6356 | 11.1% |
| 32744 | 10.4% |
| 9196 | 3.3% |
| 2652 | 3.3% |
| 27468 | 3.3% |
But I would like to see this:
| PID | | Process_name |
|---|---|---|
| 6356 | 11.1% | sdfasdfa |
| 32744 | 10.4% | adsfasdf |
| 9196 | 3.3% | asdfasd |
| 2652 | 3.3% | asdfsad |
| 27468 | 3.3% | asdfsdaf |
Is there a better way of doing this than finding the most frequent PIDs first and then looking up and appending the matching Process_name for each one?
CodePudding user response:
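IIUC, here is one way: count the (PID, Process_name) pairs instead of PID alone, so the name stays attached to each PID:
PIDdf = df[['PID','Process_name']].value_counts(normalize=True).nlargest(5).mul(100).round(1).astype(str) + '%'
Another way, using groupby and dividing by the total number of rows:
PIDdf = df.groupby(['PID','Process_name'])['PID'].count().divide(df.shape[0]).nlargest(5).mul(100).round(1).astype(str) + '%'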
output:
>>>
PID Process_name
4 Can not obtain ownership information 50.0%
4112 tcpsvcs.exe 50.0%
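If you want a regular table with named columns rather than a Series with a two-level index, something like the sketch below might help. It rebuilds only the two sample rows from the question as a stand-in for the real data, and the column name `Percent` is my own choice, not something from the original post. Note that `DataFrame.value_counts` needs pandas 1.1 or newer.

```python
import pandas as pd

# Stand-in for the asker's data: only the two rows shown in the question.
df = pd.DataFrame({
    "Proto": ["TCP", "TCP"],
    "Local Address": ["0.0.0.0:7", "0.0.0.0:111"],
    "Foreign Address": ["0.0.0.0:0", "0.0.0.0:0"],
    "State": ["LISTENING", "LISTENING"],
    "PID": [4112, 4],
    "Process_name": ["tcpsvcs.exe", "Can not obtain ownership information"],
})

# Fraction of all rows taken up by each (PID, Process_name) pair, top 5 only.
share = df[["PID", "Process_name"]].value_counts(normalize=True).nlargest(5)

# Turn the fractions into percentage strings such as "50.0%".
percent = share.mul(100).round(1).astype(str) + "%"

# "Percent" is an assumed column name; reset_index flattens the
# (PID, Process_name) index into ordinary columns.
top = percent.rename("Percent").reset_index()

print(top)
```

This assumes each PID maps to a single Process_name. If one PID appears with several names, each (PID, Process_name) pair is counted separately, so the percentages will differ from those of the PID-only count.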
