Accessing multiple columns when corresponding to certain rows

Right now, I am running this line of code:

PIDdf = clean_CSV['PID'].value_counts(normalize=True).nlargest(5).mul(100).round(1).astype(str) + '%'

on this dataset

Proto	Local Address	Foreign Address	State	PID	Process_name
TCP	[0.0.0.0:7]	0.0.0.0:0	LISTENING	4112	tcpsvcs.exe
TCP	0.0.0.0:111	0.0.0.0:0	LISTENING	4	Can not obtain ownership information

and the code returns just this

PID
6356	11.1%
32744	10.4%
9196	3.3%
2652	3.3%
27468	3.3%

But I would like to see this:

PID		Process_name
6356	11.1%	sdfasdfa
32744	10.4%	adsfasdf
9196	3.3%	asdfasd
2652	3.3%	asdfsad
27468	3.3%	asdfsdaf

Is there a better way of doing this than finding the most frequent PIDs, looking up their Process_name values, and appending them?

CodePudding user response：

IIUC, here is one way:

PIDdf = df[['PID', 'Process_name']].value_counts(normalize=True).nlargest(5).mul(100).round(1).astype(str) + '%'

Another way:

PIDdf = df.groupby(['PID', 'Process_name'])['PID'].count().divide(df.shape[0]).nlargest(5).mul(100).round(1).astype(str) + '%'

output:

>>>
PID   Process_name                        
4     Can not obtain ownership information    50.0%
4112  tcpsvcs.exe                             50.0%
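
Both approaches return a Series indexed by (PID, Process_name). If you would rather have PID, the percentage, and Process_name as ordinary columns, as in the desired output in the question, something like the following sketch might work. The sample rows and the column name Percent are made up for illustration; the real clean_CSV is assumed to have the PID and Process_name columns shown in the question. Note that DataFrame.value_counts in the first approach needs pandas 1.1 or newer, so this sketch uses groupby instead.

```python
import pandas as pd

# Hypothetical sample data standing in for the real netstat CSV.
clean_CSV = pd.DataFrame({
    "PID": [4, 4, 4, 4112],
    "Process_name": ["System", "System", "System", "tcpsvcs.exe"],
})

# Count rows per (PID, Process_name) pair and convert to a share of all rows.
share = (
    clean_CSV.groupby(["PID", "Process_name"])
    .size()
    .div(len(clean_CSV))
    .nlargest(5)
    .mul(100)
    .round(1)
)

# Format as text with a percent sign, then turn the index levels into columns.
top = (share.astype(str) + "%").reset_index(name="Percent")

# Put the columns in the order asked for in the question.
top = top[["PID", "Percent", "Process_name"]]
print(top)
```

This assumes each PID maps to a single Process_name. If one PID appears with several names, each pair is counted separately, so the percentages would be per pair rather than per PID.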