This is my sample data frame.
df =pd.DataFrame({'Name':['Matt', 'John', 'Sam', 'Harry', 'Don', 'Julia', 'Chris', 'Nelson'],
'Age': [30, 34, 56, 43, 67, 23, 19, 8],
'Marks': [100, 200, 434, 543, 234, 412,678, 832],
'Weight': [89, 56, 74, 34, 98, 102, 71, 80]})
I need to find out the two highest and two lowest values per columns and return Name and corresponding values for those rows. I can get them individually but is there a way to get them all together in a table in one code
This is the sample output I am looking for (one table each for lowest and highest values). The table below is for highest values.
| Column Name | Name | Rank | Value |
|---|---|---|---|
| Age | Don | 1 | 67 |
| Age | Sam | 2 | 56 |
| Marks | Nelson | 1 | 832 |
| Marks | Chris | 2 | 678 |
There are more than 80 columns. I am new to Python and some sort of loop would be really helpful.
CodePudding user response:
Use nlargest and nsmallest to get the values and make a data frame by concat.
import pandas as pd
df =pd.DataFrame({'Name':['Matt', 'John', 'Sam', 'Harry', 'Don', 'Julia', 'Chris', 'Nelson'],
'Age': [30, 34, 56, 43, 67, 23, 19, 8],
'Marks': [100, 200, 434, 543, 234, 412,678, 832],
'Weight': [89, 56, 74, 34, 98, 102, 71, 80]})
queries = []
for c in ['Age', 'Marks', 'Weight']:
query = df.nlargest(2, [c])[['Name', c]].rename(columns={c: 'Value'}).reset_index(drop=True)
query['Column Name'] = c
query['Rank'] = query.index 1
query = query[['Column Name', 'Name', 'Rank', 'Value']]
queries.append(query)
new_df = pd.concat(queries).reset_index(drop=True)
print(new_df)
CodePudding user response:
Here is my attempt. Only the highest values are handled to illustrate the concept. It is an alternate way that uses the append method:
import pandas as pd
df =pd.DataFrame({'Name':['Matt', 'John', 'Sam', 'Harry', 'Don', 'Julia', 'Chris', 'Nelson'],
'Age': [30, 34, 56, 43, 67, 23, 19, 8],
'Marks': [100, 200, 434, 543, 234, 412,678, 832],
'Weight': [89, 56, 74, 34, 98, 102, 71, 80]})
df_new =pd.DataFrame({'Column Name':[], 'Name':[], 'Rank':[], 'Value':[]})
for (columnName, columnData) in df.iteritems():
if columnName == "Name":
continue
df2 = df.sort_values(by=[columnName], ascending=False).head(2).reset_index()
for (columnName2, columnData2) in df2.iteritems():
names = df2['Name']
if columnName2 == columnName:
for i in range(len(list(columnData2))):
df_temp = {'Column Name':columnName, 'Name':names[i], 'Rank':i 1, 'Value':columnData2[i]}
df_new = df_new.append(df_temp, ignore_index=True)
print(df_new)
Output:
Column Name Name Rank Value
0 Age Don 1.0 67.0
1 Age Sam 2.0 56.0
2 Marks Nelson 1.0 832.0
3 Marks Chris 2.0 678.0
4 Weight Julia 1.0 102.0
5 Weight Don 2.0 98.0
CodePudding user response:
You can use nlargest method to find the largest 2 values in a dict comprehension in which you iterate over the relevant column names. Once you find the 2 largest values of a column, you pair it with the corresponding column name for a dict key-value pair.
Then you cast the outcome to a pd.DataFrame, and using groupby_cumcount, find the Ranks:
out = pd.DataFrame({col:df[['Name'] [col]].nlargest(2, [col]).to_numpy().T.tolist() for col in ['Age','Marks','Weight']}).T.explode([0,1]).rename(columns={0:'Name',1:'Value'})
out['Rank'] = out.groupby(out[::-1].index).cumcount() 1
out = out[['Name','Rank','Value']]
Two largest values:
Name Rank Value
Age Don 1 67
Age Sam 2 56
Marks Nelson 1 832
Marks Chris 2 678
Weight Julia 1 102
Weight Don 2 98
The two smallest can be found using nsmallest method in a very similar way:
out = pd.DataFrame({col:df[['Name'] [col]].nsmallest(2, [col]).to_numpy().T.tolist() for col in ['Age','Marks','Weight']}).T.explode([0,1]).rename(columns={0:'Name',1:'Value'})
out['Rank'] = out.groupby(out[::-1].index).cumcount() 1
out = out[['Name','Rank','Value']]
Two smallest values:
Name Rank Value
Age Nelson 1 8
Age Chris 2 19
Marks Matt 1 100
Marks John 2 200
Weight Harry 1 34
Weight John 2 56
