Home > Back-end >  Find top and bottom two rows and read column value into a dataframe in Python
Find top and bottom two rows and read column value into a dataframe in Python

Time:01-07

This is my sample data frame.

df =pd.DataFrame({'Name':['Matt', 'John', 'Sam', 'Harry', 'Don', 'Julia', 'Chris', 'Nelson'],
           'Age': [30, 34, 56, 43, 67, 23, 19, 8],
           'Marks': [100, 200, 434, 543, 234, 412,678, 832],
           'Weight': [89, 56, 74, 34, 98, 102, 71, 80]})

I need to find out the two highest and two lowest values per columns and return Name and corresponding values for those rows. I can get them individually but is there a way to get them all together in a table in one code

This is the sample output I am looking for (one table each for lowest and highest values). The table below is for highest values.

Column Name Name Rank Value
Age Don 1 67
Age Sam 2 56
Marks Nelson 1 832
Marks Chris 2 678

There are more than 80 columns. I am new to Python and some sort of loop would be really helpful.

CodePudding user response:

Use nlargest and nsmallest to get the values and make a data frame by concat.

import pandas as pd
df =pd.DataFrame({'Name':['Matt', 'John', 'Sam', 'Harry', 'Don', 'Julia', 'Chris', 'Nelson'],
           'Age': [30, 34, 56, 43, 67, 23, 19, 8],
           'Marks': [100, 200, 434, 543, 234, 412,678, 832],
           'Weight': [89, 56, 74, 34, 98, 102, 71, 80]})
queries = []
for c in ['Age', 'Marks', 'Weight']:
  query = df.nlargest(2, [c])[['Name', c]].rename(columns={c: 'Value'}).reset_index(drop=True)
  query['Column Name'] = c
  query['Rank'] = query.index   1
  query = query[['Column Name', 'Name', 'Rank', 'Value']]
  queries.append(query)
new_df = pd.concat(queries).reset_index(drop=True)
print(new_df)

CodePudding user response:

Here is my attempt. Only the highest values are handled to illustrate the concept. It is an alternate way that uses the append method:

import pandas as pd

df =pd.DataFrame({'Name':['Matt', 'John', 'Sam', 'Harry', 'Don', 'Julia', 'Chris', 'Nelson'],
           'Age': [30, 34, 56, 43, 67, 23, 19, 8],
           'Marks': [100, 200, 434, 543, 234, 412,678, 832],
           'Weight': [89, 56, 74, 34, 98, 102, 71, 80]})

df_new =pd.DataFrame({'Column Name':[], 'Name':[], 'Rank':[], 'Value':[]})

for (columnName, columnData) in df.iteritems():
    if columnName == "Name":
        continue
    df2 = df.sort_values(by=[columnName], ascending=False).head(2).reset_index()
    for (columnName2, columnData2) in df2.iteritems():
        names = df2['Name']
        if columnName2 == columnName:
            for i in range(len(list(columnData2))):
                df_temp = {'Column Name':columnName, 'Name':names[i], 'Rank':i 1, 'Value':columnData2[i]}
                df_new = df_new.append(df_temp, ignore_index=True)


print(df_new)

Output:

  Column Name    Name  Rank  Value
0         Age     Don   1.0   67.0
1         Age     Sam   2.0   56.0
2       Marks  Nelson   1.0  832.0
3       Marks   Chris   2.0  678.0
4      Weight   Julia   1.0  102.0
5      Weight     Don   2.0   98.0

CodePudding user response:

You can use nlargest method to find the largest 2 values in a dict comprehension in which you iterate over the relevant column names. Once you find the 2 largest values of a column, you pair it with the corresponding column name for a dict key-value pair.

Then you cast the outcome to a pd.DataFrame, and using groupby_cumcount, find the Ranks:

out = pd.DataFrame({col:df[['Name']   [col]].nlargest(2, [col]).to_numpy().T.tolist() for col in ['Age','Marks','Weight']}).T.explode([0,1]).rename(columns={0:'Name',1:'Value'})
out['Rank'] = out.groupby(out[::-1].index).cumcount()   1
out = out[['Name','Rank','Value']]

Two largest values:

          Name  Rank Value
Age        Don     1    67
Age        Sam     2    56
Marks   Nelson     1   832
Marks    Chris     2   678
Weight   Julia     1   102
Weight     Don     2    98

The two smallest can be found using nsmallest method in a very similar way:

out = pd.DataFrame({col:df[['Name']   [col]].nsmallest(2, [col]).to_numpy().T.tolist() for col in ['Age','Marks','Weight']}).T.explode([0,1]).rename(columns={0:'Name',1:'Value'})
out['Rank'] = out.groupby(out[::-1].index).cumcount()   1
out = out[['Name','Rank','Value']]

Two smallest values:

          Name  Rank Value
Age     Nelson     1     8
Age      Chris     2    19
Marks     Matt     1   100
Marks     John     2   200
Weight   Harry     1    34
Weight    John     2    56
  •  Tags:  
  • Related