Home > Blockchain >  Row-wise minimum of a DataFrame after applying a function
Row-wise minimum of a DataFrame after applying a function

Time:02-05

I have a DataFrame that looks something like this:

   Col1   Col2   Col3
0   7      11     17
1   13     12     15
2   19     23     14
3   22     19     21

I want to find the row-wise minimum after applying a function, returning the original values before the function was applied, somewhat like using the built-in [min][1] function in python with the key argument. E.g. if the function gives the absolute difference between the values of this DataFrame and a series s:

16
10
17
27

i.e. abs(Col# - s) then a new DataFrame containing the results of the function would look like:

   Col1   Col2   Col3
0   9      5      1
1   3      2      5
2   2      6      3
3   5      8      6

The row-wise minimum of this would then be a series looking like:

1
2
2
5

I want a series containing the corresponding values from the original DataFrame:

17
12
19
22

I'm not aware of any minimum-finding function in pandas that has a key argument like python's built-in function.

I have found a solution that requires iterating through the rows of the DataFrame df as follows:

l = []
for idx,row in df.iterrows():
    l.append(min(row,key = lambda x: abs(x-s[idx])))
pandas.Series(l)

I've also considered applying the function to the DataFrame, then finding idxmin of each row:

df1 = df.apply(lambda x: abs(x-s))
s2 = df1.idxmin(1)

which should give s2 as:

Col3
Col2
Col1
Col1

and then is there a way to select values from a DataFrame using column names from a series, i.e. using the column names from s2, return a series of values from df with the value of Col3 from the 1st row, the value of Col2 from the 2nd row, the value of Col1 from the 3rd row, and the value of Col1 from the 4th row? At the moment, the only way I know of also involves iteration:

l = []
for idx,row in df.iterrows():
    l.append(row[s2[idx]])
pandas.Series(l)

Are there any better ways of achieving this besides what I've mentioned?

Dictionaries used to construct the DataFrame and Series used in the example:

data = {'Col1': [7, 13, 19, 22], 'Col2': [11, 12, 23, 19], 'Col3': [17, 15, 14, 21]}
s = {0: 16, 1: 10, 2: 17, 3: 27}

CodePudding user response:

As lookup is now deprecated, you have to flatten your dataframe manually to retrieve one value per row with loc. The first line computes your operation and get the index of the minimum:

r = df.sub(sr, axis=0).abs().idxmin(axis=1)
df['min_val_with_key'] = df.unstack().loc[zip(r.values, r.index)].values
print(df)
print(r)

# Output of df
   Col1  Col2  Col3  min_val_with_key
0     7    11    17                17
1    13    12    15                12
2    19    23    14                19
3    22    19    21                22

# Output of r
0    Col3
1    Col2
2    Col1
3    Col1
dtype: object

Setup:

df = pd.DataFrame({'Col1': [7, 13, 19, 22],
                   'Col2': [11, 12, 23, 19],
                   'Col3': [17, 15, 14, 21]})

sr = pd.Series([16, 10, 17, 27])

CodePudding user response:

You can use the underlying numpy array and numpy.choose:

s = pd.Series([16,10,17,27])

idx = np.argmin(abs(df.sub(s, axis=0)).values, axis=1)
# array([2, 1, 0, 0])

pd.Series(np.choose(idx, df.values.T), index=s.index)
# 0    17
# 1    12
# 2    19
# 3    22
# dtype: int64

CodePudding user response:

You can apply your function (absolute difference) to each column and use idxmin on axis to find the column names of minimum values in each row. Then lookup the values corresponding to the found column names using factorize reindex:

idx, cols = df.apply(lambda col: (col-s).abs()).idxmin(axis=1).factorize()
df['min_val_with_key'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

Output:

   Col1  Col2  Col3  min_val_with_key
0     7    11    17                17
1    13    12    15                12
2    19    23    14                19
3    22    19    21                22

CodePudding user response:

You can get the indices using idxmin and then just read the single values using loc and append them to a list:

df = pd.DataFrame({'Col1': {0: 7, 1: 13, 2: 19, 3: 22},
 'Col2': {0: 11, 1: 12, 2: 23, 3: 19},
 'Col3': {0: 17, 1: 15, 2: 14, 3: 21}})
series = pd.Series([16, 10, 17, 27])
df2 = df.subtract(series, axis=0).abs()

minima = df2.idxmin(axis=1)
result = []
for row, column in minima.iteritems():
    result.append(df.loc[row, column]) 
df_result = pd.DataFrame(result, columns=["minimum"])

The last part can also be done using a list comprehension:

minima = df2.idxmin(axis=1)
df_result = pd.DataFrame(
    [df.loc[row, column] for row, column in minima.iteritems()],
    columns=["minimum"]
    )
  •  Tags:  
  • Related