Change field in row if other row has same value in different field-CodePudding

I'm looking for a Python-based solution to a common task I often run into.
(I know, there must be cool SQL-magic for this, but... not now.)

The situation is:
Given a table, we want to change the field f2 if oid is the same.
(Assumed that f2 always contains bar or is empty if oid is 42)

id	f1	f2	oid
1	foo	bar	42
2	foo		49
3	baz		42

Becomes:

id	f1	f2	oid
1	foo	bar	42
2	foo		49
3	baz	bar	42

I'm currently using itertools.combinations to avoid repeated comparison.
This seems more efficient than a double for-loop.
But for very large datasets, it still becomes unhandy... considering the Big-O-Notation.

My question is: Can the problem be solved more elegantly via pandas or numpy?

CodePudding user response：

Assumed that f2 always contains bar or is empty if oid is 42

Maybe we could use groupby transform first as well:

df['f2'] = df.replace('', np.nan).groupby('oid')['f2'].transform('first').fillna('')

Output:

   id   f1   f2  oid
0   1  foo  bar   42
1   2  foo        49
2   3  baz  bar   42

CodePudding user response：

The only solution I can think of would be to use groupby ffill:

df['f2'] = df.replace('', np.nan).groupby('oid')['f2'].ffill()

Output:

>>> df
   id   f1   f2  oid
0   1  foo  bar   42
1   2  foo  NaN   49
2   3  baz  bar   42