Select specific rows, transform and concat in Python

I have a pandas DataFrame. I would like to select specific rows, transform them, and concatenate them back with the rest of the data.

Data

id type stat    country
1  aa1  y       ca
2  bb2  n       ny
3  cc3  y       tx
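To make the example reproducible, the sample data can be built like this. This is a sketch: the post only shows the printed table, so the dtypes (integer id, string columns) are assumed.

```python
import pandas as pd

# Sample data from the question; dtypes (int id, string columns) are assumed
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "type": ["aa1", "bb2", "cc3"],
        "stat": ["y", "n", "y"],
        "country": ["ca", "ny", "tx"],
    }
)
print(df)
```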

Desired

Use input("select type value:  ") to enter the type values, e.g.

Type in:  aa1, bb2

This selects the matching rows:

 id    type stat    country 
 1     aa1  y       ca  
 2     bb2  n       ny

Run some kind of transformation on those rows, e.g. change ca to tx in country:

df2 = df["country"].replace({"ca": "tx"})
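Note that replace(..., inplace=True) modifies the object itself and returns None, so assigning its result to a variable gives None rather than a transformed copy. A quick check on a standalone Series:

```python
import pandas as pd

s = pd.Series(["ca", "ny", "tx"])

# inplace=True mutates s and returns None
returned = s.replace({"ca": "tx"}, inplace=True)
print(returned)      # None
print(s.tolist())    # ['tx', 'ny', 'tx']

# Without inplace, replace returns a new Series and leaves the original alone
t = pd.Series(["ca", "ny", "tx"])
replaced = t.replace({"ca": "tx"})
print(t.tolist(), replaced.tolist())  # ['ca', 'ny', 'tx'] ['tx', 'ny', 'tx']
```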

Concatenate the newly transformed subset with the remaining rows:

 id    type stat    country 
 1     aa1  y       tx  
 2     bb2  n       ny
 3     cc3  y       tx

Doing

Another SO member suggested this part, which works well. I am now trying to piece the remaining rows back together with the transformed subset:

  out = input("select type value:  ")
  vals = [v.strip() for v in out.split(',')]
  df1 = df.query('type.isin(@vals)')
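The selection step can be checked without interactive input. This sketch uses a hard-coded string raw (a hypothetical stand-in for what input() would return) and the boolean-mask form, which selects the same rows as the query call above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "type": ["aa1", "bb2", "cc3"],
        "stat": ["y", "n", "y"],
        "country": ["ca", "ny", "tx"],
    }
)

raw = "aa1, bb2"  # hypothetical stand-in for input("select type value:  ")
vals = [v.strip() for v in raw.split(",")]  # ['aa1', 'bb2']

# Boolean mask: True where the row's type is one of the requested values
df1 = df[df["type"].isin(vals)]
print(df1)
```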

Any suggestion is appreciated.

Answer:

This should work:

out = input("select type value:  ")
vals = [v.strip() for v in out.split(',')]

# Find rows where criteria match
query_mask = df["type"].isin(vals)

# Do the transformation only for those rows, and only in the country column
df.loc[query_mask, "country"] = df.loc[query_mask, "country"].replace({"ca": "tx"})
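A self-contained run of the masked-assignment idea on the sample data. The frame is rebuilt by hand and vals is hard-coded instead of read from input(); both are assumptions for the example. Only the country column of the matching rows is touched, and the row order stays as it was:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "type": ["aa1", "bb2", "cc3"],
        "stat": ["y", "n", "y"],
        "country": ["ca", "ny", "tx"],
    }
)
vals = ["aa1", "bb2"]  # hard-coded instead of parsed from input()

query_mask = df["type"].isin(vals)
# .loc with a row mask and a column label writes back into df itself
df.loc[query_mask, "country"] = df.loc[query_mask, "country"].replace({"ca": "tx"})
print(df)
```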

If you really want the remaining (untransformed) rows at the end of the df:

import pandas as pd

out = input("select type value:  ")
vals = [v.strip() for v in out.split(',')]

# Find rows where criteria match
query_mask = df["type"].isin(vals)

# Grab rows where criteria match
query_rows = df.loc[query_mask]

# Grab leftover rows
leftover_rows = df.loc[~query_mask]

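# Do transformations; reassign instead of inplace=True, which can trigger
# SettingWithCopyWarning on rows taken from df
query_rows = query_rows.replace({"country": {"ca": "tx"}})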

# Concatenate transformed rows and leftovers
df = pd.concat([query_rows, leftover_rows])
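Because pd.concat stacks the frames in the order given, the transformed rows come first. The rows still carry their original index labels, so sort_index() restores the original order if you want it back. A sketch on the sample data (rebuilt by hand); vals is set to ["aa1", "cc3"] here, an assumption chosen so that the reordering is actually visible, and an explicit .copy() of the subset guarantees that modifying it cannot touch df:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "type": ["aa1", "bb2", "cc3"],
        "stat": ["y", "n", "y"],
        "country": ["ca", "ny", "tx"],
    }
)
vals = ["aa1", "cc3"]  # chosen so matched rows are not contiguous

query_mask = df["type"].isin(vals)
query_rows = df.loc[query_mask].copy()  # independent copy of the matching rows
leftover_rows = df.loc[~query_mask]

query_rows["country"] = query_rows["country"].replace({"ca": "tx"})

combined = pd.concat([query_rows, leftover_rows])  # transformed rows first
restored = combined.sort_index()                   # back to the original order
print(restored)
```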

Answer:

Let us try pandasql together with DataFrame.update:

import pandasql as ps

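types = ('aa1', 'bb2')  # renamed so the built-in type() is not shadowed
q1 = """SELECT * FROM df WHERE type IN ('%s','%s') """ % types
# sqldf returns a new 0-based index and update() aligns on index labels, so this
# only hits the right rows because the selected rows are the first rows of df
df.update(ps.sqldf(q1, locals()).replace({"ca": "tx"}))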
df
Out[257]: 
    id type stat country
0  1.0  aa1    y      tx
1  2.0  bb2    n      ny
2  3.0  cc3    y      tx
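The same update idea also works in plain pandas without pandasql. In this sketch (sample data rebuilt by hand, types hard-coded) the subset is taken with a boolean mask, so it keeps df's original index labels and DataFrame.update writes each transformed row back to the row it came from:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "type": ["aa1", "bb2", "cc3"],
        "stat": ["y", "n", "y"],
        "country": ["ca", "ny", "tx"],
    }
)
types = ("aa1", "bb2")

# Boolean-mask subset keeps the original index labels (0 and 1 here)
subset = df[df["type"].isin(types)].replace({"country": {"ca": "tx"}})

# update() aligns on index labels and modifies df in place (it returns None)
df.update(subset)
print(df)
```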

Answer:

You can use mask and combine_first:

# vals is the list parsed from input() in the question
df = df.mask(~df['type'].isin(vals)).replace({'country': {'ca': 'tx'}}) \
       .combine_first(df)
print(df)

# Output
 id type stat country
  1  aa1    y      tx
  2  bb2    n      ny
  3  cc3    y      tx
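A self-contained version of the mask / combine_first approach on the sample data (frame rebuilt by hand, vals hard-coded), split into steps. mask() turns every row that was not selected into NaN, the replace then only affects the surviving rows, and combine_first() fills the blanked rows back in from the original df. Because of the intermediate NaN values, numeric columns such as id may come back as floats:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "type": ["aa1", "bb2", "cc3"],
        "stat": ["y", "n", "y"],
        "country": ["ca", "ny", "tx"],
    }
)
vals = ["aa1", "bb2"]  # hard-coded instead of parsed from input()

# 1. Blank out (set to NaN) every row whose type was not selected
masked = df.mask(~df["type"].isin(vals))

# 2. Transform only the surviving rows; the nested dict limits replace to "country"
transformed = masked.replace({"country": {"ca": "tx"}})

# 3. Fill the blanked rows back in from the untouched original
out = transformed.combine_first(df)
print(out)
```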