Home > Enterprise >  create a column with values from other columns
create a column with values from other columns

Time:01-06

I have a dataset that looks like this

Day1 Day2 Day3 Day4 ID Product Price
     Day2      Day4 2  X       50
Day1      Day3 Day4 3  Y       60
     Day2 Day3      4  Z       70

I want it to be aggregated in this way

Days           ID Product  Price
Day2&Day4      2  X        50
Day1&Day3&Day4 3  Y        60 
Day2&Day3      4  Z        70

I am finding it difficult to crack the logic behind this

Raw input:

df = pd.DataFrame({'Day1': ['', 'Day1', ''],
                   'Day2': ['Day2', '', 'Day2'],
                   'Day3': ['', 'Day3', 'Day3'],
                   'Day4': ['Day4', 'Day4', ''],
                   'ID': [2, 3, 4],
                   'Product': ['X', 'Y', 'Z'],
                   'Price': [50, 60, 70]})

CodePudding user response:

You can convert the relevant columns to a list of lists and use list comprehension where in each iteration, you filter out empty spaces and join the remaining items with & in between. Then assign this list of strings to 'Days' column. Finally, drop the redundant columns and rearrange remaining columns in your preferred order.

cols = ['Day1', 'Day2', 'Day3', 'Day4']
df['Days'] = ['&'.join(x for x in lst if x!='') for lst in df[cols].to_numpy().tolist()]
df = df.drop(cols, axis=1)[['Days','ID','Product','Price']]

Output:

             Days  ID Product  Price
0       Day2&Day4   2       X     50
1  Day1&Day3&Day4   3       Y     60
2       Day2&Day3   4       Z     70

CodePudding user response:

NB. Assuming your empty cells are 'NaN', if not you can use df.replace({'': float('nan')}) as a first step

You could stack and groupby apply('&'.join):

df['Days'] = df.filter(like='Day').stack().groupby(level=0).apply('&'.join)

The stacking operation gets rid of the empty cells

output:

   Day1  Day2  Day3  Day4  ID Product  Price            Days
0   NaN  Day2   NaN  Day4   2       X     50       Day2&Day4
1  Day1   NaN  Day3  Day4   3       Y      6  Day1&Day3&Day4
2   NaN  Day2  Day3   NaN   4       Z     70       Day2&Day3
dropping the original DaysX columns:
cols = list(df.filter(like='Day').columns)
df['Days'] = df[cols].stack().groupby(level=0).apply('&'.join)
df.drop(columns=cols)

output:

   ID Product  Price            Days
0   2       X     50       Day2&Day4
1   3       Y      6  Day1&Day3&Day4
2   4       Z     70       Day2&Day3

CodePudding user response:

I have fully automated version. A bit more code lines, but works. I find all columns that startswith "Day" phrase, then concatinate them, then drop the old columns and reorder columns.

cols = df.columns[pd.Series(df.columns).str.startswith('Day')]
df['Days'] = df[cols].apply('&'.join, axis=1).str.strip('&')
df.drop(columns = cols, inplace = True)
col = df.pop("Days")
df.insert(0, col.name, col)
# To clear double '&' chars.
import re
df['Days'] = [re.sub('& ', '&', day) for day in df['Days']]
  •  Tags:  
  • Related