The starting df is as follows:
company metric time data
x X10384 M1 100
x X10384 M2 100
x X10384 M3 100
y X10456 M4 200
y X10456 M5 200
y X10456 M6 200
I need to consolidate these rows based on the value of the time column. "M1, M2, M3" make up Q1, "M4, M5, M6" make up Q2, and so on.
The resulting df will need to be as follows:
company metric time data
x X10384 Q1 300
y X10456 Q2 600
Similarly, if starting with a df already in quarters, I will need to be able to split the time into months and split the data into three equal parts, like in the initial df.
How would one go about transforming this data as above? The below is my starting point for reference:
quarters = ['Q1', 'Q2']
months = ['M1', 'M2', 'M3', 'M4', 'M5', 'M6']
for time in df['time']:
    if time in quarters:
        [insert transformation into individual months]
    elif time in months:
        [insert transformation into quarters]
CodePudding user response:
Extract the digits from the time column, then convert them to a quarter number. Finally, a simple groupby followed by sum does the job:
# Convert M1, M2, M3, M4, M5, M6 to Q1, Q1, Q1, Q2, Q2, Q2
# (month - 1) // 3 + 1 also maps M7-M9 to Q3 and M10-M12 to Q4
to_quarter = df['time'].str[1:].astype(int).sub(1).floordiv(3).add(1).astype(str).radd('Q')
out = df.assign(time=to_quarter).groupby(['company', 'metric', 'time']) \
        .sum().reset_index()
Output:
>>> out
company metric time data
0 x X10384 Q1 300
1 y X10456 Q2 600
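To sanity-check a month-to-quarter mapping over a full year, something like the sketch below may help. It assumes the labels always look like M1 through M12, and the helper name month_to_quarter is made up for illustration:

```python
import pandas as pd

def month_to_quarter(labels: pd.Series) -> pd.Series:
    """Map 'M1'..'M12' labels to 'Q1'..'Q4' (hypothetical helper)."""
    # Strip the leading 'M' and parse the month number
    month = labels.str[1:].astype(int)
    # Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    quarter = (month - 1) // 3 + 1
    return 'Q' + quarter.astype(str)

months = pd.Series([f'M{i}' for i in range(1, 13)])
quarters = month_to_quarter(months)
print(quarters.tolist())
```

Each quarter label should appear exactly three times, in order from Q1 to Q4.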
CodePudding user response:
Creating a dataframe based on your data:
import pandas as pd

data = {'Company': ['x', 'x', 'x', 'y', 'y', 'y'],
        'Metric': ['X10384', 'X10384', 'X10384', 'X10456', 'X10456', 'X10456'],
        'time': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6'],
        'data': [100, 100, 100, 200, 200, 200]}
df = pd.DataFrame(data)
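Then create a dictionary that maps each month to its quarter and apply it to the time column:
# Named month_to_quarter so the built-in dict is not shadowed
month_to_quarter = {'M1': 'Q1', 'M2': 'Q1', 'M3': 'Q1', 'M4': 'Q2', 'M5': 'Q2', 'M6': 'Q2'}
df['time'] = df['time'].map(month_to_quarter)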
And groupby will give you the final result:
df.groupby(['Company','Metric','time']).sum().reset_index()
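For the reverse direction mentioned in the question (quarters back to months, with the data split into three equal parts), one possible approach is to repeat each row three times and relabel the copies. This is only a sketch: the frame qdf is a hypothetical quarterly input modelled on the desired output in the question, and it assumes every quarter label looks like Q1 through Q4.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly input, mirroring the desired output in the question
qdf = pd.DataFrame({
    'company': ['x', 'y'],
    'metric': ['X10384', 'X10456'],
    'time': ['Q1', 'Q2'],
    'data': [300, 600],
})

# Repeat every row three times, one copy per month of the quarter
monthly = qdf.loc[qdf.index.repeat(3)].reset_index(drop=True)

# Quarter number of each repeated row: 'Q2' -> 2
quarter = monthly['time'].str[1:].astype(int)

# Last month of the previous quarter plus 1, 2, 3 gives the month number
month = (quarter - 1) * 3 + np.tile([1, 2, 3], len(qdf))
monthly['time'] = 'M' + month.astype(str)

# Split the quarterly value into three equal parts (the column becomes float)
monthly['data'] = monthly['data'] / 3
print(monthly)
```

For the sample input this produces six rows labelled M1 to M6, with 100.0 for company x and 200.0 for company y. If the quarterly values are not divisible by three, the monthly values will be fractional.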
