I don't know what I am doing wrong, but my dataframe does not groupby as I would expect it to do:
This is actual result of my script, but i wanted to fourth to be also grouped - see images below
For some reason it works fine for two first columns, but struggles lately and I don't see thing that could be wrong. Thank you all for your help.
Here is source-code.
import pandas as pd
data = [('AAA1', 'BBB1', 'XXX1', 46.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 1.0, 'YYY3'),
('AAA1', 'BBB1', 'XXX2', 2.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 2.0, 'DDD'),
('AAA1', 'BBB1', 'XXX5', 3.0, 'YYY6'), ('AAA1', 'BBB1', 'XXX6', 3.0, 'YYY1'),
('AAA1', 'BBB1', 'XXX4', 10.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX3', 24.0, 'YYY1'),
('AAA1', 'BBB1', 'XXX5', 4.0, 'YYY89'), ('AAA1', 'BBB1', 'XXX2', 8.0, 'YYY6'),
('AAA1', 'BBB1', 'XXX5', 26.0, 'YYY1'), ('AAA1', 'BBB2', 'XXX2', 1.0, 'DDD'),
('AAA1', 'BBB2', 'XXX5', 1.5, 'YYY3'), ('AAA1', 'BBB2', 'XXX5', 12.0, 'YYY6'),
('AAA10', 'BBB42', 'XXX2', 1.0, 'YYY3'), ('AAA10', 'BBB42', 'XXX2', 2.0, 'YYY89'),
('AAA10', 'BBB42', 'XXX5', 7.0, 'YYY3'), ('AAA12', 'BBB20', 'XXX5', 3.5, 'YYY3'),
('AAA12', 'BBB52', 'XXX4', 8.0, 'YYY3'), ('AAA13', 'BBB21', 'XXX4', 3.0, 'YYY3'),
('AAA13', 'BBB23', 'XXX4', 3.0, 'YYY3'), ('AAA13', 'BBB23', 'XXX5', 1.0, 'YYY6'),
('AAA13', 'BBB23', 'XXX5', 2.0, 'YYY3'), ('AAA13', 'BBB24', 'XXX2', 6.5, 'YYY11'),
('AAA13', 'BBB24', 'XXX2', 7.0, 'YYY10'), ('AAA13', 'BBB24', 'XXX5', 2.0, 'YYY3'),
('AAA13', 'BBB24', 'XXX5', 5.0, 'YYY3'), ('AAA13', 'BBB65', 'XXX5', 8.0, 'YYY19'),
('AAA14', 'BBB26', 'XXX2', 14.0, 'YYY3'), ('AAA14', 'BBB26', 'XXX6', 4.0, 'YYY3'),
('AAA14', 'BBB77', 'XXX2', 3.0, 'YYY19'), ('AAA14', 'BBB77', 'XXX2', 19.5, 'YYY3'),
('AAA15', 'BBB30', 'XXX4', 1.0, 'YYY3'), ('AAA15', 'BBB30', 'XXX3', 8.0, 'YYY3'),
('AAA15', 'BBB30', 'XXX5', 1.0, 'YYY3'), ('AAA16', 'BBB33', 'XXX6', 0.5, 'YYY3'),
('AAA17', 'BBB57', 'XXX4', 4.0, 'YYY3'), ('AAA18', 'BBB36', 'XXX4', 1.0, 'YYY3'),
('AAA2', 'BBB61', 'XXX2', 1.0, 'YYY3'), ('AAA2', 'BBB61', 'XXX4', 4.0, 'YYY3'),
('AAA32', 'BBB76', 'XXX4', 1.0, 'YYY3'), ('AAA32', 'BBB76', 'XXX3', 16.0, 'YYY3'),
('AAA6', 'BBB15', 'XXX3', 8.0, 'YYY6'), ('AAA7', 'BBB10', 'XXX6', 51.0, 'YYY3'),
('AAA7', 'BBB12', 'XXX5', 8.0, 'YYY3'), ('AAA29', 'BBB38', 'XXX4', 12.0, 'YYY3'),
('AAA18', 'BBB40', 'XXX1', 16.0, 'YYY3')]
df = pd.DataFrame(data,
columns=["first", "second", "third", "number", "fourth"])
df["first_numbers"] = df.groupby("first")["number"].transform("sum")
df["second_numbers"] = df.groupby(["first", "second"])["number"].transform("sum")
df["third_numbers"] = df.groupby(["first", "second", "fourth"])["number"].transform("sum")
df.set_index(["first", "first_numbers", "second",
"second_numbers", "fourth", "third_numbers", "third"], inplace=True)
xlsx_writer = pd.ExcelWriter("new.xlsx", engine="xlsxwriter")
df.to_excel(excel_writer=xlsx_writer,
sheet_name="name",
index_label=["First", "First Number", "Second", "Second Number", "Fourth", "Fourth Number", "Third", "Third Number"],
engine="xlsxwriter",
startrow=0)
xlsx_writer.save()
CodePudding user response:
You code is working fine, but you need to sort the index for the display to look grouped (only consecutive identical labels look "merged"):
df = (df.set_index(["first", "first_numbers", "second",
"second_numbers", "fourth", "third_numbers", "third"])
.sort_index()
)
# then export
# ...
output:
number
first first_numbers second second_numbers fourth third_numbers third
AAA1 143.5 BBB1 129.0 DDD 2.0 XXX2 2.0
YYY1 111.0 XXX1 46.0
XXX2 2.0
XXX3 24.0
XXX4 10.0
XXX5 26.0
XXX6 3.0
YYY3 1.0 XXX2 1.0
YYY6 11.0 XXX2 8.0
XXX5 3.0
...
