Home > Mobile >  Group by don't work after used for third time
Group by don't work after used for third time

Time:02-04

I don't know what I am doing wrong, but my dataframe does not groupby as I would expect it to do:

This is actual result of my script, but i wanted to fourth to be also grouped - see images below

Actual result

Wanted result

For some reason it works fine for two first columns, but struggles lately and I don't see thing that could be wrong. Thank you all for your help.

Here is source-code.

import pandas as pd

data = [('AAA1', 'BBB1', 'XXX1', 46.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 1.0, 'YYY3'),
        ('AAA1', 'BBB1', 'XXX2', 2.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 2.0, 'DDD'),
        ('AAA1', 'BBB1', 'XXX5', 3.0, 'YYY6'), ('AAA1', 'BBB1', 'XXX6', 3.0, 'YYY1'),
        ('AAA1', 'BBB1', 'XXX4', 10.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX3', 24.0, 'YYY1'),
        ('AAA1', 'BBB1', 'XXX5', 4.0, 'YYY89'), ('AAA1', 'BBB1', 'XXX2', 8.0, 'YYY6'),
        ('AAA1', 'BBB1', 'XXX5', 26.0, 'YYY1'), ('AAA1', 'BBB2', 'XXX2', 1.0, 'DDD'),
        ('AAA1', 'BBB2', 'XXX5', 1.5, 'YYY3'), ('AAA1', 'BBB2', 'XXX5', 12.0, 'YYY6'),
        ('AAA10', 'BBB42', 'XXX2', 1.0, 'YYY3'), ('AAA10', 'BBB42', 'XXX2', 2.0, 'YYY89'),
        ('AAA10', 'BBB42', 'XXX5', 7.0, 'YYY3'), ('AAA12', 'BBB20', 'XXX5', 3.5, 'YYY3'),
        ('AAA12', 'BBB52', 'XXX4', 8.0, 'YYY3'), ('AAA13', 'BBB21', 'XXX4', 3.0, 'YYY3'),
        ('AAA13', 'BBB23', 'XXX4', 3.0, 'YYY3'), ('AAA13', 'BBB23', 'XXX5', 1.0, 'YYY6'),
        ('AAA13', 'BBB23', 'XXX5', 2.0, 'YYY3'), ('AAA13', 'BBB24', 'XXX2', 6.5, 'YYY11'),
        ('AAA13', 'BBB24', 'XXX2', 7.0, 'YYY10'), ('AAA13', 'BBB24', 'XXX5', 2.0, 'YYY3'),
        ('AAA13', 'BBB24', 'XXX5', 5.0, 'YYY3'), ('AAA13', 'BBB65', 'XXX5', 8.0, 'YYY19'),
        ('AAA14', 'BBB26', 'XXX2', 14.0, 'YYY3'), ('AAA14', 'BBB26', 'XXX6', 4.0, 'YYY3'),
        ('AAA14', 'BBB77', 'XXX2', 3.0, 'YYY19'), ('AAA14', 'BBB77', 'XXX2', 19.5, 'YYY3'),
        ('AAA15', 'BBB30', 'XXX4', 1.0, 'YYY3'), ('AAA15', 'BBB30', 'XXX3', 8.0, 'YYY3'),
        ('AAA15', 'BBB30', 'XXX5', 1.0, 'YYY3'), ('AAA16', 'BBB33', 'XXX6', 0.5, 'YYY3'),
        ('AAA17', 'BBB57', 'XXX4', 4.0, 'YYY3'), ('AAA18', 'BBB36', 'XXX4', 1.0, 'YYY3'),
        ('AAA2', 'BBB61', 'XXX2', 1.0, 'YYY3'), ('AAA2', 'BBB61', 'XXX4', 4.0, 'YYY3'),
        ('AAA32', 'BBB76', 'XXX4', 1.0, 'YYY3'), ('AAA32', 'BBB76', 'XXX3', 16.0, 'YYY3'),
        ('AAA6', 'BBB15', 'XXX3', 8.0, 'YYY6'), ('AAA7', 'BBB10', 'XXX6', 51.0, 'YYY3'),
        ('AAA7', 'BBB12', 'XXX5', 8.0, 'YYY3'), ('AAA29', 'BBB38', 'XXX4', 12.0, 'YYY3'),
        ('AAA18', 'BBB40', 'XXX1', 16.0, 'YYY3')]

df = pd.DataFrame(data,
                  columns=["first", "second", "third", "number", "fourth"])
df["first_numbers"] = df.groupby("first")["number"].transform("sum")
df["second_numbers"] = df.groupby(["first", "second"])["number"].transform("sum")
df["third_numbers"] = df.groupby(["first", "second", "fourth"])["number"].transform("sum")

df.set_index(["first", "first_numbers", "second",
                        "second_numbers", "fourth", "third_numbers", "third"], inplace=True)


xlsx_writer = pd.ExcelWriter("new.xlsx", engine="xlsxwriter")


df.to_excel(excel_writer=xlsx_writer,
            sheet_name="name",
            index_label=["First", "First Number", "Second", "Second Number", "Fourth", "Fourth Number", "Third", "Third Number"],
            engine="xlsxwriter",
            startrow=0)

xlsx_writer.save()

CodePudding user response:

You code is working fine, but you need to sort the index for the display to look grouped (only consecutive identical labels look "merged"):

df = (df.set_index(["first", "first_numbers", "second",
                    "second_numbers", "fourth", "third_numbers", "third"])
        .sort_index()
      )

# then export
# ...

output:

                                                                      number
first first_numbers second second_numbers fourth third_numbers third        
AAA1  143.5         BBB1   129.0          DDD    2.0           XXX2      2.0
                                          YYY1   111.0         XXX1     46.0
                                                               XXX2      2.0
                                                               XXX3     24.0
                                                               XXX4     10.0
                                                               XXX5     26.0
                                                               XXX6      3.0
                                          YYY3   1.0           XXX2      1.0
                                          YYY6   11.0          XXX2      8.0
                                                               XXX5      3.0
...
  •  Tags:  
  • Related