Home > OS >  Transform pandas Series into boolean matrix
Transform pandas Series into boolean matrix

Time:01-27

I have the following pd.Series object.

              ticker
date                
2020-02-06     BKBR3
2020-02-06     BRKM5
2020-02-06     CNTO3
..........    ......
2020-02-06    BIDI11
2020-02-06     BRPR3
2020-02-06     CVCB3
2020-02-06     ELET3
2020-02-06     GFSA3
2022-01-26    QETH11
2022-01-26     ABEV3
2022-01-26    BIDI11
2022-01-26     CRFB3
2022-01-26     LCAM3

And I want to turn it into a boolean matrix (not sure if I'm using the appropriate terminology), that looks like the table below. It has the same index values from the original table, but the ticker column is pivoted and the values are True when the column and index are in the original table and False when they aren't.

            BKBR3  BRKM5  CNTO3  .....
date                           
2020-02-06  True   False  False  .....
2020-02-10  False  False  False  .....
..........  .....  .....  .....

Does anyone know a clever way of doing it? I tried pivoting the table, but since it has no values, it didn't work.

CodePudding user response:

The output your provided does not look like a series, it looks like a dataframe with index 'date' and one column 'value'. I assume this is what it is -- it is easy to go between the two obviously so you may need to make small changes to the below

anyway here is how you can then do it -- by making it a multi-index df that you then unstack:

(df.set_index(['ticker'], append=True)
    .assign(v = True)
    .unstack(level=1)
    .fillna(False)
)

output (for a short version you had the output for in your question)

    v
ticker      ABEV3   BIDI11  BKBR3   BRKM5   BRPR3   CNTO3   CRFB3   CVCB3   ELET3   GFSA3   LCAM3   QETH11
date                                                
2020-02-06  False   True    True    True    True    True    False   True    True    True    False   False
2022-01-26  True    True    False   False   False   False   True    False   False   False   True    True

CodePudding user response:

Managed to do it with the pivot method after some more trying.

df["bool"] = 1
df = df.groupby(["date", "ticker"]).sum()
df = df.reset_index()
df = df.pivot(index="date", columns="ticker", values="bool")
df = df.fillna(0)
  •  Tags:  
  • Related