multiplication columns with openpyxl or pandas-CodePudding

its my code:

import pandas as pd

data = pd.read_excel ('A.xlsx') 
df = pd.DataFrame(data, columns= ['xCode', 'xYear'])
df2 = df.assign(repeated=df.groupby('xCode').transform('count'))

print (df2)

      xCode  xYear   Repeated
1.    100    1900    3
2.    100    1900    3
3.    100    1934    3
4.    200    1921    1
5.    157    1945    1
       .
       . 
999.  ...    ....    .
1000. ...    ....    .

I have to find not connected data with xCode on xYear column, (doesn't exist in xCode(x) rows) for example: in row 5, xCode is 157 & xYear is 1945. So I have to do multiplication, 157 in 1900,1934,1921

this code isn't correct

dict1 ={key={xcode},value={xyear}}
xc=xcode
xy=xyear
for x in dict1:
    if xy exist in xc rows, continue, else get multiplication xc*xy and get this result:


          xCode  repeated    IsntConnectWith    multiplication
    1.    100    2           1921               100*1921 (just result)
    2.    100    2           1945               100*1945
    3.    200    1           1900               200*1900
    4.    200    1           1934               200*1934
    5.    200    1           1945               200*1945
    6.    157    1           1900               157*1900
    7.    157    1           1934               157*1934
    8.    157    1           1921               157*1921

    999.  ...    .           ....               ........

i need this result. With openpyxl or pandas

CodePudding user response：

Use pivot_table to transform your dataframe and keep not connected years:

out = df.pivot_table('xYear', 'xCode', 'xYear', fill_value=0, sort=False).astype(bool) \
        .replace({True: pd.NA, False: 1}).stack().index.to_frame(index=False) \
        .assign(multiplication=lambda x: x['xYear'] * x['xCode'])
print(out)

# Output
   xCode  xYear  multiplication
0    100   1921          192100
1    100   1945          194500
2    200   1900          380000
3    200   1934          386800
4    200   1945          389000
5    157   1900          298300
6    157   1921          301597
7    157   1934          303638

Note: I don't understand how to compute the column repeated.