its my code:
import pandas as pd
data = pd.read_excel ('A.xlsx')
df = pd.DataFrame(data, columns= ['xCode', 'xYear'])
df2 = df.assign(repeated=df.groupby('xCode').transform('count'))
print (df2)
xCode xYear Repeated
1. 100 1900 3
2. 100 1900 3
3. 100 1934 3
4. 200 1921 1
5. 157 1945 1
.
.
999. ... .... .
1000. ... .... .
I have to find not connected data with xCode on xYear column, (doesn't exist in xCode(x) rows) for example: in row 5, xCode is 157 & xYear is 1945. So I have to do multiplication, 157 in 1900,1934,1921
this code isn't correct
dict1 ={key={xcode},value={xyear}}
xc=xcode
xy=xyear
for x in dict1:
if xy exist in xc rows, continue, else get multiplication xc*xy and get this result:
xCode repeated IsntConnectWith multiplication
1. 100 2 1921 100*1921 (just result)
2. 100 2 1945 100*1945
3. 200 1 1900 200*1900
4. 200 1 1934 200*1934
5. 200 1 1945 200*1945
6. 157 1 1900 157*1900
7. 157 1 1934 157*1934
8. 157 1 1921 157*1921
999. ... . .... ........
i need this result. With openpyxl or pandas
CodePudding user response:
Use pivot_table to transform your dataframe and keep not connected years:
out = df.pivot_table('xYear', 'xCode', 'xYear', fill_value=0, sort=False).astype(bool) \
.replace({True: pd.NA, False: 1}).stack().index.to_frame(index=False) \
.assign(multiplication=lambda x: x['xYear'] * x['xCode'])
print(out)
# Output
xCode xYear multiplication
0 100 1921 192100
1 100 1945 194500
2 200 1900 380000
3 200 1934 386800
4 200 1945 389000
5 157 1900 298300
6 157 1921 301597
7 157 1934 303638
Note: I don't understand how to compute the column repeated.
