I have two columns. Column A holds string values: some are unique and some repeat several times. Column B holds either 1 or 0. Some values in A only ever appear with 0 in column B, some only with 1, and for some the flag differs between rows. I'd like to 'override' the zeroes: if a value in column A has a 1 in column B in any row, every row with that same value that currently has 0 should be set to 1. I already have a variable with all values that have a 1. If possible I'd like to avoid a for loop with the iterrows method, which would probably be the obvious approach (in the code below the 0/1 column is named is_1):
is_1=data.query('is_1==1')
A_unique=is_1['A'].unique()
for index, row in data.iterrows():
    if row['is_1']==0:
        if row['A'] in A_unique:
            data.loc[data.A==row['A'],'is_1']=1
CodePudding user response:
One way I can think of is to replace the zeros with NaN, sort, and use forward fill to propagate the 1's over them:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': list('ABBABCB'), 'B': list('0100011')})
# A B
#0 A 0
#1 B 1
#2 B 0
#3 A 0
#4 B 0
#5 C 1
#6 B 1
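# First we replace all 0's with NaN
df.loc[df['B'] == '0', 'B'] = np.nan
# Then we sort so each group's 1's come before its NaN's, and forward fill within
# each group of A only, so that a 1 cannot leak into the next group
df = df.sort_values(['A', 'B'])
df['B'] = df.groupby('A')['B'].ffill().fillna('0')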
# A B
#0 A 0
#3 A 0
#1 B 1
#6 B 1
#2 B 1
#4 B 1
#5 C 1
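The same override can also be sketched without sorting or NaN placeholders. This is a minimal sketch, assuming column B holds integer 0/1 values rather than the strings used above: the largest B within each group of A is broadcast back to every row of that group, and the original row order is kept.

```python
import pandas as pd

# Hypothetical sample data; integer flags are an assumption (the example above uses strings).
df = pd.DataFrame({'A': list('ABBABCB'), 'B': [0, 1, 0, 0, 0, 1, 1]})

# For each value of A, take the maximum of B in that group and assign it to
# every row of the group: a group containing any 1 becomes all 1's.
df['B'] = df.groupby('A')['B'].transform('max')
print(df)
```

Because transform returns a result aligned with the original index, no re-sorting is needed afterwards.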
CodePudding user response:
This could be a solution as well, using a list comprehension:
import pandas as pd
df = pd.DataFrame({
    'a': ['str1', 'str2', 'str3', 'str1', 'str1', 'str1', 'str4', 'str4'],
    'b': [0, 1, 0, 1, 0, 1, 0, 1]})
# a b
#0 str1 0
#1 str2 1
#2 str3 0
#3 str1 1
#4 str1 0
#5 str1 1
#6 str4 0
#7 str4 1
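# Collect the (a, b) pairs once so each membership test is a set lookup
# instead of a fresh scan over both columns
pairs = set(zip(df['a'], df['b']))
tup_list = [(j, 1) if (j, 1) in pairs else (j, i) for j, i in zip(df['a'], df['b'])]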
df = pd.DataFrame(tup_list, columns=['a', 'b'])
# a b
#0 str1 1
#1 str2 1
#2 str3 0
#3 str1 1
#4 str1 1
#5 str1 1
#6 str4 1
#7 str4 1
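For comparison with the loop in the question, the override can probably also be written as a single boolean-mask assignment. This is a rough sketch that reuses the question's names (data, is_1, A_unique) on the sample values from this answer; the sample frame itself is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical data using the question's column names 'A' and 'is_1'.
data = pd.DataFrame({
    'A': ['str1', 'str2', 'str3', 'str1', 'str1', 'str1', 'str4', 'str4'],
    'is_1': [0, 1, 0, 1, 0, 1, 0, 1]})

# Values of A that have a 1 in at least one row (the question's A_unique).
A_unique = data.loc[data['is_1'] == 1, 'A'].unique()

# Set the flag to 1 on every row whose A value is in that set, in one step.
data.loc[data['A'].isin(A_unique), 'is_1'] = 1
print(data)
```

This keeps the row order and avoids iterrows entirely; rows whose value never has a 1 (str3 here) are left at 0.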
