I have my source data in the form of csv file as below:
id,col1,col2
123,11|22|33||||||,val1|val3|val2
456,99||77|||88|||||||||6|,val4|val5|val6|val7
I need to add a new column(fnlsrc) which will have the values based on values in Col2 and Col1, i.e if col1 has 9 values(separated with pipe) and col2 has 3 values(separated with pipe), then in fnlsrc column I have to load 9 values(separated with pipe) 3 set of col2(val1|val3|val2|val1|val3|val2|val1|val3|val2). Please refer the output below, which will help in understanding the requirement easily:
id,col1,col2,fnlsrc
123,11|22|33||||||,val1|val3|val2,val1|val3|val2|val1|val3|val2|val1|val3|val2
456,99||77|||88|||||||||6|,val4|val5|val6|val7,val4|val5|val6|val7|val4|val5|val6|val7|val4|val5|val6|val7|val4|val5|val6|val7
I have tried following code, but its adding only the one set:
zipped = zip(df['col1'], df['col2'])
for s,t in zipped:
count = int((s.count('|') 1)/(t.count('|') 1))
for val in range(count):
df['fnlsrc'] = t
CodePudding user response:
As the new column is based on the other two, I would use panda's apply() function. I defined a function that calculates the new column value based on the other two columns, which is then applied to each row:
def new_value(x):
# Find out number of values in both columns
col1_numbers = x['col1'].count('|') 1
col2_numbers = x['col2'].count('|') 1
# Calculate how many times col2 should appear in the new column
repetition = int(col1_numbers/col2_numbers)
# Create list of strings containing the values of the new column
values = [x['col2']]*repetition
# Join the list of strings with pipes
return '|'.join(values)
# Apply the function on every row
df['fnlsrc'] = df.apply(lambda x:new_value(x), axis=1)
df
Output:
id col1 col2 fnlsrc
0 123 11|22|33|||||| val1|val3|val2 val1|val3|val2|val1|val3|val2|val1|val3|val2
1 456 99||77|||88|||||||||6| val4|val5|val6|val7 val4|val5|val6|val7|val4|val5|val6|val7|val4|v...
Full output in your input format:
id,col1,col2,fnlsrc
123,11|22|33||||||,val1|val3|val2,val1|val3|val2|val1|val3|val2|val1|val3|val2
456,99||77|||88|||||||||6|,val4|val5|val6|val7,val4|val5|val6|val7|val4|val5|val6|val7|val4|val5|val6|val7|val4|val5|val6|val7
