Home > Net >  Increase the values in a column values based on values in other column in pandas
Increase the values in a column values based on values in other column in pandas

Time:02-04

I have my source data in the form of csv file as below:

id,col1,col2                
123,11|22|33||||||,val1|val3|val2           
456,99||77|||88|||||||||6|,val4|val5|val6|val7

I need to add a new column(fnlsrc) which will have the values based on values in Col2 and Col1, i.e if col1 has 9 values(separated with pipe) and col2 has 3 values(separated with pipe), then in fnlsrc column I have to load 9 values(separated with pipe) 3 set of col2(val1|val3|val2|val1|val3|val2|val1|val3|val2). Please refer the output below, which will help in understanding the requirement easily:

id,col1,col2,fnlsrc
123,11|22|33||||||,val1|val3|val2,val1|val3|val2|val1|val3|val2|val1|val3|val2
456,99||77|||88|||||||||6|,val4|val5|val6|val7,val4|val5|val6|val7|val4|val5|val6|val7|val4|val5|val6|val7|val4|val5|val6|val7

I have tried following code, but its adding only the one set:

zipped = zip(df['col1'], df['col2'])
for s,t in zipped:
    count = int((s.count('|')   1)/(t.count('|')   1))
    for val in range(count):
        df['fnlsrc'] = t

CodePudding user response:

As the new column is based on the other two, I would use panda's apply() function. I defined a function that calculates the new column value based on the other two columns, which is then applied to each row:

def new_value(x):
    # Find out number of values in both columns
    col1_numbers = x['col1'].count('|')   1
    col2_numbers = x['col2'].count('|')   1
    
    # Calculate how many times col2 should appear in the new column
    repetition = int(col1_numbers/col2_numbers)

    # Create list of strings containing the values of the new column
    values = [x['col2']]*repetition

    # Join the list of strings with pipes
    return '|'.join(values)

# Apply the function on every row
df['fnlsrc'] = df.apply(lambda x:new_value(x), axis=1)
df

Output:

id  col1    col2    fnlsrc
0   123 11|22|33||||||  val1|val3|val2  val1|val3|val2|val1|val3|val2|val1|val3|val2
1   456 99||77|||88|||||||||6|  val4|val5|val6|val7 val4|val5|val6|val7|val4|val5|val6|val7|val4|v...

Full output in your input format:

id,col1,col2,fnlsrc
123,11|22|33||||||,val1|val3|val2,val1|val3|val2|val1|val3|val2|val1|val3|val2
456,99||77|||88|||||||||6|,val4|val5|val6|val7,val4|val5|val6|val7|val4|val5|val6|val7|val4|val5|val6|val7|val4|val5|val6|val7
  •  Tags:  
  • Related