Home > database >  How to use a Python data frame to generate a list/sequence that can be added as a column
How to use a Python data frame to generate a list/sequence that can be added as a column

Time:01-26

I have created a data frame from a sql query that generates bin data for a histogram: min, max, and bin size. I want to use this data to create a list of bins that I can then merge back into the original data frame

Here is a sample of my query output

My query produces one row per model. I want to then create a list that starts at the initial_bin (0) and then increments by bin_size (7) until it reaches the maximum (435). The resulting column will be called bin_seq.

My code below shows where I'm at so far. I can generate a list with the desired sequence using a for loop and the range() function, but am unsure on how to merge this back into the data frame.

'''

import pyodbc
import pandas as pd
bin_sql = """
select 'test_model_1' model,  0 initial_bin , 435 maximum , 7 bin_size
    from analytics.model_data
        
"""

bin_data = pd.read_sql(bin_sql,conn)

bin_max = int(bin_data["maximum"] bin_data["bin_size"])
bin_size = int(bin_data["bin_size"])
bin_start = int(bin_data["initial_bin"])

for i in range(bin_start, bin_max, bin_size):
    print(i, end=', ')

'''

Here is the image of the desired output

One additional wrinkle on this is that my final output will have multiple models, each with its own maximum and bin size. So I will need to be able to generate a custom sequence for each model and then join the sequence back to the original model name.

CodePudding user response:

This code will generate the desire output you want. You didn't mentioned how to calculate bin_seq.

import pyodbc
import pandas as pd
bin_sql = """
select 'test_model_1' model,  0 initial_bin , 435 maximum , 7 bin_size
    from analytics.model_data
        
"""

bin_data = pd.read_sql(bin_sql,conn)

bin_max = int(bin_data["maximum"] bin_data["bin_size"])
bin_size = int(bin_data["bin_size"])
bin_start = int(bin_data["initial_bin"])
bin_sq=[]
## For creating bin Sequence
temp=0
## Generating bin_seq
for i in range(bin_start, bin_max, bin_size):
    bin_sq.append(temp)
    temp=temp bin_size
## Adding bin_seq in dataframe
bin_data['bin_seq']=bin_sq

CodePudding user response:

Try:

Setup a MRE:

df = pd.DataFrame([['test_model', 0, 435, 7]],
                  columns=['model', 'initial_bin', 'maximum', 'bin_size'])
print(df)

# Output
        model  initial_bin  maximum  bin_size
0  test_model            0      435         7
bin_seq = lambda x: range(x['initial_bin'], x['maximum'], x['bin_size'])
df = df.assign(bin_seq=df.apply(bin_seq, axis=1)) \
       .explode('bin_seq').reset_index(drop=True)
print(df)

# Output
         model  initial_bin  maximum  bin_size bin_seq
0   test_model            0      435         7       0
1   test_model            0      435         7       7
2   test_model            0      435         7      14
3   test_model            0      435         7      21
4   test_model            0      435         7      28
..         ...          ...      ...       ...     ...
58  test_model            0      435         7     406
59  test_model            0      435         7     413
60  test_model            0      435         7     420
61  test_model            0      435         7     427
62  test_model            0      435         7     434

[63 rows x 5 columns]
  •  Tags:  
  • Related