assigning list of strings as name for dataframe

Time:12-12

I have searched and searched and not found what I would think was a common question. Which makes me think I'm going about this wrong. So I humbly ask these two versions of the same question.

I have a list of currency names, as strings. A short version would look like this:

col_names = ['australian_dollar', 'bulgarian_lev', 'brazilian_real']

I also have a list of dataframes (df_list). Each one has a column for date, currency exchange rate, etc. Here's the head for one of them (sorry it's blurry; it was fine at full size, but I stuck an m in the URL because it was huge):

[Image: head of one of the currency dataframes]

I would be stoked to assign each one of those strings in col_names as a variable name for a data frame in df_list. I did make a dictionary where key/value was currency name and the corresponding df. But I didn't really know how to use it, primarily because it was unordered. Is there a way to zip col_names and df_list together? I could also just unpack each df in df_list and use the title of the second column as the title of the frame. That seems really cool.

So instead I just wrote something that gave me index numbers and then hand put them into the function I needed. Super kludgy but I want to make the overall project work for now. I end up with this in my figure code:

for ax, currency in zip((ax1, ax2, ax3, ax4), (df_list[38], df_list[19], df_list[10], df_list[0])):
    ax.plot(currency["date"], currency["rolling_mean_30"])

And that's OK. I'm learning, not delivering something to a client. I can use it to make eight line plots. But I want to do this with 40 frames so I can get the annual or monthly volatility. I have to take a list of data frames and unpack them by hand.

Here is the second version of my question. Take df_list and:

def framer(currency):
    index = col_names.index(currency)
    df = df_list[index] # this is a dataframe containing a single currency and the columns built in cell 3
    return df

brazilian_real = framer("brazilian_real")

Which unpacks a df (but only if I type out the name), and then:

def volatizer(currency):
    all_the_years = [currency[currency['year'] == y] for y in currency['year'].unique()] # list of dataframes for each year
    c_name = currency.columns[1]
    df_dict = {}
    for frame in all_the_years:
        year_name = frame.iat[0,4] # the year for each df, becomes the "year" cell for annual volatility df
        annual_volatility = frame["log_rate"].std()*253**.5 # volatility measured by standard deviation * 253 trading days per year raised to the 0.5 power
        df_dict[year_name] = annual_volatility
    df = pd.DataFrame.from_dict(df_dict, orient="index", columns=[c_name + "_annual_vol"]) # indexing on year, not sure if this is cool
    return df

br_vol = volatizer(brazilian_real)

which returns a df with a row for each year and annual volatility. Then I want to concatenate them and use that for more charts. Ultimately make a little dashboard that lets you switch between weekly, monthly, annual and maybe set date lims.

So maybe there's some cool way to run those functions on the original df or on the lists of dfs that I don't know about. I have started using df.map and df.apply some.

But it seems to me it would be pretty handy to be able to unpack the one list using the names from the other. Basically same question, how do I get the dataframes in df_list out and attached to variable names?

Sorry if this is waaaay too long or a really bad way to do this. Thanks ahead of time!

CodePudding user response:

Do you want something like this?

dfs = {df.columns[1]: df for df in df_list}

Then you can reference them like this for example:

dfs['brazilian_real']
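
If the second column is not reliably named after the currency, the same kind of dictionary could be built from the two lists with zip. This is only a minimal sketch: it assumes col_names and df_list are in the same order, and the tiny frames below are made-up stand-ins for the real data.

```python
import pandas as pd

col_names = ['australian_dollar', 'bulgarian_lev', 'brazilian_real']

# Made-up stand-in frames; the real df_list comes from the question.
df_list = [
    pd.DataFrame({"date": pd.to_datetime(["2021-01-04", "2021-01-05"]),
                  name: [1.0, 1.1]})
    for name in col_names
]

# Pair each name with the frame at the same position in the other list.
dfs = dict(zip(col_names, df_list))

# Look frames up by name instead of by hand-copied index numbers.
wanted = ['brazilian_real', 'australian_dollar']
selected = [dfs[name] for name in wanted]
```

The plotting loop from the question could then zip the axes with selected rather than with df_list[38], df_list[19] and so on. Since Python 3.7 a dict keeps its insertion order, so the "unordered" worry should not apply here.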

CodePudding user response:

This is how I applied the approach suggested by Kelvin:

import pandas as pd

def volatizer(currency):
    annual_df_list = [currency[currency['year'] == y] for y in currency['year'].unique()] # list of annual dfs
    c_name = currency.columns[1]
    row_dict = {} # dictionary with year:annual_volatility as key:value
    for frame in annual_df_list:
        year_name = frame.iat[0,4] # first cell of the "year" column (the fifth column here), becomes the "year" key for row_dict
        annual_volatility = frame["log_rate"].std()*253**.5 # volatility measured by standard deviation * 253 trading days per year raised to the 0.5 power
        row_dict[year_name] = annual_volatility
    df = pd.DataFrame.from_dict(row_dict, orient="index", columns=[c_name + "_annual_vol"]) # new df from dictionary indexing on year
    return df

# apply volatizer to each currency df (df_dict maps currency name to dataframe, built like the dictionary in the answer above)
for key in df_dict:
    df_dict[key] = volatizer(df_dict[key])

It worked fine. I can use a list of strings to access any of the key:value pairs. It feels like a better way than trying to instantiate a bunch of new objects.
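
For what it's worth, the per-year loop can probably also be written with groupby, and the per-currency results can be joined side by side with pd.concat, which is what the question wanted for further charts. This is a sketch under the same assumptions as the code above (a "year" column, a "log_rate" column, and the currency name as the second column). The names annual_vol and fake_frame and the random sample data are hypothetical, not part of the original project.

```python
import numpy as np
import pandas as pd

def annual_vol(currency):
    """Annualized volatility per year: std of log_rate * sqrt(253)."""
    c_name = currency.columns[1]
    vol = currency.groupby("year")["log_rate"].std() * 253 ** 0.5
    return vol.rename(c_name + "_annual_vol").to_frame()

# Made-up sample data standing in for the real currency frames.
rng = np.random.default_rng(0)

def fake_frame(name):
    dates = pd.date_range("2020-01-01", "2021-12-31", freq="B")
    return pd.DataFrame({
        "date": dates,
        name: 1.0,  # placeholder exchange rate
        "log_rate": rng.normal(0, 0.01, len(dates)),
        "year": dates.year,
    })

dfs = {name: fake_frame(name) for name in ["bulgarian_lev", "brazilian_real"]}

# One column per currency, one row per year, aligned on the year index.
all_vol = pd.concat([annual_vol(df) for df in dfs.values()], axis=1)
```

Each column of all_vol should match what volatizer returns for that currency, so the combined frame can feed the charts directly without creating a separate variable per currency.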
