I have 12 pd dataframes , i want to extract one column from each and pass as new df and rename based-CodePudding

**I need to extract "adj close" from all into a new DF and rename based on source and map based on Date

new_DF = date AAL AAPL ALK .....(containing adj close) please help**

AAL = pd.read_csv("AAL.csv")

AAPL = pd.read_csv("AAPL.csv")

ALK = pd.read_csv("ALK.csv")

AMZN = pd.read_csv("AMZN.csv")

BHC = pd.read_csv("BHC.csv")

CS = pd.read_csv("CS.csv")

DB = pd.read_csv("DB.csv")

GS = pd.read_csv("GS.csv")

GOOG = pd.read_csv("GOOG.csv")

HA = pd.read_csv("HA.csv")

JNJ = pd.read_csv("JNJ.csv")

MRK = pd.read_csv("MRK.csv")

SP500 = pd.read_csv("S&P500.csv")

CodePudding user response：

Try this:

# load csv data
# define relative path to folder containing csv data
files_folder = '/path/to/csv/'

# load all csv files in one dataframe
df_list = []
for file in glob.glob(os.path.join(files_folder, '*.csv')):
    df = pd.read_csv(file)
    # write here column you want to select
    df_column = df['column_name'].rename(columns={'column_name':file[:-4]})
    df_list.append(df_column)
# concatenate the list of dataframes into one
df_final = pd.concat(df_list, axis=1)

CodePudding user response：

Here is an example. Not having your .csv files means we need to be creative in terms of how to get the data, but imagine you have a dict of DataFrames, one per ticker.

Here we use Yahoo finance to get similar data. The column we are looking for ('adj close') is not in that data, so for this example we'll use Close instead.

import yfinance as yf

tickers = ['AAL', 'AAPL', 'AMZN', 'GOOG']
sources = {ticker: yf.Ticker(ticker).history(period='5d') for ticker in tickers}

At this point, we've got data for each ticker. For example:

>>> sources['AAPL']
            Open        High        Low         Close       Volume     Dividends  Stock Splits
Date                                                                                          
2022-03-30  178.550003  179.610001  176.699997  177.770004   92633200  0          0           
2022-03-31  177.839996  178.029999  174.399994  174.610001  103049300  0          0           
2022-04-01  174.029999  174.880005  171.940002  174.309998   78699800  0          0           
2022-04-04  174.570007  178.490005  174.440002  178.440002   76468400  0          0           
2022-04-05  177.500000  178.300003  174.419998  175.059998   73311300  0          0

In your case, you'd get the data from CSV files, so instead:

sources = {k: pd.read_csv(f'{k}.csv') for k in tickers}

Now, the answer to your question:

df = pd.concat([v['Close'].to_frame(k) for k, v in sources.items()], axis=1)

>>> df
                  AAL        AAPL         AMZN         GOOG
Date                                                       
2022-03-30  18.049999  177.770004  3326.020020  2852.889893
2022-03-31  18.250000  174.610001  3259.949951  2792.989990
2022-04-01  18.240000  174.309998  3271.199951  2814.000000
2022-04-04  18.230000  178.440002  3366.929932  2872.850098
2022-04-05  17.840000  175.059998  3281.100098  2821.260010

Again, in your case you would select the 'adj close' column instead.