**I need to extract "adj close" from all into a new DF and rename based on source and map based on Date
new_DF = date AAL AAPL ALK .....(containing adj close) please help**
AAL = pd.read_csv("AAL.csv")
AAPL = pd.read_csv("AAPL.csv")
ALK = pd.read_csv("ALK.csv")
AMZN = pd.read_csv("AMZN.csv")
BHC = pd.read_csv("BHC.csv")
CS = pd.read_csv("CS.csv")
DB = pd.read_csv("DB.csv")
GS = pd.read_csv("GS.csv")
GOOG = pd.read_csv("GOOG.csv")
HA = pd.read_csv("HA.csv")
JNJ = pd.read_csv("JNJ.csv")
MRK = pd.read_csv("MRK.csv")
SP500 = pd.read_csv("S&P500.csv")
df = date | Open| high | low |close | adj close |volume
CodePudding user response:
Try this:
# load csv data
# define relative path to folder containing csv data
files_folder = '/path/to/csv/'
# load all csv files in one dataframe
df_list = []
for file in glob.glob(os.path.join(files_folder, '*.csv')):
df = pd.read_csv(file)
# write here column you want to select
df_column = df['column_name'].rename(columns={'column_name':file[:-4]})
df_list.append(df_column)
# concatenate the list of dataframes into one
df_final = pd.concat(df_list, axis=1)
CodePudding user response:
Here is an example. Not having your .csv files means we need to be creative in terms of how to get the data, but imagine you have a dict of DataFrames, one per ticker.
Here we use Yahoo finance to get similar data. The column we are looking for ('adj close') is not in that data, so for this example we'll use Close instead.
import yfinance as yf
tickers = ['AAL', 'AAPL', 'AMZN', 'GOOG']
sources = {ticker: yf.Ticker(ticker).history(period='5d') for ticker in tickers}
At this point, we've got data for each ticker. For example:
>>> sources['AAPL']
Open High Low Close Volume Dividends Stock Splits
Date
2022-03-30 178.550003 179.610001 176.699997 177.770004 92633200 0 0
2022-03-31 177.839996 178.029999 174.399994 174.610001 103049300 0 0
2022-04-01 174.029999 174.880005 171.940002 174.309998 78699800 0 0
2022-04-04 174.570007 178.490005 174.440002 178.440002 76468400 0 0
2022-04-05 177.500000 178.300003 174.419998 175.059998 73311300 0 0
In your case, you'd get the data from CSV files, so instead:
sources = {k: pd.read_csv(f'{k}.csv') for k in tickers}
Now, the answer to your question:
df = pd.concat([v['Close'].to_frame(k) for k, v in sources.items()], axis=1)
>>> df
AAL AAPL AMZN GOOG
Date
2022-03-30 18.049999 177.770004 3326.020020 2852.889893
2022-03-31 18.250000 174.610001 3259.949951 2792.989990
2022-04-01 18.240000 174.309998 3271.199951 2814.000000
2022-04-04 18.230000 178.440002 3366.929932 2872.850098
2022-04-05 17.840000 175.059998 3281.100098 2821.260010
Again, in your case you would select the 'adj close' column instead.
