Consider the following code:
from pandas_datareader import data as web
import pandas as pd
stocks = 'f', 'fb'
df = web.DataReader(stocks,'yahoo')
The resultant df looks like this:
Attributes Adj Close Close ... Open Volume
Symbols f fb f ... fb f fb
Date ...
2017-06-05 9.280543 153.630005 11.25 ... 153.639999 42558600.0 12520400.0
2017-06-06 9.173302 152.809998 11.12 ... 153.410004 44543700.0 13457100.0
2017-06-07 9.132055 153.119995 11.07 ... 153.270004 37344200.0 12066700.0
2017-06-08 9.156803 154.710007 11.10 ... 154.080002 40757400.0 17799400.0
2017-06-09 9.181552 149.600006 11.13 ... 154.770004 30285900.0 35577700.0
... ... ... ... ... ... ...
2022-05-27 13.630000 195.130005 13.63 ... 191.360001 54195700.0 22562700.0
2022-05-31 13.680000 193.639999 13.68 ... 194.889999 79689900.0 26131100.0
2022-06-01 13.550000 188.639999 13.55 ... 196.509995 50726200.0 36623500.0
2022-06-02 13.890000 198.860001 13.89 ... 188.449997 42979700.0 31951600.0
2022-06-03 13.500000 190.779999 13.50 ... 195.979996 43574400.0 19447300.0
[1260 rows x 12 columns]
If you want to see the closing values for 'f':
df['Close'].f
Out[17]:
Date
2017-06-05 11.25
2017-06-06 11.12
2017-06-07 11.07
2017-06-08 11.10
2017-06-09 11.13
2022-05-27 13.63
2022-05-31 13.68
2022-06-01 13.55
2022-06-02 13.89
2022-06-03 13.50
Name: f, Length: 1260, dtype: float64
What is this called? For example, if you have a few dataframes of random numbers with different names but the same columns, how can you combine them so that the result behaves like this?
CodePudding user response:
What you're seeing is a dataframe with several levels (a MultiIndex) for its columns. Each level can have a name; here they appear to be named "Attributes" and "Symbols", but unnamed levels are also possible.
To look closer at that, I'd use print(df.columns).
Since there are two levels of columns, the following also works: df[('Close', 'f')], i.e. using a tuple as the "full column name". These tuples are also what you see when you take a closer look at df.columns.
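To make that concrete without downloading anything, here is a minimal sketch. The frame demo and its values are made up for illustration; only the level names "Attributes" and "Symbols" come from the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded data: 2 attributes x 2 symbols.
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["f", "fb"]], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2022-06-01", periods=3, name="Date")
demo = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=dates, columns=columns)

print(demo.columns)        # a MultiIndex made of (attribute, symbol) tuples
print(demo.columns.names)  # the names of the two column levels

# Three ways to select the same column
by_tuple = demo[("Close", "f")]  # tuple as the full column name
by_chain = demo["Close"]["f"]    # select the top level, then the symbol
by_attr = demo["Close"].f        # attribute access, as in the question
```

All three selections return the same values; only the tuple form addresses the column in a single lookup.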
We can use pd.concat to combine two dataframes and add a new column level at the same time. By default the new level becomes the topmost one, so to match the layout above we have to swap it below the existing level afterwards.
# Given dataframes a, b
# Concatenate in the column direction. Use keys to give the new
# column level its labels and give the level itself the name Symbols.
(pd.concat([a, b], axis='columns', keys=pd.Index(["f", "fb"], name="Symbols"))
# swap hierarchy order of column levels
.swaplevel(-2, -1, axis=1)
# restore sorting to that of a's columns - assuming a, b have the same cols
.reindex(columns=a.columns, level=0)
)
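For the "few dataframes of random numbers" case from the question, the same chain could be tried end to end like this. It is only a sketch: the frames a and b, their column names, and the dates are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2022-06-01", periods=4, name="Date")
cols = pd.Index(["Open", "Close", "Volume"], name="Attributes")

# Two hypothetical single-symbol frames with identical columns
a = pd.DataFrame(rng.random((4, 3)), index=dates, columns=cols)
b = pd.DataFrame(rng.random((4, 3)), index=dates, columns=cols)

combined = (
    # new top column level "Symbols" with labels f and fb
    pd.concat([a, b], axis="columns", keys=pd.Index(["f", "fb"], name="Symbols"))
    # put Attributes on top and Symbols below
    .swaplevel(-2, -1, axis=1)
    # group the columns by attribute, in the order of a's columns
    .reindex(columns=a.columns, level=0)
)

print(combined.columns)
print(combined["Close"].f)  # behaves like df['Close'].f in the question
```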
You can also take a look at df.stack("Symbols"), which moves the Symbols level down into an index level (you can reset that index level if desired, leaving it as a column). stack and unstack move a level back and forth between columns and index, so going through unstack is another way to reach the same goal.
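A small round trip shows the idea. The frame wide below is a made-up stand-in with the same two column levels as df.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["f", "fb"]], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2022-06-01", periods=3, name="Date")
wide = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=dates, columns=columns)

# Move the Symbols column level down into the row index:
# rows are now (Date, Symbols) pairs, columns are only the attributes.
stacked = wide.stack("Symbols")

# Optionally turn the Symbols index level into an ordinary column
flat = stacked.reset_index("Symbols")

# unstack moves the level back up into the columns
round_trip = stacked.unstack("Symbols")
```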
If Symbol were an ordinary column, you'd do this: df.set_index("Symbol", append=True).unstack("Symbol") to turn it into another column level.
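As a sketch of that last case, assume a long-format frame long_df with a plain Symbol column. The name long_df and all values are hypothetical, not real quotes.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Date, Symbol) pair
long_df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2022-06-01", "2022-06-01", "2022-06-02", "2022-06-02"]
        ),
        "Symbol": ["f", "fb", "f", "fb"],
        "Close": [1.0, 2.0, 3.0, 4.0],
        "Open": [5.0, 6.0, 7.0, 8.0],
    }
).set_index("Date")

# Append Symbol to the row index, then lift it into the columns
wide = long_df.set_index("Symbol", append=True).unstack("Symbol")

print(wide["Close"].f)
```

Here the attribute level has no name because it comes from plain column labels; wide.columns.names can be assigned afterwards if a name is wanted.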
