I have a pandas dataframe that consist of 5000 rows with different countries and emission data, and looks like the following:
| country | year | emissions |
|---|---|---|
| peru | 2020 | 1000 |
| 2019 | 900 | |
| 2018 | 800 |
The country label is an index.
eg. df = emission.loc[['peru']]
would give me a new dataframe consisting only of the emission data attached to peru. My goal is to use a variable name instead of 'peru' and store the country-specific emission data into a new dataframe.
what I search for is a code that would work the same way as the code below:
country = 'zanzibar'
df = emissions.loc[[{country}]]
From what I can tell the problem arises with the iloc function which does not accept variables as input. Is there a way I could circumvent this problem?
In other words I want to be able to create a new dataframe with country specific emission data, based on a variable that matches one of the countries in my emission.index()all without having to change anything but the given variable.
One way could be to iterate through or maybe create a function in some way? Thank you in advance for any help.
CodePudding user response:
You should be able to select by a certain string for your index. For example:
df = pd.DataFrame({'a':[1,2,3,4]}, index=['Peru','Peru','zanzibar','zanzibar'])
country = 'zanzibar'
df.loc[{country}]
This will return:
a
zanzibar 3
zanzibar 4
In your case, removing one set of square brackets should work:
country = 'zanzibar'
df = emissions.loc[{country}]
CodePudding user response:
An alternative approach where you dont use a country name for your index:
emissions = pd.DataFrame({'Country' : ['Peru', 'Peru', 'Peru', 'Chile', 'Chile', 'Chile'], "Year" : [2021,2020,2019,2021,2020,2019], 'Emissions' : [100,200,400,300,200,100]})
country = 'Peru'
Then to filter:
df = emissions[emissions.Country == country]
or
df = emissions.loc[emissions.Country == country]
Giving:
Country Year Emissions
0 Peru 2021 100
1 Peru 2020 200
2 Peru 2019 400
