I have a for loop that iterates over countries, and each country has either a yes or a no for every animal. I want the corresponding animal to be added to a list each time a yes is found. For example, I have a list like
Countries = ['Germany','France'..etc etc]
My DataFrame looks something like this:
animal  Germany  France
Rabbit  yes      yes
Bear    no       yes
...
I want a list that contains each animal once for every selected country (those in the countries list) where it has a yes. So in the example above, I would want
animals_list = ['Rabbit', 'Rabbit', 'Bear']
My main code goes something like this. My attempt is below, but it doesn't work. Is there a clean way of doing it?
Countries = ['Germany','France'..etc etc]
animals_list = []
for country in Countries:
    animal_list = animal_list.append(df[df[country] == 'yes'],'animal'])
The for loop is required, so I can't do this directly with a single pandas operation.
CodePudding user response:
Assuming you have a DataFrame like this (matching the example in the question):
import pandas as pd

data = {
    'animal': ['Rabbit', 'Bear'],
    'Germany': ['yes', 'no'],
    'France': ['yes', 'yes'],
}
df = pd.DataFrame(data)
If the wanted countries are given in a list:
# In Python, prefer lowercase, underscore-separated variable names (PEP 8)
countries = ['Germany', 'France']
Then you can select those columns:
# Select the countries that you want
df_subset = df[df.columns.intersection(countries)]
And count the number of yes values for each animal:
animals_index_to_num_yes = df_subset.eq('yes').sum(axis=1)
The list can then be built like this:
animals_list = []
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, animal in df['animal'].items():
    occurrences = animals_index_to_num_yes.get(index)
    # Repeat the animal once per yes found in its row
    animals_list.extend([animal] * occurrences)
Notes:
- Try to avoid for loops in pandas as much as possible. In general, built-in methods perform better because they are vectorized. See this excellent answer for more.
- In your case, since the order of the animals in the output list matters, I'm not sure the loop can be avoided, so I used a for loop.
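For what it's worth, a loop-free variant seems possible with Series.repeat, which repeats each element a given number of times while keeping row order. This is only a sketch, assuming the sample data from the question and that row order (rather than country order) is acceptable:

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({
    'animal': ['Rabbit', 'Bear'],
    'Germany': ['yes', 'no'],
    'France': ['yes', 'yes'],
})
countries = ['Germany', 'France']

# Number of yes values per row, restricted to the selected countries
counts = df[countries].eq('yes').sum(axis=1)

# Repeat each animal as many times as it has a yes, preserving row order
animals_list = df['animal'].repeat(counts.to_numpy()).tolist()
print(animals_list)  # ['Rabbit', 'Rabbit', 'Bear']
```

Animals with zero yes values are repeated zero times, so they drop out of the result.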
CodePudding user response:
You could iterate over the animals and, for each one, count how many times the rest of that row contains yes, then extend the list with that many copies of the animal:
animals_list = []
for i, animal in enumerate(df.animal):
    n = sum(df.iloc[i, 1:] == 'yes')
    animals_list.extend([animal] * n)
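Note that slicing with 1: counts every column after animal, not only the selected countries. If the DataFrame has other country columns, a variant restricted to the countries list might look like the sketch below. The Spain column is hypothetical and was added only to show that unselected columns are ignored:

```python
import pandas as pd

# Sample data assumed from the question; 'Spain' is a hypothetical extra column
df = pd.DataFrame({
    'animal': ['Rabbit', 'Bear'],
    'Germany': ['yes', 'no'],
    'France': ['yes', 'yes'],
    'Spain': ['yes', 'yes'],
})
countries = ['Germany', 'France']

animals_list = []
# Walk the animals together with their rows of selected-country values
for animal, flags in zip(df['animal'], df[countries].to_numpy()):
    n = int((flags == 'yes').sum())  # yes count among selected countries only
    animals_list.extend([animal] * n)

print(animals_list)  # ['Rabbit', 'Rabbit', 'Bear']
```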
CodePudding user response:
animals_list = []
# Column names are case-sensitive, so they must match the DataFrame exactly
country_list = ['Germany', 'France']
for i in range(len(df)):
    for country in country_list:
        if df[country].iloc[i] == 'yes':
            animals_list.append(df.animal.iloc[i])
print(animals_list)
Output: ['Rabbit', 'Rabbit', 'Bear']
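If the outer loop over countries has to stay, as the question says, a minimal sketch along those lines could look like this. It assumes the sample data from the question. Keep in mind that list.append and list.extend modify the list in place and return None, so their result should not be assigned back to the list:

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({
    'animal': ['Rabbit', 'Bear'],
    'Germany': ['yes', 'no'],
    'France': ['yes', 'yes'],
})
countries = ['Germany', 'France']

animals_list = []
for country in countries:
    # Select the animals whose value for this country is yes
    matches = df.loc[df[country] == 'yes', 'animal']
    # extend() mutates the list in place; do not reassign its return value
    animals_list.extend(matches)

print(animals_list)  # ['Rabbit', 'Rabbit', 'Bear']
```

Here the result is grouped by country rather than by animal, so the order can differ from the row-wise versions when the data is larger.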
