Coding Meetup #9 - Higher-Order Functions Series - Is the meetup age-diverse? (Python) Pandas solution

Time:02-01

I've completed the aforementioned kata on CodeWars and was wondering if there is a more elegant solution using pandas? I was thinking about using pd.Series.between() but couldn't quite get to a solution.

Here is the CodeWars Kata prompt:

You will be given an array of objects (associative arrays in PHP) representing data about developers who have signed up to attend the next coding meetup that you are organising.

Your task is to return:

true if developers from all of the following age groups have signed up: teens, twenties, thirties, forties, fifties, sixties, seventies, eighties, nineties, centenarian (at least 100 years young). false otherwise. For example, given the following input array:

list1 = [
  { 'firstName': 'Harry', 'lastName': 'K.', 'country': 'Brazil', 'continent': 'Americas', 'age': 19, 'language': 'Python' },
  { 'firstName': 'Kseniya', 'lastName': 'T.', 'country': 'Belarus', 'continent': 'Europe', 'age': 29, 'language': 'JavaScript'},
  { 'firstName': 'Jing', 'lastName': 'X.', 'country': 'China', 'continent': 'Asia', 'age': 39, 'language': 'Ruby' },
  { 'firstName': 'Noa', 'lastName': 'A.', 'country': 'Israel', 'continent': 'Asia', 'age': 40, 'language': 'Ruby' },
  { 'firstName': 'Andrei', 'lastName': 'E.', 'country': 'Romania', 'continent': 'Europe', 'age': 59, 'language': 'C' },
  { 'firstName': 'Maria', 'lastName': 'S.', 'country': 'Peru', 'continent': 'Americas', 'age': 60, 'language': 'C' },
  { 'firstName': 'Lukas', 'lastName': 'X.', 'country': 'Croatia', 'continent': 'Europe', 'age': 75, 'language': 'Python' },
  { 'firstName': 'Chloe', 'lastName': 'K.', 'country': 'Guernsey', 'continent': 'Europe', 'age': 88, 'language': 'Ruby' },
  { 'firstName': 'Viktoria', 'lastName': 'W.', 'country': 'Bulgaria', 'continent': 'Europe', 'age': 98, 'language': 'PHP' },
  { 'firstName': 'Piotr', 'lastName': 'B.', 'country': 'Poland', 'continent': 'Europe', 'age': 128, 'language': 'JavaScript' }
]

your function should return true as there is at least one developer from each age group.

Notes:

The input array will always be valid and formatted as in the example above. Age is represented by a number which can be any positive integer up to 199.

And this is what I came up with:

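import pandas as pd

def is_age_diverse(lst):
    # Lower bound of each required group: 10 (teens), 20, ..., 90, 100 (centenarian)
    ages_list = list(range(10, 101, 10))
    for i in pd.DataFrame(lst)['age']:
        # Iterate over a copy so that removing items does not skip elements
        for j in list(ages_list):
            # Groups are [j, j + 10); the centenarian group (j == 100) has no upper bound
            if j <= i < j + 10 or (j == 100 and i >= 100):
                ages_list.remove(j)
    return not ages_list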

I looked through the other solutions on the page but I found nobody using Pandas for this. Any help would be much appreciated. Also if you have any suggestions how I could improve upon my existing code apart from Pandas, just hit me with it.
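For reference, here is one way the pd.Series.between() idea could be completed. This is a sketch only. It assumes "teens" means 10-19, as in the code above, and that ages never exceed 199, per the kata notes. The loop checks each age group once and asks whether any age falls inside it:

```python
import pandas as pd


def is_age_diverse(lst):
    """Sketch: True if every age group from teens to centenarians is present.

    Assumes teens start at 10 and that no age exceeds 199 (kata notes).
    """
    ages = pd.DataFrame(lst)['age']
    # Inclusive (low, high) bounds: teens ... nineties, then centenarians.
    groups = [(lo, lo + 9) for lo in range(10, 100, 10)] + [(100, 199)]
    # Series.between() includes both endpoints by default.
    return all(ages.between(lo, hi).any() for lo, hi in groups)
```

Each between() call scans the whole column, so this makes ten passes over the data. That is fine for meetup-sized inputs.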

Answer:

Using pd.cut() and groupby, and assuming ages is a Series of the age values (for example pd.DataFrame(lst)['age']; pd.cut() needs one-dimensional input, so a DataFrame will not work), you can achieve it with:

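bins = list(range(10, 101, 10)) + [200]
# observed=False keeps empty bins (count 0) so that .all() can detect a missing group
ages.groupby(pd.cut(ages, bins, right=False, include_lowest=False), observed=False).count().all()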

Note that bins is built using the assumption that the largest value is 199. From the question description you might want to change the lowest value from 10 to 13 since that is where I would start "teens", but I followed the logic in the code from the post.

The count() call at the end gives the number of ages in each bin, and all() returns True only if every count is non-zero, i.e. every age group is represented.
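Wrapped into a complete function, the pd.cut() approach might look like the sketch below. It keeps the same bin edges and the same assumption that teens start at 10. It passes observed=False explicitly because grouping by a categorical with observed=True would silently drop empty bins. That would make .all() return True even when an age group is missing:

```python
import pandas as pd


def is_age_diverse(lst):
    """Sketch of the pd.cut() + groupby approach (teens assumed to start at 10)."""
    ages = pd.DataFrame(lst)['age']
    # Edges 10, 20, ..., 100, 200 give bins [10, 20), ..., [90, 100), [100, 200).
    bins = list(range(10, 101, 10)) + [200]
    # Ages below 10 fall outside every bin (NaN) and are ignored by groupby.
    groups = pd.cut(ages, bins, right=False)
    # Keep empty bins so a missing age group shows up as a count of 0.
    counts = ages.groupby(groups, observed=False).count()
    return bool(counts.all())
```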

Answer:

Here is a solution that uses pandas and groupby: we group by age category and check that there are exactly 10 categories.

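import pandas as pd

def is_age_diverse(lst):
    def age_category(age):
        # 10-19 -> 1, ..., 90-99 -> 9; every age >= 100 is one group (10)
        if age >= 100:
            return 10
        return age // 10

    df = pd.DataFrame(lst)
    # Drop under-10s, then group by category (groupby calls age_category on each
    # index value, i.e. each age) and count how many groups are non-empty
    return len(df[df['age'] >= 10].set_index('age').groupby(age_category).first()) == 10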
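The same category-counting idea can also be written without set_index() and a Python callback. The sketch below does it with vectorised Series methods. It also adds a plain-Python version for the question's request for suggestions "apart from Pandas". Both assume, as above, that teens start at 10 and that all ages of 100 and over form a single centenarian group. The name is_age_diverse_plain is just an illustrative choice:

```python
import pandas as pd


def is_age_diverse(lst):
    """Vectorised sketch: count distinct age categories with pandas."""
    ages = pd.DataFrame(lst)['age']
    # 10-19 -> 1, ..., 90-99 -> 9, and every age >= 100 is clipped to 10.
    categories = ages[ages >= 10].floordiv(10).clip(upper=10)
    return categories.nunique() == 10


def is_age_diverse_plain(lst):
    """Same rule in plain Python, no pandas needed."""
    categories = {min(d['age'] // 10, 10) for d in lst if d['age'] >= 10}
    return categories == set(range(1, 11))
```

For an input this small, the set-based version is arguably the clearest. Pandas only starts to pay off when the data is already in a DataFrame.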