Get name if range values are true Python

Time:01-07

I have a pandas question about getting a name when a range of values is covered in a column. If a name has a year in every decade from 1960 until now, I want to print that Name. Here is an example of my dataframe:

#,Name,description,year
1,a,foo,1961
2,a,foo2,1977
3,a,foo3,1980
4,a,foo4,1995
5,a,foo5,2001
6,a,foo6,2011
7,a,foo7,2020
8,b,bar,1965
9,b,bar2,1970
10,b,bar3,1983
11,b,bar4,1997
12,b,bar5,2005
13,b,bar6,2016
14,b,bar7,2022
15,c,abc,1965
16,c,ab2,1970
17,c,abc3,1993
18,c,abc4,2007
19,c,abc5,2015
20,c,abc6,2020

Output: a,b

So far, I did this:

dataset[Year].str.match(str(year[0:3]))

I think I need a for loop for this, but I am not sure at all. Thank you for any help!

CodePudding user response:

One way to solve the problem is to create groups with the pandas groupby method and then keep only the qualifying groups with the filter method.

import pandas as pd


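def is_within_range(group):
    years = sorted(list(group["Year"]))
    # Flag every decade (year // 10) that appears in this group.
    check_decade = {}
    for year in years:
        decade = year // 10
        # Only decades from the 1960s (196) through the 2020s (202) count.
        if 196 <= decade <= 202:
            check_decade[decade] = True
    # Keep the group only if all seven decades were seen.
    if len(check_decade.keys()) == (202 - 196 + 1):
        return True
    return False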


data = pd.read_csv("years.csv")
filtered_data = data.groupby(['Name']).filter(lambda x: is_within_range(x))
print(list(filtered_data.Name.unique()))

Output:

['a', 'b']

years.csv:

#,Name,Description,Year
1,a,foo,1961
2,a,foo2,1977
3,a,foo3,1980
4,a,foo4,1995
5,a,foo5,2001
6,a,foo6,2011
7,a,foo7,2020
8,b,bar,1965
9,b,bar2,1970
10,b,bar3,1983
11,b,bar4,1997
12,b,bar5,2005
13,b,bar6,2016
14,b,bar7,2022
15,c,abc,1965
16,c,ab2,1970
17,c,abc3,1993
18,c,abc4,2007
19,c,abc5,2015
20,c,abc6,2020

Explanation:

  • The is_within_range function checks whether a group has at least one year in each decade from the 1960s to the 2020s. The decade of a year is year // 10. E.g. the years 1965 and 1969 have a decade value of 196, while 1996 and 1998 have a decade value of 199.
  • I used a dictionary to flag each decade as True, and then counted the number of flagged decades in the group.
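
As a possible alternative to the dictionary, the same check can be sketched without an explicit loop by counting the distinct decades per name. This is only a sketch under assumptions: the inline CSV is a shortened stand-in for years.csv, and the names names_covering_all_decades, first_decade and last_decade are hypothetical, not part of the answer above.

```python
import io

import pandas as pd


def names_covering_all_decades(df, first_decade=196, last_decade=202):
    """Return the names that have at least one year in every decade of the range."""
    # Decade of a year is year // 10, e.g. 1965 -> 196.
    decades = df["Year"] // 10
    # Drop rows whose decade falls outside the range (bounds are inclusive).
    in_range = df[decades.between(first_decade, last_decade)]
    # Count the distinct decades seen for each name.
    distinct = (in_range["Year"] // 10).groupby(in_range["Name"]).nunique()
    required = last_decade - first_decade + 1
    return sorted(distinct[distinct == required].index)


# Shortened sample (an assumption for illustration, not the full years.csv).
sample = io.StringIO(
    "Name,Year\n"
    "a,1961\na,1977\na,1980\na,1995\na,2001\na,2011\na,2020\n"
    "c,1965\nc,1970\nc,1993\nc,2007\nc,2015\nc,2020\n"
)
result = names_covering_all_decades(pd.read_csv(sample))
print(result)
```

With this sample only a is returned, because c has no year in the 1980s.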

CodePudding user response:

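You can use the DataFrame.query method to filter rows by year, for example:

dataset.query("year >= 1961", inplace=True)
print(dataset)  # dataset now holds only the rows whose year is 1961 or later

Note that this only filters rows by year. It does not by itself check that a name has a year in every decade.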
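
To illustrate how query behaves, here is a small sketch that filters without inplace and refers to a Python variable with the @ prefix. The DataFrame contents and the name start_year are made up for the example. The lowercase year column follows the header in the question.

```python
import pandas as pd

# Small made-up frame; not the data from the question.
dataset = pd.DataFrame(
    {"Name": ["a", "a", "b", "b"], "year": [1955, 1961, 1965, 1959]}
)

start_year = 1961  # hypothetical cutoff

# query returns a new DataFrame unless inplace=True is passed;
# the @ prefix refers to a local Python variable.
recent = dataset.query("year >= @start_year")
print(recent)
```

Here recent holds only the rows from 1961 onward, while dataset itself is left unchanged.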
