I am trying to calculate how long each store has been open in years. Here is an example of the dataset:
| year | store name |
|---|---|
| 2000 | Store A |
| 2001 | Store A |
| 2002 | Store A |
| 2003 | Store A |
| 2000 | Store B |
| 2001 | Store B |
| 2002 | Store B |
| 2000 | Store C |
I'm not sure how to calculate the difference between the max and min year for each store name, since all the years are in the same column. Should I put the result into a new column using pandas?
CodePudding user response:
You need to use a groupby on the store name, then subtract each group's minimum year from its maximum:
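# Select the year column within each store group
g = df.groupby('store name')['year']
# Series indexed by store name: last year minus first year
out = g.max() - g.min()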
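For reference, here is a self-contained sketch of the same idea that rebuilds the sample table from the question so it can be run as-is. The names `span` and `result` are just illustrative, and the "years open" column label is an assumption, not something the question specifies.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2000],
    "store name": ["Store A"] * 4 + ["Store B"] * 3 + ["Store C"],
})

# Per-store difference between the latest and earliest year
g = df.groupby("store name")["year"]
span = g.max() - g.min()  # Series indexed by store name

# Turn the Series into a two-column DataFrame (one row per store)
result = span.reset_index(name="years open")
print(result)
```

With this data the values are 3 for Store A, 2 for Store B, and 0 for Store C. A store with a single year comes out as 0, so if every listed year should count as a year open, you would probably want to add 1 to the result.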
CodePudding user response:
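You can use groupby and transform to create an additional column in the same dataframe. Unlike a plain aggregation, transform returns one value per original row, so every row of a store gets that store's result.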
df["years open"] = df.groupby("store name")["year"].transform(lambda x: x.max()-x.min())
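If you would rather avoid the lambda, a possible variant is to call transform with the built-in aggregation names "max" and "min" and subtract the two results. This is only a sketch on the sample data from the question; the variable name `years` is illustrative, and on large frames the string form is usually faster than a Python lambda.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2000],
    "store name": ["Store A"] * 4 + ["Store B"] * 3 + ["Store C"],
})

years = df.groupby("store name")["year"]

# transform broadcasts each group's max/min back onto every row of that group,
# so the subtraction lines up with the original index
df["years open"] = years.transform("max") - years.transform("min")
print(df)
```

Each Store A row gets 3, each Store B row gets 2, and the single Store C row gets 0.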
