I have a large pandas dataframe with varying rows and columns but looks more or less like:
time id angle ...
0.0 a1 33.67 ...
0.0 b2 35.90 ...
0.0 c3 42.01 ...
0.0 d4 45.00 ...
0.1 a1 12.15 ...
0.1 b2 15.35 ...
0.1 c3 33.12 ...
0.2 a1 65.28 ...
0.2 c3 87.43 ...
0.3 a1 98.85 ...
0.3 c3 100.12 ...
0.4 a1 11.11 ...
0.4 c3 83.22 ...
...
I am trying to aggregate the id's together and then find id's that have in common time-intervals. I have tried using pandas groupby and can easily group them by id and get their respective groups with information. How can I then take it a step further to find id's that also have the same time stamps?
Ideally I'd like to return intersection of certain fixed time intervals (2-3 seconds) for similar id's with the fixed time interval overlap:
time id angle ...
0.0 a1 33.67 ...
0.1 a1 12.15 ...
0.2 a1 65.28 ...
0.3 a1 98.85 ...
0.0 c3 42.01 ...
0.1 c3 33.12 ...
0.2 c3 87.43 ...
0.3 c3 100.12 ...
Code tried so far:
#create pandas grouped by id
df1 = df.groupby(['id'], as_index=False)
Which outputs:
time id angle ...
(0.0 a1 33.67
...
0.4 a1 11.11)
(0.0 b2 35.90
0.1 b2 15.35)
(0.0 c3 42.01
...
0.4 c3 83.22)
(0.0 d4 45.00)
But I'd like to return only a dataframe where id and time are the same for a fixed interval, in the above example .4 seconds.
Any ideas on a fairly simple way to achieve this with pandas dataframes?
CodePudding user response:
If need filter rows by some intervals - e.g. here between 0 and 0.4 and get all id which overlap use boolean indexing with Series.between first, then DataFrame.pivot:
df1 = df[df['time'].between(0, 0.4)].pivot('time','id','angle')
print (df1)
id a1 b2 c3 d4
time
0.0 33.67 35.90 42.01 45.0
0.1 12.15 15.35 33.12 NaN
0.2 65.28 NaN 87.43 NaN
0.3 98.85 NaN 100.12 NaN
0.4 11.11 NaN 83.22 NaN
There are missing values for non overlap id, so remove columns with any NaNs by DataFrame.any and reshape to 3 columns by DataFrame.unstack and Series.reset_index:
print (df1.dropna(axis=1))
id a1 c3
time
0.0 33.67 42.01
0.1 12.15 33.12
0.2 65.28 87.43
0.3 98.85 100.12
0.4 11.11 83.22
df2 = df1.dropna(axis=1).unstack().reset_index(name='angle')
print (df2)
id time angle
0 a1 0.0 33.67
1 a1 0.1 12.15
2 a1 0.2 65.28
3 a1 0.3 98.85
4 a1 0.4 11.11
5 c3 0.0 42.01
6 c3 0.1 33.12
7 c3 0.2 87.43
8 c3 0.3 100.12
9 c3 0.4 83.22
CodePudding user response:
There are many ways to define the filter you're asking for:
df.groupby('id').filter(lambda x: len(x) > 4)
# OR
df.groupby('id').filter(lambda x: x['time'].eq(0.4).any())
# OR
df.groupby('id').filter(lambda x: x['time'].max() == 0.4)
Output:
time id angle
0 0.0 a1 33.67
2 0.0 c3 42.01
4 0.1 a1 12.15
6 0.1 c3 33.12
7 0.2 a1 65.28
8 0.2 c3 87.43
9 0.3 a1 98.85
10 0.3 c3 100.12
11 0.4 a1 11.11
12 0.4 c3 83.22
