i have a dataframe, namely data, with a datetime index and the below columns :
id activity x y z
datetime
1970-01-01 00:42:00.219142823 1623 A -0.152512 -8.585220 -1.219192
1970-01-01 00:42:00.269496827 1623 A 0.999466 -8.196548 -0.758926
1970-01-01 00:42:00.319850830 1623 A 0.450241 -8.701187 -1.290024
1970-01-01 00:42:00.370204834 1623 A -0.042175 -9.739563 -1.787415
1970-01-01 00:42:00.420558838 1623 A 3.551483 -10.745132 -1.266403
... ... ... ... ... ...
1970-01-22 01:26:29.872699000 1644 A 2.239343 -8.408914 2.074087
1970-01-22 01:26:29.892898000 1644 A 2.548301 -8.157437 1.820215
1970-01-22 01:26:29.912994000 1644 A 2.636917 -7.786209 2.057322
1970-01-22 01:26:29.933195000 1644 A 2.545906 -7.743098 1.801055
1970-01-22 01:26:29.953291000 1644 A 2.373464 -8.071217 1.585503
279817 rows × 5 columns
every 119 rows i want to extract only the values of x, y, z columns, as well with the activity label, and put them in a new dataframe row by row. the values of each column followed up by the next column. like below :
values of column x|values of column y|values of column z|activity
next row after 119 rows of values of the dataframe data
values of column x|values of column y|values of column z|activity
etc
any ideas would be very helpful and much appreciated.
Thanks in advance for your time!
CodePudding user response:
If your data frame is called data, then you can use
data.iloc[::119, [data.columns.get_loc(col) for col in ['x', 'y', 'z', 'activity']]]
CodePudding user response:
Edit: after understanding that the question is really about getting data every 5 seconds, we can say so more directly:
wanted = ['x', 'y', 'z', 'activity']
newdf = df.resample('5s', origin='start').first()[wanted].copy()
Note: look at the various possibilities for the origin argument. Using 'start' means we start with the first index in df. But usually it is more natural to used the default ('start_day') which makes "round datetimes".
Original answer:
newdf = df.iloc[::119][['x', 'y', 'z', 'activity']].copy()
Example (reproducible setup):
import numpy as np
np.random.seed(0)
n = 279817
df = pd.DataFrame({
'id': np.linspace(1623, 1644, n).round().astype(int),
'activity': ['A'] * n,
}, index=pd.date_range(start='1970-01-01 00:42:00', end='1970-01-22 01:26:30', periods=n))
df[list('xyz')] = np.random.normal(size=(n, 3))
# code above, then
>>> newdf
x y z activity
1970-01-01 00:42:00.000000000 1.764052 0.400157 0.978738 A
1970-01-01 00:54:52.762565400 -0.477974 -0.479656 0.620358 A
1970-01-01 01:07:45.525130800 1.327783 -0.101281 -0.803141 A
1970-01-01 01:20:38.287696200 -1.429991 -0.061638 -1.432735 A
1970-01-01 01:33:31.050261600 -1.109478 -0.547518 0.665967 A
... ... ... ... ...
1970-01-22 00:29:53.740994081 -0.267150 0.414945 -0.627917 A
1970-01-22 00:42:46.503559481 -0.001320 0.743854 0.137284 A
1970-01-22 00:55:39.266124882 -0.469361 -1.204360 -1.688222 A
1970-01-22 01:08:32.028690282 0.340768 -0.911023 0.069327 A
1970-01-22 01:21:24.791255682 -1.461045 1.938692 -0.188877 A
[2352 rows x 4 columns]
