I've been working on a pandas DataFrame,
df = pd.DataFrame({'col':[-0.217514, -0.217834, 0.844116, 0.800125, 0.824554]}, index=[49082, 49083, 49853, 49854, 49855])
and I get data that looks like this:
As you can see, the index suddenly jumps 770 values (due to a sorting I did earlier).
Now I would like to split this DataFrame into many different ones, where each one would be made of the rows whose index follow each other only (here the first 2 rows would be in the same DataFrame while the last three would be in a different one).
Does anyone have an idea as to how to do this?
Thanks!
CodePudding user response:
You can do it in two steps:
(i) Find where the cutoff points are by using np.diff and np.where. Use np.diff to find the the difference between index values (append np.nan so that we don't leave out the last index) and identify the locations in the index where it is not consecutive using np.where.
(ii) Slice the DataFrame using the index cutoffs found in (i)
end_of_consecutive_indices = np.where(np.diff(df.index, append=np.nan) !=1)[0] 1
df_list = [df.loc[df.index[:i]] for i in end_of_consecutive_indices]
Output:
[ col
49082 -0.217514
49083 -0.217834,
col
49082 -0.217514
49083 -0.217834
49853 0.844116
49854 0.800125
49855 0.824554]
CodePudding user response:
Use groupby on the index from which we subtract an increasing by 1 sequence, then stick each group as a separate df in the list
all_dfs = [g for _,g in df.groupby(df.index - np.arange(len(df.index)))]
all_dfs
output:
[ col
49082 -0.217514
49083 -0.217834,
col
49853 0.844116
49854 0.800125
49855 0.824554]

