Pandas split DataFrame according to indices-CodePudding

I've been working on a pandas DataFrame,

df = pd.DataFrame({'col':[-0.217514, -0.217834, 0.844116, 0.800125, 0.824554]}, index=[49082, 49083, 49853, 49854, 49855])

and I get data that looks like this:

As you can see, the index suddenly jumps 770 values (due to a sorting I did earlier).

Now I would like to split this DataFrame into many different ones, where each one would be made of the rows whose index follow each other only (here the first 2 rows would be in the same DataFrame while the last three would be in a different one).

Does anyone have an idea as to how to do this?

Thanks!

CodePudding user response：

You can do it in two steps:

(i) Find where the cutoff points are by using np.diff and np.where. Use np.diff to find the the difference between index values (append np.nan so that we don't leave out the last index) and identify the locations in the index where it is not consecutive using np.where.

(ii) Slice the DataFrame using the index cutoffs found in (i)

end_of_consecutive_indices = np.where(np.diff(df.index, append=np.nan) !=1)[0]   1
df_list = [df.loc[df.index[:i]] for i in end_of_consecutive_indices]

Output:

[            col
 49082 -0.217514
 49083 -0.217834,

             col
 49082 -0.217514
 49083 -0.217834
 49853  0.844116
 49854  0.800125
 49855  0.824554]

CodePudding user response：

Use groupby on the index from which we subtract an increasing by 1 sequence, then stick each group as a separate df in the list

all_dfs = [g for _,g in df.groupby(df.index - np.arange(len(df.index)))]
all_dfs

output:

[            col
 49082 -0.217514
 49083 -0.217834,

             col
 49853  0.844116
 49854  0.800125
 49855  0.824554]