How to modify a series to match indices of Pandas dataframe?

Time:01-10

Consider a series y (dtype float64) with its own index, e.g.:

y = pd.Series((6.0, 1621.0, 4.6, 1479.9, 1520.0), index=(3608, 3652, 510, 941, 3007))

that looks like:

3608       6.000
3652    1621.000
510        4.600
941     1479.900
3007    1520.000
          ...   
dtype: float64 (length: 554)

There is also a Pandas dataframe X with its own index and multiple columns, such as:

X = pd.DataFrame({'Col1':[1,2,3], 'Col2':[1,2,3]}, index=[510,3007,3652])

which looks like:

      Col1  Col2
510      1     1
3007     2     2
3652     3     3
...  (dataframe length is 7)

I would like to modify the series y to obtain a new series that is ordered by the dataframe's index and has the same number of samples as the dataframe (i.e. 7 of y's indices match those of X). The expected y is:

510        4.600
3007    1520.000
3652    1621.000
          ...   
dtype: float64 (length: 7)

Any help and suggestions on this would be much appreciated.

Answer:

You can use the Index.intersection method:

out = y[y.index.intersection(X.index)]

or the Index.isin method:

out = y[y.index.isin(X.index)]

to keep only the entries of y whose index labels also exist in X.index.
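A quick way to check the two filters above is to run them on the sample data from the question. This is a minimal sketch assuming a recent pandas version; y is truncated to its five listed values and lists are used in place of tuples. Note that both results follow y's original order rather than X's:

```python
import pandas as pd

# Sample data from the question (y truncated to its five listed values)
y = pd.Series([6.0, 1621.0, 4.6, 1479.9, 1520.0], index=[3608, 3652, 510, 941, 3007])
X = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [1, 2, 3]}, index=[510, 3007, 3652])

# Both approaches keep only the labels shared with X.index,
# but the rows stay in y's original order, not X's
out_intersection = y[y.index.intersection(X.index)]
out_isin = y[y.index.isin(X.index)]

print(out_intersection.index.tolist())  # [3652, 510, 3007]
print(out_isin.index.tolist())          # [3652, 510, 3007]
```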

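If X.index is guaranteed to be a subset of y.index, you can also select with X.index directly. Unlike the intersection/isin approaches, this returns the values in X's index order, and it raises a KeyError if any label of X.index is missing from y:

out = y.loc[X.index]

Output of the intersection/isin approaches (these keep y's original order):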

3652    1621.0
510        4.6
3007    1520.0
dtype: float64
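If X's order matters (as in the expected output) and X.index may contain labels that y lacks, the selection can be driven from X's side instead. The sketch below adds a hypothetical label 9999 to X to show the difference. It calls Index.intersection on X.index, and also shows Series.reindex as an alternative (not part of the answer above) that keeps missing labels as NaN instead of dropping them:

```python
import pandas as pd

y = pd.Series([6.0, 1621.0, 4.6, 1479.9, 1520.0], index=[3608, 3652, 510, 941, 3007])
# 9999 is a hypothetical label that does not exist in y
X = pd.DataFrame({'Col1': [1, 2, 3, 4], 'Col2': [1, 2, 3, 4]}, index=[510, 3007, 3652, 9999])

# Calling intersection on X.index keeps X's order and drops labels absent from y
out_x_order = y.loc[X.index.intersection(y.index)]

# reindex also follows X's order, but fills labels missing from y with NaN
out_reindexed = y.reindex(X.index)

print(out_x_order)
print(out_reindexed)
```

Which one fits depends on whether the result must have exactly as many rows as X (reindex) or only the shared labels (intersection).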

Answer:

As per the question, since the series y is unnamed and cannot be matched to a dataframe column name directly, the following worked:

Convert the series y to a dataframe with to_frame() and join it to X with X.merge(), as suggested by @Chris (thanks!) in a comment on the question. Passing left_index=True and right_index=True makes the match use the indices of both frames, which gives the modified y:

modified_y = X.merge(y.to_frame(), left_index=True, right_index=True)
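To see what this merge produces, here is a minimal sketch on the question's sample data (y truncated to its five listed values). It assumes y is unnamed, so to_frame() labels its single column 0:

```python
import pandas as pd

# Sample data from the question (y truncated to its five listed values)
y = pd.Series([6.0, 1621.0, 4.6, 1479.9, 1520.0], index=[3608, 3652, 510, 941, 3007])
X = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [1, 2, 3]}, index=[510, 3007, 3652])

# An unnamed Series becomes a one-column frame whose column label is 0
y_frame = y.to_frame()

# Inner join on the indices of both frames: labels 3608 and 941 are dropped
# because they do not appear in X.index
modified_y = X.merge(y_frame, left_index=True, right_index=True)

print(modified_y.columns.tolist())  # ['Col1', 'Col2', 0]
print(modified_y)
```

Because X's columns come first, y's values end up in the last column of the merged frame, not the first.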

The result is a dataframe that contains X's columns followed by y's values as the last column, so it can be converted back to a series by selecting that column:

modified_y = modified_y.iloc[:, -1]
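Putting the two steps together, a sketch of the whole merge-based route on the question's sample data. The final comparison against direct label selection with .loc is an addition for checking, not part of the original answer:

```python
import pandas as pd

y = pd.Series([6.0, 1621.0, 4.6, 1479.9, 1520.0], index=[3608, 3652, 510, 941, 3007])
X = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [1, 2, 3]}, index=[510, 3007, 3652])

# Merge on the indices of both frames
merged = X.merge(y.to_frame(), left_index=True, right_index=True)

# y's values sit in the last column; selecting it returns a Series
# that keeps the merged index
modified_y = merged.iloc[:, -1].copy()
modified_y.name = y.name  # replace the column label 0 with y's original name (None here)

# For comparison: direct label-based selection; sorting makes the check order-independent
direct = y.loc[X.index]
print(modified_y.sort_index().equals(direct.sort_index()))  # True
```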

There may well be easier alternatives to this, but this is how it worked for me.
