Consider a series y (dtype float64) with its own index, e.g.
y = pd.Series((6.0, 1621.0, 4.6, 1479.9, 1520.0), index=(3608, 3652, 510, 941, 3007))
that looks like:
3608 6.000
3652 1621.000
510 4.600
941 1479.900
3007 1520.000
...
dtype: float64 (length: 554)
There is also a pandas DataFrame X with its own index and multiple columns, such as:
X = pd.DataFrame({'Col1':[1,2,3], 'Col2':[1,2,3]}, index=[510,3007,3652])
which looks like:
      Col1  Col2
510      1     1
3007     2     2
3652     3     3
... (dataframe length/count is 7)
I would like to modify the series y to obtain a new series that is ordered by the dataframe's index and has the same number of samples as the dataframe (i.e. 7 index labels of y match those of X). Expected y is:
510 4.600
3007 1520.000
3652 1621.000
...
dtype: float64 (length: 7)
Any help and suggestions on this would be much appreciated.
CodePudding user response:
You can use Index.intersection method:
out = y[y.index.intersection(X.index)]
or Index.isin method:
out = y[y.index.isin(X.index)]
to filter y for index labels that also exist in X.index.
If X.index is guaranteed to be a subset of y.index, then you can simply index with X.index as well, which also returns the values in the order of X.index:
out = y[X.index]
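Output for the sample data (from the intersection and isin approaches, which keep the order of y; indexing with X.index returns the same rows in the order 510, 3007, 3652):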
3652 1621.0
510 4.6
3007 1520.0
dtype: float64
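A minimal, self-contained sketch of the three approaches on the sample data from the question, mainly to show which row order each one gives. The names by_intersection, by_isin and by_x_index are only illustrative, and .loc is used here in place of plain [] to make the label-based lookup explicit:

```python
import pandas as pd

y = pd.Series((6.0, 1621.0, 4.6, 1479.9, 1520.0), index=(3608, 3652, 510, 941, 3007))
X = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [1, 2, 3]}, index=[510, 3007, 3652])

# Keep only the labels of y that also occur in X.index.
by_intersection = y[y.index.intersection(X.index)]

# Boolean mask over y, so the surviving rows stay in y's original order.
by_isin = y[y.index.isin(X.index)]

# Look the labels up in X's order; raises KeyError if a label of X is missing from y.
by_x_index = y.loc[X.index]

print(by_isin.index.tolist())     # order of y
print(by_x_index.index.tolist())  # order of X
```

If the result has to follow the order of X (as in the question), the last form is the one to use; the first two are safer when X may contain labels that y does not have.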
CodePudding user response:
Since the series y in the question is unnamed and cannot be matched to a dataframe column name directly, the following worked:
Convert the series y to a dataframe with to_frame() and use X.merge(), as suggested by @Chris (thanks!) in a comment on the question, with left_index=True and right_index=True so that the match is performed on the two indices. This gives the modified y:
modified_y = X.merge(y.to_frame(), left_index=True, right_index=True)
The result is a dataframe containing the columns of X followed by the values of y in the last column, so it can be converted back to a series with:
modified_y = pd.Series(modified_y.iloc[:, -1].values, index=modified_y.index)
There may well be easier alternatives to this, but this is how it worked for me.
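For reference, the merge round trip can be sketched end to end on the sample data as follows. This is only one way to write it: the column name 'y' passed to to_frame() and the variable name merged are illustrative assumptions, not something from the question.

```python
import pandas as pd

y = pd.Series((6.0, 1621.0, 4.6, 1479.9, 1520.0), index=(3608, 3652, 510, 941, 3007))
X = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [1, 2, 3]}, index=[510, 3007, 3652])

# Naming the series ('y' is an arbitrary, illustrative name) gives the merged
# column a predictable label instead of the default 0.
merged = X.merge(y.to_frame(name='y'), left_index=True, right_index=True)

# Selecting the column by name already returns a Series indexed like the merged frame.
modified_y = merged['y']

print(modified_y)
```

Selecting the column by name avoids relying on its position in the merged frame.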
