python - calculate a new column in pandas using 2 numpy arrays

Time:01-18

I have a pandas DataFrame with a column called "Location". Example contents: "London Arndale Centre", "Manchester Arndale", "Birmingham Central Station", "Newcastle Metro Centre".

I also have two NumPy arrays:

originalLocation = np.array(["London Arndale Centre","Manchester Arndale","Birmingham Central Station","Newcastle Metro Centre")

newLocation = np.array(["London","Manchester","Birmingham","Newcastle"]

I want to create a new column in the DataFrame called newLocation.

For each row, the value should be the element of newLocation at the position where originalLocation matches that row's Location.

For example, "London Arndale Centre" should become "London" and "Manchester Arndale" should become "Manchester".

I have tried this, but it throws an error:

df['newLocation'] = newLocation[int(np.where(originalLocation == df['Location'])[0])]

Error: ValueError: ('Lengths must match to compare', (159,), (12,))

What am I doing wrong here?

Answer:

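It seems like your array definitions have typos (check the commas and the closing brackets). Also, drop the int() call: np.where(...)[0] is an array of indices, and int() only works on a single-element array. Updated code: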

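import numpy as np
import pandas as pd

# Sample DataFrame with the four locations from the question
df_data = ["London Arndale Centre", "Manchester Arndale", "Birmingham Central Station", "Newcastle Metro Centre"]
df = pd.DataFrame(df_data, columns=['Location'])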

originalLocation = np.array(["London Arndale Centre", "Manchester Arndale", "Birmingham Central Station", "Newcastle Metro Centre"])

newLocation = np.array(["London", "Manchester", "Birmingham", "Newcastle"])

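# Element-wise comparison: only valid when df has the same length and row order as originalLocation
df['newLocation'] = newLocation[np.where(originalLocation == df['Location'])[0]]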

df

Output:

                     Location newLocation
0       London Arndale Centre      London
1          Manchester Arndale  Manchester
2  Birmingham Central Station  Birmingham
3      Newcastle Metro Centre   Newcastle
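Note that originalLocation == df['Location'] compares position by position, so the code above only works when df has exactly as many rows as originalLocation, in the same order. That is where the ValueError in the question comes from: the column has 159 rows while the array has 12 entries. A minimal sketch of an order-independent alternative (a swapped-in technique, not the answer's original approach), assuming every Location value is spelled exactly as in originalLocation, builds a lookup dictionary from the two arrays and applies it with Series.map. The five-row DataFrame below is made up for illustration.

```python
import numpy as np
import pandas as pd

originalLocation = np.array(["London Arndale Centre", "Manchester Arndale",
                             "Birmingham Central Station", "Newcastle Metro Centre"])
newLocation = np.array(["London", "Manchester", "Birmingham", "Newcastle"])

# Illustrative DataFrame: more rows than the arrays, in a different order,
# with a repeated location (like the 159-row column in the question)
df = pd.DataFrame({"Location": ["Newcastle Metro Centre", "London Arndale Centre",
                                "London Arndale Centre", "Manchester Arndale",
                                "Birmingham Central Station"]})

# Pair each original name with its replacement by position in the two arrays
lookup = dict(zip(originalLocation, newLocation))

# Look up every row independently; names missing from originalLocation become NaN
df["newLocation"] = df["Location"].map(lookup)

print(df)
```

If you would rather keep the original name when there is no match, append .fillna(df["Location"]) to the map call.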