I have a dataframe called test_df (below) and I am trying to search for the ID based on two criteria: the Factor and the name.
Here's my code:
test_id = test_df.loc[test_df['Name'].str.contains(test_name, case=False) & test_df['Factor'].str.contains(test_factor, case=False), 'ID'].item()
But I get the following error:
Traceback (most recent call last):
File "C:/Users/v3.py", line 508, in <module>
test_id = test_df.loc[test_df['Name'].str.contains(test_name, case=False) & test_df['Factor'].str.contains(test_factor, case=False), 'ID'].item()
File "C:\Users\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\base.py", line 331, in item
raise ValueError("can only convert an array of size 1 to a Python scalar")
ValueError: can only convert an array of size 1 to a Python scalar
I thought the issue was the Factor column being a float format but I have converted it to a string with the same result.
Can anyone see what the problem is?
| FIELD1 | ID | Factor | Status | Speed | Name |
|---|---|---|---|---|---|
| 0 | 49513622 | -4 | ACTIVE | II | |
| 1 | 11193741 | 4 | ACTIVE | AP | |
| 2 | 49513622 | -3.75 | ACTIVE | II | |
| 3 | 11193741 | 3.75 | ACTIVE | AP | |
| 4 | 49513622 | -3.5 | ACTIVE | II | |
| 5 | 11193741 | 3.5 | ACTIVE | AP | |
| 6 | 49513622 | -3.25 | ACTIVE | II | |
| 7 | 11193741 | 3.25 | ACTIVE | AP | |
| 8 | 49513622 | -3 | ACTIVE | II | |
| 9 | 11193741 | 3 | ACTIVE | AP | |
| 10 | 49513622 | -2.75 | ACTIVE | II | |
| 11 | 11193741 | 2.75 | ACTIVE | AP | |
| 12 | 49513622 | -2.5 | ACTIVE | II | |
| 13 | 11193741 | 2.5 | ACTIVE | AP | |
| 14 | 49513622 | -2.25 | ACTIVE | II | |
| 15 | 11193741 | 2.25 | ACTIVE | AP | |
| 16 | 49513622 | -2 | ACTIVE | II | |
| 17 | 11193741 | 2 | ACTIVE | AP | |
| 18 | 49513622 | -1.75 | ACTIVE | II | |
| 19 | 11193741 | 1.75 | ACTIVE | AP | |
| 20 | 49513622 | -1.5 | ACTIVE | II | |
| 21 | 11193741 | 1.5 | ACTIVE | AP | |
| 22 | 49513622 | -1.25 | ACTIVE | II | |
| 23 | 11193741 | 1.25 | ACTIVE | AP | |
| 24 | 49513622 | -1 | ACTIVE | II | |
| 25 | 11193741 | 1 | ACTIVE | AP | |
| 26 | 49513622 | -0.75 | ACTIVE | II | |
| 27 | 11193741 | 0.75 | ACTIVE | 1.02 | AP |
| 28 | 49513622 | -0.5 | ACTIVE | II | |
| 29 | 11193741 | 0.5 | ACTIVE | AP | |
| 30 | 49513622 | -0.25 | ACTIVE | II | |
| 31 | 11193741 | 0.25 | ACTIVE | AP | |
| 32 | 49513622 | 0 | ACTIVE | 2.68 | II |
| 33 | 11193741 | 0 | ACTIVE | 1.03 | AP |
| 34 | 49513622 | 0.25 | ACTIVE | II | |
| 35 | 11193741 | -0.25 | ACTIVE | 1.99 | AP |
| 36 | 49513622 | 0.5 | ACTIVE | II | |
| 37 | 11193741 | -0.5 | ACTIVE | 2.3 | AP |
| 38 | 49513622 | 0.75 | ACTIVE | II | |
| 39 | 11193741 | -0.75 | ACTIVE | AP | |
| 40 | 49513622 | 1 | ACTIVE | II | |
| 41 | 11193741 | -1 | ACTIVE | AP | |
| 42 | 49513622 | 1.25 | ACTIVE | II | |
| 43 | 11193741 | -1.25 | ACTIVE | AP | |
| 44 | 49513622 | 1.5 | ACTIVE | II | |
| 45 | 11193741 | -1.5 | ACTIVE | AP | |
| 46 | 49513622 | 1.75 | ACTIVE | II | |
| 47 | 11193741 | -1.75 | ACTIVE | AP | |
| 48 | 49513622 | 2 | ACTIVE | II | |
| 49 | 11193741 | -2 | ACTIVE | AP | |
| 50 | 49513622 | 2.25 | ACTIVE | II | |
| 51 | 11193741 | -2.25 | ACTIVE | AP | |
| 52 | 49513622 | 2.5 | ACTIVE | II | |
| 53 | 11193741 | -2.5 | ACTIVE | AP | |
| 54 | 49513622 | 2.75 | ACTIVE | II | |
| 55 | 11193741 | -2.75 | ACTIVE | AP | |
| 56 | 49513622 | 3 | ACTIVE | II | |
| 57 | 11193741 | -3 | ACTIVE | AP | |
| 58 | 49513622 | 3.25 | ACTIVE | II | |
| 59 | 11193741 | -3.25 | ACTIVE | AP | |
| 60 | 49513622 | 3.5 | ACTIVE | II | |
| 61 | 11193741 | -3.5 | ACTIVE | AP | |
| 62 | 49513622 | 3.75 | ACTIVE | II | |
| 63 | 11193741 | -3.75 | ACTIVE | AP | |
| 64 | 49513622 | 4 | ACTIVE | II | |
| 65 | 11193741 | -4 | ACTIVE | AP |
CodePudding user response:
Problem is no match any value, so DataFrame.loc return empty Series.
Possible solution is use next iter for assign default value if no match:
mask = test_df['Name'].str.contains(test_name, case=False) &
test_df['Factor'].str.contains(test_factor, case=False)
#if no match assign 'no match'
test_id = next(iter(test_df.loc[mask, 'ID']), 'no match')
#if no match assign None
test_id = next(iter(test_df.loc[mask, 'ID']), None)
Or use if-else with test if at least one value match:
test_id = test_df.loc[mask, 'ID'].item() if m.any() else 'no match'
Or:
if m.any():
test_id = test_df.loc[mask, 'ID'].item()
EDIT: For testing si possible create helper columns:
m1 = test_df['Name'].str.contains(test_name, case=False)
m2 = test_df['Factor'].str.contains(test_factor, case=False)
test_df = test_df.assign(name_mask = m1, factor_mask = m1, both = m1 & m2)
