Hi folks!
I'm stuck while developing a dashboard using Pandas. This is the scenario:
I'm importing and transforming a CSV file in order to get some insights about a team I am working with.
|ID |Area Path |
|--------|--------------------------------------|
| 544 | [Level 1, Level 2, Level 3] |
| 545 | [Level 1, Level 2] |
| 546 | [Level 1] |
| 547 | [Level 1, Level 2, Level 3, Level 4] |
As you can see, the lists in the Area Path column vary in length: a row can hold 1, 2, 3 or 4 items.
I'm having trouble accessing each row of this column to get the information I need. If the list has only one item, I must use the [0] position; if the list has 2 or more items, I must use the [1] position.
I've tried to do different things and this one below is my last attempt:
def Extract(lst):
    if dados['Area Path'].str.len() == 1:
        return [item[0] for item in dados['Area Path']]
    elif dados['Area Path'].str.len() == 2:
        return [item[-1] for item in dados['Area Path']]
    elif dados['Area Path'].str.len() == 3:
        return [item[1] for item in dados['Area Path']]
    elif dados['Area Path'].str.len() == 4:
        return [item[1] for item in dados['Area Path']]

lst = [dados['Area Path']]
indice_novo = Extract(lst)
dados['Team'] = indice_novo
The problem is that I'm not able to iterate over the lists stored in the column. The output of .str.len() is useful, but it is not enough on its own to pick the right item for each row.
Can you help me to solve this problem?
Thanks, Marcelo
CodePudding user response:
Here is a solution using map(), which applies the rule to each row's list:
df['Area Path'].map(lambda x: x[0] if len(x) == 1 else x[1])
Output:
0 Level 2
1 Level 2
2 Level 1
3 Level 2
Name: Area Path, dtype: object
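To store the result as a new column, as in the question, something along these lines should work. This is a minimal sketch, assuming the column really holds Python lists (not strings that merely look like lists). The sample frame is rebuilt from the table in the question, and it uses the question's name dados rather than df.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (an assumption about
# how the real frame looks after the CSV has been transformed).
dados = pd.DataFrame({
    "ID": [544, 545, 546, 547],
    "Area Path": [
        ["Level 1", "Level 2", "Level 3"],
        ["Level 1", "Level 2"],
        ["Level 1"],
        ["Level 1", "Level 2", "Level 3", "Level 4"],
    ],
})

# Single-item lists use position 0, longer lists use position 1.
dados["Team"] = dados["Area Path"].map(lambda x: x[0] if len(x) == 1 else x[1])

print(dados[["ID", "Team"]])
```

Like the one-liner above, this raises an error on an empty list or a missing value, since len() and indexing are applied to every row as-is.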
CodePudding user response:
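Based on your comment, the Area Path column contains lists. If so, you are accessing the column incorrectly: wrapping it as [dados['Area Path']] gives a one-element list that holds the whole Series, not the lists of each row. The correct way to get the lists out of the column would be: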
lst = dados['Area Path'].tolist()
This will populate the lst variable with a list of lists, which looks something like:
[['Level 1', 'Level 2', 'Level 3'], ['Level 1', 'Level 2'], ['Level 1'], ...]
Then, in your Extract() function, you can perform your filtering based on the required logic:
def Extract(list_of_lists):
    new_list = []
    for lst in list_of_lists:
        # Will fail if 'Area Path' contains None, NaN values
        if len(lst) == 1:
            new_list.append(lst[0])
        else:
            new_list.append(lst[1])
    return new_list

indice_novo = Extract(lst)
dados['Team'] = indice_novo
This answer is based on your code and may not be the most optimized way to do this.
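If the explicit loop feels heavy, a vectorized variant is possible with the .str accessor, which can also index into lists. Treat this as a sketch under assumptions: the column holds Python lists, and the None row below is added purely to illustrate the missing-value case (it is not in the original data).

```python
import pandas as pd

# Hypothetical sample: the four rows from the question plus one missing value.
dados = pd.DataFrame({
    "Area Path": [
        ["Level 1", "Level 2", "Level 3"],
        ["Level 1", "Level 2"],
        ["Level 1"],
        ["Level 1", "Level 2", "Level 3", "Level 4"],
        None,
    ],
})

paths = dados["Area Path"]

# .str[1] gives NaN where a list has no second item (or the value is missing),
# so those rows fall back to the first item via fillna.
dados["Team"] = paths.str[1].fillna(paths.str[0])

print(dados["Team"])
```

With this approach a missing Area Path ends up as NaN in Team instead of raising, which may or may not be what you want for the dashboard.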
