I've looked around and found similar questions, but none of them really helped me find a solution. I want my script to read a CSV which looks like this:
hot_dict = {'Links': links, 'Titles': titles, 'Datestamps': datestamp_extended, 'GroupID': ""}
I want to find all duplicate links in the "Links" column and assign every set of identical links the same key in the "GroupID" column:
| Links | GroupID |
|---|---|
| A | Key1 |
| B | Key2 |
| A | Key1 |
| B | Key2 |
Obviously, this just gives me True and False values:
df['GroupID'] = df.duplicated(subset=['Links'], keep=False)
Is there an elegant way to continue from here?
Thanks a lot!
CodePudding user response:
For a simple key with an integer ID, you can first convert the Links column to categorical data, then just obtain the category code from that:
df['GroupID'] = df['Links'].astype('category').cat.codes
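To make this concrete, here is a minimal sketch on made-up data (the sample links and the extra "GroupKey" column are assumptions for illustration, not part of the question). It shows the category-code approach next to `pd.factorize`, which is one way to get string keys in the "Key1", "Key2" style from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"Links": ["A", "B", "A", "B", "C"]})

# Category codes: integers assigned in sorted order of the unique values.
df["GroupID"] = df["Links"].astype("category").cat.codes

# pd.factorize numbers the values in order of first appearance instead.
codes, uniques = pd.factorize(df["Links"])

# "GroupKey" is an illustrative column name; build "Key1", "Key2", ... labels.
df["GroupKey"] = ["Key" + str(code + 1) for code in codes]

print(df)
```

Both approaches give identical links the same ID. They differ only in numbering: category codes follow the sorted order of the values, while `pd.factorize` follows the order in which each value first appears. Missing values get the code -1 in both cases, so you may want to handle those separately.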
