I have a dataframe (df1) of phone records, roughly 10k rows long, containing calls from different phone numbers on the same day and from the same phone number on different days. Example of df1:
| Date | Number |
|---|---|
| 01/01/2022 | 1234567891 |
| 01/01/2022 | 1234567892 |
| 01/02/2022 | 1234567891 |
| 01/02/2022 | 1234567893 |
| 01/02/2022 | 1234567892 |
What I want to do is write a short script that groups the rows of df1 by unique phone number and creates a new dataframe for each unique phone number.
The kicker is that I will have to do this periodically, so df1 will fluctuate in length and content; simply sorting df1 and assigning rows 1-10 to df2 and rows 11-33 to df3 won't work.
So far I have only come up with a way to isolate each number manually, one at a time:
df2 = df1[df1['Number'].isin([1234567891])]
CodePudding user response:
You can extract all unique phone numbers from your dataframe into an array:
numbers = df['Number'].unique()
Now you can iterate over these values and extract the dataframe for each phone number. In this example I print each dataframe:
for number in numbers:
    print(df[df['Number'] == number])
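If you want to keep the per-number dataframes instead of printing them, one option is to collect them in a dictionary keyed by phone number. This is only a minimal sketch, assuming the sample data from the question; the name `frames_by_number` is made up for illustration:

```python
import pandas as pd

# Sample data copied from the question (dates kept as plain strings).
df1 = pd.DataFrame({
    'Date': ['01/01/2022', '01/01/2022', '01/02/2022', '01/02/2022', '01/02/2022'],
    'Number': [1234567891, 1234567892, 1234567891, 1234567893, 1234567892],
})

# One boolean-mask filter per unique number; each value is its own DataFrame.
# frames_by_number is a hypothetical name, not something from the question.
frames_by_number = {
    number: df1[df1['Number'] == number]
    for number in df1['Number'].unique()
}

print(frames_by_number[1234567891])
```

Because the keys come from `unique()`, the dictionary follows whatever numbers happen to be in df1 on a given run, so nothing depends on row positions.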
CodePudding user response:
Consider the following simple example, which makes use of .groupby:
import pandas as pd
df = pd.DataFrame({'user':['A','B','A','A','C'],'value':[5,4,3,2,1]})
grouped = df.groupby('user')
user_df = {}
for user in df.user.unique():
    user_df[user] = grouped.get_group(user)
Now user_df is a dict with 3 DataFrames, one for each user, so
print(user_df['A'])
gives output
  user  value
0    A      5
2    A      3
3    A      2
and
print(user_df['B'])
gives output
  user  value
1    B      4
and
print(user_df['C'])
gives output
  user  value
4    C      1
If you need to process one user per loop iteration, do
import pandas as pd
df = pd.DataFrame({'user':['A','B','A','A','C'],'value':[5,4,3,2,1]})
grouped = df.groupby('user')
for user in df.user.unique():
    user_df = grouped.get_group(user)  # user_df is now a pandas.DataFrame
    print(user, user_df['value'].min(), user_df['value'].max())
gives output
A 2 5
B 4 4
C 1 1
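The same idea can be applied to the phone data from the question. Iterating over a GroupBy object yields (key, sub-DataFrame) pairs, so the separate unique() and get_group() calls can be skipped. A rough sketch, assuming the question's sample rows; `frames` is a hypothetical name:

```python
import pandas as pd

# Sample data copied from the question (dates kept as plain strings).
df1 = pd.DataFrame({
    'Date': ['01/01/2022', '01/01/2022', '01/02/2022', '01/02/2022', '01/02/2022'],
    'Number': [1234567891, 1234567892, 1234567891, 1234567893, 1234567892],
})

# Grouping by a single column name yields (number, sub-DataFrame) pairs,
# which are collected here into a dict keyed by phone number.
frames = {number: group for number, group in df1.groupby('Number')}

for number, group in frames.items():
    print(number, len(group), 'calls')
```

Each group keeps the original row index from df1; call reset_index(drop=True) on a group if you prefer it renumbered from zero.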
