Home > Enterprise >  Python Pandas; How do I integrate a table in another table without changing first table's total
Python Pandas; How do I integrate a table in another table without changing first table's total

Time:01-25

I have two datasets about a city. One with the persons who receive social welfare, about 50.000 persons, and one with all ca. 500.000 people living in the city. In the first table it is one person per row with the information "district, sex, age, nationality, kind of welfare". The Second table is already grouped by age with the information "district, sex, age, foreign nationality (y/n), count". Now I would like to merge both tables but without raising the total number of residents. For example: I have 6 men in the age of 21 years living in district 1 saved in the total list and now I would like to update the data with the information from the social welfare list, let's say 2 men in the age of 21 years living in district 1 and receiving social aid. So the result should be line 1: district 1, male, age 21, nationality, 4 persons, no social welfare, and line 2: district 1, male, age 21, nationality, 2 persons, social welfare. Thanks in forward.

SocialData

SocialWelfare Postcode District Age Sex Nationality
0 Wohngeldempfänger 90431 631 69 weiblich Deutschland
1 Asylbewerber §1/§3 in GU 90441 180 27 männlich Iran, Islamische Republik
2 Grundsicherungsempfänger 90459 131 82 männlich Ukraine
3 Grundsicherungsempfänger 90459 131 76 weiblich Ukraine
4 Alg II-/Sozialgeld-Bezieher 90402 18 49 männlich Deutschland

TotalData

Postcode District Age Sex Nationality Count
0 90402 10 1 2 2 1
1 90402 10 2 2 1 1
2 90402 10 4 2 2 1
3 90402 10 17 2 1 1
4 90402 10 21 1 2 1

CodePudding user response:

Once you calculated the aggregate function from the DataFrame you wanted (let's suppose 'myDF', and you saved it in a new variable (let's suppose 'myNewDF'), you save the results, so now worries about that.

However, you won't be able to merge the tables as you want, as you are comparing different data (you don't have the exact indexes to match the merge).

The best way to proceed will joining/merging the DataFrames without grouping, and then start aggregating .

CodePudding user response:

My general approach to this would be:

  • convert nationality in your first dataset to foreign nationality (y/n) like your seconds dataset
  • use df_grp = df.groupby(['district', 'sex', 'age', 'foreign nationality (y/n)'])['kind of welfare'].count() on your first dataset to get the number of people on welfare per district, sex, age and yes/no foreign nationality.
  • Merge df_grp with your second dataset on 'district', 'sex', 'age', 'foreign nationality (y/n)', so that you'll add a column with the number of people on welfare.
  • Then create a new column with the number of people NOT on welfare, by subtracting the column with the number of people on welfare from the column 'count'.
  •  Tags:  
  • Related