Python Pandas; How do I integrate a table in another table without changing first table's total-CodePudding

I have two datasets about a city. One with the persons who receive social welfare, about 50.000 persons, and one with all ca. 500.000 people living in the city. In the first table it is one person per row with the information "district, sex, age, nationality, kind of welfare". The Second table is already grouped by age with the information "district, sex, age, foreign nationality (y/n), count". Now I would like to merge both tables but without raising the total number of residents. For example: I have 6 men in the age of 21 years living in district 1 saved in the total list and now I would like to update the data with the information from the social welfare list, let's say 2 men in the age of 21 years living in district 1 and receiving social aid. So the result should be line 1: district 1, male, age 21, nationality, 4 persons, no social welfare, and line 2: district 1, male, age 21, nationality, 2 persons, social welfare. Thanks in forward.

SocialData

	SocialWelfare	Postcode	District	Age	Sex	Nationality
0	Wohngeldempfänger	90431	631	69	weiblich	Deutschland
1	Asylbewerber §1/§3 in GU	90441	180	27	männlich	Iran, Islamische Republik
2	Grundsicherungsempfänger	90459	131	82	männlich	Ukraine
3	Grundsicherungsempfänger	90459	131	76	weiblich	Ukraine
4	Alg II-/Sozialgeld-Bezieher	90402	18	49	männlich	Deutschland

TotalData

	Postcode	District	Age	Sex	Nationality	Count
0	90402	10	1	2	2	1
1	90402	10	2	2	1	1
2	90402	10	4	2	2	1
3	90402	10	17	2	1	1
4	90402	10	21	1	2	1

CodePudding user response：

Once you calculated the aggregate function from the DataFrame you wanted (let's suppose 'myDF', and you saved it in a new variable (let's suppose 'myNewDF'), you save the results, so now worries about that.

However, you won't be able to merge the tables as you want, as you are comparing different data (you don't have the exact indexes to match the merge).

The best way to proceed will joining/merging the DataFrames without grouping, and then start aggregating .

CodePudding user response：

My general approach to this would be:

convert nationality in your first dataset to foreign nationality (y/n) like your seconds dataset
use df_grp = df.groupby(['district', 'sex', 'age', 'foreign nationality (y/n)'])['kind of welfare'].count() on your first dataset to get the number of people on welfare per district, sex, age and yes/no foreign nationality.
Merge df_grp with your second dataset on 'district', 'sex', 'age', 'foreign nationality (y/n)', so that you'll add a column with the number of people on welfare.
Then create a new column with the number of people NOT on welfare, by subtracting the column with the number of people on welfare from the column 'count'.