I have two datasets about a city. One with the persons who receive social welfare, about 50.000 persons, and one with all ca. 500.000 people living in the city. In the first table it is one person per row with the information "district, sex, age, nationality, kind of welfare". The Second table is already grouped by age with the information "district, sex, age, foreign nationality (y/n), count". Now I would like to merge both tables but without raising the total number of residents. For example: I have 6 men in the age of 21 years living in district 1 saved in the total list and now I would like to update the data with the information from the social welfare list, let's say 2 men in the age of 21 years living in district 1 and receiving social aid. So the result should be line 1: district 1, male, age 21, nationality, 4 persons, no social welfare, and line 2: district 1, male, age 21, nationality, 2 persons, social welfare. Thanks in forward.
SocialData
| SocialWelfare | Postcode | District | Age | Sex | Nationality | |
|---|---|---|---|---|---|---|
| 0 | Wohngeldempfänger | 90431 | 631 | 69 | weiblich | Deutschland |
| 1 | Asylbewerber §1/§3 in GU | 90441 | 180 | 27 | männlich | Iran, Islamische Republik |
| 2 | Grundsicherungsempfänger | 90459 | 131 | 82 | männlich | Ukraine |
| 3 | Grundsicherungsempfänger | 90459 | 131 | 76 | weiblich | Ukraine |
| 4 | Alg II-/Sozialgeld-Bezieher | 90402 | 18 | 49 | männlich | Deutschland |
TotalData
| Postcode | District | Age | Sex | Nationality | Count | |
|---|---|---|---|---|---|---|
| 0 | 90402 | 10 | 1 | 2 | 2 | 1 |
| 1 | 90402 | 10 | 2 | 2 | 1 | 1 |
| 2 | 90402 | 10 | 4 | 2 | 2 | 1 |
| 3 | 90402 | 10 | 17 | 2 | 1 | 1 |
| 4 | 90402 | 10 | 21 | 1 | 2 | 1 |
CodePudding user response:
Once you calculated the aggregate function from the DataFrame you wanted (let's suppose 'myDF', and you saved it in a new variable (let's suppose 'myNewDF'), you save the results, so now worries about that.
However, you won't be able to merge the tables as you want, as you are comparing different data (you don't have the exact indexes to match the merge).
The best way to proceed will joining/merging the DataFrames without grouping, and then start aggregating .
CodePudding user response:
My general approach to this would be:
- convert
nationalityin your first dataset to foreign nationality (y/n) like your seconds dataset - use
df_grp = df.groupby(['district', 'sex', 'age', 'foreign nationality (y/n)'])['kind of welfare'].count()on your first dataset to get the number of people on welfare per district, sex, age and yes/no foreign nationality. - Merge
df_grpwith your second dataset on 'district', 'sex', 'age', 'foreign nationality (y/n)', so that you'll add a column with the number of people on welfare. - Then create a new column with the number of people NOT on welfare, by subtracting the column with the number of people on welfare from the column 'count'.
