I'm using R to work with the US county-level voting data that the good folks at MIT steward. I'd like to know the total votes each candidate got in each county. For some states, such as Wisconsin, that's easy:
"state", "county_name", "county_fips", "candidate", "party", "candidatevotes", "totalvotes", "mode"<br>
"WISCONSIN", "WINNEBAGO", "55139", "JO JORGENSEN", "LIBERTARIAN", 1629, 94032, "TOTAL"
For other states, such as Utah, it's doable:
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "EARLY"<br>
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "ELECTION DAY"<br>
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 0, 111403, "MAIL"<br>
"UTAH", "WEBER", "49057", "DONALD J TRUMP", "REPUBLICAN", 65949, 111403, "TOTAL"
South Carolina, however, is problematic:
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 13656, 144050, "ABSENTEE BY MAIL"<br>
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 22075, 144050, "ELECTION DAY"<br>
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 18, 144050, "FAILSAFE"<br>
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 176, 144050, "FAILSAFE PROVISIONAL"<br>
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 22950, 144050, "IN-PERSON ABSENTEE"<br>
"SOUTH CAROLINA", "YORK", "45091", "JOSEPH R BIDEN JR", "DEMOCRAT", 133, 144050, "PROVISIONAL"
It seems to me that there should be some way to loop over the FIPS codes and the party name to generate the totals for each county, but I'm stumped.
CodePudding user response:
Does this solve your problem?
library(tidyverse)
df <- read_csv("~/Desktop/countypres_2000-2020.csv")
#> Rows: 72617 Columns: 12
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (8): state, state_po, county_name, county_fips, office, candidate, party...
#> dbl (4): year, candidatevotes, totalvotes, version
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df %>%
filter(year == 2020) %>%
group_by(candidate, county_fips) %>%
summarise(
county_name,
total_votes_per_candidate_per_county = sum(candidatevotes)
) %>%
relocate(candidate, .before = 4) %>%
distinct() %>%
arrange(county_fips)
#> `summarise()` has grouped output by 'candidate', 'county_fips'. You can override using the `.groups` argument.
#> # A tibble: 11,902 × 4
#> # Groups: candidate, county_fips [11,898]
#> county_fips county_name candidate total_votes_per_candidate_per_coun…
#> <chr> <chr> <chr> <dbl>
#> 1 01001 AUTAUGA DONALD J TRUMP 19838
#> 2 01001 AUTAUGA JOSEPH R BIDEN JR 7503
#> 3 01001 AUTAUGA OTHER 429
#> 4 01003 BALDWIN DONALD J TRUMP 83544
#> 5 01003 BALDWIN JOSEPH R BIDEN JR 24578
#> 6 01003 BALDWIN OTHER 1557
#> 7 01005 BARBOUR DONALD J TRUMP 5622
#> 8 01005 BARBOUR JOSEPH R BIDEN JR 4816
#> 9 01005 BARBOUR OTHER 80
#> 10 01007 BIBB DONALD J TRUMP 7525
#> # … with 11,892 more rows
Created on 2022-01-20 by the reprex package (v2.0.1)
