I have a dataframe (sy2.1) with two columns (Vannstand2Positive and Vannstand2Negative) that both contain NA's as a result of mutating the columns from an existing column. I also have a third column (t) with time and date, and a fourth column (Kategori) with 4 categories. I want to find the percentage of each of the 4 categories with values from Vannstand2Positive and Vannstand2Negative.
This is the code I have used, in which I only use the column with positive values (Vannstand2Positive) in this code.
sy2.1 %>%
filter(year(t)==2005) %>%
group_by(Kategori, year(t)) %>%
dplyr::summarise(sum_vannstandPos = sum(Vannstand2Positive)) %>%
dplyr::mutate(percent_vannstandPos=sum_vannstandPos/sum(sum_vannstandPos)*100)
# Kategori year(t) sum_VannstandPos percent_VannstandPos
# 1: Liten 2005 NA NA
# 2: Moderat 2005 1107 100
# 1: Stor 2005 609 100
# 2: Størst 2005 1107 100
There is clearly something wrong with the code, as the percentage gives 100% on all the categories, but I need the correct calculations of the percentage. I tried to also use it on the column with negative values (Vannstand2Negative), but I only got NA's on the table. But I know its fully possible to find percentages of negative values. Anyone with a code that can fix this?
By adding na.rm=T to the sum function, I got percentages of almost all groups except the first group:
# Kategori year(t) sum_VannstandPos percent_VannstandPos
# 1: Liten 2005 NA NA
# 2: Moderat 2005 1107 32.34005
# 1: Stor 2005 609 17.79141
# 2: Størst 2005 1107 49.86854
CodePudding user response:
Use na.rm = TRUE to drop the NA values from the calculation in sum. You can use .groups = 'drop' in summarise to drop the groupings and since you are using year(t) more than once it is better to create a new column with the year information.
library(dplyr)
library(lubridate)
result <- sy2.1 %>%
mutate(year = year(t)) %>%
filter(year==2005) %>%
group_by(Kategori, year) %>%
dplyr::summarise(sum_vannstandPos = sum(Vannstand2Positive, na.rm = TRUE),
.groups = 'drop') %>%
dplyr::mutate(percent_vannstandPos= prop.table(sum_vannstandPos)*100)
result
prop.table(sum_vannstandPos) is another way to write sum_vannstandPos/sum(sum_vannstandPos).
CodePudding user response:
Adding ungroup() as in the reproducab le example below.
iris %>%
group_by(Species) %>%
summarize(sum_Petal.Length = sum(Petal.Length, na.rm = TRUE)) %>%
ungroup() %>%
mutate(
precentage = sum_Petal.Length / sum(sum_Petal.Length, na.rm = TRUE)
)
