I have a dataset that looks like this :
| var | date | value |
|---|---|---|
| A | 2022-01-01 | 1 |
| A | 2022-01-02 | 2 |
| A | 2022-01-03 | 3 |
| A | 2022-01-04 | 4 |
| A | 2022-01-05 | 5 |
| A | 2022-01-06 | 6 |
| A | 2022-01-07 | 7 |
| B | 2022-02-02 | 10 |
| B | 2022-01-03 | 20 |
| B | 2022-01-07 | 30 |
| C | 2022-01-01 | 100 |
| C | 2022-01-04 | 200 |
| C | 2022-01-05 | 300 |
| C | 2022-06-06 | 400 |
My variable of interest is the A from column var.Specifically the dates of A that match the values of the other factors inn var variable.I want to pivot wider them into :
| date | A | B | C |
|---|---|---|---|
| 2022-01-01 | 1 | NA | 100 |
| 2022-01-02 | 2 | NA | NA |
| 2022-01-03 | 3 | 20 | NA |
| 2022-01-04 | 4 | NA | 200 |
| 2022-01-05 | 5 | NA | 300 |
| 2022-01-06 | 6 | NA | NA |
| 2022-01-07 | 7 | 30 | NA |
And at the end to each column to summarize the sum (or even the correlation of A with B and the correlation A with C):
| var | Sum |
|---|---|
| A | 28 |
| B | 50 |
| C | 600 |
How can I do it in R using dplyr package ?
library(tidyverse)
date = c(seq(as.Date("2022/1/1"), by = "day", length.out = 7),
as.Date("2022/2/2"),as.Date("2022/1/3"),as.Date("2022/1/7"),
as.Date("2022/1/1"),as.Date("2022/1/4"),as.Date("2022/1/5"),as.Date("2022/6/6"))
var = c(rep("A",7),rep("B",3),rep("C",4))
value = c(seq(1,7,1),10,20,30,100,200,300,400)
data = tibble(var,date,value);data
CodePudding user response:
We may convert to 'wide' format with pivot_wider, filter out the NA element rows from A column, summarise acrossthe columns to get thesumand reshape to 'long' withpivot_longer`
library(dplyr)
library(tidyr)
pivot_wider(data, names_from = var, values_from = value) %>%
filter(!is.na(A)) %>%
summarise(across(A:C, sum, na.rm = TRUE)) %>%
pivot_longer(cols = everything(), names_to = 'var', values_to = 'Sum')
# A tibble: 3 × 2
var Sum
<chr> <dbl>
1 A 28
2 B 50
3 C 600
