Creat a reproducible data
set.seed(20220105)
begin = as.Date('1994-01-01')
end = as.Date('1994-12-31')
date_seq = seq(from = begin, to = end, by = '1 day')
length = length(date_seq)
death = sample(x = 1:100, size = length, replace = T)
temperature = sample(x = -25:25, size = length, replace = T)
df = data.frame(date = date_seq, death = death, temperature = temperature)
> head(df)
date death temperature
1 1994-01-01 66 20
2 1994-01-02 56 7
3 1994-01-03 33 -9
4 1994-01-04 29 -17
5 1994-01-05 6 0
6 1994-01-06 33 -15
Variable definition
Each day in df can be a case day and also can be a control day.
The case day and control days are matched by day of the week in the same month and in the same year.
Thus, each case has 3 or 4 control days (before and/or after the case day in the same month).
For example, when the case day is 1994-01-01, control days are 1994-01-08, 1994-01-15, 1994-01-22 and 1994-01-29.
When the case day is 1994-01-08, control days are 1994-01-01, 1994-01-15, 1994-01-22 and 1994-01-29.
What I needed
I want to creat a new df_wanted that based on the original df.
df_wanted should contain 5 varibales which 3 variables from the original df including date, death and temperature.
The death and temperature should be the case day's death and temperature.
The other 2 varibales are new variables.
One is status which indicates a day is case day or control day.
One is stratum. This variable like a group. Each group has one case day and three or four control day.
Some of the data in df_wanted should like this:
df_wanted = data.frame(
date = c('1994-01-01', '1994-01-08', '1994-01-15', '1994-01-22', '1994-01-29',
'1994-01-08', '1994-01-01', '1994-01-15', '1994-01-22', '1994-01-29'),
status = c(1,0,0,0,0, 1,0,0,0,0),
stratum = c(1,1,1,1,1, 2,2,2,2,2),
death = c(66, 66, 66, 66, 66, 1,1,1,1,1),
temperature = c(20,20,20,20,20, 13,13,13,13,13)
)
> df_wanted
date status stratum death temperature
1 1994-01-01 1 1 66 20
2 1994-01-08 0 1 66 20
3 1994-01-15 0 1 66 20
4 1994-01-22 0 1 66 20
5 1994-01-29 0 1 66 20
6 1994-01-08 1 2 1 13
7 1994-01-01 0 2 1 13
8 1994-01-15 0 2 1 13
9 1994-01-22 0 2 1 13
10 1994-01-29 0 2 1 13
What I have tried
I searched some answers, such as Create control dates in bilateral case crossover design and Create a case control pair for time stratified case-crossover design, but all the answers do not meet my need.
Any help will be highly appreciated!
CodePudding user response:
We can full_join the dataset with itself by the same day of week, month and year. Here is a dplyr approach.
library(dplyr)
df %>%
mutate(
id = seq_len(n()),
ymw = format(date, "%Y-%m-%w")
) %>%
full_join(., ., by = "ymw") %>%
transmute(
statum = id.x,
date = date.y,
status = (id.x == id.y),
death = death.x,
temperature = temperature.x
)
Output
statum date status death temperature
1 1 1994-01-01 1 66 20
2 1 1994-01-08 0 66 20
3 1 1994-01-15 0 66 20
4 1 1994-01-22 0 66 20
5 1 1994-01-29 0 66 20
6 2 1994-01-02 1 56 7
7 2 1994-01-09 0 56 7
8 2 1994-01-16 0 56 7
9 2 1994-01-23 0 56 7
10 2 1994-01-30 0 56 7
11 3 1994-01-03 1 33 -9
12 3 1994-01-10 0 33 -9
13 3 1994-01-17 0 33 -9
14 3 1994-01-24 0 33 -9
15 3 1994-01-31 0 33 -9
16 4 1994-01-04 1 29 -17
17 4 1994-01-11 0 29 -17
18 4 1994-01-18 0 29 -17
19 4 1994-01-25 0 29 -17
20 5 1994-01-05 1 6 0
21 5 1994-01-12 0 6 0
22 5 1994-01-19 0 6 0
23 5 1994-01-26 0 6 0
24 6 1994-01-06 1 33 -15
25 6 1994-01-13 0 33 -15
26 6 1994-01-20 0 33 -15
27 6 1994-01-27 0 33 -15
28 7 1994-01-07 1 31 21
29 7 1994-01-14 0 31 21
30 7 1994-01-21 0 31 21
31 7 1994-01-28 0 31 21
32 8 1994-01-01 0 1 13
33 8 1994-01-08 1 1 13
34 8 1994-01-15 0 1 13
35 8 1994-01-22 0 1 13
36 8 1994-01-29 0 1 13
37 9 1994-01-02 0 83 4
38 9 1994-01-09 1 83 4
39 9 1994-01-16 0 83 4
40 9 1994-01-23 0 83 4
41 9 1994-01-30 0 83 4
42 10 1994-01-03 0 37 7
43 10 1994-01-10 1 37 7
44 10 1994-01-17 0 37 7
45 10 1994-01-24 0 37 7
46 10 1994-01-31 0 37 7
47 11 1994-01-04 0 94 -18
48 11 1994-01-11 1 94 -18
49 11 1994-01-18 0 94 -18
50 11 1994-01-25 0 94 -18
51 12 1994-01-05 0 46 3
52 12 1994-01-12 1 46 3
53 12 1994-01-19 0 46 3
54 12 1994-01-26 0 46 3
55 13 1994-01-06 0 45 -13
56 13 1994-01-13 1 45 -13
57 13 1994-01-20 0 45 -13
58 13 1994-01-27 0 45 -13
59 14 1994-01-07 0 47 -21
60 14 1994-01-14 1 47 -21
61 14 1994-01-21 0 47 -21
62 14 1994-01-28 0 47 -21
63 15 1994-01-01 0 38 3
64 15 1994-01-08 0 38 3
65 15 1994-01-15 1 38 3
66 15 1994-01-22 0 38 3
67 15 1994-01-29 0 38 3
68 16 1994-01-02 0 96 -25
69 16 1994-01-09 0 96 -25
70 16 1994-01-16 1 96 -25
71 16 1994-01-23 0 96 -25
72 16 1994-01-30 0 96 -25
73 17 1994-01-03 0 99 20
74 17 1994-01-10 0 99 20
75 17 1994-01-17 1 99 20
76 17 1994-01-24 0 99 20
77 17 1994-01-31 0 99 20
78 18 1994-01-04 0 33 -22
79 18 1994-01-11 0 33 -22
80 18 1994-01-18 1 33 -22
81 18 1994-01-25 0 33 -22
82 19 1994-01-05 0 46 10
83 19 1994-01-12 0 46 10
84 19 1994-01-19 1 46 10
85 19 1994-01-26 0 46 10
86 20 1994-01-06 0 60 -2
87 20 1994-01-13 0 60 -2
88 20 1994-01-20 1 60 -2
89 20 1994-01-27 0 60 -2
90 21 1994-01-07 0 43 16
91 21 1994-01-14 0 43 16
92 21 1994-01-21 1 43 16
93 21 1994-01-28 0 43 16
94 22 1994-01-01 0 81 -14
95 22 1994-01-08 0 81 -14
96 22 1994-01-15 0 81 -14
97 22 1994-01-22 1 81 -14
98 22 1994-01-29 0 81 -14
99 23 1994-01-02 0 67 25
100 23 1994-01-09 0 67 25
101 23 1994-01-16 0 67 25
102 23 1994-01-23 1 67 25
103 23 1994-01-30 0 67 25
104 24 1994-01-03 0 31 23
105 24 1994-01-10 0 31 23
106 24 1994-01-17 0 31 23
107 24 1994-01-24 1 31 23
108 24 1994-01-31 0 31 23
109 25 1994-01-04 0 25 0
110 25 1994-01-11 0 25 0
111 25 1994-01-18 0 25 0
112 25 1994-01-25 1 25 0
113 26 1994-01-05 0 51 -21
114 26 1994-01-12 0 51 -21
115 26 1994-01-19 0 51 -21
116 26 1994-01-26 1 51 -21
117 27 1994-01-06 0 37 5
118 27 1994-01-13 0 37 5
119 27 1994-01-20 0 37 5
120 27 1994-01-27 1 37 5
121 28 1994-01-07 0 3 13
122 28 1994-01-14 0 3 13
123 28 1994-01-21 0 3 13
124 28 1994-01-28 1 3 13
125 29 1994-01-01 0 69 -22
126 29 1994-01-08 0 69 -22
127 29 1994-01-15 0 69 -22
128 29 1994-01-22 0 69 -22
129 29 1994-01-29 1 69 -22
130 30 1994-01-02 0 51 12
131 30 1994-01-09 0 51 12
132 30 1994-01-16 0 51 12
133 30 1994-01-23 0 51 12
134 30 1994-01-30 1 51 12
135 31 1994-01-03 0 84 17
136 31 1994-01-10 0 84 17
137 31 1994-01-17 0 84 17
138 31 1994-01-24 0 84 17
139 31 1994-01-31 1 84 17
140 32 1994-02-01 1 10 4
141 32 1994-02-08 0 10 4
142 32 1994-02-15 0 10 4
143 32 1994-02-22 0 10 4
144 33 1994-02-02 1 67 10
145 33 1994-02-09 0 67 10
146 33 1994-02-16 0 67 10
147 33 1994-02-23 0 67 10
148 34 1994-02-03 1 61 -21
149 34 1994-02-10 0 61 -21
150 34 1994-02-17 0 61 -21
151 34 1994-02-24 0 61 -21
152 35 1994-02-04 1 11 7
153 35 1994-02-11 0 11 7
154 35 1994-02-18 0 11 7
155 35 1994-02-25 0 11 7
156 36 1994-02-05 1 15 -21
157 36 1994-02-12 0 15 -21
158 36 1994-02-19 0 15 -21
159 36 1994-02-26 0 15 -21
160 37 1994-02-06 1 78 21
161 37 1994-02-13 0 78 21
162 37 1994-02-20 0 78 21
163 37 1994-02-27 0 78 21
164 38 1994-02-07 1 67 11
165 38 1994-02-14 0 67 11
166 38 1994-02-21 0 67 11
167 38 1994-02-28 0 67 11
168 39 1994-02-01 0 89 -10
169 39 1994-02-08 1 89 -10
170 39 1994-02-15 0 89 -10
171 39 1994-02-22 0 89 -10
172 40 1994-02-02 0 70 11
173 40 1994-02-09 1 70 11
174 40 1994-02-16 0 70 11
175 40 1994-02-23 0 70 11
176 41 1994-02-03 0 95 25
177 41 1994-02-10 1 95 25
178 41 1994-02-17 0 95 25
179 41 1994-02-24 0 95 25
180 42 1994-02-04 0 75 22
181 42 1994-02-11 1 75 22
182 42 1994-02-18 0 75 22
183 42 1994-02-25 0 75 22
184 43 1994-02-05 0 99 -20
185 43 1994-02-12 1 99 -20
186 43 1994-02-19 0 99 -20
187 43 1994-02-26 0 99 -20
188 44 1994-02-06 0 99 7
189 44 1994-02-13 1 99 7
190 44 1994-02-20 0 99 7
191 44 1994-02-27 0 99 7
192 45 1994-02-07 0 62 -2
193 45 1994-02-14 1 62 -2
194 45 1994-02-21 0 62 -2
195 45 1994-02-28 0 62 -2
196 46 1994-02-01 0 50 -9
197 46 1994-02-08 0 50 -9
198 46 1994-02-15 1 50 -9
199 46 1994-02-22 0 50 -9
200 47 1994-02-02 0 99 -13
[ reached 'max' / getOption("max.print") -- omitted 1405 rows ]
