In R, I want to divide n = 10000 iid observations into 100 blocks, each of size n/100 = 100. For each block I then want to take the largest value, giving a new dataset of size 100. How can I do this in R?
For example,
# sample data
n <- 10000
exp_data <- rexp(n, 1)
CodePudding user response:
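First you need a column that provides the grouping. In this example, assume the groups are sequential (i.e. the first 100 values belong to the first group, the second 100 to the second group, and so on). Subtract 1 from the index before the integer division; otherwise the first group gets only 99 values and the last value ends up alone in a 101st group:
df = data.frame(values = exp_data,
                group = (seq_along(exp_data) - 1) %/% 100)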
Now, just use tapply to get the maximum:
with(df, tapply(X = values,
                INDEX = group,
                FUN = max))
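Because the blocks are consecutive and equally sized, the same idea can also be sketched without a grouping column, by reshaping the vector into a matrix with one block per column. This is only a sketch: it assumes the length of exp_data is an exact multiple of the block size, and block_size, blocks and block_max are names introduced here for illustration.

```r
set.seed(1)                      # seed only so the sketch is reproducible
n <- 10000
exp_data <- rexp(n, 1)

block_size <- 100                # assumed block length (n / 100)
stopifnot(length(exp_data) %% block_size == 0)

# matrix() fills column by column, so column j holds observations
# (j - 1) * block_size + 1 through j * block_size
blocks <- matrix(exp_data, nrow = block_size)

# one maximum per column, i.e. per block
block_max <- apply(blocks, 2, max)

length(block_max)
```

If the length is not a multiple of the block size, matrix() recycles values with a warning, so the tapply approach is the safer choice in that case.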
CodePudding user response:
One tidyverse way could be:
- First convert the vector to a tibble with as_tibble() from the tibble package.
- Generate 100 groups of 100 consecutive rows with the gl() function.
- Split the tibble of 10000 rows into a list of 100 tibbles with group_split().
- Apply map() from the purrr package with the slice_max() function (dplyr package) to keep the row with the maximum value in each of the 100 tibbles.
- Finally, use bind_rows() to combine them into a new tibble with 100 rows.
Note: dplyr, tibble and purrr are all part of the tidyverse.
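library(tidyverse)

exp_data %>%
  as_tibble() %>%                                     # one column, named value
  mutate(group = as.integer(gl(n() / 100, 100))) %>%  # block id 1..100, 100 rows each
  group_split(group) %>%                              # list of 100 tibbles
  map(~ slice_max(.x, order_by = value, n = 1)) %>%   # row with the largest value per block
  bind_rows()                                         # back to a single tibble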
   value group
   <dbl> <int>
 1  5.81     1
 2  6.42     2
 3  4.46     3
 4  4.07     4
 5  5.35     5
 6  5.85     6
 7  4.03     7
 8  5.13     8
 9  4.71     9
10  4.71    10
# … with 90 more rows
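If only the block maxima are needed, a shorter dplyr variant may be enough: group the rows and summarise, instead of splitting into a list and binding the pieces back together. This is a sketch rather than a drop-in replacement; it assumes blocks of 100 consecutive rows, and block_max is a name introduced here for illustration.

```r
library(dplyr)

set.seed(1)                      # seed only so the sketch is reproducible
n <- 10000
exp_data <- rexp(n, 1)

block_max <- tibble(value = exp_data) %>%
  # rows 1..100 get group 1, rows 101..200 get group 2, and so on
  mutate(group = (row_number() - 1) %/% 100 + 1) %>%
  group_by(group) %>%
  # collapse each block to its largest value
  summarise(value = max(value))

nrow(block_max)
```

Unlike slice_max(), which can return several rows per block when the maximum is tied, summarise() always returns exactly one row per group.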
