R output BOTH maximum and minimum value by group in dataframe-CodePudding

Let's say I have a dataframe of Name and value, is there any ways to extract BOTH minimum and maximum values within Name in a single function?

set.seed(1)
df <- tibble(Name = rep(LETTERS[1:3], each = 3), Value = sample(1:100, 9))

# A tibble: 9 x 2
  Name  Value
  <chr> <int>
1 A        27
2 A        37
3 A        57
4 B        89
5 B        20
6 B        86
7 C        97
8 C        62
9 C        58

The output should contains TWO columns only (Name and Value). Thanks in advance!

CodePudding user response：

You can use range to get max and min value and use it in summarise to get different rows for each Name.

library(dplyr)

df %>%
  group_by(Name) %>%
  summarise(Value = range(Value), .groups = "drop")

#  Name  Value
#  <chr> <int>
#1 A        27
#2 A        57
#3 B        20
#4 B        89
#5 C        58
#6 C        97

If you have large dataset using data.table might be faster.

library(data.table)
setDT(df)[, .(Value = range(Value)), Name]

CodePudding user response：

I'm currently using rbind() together with slice_min() and slice_max(), but I think it may not be the best way or the most efficient way when the dataframe contains millions of rows.

library(tidyverse)

rbind(df %>% group_by(Name) %>% slice_max(Value), 
      df %>% group_by(Name) %>% slice_min(Value)) %>%
  arrange(Name)

# A tibble: 6 x 2
# Groups:   Name [3]
  Name  Value
  <chr> <int>
1 A        57
2 A        27
3 B        89
4 B        20
5 C        97
6 C        58

CodePudding user response：

You can use dplyr::group_by() and dplyr::summarise() like this:

library(dplyr)

set.seed(1)
df <- tibble(Name = rep(LETTERS[1:3], each = 3), Value = sample(1:100, 9))

df %>% 
    group_by(Name) %>% 
    summarise(
        maximum = max(Value),
        minimum = min(Value)
    )

This outputs:

# A tibble: 3 × 3
  Name  maximum minimum
  <chr>   <int>   <int>
1 A          68       1
2 B          87      34
3 C          82      14

What's a little odd is that my original df object looks a little different than yours, in spite of the seed:

# A tibble: 9 × 2
  Name  Value
  <chr> <int>
1 A        68
2 A        39
3 A         1
4 B        34
5 B        87
6 B        43
7 C        14
8 C        82
9 C        59

CodePudding user response：

Using aggregate.

aggregate(Value ~ Name, df, range)
#   Name Value.1 Value.2
# 1    A       1      68
# 2    B      34      87
# 3    C      14      82