Home > Blockchain >  R output BOTH maximum and minimum value by group in dataframe
R output BOTH maximum and minimum value by group in dataframe

Time:01-30

Let's say I have a dataframe of Name and value, is there any ways to extract BOTH minimum and maximum values within Name in a single function?

set.seed(1)
df <- tibble(Name = rep(LETTERS[1:3], each = 3), Value = sample(1:100, 9))

# A tibble: 9 x 2
  Name  Value
  <chr> <int>
1 A        27
2 A        37
3 A        57
4 B        89
5 B        20
6 B        86
7 C        97
8 C        62
9 C        58

The output should contains TWO columns only (Name and Value). Thanks in advance!

CodePudding user response:

You can use range to get max and min value and use it in summarise to get different rows for each Name.

library(dplyr)

df %>%
  group_by(Name) %>%
  summarise(Value = range(Value), .groups = "drop")

#  Name  Value
#  <chr> <int>
#1 A        27
#2 A        57
#3 B        20
#4 B        89
#5 C        58
#6 C        97

If you have large dataset using data.table might be faster.

library(data.table)
setDT(df)[, .(Value = range(Value)), Name]

CodePudding user response:

I'm currently using rbind() together with slice_min() and slice_max(), but I think it may not be the best way or the most efficient way when the dataframe contains millions of rows.

library(tidyverse)

rbind(df %>% group_by(Name) %>% slice_max(Value), 
      df %>% group_by(Name) %>% slice_min(Value)) %>%
  arrange(Name)

# A tibble: 6 x 2
# Groups:   Name [3]
  Name  Value
  <chr> <int>
1 A        57
2 A        27
3 B        89
4 B        20
5 C        97
6 C        58

CodePudding user response:

You can use dplyr::group_by() and dplyr::summarise() like this:

library(dplyr)

set.seed(1)
df <- tibble(Name = rep(LETTERS[1:3], each = 3), Value = sample(1:100, 9))

df %>% 
    group_by(Name) %>% 
    summarise(
        maximum = max(Value),
        minimum = min(Value)
    )

This outputs:

# A tibble: 3 × 3
  Name  maximum minimum
  <chr>   <int>   <int>
1 A          68       1
2 B          87      34
3 C          82      14

What's a little odd is that my original df object looks a little different than yours, in spite of the seed:

# A tibble: 9 × 2
  Name  Value
  <chr> <int>
1 A        68
2 A        39
3 A         1
4 B        34
5 B        87
6 B        43
7 C        14
8 C        82
9 C        59

CodePudding user response:

Using aggregate.

aggregate(Value ~ Name, df, range)
#   Name Value.1 Value.2
# 1    A       1      68
# 2    B      34      87
# 3    C      14      82
  •  Tags:  
  • Related