I have a dataframe that contains a rating column.
I would like to sort using a custom order, e.g.
AAA, AA, A, BBB, BB, B
However, default R sorting (using dplyr::arrange) results in
A AA AAA B BB BBB
data.frame(Rating=c('AAA','AA','A','B','BB','BBB'),
Value1=c(1,2,3,4,5,6),
Value2=c(2,3,4,5,3,2)) %>%
arrange(Rating)
I found many links referring to same problem but they are not related to dataframe eg customize the sort function in R
How can I sort my data using a custom order?
CodePudding user response:
If you know the complete list of possible ratings, the easiest way is to make it a factor with the values in order, e.g.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data.frame(Rating=factor(c('A', 'AA','AAA','B','BB','BBB'),
levels=c('AAA','AA','A','BBB','BB','B')),
Value1=c(1,2,3,4,5,6),
Value2=c(2,3,4,5,3,2)) %>%
arrange(Rating)
#> Rating Value1 Value2
#> 1 AAA 3 4
#> 2 AA 2 3
#> 3 A 1 2
#> 4 BBB 6 2
#> 5 BB 5 3
#> 6 B 4 5
Created on 2022-01-20 by the reprex package (v2.0.1)
CodePudding user response:
Here is one approach using dplyr.
In short, first sort by the letter grade, and then again by the number of letters. This would not work for ratings such as AAB, but from what I gather from your example this isn't the case.
library(dplyr)
data.frame(Rating=c('AAA','AA','A','B','BB','BBB'),
Value1=c(1,2,3,4,5,6),
Value2=c(2,3,4,5,3,2)) %>%
mutate(grade = substr(Rating, 1,1), # Create a column with letter grade
count = nchar(Rating)) %>% # Create a column with number of letters
arrange(grade, count) %>% # Sort by grade, then count
select(-grade, count) # Optional, removes intermediary columns
#> Rating Value1 Value2 count
#> 1 A 3 4 1
#> 2 AA 2 3 2
#> 3 AAA 1 2 3
#> 4 B 4 5 1
#> 5 BB 5 3 2
#> 6 BBB 6 2 3
Created on 2022-01-20 by the reprex package (v0.3.0)
CodePudding user response:
If you want to keep the column as character rather than factor, you can arrange based on a match to the order vector
rating_order <- c("AAA", "AA", "A", "BBB", "BB", "B")
df <-
data.frame(
Rating = c("A", "AA", "AAA", "B", "BB", "BBB"),
Value1 = c(1, 2, 3, 4, 5, 6),
Value2 = c(2, 3, 4, 5, 3, 2)
)
library(dplyr, warn.conflicts = FALSE)
df %>%
arrange(match(Rating, rating_order))
#> Rating Value1 Value2
#> 1 AAA 3 4
#> 2 AA 2 3
#> 3 A 1 2
#> 4 BBB 6 2
#> 5 BB 5 3
#> 6 B 4 5
Created on 2022-01-20 by the reprex package (v2.0.1)
