Sort 2 columns simultaneously based on the group name-CodePudding

Dataset

> read.delim("df.tsv")
   col1 col2 group
1     3    2    aa
2     1    1    aa
3     4    1    aa
4     4    3    aa
5     5    3    ab
6     3    2    ab
7     4    1    ab
8     2    4    ab
9     4    2    ba
10    1    4    ba
11    3    1    ba
12    4    3    ba
13    4    2    bb
14    2    3    bb
15    3    1    bb
16    1    2    bb

I want to sort the columns col1 and col2 within each of the 4 groups, in the following way:

If the 1st character in the group name is "a", sort col1 in a descending manner, and ascending if it's "b"
If the 2nd character in the group name is "a", sort col2 in a descending manner, and ascending if it's "b"
Importantly, I would like that both columns are sorted simultaneously, i.e. if e.g. the group is "aa", the sorting for that group should look like this:

   col1 col2 group
1     4    3    aa
2     3    2    aa
3     4    1    aa
4     1    1    aa
...

This could be accomplished by e.g. a "one row at a time" approach, first col1 and then col2, alternating for each row.

Current code and output

library(dplyr)

read.delim("df.tsv") %>%
  group_by(group) %>%
  arrange(ifelse(substr(group, 1,1) == "a", desc(col1), col1), # if first character in group name is "a", sort col1 in a descending manner, and ascending if it's "b"
          ifelse(substr(group, 2,2) == "a", desc(col2), col2), # if second character in group name is also "a", sort also col2 in a descending manner, and ascending if it's "b"
          .by_group = TRUE)

    col1  col2 group
 1     4     3 aa   
 2     4     1 aa   
 3     3     2 aa   
 4     1     1 aa   
 5     5     3 ab   
 6     4     1 ab   
 7     3     2 ab   
 8     2     4 ab   
 9     1     4 ba   
10     3     1 ba   
11     4     3 ba   
12     4     2 ba   
13     1     2 bb   
14     2     3 bb   
15     3     1 bb   
16     4     2 bb

However, this does not fulfill the 3rd criterion, the "simultaneous sorting one row at a time".

Desired output

    col1  col2 group
 1     4     3 aa   
 2     3     2 aa   
 3     4     1 aa   
 4     1     1 aa   
 5     5     3 ab   
 6     4     1 ab   
 7     3     2 ab   
 8     2     4 ab   
 9     1     4 ba   
10     4     3 ba   
11     3     1 ba   
12     4     2 ba   
13     1     2 bb   
14     3     1 bb   
15     2     3 bb   
16     4     2 bb

EDIT

There are a couple of answers that actually do the proposed task, so I think a tie-breaker could be that the algorithm is flexible with respect to the number of columns to sort, e.g. 3:

col1    col2    col3    group
3   2   4   aaa
1   1   2   aaa
4   1   4   aaa
4   3   1   aaa
5   3   3   aab
3   2   2   aab
4   1   1   aab
2   4   1   aab
4   2   3   aba
1   4   3   aba
3   1   2   aba
4   3   3   aba
3   2   4   abb
1   1   2   abb
4   1   4   abb
4   3   1   abb
4   2   1   baa
2   3   2   baa
3   1   2   baa
1   2   1   baa
5   3   3   bab
3   2   2   bab
4   1   1   bab
2   4   1   bab
4   2   3   bba
1   4   3   bba
3   1   2   bba
4   3   3   bba
4   2   1   bbb
2   3   2   bbb
3   1   2   bbb
1   2   1   bbb

The output should be

col1    col2    col3    group
4   3   1   aaa
3   2   4   aaa
4   1   4   aaa
1   1   2   aaa
5   3   3   aab
2   4   1   aab
4   1   1   aab
3   2   2   aab
4   2   3   aba
3   1   2   aba
1   4   3   aba
4   3   3   aba
4   1   4   abb
1   1   2   abb
4   3   1   abb
3   2   4   abb
1   2   1   baa
2   3   2   baa
3   1   2   baa
4   2   1   baa
2   4   1   bab
5   3   3   bab
4   1   1   bab
3   2   2   bab
1   4   3   bba
3   1   2   bba
4   2   3   bba
4   3   3   bba
1   2   1   bbb
3   1   2   bbb
4   2   1   bbb
2   3   2   bbb

Currently the 2 suggested solutions do not work when 3 or more columns are included, they sort based on only 2 columns.

CodePudding user response：

Well, perhaps a not efficient but simple solution is to just sort the two columns continuously, swap the major column each time, and discharge the first element until no element is left to be sorted. Here is the function.

alt_order <- function(..., type) {
  cols <- unname(list(...))
  stopifnot(
    # sanity checks; you may skip if you think they are unnecessary
    length(unique(lengths(cols))) == 1L,
    length(cols) == length(type),
    all(unlist(type) %in% c(1L, -1L))
  ) 
  cols <- mapply(`*`, cols, type, SIMPLIFY = FALSE)
  out <- integer(length(cols[[1L]]))
  for (i in seq_along(out)) {
    out[[i]] <- do.call(order, cols)[[1L]]
    cols <- rev(lapply(cols, `is.na<-`, out[[i]]))
  }
  out
}

We assign values to NAs to discharge them since NAs will be sorted to the last in an ascending way. type should be either 1 or -1 and is used to streamline the order we would like to impose since the descending order of c(1,2,3) is the same as the ascending order of -1 * c(1,2,3). We also need a helper function as follows to transfer your groups into 1 and -1

ab2sign <- function(x) {
  out <- data.table::transpose(strsplit(x, "", fixed = TRUE))
  lapply(out, function(x) 2L * (x == "b") - 1L)
}

Now we can apply them

df %>% group_by(group) %>% slice(alt_order(col1, col2, type = ab2sign(group)))

Output

# A tibble: 16 x 3
# Groups:   group [4]
    col1  col2 group
   <int> <int> <chr>
 1     4     3 aa   
 2     3     2 aa   
 3     4     1 aa   
 4     1     1 aa   
 5     5     3 ab   
 6     4     1 ab   
 7     3     2 ab   
 8     2     4 ab   
 9     1     4 ba   
10     4     3 ba   
11     3     1 ba   
12     4     2 ba   
13     1     2 bb   
14     3     1 bb   
15     2     3 bb   
16     4     2 bb

I hope to see more efficient (perhaps vectorized) solutions before the bounty expires.

CodePudding user response：

Here is an option using dynamic programming (but maybe not that efficient)

f <- function(.) {
  col <- with(., data.frame(col1, col2) * (2 * (t(list2DF(strsplit(.$group, ""))) == "b") - 1))
  r <- data.frame()
  while (nrow(.)) {
    p <- do.call(order, ifelse(nrow(r) %% 2, rev, I)(col))[1]
    r <- rbind(r, .[p, ])
    col <- col[-p,]
    . <- .[-p, ]
  }
  r
}

df %>%
  group_by(group) %>%
  do(f(.)) %>%
  ungroup()

which gives

# A tibble: 16 x 3
    col1  col2 group
   <int> <int> <chr>
 1     4     3 aa
 2     3     2 aa
 3     4     1 aa
 4     1     1 aa
 5     5     3 ab
 6     4     1 ab
 7     3     2 ab
 8     2     4 ab
 9     1     4 ba
10     4     3 ba
11     3     1 ba
12     4     2 ba
13     1     2 bb
14     3     1 bb
15     2     3 bb
16     4     2 bb

CodePudding user response：

I can tell you the answer, but I cannot write complete r code for it, since I don't know r, I hope some one may edit my code for a complete answer.

suppose both sorts are ascending (you can generalize it to your case)

idx1=order(col1)
idx2=order(col2[idx1])

return col1[idx1[idx2]], col2[idx1[idx2]]