for loop to determine the top 10 percent of values in an interval

Time:01-11

I have two columns (vectors), speed and acceleration, in a data.frame like this:

    speed     acceleration
1   3.2694444 2.6539535522
2   3.3388889 2.5096979141
3   3.3888889 2.2722134590
4   3.4388889 1.9815256596
5   3.5000000 1.6777544022
6   3.5555556 1.3933215141
7   3.6055556 1.1439051628
8   3.6527778 0.9334115982
9   3.6722222 0.7561602592

For each interval of speed on the x axis (for example 3.2-3.4, 3.4-3.6, and so on), I need to find the top 10% of the values on the y axis (acceleration). Can you please show me what a for loop would look like in this situation?

CodePudding user response:

As @alistaire already pointed out, you have provided a very limited amount of data, so we first have to simulate a bit more data on which we can test our code.

set.seed(1)

# your data
speed <- c(3.2694444, 3.3388889, 3.3388889, 3.4388889, 3.5,
           3.5555556, 3.6055556, 3.6527778, 3.6722222)
acceleration <- c(2.6539535522, 2.5096979141, 2.2722134590,
                  1.9815256596, 1.6777544022, 1.3933215141,
                  1.1439051628, 0.9334115982, 0.7561602592)
df <- data.frame(speed, acceleration)

# expand data.frame and add a little bit of noise to all values
# to make them 'unique'
df <- as.data.frame(do.call(
  rbind,
  replicate(15L, apply(df, 2, \(x) (x + runif(length(x), -1e-1, 1e-1))),
            simplify = FALSE)
))

The following code does the 'heavy lifting' and stores the desired result in out.

# function to cut the range of speed into n_groups steps of equal
# width; returns the upper edge of each step
my_groups <- \(n_groups) {
  step <- with(df, (max(speed) - min(speed)) / n_groups)
  intervals <- numeric(n_groups)
  for (i in seq_len(n_groups)) {
    intervals[i] <- min(df$speed) + i * step
  }
  return(intervals)
}

# break points for three steps of equal width
my_intervals <- my_groups(n_groups = 3)
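The loop inside my_groups can also be written in vectorized form. The following is only a sketch under the same equal-width assumption: `my_breaks` is a hypothetical helper name and `x_toy` holds made-up values. Unlike my_groups, it also returns the lower edge min(x), so n_groups intervals are delimited by n_groups + 1 break points.

```r
# hypothetical helper: n_groups + 1 equally spaced break points
# spanning the range of x (lower edge included)
my_breaks <- \(x, n_groups) seq(min(x), max(x), length.out = n_groups + 1L)

# toy values for illustration only
x_toy <- c(3.2, 3.5, 3.8)
breaks_toy <- my_breaks(x_toy, n_groups = 3)
print(breaks_toy)
```

For these toy values the result is four break points, 3.2, 3.4, 3.6 and 3.8 (up to floating-point rounding), which delimit three intervals.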

# Select the rows where acceleration is greater than or equal to
# its 90th percentile within each interval. Consecutive pairs of
# break points delimit the intervals, so speeds below
# my_intervals[1] are not covered here.
out <- lapply(1:(length(my_intervals)-1L), \(i) {
  x <- subset(df, speed >= my_intervals[i] & speed <= my_intervals[i + 1L])
  x[x$acceleration >= quantile(x$acceleration, 0.9), ]
})

# function to round values to two decimal places
r <- \(x) round(x, 2)

# assign names to each element of out
for(i in seq_along(out)) {
  names(out)[i] <- paste0(r(my_intervals[i]), '-', r(my_intervals[i + 1L]))
}

Output

> out
$`3.38-3.57`
       speed acceleration
11  3.394378     2.583636
21  3.383631     2.267659
57  3.434123     2.300234
83  3.394886     2.580924
101 3.395459     2.460971

$`3.57-3.76`
      speed acceleration
6  3.635234     1.447290
41 3.572868     1.618293
51 3.615017     1.420020
95 3.575412     1.763215

CodePudding user response:

There was a previous question from 1 year 9 months ago which may help: "dplyr select top ten values for each category". You can probably google it; I did.
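A grouped dplyr version of the same idea might look like the sketch below. It assumes the dplyr package is installed, and it is not necessarily the method used in the question mentioned above. The data frame `toy` and the bin edges are made up for illustration.

```r
library(dplyr)

# toy data for illustration only (not the data from the question)
set.seed(42)
toy <- data.frame(speed        = runif(200, 3.2, 3.8),
                  acceleration = runif(200, 0.5, 3))

top10 <- toy |>
  # assign each row to an assumed 0.2-wide speed bin
  mutate(bin = cut(speed, breaks = seq(3.2, 3.8, by = 0.2),
                   right = FALSE, include.lowest = TRUE)) |>
  group_by(bin) |>
  # keep rows at or above the 90th percentile of acceleration per bin
  filter(acceleration >= quantile(acceleration, 0.9)) |>
  ungroup()

print(top10)
```

Filtering on quantile() keeps the rule identical to the 90th percentile cut-off used in the base R answer, rather than selecting a fixed number of rows per group.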
