Fastest way to find second (third...) highest/lowest value in vector or column

ghz 1years ago ⋅ 5082 views

Question

R offers max and min, but I do not see a really fast way to find another value in the order, apart from sorting the whole vector and then picking a value x from this vector.

Is there a faster way to get the second highest value, for example?


Answer

Rfast has a function called nth_element that does exactly what you ask.

Further the methods discussed above that are based on partial sort, don't support finding the k smallest values

Update (28/FEB/21) package kit offers a faster implementation (topn) see https://stackoverflow.com/a/66367996/4729755, https://stackoverflow.com/a/53146559/4729755

Disclaimer : An issue appears to occur when dealing with integers which can by bypassed by using as.numeric (e.g. Rfast::nth(as.numeric(1:10), 2)), and will be addressed in the next update of Rfast.

Rfast::nth(x, 5, descending = T)

Will return the 5th largest element of x, while

Rfast::nth(x, 5, descending = F)

Will return the 5th smallest element of x

Benchmarks below against most popular answers.

For 10 thousand numbers:

N = 10000
x = rnorm(N)

maxN <- function(x, N=2){
    len <- length(x)
    if(N>len){
        warning('N greater than length(x).  Setting N=length(x)')
        N <- length(x)
    }
    sort(x,partial=len-N+1)[len-N+1]
}

microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxn = maxN(x,5),
order = x[order(x, decreasing = T)[5]])

Unit: microseconds
  expr      min       lq      mean   median        uq       max neval
 Rfast  160.364  179.607  202.8024  194.575  210.1830   351.517   100
  maxN  396.419  423.360  559.2707  446.452  487.0775  4949.452   100
 order 1288.466 1343.417 1746.7627 1433.221 1500.7865 13768.148   100

For 1 million numbers:

N = 1e6
x = rnorm(N)

microbenchmark::microbenchmark(
Rfast = Rfast::nth(x,5,descending = T),
maxN = maxN(x,5),
order = x[order(x, decreasing = T)[5]]) 

Unit: milliseconds
  expr      min        lq      mean   median        uq       max neval
 Rfast  89.7722  93.63674  114.9893 104.6325  120.5767  204.8839   100
  maxN 150.2822 207.03922  235.3037 241.7604  259.7476  336.7051   100
 order 930.8924 968.54785 1005.5487 991.7995 1031.0290 1164.9129   100