Question
R's duplicated returns a vector showing whether each element of a vector or data frame is a duplicate of an element with a smaller subscript. So if rows 3, 4, and 5 of a 5-row data frame are the same, duplicated will give me the vector
FALSE, FALSE, FALSE, TRUE, TRUE
But in this case I actually want to get
FALSE, FALSE, TRUE, TRUE, TRUE
that is, I want to know whether a row is duplicated by a row with a larger subscript too.
Answer
duplicated has a fromLast argument. The "Examples" section of ?duplicated shows you how to use it. Just call duplicated twice, once with fromLast=FALSE and once with fromLast=TRUE, and take the rows where either result is TRUE.
Late edit: You didn't provide a reproducible example, so here's an illustration kindly contributed by @jbaums:
vec <- c("a", "b", "c","c","c")
vec[duplicated(vec) | duplicated(vec, fromLast=TRUE)]
## [1] "c" "c" "c"
Edit: And an example for the case of a data frame:
df <- data.frame(rbind(c("a","a"),c("b","b"),c("c","c"),c("c","c")))
df[duplicated(df) | duplicated(df, fromLast=TRUE), ]
## X1 X2
## 3 c c
## 4 c c
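The examples above subset the duplicated elements, whereas the question asks for the logical vector itself. A minimal sketch of that, using the 5-row case from the question: the helper name all_duplicated and the data frame df5 are made up for illustration and are not part of base R.

```r
# Hypothetical helper (not part of base R): flags every member of a
# duplicate group by OR-ing the forward scan with the backward scan.
all_duplicated <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

# Illustrative 5-row data frame in which rows 3, 4, and 5 are identical.
df5 <- data.frame(X1 = c("a", "b", "c", "c", "c"),
                  X2 = c("a", "b", "c", "c", "c"))

all_duplicated(df5)
## [1] FALSE FALSE  TRUE  TRUE  TRUE
```

The forward scan marks rows 4 and 5, the backward scan marks rows 3 and 4, and combining them with | marks all three. The same helper should work on a plain vector, since duplicated accepts fromLast there too.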