How to deal with nonstandard column names (white space, punctuation, starts with numbers)

ghz 1years ago ⋅ 5722 views

Question

df <- structure(list(`a a` = 1:3, `a b` = 2:4), .Names = c("a a", "a b"
), row.names = c(NA, -3L), class = "data.frame")

and the data looks like

  a a a b
1   1   2
2   2   3
3   3   4

Following call to select

select(df, 'a a')

gives

Error in abs(ind[ind < 0]) : 
  non-numeric argument to mathematical function

How can I select "a a" and/or rename it to something without space using select? I know the following approaches:

  1. names(df)[1] <- "a"
  2. select(df, a=1)
  3. select(df, ends_with("a"))

but if I am working on a large data set, how can I get an exact match without knowing the index numer or similar column names?


Answer

You may select the variable by using backticks ```.

select(df, `a a`)
#   a a
# 1   1
# 2   2
# 3   3

However, if your main objective is to rename the column, you may use rename in plyr package, in which you can use both "" and ````.

rename(df, replace = c("a a" = "a"))
rename(df, replace = c(`a a` = "a"))

Or in base R:

names(df)[names(df) == "a a"] <- "a"

For a more thorough description on the use of various quotes, see ?Quotes. The 'Names and Identifiers' section is especially relevant here:

other [syntactically invalid] names can be used provided they are quoted. The preferred quote is the backtick".

See also ?make.names about valid names.

See also [this post](https://stackoverflow.com/questions/21502465/replacement-for-rename- in-dplyr) about renaming in dplyr