tidyselect::where() inconsistencies: where is where()?

Summary: rename(A = 1, B = 2) renames several columns at once; can rename_with() do the same? My ~str_replace(..., paste0()) lambda works and I don't need to change it, but it only handles one variable at a time. The tidyselect error message suggests wrapping the formula in where(~str_replace(...)), but R then complains that it can't find where(), even though where() works for me in other contexts.

I want to use rename_with() for more than one variable, but I get the error Error: Formula shorthand must be wrapped in `where()`.

# Bad
  data %>% select(~str_replace(., "Var_2_", paste0("Issue: Time")))

  # Good
  data %>% select(where(~str_replace(., "Var_2_", paste0("Issue: time"))))

Original example: test %>% rename_with(~str_replace(., "Var_2_", paste0("Issue: Time")), ~str_replace(., "Var_3_", paste0("Issue: Time")))

When I run test %>% rename_with(where(~str_replace(., "Var_2_", paste0("Issue: Time")), ~str_replace(., "Var_3_", paste0("Issue: Time"))))

or test %>% rename_with(where(~str_replace(., "Var_2_", paste0("Issue: Time"))), where(~str_replace(., "Var_3_", paste0("Issue: Time"))))

I get Error in where(~str_replace(., "Var_1_", paste0("Gov't surveillance: video wave")), : could not find function "where". I also can't find it when tab-completing tidyselect::

But I can run test %>% select(where(is.numeric)) %>% map(sd, na.rm = TRUE) without any issue, so it does exist. What am I doing wrong?

Example data:

x <- c("_1_1",
       "_1_2",
       "_1_3",
       "_2_1",
       "_2_2",
       "_2_3",
       "_3_1",
       "_3_2",
       "_3_3",
       "_4_3")
paste0("Var",x)

test <- t(as_tibble(rnorm(10, 5.5, .35)))
colnames(test) <- paste0("Var",x)

1 answer

  • answered 2021-02-22 22:32 akrun

    The argument order in rename_with is switched compared to rename_at: the function comes first and the column selection second. It is a bit unclear from the question which column names the code is meant to target, especially with str_replace passed as both arguments. A typical call that replaces the 'Var_2' prefix with 'Issue: Time' in the column names starting with 'Var_2' would be

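    library(dplyr)
    library(stringr)  # provides str_replace()
    # store the result under a new name so that `data` stays available
    # unchanged for the examples further down
    data2 <- data %>% 
        rename_with(~ str_replace(., 'Var_2', 'Issue: Time'), 
           starts_with('Var_2'))
    

    -output

    data2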
    # A tibble: 1 x 10
    #  Var_1_1 Var_1_2 Var_1_3 `Issue: Time_1` `Issue: Time_2` `Issue: Time_3` Var_3_1 Var_3_2 Var_3_3 Var_4_3
    #    <dbl>   <dbl>   <dbl>           <dbl>           <dbl>           <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
    #1    5.68    5.18    5.34            5.38            5.47            5.82    5.93    5.35    5.20    5.62   
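
    To make the argument order concrete, here is a minimal, self-contained sketch. It rebuilds a one-row tibble from the rounded values printed in the output (the names demo, new_style and old_style are only illustrative) and checks that rename_with, with the function first, produces the same names as the superseded rename_at, with the columns first.

    ```r
    library(dplyr)
    library(stringr)

    # Rounded values copied from the printed output; `demo` is an illustrative name
    vals <- c(5.68, 5.18, 5.34, 5.38, 5.47, 5.82, 5.93, 5.35, 5.20, 5.62)
    nms  <- paste0("Var_", c("1_1", "1_2", "1_3", "2_1", "2_2", "2_3",
                             "3_1", "3_2", "3_3", "4_3"))
    demo <- as_tibble(as.list(setNames(vals, nms)))

    # rename_with(): function first, column selection second
    new_style <- demo %>%
      rename_with(~ str_replace(.x, "Var_2", "Issue: Time"),
                  starts_with("Var_2"))

    # rename_at(): column selection first, function second (superseded)
    old_style <- demo %>%
      rename_at(vars(starts_with("Var_2")),
                ~ str_replace(.x, "Var_2", "Issue: Time"))

    names(new_style)
    identical(names(new_style), names(old_style))
    ```

    Only the three columns picked by starts_with are passed to the function; all other names are left untouched.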
    

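    If we need to rename columns matching several patterns, use matches with a regex alternation. Note that the backreference '\\1' keeps the matched prefix, so the new text is inserted after it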

    data %>% 
       rename_with(~ str_replace(., '(Var_2|Var_3)', '\\1_Issue: Time'),
             matches('Var_2|Var_3'))
    # A tibble: 1 x 10
    #  Var_1_1 Var_1_2 Var_1_3 `Var_2_Issue: Tim… `Var_2_Issue: Tim… `Var_2_Issue: Tim… `Var_3_Issue: Ti… `Var_3_Issue: Ti… `Var_3_Issue: Ti… Var_4_3
    #    <dbl>   <dbl>   <dbl>              <dbl>              <dbl>              <dbl>             <dbl>             <dbl>             <dbl>   <dbl>
    #1    5.68    5.18    5.34               5.38               5.47               5.82              5.93              5.35              5.20    5.62
     
    

    Or, if each pattern needs its own replacement, pass a named vector (pattern = replacement) to str_replace_all; set_names comes from purrr

    library(purrr)
    data1 <- data %>%
       set_names(str_replace_all(names(.), c("Var_1" = "Issue 1 wave", "Var_2" = "Issue 2 Wave")))
    

    compare the output

    data1
    # A tibble: 1 x 10
      `Issue 1 wave_1` `Issue 1 wave_2` `Issue 1 wave_3` `Issue 2 Wave_1` `Issue 2 Wave_2` `Issue 2 Wave_3` Var_3_1 Var_3_2 Var_3_3 Var_4_3
                 <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
    1             5.68             5.18             5.34             5.38             5.47             5.82    5.93    5.35    5.20    5.62
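
    Pairing each pattern with its own replacement can also be sketched with rename_with, so that the renaming stays inside the pipe and is restricted to the selected columns. This assumes the rounded values from the printed output; the object names and the anchored patterns in lookup are illustrative, not taken from the question.

    ```r
    library(dplyr)
    library(stringr)

    # Rounded values copied from the printed output; `demo` is an illustrative name
    vals <- c(5.68, 5.18, 5.34, 5.38, 5.47, 5.82, 5.93, 5.35, 5.20, 5.62)
    nms  <- paste0("Var_", c("1_1", "1_2", "1_3", "2_1", "2_2", "2_3",
                             "3_1", "3_2", "3_3", "4_3"))
    demo <- as_tibble(as.list(setNames(vals, nms)))

    # Named vector: names are regex patterns, values are their replacements
    lookup <- c("^Var_1" = "Issue 1 wave", "^Var_2" = "Issue 2 wave")

    renamed <- demo %>%
      rename_with(~ str_replace_all(.x, lookup), matches("^Var_[12]_"))

    names(renamed)
    ```

    With a named vector every pattern is tried on every name, so the pairing does not depend on the order or number of columns.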
    

    with original data

    data
    # A tibble: 1 x 10
      Var_1_1 Var_1_2 Var_1_3 Var_2_1 Var_2_2 Var_2_3 Var_3_1 Var_3_2 Var_3_3 Var_4_3
        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
    1    5.68    5.18    5.34    5.38    5.47    5.82    5.93    5.35    5.20    5.62
    

    where is generally used to test the column values rather than the column names, e.g. to select the columns that are numeric, use select(where(is.numeric)). To pick columns by name there are the select helpers, which match on a substring (starts_with, ends_with, contains) or on a regex pattern (matches). A use case of where would be

    data %>% 
       rename_with(~ str_replace(., 'Var_2', 'Issue: Time'), where(~ all(. > 5.5)))
    
    # A tibble: 1 x 10
    #  Var_1_1 Var_1_2 Var_1_3 Var_2_1 Var_2_2 `Issue: Time_3` Var_3_1 Var_3_2 Var_3_3 Var_4_3
    #    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>           <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
    #1    5.68    5.18    5.34    5.38    5.47            5.82    5.93    5.35    5.20    5.62
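
    To see the difference between a name-based helper and where, the sketch below first lists which columns a value predicate picks up and then combines it with starts_with. It also hints at the likely cause of the error in the question: where is only interpreted inside a selection argument such as .cols, and to my knowledge older tidyselect releases did not export it, so calling it in the .fn position fails with could not find function. The demo object is rebuilt from the rounded printed values and is only illustrative.

    ```r
    library(dplyr)
    library(stringr)

    # Rounded values copied from the printed output; `demo` is an illustrative name
    vals <- c(5.68, 5.18, 5.34, 5.38, 5.47, 5.82, 5.93, 5.35, 5.20, 5.62)
    nms  <- paste0("Var_", c("1_1", "1_2", "1_3", "2_1", "2_2", "2_3",
                             "3_1", "3_2", "3_3", "4_3"))
    demo <- as_tibble(as.list(setNames(vals, nms)))

    # where() applies the predicate to each column's values, not to its name
    big <- demo %>% select(where(~ all(.x > 5.5)))
    names(big)

    # A name-based helper and a value-based predicate can be combined with &
    renamed <- demo %>%
      rename_with(~ str_replace(.x, "Var_2", "Issue: Time"),
                  starts_with("Var_2") & where(~ all(.x > 5.5)))
    names(renamed)
    ```

    Four columns pass the value test, but only one of them also starts with 'Var_2', so a single name changes.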
    

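    In the OP's code, select/map can be replaced with summarise/across (df stands for the OP's data frame; na.rm = TRUE is kept from the original call)

    df %>%
        summarise(across(where(is.numeric), ~ sd(.x, na.rm = TRUE)))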
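
    Since the example data has a single row, sd returns NA for every column there. A small made-up data frame (df_demo, with hypothetical columns a, b and label) shows what summarise/across returns with several rows and a missing value.

    ```r
    library(dplyr)

    # Made-up data: two numeric columns (one with an NA) and one character column
    df_demo <- tibble(
      a     = c(1, 2, 3, NA),
      b     = c(2, 4, 6, 8),
      label = c("w", "x", "y", "z")
    )

    # where(is.numeric) keeps a and b and drops label;
    # the lambda forwards na.rm = TRUE to sd()
    sds <- df_demo %>%
      summarise(across(where(is.numeric), ~ sd(.x, na.rm = TRUE)))

    sds
    ```

    The result is a one-row tibble with one column per numeric input column, which is usually easier to work with than the list returned by map.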
    

    data

    data <- as_tibble(test)