Warnings when calculating relative positions of words in turns

I want to record the relative position of each word in column Turn of this dataframe (see my earlier question, "Divide numbers into equally-spaced intervals ranging between 0-1"):

df <- data.frame(
  Turn = c("what ? mhm .",
           "mhm . why , is that  mine you 're using ?",
           "mm . very nice . no it   's not on now eight did you say ?",
           "yes it does sheila mm .",
           "mm , you know the black and white one ?"),
  N_words = c(2,8,13,5,8),
  Turn_no = c(1,2,3,4,5)
)

The code I have gives the correct results in column RelPosition but also produces tons of warning messages:

library(tidyr)
library(dplyr)
df %>%
  # remove punctuation and reduce whitespace to 1:
  mutate(Turn = gsub("\\s[.,!?]", "", Turn),
         Turn = gsub("\\s{2,}", " ", Turn)) %>%
  # separate rows:
  separate_rows(Turn, sep = " ") %>%
  # rename:
  rename(Word = Turn) %>%
  group_by(Turn_no) %>%
  # calculate relative position:
  mutate(RelPosition = seq(0, 1, length.out =  N_words))
# A tibble: 36 x 4
# Groups:   Turn_no [5]
   Word  N_words Turn_no RelPosition
   <chr>   <dbl>   <dbl>       <dbl>
 1 what        2       1       0    
 2 mhm         2       1       1    
 3 mhm         8       2       0    
 4 why         8       2       0.143
 5 is          8       2       0.286
 6 that        8       2       0.429
 7 mine        8       2       0.571
 8 you         8       2       0.714
 9 're         8       2       0.857
10 using       8       2       1    
# … with 26 more rows
Warning messages:
1: Problem with `mutate()` input `RelPosition`.
ℹ first element used of 'length.out' argument
ℹ Input `RelPosition` is `seq(0, 1, length.out = N_words)`.
ℹ The error occurred in group 1: Turn_no = 1. 
2: Problem with `mutate()` input `RelPosition`.
ℹ first element used of 'length.out' argument
ℹ Input `RelPosition` is `seq(0, 1, length.out = N_words)`.
ℹ The error occurred in group 2: Turn_no = 2. 
3: Problem with `mutate()` input `RelPosition`.
ℹ first element used of 'length.out' argument
ℹ Input `RelPosition` is `seq(0, 1, length.out = N_words)`.
ℹ The error occurred in group 3: Turn_no = 3. 
4: Problem with `mutate()` input `RelPosition`.
ℹ first element used of 'length.out' argument
ℹ Input `RelPosition` is `seq(0, 1, length.out = N_words)`.
ℹ The error occurred in group 4: Turn_no = 4. 
5: Problem with `mutate()` input `RelPosition`.
ℹ first element used of 'length.out' argument
ℹ Input `RelPosition` is `seq(0, 1, length.out = N_words)`.
ℹ The error occurred in group 5: Turn_no = 5. 

Why the warnings? Anything wrong with the code? How can the warnings be prevented?

1 answer

  • answered 2021-09-27 16:13 akrun

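    There is more than one element per group, so inside each group N_words is a vector rather than a single number. The length.out argument of seq() is not vectorized and only uses the first element, which is what the warning reports. Use first(N_words) when specifying length.out.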
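
    As a minimal illustration outside dplyr, here is a sketch of passing a single value to length.out. The vector name n_words_in_group is hypothetical and only mimics what N_words looks like inside the group Turn_no = 2.

    ```r
    # Inside a grouped mutate(), N_words holds one entry per row of the group,
    # e.g. eight 8s for Turn_no = 2 (hypothetical stand-in vector).
    n_words_in_group <- rep(8, 8)

    # seq() expects a single number for length.out, so pass one value only.
    rel_position <- seq(0, 1, length.out = n_words_in_group[1])

    round(rel_position, 3)
    ```

    The rounded values should match the RelPosition column for Turn_no = 2 in the question's output.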

    library(tidyr)
    library(dplyr)
    df %>%
      # remove punctuation and reduce whitespace to 1:
      mutate(Turn = gsub("\\s[.,!?]", "", Turn),
             Turn = gsub("\\s{2,}", " ", Turn)) %>%
      # separate rows:
      separate_rows(Turn, sep = " ") %>%
      # rename:
      rename(Word = Turn) %>%
      group_by(Turn_no) %>%
      # calculate relative position:
      mutate(RelPosition = seq(0, 1, length.out =  first(N_words)))
    

    Output:

    # A tibble: 36 × 4
    # Groups:   Turn_no [5]
       Word  N_words Turn_no RelPosition
       <chr>   <dbl>   <dbl>       <dbl>
     1 what        2       1       0    
     2 mhm         2       1       1    
     3 mhm         8       2       0    
     4 why         8       2       0.143
     5 is          8       2       0.286
     6 that        8       2       0.429
     7 mine        8       2       0.571
     8 you         8       2       0.714
     9 're         8       2       0.857
    10 using       8       2       1    
    # … with 26 more rows
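
    A possible variation, not part of the original answer: if the positions should simply span 0 to 1 over however many rows each group has after separate_rows(), the length can come from dplyr's n(), which returns the current group size as a single number. The sketch below assumes that is the intent and uses a shortened, already-cleaned version of the example data (the name df_small is hypothetical).

    ```r
    library(dplyr)
    library(tidyr)

    # Hypothetical, already-cleaned subset of the example data
    df_small <- data.frame(
      Turn = c("what mhm", "yes it does sheila mm"),
      Turn_no = c(1, 2)
    )

    result <- df_small %>%
      # one row per word
      separate_rows(Turn, sep = " ") %>%
      rename(Word = Turn) %>%
      group_by(Turn_no) %>%
      # n() is the number of rows in the current group (a single value)
      mutate(RelPosition = seq(0, 1, length.out = n())) %>%
      ungroup()

    result
    ```

    With this approach a stored word count such as N_words is not needed for the calculation. A turn with a single word gets position 0, since seq(0, 1, length.out = 1) returns just 0.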
    
