Check if two values within consecutive dates are identical

Let's say I have a tibble like

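df <- tribble(
  ~date,       ~place, ~wthr,
  "2017-05-06","NY","sun",
  "2017-05-06","CA","cloud",
  "2017-05-07","NY","sun",
  "2017-05-07","CA","rain",
  "2017-05-08","NY","cloud",
  "2017-05-08","CA","rain",
  "2017-05-09","NY","cloud",
  "2017-05-09","CA", NA,
  "2017-05-10","NY","cloud",
  "2017-05-10","CA","rain"
)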

I want to check whether the weather at a given place on a given day was the same as on the previous day, and attach the result to df as a boolean column, so that the result looks like this:

  ~date,       ~place, ~wthr, ~same,
  "2017-05-06","NY","sun",    NA,
  "2017-05-06","CA","cloud",  NA, 
  "2017-05-07","NY","sun",    TRUE,
  "2017-05-07","CA","rain",   FALSE,
  "2017-05-08","NY","cloud",  FALSE,
  "2017-05-08","CA","rain",   TRUE,
  "2017-05-09","NY","cloud",  TRUE,
  "2017-05-09","CA", NA,      NA,
  "2017-05-10","NY","cloud",  TRUE,
  "2017-05-10","CA","rain",   NA

Is there a good way to do this?

1 answer

  • answered 2020-11-23 18:23 Ben

    To get a logical column, compare each wthr value with the value in the previous row using lag, after grouping by place. I added arrange on date to make sure the rows are in chronological order.

    df %>%
      arrange(date) %>%
      group_by(place) %>%
      mutate(same = wthr == lag(wthr, default = NA))
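
    To make the grouped lag easier to follow, here is a minimal sketch on a small made-up tibble. The names toy, prev and res are illustrative and not part of the original question. It assumes dplyr is installed and stores the lagged value in its own column so you can see what each row is compared against.

    ```r
    library(dplyr)

    # Hypothetical toy data (not the original df): two places, three days each.
    toy <- tibble(
      date  = rep(c("2017-05-06", "2017-05-07", "2017-05-08"), each = 2),
      place = rep(c("NY", "CA"), times = 3),
      wthr  = c("sun", "cloud", "sun", "rain", "cloud", "rain")
    )

    res <- toy %>%
      arrange(date) %>%
      group_by(place) %>%
      mutate(
        prev = lag(wthr),    # previous row's weather within the same place
        same = wthr == prev  # NA whenever either side of the comparison is NA
      ) %>%
      ungroup()

    res
    ```

    The first row of each place has no previous row, so prev is NA there and the comparison yields NA rather than FALSE.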

    Edit: If you want to make sure the dates are consecutive (1 day apart), you can wrap the comparison in an ifelse that checks whether the difference between date and lag(date) is 1. If two rows are not 1 day apart, the result is coded as NA.

    Note: Also make sure your date column is a Date:

    df$date <- as.Date(df$date)
    df %>%
      arrange(date) %>%
      group_by(place) %>%
      mutate(same = ifelse(
        date - lag(date) == 1,
        wthr == lag(wthr, default = NA),
        NA
      ))

       date       place wthr  same 
       <chr>      <chr> <chr> <lgl>
     1 2017-05-06 NY    sun   NA   
     2 2017-05-06 CA    cloud NA   
     3 2017-05-07 NY    sun   TRUE 
     4 2017-05-07 CA    rain  FALSE
     5 2017-05-08 NY    cloud FALSE
     6 2017-05-08 CA    rain  TRUE 
     7 2017-05-09 NY    cloud TRUE 
     8 2017-05-09 CA    NA    NA   
     9 2017-05-10 NY    cloud TRUE 
    10 2017-05-10 CA    rain  NA
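
    The sample data has no missing days, so the consecutive-date check does not change the result here. As a rough sketch of what it does when a day is missing, the following uses a made-up tibble (gap and res_gap are hypothetical names) in which 2017-05-08 is absent. The as.numeric call is an optional addition of mine to make the comparison with 1 explicit.

    ```r
    library(dplyr)

    # Hypothetical data with a gap: there is no row for 2017-05-08.
    gap <- tibble(
      date  = as.Date(c("2017-05-06", "2017-05-07", "2017-05-09")),
      place = "NY",
      wthr  = c("sun", "sun", "sun")
    )

    res_gap <- gap %>%
      arrange(date) %>%
      group_by(place) %>%
      mutate(same = ifelse(
        as.numeric(date - lag(date)) == 1,  # TRUE only if the previous row is exactly one day earlier
        wthr == lag(wthr),
        NA                                  # not consecutive: leave the comparison undefined
      )) %>%
      ungroup()

    res_gap
    ```

    Even though the weather is "sun" on every row, the last row gets NA because its previous row is two days earlier, not one.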