Find Consecutive Occurrences within a Group (of IDs)

My data set looks like this:

ID start.date  end.date    program
1  2016.05.05  2017.05.05  A
1  2017.05.06  2019.06.16  A
2  2012.06.05  2013.06.18  B
3  2014.09.09  2017.07.01  B
3  2017.09.09  2018.09.09  B

I want to identify the people who were present in a program (character variable) consecutively, and then calculate the time between each end.date and start.date (if the occurrence was consecutive).

So the resulting data should look like this:

ID start.date  end.date    program  days
1  2016.05.05  2017.05.05  A         NA
1  2017.05.06  2019.06.16  A .       1   
2  2012.06.05  2013.06.18  B .       NA
3  2014.09.09  2017.07.01  B .       NA
3  2017.09.09  2018.09.09  B .       63

Don't know how to start on this!

1 answer

  • answered 2018-10-11 19:32 Jon Spring

    library(dplyr)
    dat %>%
      group_by(ID, program) %>%
      arrange(start.date) %>%  # Added in case the data isn't sorted
      mutate(days = start.date - lag(end.date))
    

    I get slightly different results, though:

    # A tibble: 5 x 5
    # Groups:   ID, program [3]
         ID start.date end.date   program days  
      <int> <date>     <date>     <chr>   <time>
    1     1 2016-05-05 2017-05-05 A       NA    
    2     1 2017-05-06 2019-06-16 A       1     
    3     2 2012-06-05 2013-06-18 B       NA    
    4     3 2014-09-09 2017-07-01 B       NA    
    5     3 2017-09-09 2018-09-09 B       70 
    

    To bring the data in, I converted to dates:

    dat <- read.table(header = T, stringsAsFactors = F, 
    text = "ID start.date  end.date    program
            1  2016.05.05  2017.05.05  A
            1  2017.05.06  2019.06.16  A
            2  2012.06.05  2013.06.18  B
            3  2014.09.09  2017.07.01  B
            3  2017.09.09  2018.09.09  B") %>%
      mutate_at(vars(matches("date")), lubridate::ymd)