R How can I substract date time rows from each other dependent on Participant ID?

Starting with disclaimers here: I am completely new to R and have already read other questions & answers here concerning similar topics but can't seem to find one that helps my problem, so please forgive me if I just wasn't able identify a correct and applicable answer that may already exists, due to my inexperience. Another warning before you continure reading: I have tried my best to create a minimal reproducible example and have failed thus far, so now I am hoping my question might also be answered without it - but I obviously understand if this basis is too cumbersome for anyone to continue with. (However, I have deposited a minimal version of my csv data table with just two participants and their time date data, in case anyone would like to get an impression of the data: Link to my dropbox containing a small version of the csv data table).

My main problem here is that I would like to substract date time data in one column from each other, in order to fill another row with data called t0-t28, but dependent on/ nested for Participant ID.

Background: I've got a data set ("workdata") with about 1700 columns and 380 rows, results from a repeated measures psychology study. For the study, participants filled out five different questionnaires: Onboarding (at t0), Daily_A (at t1-t6 and t8-t13), Day_7 (at t7), Day14 (at t14) and Day28 (at t28, follow-up). Each participant has a unique identifier, their Participant ID. Each Participant ID may appear up to 16 times in the data set, if the regarding participant has filled out all questionnaires mentioned above. (Which is not always the case, since the adherence of some participants was as low as 20%).

So far: A friend and I have been able to tell R to identify the date time values as such (referring to the SCHEDULED_TS column), using the lubridate package:

date_sorted <- dmy_hm(workdata$SCHEDULED_TS)
workdata <- cbind(workdata,date_sorted)

Also, we have managed to sort all rows by participant ID and date time, so now all repeated measures per participant are displayed as the rows underneath each other in date time order.

Data sorted by date time and participant ID, but without column displaying t0-t28 per participant

Then, we wanted to create a new column that displays t0-t28 for each participants data row. This is needed because not all participants have started their study participation on the same day – so March 14 might be t0 for one participant but t9 for another. I would like to be able to compare t1 for all participants regardless of whether t1 was March 5th or March 15th for any participant.

We thought the easiest way to creating and filling this t0-t28 column would be by identifying t0 (“Onboarding” time) per participant and then substracting its date time from the date times of all other measures in order to receive t1, t2, t3 and so on. We have tried this by

t <- length(workdata$date_sorted)
t <- ifelse(workdata$SURVEY_NAME == "Onboarding", "t0", NA)
workdata <- cbind(workdata, t)

and several strategies that I accidentaly deleted, so I can't display them here anymore - I am really sorry my first post here is so messy.

All we did receive though is that R took the first “Onboarding” date time in the entire data set and subtracted it from all the other date times, regardless of the corresponding Participant ID.

Does anyone here have a tip on how to solve this and create t0-t28 nested per participant? Thank you very much for your kind help.

1 answer

  • answered 2022-05-04 11:45 Aron

    Here's how you can do it using the dplyr package. The steps are:

    1. Calculate date_sorted (keeping only the date part)
    2. Group the rows by PARTICIPANT_ID
    3. Then within each group, subtract the minimum date from the current date and prefix it with 't'.
    4. Ungroup the rows again

    Note that Day28 is on t27 if starting from 0.

    library(dplyr)
    library(lubridate)
    
    df <- workdata %>%
      mutate(date_sorted = date(dmy_hm(SCHEDULED_TS))) %>%
      group_by(PARTICIPANT_ID) %>%
      mutate(t = paste0('t', date_sorted - min(date_sorted))) %>%
      ungroup()
      
    df
    #> # A tibble: 32 x 9
    #>    Filter.1 source                     PARTICIPANT_ID Groups SURVEY_NAME START_END SCHEDULED_TS    date_sorted t    
    #>       <dbl> <chr>                      <chr>          <chr>  <chr>           <dbl> <chr>           <date>      <chr>
    #>  1        1 Onboarding Export 26042022 s014620782     Exp    Onboarding          1 17-Mar-22 17:29 2022-03-17  t0   
    #>  2        1 Daily_A Export 26042022    s014620782     Exp    Daily_A             1 18-Mar-22 18:48 2022-03-18  t1   
    #>  3        1 Daily_A Export 26042022    s014620782     Exp    Daily_A             1 19-Mar-22 19:29 2022-03-19  t2   
    #>  4        1 Daily_A Export 26042022    s014620782     Exp    Daily_A             1 20-Mar-22 19:17 2022-03-20  t3   
    #>  5        1 Daily_A Export 26042022    s014620782     Exp    Daily_A             1 21-Mar-22 18:43 2022-03-21  t4   
    #>  6        1 Daily_A Export 26042022    s014620782     Exp    Daily_A             1 22-Mar-22 18:39 2022-03-22  t5   
    #>  7        1 Daily_A Export 26042022    s014620782     Exp    Daily_A             1 23-Mar-22 19:16 2022-03-23  t6   
    #>  8        1 Day_7 Export 26042022      s014620782     Exp    Day_7               1 24-Mar-22 18:44 2022-03-24  t7   
    #>  9        1 Daily_A Export 26042022    s014620782     Exp    Daily_A             1 25-Mar-22 18:46 2022-03-25  t8   
    #> 10        1 Daily_A Export 26042022    s014620782     Exp    Daily_A             1 26-Mar-22 18:44 2022-03-26  t9   
    #> # … with 22 more rows
    

How many English words
do you know?
Test your English vocabulary size, and measure
how many words do you know
Online Test
Powered by Examplum