How to copy ("carry forward") values from one row and column ("cell") to a different row and column with R data.table?

I have an R data.table. I'd like to copy or carry-forward the value from one column and row to a different column and row based on a key. I've done this with merge and rbindlist, but I'm wondering if there is a more parsimonious solution.

Here's a toy example: Let's say I'm tracking opening and closing balances for different checking accounts on different dates. I want to carry-forward the previous day's (t-1) closing balance to current day's (t) opening balance.

Have this input:

> DT_in <- data.table(date = c("2017-12-29", "2017-12-29", "2017-12-29", "2018-01-02", 
+                         "2018-01-02", "2018-01-02", "2018-01-02", "2018-01-03", 
+                         "2018-01-03", "2018-01-03", "2018-01-03"), 
+                  account_id = c("A17", "A23", "B21", "A17", "A23", "B21", "C12", 
+                                 "A17", "A23", "B21", "C12"),
+                  opening_balance = NA,
+                  ending_balance = c(224, 254, 240, 290, 107, 272, 105, 256, 215, 
+                                     202, 238)
+                  )
> DT_in
          date account_id opening_balance ending_balance
 1: 2017-12-29        A17              NA            224
 2: 2017-12-29        A23              NA            254
 3: 2017-12-29        B21              NA            240
 4: 2018-01-02        A17              NA            290
 5: 2018-01-02        A23              NA            107
 6: 2018-01-02        B21              NA            272
 7: 2018-01-02        C12              NA            105
 8: 2018-01-03        A17              NA            256
 9: 2018-01-03        A23              NA            215
10: 2018-01-03        B21              NA            202
11: 2018-01-03        C12              NA            238

Want this output:

> DT_out <- data.table(date = c("2017-12-29", "2017-12-29", "2017-12-29", "2018-01-02", 
+                           "2018-01-02", "2018-01-02", "2018-01-02", "2018-01-03", 
+                           "2018-01-03", "2018-01-03", "2018-01-03"), 
+                  account_id = c("A17", "A23", "B21", "A17", "A23", "B21", "C12", 
+                                 "A17", "A23", "B21", "C12"),
+                  opening_balance = c(NA, NA, NA, 224, 254, 240, NA, 290, 107, 272, 105), 
+                  ending_balance = c(224, 254, 240, 290, 107, 272, 105, 256, 215, 
+                                     202, 238)
+                  )
> DT_out
          date account_id opening_balance ending_balance
 1: 2017-12-29        A17              NA            224
 2: 2017-12-29        A23              NA            254
 3: 2017-12-29        B21              NA            240
 4: 2018-01-02        A17             224            290
 5: 2018-01-02        A23             254            107
 6: 2018-01-02        B21             240            272
 7: 2018-01-02        C12              NA            105
 8: 2018-01-03        A17             290            256
 9: 2018-01-03        A23             107            215
10: 2018-01-03        B21             272            202
11: 2018-01-03        C12             105            238

Note that account_ids are not necessarily persistent from one date to the next (new ones can added or old ones removed). While it should not matter, please also note that the dates are business dates and not calendar dates.

1 answer

  • answered 2018-10-11 20:26 R Yoda

    If you fix your example data to use the same data type of "opening_balance" and "ending_balance" columns via

    ...
    opening_balance = NA_real_,
    ...
    

    you can use this code to carry the last observation forward per group (after applying the right order/sorting of the rows!):

    setorder(DT_in, account_id, date)
    DT_in[, opening_balance := shift(ending_balance, 1), by = .(account_id)]
    

    This results in (sorted output):

    > DT_in
              date account_id opening_balance ending_balance
     1: 2017-12-29        A17              NA            224
     2: 2018-01-02        A17             224            290
     3: 2018-01-03        A17             290            256
     4: 2017-12-29        A23              NA            254
     5: 2018-01-02        A23             254            107
     6: 2018-01-03        A23             107            215
     7: 2017-12-29        B21              NA            240
     8: 2018-01-02        B21             240            272
     9: 2018-01-03        B21             272            202
    10: 2018-01-02        C12              NA            105
    11: 2018-01-03        C12             105            238