Creating columns for each observed value of a variable
Suppose a respondent (id) is asked to make a choice in five tasks (t=1,2,3,4,5) (a panel dataset with five observations per respondent). Once, a choice is made then an outcome is shown to the respondent. Suppose the data look like below.
++++
 id  t  outcome 
++++
 1  1  10 
 1  2  20 
 1  3  30 
 1  4  40 
 1  5  40 
 2  1  20 
 2  2  30 
 2  3  40 
 2  4  40 
 2  5  20 
 .  .  . 
 .  .  . 
 .  .  . 
++++
Now, I am interested to keep the history of the outcome variable for each t1 task. I am aiming for the below output.
+++++++++
 id  t  outcome  outcome_t1  outcome_t2  outcome_t3  outcome_t4  outcome_t5 
+++++++++
 1  1  10  NA  NA  NA  NA  NA 
 1  2  20  10  NA  NA  NA  NA 
 1  3  30  10  20  NA  NA  NA 
 1  4  40  10  20  30  NA  NA 
 1  5  40  10  20  30  40  NA 
 2  1  20  NA  NA  NA  NA  NA 
 2  2  30  20  NA  NA  NA  NA 
 2  3  40  20  30  NA  NA  NA 
 2  4  40  20  30  40  NA  NA 
 2  5  20  20  30  40  40  NA 
 .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  . 
 .  .  .  .  .  .  .  . 
+++++++++
I went through most of the questions on this forum, but most of them address the lagged columns which are not applicable to this case.
Perhaps there could be an easy and efficient way using mutate
with dplyr
but I am unable to make it work so far.
4 answers

We can use
data.table
methods for this. Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'id', loop through the 'outcome',rep
licate the elements with the specifying the sequence from1:.N
and.N:1
with NA as padding, then join with the original dataset on the 'id' and 't' columnslibrary(data.table) df2 < setDT(df1)[, Map(function(x, y, z) rep(c(NA, x), c(y, z)), outcome, 1:.N, .N:1), id][, t := rowid(id)] out < df2[df1, on = .(id, t)] setcolorder(out, c(1, 7, 8, 2:6)) setnames(out, 4:ncol(out), paste0("outcome_t", 1:5)) out # id t outcome outcome_t1 outcome_t2 outcome_t3 outcome_t4 outcome_t5 # 1: 1 1 10 NA NA NA NA NA # 2: 1 2 20 10 NA NA NA NA # 3: 1 3 30 10 20 NA NA NA # 4: 1 4 40 10 20 30 NA NA # 5: 1 5 40 10 20 30 40 NA # 6: 2 1 20 NA NA NA NA NA # 7: 2 2 30 20 NA NA NA NA # 8: 2 3 40 20 30 NA NA NA # 9: 2 4 40 20 30 40 NA NA #10: 2 5 20 20 30 40 40 NA
Or an option with
dcast
dcast(setDT(df1), id + t ~ paste0("outcome_t", t), value.var = 'outcome')[, na.locf(.SD, na.rm = FALSE), id]
Or we can do this more compactly
library(zoo) nm1 < paste0("outcome_t", 1:5) df1[nm1] < do.call(rbind, lapply(split(df1$outcome, df1$id), function(x) head(rbind(NA, na.locf((NA^!diag(x)) * x)), 1)))
Or using
colCumsums
library(matrixStats) df1[nm1] < do.call(rbind, lapply(split(df1$outcome, df1$id), function(x) colCumsums(rbind(0, diag(x)))[length(x), ]))
data
df1 < structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), t = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), outcome = c(10L, 20L, 30L, 40L, 40L, 20L, 30L, 40L, 40L, 20L)), class = "data.frame", row.names = c(NA, 10L))

Base R approach, we can
split
theoutcome
column based onid
and create a dataframe incrementally adding one value at a time inoutcome
variable and filling rest of them withNA
and finallyrbind
these list of dataframes into one dataframe.n < 5 df[paste0("outcome_t", seq_len(n))] < do.call(rbind, lapply(split(df$outcome, df$id), function(x) t(sapply(seq_along(x), function(y) c(x[seq_len(y  1)], rep(NA, n  (y  1))))))) df # id t outcome outcome_t1 outcome_t2 outcome_t3 outcome_t4 outcome_t5 #1 1 1 10 NA NA NA NA NA #2 1 2 20 10 NA NA NA NA #3 1 3 30 10 20 NA NA NA #4 1 4 40 10 20 30 NA NA #5 1 5 40 10 20 30 40 NA #6 2 1 20 NA NA NA NA NA #7 2 2 30 20 NA NA NA NA #8 2 3 40 20 30 NA NA NA #9 2 4 40 20 30 40 NA NA #10 2 5 20 20 30 40 40 NA
A
tidyverse
option usingseparate
library(tidyverse) df %>% group_by(id) %>% mutate(new = map_chr(seq_along(outcome), ~paste0(outcome[seq_len(.  1)], collapse = ","))) %>% separate(new, into = paste0("outcome_t", seq_len(n)), sep = ",", fill = "right") %>% mutate(outcome_t1 = replace(outcome_t1, outcome_t1 == "", NA))
data
df < data.frame(id = rep(c(1, 2), each = 5), t = 1:5, outcome = c(10, 20, 30, 40, 40, 20, 30, 40, 40, 20))

Another data.table approach using
transpose
:DT[, paste0("outcome_t", 1:5) := transpose(lapply(t, function(x) replace(outcome, t>=x, NA))), by=.(id)]
output:
id t outcome outcome_t1 outcome_t2 outcome_t3 outcome_t4 outcome_t5 1: 1 1 10 NA NA NA NA NA 2: 1 2 20 10 NA NA NA NA 3: 1 3 30 10 20 NA NA NA 4: 1 4 40 10 20 30 NA NA 5: 1 5 40 10 20 30 40 NA 6: 2 1 20 NA NA NA NA NA 7: 2 2 30 20 NA NA NA NA 8: 2 3 40 20 30 NA NA NA 9: 2 4 40 20 30 40 NA NA 10: 2 5 20 20 30 40 40 NA
data:
library(data.table) DT < fread(" id  t  outcome   1  1  10   1  2  20   1  3  30   1  4  40   1  5  40   2  1  20   2  2  30   2  3  40   2  4  40   2  5  20 ")[, c(1,5)]

Here is a
tidyverse
approach.library(tidyverse) df %>% mutate(rn = 1:n(), t = paste0("outcome_t", t)) %>% group_by(id) %>% spread(t, outcome) %>% mutate_at(vars(rn, id), lag) %>% fill(rn, id) # A tibble: 10 x 7 # Groups: id [2] id rn outcome_t1 outcome_t2 outcome_t3 outcome_t4 outcome_t5 <int> <int> <int> <int> <int> <int> <int> 1 1 1 NA NA NA NA NA 2 1 2 10 NA NA NA NA 3 1 3 10 20 NA NA NA 4 1 4 10 20 30 NA NA 5 1 5 10 20 30 40 NA 6 2 6 NA NA NA NA NA 7 2 7 20 NA NA NA NA 8 2 8 20 30 NA NA NA 9 2 9 20 30 40 NA NA 10 2 10 20 30 40 40 NA