Rolling sum from a certain position in a data frame in R
Say I have the following data, dat1;
width from by
2 1 A
3 1 A
2 2 A
3 2 A
2 1 B
3 1 B
2 2 B
3 2 B
And additionally have that, dat2;
x pos by
4 1 A
5 2 A
7 3 A
3 4 A
2 1 B
4 2 B
3 3 B
5 4 B
Say I want to create a new column on dat1 of rolling sum values from dat2 where;
Our width of this rolling sum is equivalent to the width given in that row
Our starting position is equivalent to the from vector value in that row
We wish to do it for the A or Bth factor depending on which level is in the row
So far I have that we want
rollapply(x = dat2$x, width = dat1$width, FUN = sum, align = "left", data = dat2)
So I need to incorporate in the starting position and the factor level for that starting position.
So in this instance I want to get
width from by RS
2 1 A 9
3 1 A 16
2 2 A 12
3 2 A 15
etc
Any help would be greatly appreciated. Thanks
2 answers

1) For each row
i
indat1
the anonymous function subsets dat2 to theby
value indat1
and from that picks out the indicated rows ofx
and sums them:transform(dat1, RS = sapply(1:nrow(dat1), function(i) sum(subset(dat2, dat1$by[i] == by)[seq(from[i], length = width[i]), "x"])))
giving:
width from by RS 1 2 1 A 9 2 3 1 A 16 3 2 2 A 12 4 3 2 A 15 5 2 1 B 6 6 3 1 B 9 7 2 2 B 7 8 3 2 B 12
2) An alternative would be to calculate the start values and widths for the sequences to sum in
dat2
and then apply that:st < match(dat1$by, dat2$by) + dat1$from  1 w < dat1$width Sum < function(st, w) sum(dat2[seq(st, length = w), "x"]) transform(dat1, RS = mapply(Sum, st, w))
giving:
width from by RS 1 2 1 A 9 2 3 1 A 16 3 2 2 A 12 4 3 2 A 15 5 2 1 B 6 6 3 1 B 9 7 2 2 B 7 8 3 2 B 12
Note
dat1
anddat2
in reproducible form are:Lines1 < " width from by 2 1 A 3 1 A 2 2 A 3 2 A 2 1 B 3 1 B 2 2 B 3 2 B" dat1 < read.table(text = Lines1, header = TRUE) Lines2 < " x pos by 4 1 A 5 2 A 7 3 A 3 4 A 2 1 B 4 2 B 3 3 B 5 4 B" dat2 < read.table(text = Lines2, header = TRUE)
Update
Fixed (1). Added (2).

Another option could be using
dplyr
andjoin
. The approach would bejoin
two dataframes by "by". Then applyfilter
to consider only those rows whichpos
is betweenfrom
andfrom+width
. Finally take sum ofx
column.dat1 %>% inner_join(dat2, by = "by") %>% filter(from <= pos & pos < (from + width) ) %>% group_by(by, from, width ) %>% summarise(RS = sum(x)) %>% select(width, from, by, RS) # A tibble: 8 x 4 # Groups: by, from [4] # width from by RS # <int> <int> <chr> <int> # 1 2 1 A 9 # 2 3 1 A 16 # 3 2 2 A 12 # 4 3 2 A 15 # 5 2 1 B 6 # 6 3 1 B 9 # 7 2 2 B 7 # 8 3 2 B 12
data
dat1 < read.table(text = "width from by 2 1 A 3 1 A 2 2 A 3 2 A 2 1 B 3 1 B 2 2 B 3 2 B", header = TRUE, stringsAsFactors = FALSE) dat2 < read.table(text = "x pos by 4 1 A 5 2 A 7 3 A 3 4 A 2 1 B 4 2 B 3 3 B 5 4 B", header = TRUE, stringsAsFactors = FALSE)