Rolling sum from a certain position in a data frame in R

Say I have the following data, dat1;

width  from  by
2      1     A
3      1     A
2      2     A
3      2     A
2      1     B
3      1     B
2      2     B
3      2     B

And additionally have that, dat2;

x      pos   by
4      1     A
5      2     A
7      3     A
3      4     A
2      1     B
4      2     B
3      3     B
5      4     B

Say I want to create a new column on dat1 of rolling sum values from dat2 where;

  1. Our width of this rolling sum is equivalent to the width given in that row

  2. Our starting position is equivalent to the from vector value in that row

  3. We wish to do it for the A or Bth factor depending on which level is in the row

So far I have that we want

rollapply(x = dat2$x, width = dat1$width, FUN = sum, align = "left", data = dat2)

So I need to incorporate in the starting position and the factor level for that starting position.

So in this instance I want to get

width  from  by   RS
2      1     A    9
3      1     A    16
2      2     A    12
3      2     A    15

etc

Any help would be greatly appreciated. Thanks

2 answers

  • answered 2018-03-13 21:43 G. Grothendieck

    1) For each row i in dat1 the anonymous function subsets dat2 to the by value in dat1 and from that picks out the indicated rows of x and sums them:

    transform(dat1, RS = sapply(1:nrow(dat1), function(i) 
     sum(subset(dat2, dat1$by[i] == by)[seq(from[i], length = width[i]), "x"])))
    

    giving:

      width from by RS
    1     2    1  A  9
    2     3    1  A 16
    3     2    2  A 12
    4     3    2  A 15
    5     2    1  B  6
    6     3    1  B  9
    7     2    2  B  7
    8     3    2  B 12
    

    2) An alternative would be to calculate the start values and widths for the sequences to sum in dat2 and then apply that:

    st <- match(dat1$by, dat2$by) + dat1$from - 1
    w <- dat1$width
    Sum <- function(st, w) sum(dat2[seq(st, length = w), "x"])
    transform(dat1, RS = mapply(Sum, st, w))
    

    giving:

      width from by RS
    1     2    1  A  9
    2     3    1  A 16
    3     2    2  A 12
    4     3    2  A 15
    5     2    1  B  6
    6     3    1  B  9
    7     2    2  B  7
    8     3    2  B 12
    

    Note

    dat1 and dat2 in reproducible form are:

    Lines1 <- "
    width  from  by
    2      1     A
    3      1     A
    2      2     A
    3      2     A
    2      1     B
    3      1     B
    2      2     B
    3      2     B"
    dat1 <- read.table(text = Lines1, header = TRUE)
    
    Lines2 <- "
    x      pos   by
    4      1     A
    5      2     A
    7      3     A
    3      4     A
    2      1     B
    4      2     B
    3      3     B
    5      4     B"
    dat2 <- read.table(text = Lines2, header = TRUE)
    

    Update

    Fixed (1). Added (2).

  • answered 2018-03-13 21:59 MKR

    Another option could be using dplyr and join. The approach would be join two dataframes by "by". Then apply filter to consider only those rows which pos is between from and from+width. Finally take sum of x column.

    dat1 %>% inner_join(dat2, by = "by") %>%
      filter(from <= pos & pos < (from + width) ) %>%
      group_by(by, from, width ) %>%
      summarise(RS = sum(x)) %>%
      select(width, from, by, RS)
    
    
    # A tibble: 8 x 4
    # Groups: by, from [4]
    # width  from by       RS
    # <int> <int> <chr> <int>
    # 1     2     1 A         9
    # 2     3     1 A        16
    # 3     2     2 A        12
    # 4     3     2 A        15
    # 5     2     1 B         6
    # 6     3     1 B         9
    # 7     2     2 B         7
    # 8     3     2 B        12
    

    data

    dat1 <- read.table(text = 
    "width  from  by
    2      1     A
    3      1     A
    2      2     A
    3      2     A
    2      1     B
    3      1     B
    2      2     B
    3      2     B", header = TRUE, stringsAsFactors = FALSE)
    
    
    dat2 <- read.table(text = 
    "x      pos   by
    4      1     A
    5      2     A
    7      3     A
    3      4     A
    2      1     B
    4      2     B
    3      3     B
    5      4     B", header = TRUE, stringsAsFactors = FALSE)