R: make subfactors based on consecutively occurring values

Hi, does anyone know how to create a subfactor or unique marker for groups of rows that share the same value or factor consecutively?

So my data can look like this (subgrouping is the marker I'd like to create):

value   group subgrouping
1       a     a.1
5       a     a.1
2       a     a.1
3       b     b.1
2       b     b.1
5       b     b.1
2       b     b.1
1       b     b.1
3       b     b.1
2       a     a.2
5       a     a.2
5       a     a.2
6       a     a.2
6       a     a.2
2       a     a.2
1       a     a.2
0       c     c.1
3       c     c.1
3       c     c.1
2       b     b.2
1       b     b.2
3       a     a.3
2       b     b.3
3       b     b.3

This way I can find, say, the average for a.2 alone rather than for all of a.

3 answers

  • answered 2018-05-16 05:18 Melissa Key

    I've found this trick to work well in this situation. As written, it numbers every run with a single global counter (0, 1, 2, ...) rather than restarting the count within each group (a.1, a.2, ...), but it might be sufficient:

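    library(dplyr)
    df %>% 
      # the comparison is TRUE wherever 'group' differs from the previous row; cumsum() turns that into a run counter
      mutate(subgroup_id = cumsum(lag(group, default = group[1]) != group))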
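
    If labels of the form a.1, a.2 are needed, the global counter can be renumbered within each group. The following is a minimal sketch, assuming 'group' is a character column. The names run_id and labelled and the match()/unique() renumbering step are illustrative additions, not part of the trick as posted:

    ```r
    library(dplyr)

    # example data from the question
    df <- data.frame(
      value = c(1, 5, 2, 3, 2, 5, 2, 1, 3, 2, 5, 5,
                6, 6, 2, 1, 0, 3, 3, 2, 1, 3, 2, 3),
      group = rep(c("a", "b", "a", "c", "b", "a", "b"),
                  times = c(3, 6, 7, 3, 2, 1, 2)),
      stringsAsFactors = FALSE
    )

    labelled <- df %>%
      # global run counter: increases by 1 whenever 'group' changes
      mutate(run_id = cumsum(lag(group, default = group[1]) != group)) %>%
      group_by(group) %>%
      # within each group, renumber its runs 1, 2, 3, ... in order of appearance
      mutate(subgrouping = paste0(group, ".", match(run_id, unique(run_id)))) %>%
      ungroup()

    # mean of 'value' per consecutive run (a.2 is 27/7, about 3.86)
    labelled %>%
      group_by(subgrouping) %>%
      summarise(mean_value = mean(value))
    ```

    Grouping on run_id directly is enough if only the per-run summaries matter and the readable labels do not.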
    

  • answered 2018-05-16 05:22 mt1022

    Try rle:

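    # run-length encode 'group': one entry per stretch of consecutive identical values
    x <- rle(df$group)
    # number the runs within each distinct value: a.1, b.1, a.2, ...
    x$values <- with(x, ave(values, values, FUN = function(x) paste0(x, '.', seq_along(x))))
    # expand back to one label per row
    df$subgrouping2 <- inverse.rle(x)
    df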
    
    # > df
    #     value group subgrouping subgrouping2
    # 1:     1     a         a.1          a.1
    # 2:     5     a         a.1          a.1
    # 3:     2     a         a.1          a.1
    # 4:     3     b         b.1          b.1
    # 5:     2     b         b.1          b.1
    # 6:     5     b         b.1          b.1
    # 7:     2     b         b.1          b.1
    # 8:     1     b         b.1          b.1
    # 9:     3     b         b.1          b.1
    # 10:     2     a         a.2          a.2
    # 11:     5     a         a.2          a.2
    # 12:     5     a         a.2          a.2
    # 13:     6     a         a.2          a.2
    # 14:     6     a         a.2          a.2
    # 15:     2     a         a.2          a.2
    # 16:     1     a         a.2          a.2
    # 17:     0     c         c.1          c.1
    # 18:     3     c         c.1          c.1
    # 19:     3     c         c.1          c.1
    # 20:     2     b         b.2          b.2
    # 21:     1     b         b.2          b.2
    # 22:     3     a         a.3          a.3
    # 23:     2     b         b.3          b.3
    # 24:     3     b         b.3          b.3
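
    To get from the labels to the per-run average asked for in the question, something like the sketch below may help. It rebuilds the example data and repeats the rle steps so that it runs on its own. The names runs and run_means are illustrative, and the as.character() call is an added guard, since rle() only accepts atomic vectors and stops with an error on a factor:

    ```r
    # example data from the question
    df <- data.frame(
      value = c(1, 5, 2, 3, 2, 5, 2, 1, 3, 2, 5, 5,
                6, 6, 2, 1, 0, 3, 3, 2, 1, 3, 2, 3),
      group = rep(c("a", "b", "a", "c", "b", "a", "b"),
                  times = c(3, 6, 7, 3, 2, 1, 2)),
      stringsAsFactors = FALSE
    )

    # convert first in case 'group' was read in as a factor
    runs <- rle(as.character(df$group))
    # label each run with its group and its sequence number within that group
    runs$values <- ave(runs$values, runs$values,
                       FUN = function(v) paste0(v, ".", seq_along(v)))
    df$subgrouping <- inverse.rle(runs)

    # average of 'value' within each run, e.g. a.2 alone rather than all of 'a'
    run_means <- tapply(df$value, df$subgrouping, mean)
    run_means["a.2"]  # 27/7, about 3.86
    ```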
    

  • answered 2018-05-16 05:41 akrun

    With data.table, group by the run-length id of 'group' (rleid(group)) and get the first 'group' value and the number of observations (.N). Then, grouped by 'group', paste the sequence number of each run onto 'group' and, after ordering by 'ind', replicate each label by its number of observations. Assign the result to create 'subgroup2'.

    library(data.table)
    sgrp <- setDT(df1)[, .(group = first(group), n = .N), 
      .(ind = rleid(group))][, .(paste(group, seq_len(.N), sep="."), n, ind), 
           group][order(ind), rep(V1, n)]
    df1[, subgroup2 := sgrp]
    df1
    #    value group subgrouping subgroup2
    # 1:     1     a         a.1       a.1
    # 2:     5     a         a.1       a.1
    # 3:     2     a         a.1       a.1
    # 4:     3     b         b.1       b.1
    # 5:     2     b         b.1       b.1
    # 6:     5     b         b.1       b.1
    # 7:     2     b         b.1       b.1
    # 8:     1     b         b.1       b.1
    # 9:     3     b         b.1       b.1
    #10:     2     a         a.2       a.2
    #11:     5     a         a.2       a.2
    #12:     5     a         a.2       a.2
    #13:     6     a         a.2       a.2
    #14:     6     a         a.2       a.2
    #15:     2     a         a.2       a.2
    #16:     1     a         a.2       a.2
    #17:     0     c         c.1       c.1
    #18:     3     c         c.1       c.1
    #19:     3     c         c.1       c.1
    #20:     2     b         b.2       b.2
    #21:     1     b         b.2       b.2
    #22:     3     a         a.3       a.3
    #23:     2     b         b.3       b.3
    #24:     3     b         b.3       b.3
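
    A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: keep the run id as a temporary column and renumber it within each group. The helper column run and the match()/unique() step are assumptions introduced here for illustration; rleid() is the same data.table function used above.

    ```r
    library(data.table)

    # example data from the question
    df1 <- data.table(
      value = c(1, 5, 2, 3, 2, 5, 2, 1, 3, 2, 5, 5,
                6, 6, 2, 1, 0, 3, 3, 2, 1, 3, 2, 3),
      group = rep(c("a", "b", "a", "c", "b", "a", "b"),
                  times = c(3, 6, 7, 3, 2, 1, 2))
    )

    # one id per stretch of consecutive identical 'group' values
    df1[, run := rleid(group)]
    # within each 'group', number its runs 1, 2, 3, ... in order of appearance
    df1[, subgroup2 := paste(group, match(run, unique(run)), sep = "."), by = group]
    # drop the helper column
    df1[, run := NULL]

    # mean of 'value' per run
    df1[, .(mean_value = mean(value)), by = subgroup2]
    ```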