# R: make subfactors based on consecutively occurring values

Hi, does anyone know how to make a subfactor or unique marker for groups of consecutive rows that share the same value or factor level?

So my data can look like this (the `subgrouping` column is what I want to create):

``````
value   group subgrouping
1       a     a.1
5       a     a.1
2       a     a.1
3       b     b.1
2       b     b.1
5       b     b.1
2       b     b.1
1       b     b.1
3       b     b.1
2       a     a.2
5       a     a.2
5       a     a.2
6       a     a.2
6       a     a.2
2       a     a.2
1       a     a.2
0       c     c.1
3       c     c.1
3       c     c.1
2       b     b.2
1       b     b.2
3       a     a.3
2       b     b.3
3       b     b.3
``````

This way I can find, say, the average for a.2 and not for all of a.

I've found this trick to work well in this situation. As written, it numbers the runs globally (0, 1, 2, ...) rather than keeping a separate count for each group, but it might be sufficient:

``````
library(dplyr)
df %>%
  # increment the id whenever `group` differs from the previous row
  mutate(subgroup_id = cumsum(lag(group, default = group[1]) != group))
``````
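If the count does need to restart within each group (a.1, a.2, ... and b.1, b.2, ...), one possible extension is to renumber the global run ids inside each group. This is only a sketch on made-up data; `run_id` and `out` are names chosen here for illustration, not something from the question.

```r
library(dplyr)

# Small made-up example using the column names from the question
df <- data.frame(
  value = c(1, 5, 3, 2, 2, 5, 0, 2, 3),
  group = c("a", "a", "b", "b", "a", "a", "c", "b", "a"),
  stringsAsFactors = FALSE
)

out <- df %>%
  # global run counter: goes up by one every time `group` changes
  mutate(run_id = cumsum(group != lag(group, default = first(group)))) %>%
  group_by(group) %>%
  # within each group, renumber its runs 1, 2, 3, ... in order of appearance
  mutate(subgrouping = paste(group, match(run_id, unique(run_id)), sep = ".")) %>%
  ungroup()

out$subgrouping
# "a.1" "a.1" "b.1" "b.1" "a.2" "a.2" "c.1" "b.2" "a.3"
```

From there, `out %>% group_by(subgrouping) %>% summarise(mean_value = mean(value))` should give one average per run.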

Try `rle`:

``````
x <- rle(as.character(df$group))  # rle() needs an atomic vector, not a factor
x$values <- with(x, ave(values, values, FUN = function(v) paste0(v, '.', seq_along(v))))
df$subgrouping2 <- inverse.rle(x)
df

# > df
#     value group subgrouping subgrouping2
# 1:     1     a         a.1          a.1
# 2:     5     a         a.1          a.1
# 3:     2     a         a.1          a.1
# 4:     3     b         b.1          b.1
# 5:     2     b         b.1          b.1
# 6:     5     b         b.1          b.1
# 7:     2     b         b.1          b.1
# 8:     1     b         b.1          b.1
# 9:     3     b         b.1          b.1
# 10:     2     a         a.2          a.2
# 11:     5     a         a.2          a.2
# 12:     5     a         a.2          a.2
# 13:     6     a         a.2          a.2
# 14:     6     a         a.2          a.2
# 15:     2     a         a.2          a.2
# 16:     1     a         a.2          a.2
# 17:     0     c         c.1          c.1
# 18:     3     c         c.1          c.1
# 19:     3     c         c.1          c.1
# 20:     2     b         b.2          b.2
# 21:     1     b         b.2          b.2
# 22:     3     a         a.3          a.3
# 23:     2     b         b.3          b.3
# 24:     3     b         b.3          b.3
``````
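To get at the original goal (an average for a.2 rather than for all of a), the new column can be passed to `tapply`. The following is a minimal base R sketch on a shortened, made-up version of the data; `runs` and `means` are illustrative names.

```r
# Shortened made-up data with the same column names as the question
df <- data.frame(
  value = c(1, 5, 2, 3, 2, 2, 5),
  group = c("a", "a", "a", "b", "b", "a", "a"),
  stringsAsFactors = FALSE
)

# Label each run of identical `group` values as <group>.<run number>
runs <- rle(df$group)
runs$values <- ave(runs$values, runs$values,
                   FUN = function(v) paste0(v, ".", seq_along(v)))
df$subgrouping <- inverse.rle(runs)

# Mean of `value` within each run
means <- tapply(df$value, df$subgrouping, mean)
means[["a.2"]]                    # 3.5, mean of the second run of a only
mean(df$value[df$group == "a"])   # 3, mean over all rows of a
```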

With `data.table`: grouped by the run-length id of 'group' (`rleid(group)`), get the `first` 'group' value and the number of observations (`.N`). Then, grouped by 'group', `paste` the group value with the sequence number of each of its runs. Finally, `order` by 'ind', replicate each label by its number of observations, and assign the result to create 'subgroup2'.

``````
library(data.table)
sgrp <- setDT(df1)[, .(group = first(group), n = .N),
                   .(ind = rleid(group))][, .(paste(group, seq_len(.N), sep = "."), n, ind),
                   group][order(ind), rep(V1, n)]
df1[, subgroup2 := sgrp]
df1
#    value group subgrouping subgroup2
# 1:     1     a         a.1       a.1
# 2:     5     a         a.1       a.1
# 3:     2     a         a.1       a.1
# 4:     3     b         b.1       b.1
# 5:     2     b         b.1       b.1
# 6:     5     b         b.1       b.1
# 7:     2     b         b.1       b.1
# 8:     1     b         b.1       b.1
# 9:     3     b         b.1       b.1
#10:     2     a         a.2       a.2
#11:     5     a         a.2       a.2
#12:     5     a         a.2       a.2
#13:     6     a         a.2       a.2
#14:     6     a         a.2       a.2
#15:     2     a         a.2       a.2
#16:     1     a         a.2       a.2
#17:     0     c         c.1       c.1
#18:     3     c         c.1       c.1
#19:     3     c         c.1       c.1
#20:     2     b         b.2       b.2
#21:     1     b         b.2       b.2
#22:     3     a         a.3       a.3
#23:     2     b         b.3       b.3
#24:     3     b         b.3       b.3
``````
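A possibly shorter variant of the same idea is to apply `rleid` twice: once over the whole column to number the runs, and once within each group to renumber that group's runs from 1. This is a sketch on made-up data rather than the code from the answer above; `dt`, `run` and `subgroup` are illustrative names.

```r
library(data.table)

# Small made-up example using the column names from the question
dt <- data.table(
  value = c(1, 5, 3, 2, 2, 5, 0, 2, 3),
  group = c("a", "a", "b", "b", "a", "a", "c", "b", "a")
)

# Global run id: changes every time `group` changes
dt[, run := rleid(group)]

# Within each group, rleid(run) restarts at 1 and counts that group's runs
dt[, subgroup := paste(group, rleid(run), sep = "."), by = group]

dt$subgroup
# "a.1" "a.1" "b.1" "b.1" "a.2" "a.2" "c.1" "b.2" "a.3"

# Per-run averages
dt[, .(mean_value = mean(value)), by = subgroup]
```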