Use group size (`group_size`) in `summarise` in `dplyr`

I want to use the size of a group as part of a groupwise operation in dplyr::summarise.

For example, calculate the proportion of manual-transmission cars for each number of cylinders, by grouping `mtcars` by `cyl` and dividing the number of manuals by the size of the group:

mtcars %>%
  group_by(cyl) %>%
  summarise(zz = sum(am)/group_size(.))

But this returns the following error. I think it is because `group_size()` expects a grouped `tbl_df` and `.` is ungrouped:

Error in mutate_impl(.data, dots) : basic_string::resize

Is there a way to do this?

2 answers

  • answered 2018-05-16 04:32 Ronak Shah

    You can use n(), which gives the number of rows in the current group:

    library(dplyr)
    mtcars %>%
      group_by(cyl) %>%
      summarise(zz = sum(am)/n())
    
    #    cyl    zz
    #  <dbl> <dbl>
    #1  4.00 0.727
    #2  6.00 0.429
    #3  8.00 0.143
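
    A small check of the difference between the two functions, assuming a current dplyr installation. Applied to the whole grouped table, `group_size()` returns one size per group. `n()` inside `summarise()` is evaluated once for each group. The object names `by_cyl`, `sizes`, and `counts` are only illustrative.

    ```r
    library(dplyr)

    by_cyl <- group_by(mtcars, cyl)

    # group_size() on the grouped table: one entry per group (cyl 4, 6, 8),
    # i.e. 11, 7, 14, not the size of "the current" group
    sizes <- group_size(by_cyl)
    print(sizes)

    # n() inside summarise() is evaluated separately for each group
    counts <- summarise(by_cyl, n = n())
    print(counts)
    ```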
    

  • answered 2018-05-16 06:40 akrun

    It is just a grouped mean. Because am is coded 0/1, the mean of am within each group is the proportion of manuals:

    mtcars %>%
        group_by(cyl) %>% 
        summarise(zz = mean(am))
    # A tibble: 3 x 2
    #    cyl    zz
    #  <dbl> <dbl>
    #1     4 0.727
    #2     6 0.429
    #3     8 0.143
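
    A sketch confirming that the two formulations agree, relying on `am` being coded 0 = automatic, 1 = manual in `mtcars`. The column names `via_mean` and `via_ratio` are hypothetical.

    ```r
    library(dplyr)

    # am is 0 (automatic) or 1 (manual), so within a group
    # mean(am) = sum(am) / n() = (number of manuals) / (group size)
    props <- mtcars %>%
      group_by(cyl) %>%
      summarise(
        via_mean  = mean(am),
        via_ratio = sum(am) / n()
      )
    print(props)
    ```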
    

    If we specifically need group_size, we can nest each group's rows into a list column and apply it to each nested data frame:

    library(tidyverse)
    mtcars %>%
       group_by(cyl) %>% 
       nest %>%
       mutate(zz = map_dbl(data, ~ sum(.x$am)/group_size(.x))) %>%
       arrange(cyl) %>%
       select(-data)
    # A tibble: 3 x 2
    #    cyl    zz
    #  <dbl> <dbl>
    #1     4 0.727
    #2     6 0.429
    #3     8 0.143
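
    Why `group_size(.x)` gives the right denominator here: each element of the nested `data` column is an ordinary, ungrouped tibble, and `group_size()` treats an ungrouped data frame as a single group, so it agrees with `nrow()`. A minimal check, assuming recent dplyr, tidyr, and purrr versions. The names `nested`, `sizes`, and `rows` are illustrative.

    ```r
    library(dplyr)
    library(tidyr)
    library(purrr)

    # nest() stores each cyl group's rows as a plain (ungrouped) tibble
    nested <- mtcars %>%
      group_by(cyl) %>%
      nest()

    # On an ungrouped data frame the whole frame is one group,
    # so group_size() returns the same count as nrow()
    sizes <- map_int(nested$data, group_size)
    rows  <- map_int(nested$data, nrow)
    print(sizes)
    ```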
    

    Or using do(), where . is the data for the current group:

    mtcars %>%
        group_by(cyl) %>% 
        do(data.frame(zz = sum(.$am)/group_size(.)))
    # A tibble: 3 x 2
    # Groups:   cyl [3]
    #    cyl    zz
    #  <dbl> <dbl>
    #1     4 0.727
    #2     6 0.429
    #3     8 0.143
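
    The dplyr documentation now marks `do()` as superseded. A similar sketch with `group_modify()`, assuming dplyr 0.8.0 or later; this is an alternative tool, not what the answer above used. `group_modify()` passes each group's rows to the function as `.x` (without the grouping column), so `group_size(.x)` is that group's row count. The name `res` is illustrative.

    ```r
    library(dplyr)

    # .x holds one group's rows; the function must return a data frame,
    # which group_modify() binds back together with the cyl key
    res <- mtcars %>%
      group_by(cyl) %>%
      group_modify(~ data.frame(zz = sum(.x$am) / group_size(.x)))
    print(res)
    ```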