Fill missing combinations in a dataframe

My example dataset:

df <- data.frame(
 REGION = c("REGION A", "REGION A", "REGION B"), 
 CATEGORY = c("A", "B", "B"), 
 VALUE1 = c(2,3,4),
 VALUE2 = c(1,2,3)
)

Result:

    REGION CATEGORY VALUE1 VALUE2
1 REGION A        A      2      1
2 REGION A        B      3      2
3 REGION B        B      4      3

Now I want every combination of REGION and CATEGORY that does not occur in the data set to be added, with VALUE1 and VALUE2 set to 0. For this df, the result should be:

    REGION CATEGORY VALUE1 VALUE2
1 REGION A        A      2      1
2 REGION A        B      3      2
3 REGION B        A      0      0
4 REGION B        B      4      3

I have already written a large function for this that builds a dynamic string with for-loops, but I suspect there is a much simpler way to do it in a few lines of code. I am probably overcomplicating it. Any ideas? Thank you in advance.

1 answer

  • answered 2017-10-11 10:17 Florian

    Use complete() from tidyr:

    library(tidyr)
    # Expand to every REGION x CATEGORY combination; new rows get 0 in VALUE1 and VALUE2
    as.data.frame(complete(df, REGION, CATEGORY, fill = list(VALUE1 = 0, VALUE2 = 0)))
    

    Output:

        REGION CATEGORY VALUE1 VALUE2
    1 REGION A        A      2      1
    2 REGION A        B      3      2
    3 REGION B        A      0      0
    4 REGION B        B      4      3
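
    Conceptually, this amounts to building every REGION x CATEGORY combination and joining the original rows onto it. As a rough base-R sketch of that idea (an alternative technique, not how tidyr is implemented internally), expand.grid() can build the full grid and merge() with all.x = TRUE can perform the left join. The sketch assumes the value columns are exactly VALUE1 and VALUE2 and should give the same four rows as above:

    ```r
    # Example data from the question (character columns, not factors)
    df <- data.frame(
      REGION = c("REGION A", "REGION A", "REGION B"),
      CATEGORY = c("A", "B", "B"),
      VALUE1 = c(2, 3, 4),
      VALUE2 = c(1, 2, 3),
      stringsAsFactors = FALSE
    )

    # Cartesian product of the unique REGION and CATEGORY values
    grid <- expand.grid(
      REGION = unique(df$REGION),
      CATEGORY = unique(df$CATEGORY),
      stringsAsFactors = FALSE
    )

    # Left join: combinations absent from df get NA in the value columns.
    # merge() sorts the result on the key columns by default (sort = TRUE).
    res <- merge(grid, df, by = c("REGION", "CATEGORY"), all.x = TRUE)

    # Assumed value columns; replace their NAs with 0
    value_cols <- c("VALUE1", "VALUE2")
    res[value_cols][is.na(res[value_cols])] <- 0
    res
    ```

    This avoids the tidyr dependency at the cost of a few more lines.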
    

    If there are many value columns, you could instead call as.data.frame(complete(df, REGION, CATEGORY)) without a fill list and replace the resulting NAs with 0 afterwards.
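
    A minimal sketch of that approach, assuming every column other than REGION and CATEGORY is a value column that should default to 0 (key_cols and value_cols are illustrative names, not part of tidyr):

    ```r
    library(tidyr)

    df <- data.frame(
      REGION = c("REGION A", "REGION A", "REGION B"),
      CATEGORY = c("A", "B", "B"),
      VALUE1 = c(2, 3, 4),
      VALUE2 = c(1, 2, 3),
      stringsAsFactors = FALSE
    )

    # Assumption: everything except the key columns is a value column
    key_cols <- c("REGION", "CATEGORY")
    value_cols <- setdiff(names(df), key_cols)

    # No fill list: the added combinations get NA in every value column
    res <- as.data.frame(complete(df, REGION, CATEGORY))

    # Replace NAs only in the value columns, leaving the key columns untouched
    res[value_cols][is.na(res[value_cols])] <- 0
    res
    ```

    Alternatively, the fill list itself can be built from value_cols with setNames(rep(list(0), length(value_cols)), value_cols) and passed as fill = to complete().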

    Hope this helps!