Add columns to a data frame for the values in an existing column

The data frame:

Case <- c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul", "Hannah", "Herbert", "Herbert")
Procedure <- c("1", "1", "2", "3", "3", "4", "1", "1", "1")
Location <- c("a", "a", "a", "b", "b", "b", "c", "a", "a")

(df <- data.frame(Case, Procedure, Location))

        Case Procedure Location
1 Siddhartha         1        a
2 Siddhartha         1        a
3 Siddhartha         2        a
4       Paul         3        b
5       Paul         3        b
6       Paul         4        b
7     Hannah         1        c
8    Herbert         1        a
9    Herbert         1        a

The function:

df %>%
  group_by(Procedure, Location) %>%
  summarise(Anzahl = n_distinct(Case)) %>%
  arrange(desc(Anzahl))

The result:

  Procedure Location Anzahl
  <fct>     <fct>     <int>
1 1         a             2
2 1         c             1
3 2         a             1
4 3         b             1
5 4         b             1

What I need:

# A tibble: 4 x 4
  Procedure     a     b     c
  <fct>     <int> <int> <int>
1 1             2     0     1
2 2             1     0     0
3 3             0     1     0
4 4             0     1     0

So I want the counts broken down by procedure AND location, with one column per location. This is what I tried:

df %>%
  group_by(Procedure, Location) %>%
  summarise(Anzahl = n_distinct(Case)) %>%
  pivot_wider(names_from = Location, values_from = n, values_fill = list(n = 0))

But this fails with:

Error: This tidyselect interface doesn't support predicates yet.
i Contact the package author and suggest using eval_select().

I tried to solve this problem in other questions I asked before (it almost feels like spamming at this point), but I can't apply the solutions to the original data frame. The function shown above (group_by, summarise) also works for the original. The only issue is that it doesn't split the counts out by location.

Regards

1 answer

  • answered 2020-05-22 12:40 Matt

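    This should work. The summarised column is named Anzahl, so values_from and values_fill have to refer to Anzahl rather than n: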

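    library(dplyr)  # group_by(), summarise(), arrange(), n_distinct()
    library(tidyr)  # pivot_wider()

    df %>%
      group_by(Procedure, Location) %>%
      summarise(Anzahl = n_distinct(Case)) %>%
      # sort by Location so the new columns appear in the order a, b, c
      arrange(Location, desc(Anzahl)) %>%
      pivot_wider(names_from = Location, values_from = Anzahl, values_fill = list(Anzahl = 0))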
    

    Which gives us:

      Procedure     a     b     c
      <chr>     <int> <int> <int>
    1 1             2     0     1
    2 2             1     0     0
    3 3             0     1     0
    4 4             0     1     0
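
For comparison, the same cross-tabulation can be sketched in base R without dplyr or tidyr. This is not the approach from the answer above, only an alternative under the assumption that a plain data frame with the procedures as row names is acceptable. The names distinct_rows, counts and wide are made up for this illustration.

```r
# Rebuild the example data frame from the question
Case <- c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul",
          "Hannah", "Herbert", "Herbert")
Procedure <- c("1", "1", "2", "3", "3", "4", "1", "1", "1")
Location <- c("a", "a", "a", "b", "b", "b", "c", "a", "a")
df <- data.frame(Case, Procedure, Location)

# Keep each Case/Procedure/Location combination once, so that every
# remaining row stands for one distinct case
distinct_rows <- unique(df[, c("Case", "Procedure", "Location")])

# Cross-tabulate: rows are procedures, columns are locations;
# combinations that never occur are counted as 0
counts <- table(Procedure = distinct_rows$Procedure,
                Location = distinct_rows$Location)

# Convert the contingency table to a wide data frame
# (procedures end up as row names, not as a column)
wide <- as.data.frame.matrix(counts)
print(wide)
#   a b c
# 1 2 0 1
# 2 1 0 0
# 3 0 1 0
# 4 0 1 0
```

Dropping duplicate rows first is what turns the row count from table() into a count of distinct cases, which is the role n_distinct(Case) plays in the dplyr version.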