How to aggregate over multiple variables

I have some data which has the columns company, amount of donation and party.

I want to have a new list where for each company the sum of donated money, for each party, is listed.

So far I know how to aggregate the "amount" column, if I only have one company

df <- df %>% filter(company == "c1" ) %>% select(amount, party)

test<-aggregate(df$amount, by=list(party=a$party), FUN=sum)

Is there a way not to use a loop function for every company?

Thanks for helping

2 answers

  • answered 2019-11-08 13:59 Wolfgang Arnold

    To extend what Jonny commented - here's an example that may go into right direction:

    library(tidyverse)
    
    df <- tibble(company = rep(letters[1:3],4),
            donation = runif(12, min = 1, max = 100),
            party = rep(LETTERS[25:26], 6))
    
     df %>% group_by(company, party) %>% summarize(donation = sum(donation))
    

  • answered 2019-11-08 15:15 ShahidAziz

    Use the package sqldf, dplyr

    If I understood your question correctly, this is one way you could do it if there are not many companies.

    I am calling the original table TB1 First assign each company a unique ID.

    TB1$ID <- group_indices(TB1$Company) Columns in TB1 are Company, Donation, Party,ID

    sqldf(" SELECT a.Party, CompanyOne.Donation, CompanyTwo.Donation
        FROM TB1 a LEFT OUTER JOIN
    (Select sum(Donation) as CompanyOneDonation, Party
    FROM TB1  a
    WHERE Company = (CompanyOne)
    GROUP BY PARTY, COMPANY) as CompanyOne ON a.ID = CompanyOne.ID
    LEFT OUTER JOIN
     (Select  sum(Donation) as CompanyTwoDonation, Party
    FROM TB1 
    WHERE Company = (CompanyTwo)
    GROUP BY PARTY, COMPANY) as CompanyTwo ON CompanyTwo.ID = CompanyOne.ID
    GROUP BY a.Party
    ")
    

    Please let me know if this works for you.