Finding most common elements of column of arrays in Presto

I would like to find the most common elements within a column of arrays in Presto.

For example...

    col1
    [A,B,C]
    [A,B]
    [A,D]

with output of...

    col1 - col2 
    A - 3
    B - 2
    C - 1
    D - 1    

I have tried using flatten and unnest. I am able to get it into a single array using

    select flatten(array_agg(col1))
    from tablename;

but I am not sure how to then group and count by the distinct elements. I am also struggling to get this to run on all of my data because of the large amount of memory it requires. Thanks for any help!

2 answers

  • answered 2019-10-18 18:17 Gordon Linoff

    You can unnest() and aggregate:

    select u.col, count(*)
    from t cross join
         unnest(col1) u(col)
    group by u.col;
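To check what this query returns on the sample data, here is a minimal Python sketch. It is Python rather than Presto, so it only mimics the SQL: expanding each array into one row per element (what `cross join unnest` does), then tallying per element (what `group by ... count(*)` does). The function name `count_elements` is hypothetical. Because this pattern counts element by element instead of first collecting every array into one value with `array_agg`, it is likely to be much lighter on memory than the `flatten(array_agg(col1))` attempt. Appending `order by count(*) desc` to the SQL gives the "most common first" ordering that `most_common()` produces here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_elements(rows: Iterable[List[str]]) -> List[Tuple[str, int]]:
    """Mimic: SELECT u.col, count(*) FROM t CROSS JOIN UNNEST(col1) u(col) GROUP BY u.col."""
    # CROSS JOIN UNNEST: one output row per (row, element) pair.
    # An empty array contributes no rows, as in Presto.
    unnested = [elem for arr in rows for elem in arr]
    # GROUP BY + count(*): tally each distinct element.
    counts = Counter(unnested)
    # most_common() sorts by count descending, like ORDER BY count(*) DESC.
    return counts.most_common()


col1 = [["A", "B", "C"], ["A", "B"], ["A", "D"]]
for value, n in count_elements(col1):
    print(value, "-", n)  # A - 3, B - 2, C - 1, D - 1
```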
    

  • answered 2019-10-19 09:24 Ravi Joshi

    You can use unnest() to flatten the array and then GROUP BY to group the unique values.

    The query below generates the dataset for your case. In the final query, you can replace this part with your own SELECT:

    WITH dataset AS (
      SELECT ARRAY[
        ARRAY['A','B','C'],
        ARRAY['A','B'],
        ARRAY['A','D']
      ] AS data
    )
    SELECT dt
    FROM dataset
    CROSS JOIN UNNEST(data) AS t(dt)
    

    Output:

    ------
    dt
    ------
    [A,B,C]
    ------
    [A,B]
    ------
    [A,D]
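The `dataset` CTE is a single row whose `data` column holds an array of arrays, so one `CROSS JOIN UNNEST` turns it into three rows, each holding one inner array. That is the same shape as `col1` in the question. A minimal Python sketch of that one-level unnest, with a hypothetical helper `unnest_one_level` standing in for the SQL:

```python
from typing import List

# Hypothetical stand-in for the single row produced by the `dataset` CTE:
# one column `data` holding an array of arrays.
dataset = [{"data": [["A", "B", "C"], ["A", "B"], ["A", "D"]]}]


def unnest_one_level(rows: List[dict], column: str, alias: str) -> List[dict]:
    """Mimic CROSS JOIN UNNEST(column) AS t(alias): emit one row per element."""
    out = []
    for row in rows:
        for elem in row[column]:
            # CROSS JOIN keeps the source row's columns alongside the new one;
            # the SQL then projects only `dt`.
            out.append({**row, alias: elem})
    return out


da = unnest_one_level(dataset, "data", "dt")
for row in da:
    print(row["dt"])  # one inner array per row
```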
    

    In the final query, we first flatten this data by unnesting each row's array into its individual values, and then group those values to get the unique values and their counts.

    FINAL QUERY:

    WITH dataset AS (
      SELECT ARRAY[
        ARRAY['A','B','C'],
        ARRAY['A','B'],
        ARRAY['A','D']
      ] AS data
    ),
    da AS (
      SELECT dt
      FROM dataset
      CROSS JOIN UNNEST(data) AS t(dt)
    )
    SELECT daVal, count(*)
    FROM da
    CROSS JOIN UNNEST(dt) AS t(daVal)
    GROUP BY daVal
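The two `UNNEST` steps here only exist because the test data is one nested array literal. On a real table where `col1` already holds one array per row, a single `UNNEST` (as in the other answer) gives the same counts. The minimal Python sketch below mimics both paths and shows they agree. It is an emulation of the SQL semantics, not Presto itself, and the variable names are illustrative.

```python
from collections import Counter

# The single-row `dataset` CTE: column `data` holds an array of arrays.
dataset_rows = [{"data": [["A", "B", "C"], ["A", "B"], ["A", "D"]]}]

# First UNNEST (inside `da`): one row per inner array, i.e. the shape of col1.
da_rows = [dt for row in dataset_rows for dt in row["data"]]

# Second UNNEST + GROUP BY daVal + count(*): one tally per distinct element.
final_counts = Counter(val for dt in da_rows for val in dt)

# On a real table whose col1 already holds one array per row, the first
# UNNEST is unnecessary: unnesting col1 once gives the same tallies.
real_table_col1 = [["A", "B", "C"], ["A", "B"], ["A", "D"]]
direct_counts = Counter(val for arr in real_table_col1 for val in arr)

print(dict(final_counts))  # {'A': 3, 'B': 2, 'C': 1, 'D': 1}
```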