Finding most common elements of column of arrays in Presto
I would like to find the most common elements within a column of arrays in Presto. Given a column like this:
col1
[A,B,C]
[A,B]
[A,D]
with output of...
col1 - col2
A - 3
B - 2
C - 1
D - 1
I have tried using flatten and unnest. I am able to get it into a single array using
select flatten(array_agg(col1)) from tablename;
but I am then not sure how to group and count by the distinct elements. I am also struggling to get this to run on all of my data because of the large amount of memory required. Thanks for any help!
select u.col, count(*) from t cross join unnest(col1) u(col) group by u.col;
You can use unnest() to flatten the arrays into rows, and then GROUP BY to count each unique value.
The query below generates the data set for your case. You can replace this part with your own SELECT in the final query:
with dataset AS (
    SELECT ARRAY[
        ARRAY['A','B','C'],
        ARRAY['A','B'],
        ARRAY['A','D']
    ] AS data
)
select dt from dataset CROSS JOIN UNNEST(data) AS t(dt)
dt
-------
[A,B,C]
[A,B]
[A,D]
In the final query, we first unnest this data to pull every value out of every row, then group by those values to get each unique value and its count.
with da AS (
    with dataset AS (
        SELECT ARRAY[
            ARRAY['A','B','C'],
            ARRAY['A','B'],
            ARRAY['A','D']
        ] AS data
    )
    select dt from dataset CROSS JOIN UNNEST(data) AS t(dt)
)
select daVal, count(*)
from da
CROSS JOIN UNNEST(dt) AS t(daVal)
GROUP BY daVal
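Since the question asks for the most common elements, the grouped counts can also be sorted. Here is a minimal sketch, assuming the table is called t and the array column col1 as in the question; the inline VALUES rows only stand in for the real table, and the aliases element and cnt are names I made up for illustration. Because this unnests and aggregates row by row, it should not need to build one large array with array_agg first, which may help with the memory problem, though I have not measured that.

```sql
-- Stand-in for the real table: three rows, each holding one array.
WITH t (col1) AS (
    VALUES
        (ARRAY['A', 'B', 'C']),
        (ARRAY['A', 'B']),
        (ARRAY['A', 'D'])
)
SELECT
    u.col    AS element,  -- hypothetical alias for the unnested value
    count(*) AS cnt       -- hypothetical alias for its number of occurrences
FROM t
CROSS JOIN UNNEST(col1) AS u (col)  -- one output row per array element
GROUP BY u.col
ORDER BY cnt DESC, element;         -- most common first, ties broken by name
-- element | cnt
-- A       | 3
-- B       | 2
-- C       | 1
-- D       | 1
```

To keep only the top few elements, a LIMIT clause can be added after the ORDER BY.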