# Presto - how to perform correlations on between all columns in one query

I have a table in the following format:

``````A   B   C   D
7   7   2   12
2   2   3   4
2   2   2   4
2   2   2   3
5   5   2   7
``````

I would like to calculate correlations between each of the columns using the build-in correlation function (https://prestodb.io/docs/current/functions/aggregate.html corr(y, x) → double)

I could run over all the columns and perform the corr calculation each time with: `select corr(A,B) from table` but I would like to reduce the number of times I access presto and run it in one query if its possible.

Would it be possible to get as a result the column names that pass a certain threshold or at least the correlation scores between all possible combinations in one query?

Thanks.

I would like to calculate correlations between each of the columns

Correlation involves two series of data (in SQL, two columns). So I understand your question as: how to compute the correlation for each and every possible combination of columns in the table. That would look like:

``````select
corr(a, b) corr_a_b,
corr(a, c) corr_a_c,
corr(a, d) corr_a_d,
corr(b, c) corr_b_c,
corr(b, d) corr_c_d,
corr(c, d) corr_c_d
from mytable
``````

You can use a lateral join to unpivot the table, then a self join and aggregation:

``````with v as (
select v.*, t.id
from (select t.*,
row_number() over (order by a) as id
from t
) t cross join lateral
(values ('a', a), ('b', b), ('c', c), ('d', d)
) v(col, val)
)
select v1.col, v2.col, corr(v1.val, v2.val)
from v v1 join
v v2
on v1.id = v2.id and v1.which < v2.which
group by v1.col, v2.col;
``````

The `row_number()` is only to generate a unique id for each row, which is then used for the self-join. You may already have a column with this information, so that might not be necessary.