HiveQL: build a pairwise comparison table

I am trying to build a Hive table that contains pairwise comparisons of ids. For example, let's say we start with table A:

id  val
a   10
b   11
c   12

I want to create a table B:

id_A  id_B  val_A  val_B  val_diff
a     b     10     11     -1
a     c     10     12     -2
b     c     11     12     -1

I believe this requires a self-join, but I'm not sure how to do it. Any ideas or leads to follow are appreciated.

I tried:

CREATE TABLE C STORED AS ORC AS
SELECT a.id         AS id_A, 
       b.id         AS id_B,
       a.val         AS val_A,
       b.val         AS val_B,
       a.val - b.val AS val_diff
FROM A AS a
INNER JOIN B as b
ON a.id != b.id;

but it did not work:

FAILED: SemanticException [Error 10017]: Line 4:3 Both left and right aliases encountered in JOIN 'id'

UPDATE: problem extension.

Suppose I have table A':

id  val  category
a   10   X
b   11   X
c   12   X
d   13   Y
e   14   Y
f   15   Y

How would I do a pairwise comparison only among elements of each category?

I.e. a table B':

id_A  id_B  val_A  val_B  val_diff cat
a     b     10     11     -1       X
a     c     10     12     -2       X
b     c     11     12     -1       X
d     e     . . .                  Y
d     f     . . .                  Y
e     f     . . .                  Y

1 answer

  • answered 2018-03-13 21:20 Gordon Linoff

    You use a self join:

    select a.id as id_a, b.id as id_b, a.val as val_a, b.val as val_b,
           (a.val - b.val) as val_diff
    from t a join
         t b
         on a.id < b.id;
    

    In Hive, you might need to do:

    select a.id as id_a, b.id as id_b, a.val as val_a, b.val as val_b,
           (a.val - b.val) as val_diff
    from t a cross join
         t b
    where a.id < b.id;