HiveQL: build a pairwise comparison table
I am trying to build a Hive table that contains pairwise comparisons of
ids. For example, let's say we start with table
id val a 10 b 11 c 12
I want to create a table
id_A id_B val_A val_B val_diff a b 10 11 -1 a c 10 12 -2 b c 11 12 -1
I believe this requires a self-join, but I'm not sure how to do it. Any ideas or leads to follow are appreciated.
CREATE TABLE C STORED AS ORC AS SELECT a.id AS id_A, b.id AS id_B, a.val AS val_A, b.val AS val_B, a.val - b.val AS val_diff FROM A AS a INNER JOIN B as b ON a.id != b.id;
but it did not work:
FAILED: SemanticException [Error 10017]: Line 4:3 Both left and right aliases encountered in JOIN 'id'
UPDATE: problem extension.
Suppose I have table
id val category a 10 X b 11 X c 12 X d 13 Y e 14 Y f 15 Y
How would I do a pairwise comparison only among elements of each
I.e. a table
id_A id_B val_A val_B val_diff cat a b 10 11 -1 X a c 10 12 -2 X b c 11 12 -1 X d e . . . Y d f . . . Y e f . . . Y
You use a self join:
select a.id as id_a, b.id as id_b, a.val as val_a, b.val as val_b, (a.val - b.val) as val_diff from t a join t b on a.id < b.id;
In Hive, you might need to do:
select a.id as id_a, b.id as id_b, a.val as val_a, b.val as val_b, (a.val - b.val) as val_diff from t a cross join t b where a.id < b.id;