How to display similar rows once for SQL left join on same table

Say I have the following table in my PostgreSQL database:

id|user_id|document_id|
--|-------|-----------|
 1|10     |        100|
 2|20     |        100|
 3|10     |        200|
 4|20     |        200|
 5|10     |        300|
 6|20     |        300|
 7|10     |        400|
 8|20     |        400|
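
To reproduce this, a minimal setup sketch follows. The table name test_table comes from the queries in this question, but the column types and constraints are assumptions, since only the data is shown here:

```sql
-- Assumed schema: integer columns and the primary key are guesses,
-- not taken from the real table definition.
create table test_table (
    id          integer primary key,
    user_id     integer not null,
    document_id integer not null
);

-- Same eight rows as in the table above.
insert into test_table (id, user_id, document_id) values
    (1, 10, 100),
    (2, 20, 100),
    (3, 10, 200),
    (4, 20, 200),
    (5, 10, 300),
    (6, 20, 300),
    (7, 10, 400),
    (8, 20, 400);
```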

I now join this table with itself on the column document_id as follows:

select t1.document_id, t1.user_id as user_id1, t2.user_id as user_id2
from test_table t1 left join test_table t2 on (t1.document_id = t2.document_id and t1.user_id <> t2.user_id);

The result:

document_id|user_id1|user_id2|
-----------|--------|--------|
        100|10      |20      |
        100|20      |10      |
        200|10      |20      |
        200|20      |10      |
        300|10      |20      |
        300|20      |10      |
        400|10      |20      |
        400|20      |10      |

Here I want to remove mirrored rows such as the pair below, because both records mean the same thing:

document_id|user_id1|user_id2|
-----------|--------|--------|
        100|10      |20      |
        100|20      |10      |

So, the expected result should look like:

document_id|user_id1|user_id2|
-----------|--------|--------|
        100|10      |20      |
        200|10      |20      |
        300|10      |20      |
        400|10      |20      |

So I basically need each document_id to appear once rather than twice. Is there any way to do this?

Edit:

I tried the following query as suggested by @jarlh:

select t1.document_id, t1.user_id as user_id1, t2.user_id as user_id2
from test_table t1 left join test_table t2 on (t1.document_id = t2.document_id and t1.user_id < t2.user_id);

But now user_id2 is null whenever user_id1 is the greater of the two:

document_id|user_id1|user_id2|
-----------|--------|--------|
        100|10      |20      |
        100|20      |        |
        200|10      |20      |
        200|20      |        |
        300|10      |20      |
        300|20      |        |

1 answer

  • answered 2019-11-08 13:41 Tim Biegeleisen

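    The comment by @jarlh might be one way to go here, but another way would be to select distinct using least/greatest. These put the smaller user_id first and the larger second in every row, so the two mirrored rows for a document become identical and distinct collapses them into one: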

    select distinct
        t1.document_id,
        least(t1.user_id, t2.user_id) as user_id1,
        greatest(t1.user_id, t2.user_id) as user_id2
    from test_table t1
    left join test_table t2
        on t1.document_id = t2.document_id and
           t1.user_id <> t2.user_id;
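
    For comparison, here is a rough sketch of how the t1.user_id < t2.user_id idea from the comment could be made to work without the extra null rows. Both variants are illustrations run against the sample data only, not tested on a real schema. The first assumes every document has at least two users, because an inner join drops documents with a single user. The second keeps the left join and moves the ordering test into the where clause, so a document with only one user would still appear with a null user_id2. The order by clauses are added only to make the output order predictable.

    ```sql
    -- Variant 1: inner join, keeping only the pair where t1 has the smaller user_id.
    -- Assumption: documents with a single user may be dropped from the result.
    select t1.document_id, t1.user_id as user_id1, t2.user_id as user_id2
    from test_table t1
    join test_table t2
        on t1.document_id = t2.document_id and
           t1.user_id < t2.user_id
    order by t1.document_id;

    -- Variant 2: left join on <>, then filter out the mirrored half of each pair.
    -- Rows with no partner (t2.user_id is null) pass the filter unchanged.
    select t1.document_id, t1.user_id as user_id1, t2.user_id as user_id2
    from test_table t1
    left join test_table t2
        on t1.document_id = t2.document_id and
           t1.user_id <> t2.user_id
    where t2.user_id is null or t1.user_id < t2.user_id
    order by t1.document_id;
    ```

    One difference to be aware of: in PostgreSQL, least and greatest ignore null arguments, so with the select distinct query a document that has only one user would show that user in both user_id1 and user_id2 rather than a null in user_id2.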