Improve performance of UPDATE WHERE SQL query

I have a fairly simple query:

UPDATE TableA
SET date_type = TableB.date_type
FROM TableB
WHERE TableB.int_type = TableA.int_type

My indices are: TableA(int_type), TableB(int_type, date_type)

EXPLAIN results:

Update on TableA  (cost=2788789.320..34222368.900 rows=82594592 width=261)
  ->  Hash Join  (cost=2788789.320..34222368.900 rows=82594592 width=261)
        Hash Cond: (TableA.int_type = TableB.int_type)
        ->  Seq Scan on tableA  (cost=0.000..12610586.960 rows=101433296 width=247)
        ->  Hash  (cost=1272403.920..1272403.920 rows=82594592 width=18)
              ->  Seq Scan on TableB  (cost=0.000..1272403.920 rows=82594592 width=18)

The query has been running for more than 3 hours.

What can be done to make it run faster? As far as I can see from the EXPLAIN results, the indices are not used. Should I pick other indices, or make some other change, to get the query to run faster?

PostgreSQL 9.6

2 answers

  • answered 2018-11-14 12:08 Gordon Linoff

    For this query:

    UPDATE TableA
    SET date_type = TableB.date_type
    FROM TableB
    WHERE TableB.int_type = TableA.int_type
    

    You can try an index on TableB(int_type, date_type).
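
    A minimal sketch of creating such an index, in case it is not already in place (the question lists an index on these columns as existing). The index name is a placeholder, not something from the original post:

    ```sql
    -- Hypothetical index name; columns as suggested above.
    CREATE INDEX tableb_int_type_date_type_idx
        ON TableB (int_type, date_type);

    -- Refresh planner statistics so the new index is costed with current data.
    ANALYZE TableB;
    ```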

  • answered 2018-11-14 12:39 wildplasser

    What you could do is avoid idempotent updates:


    UPDATE TableA a
    SET date_type = b.date_type
    FROM TableB b
    WHERE b.int_type = a.int_type
    AND a.date_type IS DISTINCT FROM b.date_type  -- <<-- avoid updates with the same value
    ;
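
    The condition uses IS DISTINCT FROM rather than <> because it treats NULL as an ordinary comparable value, so rows where only one side is NULL still count as changed. A small illustration (the literal date and the column aliases are arbitrary, chosen only for the example):

    ```sql
    SELECT NULL::date <> DATE '2018-11-14'               AS with_ne,        -- NULL (row would be filtered out)
           NULL::date IS DISTINCT FROM DATE '2018-11-14' AS with_distinct,  -- true (row would be updated)
           NULL::date IS DISTINCT FROM NULL::date        AS both_null;      -- false (nothing to update)
    ```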
    

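    Also, you may be assuming a 1-to-1 relation between A and B, but the DBMS does not. If several TableB rows match the same TableA row, PostgreSQL uses only one of them, and which one is not readily predictable. You could restrict the updates to at most one source row per target row: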


    EXPLAIN
    UPDATE TableA a
    SET date_type = b.date_type
    FROM ( SELECT int_type, date_type
                , row_number() OVER (PARTITION BY int_type) AS rn -- no ORDER BY: an arbitrary row per int_type gets rn = 1
           FROM TableB
         ) b
    WHERE b.int_type = a.int_type
    AND a.date_type IS DISTINCT FROM b.date_type -- <<-- avoid idempotent updates
    AND b.rn = 1 -- <<-- allow only one update per target row
    ;
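
    Before adding the row_number() subquery, it may be worth checking whether TableB actually contains duplicate int_type values. A rough sketch of such a check (the alias n_rows and the LIMIT are arbitrary choices for the example):

    ```sql
    -- List int_type values that occur more than once in TableB.
    -- An empty result means each TableA row can match at most one TableB row.
    SELECT int_type, count(*) AS n_rows
    FROM TableB
    GROUP BY int_type
    HAVING count(*) > 1
    ORDER BY n_rows DESC
    LIMIT 20;
    ```

    If this returns no rows, the rn = 1 restriction is unnecessary and the simpler UPDATE with IS DISTINCT FROM should suffice.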