SQL: How to join tables with 1+ millions of records

I want to join two tables ("products" table has 1.5 millions of records) using the following query, but after 15 minutes the query was still running and my pc was overheating (it's a lenovo v330-14ikb with 8gb of RAM), so I stopped it.

I am very new to indexes, and I tried by creating the followings:

  • CREATE INDEX customer_id_idx1 ON orders (customer_id)
  • CREATE INDEX customer_id_idx2 ON products (customer_id)
  • CREATE INDEX customer_id_revenues_idx ON orders(customer_id,revenues)
  • CREATE INDEX customer_id_costs_idx ON products(customer_id,costs)

This is the query:

SELECT a.customer_id, (SUM(a.revenues) / SUM(b.costs) :: FLOAT) AS roi
FROM orders a
JOIN products b
ON a.customer_id = b.customer_id
WHERE a.customer_id IN (
    SELECT customer_id FROM (SELECT 
        customer_id,
        COUNT(*) AS n_products
    FROM products
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 5) x
)
GROUP BY a.customer_id  
ORDER BY roi DESC

The output should return the ratio of revenues/costs for the top 5 customers by number of products they bought.

I am using pgadmin. Can someone explain me how to speed up and make it compile? Thank you in advance.

1 answer

  • answered 2021-02-27 15:10 a_horse_with_no_name

    I don't think you need to aggregate twice as far as I can tell.

    select customer_id, roi
    from (
      select o.customer_id, 
             sum(o.revenues) / sum(p.costs)::float as roi, 
             count(*) as n_products
      from orders o
        join products p on o.customer_id = p.customer_id
      group by o.customer_id  
      order by n_products
      limit 5
    ) t
    order by roi desc
    

    Alternatively try to aggregate the two tables separately, then join the results:

    select o.customer_id, o.revenues / p.costs::numeric as roi
    from (
      select customer_id, sum(revenues) as revenues
      from orders
      group by customer_id
    ) o
      join (
        select customer_id, sum(costs) as costs, count(*) n_products
        from products
        group by customer_id
      ) p on p.customer_id = o.customer_id
    order by p.n_products desc
    limit 5