SQL server improve performance of joining a large and small table

I am trying to join two tables:

Table1: (900 million rows (106 GB). And, id1, id2, id3, id4 are clustered primary key, houseType is string)

+-----+-----+-----+------------+--------+
| Id1 | id2 | id3 | id4        |  val1  |
+-----+-----+-----+------------+--------+
| ac  |  15 | 697 | houseType1 | 75.396 |
+-----+-----+-----+------------+--------+
| ac  |  15 | 697 | houseType2 | 20.97  |
+-----+-----+-----+------------+--------+
| ac  |  15 | 805 | houseType1 | 112.99 |
+-----+-----+-----+------------+--------+
| ac  |  15 | 805 | houseType2 | 53.67  |
+-----+-----+-----+------------+--------+
| ac  |  27 | 697 | houseType1 | 67.28  |
+-----+-----+-----+------------+--------+
| ac  |  27 | 697 | houseType2 | 55.12  |
+-----+-----+-----+------------+--------+

Table 2 is very small with 150 rows. And, val1, val2 are clustered primary key.

+------+------+---------+
| val1 | val2 | factor1 |
+------+------+---------+
| 0    | 10   | 0.82    |
+------+------+---------+
| 10   | 20   | 0.77    |
+------+------+---------+
| 20   | 30   | 0.15    |
+------+------+---------+

What I need :

For every "val1" in table1, it should be found which range [val1, val2] in table2 it belongs to and its associated "factor1" in table2 should be returned from table2, which will be used for further aggregate calculation.

example of my query:

 Select a.id1, a.id2, a.id3, a.id4, 
         max(case when a.val1 >= b.val1 and a.val1 < b.val2 then  b.factor1 * a.val1
                else null
            end ) as result
 From Table1 as a,
      Table2 as b
 Group by  a.id1, a.id2, a.id3, a.id4

For example, a row :

   ac ,  15, 697, houseType2, 20.97 in table1
   0.15 should be returned from table2 because 20.97 in range [20, 30] in table2.

There is no join action in the query because I do not know how to use join here. I just need to lookup the factors for val1 in table2.

In SQL server, it runs very slow with more than 3 hours.

I also got :

   Warning: Null value is eliminated by an aggregate or other SET operation. 

Could anyone help me about this ?

thanks

2 answers

  • answered 2018-02-13 02:15 Ctznkane525

    This should reduce your recordset:

    Select a.id1, a.id2, a.id3, a.id4, 
             b.factor1 * a.val1 as result
     From Table1 a inner join
          Table2 b on a.val1 >= b.val1 and a.val1 < b.val2
    

    This way, you will only get a single record from b for each record from a. This is at least a start to improve your performance problem.

    No need for MAX because you are joining to get a single record.

  • answered 2018-02-13 03:47 Gordon Linoff

    I would be inclined to express this as a subquery or lateral join:

    Select a.id1, a.id2, a.id3, a.id4, b.factor1 * a.val1 as result
    From Table1 a cross apply
         (select b.*
          from Table2 b
          where a.val1 >= b.val1 and a.val1 < b.val2 
         ) b;
    

    The aggregation is unnecessary because the four keys constitute the primary key.