Merging similar Athena (Presto) queries

I'm new to SQL and other querying languages, so I have some doubts about them.

I have the following two queries in Athena, which are very similar but in one case, an additional parameter is passed and an INNER JOIN is used. The first one gets how many distinct values are in one column of certain table, and the second one does the same, but joining two tables, and adding a restriction on another column, using the joined tables.

--Query #1
SELECT COUNT(DISTINCT(usertoken)) users
FROM "database"

WHERE country = ${country}
and serverdate between CAST('${fromdate}' As DATE) and CAST('${todate}' As DATE) 


--Query #2
SELECT COUNT(DISTINCT(usertoken)) users
FROM "database" op
INNER JOIN "location_0" cities
ON op.loc0id = cities.id

WHERE openings.country = ${country}
and openings.serverdate between CAST('${fromdate}' As DATE) and CAST('${todate}' As DATE)
and cities.id in (${cities_list})

Where the parameters between ${} are passed externally and are all of string type, except ${cities_list}, which is a list of integers (dates are casted to DATE). Both work fine and without problems.

My question is the following: Can I use only one query to perform both operations, depending on the value of cities_list? I check before executing this queries wheather this value is Null or not, and depending on it, I execute one or the another, but I would want to have only one query able to perform both cases (as most of them are the same, and I don't want to have redundant code).

TL;DR: I want to merge these two queries into one that works properly for both cases (cities_list has a value, or is Null), adding the last condition only if certain parameter is not Null (and, if possible, doing only the INNER JOIN if the parameter is not Null).

Thanks!

2 answers

  • answered 2019-10-14 13:29 GMB

    You can do conditional aggregation:

    SELECT 
        COUNT(DISTINCT usertoken) users1,
        COUNT(DISTINCT CASE WHEN ci.id in (${ci.list}) THEN usertoken END) users2
    FROM "database" op
    INNER JOIN "location_0" cities ci ON op.loc0id = ci.id
    WHERE 
        op.country = ${country}
        AND op.serverdate between CAST('${fromdate}' As DATE) and CAST('${todate}' As DATE)
    

  • answered 2019-10-14 14:04 Gordon Linoff

    You can use JOIN and COUNT(DISTINCT), but you need a LEFT JOIN:

    SELECT COUNT(DISTINCT op.usertoken) as users,
           COUNT(DISTINCT CASE WHEN cities.id in (${cities_list}) THEN op.usertoken END) as users_2
    FROM "database" op LEFT JOIN
         "location_0" cities
         ON op.loc0id = cities.id
    WHERE openings.country = ${country} AND
          openings.serverdate between CAST('${fromdate}' As DATE) and CAST('${todate}' As DATE);
    

    That said, it is quite possible that the JOIN is not needed at all, if it is only used to look up a single value:

    SELECT COUNT(DISTINCT op.usertoken) as users,
           COUNT(DISTINCT CASE WHEN op.loc0id in (${cities_list}) THEN op.usertoken END) as users_2
    FROM "database" op 
    WHERE openings.country = ${country} AND
          openings.serverdate between CAST('${fromdate}' As DATE) and CAST('${todate}' As DATE);