Presto query to exclude rows based on the value in a specific column

This seems like it should be simple but I guess I'm missing something.

I've got a query where I'm selecting a bunch of columns from my DB, and I want to exclude any rows where (for example) the province = 'ontario' or 'quebec'. I've tried it many ways, with different results - none of which are what I want.

select .... from fulldata 
   WHERE substr(code, 1, 5) <> '10000' 
   AND (province <> 'ontario' OR province <> 'quebec')

This does work to exclude rows where province is one of those values, but it also excludes all rows where province is null/blank - no idea why.

I tried AND NOT ( stateprovince = 'ontario' OR stateprovince = 'quebec') with the same result.

I tried AND stateprovince <> ANY(VALUES 'ontario','quebec') and it doesn't have any impact (all data shows up, nothing is filtered out)

Even breaking it down to something as simple as AND stateprovince <> 'ontario' does get rid of ontario data, but also get's rid of blanks which isn't the intended result.

In writing this up I've kind of come up with a workaround which is to change the query to this:

AND (stateprovince IS NULL OR stateprovince <> 'ontario')
AND (stateprovince IS NULL OR stateprovince <> 'quebec')

Which works, but doesn't seem like the "right" way. Any idea what's up?

I'm using Presto on Athena.

1 answer

  • answered 2019-11-08 20:45 Piotr Findeisen

    When province is NULL (think: "unknown") then this:

    (province <> 'ontario' OR province <> 'quebec')

    will evaluate to NULL, filtering out given row. Since NOT (NULL) is also NULL, there isn't a simple way to avert this. Thus you need an explicit NULL check:

    (province IS NULL OR province NOT IN('ontario', 'quebec'))

    Alternatively you can write this this way:

    (coalesce(province, '') NOT IN('ontario', 'quebec'))