Query by multiple values from an AWS Athena bucketed table
I have a bucketed table from which I want to query by multiple values. Here is an example:
SELECT * FROM my_bucketed_table WHERE bucketed_column IN (value1, value2)
The result is a full table scan instead of using the index.
When I used UNION to query one value at a time, it worked as expected in terms of data scanned:
SELECT * FROM my_bucketed_table WHERE bucketed_column = value1 UNION SELECT * FROM my_bucketed_table WHERE bucketed_column = value2
But I want the list to be dynamic, so this solution is not good enough for me.
I expect the data scanned when using the IN operator, or a JOIN with another table, to be the same as with the UNION solution.
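Because the number of UNION branches depends on the list, one workaround is to generate the per-value UNION statement in client code, so the list stays dynamic while the query keeps the same shape as the hand-written version. The sketch below is a hypothetical Python helper (build_union_query and sql_literal are made-up names, not part of any AWS SDK). It assumes integer values should be emitted unquoted and everything else as string literals, and that the table and column names are trusted, because they are interpolated without escaping.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (assumption: ints or strings only)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return str(value)  # numeric literal, unquoted
    # String literal: escape embedded single quotes by doubling them
    return "'" + str(value).replace("'", "''") + "'"


def build_union_query(table, column, values):
    """Hypothetical helper: one SELECT per value, combined with UNION.

    table and column are interpolated as-is, so they must be trusted identifiers.
    """
    literals = [sql_literal(v) for v in values]
    # Drop repeated values (order preserved) so no branch is scanned twice
    unique_literals = list(dict.fromkeys(literals))
    if not unique_literals:
        raise ValueError("values must not be empty")
    selects = [
        f"SELECT * FROM {table} WHERE {column} = {lit}" for lit in unique_literals
    ]
    return "\nUNION\n".join(selects)


print(build_union_query("my_bucketed_table", "bucketed_column", ["value1", "value2"]))
```

The resulting string can be submitted like any other query, for example through the QueryString parameter of start_query_execution on boto3's Athena client. Whether each branch is pruned the same way as the hand-written query is worth confirming in the query's data-scanned statistics.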
This is a bit long for a comment.
I think you are referring to partition pruning, which is a bit different from "using an index". You want the query to only read the relevant partitions.
Partition pruning is quite tricky. The basic problem is that the query needs to know what data to read before it starts executing the query. This is usually handled by requiring explicit comparisons on the partitioning column.
Identifying the right partitions should work correctly with <=. It might get more complicated with NOT IN. It probably will not work when you use a JOIN on one table and don't explicitly include the partition for both tables in the join.
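A toy model of the point above: data can only be skipped when the filter values are literals that can be mapped to storage locations before execution starts. The sketch applies that idea to buckets, since the question is about a bucketed table, and uses a made-up bucket count and a made-up hash; it illustrates the reasoning only and is not how Athena actually assigns rows to buckets or decides what to prune.

```python
NUM_BUCKETS = 4  # hypothetical bucket count for the example


def bucket_of(value, num_buckets=NUM_BUCKETS):
    # Toy hash (sum of UTF-8 bytes); NOT the hash function Athena or Hive uses.
    return sum(str(value).encode("utf-8")) % num_buckets


def buckets_to_read(literal_values, num_buckets=NUM_BUCKETS):
    """Return the buckets a planner must read for `column IN (literal_values)`.

    literal_values is None when the filter values are only known at run time
    (e.g. they come from the other side of a join); then nothing can be pruned.
    """
    if literal_values is None:
        return set(range(num_buckets))
    return {bucket_of(v, num_buckets) for v in literal_values}


print(buckets_to_read(["a", "b"]))  # literals known up front: {1, 2}
print(buckets_to_read(None))        # values unknown at planning time: {0, 1, 2, 3}
```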
You can try something like the following, which may help the query use the index:
SELECT * FROM my_bucketed_table WHERE bucketed_column = 0000 OR bucketed_column IN (value1, value2)
This assumes that the value 0000 does not appear in your column.