Presto for loop

I am new to presto and I would like to know if there is any way to have for loop. I have a query that aggregates some data date by date, and when i run it it throws an error of: exceeded max memory size of 30GB. I can use other suggestions if looping is not an option.

the query I am using:

select  dt as DATE_KPI,brand,count(distinct concat(cast(post_visid_high as varchar),
         cast(post_visid_low as varchar)))as kpi_value
from hive.adobe.tbl 
  and dt >= date '2017-05-15'  and dt <= date '2017-06-13' 
group by 1,2

1 answer

  • answered 2017-06-15 05:17 Dain Sundstrom

    Assuming you are using, Hive you can write the source data to a table bucketed bucketed on brand, and then process groups of buckets with WHERE "$bucket" % 32 = <N>.

    Otherwise, you can fragment the query into n queries and then process 1/n of the "brand" in each query. You use WHERE abs(from_big_endian_64(xxhash64(to_utf8(brand)))) % 32 = <N> to bucketize the brands.