Force Athena/Presto to respect the ordering of records in a file from S3
I have a file where the ordering of records is important. The file contains multiple record types, and the first record of each group defines the ID that links the subsequent records, until another "first" record is found. There can be any number of subsequent records, of varying types.
So it's important to read the file in order: you find the ID on a "first" record and then propagate it to all the records that follow, up to the next "first" record. But this goes completely against the grain of how Athena and Presto work, since they do everything they can to read the data in parallel.
So can this file be read in Athena, or must we pre-process it in Python first? I was guessing there might be some option to force a single-threaded read, or even to tell Athena that the order of the source file is important, but I couldn't find anything.
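
For reference, the Python pre-processing I would like to avoid looks roughly like the sketch below. The layout is an assumption made for illustration only: comma-separated lines, the record type in the first field, a hypothetical type code "H" marking the "first" record of a group, and the ID in that record's second field. The function name is made up as well.

```python
import csv


def propagate_group_ids(lines, header_type="H"):
    """Read records strictly in file order and prefix each with its group ID.

    Assumed (hypothetical) layout: field 0 is the record type, and on a
    header record field 1 is the ID shared by the records that follow.
    """
    current_id = None  # stays None for records before the first header
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if row[0] == header_type:
            current_id = row[1]  # a new group starts here
        yield [current_id] + row


sample = [
    "H,42,2020-01-01",
    "A,foo",
    "B,bar,baz",
    "H,43,2020-01-02",
    "A,qux",
]
for record in propagate_group_ids(sample):
    print(",".join(record))
```

After this step every row carries its own group ID, so the output no longer depends on read order and could be loaded into Athena as an ordinary table.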