Amazon Athena partition with colon(:) is not working

When creating partition in Athena, I tried to use the date in the format (yyyy-MM-ddTHH:mm:ssZ) then I am not able to query the data

Step 1: Create table

CREATE EXTERNAL TABLE my_info (
         id STRING,
         name STRING
) PARTITIONED BY (
        part string
) STORED AS ORC LOCATION 's3://bucket1/data' tblproperties ("orc.compress"="SNAPPY"); 

Step 2: Create folder like below and added the files.

S3://bucket1/data/part=2019-11-12T14:15:16Z

Step 3: Refresh partition MSCK REPAIR TABLE my_info

Step 4: Query the data SELECT * FROM my_info

With this I am not able to query any data

If I change the folder to format (yyyy-MM-ddTHH)

without ’:’ in Step 2

s3://bucket1/data/part=2019-11-12T14

Then I am able to get the results.

Any idea about why this is not working.

1 answer

  • answered 2019-11-12 03:12 Nathan Griffiths

    This is because when you create the partitioned table the partitioning is implemented as part of the S3 path e.g. for s3://bucket1/data/part=2019-11-12T14:15:16Z the part=2019-11-12T14:15:16Z section is an S3 path that Athena interprets as a partition when querying the data.

    S3 path names have some restrictions on the characters that can be used:

    The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters and your browser might not handle them, which also requires special handling:

    Ampersand ("&")  
    Dollar ("$")  
    ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)  
    'At' symbol ("@")  
    Equals ("=")  
    Semicolon (";")  
    Colon (":")  
    Plus ("+")  
    Space – Significant sequences of spaces may be lost in some uses (especially multiple spaces)  
    Comma (",")  
    Question mark ("?")  
    

    In this case it's probably the colons in the path that are not being interpreted by Presto/Athena. To work around this you can use an alternative dividing character in the timestamp e.g. part=2019-11-12--14-15-16 or omit it altogether.