How do I convert a sequence file to Parquet format?

I have a Hive table (test) that needs to end up in Parquet format. My source data is a set of SequenceFiles, which I will use to create and populate the table.

Once the table is created, is there a way to convert it to Parquet? I know we could have written, say,

CREATE TABLE default.test( user_id STRING, location STRING) 
PARTITIONED BY ( dt INT ) STORED AS PARQUET

when creating the table in the first place. However, in my case I have to create the table from SequenceFiles first, because that is the format my data arrives in and I cannot convert it to Parquet directly. Is there a way to convert the table to Parquet after it has been created and the data inserted?

1 answer

  • answered 2019-10-10 06:55 Piotr Findeisen

    To convert from sequence file to Parquet, you need to load the data into a new table with CREATE TABLE AS SELECT (CTAS).

    The question is tagged with presto, so I am giving you Presto syntax for this. I am including partitioning because the example in the question contains it.

    CREATE TABLE test_parquet WITH(format='PARQUET', partitioned_by=ARRAY['dt']) AS
    SELECT * FROM test_sequencefile;
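
If the conversion has to run in Hive itself rather than Presto, roughly the same thing can be sketched with a Parquet table plus a dynamic-partition insert. This is only a sketch under assumptions: the column list is taken from the DDL in the question, test_sequencefile is the source table name used in the answer above, and test_parquet is a placeholder name for the new table. The partition column has to come last in the SELECT list.

```sql
-- Hive sketch (assumed names: test_sequencefile is the existing
-- SequenceFile-backed table, test_parquet is the new Parquet table).
CREATE TABLE test_parquet (
  user_id  STRING,
  location STRING
)
PARTITIONED BY (dt INT)
STORED AS PARQUET;

-- Let Hive derive the partition values from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Copy the rows; the partition column (dt) must be the last column selected.
INSERT OVERWRITE TABLE test_parquet PARTITION (dt)
SELECT user_id, location, dt
FROM test_sequencefile;

-- Optional sanity check: run the same query against test_sequencefile
-- and compare the per-partition row counts.
SELECT dt, COUNT(*) AS row_count
FROM test_parquet
GROUP BY dt;
```

Either way, the original table is left untouched, so it can be dropped or kept as a staging table once the row counts match.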