How to convert timestamp to AWS data lake s3 timestamp
In my DataFrame I have a timestamp column with values in the format "%Y-%m-%d %H:%M:%S". I want to save this DataFrame to parquet using:
df.write.partitionBy('p_timestamp').mode("overwrite").parquet('output/proto.parquet')
and I want the timestamp partition to be saved in the AWS data lake S3 timestamp format: "%Y-%m-%dT%H-%M"
I tried:
df.select(F.col("p_timestamp"), F.date_format(F.col("p_timestamp"), "%Y-%m-%dT%H-%M").alias("date_format")).show()
but I got an error:
Illegal pattern character 'T'
I would be grateful for any tips.
1 answer
answered 2022-05-04 11:29
fskj
I think the formatting string you are looking for is:
"yyyy-MM-dd'T'HH-mm"
You get the error because the T is not quoted with ' to mark it as a literal string. You can consult Spark's Datetime Patterns for Formatting and Parsing if you need something a little different.
A small example:
import pyspark.sql.types as T
import pyspark.sql.functions as F

# Build a one-row DataFrame with a string timestamp column
schema = T.StructType([T.StructField("Time", T.StringType(), False)])
df = spark.createDataFrame([('1970-09-01 02:02:10',)], schema=schema)

# Quote the literal 'T' so it is not interpreted as a pattern letter
df.withColumn("Convert", F.date_format(F.col("Time"), "yyyy-MM-dd'T'HH-mm")).show()
Output:
+-------------------+----------------+
|               Time|         Convert|
+-------------------+----------------+
|1970-09-01 02:02:10|1970-09-01T02-02|
+-------------------+----------------+
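Applied to your original goal, a minimal sketch might look like the following (assuming df already holds the string p_timestamp column from your question; the column and output names are taken from there): reformat the partition column first, then write as before.

import pyspark.sql.functions as F

# Rewrite the partition column into the desired S3-friendly format
df = df.withColumn("p_timestamp", F.date_format(F.col("p_timestamp"), "yyyy-MM-dd'T'HH-mm"))

# Each partition directory then appears as e.g.
# output/proto.parquet/p_timestamp=1970-09-01T02-02/
df.write.partitionBy("p_timestamp").mode("overwrite").parquet("output/proto.parquet")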