Writing spark.sql dataframe result to parquet file

I created the following Spark session with Hive support:

from pyspark.sql import SparkSession

# create the Spark session (with Hive support) and connection
spark = SparkSession.builder.appName("appName").enableHiveSupport().getOrCreate()

and I can see the results of the following query:

spark.sql("select year(plt_date) as Year, month(plt_date) as Month, count(build) as B_Count, count(product) as P_Count from first_table full outer join second_table on key1=CONCAT('SS',key_2) group by year(plt_date), month(plt_date)").show()

However, when I try to write the resulting DataFrame from this query to HDFS, I get the following error:

saving spark.sql.dataframe.DataFrame in hdfs

I can save the resulting DataFrame of a simpler version of this query to the same path. The problem appears only when I add functions such as count(), year(), etc.

What is the problem, and how can I save the results to HDFS?

1 answer

  • answered 2019-12-09 12:28 Ajinkya Bhore

    The error is caused by the '(' characters in the column name 'year(CAST(plt_date AS DATE))'.

    Use an alias to rename the column:

    data = data.selectExpr("year(CAST(plt_date AS DATE)) as nameofcolumn")
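    If there are many generated column names, a minimal sketch of the same idea is to rename them all before writing. As far as I know, Spark's Parquet writer (at least in Spark 2.x) rejects column names containing any of the characters ` ,;{}()\n\t=` and asks you to use an alias. The helper names `parquet_safe_name` and `rename_for_parquet` below are hypothetical; the sketch assumes `df` is a PySpark DataFrame and relies only on `DataFrame.columns` and `DataFrame.toDF(*cols)`.

```python
import re

# Assumption: these are the characters Spark's Parquet writer rejects in
# column names (space , ; { } ( ) newline tab =), per its error message.
INVALID_PARQUET_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def parquet_safe_name(name: str) -> str:
    """Hypothetical helper: replace every rejected character with '_'."""
    return INVALID_PARQUET_CHARS.sub("_", name)


def rename_for_parquet(df):
    """Hypothetical helper: return a copy of `df` with Parquet-safe column names.

    `df` is assumed to be a pyspark.sql.DataFrame; toDF(*cols) returns a new
    DataFrame whose columns are renamed positionally to `cols`.
    """
    return df.toDF(*[parquet_safe_name(c) for c in df.columns])
```

    For example, `rename_for_parquet(data).write.parquet(path)` with your own HDFS `path`. Note that replacing characters with underscores can make two names collide; giving each computed column an explicit alias in the SQL query avoids that.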


    Reference: Rename Spark Column