I cannot see the .csv file I have saved using Spark & Scala

I am trying to save the result of a SQL query over Parquet data, held in a DataFrame, to a .csv file.

This is how I wrote my query:

[screenshot of the query]

And this is how I save the data:

[screenshot of the save code]

But the expected .csv output file is nowhere to be seen:

[screenshot of the output directory]

2 answers

  • answered 2018-05-16 06:03 Yayati Sule

    You can try writing to file:///home/hadoop/apr2.csv.gz instead of /home/hadoop/apr2.csv. The snippet below is adapted from Databricks' spark-csv module on GitHub (Spark CSV):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Read the source CSV (input path "cars.csv" is from the spark-csv README)
    val df = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true") // Use first line of all files as header
        .option("inferSchema", "true") // Automatically infer data types
        .load("cars.csv")

    // Select columns and write the result as gzip-compressed CSV
    val selectedData = df.select("year", "model")
    selectedData.write
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
        .save("newcars.csv.gz")

    As for the file:/// prefix: it is usually added when you want to perform a read or write operation against the local filesystem instead of HDFS.
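    To make the local-vs-HDFS distinction concrete, here is a minimal sketch of the same write targeted at each filesystem. The paths below are illustrative, not from the original post:

```scala
// Writing to the local filesystem on the node (explicit file:// scheme).
selectedData.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("file:///home/hadoop/apr2_csv")

// Writing to HDFS (explicit hdfs:// scheme; a bare path like /user/hadoop/...
// typically resolves to the cluster's default filesystem, which is often HDFS).
selectedData.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("hdfs:///user/hadoop/apr2_csv")
```

    If the file "disappeared", it may simply have been written to HDFS rather than to the local directory you are checking.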

  • answered 2018-05-16 10:15 Mugdha

    To add the dependency, start your Spark shell using the following command:

    spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
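    As an aside (not part of the original answer): the --packages step is only needed on Spark 1.x. From Spark 2.0 onward, CSV support is built into Spark SQL, so the same workflow needs no external package. A sketch, reusing the paths from this answer:

```scala
// Spark 2.x+: csv() is a built-in data source, no spark-csv package required.
val df = spark.read.parquet("file:///home/mugdha/users.parquet")

df.select("name", "favorite_color")
    .write
    .option("header", "true")
    .csv("file:///home/mugdha/testSave")
```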

    Read your Parquet file using:

    val df = sqlContext.read.parquet("file:///home/mugdha/users.parquet")

    Apply filter and select operations:

    val filteredDF = df.select("name","favorite_color")

    To save filteredDF, you can use code along the following lines (the snippet was missing from the original answer; this is a sketch assuming the testSave output path on the local filesystem):

    filteredDF.write
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .save("file:///home/mugdha/testSave")
    Inside the testSave folder, you can find your stored CSV.
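    Note that Spark writes CSV output as a directory (testSave) containing one part file per partition plus a _SUCCESS marker, not as a single .csv file; this is likely why the asker's expected file appeared to be missing. If a single output file is needed and the data is small, one common approach is to coalesce to one partition before writing. A sketch, assuming the same testSave path:

```scala
// coalesce(1) moves all data into one partition, so the output directory
// contains a single part-xxxxx file (only sensible for small result sets,
// since one executor must hold all the data).
filteredDF.coalesce(1)
    .write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("file:///home/mugdha/testSave")
```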