How do I read sequence data in Scala in Spark?

This is my first attempt at reading sequence-format data in Scala. I would greatly appreciate it if someone could help me with the right command.

Data:

hdfs dfs -cat orders03132_seq/part-m-00000 | head
SEQ!org.apache.hadoop.io.LongWritableordeG�Y���&���]E�@��

My command:

sc.sequenceFile("orders03132_seq/part-m-00000", classOf[Int], classOf[String]).first

Error:

18/03/13 16:59:28 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.RuntimeException: java.io.IOException: WritableName can't load class: orders
    at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:2103)

Thank you very much in advance.

1 answer

  • answered 2018-03-14 11:11 suj1th

    You would need to read it as a Hadoop File. You can do this with something like:

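    import org.apache.hadoop.mapred.SequenceFileInputFormat  // old-API (mapred) input format, which hadoopFile expects
    sc.hadoopFile[K, V, SequenceFileInputFormat[K, V]]("path/to/file")  // K, V: the file's key and value Writable classes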
    

    Refer to the documentation for details.
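
As a rough illustration of the hadoopFile call above with concrete types, here is a minimal sketch. It assumes a file whose keys are LongWritable and whose values are Text; the sample data, the temporary directory, and the "seq-demo" app name are all hypothetical. The error in the question says the value class named in the file header (orders) could not be loaded, so for that file the orders class would presumably have to be on the classpath and used in place of Text.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.SequenceFileInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// In spark-shell this returns the existing context; otherwise it creates a
// local one so the sketch can run on its own.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[1]").setAppName("seq-demo"))

// Hypothetical sample data, written as (LongWritable, Text) pairs.
// The output directory must not exist yet, hence the "out" child path.
val dir = Files.createTempDirectory("seq-demo").resolve("out").toString
sc.parallelize(Seq((1L, "first"), (2L, "second")), 1).saveAsSequenceFile(dir)

// Read it back through the old-API (mapred) SequenceFileInputFormat.
// The type parameters must match the key and value classes in the file header.
val raw = sc.hadoopFile[LongWritable, Text, SequenceFileInputFormat[LongWritable, Text]](dir)

// Hadoop reuses Writable objects between records, so convert them to plain
// Scala values before collecting.
val records = raw.map { case (k, v) => (k.get, v.toString) }.collect().sortBy(_._1)

records.foreach(println)
```

Converting the Writables inside map before calling collect or first avoids surprises from Hadoop's object reuse, and it also sidesteps serialization problems, since Writable classes are not java.io.Serializable.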