Read only the first n columns of a Spark Dataset
I have a Dataset with more than 5000 columns, and an OutOfMemoryError is thrown when I try to read it, even when limiting to 10 rows. There is another post on the cause of this exception, so I want to read only the first n columns to avoid the error.
I could not find an API call that does this; only the number of rows can be restricted (e.g. with ds.limit(10)).
Is there a way to restrict the read to only the first few columns? Thanks.
Given that your Dataset is ds, you can extract the first n column names into an Array:

    val n = 2
    val firstNCols = ds.columns.take(n)
and then select only these columns from the Dataset:
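The final snippet of the answer appears to have been cut off. A minimal sketch of the select step, assuming a SparkSession and using a small hypothetical DataFrame in place of the wide one:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("firstNCols")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical narrow stand-in for the 5000-column dataset.
val ds: DataFrame = Seq((1, "a", true), (2, "b", false))
  .toDF("id", "name", "flag")

val n = 2
val firstNCols = ds.columns.take(n)              // Array("id", "name")

// Map each column name to a Column and splat the array
// into select's varargs parameter.
val firstN = ds.select(firstNCols.map(col): _*)
firstN.show()
```

Note that select is applied after the Dataset is defined, but with a columnar source such as Parquet, Spark's column pruning means the unselected columns are generally not read from disk, so this can also help with the memory pressure described in the question.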