Apache Spark Dataframe - Get length of each column

Question: In an Apache Spark DataFrame, using Python (PySpark), how can I get the data type and the maximum length of each column? I'm using the latest version of Python.

With a pandas DataFrame, I do it as follows:

import pandas as pd

df = pd.read_csv(r'C:\TestFolder\myFile1.csv', low_memory=False)

# Print the maximum string length for each column
for col in df:
    print(col, '->', df[col].str.len().max())

1 answer

  • answered 2022-05-07 05:16 Vaebhav

    PySpark also has a describe method, similar to pandas, which you can use in this case:

    # describe() returns a DataFrame; call show() to display the statistics
    sparkDF.describe().show()
    
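    Note that describe() reports summary statistics (count, mean, stddev, min, max) rather than column lengths directly. As a rough sketch of how the question's goal (data type and maximum length per column) could be reached in PySpark, one option is to combine DataFrame.dtypes with the length and max functions from pyspark.sql.functions. The DataFrame name sparkDF and the CSV path are assumptions carried over from the question:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    # Illustrative load, reusing the file path from the question;
    # without inferSchema, every column is read as a string
    sparkDF = spark.read.csv(r'C:\TestFolder\myFile1.csv', header=True)

    # Summary statistics for each column
    sparkDF.describe().show()

    # Data type and maximum string length of each column
    for name, dtype in sparkDF.dtypes:
        max_len = sparkDF.select(F.max(F.length(F.col(name)))).first()[0]
        print(name, dtype, '->', max_len)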
