Spark compare two dataframes

I need to compare two DataFrames and create a third DataFrame that shows the differences. The comparison has to follow a few conditions.

DF1 (struct type: empid: IntegerType, name: StringType, phone: IntegerType):

   1| amar | 12345
   2|  23  |<blank>

DF2: I cast each column of DF1 to its declared data type with col(c).cast(datatype):

   1| amar | 12345
   2|  null  | null

Here, since name is a string type, the value 23 was cast to null. Since phone is an integer type, the blank was also cast to null.

I need help generating a third DataFrame that points out only the real casting errors. If I treat every null as an error, the blank phone that was cast to null gets flagged as well, which I don't want. Please see the expected DF3 below.


empid | name | phone   | error
    1 | amar | 12345   | null
    2 | 23   | <blank> | name is wrong data type