PySpark: Unsupported literal type class java.util.ArrayList

I am using Python 3 with Spark 2.2.0. I want to apply my UDF to a column and pass a specific list of strings to it as a second argument.

from pyspark.sql.functions import udf

df = ['Apps A', 'Chrome', 'BBM', 'Apps B', 'Skype']

browser_list = ['Chrome', 'Firefox', 'Opera']
chat_list = ['WhatsApp', 'BBM', 'Skype']

def calc_app(app, app_list):
    sum = 0
    for data in app:
        name = data['name']
        if name in app_list:
            sum += 1
    return sum

calc_appUDF = udf(calc_app)
df = df.withColumn('app_browser', calc_appUDF(df['apps'], browser_list))
df = df.withColumn('app_chat', calc_appUDF(df['apps'], chat_list))

But it fails with the error: 'Unsupported literal type class java.util.ArrayList'


  • answered 2018-01-14 11:59 Prem

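    If I understand your requirement correctly, you should try the following. The arguments of a UDF call must be columns, and in Spark 2.2 a plain Python list cannot be turned into a literal column, which is what the 'Unsupported literal type class java.util.ArrayList' error is about. Instead, bind the list into the UDF through a closure and pass only the 'apps' column: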

    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import IntegerType
    
    # sample data
    df_list = ['Apps A', 'Chrome', 'BBM', 'Apps B', 'Skype']
    df = sqlContext.createDataFrame([(l,) for l in df_list], ['apps'])
    df.show()
    
    # the lists to match against
    browser_list = ['Chrome', 'Firefox', 'Opera']
    chat_list = ['WhatsApp', 'BBM', 'Skype']
    
    # udf definition
    def calc_app(app, app_list):
        # 1 if the app name appears in app_list, otherwise 0
        return 1 if app in app_list else 0
    
    def calc_appUDF(app_list):
        # app_list is captured by the lambda's closure, so it never has to be
        # passed to the UDF as a column; IntegerType makes the result an int
        # column instead of the default string type
        return udf(lambda l: calc_app(l, app_list), IntegerType())
    
    # add new columns
    df = df.withColumn('app_browser', calc_appUDF(browser_list)(col('apps')))
    df = df.withColumn('app_chat', calc_appUDF(chat_list)(col('apps')))
    df.show()
    

    Sample input:

    +------+
    |  apps|
    +------+
    |Apps A|
    |Chrome|
    |   BBM|
    |Apps B|
    | Skype|
    +------+
    

    Output is:

    +------+-----------+--------+
    |  apps|app_browser|app_chat|
    +------+-----------+--------+
    |Apps A|          0|       0|
    |Chrome|          1|       0|
    |   BBM|          0|       1|
    |Apps B|          0|       0|
    | Skype|          0|       1|
    +------+-----------+--------+