How to create columns in pandas df with .apply and user defined function

I'm trying to create several columns in a pandas DataFrame at once, where each column name is a key in a dictionary and the function returns 1 if any of the values corresponding to that key are present.

My DataFrame has 3 columns: jp_ref, jp_title, and jp_description. Essentially, I'm searching each jp_description for the words assigned to a key, then populating that key's column with 1s and 0s depending on whether any of those words appear in the jp_description.

import pandas as pd

jp_title = ['software developer', 'operations analyst', 'it project manager']
jp_ref = ['j01', 'j02', 'j03']
jp_description = ['software developer with java and sql experience', 'operations analyst with ms in operations research, statistics or related field. sql experience desired.', 'it project manager with javascript working knowledge']

myDict = {'jp_title': jp_title, 'jp_ref': jp_ref, 'jp_description': jp_description}

data = pd.DataFrame(myDict)

technologies = {'java':['java','jdbc','jms','jconsole','jprobe','jax','jax-rs','kotlin','jdk'],

def term_search(doc,tech):
    for term in technologies[tech]:
        if term in doc:
            return 1
            return 0

for tech in technologies:
    data[tech] = data.apply(term_search(data['jp_description'],tech))

I received the following error but don't understand it:

TypeError: ("'int' object is not callable", 'occurred at index jp_ref')

1 answer

  • answered 2019-07-18 15:56 tawab_shakeel

    Your logic is wrong: you traverse the list in a loop, but the function returns 0 or 1 after the first iteration, so the jp_description value is never compared against the complete list.

    Instead, split jp_description into words and intersect them with the term list for that key in the technologies dict. If any common elements exist, a matching term was found, so return 1; otherwise return 0.

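    def term_search(doc, tech):
        # Split the description into words and intersect them with this key's terms.
        words = set(doc.split(" "))
        common_elem = words.intersection(technologies[tech])
        # Any overlap means at least one term is present.
        if len(common_elem) > 0:
            return 1
        return 0

    for tech in technologies:
        # Apply to the jp_description column only, passing a function rather than a call result.
        data[tech] = data['jp_description'].apply(lambda x: term_search(x, tech))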

    Output:

         jp_title            jp_ref  jp_description              java  javascript  sql
    0    software developer  j01     software developer....      1     0           1
    1    operations analyst  j02     operations analyst ..       0     0           1
    2    it project manager  j03     it project manager...       0     1           0
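
For reference, here is a self-contained sketch of the whole approach that can be run as-is. Only the 'java' term list comes from the question; the 'javascript' and 'sql' lists are placeholder assumptions, since the original dict is cut off. This variant also differs from the answer in two ways: it takes the term list as an argument instead of reading the global dict, and it strips surrounding punctuation from each word so that a term followed by a period still matches.

```python
import pandas as pd

data = pd.DataFrame({
    "jp_title": ["software developer", "operations analyst", "it project manager"],
    "jp_ref": ["j01", "j02", "j03"],
    "jp_description": [
        "software developer with java and sql experience",
        "operations analyst with ms in operations research, statistics or "
        "related field. sql experience desired.",
        "it project manager with javascript working knowledge",
    ],
})

technologies = {
    "java": ["java", "jdbc", "jms", "jconsole", "jprobe", "jax", "jax-rs", "kotlin", "jdk"],
    "javascript": ["javascript"],  # assumed placeholder, not from the question
    "sql": ["sql"],                # assumed placeholder, not from the question
}

def term_search(doc, terms):
    """Return 1 if any whole word of doc appears in terms, else 0."""
    # Strip common punctuation so "sql." or "research," still match.
    words = {word.strip(".,;:()") for word in doc.split()}
    return int(not words.isdisjoint(terms))

for tech, terms in technologies.items():
    # Series.apply forwards the extra positional argument to term_search.
    data[tech] = data["jp_description"].apply(term_search, args=(terms,))

print(data[["jp_ref", "java", "javascript", "sql"]])
```

Because the match is on whole words, 'java' does not fire on 'javascript', which is consistent with the java column in the output above.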