How to write a value calculated from an iterated dataframe row into a new column of the same row

Unfortunately, I can't get my code to write a value calculated from a row back into that same row, so that the result is a new dataframe with two additional columns of calculated values.

My dataframe looks like this:

VP  text1  text2
1   Text1  Text2
2   Text3  Text4
3   Text5  Text6

My goal should look like this:

VP  text1  text2  error_count1  error_count2
1   Text1  Text2  2             5
2   Text3  Text4  4             7
3   Text5  Text6  8             9

I tried this:

import pandas as pd
import Levenshtein

# words() is my own helper and is not shown here.
def compare_texts(text1: str, text2: str, data: pd.DataFrame, switch: bool):
    """
    Compare each text from data with text1 and text2. Return the errors found.

    :param text1: Correct Text 1
    :param text2: Correct Text 2
    :param data: dataframe of participant data
    :param switch: if True, compare the texts in swapped order

    :return data: new dataframe
    """

    # Insert new empty columns for insertion.
    if switch == False:
        data["error_count1"]        = ""
        data["error_count2"]        = ""
        data["error_count1_rev"]    = ""
        data["error_count2_rev"]    = ""

    for index, row in data.iterrows():
        # get participant data into variables to pass as parameter
        participant = row['VP']
        pp_text1 = row['text1']
        pp_text2 = row['text2']

        if switch == False:
            error_count_1 = Levenshtein.distance(words(pp_text1), words(text1))
            error_count_2 = Levenshtein.distance(words(pp_text2), words(text2))

            data[index,'error_count1'] = error_count_1  # Here is the problematic code that needs to be adjusted
            data[index,'error_count2'] = error_count_2  
        else:    # Switch compared text, because we changed texts in week 3. 
            error_count_1 = Levenshtein.distance(words(pp_text2), words(text1))
            error_count_2 = Levenshtein.distance(words(pp_text1), words(text2))

            data['error_count1_rev'] = error_count_1
            data['error_count2_rev'] = error_count_2 

    return data

But the end result, unfortunately, looks like this:

VP  text1  text2  error_count1  error_count2  error_count1  error_count2  error_count1  error_count2
1   Text1  Text2  2             5             4             7             8             9
2   Text3  Text4  2             5             4             7             8             9
3   Text5  Text6  2             5             4             7             8             9

If I omit "index", the value calculated for the last row is stored in every row of the column.

So I somehow need to store each value only in its own row of the corresponding column.

2 answers

  • answered 2022-05-04 10:26 Daweo

    I suggest using pandas.DataFrame.apply for this task. Consider the following simple example: say you have text1 and text2 and want to find out whether they are the same, both case-sensitively and case-insensitively. Then you might do

    import pandas as pd
    df = pd.DataFrame({'text1':['ABC','ABC','XYZ'],'text2':['ABC','abc','ABC']})
    def same(row):
        return {"sensitive":row["text1"]==row["text2"],"insensitive":row["text1"].lower()==row["text2"].lower()}
    dfsame = df.apply(same,axis=1,result_type="expand")
    dffinal = pd.concat([df,dfsame],axis=1)


      text1 text2  sensitive  insensitive
    0   ABC   ABC       True         True
    1   ABC   abc      False         True
    2   XYZ   ABC      False        False
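
The same pattern can be carried over to the error counts from the question. The following is only a minimal sketch under stated assumptions: word_distance is a hypothetical stand-in for the asker's Levenshtein.distance(words(...), words(...)) call (a plain word-level edit distance), add_error_counts is an illustrative name, and the sample texts are made up.

```python
import pandas as pd


def word_distance(a: str, b: str) -> int:
    """Hypothetical stand-in for the question's distance call:
    edit distance over whitespace-separated words."""
    s, t = a.split(), b.split()
    prev = list(range(len(t) + 1))
    for i, sw in enumerate(s, start=1):
        curr = [i]
        for j, tw in enumerate(t, start=1):
            cost = 0 if sw == tw else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def add_error_counts(data: pd.DataFrame, text1: str, text2: str) -> pd.DataFrame:
    """Return a new frame with one error count per row and reference text."""
    def counts(row):
        return {
            "error_count1": word_distance(row["text1"], text1),
            "error_count2": word_distance(row["text2"], text2),
        }

    # result_type="expand" turns the returned dict into one column per key
    errors = data.apply(counts, axis=1, result_type="expand")
    return pd.concat([data, errors], axis=1)


# Made-up sample data, only for illustration
df = pd.DataFrame({
    "VP": [1, 2, 3],
    "text1": ["a b c", "a x c", "x y z"],
    "text2": ["d e", "d e f", "q"],
})
result = add_error_counts(df, "a b c", "d e")
print(result)
```

Because apply builds the new columns row by row and concat attaches them by index, each value ends up in the row it was calculated from, and the original frame is left unchanged.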

  • answered 2022-05-04 11:40 Xixiaxixi


    Use loc to address a single cell by row label and column name: data.loc[index,'error_count1'] = error_count_1


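    I tested your code, but got a result like the one below. Writing data[idx,'add col'] treats the tuple (idx, 'add col') as a column name, so every iteration adds a whole new column instead of filling one cell.

    for idx, row in data.iterrows():
        data[idx,'add col'] = idx

      text1 text2  (0, add col)  (1, add col)  (2, add col)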
    0   ABC   ABC             0             1             2
    1   ABC   abc             0             1             2
    2   XYZ   ABC             0             1             2
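
For completeness, here is a minimal sketch of the question's loop with the .loc fix applied. It rests on assumptions: count_errors is a hypothetical placeholder metric (not Levenshtein), the switch branch is left out, and the sample texts are made up.

```python
import pandas as pd


def count_errors(candidate: str, reference: str) -> int:
    """Hypothetical placeholder metric (not Levenshtein):
    word positions that differ, plus the difference in word count."""
    a, b = candidate.split(), reference.split()
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def compare_texts(text1: str, text2: str, data: pd.DataFrame) -> pd.DataFrame:
    data = data.copy()  # work on a copy so the caller's frame is not modified
    # Create the target columns once, with an integer default
    data["error_count1"] = 0
    data["error_count2"] = 0

    for index, row in data.iterrows():
        # .loc[row_label, column_name] writes exactly one cell
        data.loc[index, "error_count1"] = count_errors(row["text1"], text1)
        data.loc[index, "error_count2"] = count_errors(row["text2"], text2)
    return data


# Made-up sample data, only for illustration
df = pd.DataFrame({
    "VP": [1, 2, 3],
    "text1": ["a b c", "a x c", "x y z"],
    "text2": ["d e", "d e f", "q"],
})
result = compare_texts("a b c", "d e", df)
print(result)
```

The key difference from the original loop is the indexer: data.loc[index, column] selects a cell, whereas data[index, column] creates a column named by the tuple and data[column] = value overwrites the whole column.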
