Function returns a function within lapply - nested lapply?

I thought my code was elegant until I ran into an issue with lapply. Below is a sample of the data, produced with dput. Note that I am using a data.table, not a data.frame.

full_data <- structure(list(FireplaceQu = c("Gd", "Gd", "TA", "TA", "Gd", 
"None", "Gd", "Gd", "None", "None", "None", "None", "Gd", "Gd", 
"Gd", "None"), BsmtQual = c("TA", "Gd", "Gd", "TA", "Gd", "TA", 
"Ex", "TA", "TA", "TA", "TA", "Ex", "TA", "Ex", "Ex", "Gd"), 
    CentralAir = c("Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", 
    "N", "Y", "Y", "Y", "Y", "Y", "Y")), .Names = c("FireplaceQu", 
"BsmtQual", "CentralAir"), class = "data.frame", row.names = c(NA, 
-16L))

library(data.table)
setDT(full_data)


cols = c('FireplaceQu', 'BsmtQual', 'CentralAir')

FireplaceQu=c('None','Po','Fa','TA','Gd','Ex')
BsmtQual=c('None','Po','Fa','TA','Gd','Ex')
CentralAir=NA

cust_levels <- list(FireplaceQu, BsmtQual, CentralAir)

# I modified a function from SO to encode by a fixed set of levels instead of the default sort order.
# https://stackoverflow.com/questions/38620424/label-encoder-functionality-in-r
# Function that returns a function which encodes a vector by its position in the custom levels
lev_index = 1
label_encoder = function(vec){
    levels = cust_levels[[lev_index]]
    lev_index = lev_index + 1
    function(x){
        match(x, levels)
    }
}

full_data[, (cols) := lapply(.SD, lapply(.SD, label_encoder)), .SDcols = cols]

I know I can get this to work in a for loop, but I thought I would try lapply. I'm confused about how to use it with a function that returns a function as its value, which then needs to be evaluated.

I ultimately want to create integer values ordered based on the order of the cust_levels. Bonus if I can get rid of the lev_index!

Example input:

FireplaceQu BsmtQual CentralAir
       None       Gd          Y
         TA       Gd          Y
         TA       Gd          Y
         Gd       TA          Y

Example output:

FireplaceQu BsmtQual CentralAir
          1        5         NA
          4        5         NA
          4        5         NA
          5        4         NA

1 answer

  • answered 2018-08-09 01:17 mt1022

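    You can do this with mapply, which walks over the columns and cust_levels in parallel and calls match on each pair, so neither the encoder closure nor lev_index is needed: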

    full_data[, (cols) := mapply(match, .SD, cust_levels, SIMPLIFY = FALSE), .SDcols = cols]
    
    # > full_data
    #     FireplaceQu BsmtQual CentralAir
    #  1:           5        4         NA
    #  2:           5        5         NA
    #  3:           4        5         NA
    #  4:           4        4         NA
    #  5:           5        5         NA
    #  6:           1        4         NA
    #  7:           5        6         NA
    #  8:           5        4         NA
    #  9:           1        4         NA
    # 10:           1        4         NA
    # 11:           1        4         NA
    # 12:           1        6         NA
    # 13:           5        4         NA
    # 14:           5        6         NA
    # 15:           5        6         NA
    # 16:           1        5         NA
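
If you still want the function-returning-a-function form, here is a minimal sketch of how it could look. Two things go wrong in the question's version: `lev_index = lev_index + 1` inside `label_encoder` only creates a local copy, so the global counter never advances, and `lapply(.SD, label_encoder)` yields a list of functions, which the outer `lapply` cannot use as its `FUN`. The sketch below passes the levels in directly and applies encoder i to column i with `Map`. The name `make_encoder`, the three-row sample, and the named `cust_levels` list are assumptions made for illustration; it uses a plain data.frame so it runs without extra packages.

```r
# Small sample in base R; the values are illustrative, not the full data set.
full_data <- data.frame(
  FireplaceQu = c("Gd", "TA", "None"),
  BsmtQual    = c("TA", "Gd", "Ex"),
  CentralAir  = c("Y", "Y", "N"),
  stringsAsFactors = FALSE
)

cols <- c("FireplaceQu", "BsmtQual", "CentralAir")
quality <- c("None", "Po", "Fa", "TA", "Gd", "Ex")

# Naming the list ties each set of levels to its column, so no counter is needed.
cust_levels <- list(FireplaceQu = quality, BsmtQual = quality, CentralAir = NA)

# make_encoder is a hypothetical helper: a function factory that captures
# `levels` and returns a function encoding a vector by position in `levels`.
make_encoder <- function(levels) {
  force(levels)  # evaluate now so each closure keeps its own levels
  function(x) match(x, levels)
}

# One encoder per column, in the order given by `cols`.
encoders <- lapply(cust_levels[cols], make_encoder)

# Apply encoder i to column i and write the results back.
full_data[cols] <- Map(function(f, x) f(x), encoders, full_data[cols])

print(full_data)

# With a data.table, the same idea should work as (untested assumption):
# full_data[, (cols) := Map(function(f, x) f(x), encoders, .SD), .SDcols = cols]
```

For a one-off encoding the `mapply(match, ...)` call above is shorter; the factory form mainly pays off when the same encoders need to be reused on another table, such as a test set.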