R: Combining lapply and left_join to conditionally merge dataframes
I'm hoping someone out there might be able to help me get to the root of a frustrating problem I've been having with my code in R. I have a list consisting of dataframes, and I want to left join each element on one of two OTHER dataframes (call them A and B). Which of these secondary dataframes to join on depends on the element's position in the list. For my purposes, I'd like every odd element to be leftjoined to A, and every even element to be leftjoined to B.
library(dplyr)
DF < data.frame(Num = c("1","2"), Let = c("a","b"), stringsAsFactors = FALSE)
A < data.frame(Let = c("a","b"), Col = c("Yellow","Red"), stringsAsFactors = FALSE)
B < data.frame(Let = c("a","b"), Col = c("Green","Blue"), stringsAsFactors = FALSE)
LIST < list(DF, DF)
So far, I've tried to do this in two different ways. The first approach involved an ifelse statement. If I apply such a statement to assign an integer value based on position, I obtain the expected result. Similarly, when I do away with the ifelse statement and simply perform a series of leftjoins on the list elements, everything works as expected.
lapply(seq_along(LIST), function(x, {ifelse((x %% 2)==0, y[[x]] < 1, y[[x]] < 2)}, y = LIST)
lapply(seq_along(LIST), function(x, {left_join(y[[x]], A, by = c("Let"))}, y = LIST)
Where I run into problems is when I attempt to combine the ifelse statement and the left join. In particular, I end up with a list consisting of lists, each of which retains only the first column of the original corresponding dataframe.
lapply(seq_along(LIST), function(x, y) {ifelse((x %% 2)==0, left_join(y[[x]], A, by = c("Let")), left_join(y[[x]], B, by = c("Let")))}, y = LIST)
Here is the output I would LIKE to obtain:
[[1]]
Let Num Col
1 a 1 Yellow
2 b 2 Red
[[2]]
Let Num Col
1 a 1 Green
2 b 2 Blue
I'm certain there's an absurdly simple solution to the problem. Can anyone see it?
Thanks in advance! Matthew
P.S.: I also attempted a second approach, applying subsetting rather than an ifelse statement. Again, however, I run into problems. The first line below works as expected, but the second returns an error, as though R doesn't recognize the list indices:
lapply(seq_along(LIST), function(x, y) {left_join(y[[x > 0]], A, by = c("Let"))}, y = LIST)
lapply(seq_along(LIST), function(x, y) {left_join(y[[x == 1]], A, by = c("Let"))}, y = LIST)
Error in y[[x == 1]] : attempt to select less than one element in integerOneIndex
4 answers

I'm not entirely sure I understand your issue.
The following solution is based on a reproduction of the output of
lapply(seq_along(LIST), function(x, y) {left_join(y[[x > 0]], A, by = c("Let"))}, y = LIST)
from your postscript. Note that the otherlapply
lines throw errors.library(tidyverse); map(list(A, B), function(x) left_join(DF, x)) #Joining, by = "Let" #Joining, by = "Let" #[[1]] # Num Let Col #1 1 a Yellow #2 2 b Red # #[[2]] # Num Let Col #1 1 a Green #2 2 b Blue
We use
purrr:map
withdplyr::left_join
to joinA
andB
withDF
.
The same can be achieved in base R using
Map
andmerge
:mapply(function(x) merge(DF, x, by = "Let"), list(A, B), SIMPLIFY = F) #[[1]] # Let Num Col #1 a 1 Yellow #2 b 2 Red # #[[2]] # Let Num Col #1 a 1 Green #2 b 2 Blue

Overview
Use
base::mapply()
to return a list of data frames which were conditionally merged. Here, I supply two inputs:seq.along( along.with = LIST )
to obtain the number of elements inLIST
; andLIST
itself.
The
FUN
argument is an anonymous function that takes in two inputs i
andj
 and tests whether the current element inLIST
is even or odd numbered prior to performing a leftjoin using base::merge().If the result of the modulus operator for the i^{th} element in
seq.along( along.with = LIST )
is equal to zero, then leftjoinB
onto the j^{th} element inLIST
; if it does not equal zero, then perform the leftjoinA
onto the j^{th} element inLIST
.# load data DF < data.frame(Num = c("1","2"), Let = c("a","b"), stringsAsFactors = FALSE) A < data.frame(Let = c("a","b"), Col = c("Yellow","Red"), stringsAsFactors = FALSE) B < data.frame(Let = c("a","b"), Col = c("Green","Blue"), stringsAsFactors = FALSE) LIST < list(DF, DF) # goal: left join all odd elements in LIST[[j]] # to `A` and all even elements to `B` merged.list < mapply( FUN = function( i, j ) if( i %% 2 == 0 ){ merge( x = j , y = B , by = "Let" , all.x = TRUE ) } else{ merge( x = j , y = A , by = "Let" , all.x = TRUE ) } , seq_along( along.with = LIST ) , LIST , SIMPLIFY = FALSE ) # view results merged.list # [[1]] # Let Num Col # 1 a 1 Yellow # 2 b 2 Red # # [[2]] # Let Num Col # 1 a 1 Green # 2 b 2 Blue # end of script #
Tidyverse approach
The results are replicated below using functions from the purrr and dplyr packages.
library( dplyr ) library( purrr ) merged.list < map2( .x = seq_along( along.with = LIST ) , .y = LIST , .f = function( i, j ) if( i %% 2 == 0 ){ left_join( x = j , y = B , by = "Let" ) } else{ left_join( x = j , y = A , by = "Let" ) }) # view results merged.list # [[1]] # Num Let Col # 1 1 a Yellow # 2 2 b Red # # [[2]] # Num Let Col # 1 1 a Green # 2 2 b Blue # end of script #

It's possible that MauritsEvers has already answered your question but I thought I would address the obvious errors in R syntax and programming logic. Focus on the first lapplycall:
lapply(seq_along(LIST), function(x, {ifelse((x %% 2)==0, y[[x]] < 1, y[[x]] < 2)}, y = LIST)
First and perhaps trivial is the missing closing
)
argument list in the firstlapply
action. Next and more fundamental is the incorrect use ofifelse
as a programming construct. Theifelse
function is not designed for serial tests of dataobjects. It is only designed to be applied along a single vector. Theif(.){.}else{.}
function probably should have been used in thatlapply
call if serial choices were to be made.However, (and now trying to implement the first paragraph rather than continue correecting code) I think it would be much simpler to use logical indexing (with R's implicit recycling process) on the
LIST
object rather than any looping process. (This is not a tidyverse solution.) This code would segment the LIST into "odd" and "even" components:oddList < LIST[ c(TRUE,FALSE) ] # implicit seqalong by virtue of recycling evenList < LIST[ c(FALSE,TRUE) ]
We could use that type of results to make two oneliners that would accomplish your stated goals. I made the LISTobject four wide rather than two wide.
Abig < Reduce( function(x,y) {merge(x,y,by="Let")}, LIST, init=A) Warning message: In merge.data.frame(x, y, by = "Let") : column names ‘Num.x’, ‘Num.y’ are duplicated in the result Bbig < Reduce( function(x,y) {merge(x,y,by="Let")}, LIST, init=B) Warning message: In merge.data.frame(x, y, by = "Let") : column names ‘Num.x’, ‘Num.y’ are duplicated in the result
That is only a warning and here you can see what it was warning about:
> Abig Let Col Num.x Num.y Num.x Num.y 1 a Yellow 1 1 1 1 2 b Red 2 2 2 2
If you need those duplicated column names labeled uniquely (and I thin that would be a good idea), then:
names(Abig)[ grep("Num", names(Abig)) ] < paste0("Num.", seq_along( grep("Num", names(Abig)) ) ) Abig Let Col Num.1 Num.2 Num.3 Num.4 1 a Yellow 1 1 1 1 2 b Red 2 2 2 2

This solution is quite similar to
mapply
based solution already posted here by (@MauritsEvers & @aspiringurbandatascientist) but it usages a different approach tojoin
data.frames. Thedplyr::left_join
has been used to fit the purpose.library(dplyr) # Using mapply and left_join mapply(function(x,y){ if(y %% 2 == 1){ left_join(x, A, by="Let") }else { left_join(x, B, by="Let") } }, LIST, seq_along(LIST), SIMPLIFY = FALSE) # [[1]] # Num Let Col # 1 1 a Yellow # 2 2 b Red # # [[2]] # Num Let Col # 1 1 a Green # 2 2 b Blue