# Distance between the rows in dataset B, based on dataset A

I have two datasets, A and B.

I am interested in how far each row of B is from each row of A (both have the same columns).

Due to the size of B, computing dist() or parDist() on the stacked dataset of A and B and taking a subset isn't feasible.

More concretely: suppose A has 50000 rows and B has 250000. I want a 250000 x 50000 matrix (rows of B by rows of A) holding these distances.

Any solution I'm overlooking?

This worked for me with a smaller dataset and should work on yours. It splits A into chunks and computes summary statistics of the distances from each row of A to all rows of B, rather than storing the full distance matrix. It is still an all-to-all comparison in the end, since it iterates through every row of A. (If this is not what you're looking for, please provide a reproducible example and the expected output; that helps avoid mismatched answers like this one.)

``````
set.seed(1)
A <- as.data.frame(matrix(runif(500*2)*10, nrow=500))  # change 500 to 50000
B <- as.data.frame(matrix(runif(250000*2)*10, nrow=250000))

# Summarise the distances from each row of a chunk of A to all rows of B
myfun <- function(rowsofA, B) {
  Dx <- outer(rowsofA[,1], B[,1], "-")^2  # squared differences, column 1
  Dy <- outer(rowsofA[,2], B[,2], "-")^2  # squared differences, column 2
  Dist <- sqrt(Dx+Dy)  # Dist = sqrt((x1-x2)^2 + (y1-y2)^2)
  Summ <- data.frame(mean = apply(Dist, 1, mean),
                     sd   = apply(Dist, 1, sd),
                     min  = apply(Dist, 1, min),
                     max  = apply(Dist, 1, max))
  return(Summ)
}

library(purrr)
map_df(split(A, 1:5), ~myfun(.x, B))  # map_df also needs dplyr installed
``````

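With the 500-row dataset, `split(..., 1:5)` splits the data frame into five 100-row data frames. The grouping vector `1:5` is recycled, so rows are assigned round-robin (rows 1, 6, 11, ... go to the first chunk), and the combined output is ordered chunk by chunk rather than in the original row order of A. With a 50,000-row dataset, use something like `split(..., 1:100)` (500 rows per chunk) or `split(..., 1:1000)` (50 rows per chunk), depending on your memory.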
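To pick a chunk count, it may help to estimate the memory one chunk needs before running anything. The sketch below is not part of the original answer: `chunk_gib()` and `contiguous_chunks()` are hypothetical helpers. The factor of 3 assumes `myfun` holds three chunk-by-`nrow(B)` double matrices at once (`Dx`, `Dy`, `Dist`) and ignores any temporaries R creates. The second helper builds chunk ids for contiguous blocks of rows, which keeps the combined output in the row order of A.

```r
# Hypothetical helper: rough memory (GiB) for one chunk of `rows_per_chunk`
# rows of A against `n_b` rows of B. Assumes 8-byte doubles and three
# matrices alive at once (Dx, Dy, Dist); R temporaries are not counted.
chunk_gib <- function(rows_per_chunk, n_b, n_matrices = 3) {
  rows_per_chunk * n_b * 8 * n_matrices / 1024^3
}

chunk_gib(500, 250000)  # 500 rows per chunk, as with split(A, 1:100) on 50,000 rows
chunk_gib(50, 250000)   # 50 rows per chunk, as with split(A, 1:1000)

# Hypothetical helper: chunk ids for contiguous blocks of rows, so binding
# the per-chunk results keeps the original row order of A.
contiguous_chunks <- function(n_rows, rows_per_chunk) {
  ceiling(seq_len(n_rows) / rows_per_chunk)
}

ids <- contiguous_chunks(500, 100)
table(ids)  # five chunks of 100 rows each
```

Under these assumptions a 500-row chunk needs roughly 2.8 GiB and a 50-row chunk roughly 0.28 GiB. The ids can be passed to the answer's code as `map_df(split(A, ids), ~myfun(.x, B))`.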

Output with the 500-row dataset. Each row of the output gives the `mean`, `sd`, `min`, and `max` distance from one row of A to all rows of B.

``````
#         mean       sd          min       max
# 1   4.332120 1.922412 0.0104518694  9.179429
# 2   6.841677 2.798114 0.0044511643 13.195127
# 3   5.708658 2.601969 0.0131417242 11.788345
# 4   4.670345 2.139370 0.0104878996  9.521932
# 5   6.249670 2.716091 0.0069813098 12.473525
# 6   5.497154 2.476391 0.0127143548 11.108188
# 7   3.928659 1.551248 0.0077266976  7.954166
# etc
``````
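`myfun` is written for exactly two columns. If the datasets have more, one possible generalization is sketched below. It is an assumption, not something from the original answer: `cross_dist()` is a hypothetical helper that uses the identity |x - y|^2 = |x|^2 + |y|^2 - 2 x.y to get all pairwise Euclidean distances from one matrix product. The small example cross-checks it against `dist()` on the stacked data, which is only feasible because the example is tiny.

```r
# Hypothetical helper: Euclidean distances between every row of X and every
# row of Y (same columns), via |x - y|^2 = |x|^2 + |y|^2 - 2 * x.y
cross_dist <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  sq <- outer(rowSums(X^2), rowSums(Y^2), "+") - 2 * tcrossprod(X, Y)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
}

set.seed(1)
A_small <- matrix(runif(4 * 3), nrow = 4)  # 4 rows, 3 columns
B_small <- matrix(runif(6 * 3), nrow = 6)  # 6 rows, 3 columns

D <- cross_dist(A_small, B_small)
dim(D)  # rows of A_small by rows of B_small

# Reference: distances from the stacked data, keeping only the A-vs-B block
D_ref <- as.matrix(dist(rbind(A_small, B_small)))[1:4, 5:10]
all.equal(D, D_ref, check.attributes = FALSE)
```

Inside `myfun`, `Dist <- cross_dist(rowsofA, B)` could stand in for the three lines that compute `Dx`, `Dy`, and `Dist`; the summaries stay the same. One caveat: this identity can lose some precision for rows that are nearly identical, so the explicit difference form may be preferable when very small distances matter.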