I have a dataframe `df1` which contains 5 columns, two of which (`var1` & `var3`) I am using to `split` `df1` by, resulting in a list of dataframes `ls1`.

For each sub dataframe in `ls1` I want to `sample()` `x$var2`, `x$num` times with `x$probs` probabilities as follows:

Create data:

```
var1 <- rep(LETTERS[1:3], each = 6)
var2 <- rep(LETTERS[1:3], 6)
var3 <- rep(1:2, 3, each = 3)
num <- rep(c(10, 11, 13, 8, 20, 5), each = 3)
probs <- round(runif(18), 2)  # random, so the values differ on every run
# note: cbind() builds a character matrix, so every column of df1 ends up
# as character (or factor in R < 4.0), including num and probs
df1 <- as.data.frame(cbind(var1, var2, var3, num, probs))
ls1 <- split(df1, list(df1$var1, df1$var3))
```

Have a look at the first couple of list elements:

```
$A.1
  var1 var2 var3 num probs
1    A    A    1  10  0.06
2    A    B    1  10  0.27
3    A    C    1  10  0.23

$B.1
  var1 var2 var3 num probs
7    B    A    1  13  0.93
8    B    B    1  13  0.36
9    B    C    1  13  0.04
```

`lapply` over `ls1`:

```
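ls1 <- lapply(ls1, function(x) {
  # draw num values of var2 (num is constant within a group), weighted by probs
  draws <- sample(as.character(x$var2), size = as.numeric(as.character(x$num[1])),
                  replace = TRUE, prob = as.numeric(as.character(x$probs)))
  # fix the levels so a value that was never drawn still gets a count of 0
  res <- as.data.frame(table(factor(draws, levels = as.character(x$var2))))
  cbind(x, res = res$Freq)
})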
df2 <- do.call("rbind", ls1)
df2
```

Have a look at the first couple of list elements of the result:

```
$A.1
  var1 var2 var3 num probs res
1    A    A    1  10  0.06   2
2    A    B    1  10  0.27   4
3    A    C    1  10  0.23   4

$B.1
  var1 var2 var3 num probs res
7    B    A    1  13  0.93  10
8    B    B    1  13  0.36   3
9    B    C    1  13  0.04   0
```

So for each dataframe a new variable `res` is created, the sum of `res` equals `num`, and the elements of `var2` are represented in `res` in proportions relating to `probs`. This does what I want but it becomes very slow when there is a lot of data.
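To make the per-group step concrete: as far as I understand it, tabulating `sample()` draws like this amounts to a single multinomial draw. Here is a minimal sketch for one group, assuming base R's `rmultinom()` is an acceptable stand-in (`grp` is a made-up group mirroring `$A.1`, and the seed is arbitrary):

```r
# Hypothetical single group, mirroring $A.1 above
grp <- data.frame(var2 = c("A", "B", "C"), num = 10,
                  probs = c(0.06, 0.27, 0.23))

set.seed(1)  # arbitrary seed, only for reproducibility
# One multinomial draw: num trials spread over the rows of the group.
# rmultinom() rescales prob internally, as sample() does.
grp$res <- as.vector(rmultinom(1, size = grp$num[1], prob = grp$probs))

sum(grp$res) == grp$num[1]  # TRUE: the counts always add up to num
```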

**My Question:** is there a way to replace the `lapply` piece of code with something more efficient/faster?

I am just beginning to learn about vectorization and am guessing this could be vectorized, but I am unsure of how to achieve it.

`ls1` is eventually returned to a dataframe structure, so if it doesn't need to become a list to begin with, all the better (although it doesn't really matter how the data is structured for this step).
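To illustrate what I mean by not going through a list at all, here is a rough sketch of the kind of thing I have in mind. It rests on several assumptions: the data is rebuilt with `data.frame()` so `num` and `probs` stay numeric, `dat` is just a name I made up, every `var1`/`var3` combination is present, and `rmultinom()` stands in for the `table(sample(...))` step. I have not benchmarked it, so I cannot say whether it is actually faster:

```r
# Numeric version of the example data (an assumption: built with data.frame()
# so that num and probs are not coerced to character by cbind())
set.seed(42)  # arbitrary seed, only for reproducibility
dat <- data.frame(
  var1  = rep(LETTERS[1:3], each = 6),
  var2  = rep(LETTERS[1:3], 6),
  var3  = rep(1:2, 3, each = 3),
  num   = rep(c(10, 11, 13, 8, 20, 5), each = 3),
  probs = round(runif(18), 2)
)

# ave() hands each var1/var3 group its row indices; one multinomial draw per
# group then fills res in the original row order, with no split() or rbind()
dat$res <- ave(seq_len(nrow(dat)), dat$var1, dat$var3, FUN = function(i) {
  as.vector(rmultinom(1, size = dat$num[i][1], prob = dat$probs[i]))
})

# Each group's counts should add up to that group's num
tapply(dat$res, list(dat$var1, dat$var3), sum)
```

This still calls a function once per group, so it is not fully vectorized; it only avoids building and re-binding the small dataframes.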

Any help would be much appreciated.