# counting rows between two specific rows with a condition

``````df <- structure(list(inv = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), ass = c("x", "x", "x", "x", "x"), datetime = c("2010-01-01",
"2010-01-02", "2010-01-03", "2010-01-08", "2010-01-19"), portfolio = c(10,
0, 5, 2, 0)), operation = c(10, -10, 5, -3, -2), class = "data.frame", row.names = c(NA, -5L))
``````

So I have `4000` investors with `6000` different assets, for each investor I have his trading operations in two different variables: operation tells me if he is buying/selling; portfolio tells me how much he has in the portfolio.

What I want to do is computing the number of days a position stays open in the portfolio, so I though about computing the difference between the day in which the portfolio goes back to zero and the day in which the portfolio went positive (it is not possible to get negative portfolio).

so in the dataset above I would count `row2 - row1 ==> 2010-01-02 - 2010-01-01` and `row 5 - row 3 ==> 2010-01-19 - 2010-01-03` and so on...

I want to do this computation for all the investor & asset I have in my dataset for all the rows in which I find that `portfolio > 0`.

So my dataset will have a further column called `duration` which would be equal, in this case to `c(0,1,0,5,16)` (so of course i also had to compute `raw1 - raw1` and `raw3 - raw3`) Hence my problem is to restart the count everytime `portfolio` goes back to zero.

``````library(dplyr)

df %>%
mutate(datetime = as.Date(datetime, "%Y-%m-%d")) %>%
group_by(investor, asset) %>%
arrange(datetime) %>%
mutate(grp.pos = cumsum(lag(portfolio, default = 1) == 0)) %>%
group_by(investor, asset, grp.pos) %>%
mutate(`Open (#days)` = datetime - datetime[1])

#> # A tibble: 5 x 6
#> # Groups:   investor, asset, grp.pos [2]
#>   investor asset datetime   portfolio grp.pos `Open (#days)`
#>   <chr>    <chr> <date>         <dbl>   <int> <drtn>
#> 1 INV_1    x     2010-01-01        10       0  0 days
#> 2 INV_1    x     2010-01-02         0       0  1 days
#> 3 INV_1    x     2010-01-03         5       1  0 days
#> 4 INV_1    x     2010-01-08         2       1  5 days
#> 5 INV_1    x     2010-01-19         0       1 16 days
``````

### Data:

``````df <- structure(list(investor = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"),
asset = c("x", "x", "x", "x", "x"),
datetime = c("2010-01-01", "2010-01-02", "2010-01-03",
"2010-01-08", "2010-01-19"),
portfolio = c(10, 0, 5, 2, 0)),
operation = c(10, -10, 5, -3, -2),
class = "data.frame", row.names = c(NA, -5L))
``````

Here is a way how we could do it, that is expandable if necessary for `ass`

First we group by `inv` to use for the original dataset. Then transform `datetime` to date format to do calculations easily (here we use `ymd()` function).

The next step could be done in different ways:

Main idea is to group the column `portfolio` indicated by the last row of the group that is 0. For this we arrange `datetime` in descending form to easily apply the grouping id with `cumsum == 0`.

After rearranging `datetime` we can calculate the last from the first as intended:

``````library(dplyr)
library(lubridate)

df %>%
group_by(inv) %>%
mutate(datetime = ymd(datetime)) %>%
arrange(desc(datetime)) %>%
group_by(position_Group = cumsum(portfolio==0)) %>%
arrange(datetime) %>%
mutate(position_open = last(datetime)-first(datetime)) %>%
ungroup()
``````
``````  inv   ass   datetime   portfolio operation id_Group position_open
<chr> <chr> <date>         <dbl>     <dbl>    <int> <drtn>
1 INV_1 x     2010-01-01        10        10        2  1 days
2 INV_1 x     2010-01-02         0       -10        2  1 days
3 INV_1 x     2010-01-03         5         5        1 16 days
4 INV_1 x     2010-01-08         2        -3        1 16 days
5 INV_1 x     2010-01-19         0        -2        1 16 days
``````