counting rows between two specific rows with a condition

df <- structure(list(inv = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), ass = c("x", "x", "x", "x", "x"), datetime = c("2010-01-01", 
"2010-01-02", "2010-01-03", "2010-01-08", "2010-01-19"), portfolio = c(10, 
0, 5, 2, 0), operation = c(10, -10, 5, -3, -2)), class = "data.frame", row.names = c(NA, -5L))

I have 4000 investors and 6000 different assets. For each investor I have his trading operations in two variables: operation tells me whether he is buying or selling; portfolio tells me how much he holds in the portfolio.

What I want to do is compute the number of days a position stays open in the portfolio. I thought about computing the difference between the day on which the portfolio goes back to zero and the day on which it became positive (a negative portfolio is not possible).

So in the dataset above I would compute row 2 - row 1 ==> 2010-01-02 - 2010-01-01, row 5 - row 3 ==> 2010-01-19 - 2010-01-03, and so on.

I want to do this computation for every investor and asset in my dataset, for all the rows in which portfolio > 0.

So my dataset will have a further column called duration, which in this case would be equal to c(0, 1, 0, 5, 16) (so of course I also have to compute row 1 - row 1 and row 3 - row 3). Hence my problem is restarting the count every time portfolio goes back to zero.

2 answers

  • answered 2022-05-05 20:28 M--

    library(dplyr)
    
    df %>% 
      mutate(datetime = as.Date(datetime, "%Y-%m-%d")) %>% 
      group_by(investor, asset) %>% 
      arrange(datetime) %>% 
      # start a new group on the row after portfolio drops to 0
      mutate(grp.pos = cumsum(lag(portfolio, default = 1) == 0)) %>%
      group_by(investor, asset, grp.pos) %>% 
      # days elapsed since the first row of the current position
      mutate(`Open (#days)` = datetime - datetime[1])
    
    #> # A tibble: 5 x 6
    #> # Groups:   investor, asset, grp.pos [2]
    #>   investor asset datetime   portfolio grp.pos `Open (#days)`
    #>   <chr>    <chr> <date>         <dbl>   <int> <drtn>        
    #> 1 INV_1    x     2010-01-01        10       0  0 days       
    #> 2 INV_1    x     2010-01-02         0       0  1 days       
    #> 3 INV_1    x     2010-01-03         5       1  0 days       
    #> 4 INV_1    x     2010-01-08         2       1  5 days       
    #> 5 INV_1    x     2010-01-19         0       1 16 days
    

    Data:

    df <- structure(list(investor = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), 
                         asset = c("x", "x", "x", "x", "x"), 
                         datetime = c("2010-01-01", "2010-01-02", "2010-01-03", 
                                      "2010-01-08", "2010-01-19"), 
                         portfolio = c(10, 0, 5, 2, 0)), 
                         operation = c(10, -10, 5, -3, -2), 
                    class = "data.frame", row.names = c(NA, -5L))
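
    The grouping step can also be checked outside dplyr. Below is a minimal base-R sketch, assuming a single investor/asset pair that is already sorted by date; the names `portfolio`, `dates`, `grp` and `duration` are only illustrative:

    ```r
    portfolio <- c(10, 0, 5, 2, 0)
    dates <- as.Date(c("2010-01-01", "2010-01-02", "2010-01-03",
                       "2010-01-08", "2010-01-19"))

    # Shift portfolio down by one row (first row padded with 1, like
    # lag(default = 1)): a new position starts right after a zero balance.
    grp <- cumsum(c(1, head(portfolio, -1)) == 0)

    # Days elapsed since the first row of each position, as plain numbers.
    duration <- ave(as.numeric(dates), grp, FUN = function(d) d - d[1])
    ```

    Here `grp` plays the role of `grp.pos` and `duration` holds the same values as `Open (#days)`, without the difftime class.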
    

  • answered 2022-05-05 20:53 TarJae

    Here is one way to do it; it can be extended to group by ass as well if necessary.

    First we group by inv. Then we convert datetime to Date class so that date arithmetic is easy (here we use lubridate's ymd() function).

    The next step could be done in different ways:

    The main idea is to assign each row to a position whose last row is the one where portfolio is 0. To do this we arrange datetime in descending order, so that cumsum(portfolio == 0) gives the closing row and all the rows leading up to it the same group id.

    After rearranging datetime in ascending order, we can subtract the first date from the last date of each group, as intended:

    library(dplyr)
    library(lubridate)
    
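    df %>% 
      group_by(inv) %>% 
      mutate(datetime = ymd(datetime)) %>% 
      arrange(desc(datetime)) %>% 
      # computed inside mutate() so the count restarts for every investor;
      # expressions passed directly to group_by() are evaluated on ungrouped data
      mutate(id_Group = cumsum(portfolio == 0)) %>% 
      group_by(inv, id_Group) %>% 
      arrange(datetime) %>% 
      mutate(position_open = last(datetime) - first(datetime)) %>% 
      ungroup()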
    
      inv   ass   datetime   portfolio operation id_Group position_open
      <chr> <chr> <date>         <dbl>     <dbl>    <int> <drtn>       
    1 INV_1 x     2010-01-01        10        10        2  1 days      
    2 INV_1 x     2010-01-02         0       -10        2  1 days      
    3 INV_1 x     2010-01-03         5         5        1 16 days      
    4 INV_1 x     2010-01-08         2        -3        1 16 days      
    5 INV_1 x     2010-01-19         0        -2        1 16 days    
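
    The descending sort followed by cumsum amounts to a reversed cumulative sum. Here is a small base-R sketch of that idea on the sample values, assuming one investor/asset pair sorted by date; `id` and `total_days` are illustrative names, not part of the code above:

    ```r
    portfolio <- c(10, 0, 5, 2, 0)
    dates <- as.Date(c("2010-01-01", "2010-01-02", "2010-01-03",
                       "2010-01-08", "2010-01-19"))

    # Count closing rows (portfolio == 0) from the bottom up, so every row of a
    # position gets the same id as the row that closes it.
    id <- rev(cumsum(rev(portfolio == 0)))

    # Total length of each position: last date minus first date within the id.
    total_days <- ave(as.numeric(dates), id, FUN = function(d) max(d) - min(d))
    ```

    Note that this yields the total length of each position on every row (1, 1, 16, 16, 16). For the running count asked for in the question, c(0, 1, 0, 5, 16), the function inside `ave()` would be `function(d) d - min(d)` instead.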
    
