Subset dataframe with equal difference for one column in R

I am trying to iterate over the rows of a data frame (data) and check whether one column (data$ID) has a roughly constant difference (e.g., 3) between consecutive elements. If it does, keep the row; otherwise remove it. The tricky part is that after a row is removed, the consecutive elements have to be compared again.

data <- data.frame(ID=c(3.1, 6, 6.9, 9, 10.5, 12, 14.2, 15),
                   score = c(70, 80, 90, 65, 43, 78, 44, 92))
data
    ID score
1  3.1    70
2  6.0    80
3  6.9    90
4  9.0    65
5 10.5    43
6 12.0    78
7 14.2    44
8 15.0    92

for (i in (length(data$ID)-1)) {
    first <- data$ID[i]
    second <- data$ID[i+1]
    if ((second-first) == 3) {
        data <- data[-(i+1),]
    }
}

The expected output is:

    ID score
1  3.1    70
2  6.0    80
3  9.0    65
4 12.0    78
5 15.0    92

Rows 3, 5, and 7 of the original data are excluded because their differences do not match. But my code fails to produce this result.

I also tried the diff function:

DF <- diff(data)

But this doesn't account for the fact that once a row is removed, the differences change. Should I call diff inside a loop, even though the data frame changes as rows are removed?

3 answers

  • answered 2018-03-13 22:27 Andrew Lavers

    Using a recursive function (a function that calls itself):

    data <- data.frame(ID=c(3.1, 6, 6.9, 9, 10.5, 12, 14.2, 15),
                       score = c(70, 80, 90, 65, 43, 78, 44, 92))
    
    # use recursive function to trim the remainder of the list
    trim_ids <- function (ids) {
      # if only one element, return it
      if (length(ids) <= 1) {
        return(ids) 
      }
      # if the gap between element 1 and element 2 is too small (< 2.9)
      if ((ids[2] - ids[1]) < 2.9 ) {
        # trim after dropping the second element
        return(trim_ids(ids[-2])) 
      } else {
        # keep the first element and trim from the second element
        return(c(ids[1], trim_ids(ids[2:length(ids)] )))
      }
    }
    
    # find the ids to keep
    keep_ids <- trim_ids(data$ID)
    
    # select the matching rows
    data[data$ID %in% keep_ids,]
    
    #      ID score
    # 1  3.1    70
    # 2  6.0    80
    # 4  9.0    65
    # 6 12.0    78
    # 8 15.0    92
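
    A non-recursive variant of the same idea, as a minimal sketch: it walks the IDs once and remembers the last value that was kept. The helper name keep_by_gap and its eps tolerance are assumptions added for illustration and are not part of the code above; min_gap mirrors the 2.9 threshold.

    ```r
    data <- data.frame(ID = c(3.1, 6, 6.9, 9, 10.5, 12, 14.2, 15),
                       score = c(70, 80, 90, 65, 43, 78, 44, 92))

    # Hypothetical helper: returns a logical vector marking the rows to keep.
    # eps is an assumed tolerance for floating-point error in the subtraction.
    keep_by_gap <- function(ids, min_gap = 2.9, eps = 1e-8) {
      keep <- logical(length(ids))
      if (length(ids) == 0) return(keep)
      keep[1] <- TRUE                  # the first row is always kept
      last_kept <- ids[1]
      for (i in seq_along(ids)[-1]) {
        # compare against the last kept ID, not the previous row
        if (ids[i] - last_kept >= min_gap - eps) {
          keep[i] <- TRUE
          last_kept <- ids[i]
        }
      }
      keep
    }

    result <- data[keep_by_gap(data$ID), ]
    result
    ```

    Returning a logical mask instead of the kept values avoids matching doubles with %in%, and it should still select the right rows if the same ID value appears more than once.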
    

  • answered 2018-03-13 22:33 Lennyy

    If you define the rule as keeping every row whose ID, rounded to 0 digits, is a multiple of 3, you could try:

    df1 <- data.frame(ID = c(3.1, 6, 6.9, 9, 10.5, 12, 14.2, 15),
                      score = c(70, 80, 90, 65, 43, 78, 44, 92))
    
    
    df1[round(df1$ID) %% 3 == 0,]
    
        ID score
    1  3.1    70
    2  6.0    80
    4  9.0    65
    6 12.0    78
    8 15.0    92
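
    The rounding rule above keeps any ID within 0.5 of a multiple of 3. If a tighter match is wanted, the distance to the nearest multiple can be tested explicitly. This is only a sketch: the helper near_multiple and its step and tol parameters are hypothetical, and the 0.15 tolerance is an assumption chosen to fit this sample data.

    ```r
    df1 <- data.frame(ID = c(3.1, 6, 6.9, 9, 10.5, 12, 14.2, 15),
                      score = c(70, 80, 90, 65, 43, 78, 44, 92))

    # Hypothetical helper: TRUE where x lies within tol of a multiple of step.
    near_multiple <- function(x, step = 3, tol = 0.15) {
      nearest <- step * round(x / step)   # closest multiple of step
      abs(x - nearest) <= tol
    }

    res <- df1[near_multiple(df1$ID), ]
    res
    ```

    Like the rounding approach, this assumes the IDs to keep sit near multiples of 3; it does not re-compare neighbours after a row is dropped.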
    

  • answered 2018-03-13 23:03 MKR

    One option is to use cumsum and diff:

    #data
    data <- data.frame(ID=c(3.1, 6, 6.9, 9, 10.5, 12, 14.2, 15),
                       score = c(70, 80, 90, 65, 43, 78, 44, 92))
    
    
    data[c(0, cumsum(diff(round(data$ID))) %% 3 ) == 0,]
    
    #     ID score
    # 1  3.1    70
    # 2  6.0    80
    # 4  9.0    65
    # 6 12.0    78
    # 8 15.0    92
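
    To see what the one-liner does, the intermediate steps can be written out separately. This is a sketch for illustration; the variable names r, steps, offset, keep and keep_alt are not from the answer above.

    ```r
    ids <- c(3.1, 6, 6.9, 9, 10.5, 12, 14.2, 15)

    r      <- round(ids)            # 3 6 7 9 10 12 14 15
    steps  <- diff(r)               # 3 1 2 1 2 2 1
    offset <- c(0, cumsum(steps))   # 0 3 4 6 7 9 11 12: distance from the first rounded ID
    keep   <- offset %% 3 == 0      # TRUE where that distance is a multiple of 3

    # The running sum of differences is just the distance from the first
    # element, so the same mask can be computed without cumsum and diff:
    keep_alt <- (r - r[1]) %% 3 == 0

    which(keep)                     # 1 2 4 6 8
    ```

    Because the mask is measured from the first rounded ID, this approach assumes the first row is always kept and that the kept IDs stay close to a fixed grid of step 3.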