Remove rows within groups if coordinates of one subgroup are within another subgroup in R

I have a dataframe such as

Groups NAMES start end 
G1     A    1     50
G1     A    25    45
G1     B    20    51
G1     A    51    49
G2     A    200   400
G2     B    1     1600
G2     A    2000  3000
G2     B    4000  5000

The idea is, within each Groups value, to find the B rows whose start and end coordinates contain the coordinates of an A row, and to remove those B rows.

For instance, in this example:

Groups NAMES start end 
G1     A    1     50    <- A is outside every B coordinate
G1     A    25    45    <- A is **inside** the B coordinates `20-51`, so I remove that B row.
G1     B    20    51  
G1     A    51    49    <- A is outside every B coordinate
G2     A    200   400   <- A is **inside** the B coordinates `1-1600`, so I remove that B row.
G2     B    1     1600
G2     A    2000  3000  <- A is outside every B coordinate
G2     B    4000  5000  <- this B has no A inside it, so it is kept in the output.

The expected output is:

Groups NAMES start end 
G1     A    1     50
G1     A    25    45
G1     A    51    49
G2     A    200   400
G2     A    2000  3000
G2     B    4000  5000

Does anyone have an idea, please?

Here is the dataframe in dput format, in case it helps:

   structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L), .Label = c("G1", "G2"), class = "factor"), NAMES = structure(c(1L, 
1L, 2L, 1L, 1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor"), 
    start = c(1L, 25L, 20L, 51L, 200L, 1L, 2000L, 4000L), end = c(50L, 
    45L, 51L, 49L, 400L, 1600L, 3000L, 5000L)), class = "data.frame", row.names = c(NA, 
-8L))

2 answers

  • answered 2021-07-27 15:37 Calum You

    Here's a possible approach. We'll split the df by NAMES and join the two parts to each other by Groups to do within-group comparisons. Only B rows can get dropped, so those are the only ones whose row numbers we want to keep track of.

    We can then group by rowid to tag each B row by whether or not it has any A inside it. Finally, filter to the B rows to keep and bind them back to the A rows.

    library(tidyverse)
    df <- structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("G1", "G2"), class = "factor"), NAMES = structure(c(1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor"), start = c(1L, 25L, 20L, 51L, 200L, 1L, 2000L, 4000L), end = c(50L, 45L, 51L, 49L, 400L, 1600L, 3000L, 5000L)), class = "data.frame", row.names = c(NA, -8L))
    
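    # Split by NAMES; only B rows can be dropped, so only they get a row id.
    A <- filter(df, NAMES == "A")
    B <- df %>%
      filter(NAMES == "B") %>%
      rowid_to_column()
    
    # Pair every A with every B of the same group, then flag each B row
    # that has no A inside it.
    comparison <- inner_join(A, B, by = "Groups") %>%
      mutate(A_in_B = start.x >= start.y & end.x <= end.y) %>%
      group_by(rowid) %>%
      summarise(keep_B = !any(A_in_B))
      
    # Keep only the flagged B rows, then append all A rows.
    B %>%
      inner_join(comparison, by = "rowid") %>%
      filter(keep_B) %>%
      select(-rowid, -keep_B) %>%
      bind_rows(A) %>%
      arrange(Groups, NAMES)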
    #>   Groups NAMES start  end
    #> 1     G1     A     1   50
    #> 2     G1     A    25   45
    #> 3     G1     A    51   49
    #> 4     G2     A   200  400
    #> 5     G2     A  2000 3000
    #> 6     G2     B  4000 5000
    

    Created on 2021-07-27 by the reprex package (v1.0.0)
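
    For comparison, the same containment test can be sketched in base R without any joins. This is only a sketch under stated assumptions: `keep_row` is a hypothetical helper name, the sample data is rebuilt with character columns rather than factors, and an A is assumed to count as inside a B when its start is >= the B start and its end is <= the B end, the same test as `A_in_B` above.

    ```r
    # Sample data from the question (character columns instead of factors).
    df <- data.frame(
      Groups = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2"),
      NAMES  = c("A", "A", "B", "A", "A", "B", "A", "B"),
      start  = c(1L, 25L, 20L, 51L, 200L, 1L, 2000L, 4000L),
      end    = c(50L, 45L, 51L, 49L, 400L, 1600L, 3000L, 5000L),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper: returns TRUE for each row that should be kept.
    keep_row <- function(df) {
      is_a <- df$NAMES == "A"
      vapply(seq_len(nrow(df)), function(i) {
        if (is_a[i]) return(TRUE)  # A rows are never dropped
        same_group_a <- is_a & df$Groups == df$Groups[i]
        # Drop this B row if any A of the same group lies inside [start, end].
        !any(df$start[same_group_a] >= df$start[i] &
             df$end[same_group_a]   <= df$end[i])
      }, logical(1))
    }

    result <- df[keep_row(df), ]
    rownames(result) <- NULL
    result
    ```

    One difference in behaviour: a B row in a group that has no A rows at all is kept here, because `any()` of an empty vector is `FALSE`, whereas the join-based version drops such a row at the `inner_join(comparison, by = "rowid")` step. The rows also stay in their original order rather than being sorted by `Groups` and `NAMES`.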

  • answered 2021-07-27 16:04 AnilGoyal

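    This can also be done with purrr::map_dfr. Note that, as written, the filter keeps every B row and only the A rows lying strictly inside a B of the same group, so the result below differs from the output requested in the question.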

    library(tidyverse)
    df %>%
      group_split(Groups) %>%
      map_dfr(~ .x %>% mutate(r = row_number()) %>%
            full_join(.x %>% 
                        filter(NAMES == 'B'), 
                      by = 'Groups') %>%
            group_by(r) %>%
            filter(any(NAMES.x == 'B' | start.x > start.y & end.x < end.y)) %>%
            ungroup %>%
            select(Groups, ends_with('.x')) %>%
            distinct %>%
            rename_with(~ gsub('\\.x', '', .), everything())
            )
    
    #> # A tibble: 6 x 4
    #>   Groups NAMES start   end
    #>   <fct>  <fct> <int> <int>
    #> 1 G1     A        25    45
    #> 2 G1     B        20    51
    #> 3 G1     A        51    49
    #> 4 G2     A       200   400
    #> 5 G2     B         1  1600
    #> 6 G2     B      4000  5000
    

    Created on 2021-07-27 by the reprex package (v2.0.0)
