Is there an equivalent of tidyr's fill() for strings in R?

So I have a data frame like this one:

First Group  Bob
             Joe
             John
             Jesse
Second Group Jane
             Mary
             Emily
             Sarah
             Grace

I would like to fill each empty cell in the first column with the last non-empty string above it, i.e.:

First Group  Bob
First Group  Joe
First Group  John
First Group  Jesse
Second Group Jane
Second Group Mary
Second Group Emily
Second Group Sarah
Second Group Grace

tidyr has fill(), but it doesn't seem to work with strings. Is there an equivalent for strings? If not, is there another way to accomplish this?

2 answers

  • answered 2018-10-11 22:34 42-

    (I assumed this was output from an R console session. If it's a raw text file, the data may need to be read in with read.fwf.)

    The display suggests the blank cells hold empty strings ("") rather than NA.

    First set them to NA and then use na.locf from zoo:

     dat[dat==""] <- NA
     dat[1:2] <- lapply(dat[1:2], zoo::na.locf)
     dat
    #------------
          V1    V2    V3
    1  First Group   Bob
    2  First Group   Joe
    3  First Group  John
    4  First Group Jesse
    5 Second Group  Jane
    6 Second Group  Mary
    7 Second Group Emily
    8 Second Group  Sara
    9 Second Group Grace
    

    For reference, this is the data I was using:

    dat <-
    structure(list(V1 = structure(c(2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 
    1L), .Label = c("", "First", "Second"), class = "factor"), V2 = structure(c(2L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("", "Group"), class = "factor"), 
        V3 = structure(c(1L, 6L, 7L, 5L, 4L, 8L, 2L, 9L, 3L), .Label = c("Bob", 
        "Emily", "Grace", "Jane", "Jesse", "Joe", "John", "Mary", 
        "Sara"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -9L))
    
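    Converting the blanks to NA is also the missing step for tidyr: fill() handles character columns fine, but it only replaces NA values, so cells holding "" are left untouched. A minimal sketch, assuming tidyr is installed and using an illustrative character-column version of the data:

    ```r
    library(tidyr)

    df <- data.frame(c1 = c("First Group", "", "", "", "Second Group", "", "", "", ""),
                     c2 = c("Bob", "Joe", "Jon", "Jesse", "Jane", "Mary", "Emily", "Sara", "Grace"),
                     stringsAsFactors = FALSE)

    # fill() only replaces NA, so turn the empty strings into NA first
    df$c1[df$c1 == ""] <- NA

    # fill() works on character columns; "down" carries the last non-NA value forward
    df <- fill(df, c1, .direction = "down")
    df
    ```

    Without the NA step, fill() leaves the column unchanged, which is likely why it seemed not to work with strings.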

  • answered 2018-10-12 17:33 krads

    Taking a stab at what your data structure is, I might have something like this:

    df <- data.frame(c1=c("First Group", "", "", "", "Second Group", "", "", "", ""),
                     c2=c("Bob","Joe","Jon","Jesse","Jane","Mary","Emily","Sara","Grace"),
                     stringsAsFactors = FALSE)
    

    Then, a very basic way to do this would be by simply looping:

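    # copy the previous row's label into each blank cell; seq_len(...)[-1] is safe even for 0 or 1 rows
    for (i in seq_len(nrow(df))[-1]) if (df$c1[i] == "") df$c1[i] <- df$c1[i - 1]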
    
    df
    
                c1    c2
    1  First Group   Bob
    2  First Group   Joe
    3  First Group   Jon
    4  First Group Jesse
    5 Second Group  Jane
    6 Second Group  Mary
    7 Second Group Emily
    8 Second Group  Sara
    9 Second Group Grace
    

    However, for anything beyond a small data set I'd suggest accepting @42-'s solution: zoo::na.locf is vectorized, so it scales much better than an element-by-element loop, and zoo is a widely used, stable package.
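
    If you want to avoid both the loop and an extra dependency, the same last-value-carried-forward fill can be vectorized in base R with cumsum(). A minimal sketch; fill_blanks() is a hypothetical helper, not part of any package:

    ```r
    # Hypothetical helper: replace blank (or NA) entries with the last non-blank value above them
    fill_blanks <- function(x, blank = "") {
      keep <- !is.na(x) & x != blank    # TRUE where the cell holds a real value
      idx  <- cumsum(keep)              # position in x[keep] of the most recent real value
      has_prev <- idx > 0               # leading blanks have no earlier value to copy
      out <- x
      out[has_prev] <- x[keep][idx[has_prev]]
      out
    }

    df <- data.frame(c1 = c("First Group", "", "", "", "Second Group", "", "", "", ""),
                     c2 = c("Bob", "Joe", "Jon", "Jesse", "Jane", "Mary", "Emily", "Sara", "Grace"),
                     stringsAsFactors = FALSE)
    df$c1 <- fill_blanks(df$c1)
    df
    ```

    Any blanks before the first real value are left as they are, since there is nothing above them to copy.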