Replace the second space with \n, if there is one, in R

I have a vector of text, let's say:

vector <- c("20 DE NOVIEMBRE",  "CENTRO", "EL ARENAL 4A SECCION",     "IGNACIO ZARAGOZA", "JARDIN BALBUENA", "MOCTEZUMA 2A SECCION",    "MORELOS", "PEON DE LOS BAOS")

I want to replace the second space, if it exists, with the newline character "\n".

I've tried this:

  vector <- gsub(".* .*( ).*", "\\\n", vector)

But it didn't work.

This is the expected result:

c("20 DE\nNOVIEMBRE",  "CENTRO", "EL ARENAL\n4A SECCION",     "IGNACIO ZARAGOZA", "JARDIN BALBUENA", "MOCTEZUMA 2A\nSECCION",    "MORELOS", "PEON DE\nLOS BAOS")

How can I get it?

2 answers

  • answered 2020-07-29 17:30 Tim Biegeleisen

    One approach, using sub with capture groups:

    vector <- sub("^(\\S+) (\\S+) ", "\\1 \\2\n", vector)
    vector
    
    [1] "20 DE\nNOVIEMBRE"      "CENTRO"                "EL ARENAL\n4A SECCION"
    [4] "IGNACIO ZARAGOZA"      "JARDIN BALBUENA"       "MOCTEZUMA 2A\nSECCION"
    [7] "MORELOS"               "PEON DE\nLOS BAOS"    
    

    Data:

    vector <- c("20 DE NOVIEMBRE",  "CENTRO", "EL ARENAL 4A SECCION",
                "IGNACIO ZARAGOZA", "JARDIN BALBUENA", "MOCTEZUMA 2A SECCION",
                "MORELOS", "PEON DE LOS BAOS")
    

    The regex captures the first and second words, each matched by \S+, and also consumes the first and second spaces. It matches only if the input actually has a second space, so strings with fewer than two spaces are left unchanged. The replacement puts back the two captured words with the first space between them, and writes a \n line feed where the second space was.
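
    The "only matches if there is a second space" behaviour can be checked in isolation. The following is a minimal sketch, not part of the original answer; the helper name break_second_space is hypothetical and just wraps the same sub() call.

```r
# Hypothetical helper wrapping the sub() call from the answer.
break_second_space <- function(x) {
  # \\1 and \\2 put the first two words back; "\n" takes the place
  # of the second space. sub() only touches the first match.
  sub("^(\\S+) (\\S+) ", "\\1 \\2\n", x)
}

x <- c("MORELOS", "JARDIN BALBUENA", "PEON DE LOS BAOS")
res <- break_second_space(x)

# Elements with zero or one space do not match and come back unchanged;
# only the third element gets a line feed.
print(res)

# cat() renders the line feed as an actual line break.
cat(res[3], "\n")
```

    Because the pattern uses a literal space, it assumes words are separated by exactly one space character.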

  • answered 2020-07-29 19:27 Wiktor Stribiżew

    You may use

    vector <- c("20 DE NOVIEMBRE",  "CENTRO", "EL ARENAL 4A SECCION",     "IGNACIO ZARAGOZA", "JARDIN BALBUENA", "MOCTEZUMA 2A SECCION",    "MORELOS", "PEON DE LOS BAOS")
    sub("^\\S+\\s+\\S+\\K\\s+", "\n", vector, perl=TRUE)
    

    Output:

    [1] "20 DE\nNOVIEMBRE"      "CENTRO"                "EL ARENAL\n4A SECCION"
    [4] "IGNACIO ZARAGOZA"      "JARDIN BALBUENA"       "MOCTEZUMA 2A\nSECCION"
    [7] "MORELOS"               "PEON DE\nLOS BAOS"    
    

    The regex is ^\S+\s+\S+\K\s+. It matches:

    • ^ - start of string
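    • \S+ - one or more non-whitespace characters
    • \s+ - one or more whitespace characters
    • \S+ - one or more non-whitespace characters
    • \K - match reset operator: drops all text matched so far from the match, so the replacement leaves it untouched
    • \s+ - one or more whitespace characters (the only part that gets replaced).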
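
    To make the role of \K concrete, here is a small sketch comparing it with a capture-group version of the same pattern. The capture-group variant and the sample input with repeated spaces are illustrative assumptions, not part of the original answer.

```r
x <- c("EL ARENAL 4A SECCION", "EL  ARENAL   4A SECCION", "CENTRO")

# \K approach: requires perl = TRUE, because \K is a PCRE feature.
with_k <- sub("^\\S+\\s+\\S+\\K\\s+", "\n", x, perl = TRUE)

# Same effect with a capture group: keep the first two words via \\1
# and replace only the whitespace run that follows them.
with_group <- sub("^(\\S+\\s+\\S+)\\s+", "\\1\n", x)

print(with_k)
print(identical(with_k, with_group))
```

    Since the pattern uses \s+ rather than a literal space, the whole run of whitespace after the second word is replaced by a single "\n", while the whitespace between the first two words is kept as it was.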