Split and re-concatenate a string in R

I am trying to get the host of an IP address from a list of strings.

ips <- c('140.112.204.42', '132.212.14.139', '31.2.47.93', '7.112.221.238')

I want to get the first two dot-separated numbers (octets) from each IP. Desired output:

ips <- c('140.112', '132.212', '31.2', '7.112')

This is the code that I wrote to convert them:

cat(unlist(strsplit(ips, "\\.", fixed = FALSE))[1:2], sep = ".")

When I check the type of individual ips in the end I get something like this:

140.112 NULL

I am not sure what I am doing wrong. If you have a completely different approach, that is fine too. Thank you in advance for any help.

5 answers

  • answered 2018-07-20 17:53 useR

    With sub:

    ips <- c('140.112.204.42', '132.212.14.139', '31.2.47.93', '7.112.221.238')
    
    sub('\\.\\d+\\.\\d+$', '', ips)
    # [1] "140.112" "132.212" "31.2"    "7.112"
    

    With str_extract from stringr:

    library(stringr)
    str_extract(ips, '^\\d+\\.\\d+')
    # [1] "140.112" "132.212" "31.2"    "7.112"
    

    With strsplit + sapply:

    sapply(strsplit(ips, '\\.'), function(x) paste(x[1:2], collapse = '.'))
    # [1] "140.112" "132.212" "31.2"    "7.112"
    

    With read.table + apply:

    apply(read.table(textConnection(ips), sep='.')[1:2], 1, paste, collapse = '.')
    # [1] "140.112" "132.212" "31.2"    "7.112"
    

    Notes:

    1. sub('\\.\\d+\\.\\d+$', '', ips):

      i. \\.\\d+\\.\\d+$ matches a literal dot, a digit one or more times, a literal dot again, and a digit one or more times at the end of the string

      ii. sub removes the above match from the string

    2. str_extract(ips, '^\\d+\\.\\d+'):

      i. ^\\d+\\.\\d+ matches a digit one or more times, a literal dot and a digit one or more times in the beginning of the string

      ii. str_extract extracts the above match from the string

    3. sapply(strsplit(ips, '\\.'), function(x) paste(x[1:2], collapse = '.')):

      i. strsplit(ips, '\\.') splits each ip using a literal dot as the delimiter. This returns a list of vectors after the split

      ii. With sapply, paste(x[1:2], collapse = '.') is applied to every element of the list, thus taking only the first two numbers from each vector, and collapsing them with a dot as the separator. sapply then coerces the list to a vector, thus returning a vector of the desired ips.

    4. apply(read.table(textConnection(ips), sep='.')[1:2], 1, paste, collapse = '.'):

      i. read.table(textConnection(ips), sep='.')[1:2] treats ips as text input, reads it in with dot as the delimiter, and keeps only the first two columns.

      ii. apply enables paste to be operated on each row, and collapses with a dot.
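
    To make note 3 concrete, and to show where the attempt in the question goes wrong, here is a short sketch of the intermediate values. The names parts, flat and result are illustrative only and do not appear in the original code.

    ```r
    ips <- c('140.112.204.42', '132.212.14.139', '31.2.47.93', '7.112.221.238')

    # strsplit returns a list with one character vector per address
    parts <- strsplit(ips, '\\.')
    length(parts)    # 4
    parts[[3]]       # "31" "2" "47" "93"

    # unlist() flattens that list into a single vector of 16 pieces, so
    # [1:2] only ever sees the first two pieces of the first address
    flat <- unlist(parts)
    flat[1:2]        # "140" "112"

    # cat() prints its arguments and returns NULL invisibly, so there is
    # nothing to store or inspect afterwards
    result <- cat(flat[1:2], sep = '.')
    is.null(result)  # TRUE
    ```

    This is why the per-element approach (sapply over the list, with paste instead of cat) is needed: paste returns a string for each address, whereas cat only writes to the console.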

  • answered 2018-07-20 17:59 RavinderSingh13

    Could you please try the following.

    gsub("([0-9]+\\.[0-9]+)(.*)","\\1",ips)
    

    Explanation: The regex captures digits, a literal dot, then digits in the first group, and everything after that (.*) in the second group. gsub then replaces the whole match with \\1, the contents of the first group, which are the first 2 fields. The dot is escaped as \\. so that it matches only a literal dot rather than any character.
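
    As a rough illustration of the capture-group idea, the sketch below contrasts a bare dot with an escaped one and then keeps only the first group. The test string "12a34" and the name first_two are made up for this example, and the anchored pattern with sub is one possible variant rather than the answer's exact code.

    ```r
    ips <- c('140.112.204.42', '132.212.14.139', '31.2.47.93', '7.112.221.238')

    # A bare "." matches any single character; "\\." matches only a literal dot
    grepl("^[0-9]+.[0-9]+$", "12a34")    # TRUE
    grepl("^[0-9]+\\.[0-9]+$", "12a34")  # FALSE

    # Keep capture group 1 (digits, dot, digits) and drop everything after it.
    # sub() is enough here because each string needs only one replacement.
    first_two <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", ips)
    first_two
    # [1] "140.112" "132.212" "31.2"    "7.112"
    ```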

  • answered 2018-07-20 18:00 Noah

    One solution is the following:

    vapply(strsplit(ips, ".", fixed = TRUE), 
           function(x) paste(x[1:2], collapse = "."), 
           character(1L))
    
    • vapply applies function(x) to each element of the output of strsplit
    • strsplit produces a list where each element of the list is the components of the IP addresses separated by "."; setting fixed = TRUE requests to split using the exact value of the splitting string (i.e., "."), not using regex
    • function(x) takes the first two elements (x[1:2]) of each item coming out of strsplit and pastes them together, separated by "."
    • character(1L) tells vapply that each element of the output (i.e., each value returned from function(x)) should be a character vector of length 1.
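
    One practical difference between vapply and sapply shows up on empty input. The sketch below assumes a helper named first_two, introduced here only for illustration.

    ```r
    ips <- c('140.112.204.42', '132.212.14.139', '31.2.47.93', '7.112.221.238')

    # Hypothetical helper: join the first two pieces with a dot
    first_two <- function(x) paste(x[1:2], collapse = ".")

    out <- vapply(strsplit(ips, ".", fixed = TRUE), first_two, character(1L))
    out
    # [1] "140.112" "132.212" "31.2"    "7.112"

    # With no addresses at all, strsplit returns an empty list
    empty <- strsplit(character(0), ".", fixed = TRUE)

    # sapply has no values to infer a type from, so it returns an empty list
    class(sapply(empty, first_two))                 # "list"

    # vapply always returns the declared type, here a character vector
    class(vapply(empty, first_two, character(1L)))  # "character"
    ```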

    Edit: @useR posted this solution right before me (using sapply).

  • answered 2018-07-20 18:17 James

    substr is vectorised on the stop argument, so you can pass it a vector holding, for each ip, the position just before the second dot. regexpr gives the position of the first match, so if you sub out the first dot, the first match is the original second dot, conveniently one before its true position as needed (since you removed the first one).

    substr(ips,1,regexpr("\\.",sub("\\.","",ips)))
    # [1] "140.112" "132.212" "31.2"    "7.112"
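
    Broken into steps, the one-liner above can be sketched as follows. The names no_first_dot and stop_pos are illustrative and not part of the answer.

    ```r
    ips <- c('140.112.204.42', '132.212.14.139', '31.2.47.93', '7.112.221.238')

    # sub() replaces only the first match, so only the first dot is removed
    no_first_dot <- sub("\\.", "", ips)
    no_first_dot[1]       # "140112.204.42"

    # Position of the first remaining dot in each shortened string; this is
    # one less than the position of the second dot in the original string
    stop_pos <- regexpr("\\.", no_first_dot)
    as.integer(stop_pos)  # 7 7 4 5

    # substr() takes one stop position per element
    substr(ips, 1, stop_pos)
    # [1] "140.112" "132.212" "31.2"    "7.112"
    ```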
    

  • answered 2018-07-20 20:21 G. Grothendieck

    We can convert the ip addresses to numeric_version class and then format using this base R one-liner that employs no regular expressions:

    format(numeric_version(ips)[, 1:2])
    # [1] "140.112" "132.212" "31.2"    "7.112"
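
    A brief sketch of what happens inside this one-liner, with v as an illustrative name: each address is parsed into a vector of integer components, and the second index picks components within every element.

    ```r
    ips <- c('140.112.204.42', '132.212.14.139', '31.2.47.93', '7.112.221.238')

    v <- numeric_version(ips)

    # Underneath, each element is an integer vector of its components
    unclass(v)[[1]]   # 140 112 204 42

    # The second index selects components, here the first two of each address
    format(v[, 1:2])
    # [1] "140.112" "132.212" "31.2"    "7.112"
    ```

    Note that numeric_version is strict by default and raises an error for strings that are not valid version numbers, so this approach assumes every element of ips is well formed.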