split column data frame into multiple columns

I have a data frame with one column that looks like this

df = read.table(file="sprint.m.df.txt", sep="\t", quote="", header=TRUE)

    X.Rank...Time...Wind...Name...Country...Birthdate...City...Date.
1     1 9.58 0.9 "Usain Bolt" "JAM" "21.08.86" "Berlin" "16.08.2009"
2     2 9.63 1.5 "Usain Bolt" "JAM" "21.08.86" "London" "05.08.2012"
3      3 9.69 0 "Usain Bolt" "JAM" "21.08.86" "Beijing" "16.08.2008"
4      3 9.69 2 "Tyson Gay" "USA" "09.08.82" "Shanghai" "20.09.2009"
5 3 9.69 -0.1 "Yohan Blake" "JAM" "26.12.89" "Lausanne" "23.08.2012"
6      6 9.71 0.9 "Tyson Gay" "USA" "09.08.82" "Berlin" "16.08.2009"

i have been trying to split the column into multiple columns using string splits and other methods but nothing is working.

How can i split the data frame so i can end up with a data frame as

 X.rank | Time | wind | name       | country | birthdate| city    | date
    1      9.58  0.9    Usian Bolt    jam       21.08.86  Berlin    16.08.2009

2 answers

  • answered 2021-02-22 22:57 TarJae

    You can create a tribble with the tibble package

    library(tibble)
    
    df <- tribble(
    ~X, ~RankTime, ~Wind, ~Name, ~Country, ~Birthdate, ~City, ~Date, 
    1, 9.58, 0.9, "Usain Bolt", "JAM", "21.08.86", "Berlin", "16.08.2009",
    2, 9.63, 1.5, "Usain Bolt", "JAM", "21.08.86", "London", "05.08.2012",
    3, 9.69, 0, "Usain Bolt", "JAM", "21.08.86", "Beijing", "16.08.2008",
    3, 9.69, 2, "Tyson Gay", "USA", "09.08.82", "Shanghai", "20.09.2009",
    3, 9.69, -0.1, "Yohan Blake", "JAM", "26.12.89", "Lausanne", "23.08.2012",
    6, 9.71, 0.9, "Tyson Gay", "USA", "09.08.82", "Berlin", "16.08.2009")
    
    df
    
    # output
    # A tibble: 6 x 8
          X RankTime  Wind Name        Country Birthdate City     Date      
      <dbl>    <dbl> <dbl> <chr>       <chr>   <chr>     <chr>    <chr>     
    1     1     9.58   0.9 Usain Bolt  JAM     21.08.86  Berlin   16.08.2009
    2     2     9.63   1.5 Usain Bolt  JAM     21.08.86  London   05.08.2012
    3     3     9.69   0   Usain Bolt  JAM     21.08.86  Beijing  16.08.2008
    4     3     9.69   2   Tyson Gay   USA     09.08.82  Shanghai 20.09.2009
    5     3     9.69  -0.1 Yohan Blake JAM     26.12.89  Lausanne 23.08.2012
    6     6     9.71   0.9 Tyson Gay   USA     09.08.82  Berlin   16.08.2009
    

  • answered 2021-02-23 02:55 Chris S.

    The spaces within quotes makes the column difficult to parse but it's easy to read. See my comments above and use read.table(file="sprint.m.df.txt", sep=" ") or if you really have to work with your df, then try read_delim or scan.

    df8 <- readr::read_delim(df[,1], delim=" ", col_names =FALSE)
    # OR
    df8 <- data.frame(matrix(scan(text=df[,1], what=" "), ncol=8, byrow=TRUE))
    colnames(df8) <- c("rank", "Time", "wind", "name", "country", "birthdate", "city", "date")
    df8
      rank Time wind        name country birthdate     city       date
    1    1 9.58  0.9  Usain Bolt     JAM  21.08.86   Berlin 16.08.2009
    2    2 9.63  1.5  Usain Bolt     JAM  21.08.86   London 05.08.2012
    3    3 9.69    0  Usain Bolt     JAM  21.08.86  Beijing 16.08.2008
    4    3 9.69    2   Tyson Gay     USA  09.08.82 Shanghai 20.09.2009
    5    3 9.69 -0.1 Yohan Blake     JAM  26.12.89 Lausanne 23.08.2012
    6    6 9.71  0.9   Tyson Gay     USA  09.08.82   Berlin 16.08.2009