split column data frame into multiple columns
I have a data frame with one column that looks like this
df = read.table(file="sprint.m.df.txt", sep="\t", quote="", header=TRUE)
X.Rank...Time...Wind...Name...Country...Birthdate...City...Date.
1 1 9.58 0.9 "Usain Bolt" "JAM" "21.08.86" "Berlin" "16.08.2009"
2 2 9.63 1.5 "Usain Bolt" "JAM" "21.08.86" "London" "05.08.2012"
3 3 9.69 0 "Usain Bolt" "JAM" "21.08.86" "Beijing" "16.08.2008"
4 3 9.69 2 "Tyson Gay" "USA" "09.08.82" "Shanghai" "20.09.2009"
5 3 9.69 -0.1 "Yohan Blake" "JAM" "26.12.89" "Lausanne" "23.08.2012"
6 6 9.71 0.9 "Tyson Gay" "USA" "09.08.82" "Berlin" "16.08.2009"
i have been trying to split the column into multiple columns using string splits and other methods but nothing is working.
How can i split the data frame so i can end up with a data frame as
X.rank | Time | wind | name | country | birthdate| city | date
1 9.58 0.9 Usian Bolt jam 21.08.86 Berlin 16.08.2009
2 answers
-
answered 2021-02-22 22:57
TarJae
You can create a tribble with the tibble package
library(tibble) df <- tribble( ~X, ~RankTime, ~Wind, ~Name, ~Country, ~Birthdate, ~City, ~Date, 1, 9.58, 0.9, "Usain Bolt", "JAM", "21.08.86", "Berlin", "16.08.2009", 2, 9.63, 1.5, "Usain Bolt", "JAM", "21.08.86", "London", "05.08.2012", 3, 9.69, 0, "Usain Bolt", "JAM", "21.08.86", "Beijing", "16.08.2008", 3, 9.69, 2, "Tyson Gay", "USA", "09.08.82", "Shanghai", "20.09.2009", 3, 9.69, -0.1, "Yohan Blake", "JAM", "26.12.89", "Lausanne", "23.08.2012", 6, 9.71, 0.9, "Tyson Gay", "USA", "09.08.82", "Berlin", "16.08.2009") df # output # A tibble: 6 x 8 X RankTime Wind Name Country Birthdate City Date <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> 1 1 9.58 0.9 Usain Bolt JAM 21.08.86 Berlin 16.08.2009 2 2 9.63 1.5 Usain Bolt JAM 21.08.86 London 05.08.2012 3 3 9.69 0 Usain Bolt JAM 21.08.86 Beijing 16.08.2008 4 3 9.69 2 Tyson Gay USA 09.08.82 Shanghai 20.09.2009 5 3 9.69 -0.1 Yohan Blake JAM 26.12.89 Lausanne 23.08.2012 6 6 9.71 0.9 Tyson Gay USA 09.08.82 Berlin 16.08.2009
-
answered 2021-02-23 02:55
Chris S.
The spaces within quotes makes the column difficult to parse but it's easy to read. See my comments above and use
read.table(file="sprint.m.df.txt", sep=" ")
or if you really have to work with yourdf
, then tryread_delim
orscan
.df8 <- readr::read_delim(df[,1], delim=" ", col_names =FALSE) # OR df8 <- data.frame(matrix(scan(text=df[,1], what=" "), ncol=8, byrow=TRUE)) colnames(df8) <- c("rank", "Time", "wind", "name", "country", "birthdate", "city", "date") df8 rank Time wind name country birthdate city date 1 1 9.58 0.9 Usain Bolt JAM 21.08.86 Berlin 16.08.2009 2 2 9.63 1.5 Usain Bolt JAM 21.08.86 London 05.08.2012 3 3 9.69 0 Usain Bolt JAM 21.08.86 Beijing 16.08.2008 4 3 9.69 2 Tyson Gay USA 09.08.82 Shanghai 20.09.2009 5 3 9.69 -0.1 Yohan Blake JAM 26.12.89 Lausanne 23.08.2012 6 6 9.71 0.9 Tyson Gay USA 09.08.82 Berlin 16.08.2009