Split a column of strings into columns
I have a column of strings. Within each string, the variables are separated by commas. Rows have different numbers of variables: sometimes variable x comes first, sometimes third, and sometimes it is missing. Each entry looks like this: "year:2005". How can I extract the data from this column?
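The question does not name a language, so the following is only a minimal sketch in Python, assuming each cell holds comma-separated `key:value` pairs. The sample rows and the keys other than `year` are hypothetical, as are the helper names `parse_row` and `to_columns`. Because each pair is looked up by its key, the position of a variable within the string does not matter, and a missing variable becomes `None`.

```python
# Hypothetical sample data; only the "year:2005" shape comes from the question.
rows = [
    "year:2005,x:3,color:red",
    "x:7,year:2010",
    "color:blue",
]


def parse_row(row):
    """Turn 'key:value,key:value' into a dict, skipping pieces without a colon."""
    record = {}
    for piece in row.split(","):
        key, sep, value = piece.partition(":")
        if sep:  # ignore empty or malformed pieces
            record[key.strip()] = value.strip()
    return record


def to_columns(rows):
    """Return (column names, table) with one list of values per input row."""
    records = [parse_row(r) for r in rows]
    # Collect column names in order of first appearance.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # Missing variables are filled with None.
    table = [[rec.get(col) for col in columns] for rec in records]
    return columns, table


columns, table = to_columns(rows)
print(columns)
for line in table:
    print(line)
```

If pandas is available, passing the list of dicts from `parse_row` to `pandas.DataFrame` should give the same wide layout, with missing values as NaN.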
See also questions close to this topic
Restructuring data from long to wide by removing characters
Here is a sample of my data
code  group  type    outcome
11    A      red     M*P
11    N      orange  N*P
11    Z      red     R
12    AB A   blue    Z*P
12    AN B   green   Q*P
12    AA A   gray    AB
I want to get the following table
code  group1  group2  group3  type1  type2   type3  outcome
11    A       N       Z       red    orange  red    MNR
12    AB A    AN B    AA A    blue   green   gray   ZQAB
I tried the following code, but it does not work. I also want to remove the Ps in outcome. Thanks for your help.
dcast(df, formula= code +group ~ type, value.var = 'outcome')
How to replace a character value with a number in an R data frame?
Run an ANOVA on variables separated by commas in one column using R?
I am trying to run a 3-way ANOVA in R, but my values for each variable are in one column and not separated by rows. Currently, my data frame looks something like this:
Season  Site  Location  Replicate  Lengths
Jan_16  MI    Adj       1.00       ,
Jan_16  MI    Adj       2.00       ,
Jan_16  MI    Adj       3.00       ,
Jan_16  MI    Away      1.00       3,4,
Jan_16  MI    Away      2.00       ,
Jan_16  MI    Away      3.00       ,
Jan_16  MP    Adj       1.00       4,5,6,5,4,5,4,4,4,4,5,4,6,4,
Jan_16  MP    Adj       2.00       4,4,3,3,5,4,3,4,5,3,4,3,4,3,4,6,
Jan_16  MP    Adj       3.00       4,6,5,5,4,
Jan_16  MP    Away      1.00       ,4,4,10,4,5,4,6,5,5,
Jan_16  MP    Away      2.00       3,4,4,4,5,5,4,5,
Jan_16  MP    Away      3.00       4,4,13,4,
Lengths is the response variable that I wish to run the ANOVA on. How would I do this? A lone comma means there is no data.
How can I generate tags from product descriptions?
Forgive me if this question is very broad, but an explanation of how to approach the task would be very helpful.
Consider a description like -
``` The sexy lift of a push-up meets the coverage you want in a supersoft bra you'll love to wear. With lighter Memory Fit for extra support as it conforms to your curves and a smoothing U-shaped back.
Lift & Lining|Uplift with cushioned padding for shape|Full coverage underwire cups|Adjustable straps can convert to crossback and snap into place for a secure hold|Front closure or back closure|Back-close chocies feature a double row of hook and eye closures; Sizes 36DD & 38D-38DD have triple row of closures for a secure, comfortable fit|Back-close choices feature 4 settings to ensure a perfect fit|Front-close choices feature a racerback|Back-close choices feature a U-shaped ballet back that prevents band from riding up and offers more coverage|Supersoft, double-lined sides for the smoothest shape|Hand wash|Imported nylon/spandex ```
How can I extract attributes like bra type (e.g. push-up, strapless) and pattern?
The text can be very random, and if I do not have an exhaustive list of attributes, how do I extract them?
Why can't R read the text file?
I am trying to get R to read my text file and do some text mining, but the following steps are not working and I don't know what's wrong. Could someone please help me?
library(tm)
setwd("E://")
path="E:/KEYWORDS"
text<-readLines("KEYWORDS.txt")
corpus<- Corpus(VectorSource(text))
corpus<- tm_map(corpus,tolower)
corpus<- tm_map(corpus,removePunctuation)
corpus<-tm_map(corpus,stripWhitespace)
corpus<-Corpus(VectorSource(corpus))
tdm =TermDocumentMatrix(corpus,PlainTextDocument)
findFreTerms(tdm,lowfreq=2)
And it shows:
Warning message:
In tm_map.SimpleCorpus(corpus, removePunctuation) : transformation drops documents
tdm =TermDocumentMatrix(corpus,PlainTextDocument)
Error: is.list(control) is not TRUE
And if you do this
str(readLines("KEYWORDS.txt"))
paste(str(readLines("KEYWORDS.txt")),collapse=" ")
text<-paste(str(readLines("KEYWORDS.txt")),collapse=" ")
gsub(pattern="//W", replace=" ", text)
text<-gsub(pattern="//W",replace=" ",text)
gsub(pattern="//d", replace=" ", text)
text<-gsub(pattern="//d", replace=" ", text1)
tolower(text)
text<-tolower(text)
text
It shows that the text is NULL or contains 0 characters. Why?
Identify the designation of an employee from a sentence
I have sentences taken from employee headlines, for example:
1. Project manager at Infosys.
2. Working as a Java Developer at Infosys.
I want the output to look like this:
1.| Project Manager | Infosys
2.| Java Developer | Infosys
What I tried -
I tried POS tagging using NLTK, but all the tokens are falling under NN. I also tried spaCy in Python for named entity recognition.
We can't use regular expressions because the format will not be the same every time.