Adjust ARFF file format in C++ for WEKA
I want to preprocess a Weka ARFF file that contains 2000 lines for an NLP project (sentiment analysis).
I want code that just adds a single quotation mark at the start and end of each sentence. For example, this is a sample of my dataset:
The Da Vinci Code is one of the most beautiful movies ive ever seen.,1 The Da Vinci Code is an * amazing * book, do not get me wrong.,1 then I turn on the light and the radio and enjoy my Da Vinci Code.,1 The Da Vinci Code was REALLY good.,1 i love da vinci code....,1
I want it to be like this:
'The Da Vinci Code is one of the most beautiful movies ive ever seen.',1 'The Da Vinci Code is an * amazing * book, do not get me wrong.',1 'then I turn on the light and the radio and enjoy my Da Vinci Code.',1 'The Da Vinci Code was REALLY good.',1 'i love da vinci code....',1
I just want to add a single quotation mark at the beginning and end of each sentence (before the ,1).
I would really appreciate it if you could help me do this.
Is there any tool that I can use instead of writing code?
Many tools support regular expressions, which you could use to achieve this. The expression uses three capture groups and one flag:
- Group 1: Match all characters except for a literal dot, at least 1 character.
- Group 2: Match only literal dots, at least 1 character.
- Group 3: Match a literal comma, followed by a literal 1, followed by at least 1 whitespace character.
- Regex flag g (global): replace every match, not just the first one.
In the replacement, enclose groups 1 and 2 together in single quotes, followed by group 3.
You can then put your favorite tool to work with it, for example sed:
sed -i -E "s/([^\.]+)(\.+)(,1\s+)/'\1\2'\3/gm" yourfile.txt
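Other tools might use a different syntax, and the expression can probably be optimized further. As written, it only matches a sentence that ends in at least one dot and whose ,1 is followed by whitespace, so a record with nothing after the 1 on its line is left unchanged. The -i option without a suffix and the \s escape as used here assume GNU sed.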
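Since the question asks about C++, the same three-group expression can also be sketched with std::regex. This is only a minimal sketch under a few assumptions: the function name quote_sentences is made up for illustration, and the whitespace after ,1 is made optional (\s* instead of \s+) so that a record at the end of a line is matched too. Like the sed command, it assumes every sentence ends in at least one dot and carries the label 1.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for illustration only).
// Wraps each sentence, including its trailing dots, in single quotes
// and leaves the ",1" label outside the quotes.
std::string quote_sentences(const std::string& text) {
    // Group 1: all characters up to the first dot.
    // Group 2: one or more literal dots.
    // Group 3: ",1" followed by optional whitespace
    //          (an assumption: relaxed from "at least one" whitespace).
    static const std::regex pattern(R"(([^.]+)(\.+)(,1\s*))");

    // std::regex_replace replaces every match by default,
    // which corresponds to the g flag in the sed command.
    return std::regex_replace(text, pattern, "'$1$2'$3");
}
```

To process a whole file, one option is to read it line by line with std::getline, pass each line to quote_sentences, and write the result to a new file. Sentences that already contain a single quote would still need escaping before WEKA can read them, which this sketch does not handle.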