Trouble with casefold() due to Non-English letters
All I want to do is change the address column in df to upper case:
df$address <- casefold(df$address, upper = TRUE)
but I keep getting the following error, probably because of the accented 'Í':
Error in toupper(x) : invalid input 'POLÍGONO INDUSTRIAL OLASO' in 'utf8towcs'
I know this observation is already upper case, but not all of them are. I don't want to simply replace every such letter with its English counterpart, mainly because an Eszett (ß) shows up later and I don't know what that would be replaced with.
casefold() works as expected with the accented i on my end.
> casefold('POLÍGONO INDUSTRIAL OLASO')
[1] "polígono industrial olaso"
> casefold('POLÍGONO INDUSTRIAL OLASO', upper = TRUE)
[1] "POLÍGONO INDUSTRIAL OLASO"
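Since the same call works here, the error may come from the encoding of the data rather than from the letter itself. An 'utf8towcs' error is often a sign that R is treating bytes as UTF-8 when they are not. Here is a minimal sketch of how you could check and repair that. It ASSUMES the addresses were read from a Latin-1 source, which I can't confirm from the question, and `x` is a hypothetical stand-in for one value of df$address.

```r
# Hypothetical stand-in for one value of df$address: the bytes of
# "POLÍGONO INDUSTRIAL OLASO" in Latin-1, where Í is the single byte 0xCD.
x <- rawToChar(c(charToRaw("POL"), as.raw(0xCD), charToRaw("GONO INDUSTRIAL OLASO")))

# FALSE here, because a lone 0xCD byte is not valid UTF-8
validUTF8(x)

# Re-encode, ASSUMING the source really is Latin-1 (adjust `from` otherwise)
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# TRUE after the conversion
validUTF8(x_utf8)

# The case conversion from the question, applied to the repaired string
casefold(x_utf8, upper = TRUE)
```

If validUTF8(df$address) returns FALSE for some rows, the same iconv() call can be applied to the whole column before calling casefold(). If it returns TRUE everywhere, the problem lies elsewhere and this sketch does not apply.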
For the eszett, casefold() leaves the character as is.
> casefold('daß')
[1] "daß"
> casefold('daß', upper = T)
[1] "DAß"
You may want to check out the stringr package, whose str_to_upper() converts the eszett to SS.
> library(stringr)
> str_to_lower('daß')
[1] "daß"
> str_to_upper('daß')
[1] "DASS"
But it doesn't work the other way around: lowercasing 'DASS' gives 'dass', not 'daß'.
> str_to_lower('DASS')
[1] "dass"
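Applied to the column from the question, that could look like the sketch below. The data frame is a hypothetical stand-in for your df (I made up the two addresses), and it assumes stringr is installed.

```r
library(stringr)  # assumes the stringr package is installed

# Hypothetical data frame standing in for the asker's df.
# "\u00ed" is í and "\u00df" is ß, written as escapes so the
# script does not depend on the encoding of the source file.
df <- data.frame(
  address = c("Pol\u00edgono industrial Olaso", "Hauptstra\u00dfe 1"),
  stringsAsFactors = FALSE
)

# Upper-case the whole column; str_to_upper() is vectorised
df$address <- str_to_upper(df$address)
df$address
```

The second address should come back with SS in place of ß, matching the str_to_upper('daß') result above. Because that mapping can't be undone, keep a copy of the original column if you need the ß later.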