Values change when converting a column from factor to integer in R

I have a data frame (ab) with a column that comes in as a factor when I read it from a CSV file.

   Month_considered   pct ATC_Count 
   <fct>            <dbl> <fct>     
 1 Apr-17            54.9 198,337   
 2 May-17            56.4 227,681   
 3 Jun-17            58.0 251,664   
 4 Jul-17            57.7 251,934   
 5 Aug-17            55.5 259,617   
 6 Sep-17            55.7 245,588   
 7 Oct-17            56.6 247,051   
 8 Nov-17            57.6 256,375   
 9 Dec-17            56.9 277,784   
10 Jan-18            56.7 272,818   
11 2/1/18            59.1 266,277.00
> sapply(ab, class)
Month_considered              pct        ATC_Count 
        "factor"        "numeric"         "factor"

When I try to convert ATC_Count to integer, I get the following output, where ATC_Count shows different values. What might be wrong here?

ab$ATC_Count <- as.integer(ab$ATC_Count)

   Month_considered   pct ATC_Count
   <fct>            <dbl>     <int>
 1 Apr-17            54.9     36571
 2 May-17            56.4     37325
 3 Jun-17            58.0     37780
 4 Jul-17            57.7     37781
 5 Aug-17            55.5     37885
 6 Sep-17            55.7     37682
 7 Oct-17            56.6     37714
 8 Nov-17            57.6     37855
 9 Dec-17            56.9     38099
10 Jan-18            56.7     38060
11 2/1/18            59.1     37990

1 answer

  • answered 2018-07-11 02:57 akrun

    The values in 'ATC_Count' contain a comma (,), which can be removed with sub before converting:

    as.integer(sub(",", "", ab$ATC_Count))
    
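    Note that sub replaces only the first match, so a value with more than one comma would still fail to convert. A minimal sketch of the difference, using made-up values rather than the asker's data (the names x, first_only and all_commas are illustrative only):

    ```r
    # Hypothetical values, not taken from the question's data
    x <- factor(c("198,337", "1,234,567", "266,277.00"))

    # sub() removes only the first comma, so "1,234,567" becomes "1234,567",
    # which as.integer() cannot parse (NA with a warning)
    first_only <- suppressWarnings(as.integer(sub(",", "", x)))

    # gsub() removes every comma; fixed = TRUE treats "," as a literal string
    all_commas <- as.integer(gsub(",", "", x, fixed = TRUE))
    ```

    For the values shown in the question, which each contain a single comma, sub and gsub should give the same result.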

    Or using the tidyverse (str_remove comes from stringr):

    library(tidyverse)
    ab %>% 
        mutate(ATC_Count = as.integer(str_remove(ATC_Count, ",")))
    

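    Or with parse_number from readr, which drops the grouping comma itself. It expects a character vector, so the factor is converted with as.character first; note that it returns a double rather than an integer:

    ab %>%
        mutate(ATC_Count = parse_number(as.character(ATC_Count)))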
    

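    As for why the numbers changed: calling as.integer directly on a factor returns its internal integer codes (the position of each value in levels(ab$ATC_Count)), not the numbers shown in the labels. The usual way to convert a factor is to go through character first: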

    as.integer(as.character(ab$ATC_Count))
    

    which would not work here on its own, because the commas in the column values cannot be parsed as part of a number, so the result would be NA (with a warning).
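
    The difference between a factor's codes and its labels can be seen on a small example. This is only a sketch on a hypothetical three-value factor (f, codes, direct and fixed are made-up names), not the asker's actual column:

    ```r
    # Hypothetical factor mimicking how the column was read in
    f <- factor(c("251,664", "198,337", "227,681"))

    # Levels are the sorted unique labels
    lv <- levels(f)

    # as.integer() on the factor gives each value's position in levels(f),
    # not the number written in the label
    codes <- as.integer(f)

    # Going through character alone yields NA because of the commas
    direct <- suppressWarnings(as.integer(as.character(f)))

    # Stripping the commas first recovers the intended numbers
    fixed <- as.integer(gsub(",", "", as.character(f), fixed = TRUE))
    ```

    Here codes holds small position numbers, while fixed holds the values the labels actually spell out.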