Calculate the correlation for factor-type data in R
I want to calculate a correlation matrix, but the columns of my data are of class factor. How should I do that? The error I get is: Error in cor(mydata_students, method = "pearson") : 'x' must be numeric
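One possible way around the error, sketched under the assumption that the factor columns really encode numbers (for example scores that were read in as text). The data frame below is a made-up stand-in for mydata_students, and the column names math and physics are hypothetical. If the factors are truly categorical, a Pearson correlation is not meaningful and an association measure based on a contingency table would be a better fit.

```r
# Hypothetical stand-in for mydata_students: numeric values stored as factors
mydata_students <- data.frame(
  math    = factor(c("10", "12", "15", "18", "20")),
  physics = factor(c("11", "13", "14", "19", "21"))
)

# as.numeric() on a factor returns the internal level codes, not the labels,
# so convert to character first and only then to numeric
numeric_data <- as.data.frame(
  lapply(mydata_students, function(x) as.numeric(as.character(x)))
)

# cor() now receives numeric columns and no longer raises "'x' must be numeric"
cor_matrix <- cor(numeric_data, method = "pearson")
print(cor_matrix)
```

If some labels cannot be parsed as numbers, as.numeric() turns them into NA with a warning, so it is worth checking the converted columns before trusting the matrix.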
See also questions close to this topic
-
pivot_wider does not keep all the variables
I would like to keep the variable cat (category) in the output of my function. However, I am not able to keep it. The idea is to apply a function similar to m <- 1 - (1 - se * p2)^df$n based on the category, but in order to perform that step I need to keep the variable cat. Here's the code:
#script3
suppressPackageStartupMessages({
  library(mc2d)
  library(tidyverse)
})

sim_one <- function() {
  df <- data.frame(
    id = c(1:30),
    cat = c(rep("a", 12), rep("b", 18)),
    month = c(1:6, 1, 6, 4, 1, 5, 2, 3, 2, 5, 4, 6, 3:6, 4:6, 1:5, 5),
    n = rpois(30, 5)
  )
  nr <- nrow(df)
  df$n[df$n == "0"] <- 3
  se <- rbeta(nr, 96, 6)
  epi.a <- rpert(nr, min = 1.5, mode = 2, max = 3)
  p <- 0.2
  p2 <- epi.a * p
  m <- 1 - (1 - se * p2)^df$n
  results <- data.frame(month = df$month, m, df$cat)
  results %>%
    arrange(month) %>%
    group_by(month) %>%
    mutate(n = row_number(), .groups = "drop") %>%
    pivot_wider(
      id_cols = n,
      names_from = month,
      names_glue = "m_{.name}",
      values_from = m
    )
}

set.seed(99)
iters <- 1000
sim_list <- replicate(iters, sim_one(), simplify = FALSE)
sim_list[[1]]
#> # A tibble: 7 x 7
#>       n    m_1    m_2    m_3   m_4   m_5    m_6
#>   <int>  <dbl>  <dbl>  <dbl> <dbl> <dbl>  <dbl>
#> 1     1  0.970  0.623  0.905 0.998 0.929  0.980
#> 2     2  0.912  0.892  0.736 0.830 0.890  0.862
#> 3     3  0.795  0.932  0.553 0.958 0.931  0.798
#> 4     4  0.950  0.892  0.732 0.649 0.777  0.743
#> 5     5 NA     NA     NA     0.657 0.980  0.945
#> 6     6 NA     NA     NA     0.976 0.836 NA
#> 7     7 NA     NA     NA     NA     0.740 NA
Created on 2022-05-07 by the reprex package (v2.0.1)
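A minimal sketch of one way the category could be carried through, assuming it is acceptable to number the rows within each cat and month combination and to list cat in id_cols. The small results data frame is invented for illustration (it replaces the simulated values), and names_prefix is used here instead of the names_glue call from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical, hand-written stand-in for the simulated `results` data frame
results <- data.frame(
  month = c(1, 1, 2, 2, 1, 2),
  cat   = c("a", "a", "a", "b", "b", "b"),
  m     = c(0.91, 0.80, 0.62, 0.89, 0.95, 0.73)
)

wide <- results %>%
  arrange(month) %>%
  group_by(cat, month) %>%        # number rows within each category and month
  mutate(n = row_number()) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = c(cat, n),          # columns listed here survive the reshape
    names_from = month,
    names_prefix = "m_",
    values_from = m
  )

print(wide)
```

pivot_wider() drops every column that is not named in id_cols, names_from or values_from, which would explain why cat disappears when only n is given as the identifier.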
-
calculate weighted average over several columns with NA
I have a data frame like this one:
ID duration1 duration2 total_duration quantity1 quantity2
1  5         2         7              3         1
2  NA        4         4              3         4
3  5         NA        5              2         NA
I would like to do a weighted mean for each subject like this:
df$weighted_mean<- ((df$duration1*df$quantity1) + (df$duration2*df$quantity2) ) / (df$total_duration)
But because some values are NA, this command returns NA for those rows, and it is not very elegant either.
The result would be this:
ID duration1 duration2 total_duration quantity1 quantity2 weighted_mean
1  5         2         7              3         1         2.43
2  NA        4         4              3         4         4
3  5         NA        5              2         NA        2
Thanks in advance for the help
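A minimal sketch of one way to get the expected result, assuming a missing duration or quantity should simply contribute nothing to the numerator. The data frame is rebuilt from the table in the question; the helper matrix products is a name introduced here for illustration.

```r
# Data frame rebuilt from the table in the question
df <- data.frame(
  ID             = 1:3,
  duration1      = c(5, NA, 5),
  duration2      = c(2, 4, NA),
  total_duration = c(7, 4, 5),
  quantity1      = c(3, 3, 2),
  quantity2      = c(1, 4, NA)
)

# One column per duration * quantity product; a product is NA if either part is NA
products <- cbind(
  df$duration1 * df$quantity1,
  df$duration2 * df$quantity2
)

# rowSums(..., na.rm = TRUE) skips the NA products instead of propagating them
df$weighted_mean <- rowSums(products, na.rm = TRUE) / df$total_duration

print(round(df$weighted_mean, 2))
```

One caveat: a row where every product is NA would get a numerator of 0 rather than NA, so such rows may need separate handling.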
-
Extracting data from a netCDF file in R for a specific location: "subscript out of bounds" error at the end of the code
I need some help with extracting data from NetCDF files using R. I downloaded them from CORDEX (The Coordinated Regional climate Downscaling Experiment). In total I have some files. These files have the dimensions (longitude, latitude, time), and the variable is maximum temperature (tasmax). I need to extract tasmax at a specific location for different times. I wrote the code below in R, but at the end an error appeared (subscript out of bounds).
getwd()
setwd("C:/Users/20120/climate change/rcp4.5/tasmax")
dir()
library("ncdf4")
library(ncdf4.helpers)
library("chron")
ncin <- nc_open("tasmax_AFR-44_ICHEC-EC-EARTH_rcp45_r1i1p1_KNMI-RACMO22T_v1_mon_200601-201012.nc")
lat <- ncvar_get(ncin, "lat")
lon <- ncvar_get(ncin, "lon")
tori <- ncvar_get(ncin, "time")
title <- ncatt_get(ncin, 0, "title")
institution <- ncatt_get(ncin, 0, "institution")
datasource <- ncatt_get(ncin, 0, "source")
references <- ncatt_get(ncin, 0, "references")
history <- ncatt_get(ncin, 0, "history")
Conventions <- ncatt_get(ncin, 0, "Conventions")
# tunits must be read before it is used in strsplit()
tunits <- ncatt_get(ncin, "time", "units")
tustr <- strsplit(tunits$value, "")
ncin$dim$time$units
ncin$dim$time$calendar
tas_time <- nc.get.time.series(ncin, v = "tasmax", time.dim.name = "time")
tas_time[c(1:3, length(tas_time) - 2:0)]
tmp.array <- ncvar_get(ncin, "tasmax")
dunits <- ncatt_get(ncin, "tasmax", "units")
# convert from Kelvin to degrees Celsius
tmp.array <- tmp.array - 273.15
nc_close(ncin)
which.min(abs(lat - 28.9))
which.min(abs(lon - 30.2))
tmp.slice <- tmp.array[126, 32981, ]
tmp.slice
Error in tmp.array[126, 32981, ] : subscript out of bounds
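A possible explanation, offered as an assumption because the file itself is not shown: if lat and lon are stored as 2-D matrices (one value per grid cell, which is common on rotated grids), which.min() returns a single linear index into the matrix, not a row or column number, so a value like 32981 cannot be used directly as the second subscript. The sketch below uses small synthetic arrays instead of the real file. The names target_lat, target_lon, dist2 and idx are introduced here for illustration, and the squared difference in degrees is only a rough stand-in for a proper distance.

```r
# Synthetic stand-ins (assumption): 2-D coordinate matrices and a 3-D data array
nx <- 4; ny <- 3; nt <- 2
lon <- matrix(rep(seq(28, 31, length.out = nx), times = ny), nrow = nx)
lat <- matrix(rep(seq(27, 29, length.out = ny), each = nx), nrow = nx)
tmp.array <- array(seq_len(nx * ny * nt), dim = c(nx, ny, nt))

# Combined squared distance to the target point, evaluated on every grid cell
target_lat <- 28.9
target_lon <- 30.2
dist2 <- (lat - target_lat)^2 + (lon - target_lon)^2

# which.min() gives one linear index; arrayInd() converts it to (row, column)
idx <- arrayInd(which.min(dist2), dim(dist2))

# Use the row and column subscripts to pull the full time series at that cell
tmp.slice <- tmp.array[idx[1], idx[2], ]
print(idx)
print(tmp.slice)
```

Checking dim(lat), dim(lon) and dim(tmp.array) on the real data would confirm whether this assumption holds. If lat and lon turn out to be 1-D vectors, the two separate which.min() calls are fine, but the subscripts have to follow the dimension order of tmp.array.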