How can I pass an argument for the variable name for storing intermediate dplyr output?
xfun <- function(yvar, newvar){
  df %>%
    pull(yvar) %>%
    lmoms(., nmom = 4) %>% # Estimates some parameters
    {. ->> newvar}         # Stores list
}
xfun("var2", "newvar2")
But this doesn't work: no object named newvar2 gets created.
But this works:
xfun <- function(yvar){
  df %>%
    pull(yvar) %>%
    lmoms(., nmom = 4) %>% # Estimates some parameters
    {. ->> "newvar2"}      # Stores list
}
xfun("var2")
So, how can I pass the newvar as a function argument to store this intermediate output?
Open to a different way of accomplishing this as well.
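One possible answer (a sketch, not the only option): `->> newvar` always assigns to a variable literally named newvar, so the string you pass in is never used. `assign()` instead stores under the *value* of its first argument. The helper below is hypothetical; `lmoms()` is assumed to come from the lmomco package, as in the question.

```r
library(dplyr)

# Sketch: assign() stores under the *value* of `newvar`,
# unlike `->> newvar`, which writes to a variable named "newvar".
store_as <- function(value, newvar) {
  assign(newvar, value, envir = .GlobalEnv)
  invisible(value)
}

# xfun rewritten with the helper; lmoms() assumed from lmomco.
xfun <- function(yvar, newvar) {
  df %>%
    pull(yvar) %>%
    lmoms(nmom = 4) %>%
    store_as(newvar)
}
# xfun("var2", "newvar2")  # afterwards `newvar2` exists in the global env
```

Writing into the global environment from inside a function is a side effect; returning the value and assigning at the call site is usually cleaner when it fits the workflow.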
See also questions close to this topic

Conditional data labeling by comparing values in previous and next rows
Hi, I have the following data and want to relabel its NA rows:
test <- data.frame(
  gr = c(rep("1", 4), rep("2", 3), rep("3", 8)),
  sd_value = c(77, 18, 3, 16, 21, 32, 76, 24, 32, 31, 44, 60, 80, 62, 60),
  value = c(5400, 6900, 7080, 1892, 2712, 4207, 4403, 3900, 4069, 4237, 5169, 5254, 5339, 5524, 5525),
  label = c("good", NA, "unable", "bads", "bads", NA, "good", "bad", NA, NA, NA, NA, "good", NA, NA)
)
> test
   gr sd_value value  label
1   1       77  5400   good
2   1       18  6900   <NA>
3   1        3  7080 unable
4   1       16  1892   bads
5   2       21  2712   bads
6   2       32  4207   <NA>
7   2       76  4403   good
8   3       24  3900    bad
9   3       32  4069   <NA>
10  3       31  4237   <NA>
11  3       44  5169   <NA>
12  3       60  5254   <NA>
13  3       80  5339   good
14  3       62  5524   <NA>
15  3       60  5525   <NA>
The basic idea behind labeling the NA rows is to compare value and sd_value with the closest non-NA rows. There is only one caveat: when the closest non-NA row is labeled good, I want to label the NA row eww! if the condition is met. The expected output:
test %>% mutate(diff_val=c(0,diff(value)), diff_sd_val=c(0,diff(sd_value)))
PS: diff_val and diff_sd_val are added only for a visual check.
   gr sd_value value  label new_label diff_val diff_sd_val
1   1       77  5400   good      good        0           0
2   1       18  6900     NA    unable     1500          59  # compared to rows 1 and 3; took row 3's label `unable` because it satisfied diff_val < 200 & diff(sd_value) < 50
3   1        3  7080 unable    unable      180          15
4   1       16  1892   bads      bads     5188          13
5   2       21  2712   bads      bads      820           5
6   2       32  4207     NA      eww!     1495          11  # compared to rows 5 and 7; new label `eww!` because the condition held with row 7 (labeled good)
7   2       76  4403   good      good      196          44
8   3       24  3900   bads      bads      503          52
9   3       32  4069     NA      bads      169           8  # compared to rows 8 and 10; same label as row 8 `bads` because it satisfied diff_val < 200 & diff(sd_value) < 50
10  3       31  4237     NA      bads      168           1  # compared to rows 9 and 11; took row 8's label `bads` ::: take the closest non-NA label satisfying this diff condition
11  3       44  5169     NA      eww!      932          13  # compared to rows 10 and 12; new label `eww!` ::: compare values to the closest non-NA label
12  3       60  5254     NA      eww!       85          16  # compared to rows 11 and 13; new label `eww!` ::: compare values to the closest non-NA label
13  3       80  5339   good      good       85          20
14  3       62  5524     NA      eww!      185          18  # compared to rows 13 and 15; new label `eww!` ::: compare values to the closest non-NA label
15  3       60  5525     NA      eww!        1           2  # compared to row 13; new label `eww!` ::: compare values to the closest non-NA label
The labeling rule could be defined like this function (pseudocode):
custom_label <- function(value, sd_value, label){
  if (is.na(label)) {
    # compare diff(value) and diff(sd_value) against the previous and next rows;
    # if diff(value) < 200 and diff(sd_value) < 20, use the label of that row,
    # otherwise NA
  }
}
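As a starting point, here is a hedged sketch that only compares each NA row with its immediate neighbours (so runs of consecutive NAs would need repeated passes to reach the "closest non-NA" behaviour described above); thresholds are the ones from the expected-output comments, and `relabel` is a hypothetical helper name:

```r
library(dplyr)

# Hypothetical helper: a donor row labeled "good" yields "eww!",
# any other donor label is copied as-is.
relabel <- function(donor) ifelse(donor == "good", "eww!", donor)

test2 <- test %>%
  mutate(
    new_label = case_when(
      !is.na(label) ~ as.character(label),   # keep existing labels
      # next row has a label and the diff condition holds
      !is.na(lead(label)) &
        abs(lead(value) - value) < 200 &
        abs(lead(sd_value) - sd_value) < 50 ~ relabel(as.character(lead(label))),
      # previous row has a label and the diff condition holds
      !is.na(lag(label)) &
        abs(value - lag(value)) < 200 &
        abs(sd_value - lag(sd_value)) < 50 ~ relabel(as.character(lag(label))),
      TRUE ~ NA_character_
    )
  )
```

This reproduces rows like 2 and 6 of the expected output; extending the look-up to the nearest non-NA row (rather than the adjacent row) would need a fill or loop step.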

Linear Regression Analysis of population data with R
I have a homework assignment where I need to take a CSV file based around population data around the United States and do some data analysis on the data inside. I need to find the data that exists for my state and for starters run a Linear Regression Analysis to predict the size of the population.
I've been studying R for a few weeks now: I went through a LinkedIn Learning course as well as two different Pluralsight courses on R. I have also searched for how to do a linear regression analysis in R, and I find plenty of examples for when the data is already laid out in a table in just the right way to analyze.
The CSV file is laid out so that each state is defined on a single line/row so I used the filter function to grab just the data for my State and put it into a variable.
Within that dataset the population data is defined across several columns with the most important data being the Population Estimates for each year from 2010 to 2018.
library(tidyverse)
population.data <- read_csv("nstest2018alldata.csv")
mn.state.data <- filter(population.data, NAME == "Minnesota")
I'm looking for some help getting headed in the right direction. My thought is that I will need to create two containers of data: one holding each year from 2010 to 2018, and one holding the population estimate for each of those years, and then use the xyplot function with those two containers? If you have some experience in this area, please help me think this through. I'm not looking for anybody to do the assignment for me, just some help thinking it through.
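One possible sketch: reshape the one-row state data to long form (year, population), then fit `lm()` rather than just plotting. The column names `POPESTIMATE2010`..`POPESTIMATE2018` are an assumption based on the Census "nst-est2018" file layout; adjust them to whatever the CSV actually uses.

```r
library(tidyverse)

# Assumed columns: POPESTIMATE2010 .. POPESTIMATE2018 (one row per state).
mn.long <- mn.state.data %>%
  select(starts_with("POPESTIMATE")) %>%
  pivot_longer(everything(), names_to = "year", values_to = "population") %>%
  mutate(year = as.integer(str_remove(year, "POPESTIMATE")))

# Simple linear trend: population as a function of year.
fit <- lm(population ~ year, data = mn.long)
summary(fit)

# Extrapolate the trend to predict future population sizes.
predict(fit, newdata = tibble(year = 2019:2020))
```

This replaces the "two containers" idea with one tidy two-column data frame, which is what `lm()` (and ggplot2, if you want a plot) expects.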

Mixed model on each factor level
I have a mixed effects model that I want to run on each level of a factor. I can do it one level at a time by subsetting the dataframe, but I am sure there is a more straightforward way to do it.
Here is an example. "x" and "y" are variables and "fac2" is the factor to be modelled as a random effect. What I want to do is apply the model to each "fac1" level.
Here is how the dataframe looks like:
> head(df)
   x      y fac1 fac2
1 14 1.0328    A    1
2 18 1.0205    A    1
3 22 1.9262    A    1
4 26 2.3026    A    1
5 30 2.5159    A    1
6 34 2.6633    A    1
Here is the model
lp <- function(x, a, b, c){
  ifelse(x < c, a + b * x, a + b * c)
}
library(nlme)
f1 <- nlme(y ~ lp(x, a, b, c),
           random = a + b + c ~ 1 | fac2,
           fixed = a + b + c ~ 1,
           data = df,
           start = c(a = 1, b = 0.05, c = 40))
I want to get something like this, but for each level of "fac1" (in this case, "A" and "B"):
> fixef(f1)
         a          b          c
 0.4653545  0.1025822 30.3707837
> ranef(f1)
           a            b         c
1 0.02172505 3.580473e-13 0.4892547
2 0.16928799 1.110698e-12 1.5177167
3 0.20562406 1.196632e-12 1.6351403
4 0.34252338 1.742883e-12 2.3815664
5 0.29974892 1.608822e-12 2.1983787
6 0.17999633 8.927181e-13 1.2198569
7 0.17491778 1.083529e-12 1.4805913
8 0.08449744 8.231909e-13 1.1248512
In case you need the data of the example:
df <- structure(list(x = c(14L, 18L, 22L, 26L, 30L, 34L, 38L, 40L, 42L, 14L, 18L, 22L, 26L, 30L, 34L, 38L, 40L, 42L, 14L, 18L, 22L, 26L, 30L, 34L, 38L, 42L, 44L, 14L, 18L, 22L, 26L, 30L, 34L, 38L, 40L, 42L, 10L, 14L, 18L, 22L, 26L, 30L, 34L, 36L, 38L, 42L, 10L, 14L, 18L, 22L, 26L, 30L, 34L, 38L, 42L, 10L, 14L, 18L, 22L, 26L, 30L, 34L, 37L, 38L, 42L, 10L, 14L, 18L, 22L, 26L, 30L, 34L, 36L, 38L, 42L), y = c(1.0328, 1.0205, 1.9262, 2.3026, 2.5159, 2.6633, 2.7435, 2.855, 2.6624, 0.881, 1.0738, 1.6, 2.1519, 2.3339, 2.4908, 2.7169, 2.7106, 2.7731, 0.6859, 1.1838, 1.6867, 1.9957, 2.3212, 2.5685, 2.6384, 2.557, 2.7263, 0.6374, 1.062, 1.5332, 1.8987, 2.0687, 2.5393, 2.5393, 2.5241, 2.4515, 0.7777, 1.3255, 1.7045, 2.2217, 2.4302, 2.7138, 2.7788, 2.733, 2.7156, 2.741, 0.6405, 1.1825, 1.5864, 2.0159, 2.3993, 2.7801, 2.7167, 2.8142, 2.6103, 0.6804, 1.0521, 1.7468, 1.8914, 2.4697, 2.773, 2.6471, 2.4977, 2.76, 2.595, 0.7479, 1.0025, 1.4848, 1.8616, 2.3183, 2.6273, 2.5209, 2.6643, 2.4964, 2.4766), fac1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), fac2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")), row.names = c(NA, 75L), class = "data.frame")
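A sketch of one way to answer this: split the data frame by fac1 and fit the same nlme model to each piece (start values taken from the model above; whether each subset converges depends on the data).

```r
library(nlme)

# lp() as defined in the question.
lp <- function(x, a, b, c) ifelse(x < c, a + b * x, a + b * c)

# Fit the model separately for each level of fac1.
fits <- lapply(split(df, df$fac1), function(d) {
  nlme(y ~ lp(x, a, b, c),
       random = a + b + c ~ 1 | fac2,
       fixed  = a + b + c ~ 1,
       data   = d,
       start  = c(a = 1, b = 0.05, c = 40))
})

lapply(fits, fixef)  # fixed effects, one set per fac1 level ("A", "B")
lapply(fits, ranef)  # random effects per fac1 level
```

`split()` returns a named list, so the results stay labeled by level; `nlsList`-style or `lme4`-based alternatives exist but this keeps the original model untouched.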

Getting mean for levels with lapply
I am trying to get the mean for each level from a list of levels. Usually I use a loop for this task, but I would like it to be faster because I usually work with a large list of variables (in this case levels), so I am trying to avoid a loop. My command is like:
library(dplyr)
var = c(1, 2)
dd <- lapply(var, function(x) {
  mtcars %>%
    filter(carb == var) %>%
    mutate(mean = mean(mpg))
})
My result
dd
[[1]]
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb     mean
1 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 22.66667
2 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 22.66667
3 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 22.66667
4 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 22.66667
5 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 22.66667
6 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2 22.66667

[[2]]
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb     mean
1 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 22.66667
2 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 22.66667
3 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 22.66667
4 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 22.66667
5 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 22.66667
6 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2 22.66667
What I expect:
> tapply(mtcars$mpg, mtcars$carb, mean)
       1        2        3        4        6        8
25.34286 22.40000 16.30000 15.79000 19.70000 15.00000

[[1]]
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb  mean
1 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 25.34
2 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    1 25.34
3 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 25.34
4 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    1 25.34
5 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    1 25.34
6 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    1 25.34

[[2]]
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb  mean
1 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    2 22.40
2 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 22.40
3 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    2 22.40
4 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 22.40
5 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 22.40
6 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2 22.40
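The problem looks like a typo in the filter: inside the anonymous function the current level is `x`, but the code compares against the whole vector `var`, so `carb == var` is recycled and every list element gets the same rows. A sketch of the fix, plus a `group_by` alternative that avoids `lapply` entirely:

```r
library(dplyr)

var <- c(1, 2)
dd <- lapply(var, function(x) {
  mtcars %>%
    filter(carb == x) %>%   # x is the current level, not the whole vector
    mutate(mean = mean(mpg))
})

# Or without lapply: one grouped mutate covers every carb level at once.
dd2 <- mtcars %>%
  group_by(carb) %>%
  mutate(mean = mean(mpg)) %>%
  ungroup()
```

With `carb == x`, each `dd[[i]]` now carries the per-level mean that `tapply` reports.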

Extract forecasting data
I have one data frame, which is composed of three columns (AJT, NET and SAT). My intention is to make a forecast for each of these three time series with the forecast package. For that reason I convert the data frame into a ts object and do the forecasting with the snaive function, so I wrote these lines of code:
# CODE
library(forecast)

# Data set
DATA_SET <- data.frame(
  AJT = seq(1, 48, by = 2),
  NET = seq(1, 24, by = 1),
  SAT = seq(1, 94, by = 4)
)

# Making TS object
TS_SALES <- ts(DATA_SET, start = c(2016, 1), frequency = 12)

# Making forecasts with the forecast package
SNAIVE_AJT <- snaive(TS_SALES[, 'AJT'], h = 5)
SNAIVE_NET <- snaive(TS_SALES[, 'NET'], h = 5)
SNAIVE_SAT <- snaive(TS_SALES[, 'SAT'], h = 5)

# Union of forecasts in a list
SNAIVE_UNION <- mapply(SNAIVE_AJT, SNAIVE_NET, SNAIVE_SAT, FUN = list, SIMPLIFY = FALSE)
I put all outputs from the snaive function into SNAIVE_UNION, which contains all the forecasting results. The most important component here is "mean", which contains the forecasts by month.
SNAIVE_UNION[["mean"]]
# [[1]]
#      Jan Feb Mar Apr May
# 2018  25  27  29  31  33
#
# [[2]]
#      Jan Feb Mar Apr May
# 2018  13  14  15  16  17
#
# [[3]]
#      Jan Feb Mar Apr May
# 2018  49  53  57  61  65
So here, my intention is to put the results from SNAIVE_UNION[["mean"]] into a data table like the one below, with a loop, for, or some other function:
    Jan Feb Mar Apr May
AJT  25  27  29  31  33
NET  13  14  15  16  17
SAT  49  53  57  61  65
I am asking this because these time series are only a small part of the whole set, and I would like to automate this code.
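One sketch of an answer: keep the forecasts in a named list (instead of the `mapply` construction) and row-bind their `mean` components with `sapply`. The month labels below are an assumption; in practice they come from the forecast horizon of each series.

```r
# Sketch: a named list of forecast objects, one row of means per series.
forecasts <- list(AJT = SNAIVE_AJT, NET = SNAIVE_NET, SAT = SNAIVE_SAT)

# Each forecast object stores its point forecasts in the $mean component.
result <- t(sapply(forecasts, function(f) as.numeric(f$mean)))
colnames(result) <- c("Jan", "Feb", "Mar", "Apr", "May")  # assumed labels
result
```

Because the list is named, adding more series later only means extending the `list(...)` call; `sapply` scales to any number of them.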

Rounding all numeric columns but one in table
I have created numerous output tables from t-tests and ANOVAs, and I'd like to round all numeric columns of the tables apart from the column containing the p-values (p.value).
Current code:
library(dplyr)
library(broom)

a <- rnorm(100, 0.75, 0.1)
t.test <- t.test(a, mu = 0.5, alternative = "greater") %>%
  broom::tidy() %>%
  mutate_if(is.numeric, round, 2)
The issue is that this also rounds my p-value, which is then displayed as 0. I already have a function for reporting p-values in my markdown file, so I'm wondering how I can keep the p-value (p.value) unchanged yet round all other numeric columns to 2 digits?
Thanks
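One possible approach (a sketch): exclude p.value in the selection step, e.g. with `across()` in current dplyr, which lets you intersect a `where(is.numeric)` predicate with `!p.value` (in older dplyr, `mutate_at(vars(-p.value), ...)` plays the same role):

```r
library(dplyr)
library(broom)

a <- rnorm(100, 0.75, 0.1)
result <- t.test(a, mu = 0.5, alternative = "greater") %>%
  broom::tidy() %>%
  # round every numeric column except p.value
  mutate(across(where(is.numeric) & !p.value, ~ round(.x, 2)))
```

Note the result is stored as `result` here rather than `t.test`, to avoid masking the `t.test()` function as the original snippet does.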