How to draw observations from a normal distribution, compute the mean using a custom estimator, and run this procedure in a loop in R
I want to use a custom estimator Yhat = (1/(n-4)) * sum(Y_i) as an estimator for the mean of Y_i.
How do I draw, say, 20 observations from the distribution N(4, 10) and compute the estimate of the mean using Yhat, and then repeat this procedure k times in a loop and save the results in a matrix in R?
1 answer

First, since you want the estimate of the mean, you don't need a matrix; a vector should be fine.
Yhat <- function(x) {
  if (length(x) <= 4) {
    stop("vector x must be at least 5 elements long")
  } else {
    sum(x) / (length(x) - 4)
  }
}
k <- 100
result <- replicate(k, {
  simulations <- rnorm(20, 4, 10)
  Yhat(simulations)
})
result
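If you do want the matrix mentioned in the question, one option (a sketch; the names result_mat and n_obs are invented here, not part of the answer) is to have each replication return two numbers, the custom estimate and the plain sample mean, so that replicate() stacks them into a 2 x k matrix:

```r
# Sketch: store both the custom estimate and the plain sample mean per replication.
Yhat <- function(x) {
  if (length(x) <= 4) stop("vector x must be at least 5 elements long")
  sum(x) / (length(x) - 4)
}
k <- 100
n_obs <- 20
set.seed(1)  # make the draws reproducible
result_mat <- replicate(k, {
  simulations <- rnorm(n_obs, mean = 4, sd = 10)
  c(Yhat = Yhat(simulations), sample_mean = mean(simulations))
})
dim(result_mat)  # 2 rows (estimators) by k columns (replications)
```

t(result_mat) gives the k x 2 layout if you prefer one replication per row.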
See also questions close to this topic

R: geom_venn how to change font
Is there a way to change the font style when drawing a Venn diagram using the geom_venn function? I have tried theme(text = element_text(size = 16, family = "Comic Sans MS")), but for some reason it doesn't work.

How to use a for loop to create a duration variable?
I need to create a variable that tells me how many years there was peace before conflict = 1 (meaning a conflict starts).
Someone recommended doing a for loop with a counter that starts at 0, adds 1 if conflict == 0, saves the value if conflict == 1, and resets it to 0 afterwards. Can someone help me put this into R code?
Thank you!
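The recommended counter loop might be sketched like this (assuming a data frame sorted by year with a 0/1 conflict column; the names df and peace_years are illustrative):

```r
# Sketch: count peace years and record the streak length when a conflict starts.
df <- data.frame(
  year     = 2000:2009,
  conflict = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0)  # toy stand-in data
)
df$peace_years <- NA
counter <- 0
for (i in seq_len(nrow(df))) {
  if (df$conflict[i] == 1) {
    df$peace_years[i] <- counter  # years of peace before this conflict
    counter <- 0                  # reset after the conflict starts
  } else {
    counter <- counter + 1
  }
}
df$peace_years
# rows where conflict == 1 get the streak length (3 and 2 here); others stay NA
```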

Flextable output to .docx cannot be opened in Word
I've been trying to import a .csv file into R, then convert it to a flextable, then export the flextable into a Word .docx document. I use the following code:

library(tidyverse)
library(readxl)
library(scales)
library(janitor)
library(stringr)
library(magrittr)
library(officer)
library(flextable)
library(dplyr)
# load data
Rpops_survey <- read_csv("RPOPS Data Quant.csv")
spec(Rpops_survey)
# convert data
Rpops_survey_dframed <- as.data.frame(Rpops_survey)
Rpops_ft1 <- flextable(Rpops_survey_dframed)
Rpops_doctemp <- read_docx()
Rpops_doctemp <- body_add_flextable(Rpops_doctemp, value = Rpops_ft1)
# export data
fileout <- tempfile(fileext = ".docx")
fileout <- "test.docx"
print(Rpops_doctemp, target = fileout)
The file is created, but trying to open it in Word produces an error (screenshot of the Word .docx error dialog).

Translate this math function into a C program

calculate pixels to meter different heights
I have data taken at a 73 m height; it has an X coordinate and a Y coordinate that are known to me. The problem is that the image was taken at a 79 m height, and when I try to convert meters to pixels I get an error and the real object is not marked.
For example, I hoped to mark the red point, but my code marks the yellow one.

How do I print out a number triangle in Python? Converting to string is not allowed, only arithmetic operations.
Example
n = 5
The output of each line is of int type:
1
22
333
4444
55555

How can I optimize the expected value of a function in R?
I have derived a survival function for a system of components (ignore the details of how this system is set up) and I am trying to maximize its expected value:

surv_func = function(x, mu) {
  exp(-(x/mu)^(1/3)) *
    ((1 - exp(-(4/3)*x^(3/2))) + exp(-(4/3)*x^(3/2))) *
    exp(-(x/(3-mu))^(1/3))
}
and I am supposed (since the pdf including my tasks gives a hint about it) to use the function
optimize()
and the expected value for a function can be computed with
# Computes the expected value of a function f
E <- integrate(f, 0, Inf)
but my function depends on x and mu. The expected value could (obviously) be computed if the integrand had no mu and depended only on x. For those interested, the mu comes from the fact that one of the components has a Weibull distribution with parameters (1/3, mu), and the 3-mu comes from the fact that another component has a Weibull distribution with parameters (1/3, lambda). In the task there is a constraint mu + lambda = 3, so I thought substituting the lambda parameter in the second Weibull distribution with lambda = 3 - mu and maximizing this problem would yield not only mu but also lambda.
If I try, just for the sake of learning about R, to compute the expected value using the code below (in the console window), it just gives me the following:

> E <- integrate(surv_func, 0, Inf)
Error in (function (x, mu) : argument "mu" is missing, with no default
I am new to R and seem to be a little bit "slow" at learning. How can I approach this problem?
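A sketch of one approach, under the assumption that the intended survival function is the product of the two Weibull (shape 1/3) survival terms described in the text, with minus signs restored inside the exponentials: hold mu fixed, integrate over x to get the expected value as a function of mu alone, and hand that function to optimize(). The helper name E_of_mu is illustrative:

```r
# Assumed survival function (signs restored; this is a reading of the question,
# not a verified formula).
surv_func <- function(x, mu) {
  exp(-(x / mu)^(1/3)) *
    ((1 - exp(-(4/3) * x^(3/2))) + exp(-(4/3) * x^(3/2))) *
    exp(-(x / (3 - mu))^(1/3))
}

# Expected value as a function of mu alone: integrate x out for fixed mu.
E_of_mu <- function(mu) {
  integrate(function(x) surv_func(x, mu), lower = 0, upper = Inf)$value
}

# Maximize over the admissible range 0 < mu < 3 (so that lambda = 3 - mu > 0).
opt <- optimize(E_of_mu, interval = c(0.01, 2.99), maximum = TRUE)
opt$maximum  # the maximizing mu; lambda then follows as 3 - opt$maximum
```

By the symmetry mu <-> 3 - mu of the assumed function, the maximum should land at mu = 1.5.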

What's the actual probability of an event occurring predicted by a classification model?
I have a classification model that predicts whether event A will occur or event B. The accuracy of the model is 49%. Say for a test case it predicts that event A will occur with a probability of 72%. So what is the probability that Event A will occur?
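Whether that 72% can be read as a real probability depends on whether the model is calibrated, which can be checked by binning the predicted probabilities and comparing each bin's average prediction with the observed frequency of event A. A sketch on simulated stand-in data (pred and outcome are hypothetical, not from the question):

```r
# Sketch: a simple calibration (reliability) check on hypothetical data.
set.seed(42)
pred    <- runif(1000)                   # hypothetical predicted P(event A)
outcome <- rbinom(1000, 1, prob = pred)  # outcomes from a perfectly calibrated model
bins <- cut(pred, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
calib <- data.frame(
  mean_pred = tapply(pred, bins, mean),    # average predicted probability per bin
  obs_rate  = tapply(outcome, bins, mean)  # observed event frequency per bin
)
calib  # for a calibrated model the two columns track each other closely
```

If the observed rates drift away from the predicted ones (likely, given 49% accuracy), the raw score should not be read as the event's probability without recalibration.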

Calculate convolution of exponential variables
I have a question about the convolution problem below. Let the variables a1, a2 and a3 independently follow exponential(1) distributions. Find P(a1 < 2, a1 + a2 > 2) and P(a1 + a2 < 2, a1 + a2 + a3 > 2).
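Both probabilities work out analytically to 2*exp(-2), approximately 0.2707: the first is integral_0^2 exp(-a) * exp(-(2 - a)) da = 2*exp(-2), and since a1 + a2 has Gamma(2, 1) density s*exp(-s), the second is integral_0^2 s*exp(-s) * exp(-(2 - s)) ds = 2*exp(-2). A Monte Carlo sketch to double-check:

```r
# Monte Carlo check of P(a1 < 2, a1 + a2 > 2) and P(a1 + a2 < 2, a1 + a2 + a3 > 2).
set.seed(1)
n  <- 1e6
a1 <- rexp(n); a2 <- rexp(n); a3 <- rexp(n)
p1 <- mean(a1 < 2 & a1 + a2 > 2)
p2 <- mean(a1 + a2 < 2 & a1 + a2 + a3 > 2)
c(p1 = p1, p2 = p2, analytic = 2 * exp(-2))  # all three approximately 0.2707
```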

Fit distribution to partially-observed data with covariates
Here is a question that has burnt my brain lately. I do not have a strong mathematical background so perhaps the solution is relatively easy.
Imagine we have partially-observed distance data between a departure point and an arrival point for many individuals (i.e. Individual 1 moves 30 km, individual 2 moves 50 km, and so on). However, we have just observed a fraction of all cases (i.e. we observed 200 departures, but only 70 arrivals, and therefore we do not know where the other 130 individuals arrived, and their distance is thus unknown). Also, we have good reasons to suspect that we are seeing those individuals that move a shorter distance much more often than those that have longer distances.
Now, I would like to fit a distribution to these data, let's put a gamma distribution as an example. The aim of this is to obtain a kernel that describes the movement of this population of individuals (distance in the X axis, density or another good informative parameter at Y axis). However, it seems obvious that when fitting this kernel, as we have data biased towards shorter distances, we will also obtain a kernel biased towards shorter distances. Luckily, we have some additional data that are informing us about how this bias could be. For example, we know  with some uncertainty  that in the first 50 km we will see around 70% of all arrivals. Instead, in the next 50km we will see 60%, on the third 50km we will see again around 65% and in the fourth, fifth and sixth next 50km we will see just 20% of all arrivals.
Then my question is: is there a way to fit a statistical distribution to the data (I put gamma as an example) accounting for this bias and correcting by this explicit "observation chance" information? Is there a mathematical/statistical solution to this?
I am basically working in R, so any advice about potential code or packages that might be useful will be more than welcome. Of course some mathematical description will be great as well. I set gamma as an example but if there is any other distribution with which this is more intuitive or easier to do, I would be happy to know too.
Thank you in advance for taking your time to read this. I wrote this quite descriptively, so please let me know if you need me to be more specific.
Thank you!
J.
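One direction worth exploring, sketched here under assumptions: reading the per-band percentages in the question as known detection probabilities p(d), the density of an observed distance is p(d) * f(d; shape, rate) / C, where f is the gamma density and C is the normalizing constant. Maximizing that likelihood corrects the fit for the bias. The names (p_band, negloglik) and the stand-in data are illustrative, not from the question:

```r
# Sketch: ML fit of a gamma distance kernel under distance-dependent detection.
p_band <- c(0.70, 0.60, 0.65, 0.20, 0.20, 0.20)  # detection prob. per 50-km band
breaks <- c(seq(0, 250, by = 50), Inf)           # last value assumed to hold beyond 250 km
p_detect <- function(d) p_band[findInterval(d, breaks)]

negloglik <- function(par, d) {
  shape <- exp(par[1]); rate <- exp(par[2])  # log scale keeps both positive
  # Exact normalizer: sum of band detection prob. times gamma mass in each band.
  C <- sum(p_band * diff(pgamma(breaks, shape, rate)))
  -sum(log(p_detect(d)) + dgamma(d, shape, rate, log = TRUE) - log(C))
}

# Hypothetical stand-in for the 70 observed arrival distances (km):
set.seed(7)
d_obs <- rgamma(70, shape = 2, rate = 0.03)
fit <- optim(c(log(2), log(0.02)), negloglik, d = d_obs)
exp(fit$par)  # bias-corrected shape and rate estimates
```

The same weighted-likelihood idea carries over to other distributions by swapping dgamma/pgamma for the corresponding density and CDF.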

Distributing my coin collection over 28 albums
I have a coin collection and I want to distribute that collection evenly (or as evenly as possible) over my albums. The coins are in 353 sheets and there are 28 albums, so that's about 12.6 sheets per album. There are 193 countries and not every country has the same number of sheets. Afghanistan has 1 sheet, as does Albania. Algeria has two sheets, Andorra and Angola both have 1 sheet, Argentina has 6 sheets and so on.
Basically it comes down to the following: I have a collection of 193 numbers (the number of sheets per country) and those numbers need to be split into 28 collections. The sum of each collection should be as close to the others as possible. There are two important limitations: the maximum sum of numbers for a collection is 15, and the order of the numbers cannot be changed. This is the collection:
{1,1,2,1,1,6,1,1,6,4,1,2,2,1,1,1,8,1,1,1,1,1,1,5,1,1,1,3,1,1,1,1,1,1,2,2,2,1,2,1,2,2,2,3,1,2,5,1,2,1,2,2,4,1,1,1,1,2,2,1,1,2,3,1,1,1,2,2,5,2,2,1,1,1,1,2,4,2,2,2,1,3,2,3,5,2,3,2,2,3,2,1,1,1,1,1,1,1,1,1,2,3,1,1,1,2,1,3,1,2,3,1,2,1,3,1,1,1,2,1,2,1,1,2,1,3,1,1,2,2,1,2,3,3,5,4,1,1,1,3,1,1,1,2,1,2,2,1,1,2,1,1,1,1,1,6,1,1,6,3,1,3,2,1,1,2,1,3,1,1,1,1,4,1,1,2,4,1,1,2,1,3,1,1,3,3,1,1,1,1,5,1,2}
If I were to do it by hand, it would result in something like this:
1. {1,1,2,1,1,6,1,1} = 14
2. {6,4,1,2} = 13
3. {2,1,1,1,8} = 13
4. {1,1,1,1,1,1,5,1,1} = 13
5. {1,3,1,1,1,1,1,1,2} = 12
6. {2,2,1,2,1,2,2} = 12
7. {2,3,1,2,5} = 13
8. {1,2,1,2,2,4} = 12
9. {1,1,1,1,2,2,1,1,2} = 12
10. {3,1,1,1,2,2,5} = 15
11. {2,2,1,1,1,1,2} = 10
12. {4,2,2,2,1,3} = 14
13. {2,3,5,2} = 12
14. {3,2,2,3,2} = 12
15. {1,1,1,1,1,1,1,1,1,2} = 11
16. {3,1,1,1,2,1,3,1} = 13
17. {2,3,1,2,1,3} = 12
18. {1,1,1,2,1,2,1,1,2} = 12
19. {1,3,1,1,2,2,1,2} = 13
20. {3,3,5} = 11
21. {4,1,1,1,3,1,1,1} = 13
22. {2,1,2,2,1,1,2,1,1} = 13
23. {1,1,1,6,1,1} = 11
24. {6,3,1,3} = 13
25. {2,1,1,2,1,3,1,1,1} = 13
26. {1,4,1,1,2,4} = 13
27. {1,1,2,1,3,1,1,3} = 13
28. {3,1,1,1,1,5,1,2} = 15
I really don't know how I should do this in code. Could someone point me in the right direction? Thanks a lot!
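One direction to explore, sketched here: since the order is fixed, this is close to the classic "linear partition" problem. Find the smallest per-album cap (at most the hard limit of 15) for which a greedy left-to-right fill needs no more than 28 albums. This bounds every album's sum but does not directly optimize evenness, so treat it as a starting point; the name groups_needed is illustrative:

```r
# The 193 sheet counts from the question (sum 353).
sheets <- c(
  1,1,2,1,1,6,1,1,6,4,1,2,2,1,1,1,8,1,1,1,1,1,1,5,1,1,1,3,1,1,1,1,1,1,2,2,2,1,2,1,
  2,2,2,3,1,2,5,1,2,1,2,2,4,1,1,1,1,2,2,1,1,2,3,1,1,1,2,2,5,2,2,1,1,1,1,2,4,2,2,2,
  1,3,2,3,5,2,3,2,2,3,2,1,1,1,1,1,1,1,1,1,2,3,1,1,1,2,1,3,1,2,3,1,2,1,3,1,1,1,2,1,
  2,1,1,2,1,3,1,1,2,2,1,2,3,3,5,4,1,1,1,3,1,1,1,2,1,2,2,1,1,2,1,1,1,1,1,6,1,1,6,3,
  1,3,2,1,1,2,1,3,1,1,1,1,4,1,1,2,4,1,1,2,1,3,1,1,3,3,1,1,1,1,5,1,2
)

# Greedy fill: how many contiguous groups are needed if no group may exceed cap?
groups_needed <- function(cap) {
  n_groups <- 1; s <- 0
  for (x in sheets) {
    if (s + x > cap) { n_groups <- n_groups + 1; s <- x } else s <- s + x
  }
  n_groups
}

# Smallest cap (never below the largest single entry) that fits into 28 albums.
caps <- max(sheets):15
best_cap <- caps[which(sapply(caps, groups_needed) <= 28)[1]]

# Recover the actual split at best_cap.
album <- integer(length(sheets)); g <- 1; s <- 0
for (i in seq_along(sheets)) {
  if (s + sheets[i] > best_cap) { g <- g + 1; s <- 0 }
  s <- s + sheets[i]
  album[i] <- g
}
tapply(sheets, album, sum)  # per-album sheet totals, each at most best_cap
```

If the greedy fill uses fewer than 28 groups, the remaining albums stay empty; sums can then be evened out by nudging boundaries by hand or by a dynamic-programming pass.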

Unevenly spaced numbers in specified interval
I would like to make an array of unevenly spaced numbers in a specified interval, such that the distribution has a smooth shape (see figure). I have a total depth and I want 200 layers. At the moment I specify the thickness of these layers manually, but is there a way or a function to do this? Is there, for instance, a function which lets you specify a quadratically spaced array of numbers in a specified interval (similar to how linspace works)?
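One way to get smoothly varying layer thicknesses without typing them manually is to transform an evenly spaced 0-to-1 sequence, e.g. quadratically, so layers are fine near the top and grow with depth. A sketch (total_depth is an assumed placeholder value):

```r
# Sketch: 200 layers over a total depth, thicknesses growing quadratically.
total_depth <- 100
n_layers    <- 200
u <- seq(0, 1, length.out = n_layers + 1)  # evenly spaced on [0, 1]
boundaries  <- total_depth * u^2           # quadratic stretching: fine near 0
thickness   <- diff(boundaries)            # 200 smoothly increasing thicknesses
```

Replacing u^2 with u^p for other exponents p, or with any monotone function of u, gives other smooth spacings with the same two endpoints.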

Estimate the baseline cumulative hazard with Breslow's method
I would like to estimate the survival function on a validation data set using the beta coefficients estimated on the training data set. The survival function is written as follows: S(t | x) = S0(t)^exp(x'beta).
Therefore, the first step is to estimate the baseline survival function: S0(t) = exp(-H0(t)).
Thus, I need to estimate the baseline cumulative hazard function on the training dataset using the Breslow method: H0(t) = sum over event times t_i <= t of d_i / sum_{j in R(t_i)} exp(x_j'beta).
My question is: what option/formula should I use with the survfit function to estimate it? I have tried Surv(time, status) ~ 1 without covariates and Surv(time, status) ~ age+sex+ph.ecog+ph.karno+pat.karno with covariates. I also tried survfit with Surv(...) and with coxph(Surv(...)); these give me different results, and I'd be interested to know which is the right formulation.

data(cancer, package="survival")
cox.mel <- coxph(Surv(time, status) ~ 1, data = cancer)
fit1 <- survfit(cox.mel, type='breslow', centered=T)
fit2 <- survfit(cox.mel, type='breslow', centered=F)
fit3 <- survfit(Surv(time, status) ~ 1, data = cancer, stype = 2, se.fit = TRUE)
fit4 <- survfit(Surv(time, status) ~ 1, data = cancer, ctype = 1, se.fit = TRUE)
cox.mel2 <- coxph(Surv(time, status) ~ age+sex+ph.ecog+ph.karno+pat.karno, data = cancer)
fit1. <- survfit(cox.mel2, type='breslow', centered=T)
fit3. <- survfit(Surv(time, status) ~ age+sex+ph.ecog+ph.karno+pat.karno, data = cancer, stype = 2, se.fit = TRUE)
fit4. <- survfit(Surv(time, status) ~ age+sex+ph.ecog+ph.karno+pat.karno, data = cancer, ctype = 1, se.fit = TRUE)
cbind(fit1$cumhaz, fit2$cumhaz, fit3$cumhaz, fit4$cumhaz)
cbind(fit1.$cumhaz, fit3.$cumhaz, fit4.$cumhaz)
Should all covariates be considered when estimating the baseline cumulative hazard function?
Thank you!

hypothesis test in poisson model
library(MASS)
library(carData)
library(car)
library(hnp)
setwd("C:/Users/User/Downloads")
d = read.csv("Preg3.csv")
M1 = glm(reclamos ~ edad + genero + calificacion + offset(log(tiempo)),
         data = d, family = poisson(link = "log"))
summary(M1)
exp(Confint(M1))
hnp(M1, halfnormal = FALSE)

Call:
glm(formula = reclamos ~ edad + genero + calificacion + offset(log(tiempo)),
    family = poisson(link = "log"), data = d)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8026  -0.4567  -0.3543  -0.2343   3.3079

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    1.77196    0.32611   5.434 5.52e-08 ***
edad2635       0.14705    0.31513   0.467    0.641
edad3645       0.11517    0.32098   0.359    0.720
edad4655       0.05929    0.35285   0.168    0.867
edad5665       0.46454    0.39113   1.188    0.235
edadmás de 66  0.63277    0.77360   0.818    0.413
generoM        0.17525    0.15511   1.130    0.259
calificacion   0.15049    0.02968   5.070 3.97e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1514.5 on 3841 degrees of freedom
Residual deviance: 1481.6 on 3834 degrees of freedom
AIC: 2081.8

Number of Fisher Scoring iterations: 6
I have model 1 (M1) and I want to ask the following question:
How can I perform a hypothesis test of whether the effect of moving up a category in the driver classification is the same regardless of which category you start in (i.e. the effect of moving from category 0 to 1, from 1 to 2, ..., from 4 to 5 is the same)?
PS: my category variable takes values from 1 to 5.
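Since M1 enters calificacion as a single numeric term, it already assumes one common log-rate increment per category step; the natural test compares it against a model where calificacion is a factor (one free effect per level) via a likelihood-ratio test. A sketch, using a small synthetic stand-in for the data frame d (the real Preg3.csv is not available here):

```r
# Synthetic stand-in data with the same structure as the question's model.
set.seed(1)
n <- 500
d <- data.frame(
  edad         = factor(sample(c("18-25", "26-35", "36-45"), n, replace = TRUE)),
  genero       = factor(sample(c("F", "M"), n, replace = TRUE)),
  calificacion = sample(1:5, n, replace = TRUE),
  tiempo       = runif(n, 0.5, 2)
)
d$reclamos <- rpois(n, lambda = d$tiempo * exp(-2 + 0.15 * d$calificacion))

# M_lin: one common effect per one-category step (numeric calificacion).
M_lin <- glm(reclamos ~ edad + genero + calificacion + offset(log(tiempo)),
             data = d, family = poisson(link = "log"))
# M_fac: a free effect for each category level.
M_fac <- glm(reclamos ~ edad + genero + factor(calificacion) + offset(log(tiempo)),
             data = d, family = poisson(link = "log"))
an <- anova(M_lin, M_fac, test = "Chisq")
an  # a small p-value would indicate the step effects differ across categories
```

The same comparison run on the real d answers the question directly; failing to reject means the data are consistent with equal step effects.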

Why does gamlss give incorrect estimates of exgaussian distribution parameters?
From the gamlss.dist page for exGAUS:

The exGaussian distribution is often used by psychologists to model response time (RT). It is defined by adding two random variables, one from a normal distribution and the other from an exponential. The parameters mu and sigma are the mean and standard deviation of the normal distribution variable, while the parameter nu is the mean of the exponential variable.

Here is how we're supposed to estimate the parameters:

library(gamlss)
y <- rexGAUS(100, mu = 300, nu = 100, sigma = 35)
m1 <- gamlss(y ~ 1, family = exGAUS)
m1
Unfortunately the estimates are way off:
Family: c("exGAUS", "exGaussian")
Fitting method: RS()

Call: gamlss(formula = y ~ 1, family = exGAUS)

Mu Coefficients:
(Intercept)
      302.9
Sigma Coefficients:
(Intercept)
      3.496
Nu Coefficients:
(Intercept)
       4.63
A package that has disappeared from CRAN, retimes, can still be installed from https://cran.r-project.org/src/contrib/Archive/retimes/retimes_0.12.tar.gz. It has a function mexgauss:

library(retimes)
mexgauss(y)
gives:
       mu     sigma       tau
319.42880  55.51562  85.94403
which is closer.
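One thing worth checking before concluding the gamlss estimates are wrong: gamlss reports coefficients on each parameter's link scale, and for exGAUS the default links for sigma and nu are, to my understanding, log links (mu is identity). Exponentiating the printed intercepts gives values close to the simulation truth:

```r
# The sigma and nu intercepts above are on the log scale under the default links.
exp(3.496)  # about 33, close to the true sigma = 35
exp(4.63)   # about 102.5, close to the true nu = 100
```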