if (freq) x$counts else x$density length > 1 and only the first element will be used
For my thesis I have to calculate the number of workers at risk of substitution by machines. I have calculated the probability of substitution (X) and the number of employees at risk (Y) for each occupation category. I have a dataset like this:
X Y
1 0.1300 0
2 0.1000 0
3 0.0841 1513
4 0.0221 287
5 0.1175 3641
....
700 0.9875 4000
I tried to plot a histogram with this command:
hist(dataset1$X,dataset1$Y,xlim=c(0,1),ylim=c(0,30000),breaks=100,main="Distribution",xlab="Probability",ylab="Number of employee")
But I get this error:
In if (freq) x$counts else x$density
length > 1 and only the first element will be used
Can someone tell me what is the problem and write me the right command? Thank you!
1 answer

It is worth pointing out that the message displayed is a Warning message, and should not prevent the results being plotted. However, it does indicate there are some issues with the data.
Without the full dataset, it is not entirely clear what the problem is. I believe it is caused by the data not being in the correct format, with two potential issues. Firstly, some rows have a Y value of 0, and these won't contribute anything to the histogram. Secondly, the observations appear to be inconsistently spaced.
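Separately from the data format, it may help to check how R matches the arguments of the original call. In hist.default() the formal arguments start x, breaks, freq, so with breaks = 100 supplied by name, the unnamed dataset1$Y appears to fall through to freq, which would explain a message mentioning if (freq). The sketch below only inspects the call with match.call() and does not draw anything; the call is quoted, so dataset1 does not need to exist.

```r
# Look up the default histogram method that hist() dispatches to
# for a numeric vector.
hist_default <- getS3method("hist", "default")

# Quote the call from the question (shortened to the relevant arguments)
# and ask R how it would assign each argument to a formal parameter.
matched <- match.call(
  definition = hist_default,
  call = quote(hist(dataset1$X, dataset1$Y, breaks = 100))
)

# Print the call with every argument labelled by the parameter it matched.
print(matched)

# The names of the matched arguments, e.g. to check whether "freq" is used.
matched_names <- names(matched)
```

If the printed call shows freq = dataset1$Y, then hist() is being asked to use a whole column as a single TRUE/FALSE switch, and Y is never used as a weight. hist() has no argument for weights, which is why the data need to be reshaped first.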
Histograms are best built from one of two kinds of dataset:
 A dataframe which has been aggregated into consistently sized bins.
 A list of X values, one for each observation in the data.
I prefer the second technique. As originally shown here, the expandRows() function in the package splitstackshape can be used to repeat each row of the dataframe by its number of observations:
# Example data: X is the probability, Y the number of observations
set.seed(123)
dataset1 <- data.frame(X = runif(900, 0, 1), Y = runif(900, 0, 1000))
# Repeat each row Y times, then plot the histogram of X
library(splitstackshape)
dataset2 <- expandRows(dataset1, "Y")
hist(dataset2$X, xlim = c(0, 1))
# For the first technique, cut() assigns each X to one of 100 equal-width bins
dataset1$bins <- cut(dataset1$X, breaks = seq(0, 1, 0.01), labels = FALSE)
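For comparison, here is a minimal base R sketch of both techniques that needs no extra package. It is an illustration under assumptions, not the method from the linked answer: the data are made up, Y is rounded to whole numbers so it can serve as a count, and the bin width of 0.05 and the names edges, employees_per_bin and expanded_x are arbitrary choices.

```r
set.seed(123)
# Made-up data: X is a probability, Y a whole-number count of employees.
dataset1 <- data.frame(X = runif(900, 0, 1),
                       Y = round(runif(900, 0, 1000)))

# Shared bin edges so both techniques use the same intervals.
edges <- seq(0, 1, by = 0.05)

# Technique 1: sum Y within fixed-width bins of X, then draw the totals as bars.
dataset1$bin <- cut(dataset1$X, breaks = edges, include.lowest = TRUE)
employees_per_bin <- tapply(dataset1$Y, dataset1$bin, sum)
employees_per_bin[is.na(employees_per_bin)] <- 0  # bins with no rows
barplot(employees_per_bin, space = 0,
        main = "Distribution", xlab = "Probability",
        ylab = "Number of employees")

# Technique 2: repeat each X value Y times and pass the result to hist().
expanded_x <- rep(dataset1$X, times = dataset1$Y)
h <- hist(expanded_x, breaks = edges, xlim = c(0, 1),
          main = "Distribution", xlab = "Probability",
          ylab = "Number of employees")
```

Both plots should show the same bar heights, because each bar is the total of Y for the rows whose X falls in that bin. The first technique avoids building a very long vector, which may matter when the counts are large.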