Analysing simulation results of event sequences with branches
I have a problem where a sequence of events can be
A1 > B1 > C1 > D1
or
A1 > B1 > C2 > D2
or
A1 > B1 > C2 > D3
or
A2 > B2 > C3 > D4
Note that there is more than one root starting point too. Each stage also has some other properties attached to it. So I'd want to ask:
1. Find all stages (regardless of A, B, C or D) where property 1 = some value and where, somewhere up the parent chain, property 2 = some value.
2. Work out the probability of getting to each stage, given that all sequence branches are of equal probability. So the probability of getting to D1 is 1/2 (A) * 1/2 (C), whereas for D2 or D3 it is 1/2 (A) * 1/2 (C) * 1/2 (D).
3. Conditional probability: given that B1 has happened, what's the chance of D3?
What's the best way or technique to store, analyse and query data like this? What keywords, fields or technologies should I search for and read up on?
Note that I plan to generate somewhere between hundreds of thousands and millions of sample event sequences.
I've had a look at recursive CTEs in an RDBMS. That solves problem 1, but 2 and 3 in combination seem more difficult. Would a graph database like Neo4j solve the problem better?
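Questions 2 and 3 can be sketched in memory with a prefix tree (trie) before committing to a particular database. The sketch below is only an illustration under stated assumptions: stage names are unique, every stage has exactly one parent, and each child of a node is equally likely. The names ROOT, build_tree, reach_probabilities and conditional are hypothetical helpers, not part of any library.

```python
from collections import defaultdict
from fractions import Fraction

ROOT = "<root>"  # hypothetical virtual root joining the separate starting points

SEQUENCES = [
    ["A1", "B1", "C1", "D1"],
    ["A1", "B1", "C2", "D2"],
    ["A1", "B1", "C2", "D3"],
    ["A2", "B2", "C3", "D4"],
]


def build_tree(sequences):
    """Return (children, parent) maps for a prefix tree of the sequences."""
    children = defaultdict(set)
    parent = {}
    for seq in sequences:
        prev = ROOT
        for stage in seq:
            children[prev].add(stage)
            parent[stage] = prev  # assumes one parent per stage
            prev = stage
    return children, parent


def reach_probabilities(children):
    """Probability of reaching each stage if siblings are equally likely."""
    prob = {ROOT: Fraction(1)}
    stack = [ROOT]
    while stack:
        node = stack.pop()
        kids = children.get(node, ())
        for kid in kids:
            prob[kid] = prob[node] / len(kids)
            stack.append(kid)
    return prob


def is_ancestor_or_self(parent, ancestor, node):
    """Walk up the parent chain from node looking for ancestor."""
    while True:
        if node == ancestor:
            return True
        if node not in parent:
            return False
        node = parent[node]


def conditional(prob, parent, target, given):
    """P(target | given) for two stages of the same tree."""
    if is_ancestor_or_self(parent, given, target):
        return prob[target] / prob[given]
    if is_ancestor_or_self(parent, target, given):
        return Fraction(1)  # given implies target already happened
    return Fraction(0)  # the stages lie on different branches


children, parent = build_tree(SEQUENCES)
prob = reach_probabilities(children)
print(prob["D1"], prob["D2"], prob["D3"])      # 1/4 1/8 1/8
print(conditional(prob, parent, "D3", "B1"))   # 1/4
```

For the four example sequences this gives 1/4 for D1 and 1/8 each for D2 and D3. The same parent map could also serve question 1 by walking up the chain and checking stage properties, which is what a recursive CTE or a graph traversal does.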
See also questions close to this topic

How to aggregate statistics from all published apps in the Play Store
Sorry in advance for my vagueness; I'm very new to this.
I need to learn how to aggregate the statistics of all the apps published by my company.
More specifically, I need for example to get the number of downloads, day after day, month after month, etc, for all the apps that the company has published. Google already provides some statistics from the Google Play Console. But these are systematically given app by app. Nothing more general. And there are several views that my company requires. I need to create "custom views" according to the needs of the company.
Eventually, it would be ideal to have a dynamic webpage that displays only the desired information. But for now, I would already be very happy to have a new .csv file that gathers the required information from the .csv files provided by Google.
So far, I have tried to follow the instructions provided by Google. Starting from this page, I created a Google Cloud Storage account, which seems to be a HUGE thing that completely lost me (and it seems to cost money). I also tried to learn how to use gsutil which, as far as I understand, is a console interface to Google Cloud Storage. These tools seem quite complex to learn. So before I dive in, I want to be sure these are the right tools.
I would be very glad if I could get some hints on how to proceed. And of course, I would be glad to give any information that could be useful.

Python Mann-Whitney confidence interval
I have two datasets (Pandas Series), ds1 and ds2, for which I want to calculate a 95% confidence interval for the difference in means (if normal) or medians (if non-normal).
For the difference in means, I calculate the t-test statistic and CI like this:
    import statsmodels.api as sm
    tstat, p_value, dof = sm.stats.ttest_ind(ds1, ds2)
    CI = sm.stats.CompareMeans.from_data(ds1, ds2).tconfint_diff()
For the median, I do:
    from scipy.stats import mannwhitneyu
    U_stat, p_value = mannwhitneyu(ds1, ds2, use_continuity=True, alternative="two-sided")
How do I calculate the CI for the difference in medians?
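One possible approach, sketched here as an assumption rather than as something the Mann-Whitney test itself provides, is a percentile bootstrap of the difference in medians. The function name bootstrap_median_diff_ci and its parameters are hypothetical, and only the standard library is used so the sketch runs on plain lists as well as on the values of a Series.

```python
import random
from statistics import median


def bootstrap_median_diff_ci(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap CI for median(a) - median(b)."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    a, b = list(a), list(b)
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(median(resampled_a) - median(resampled_b))
    diffs.sort()
    alpha = 1.0 - level
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Example with made-up illustrative data (not from the question).
ds1 = list(range(10, 31))
ds2 = list(range(0, 21))
print(bootstrap_median_diff_ci(ds1, ds2))
```

The percentile bootstrap makes no normality assumption, but with small samples the interval can be coarse because the median only takes values present in the data.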

Format data to run ANOVA in R
I am trying to run a 3-way ANOVA in R, but my values for each variable are in one column and not separated into rows. Currently, my data frame looks something like this:
Season  Site  Location  Replicate  Lengths
Jan_16  MI    Adj       1.00       ,
Jan_16  MI    Adj       2.00       ,
Jan_16  MI    Adj       3.00       ,
Jan_16  MI    Away      1.00       3,4,
Jan_16  MI    Away      2.00       ,
Jan_16  MI    Away      3.00       ,
Jan_16  MP    Adj       1.00       4,5,6,5,4,5,4,4,4,4,5,4,6,4,
Jan_16  MP    Adj       2.00       4,4,3,3,5,4,3,4,5,3,4,3,4,3,4,6,
Jan_16  MP    Adj       3.00       4,6,5,5,4,
Jan_16  MP    Away      1.00       ,4,4,10,4,5,4,6,5,5,
Jan_16  MP    Away      2.00       3,4,4,4,5,5,4,5,
Jan_16  MP    Away      3.00       4,4,13,4,
Lengths is the response variable that I wish to run the ANOVA on. How would I do this? A lone "," means there is no data.
EDIT
I have tried separate_rows:
    library(tidyr)
    separate_rows(data.frame, Season:Replicate, Lengths, convert=numeric)
    #Error: All nested columns must have the same number of elements
The Lengths cells hold different numbers of values, so is there a way to unnest this?

AnyLogic "variable cannot be resolved or is not a field"
I'm currently building a three-stage simulation model consisting of a supplier, a manufacturer and a customer in AnyLogic. I've introduced the variables C, R and Z for the parameters Costs, Revenue and Backlog respectively.
Sadly, running the model leads to the error messages "source cannot be resolved" and "mean cannot be resolved".
Clicking on the error message opens my bar chart with "source" or "mean" highlighted.
This might be simple, but any help is appreciated.
Kind regards

Exception with Agent.setspeed() in AnyLogic
I have a simple AnyLogic model for pedestrian movement from a start line towards a target line.
I want to change the speed of the moving agents when a certain condition is met.
I test the condition using events.
If the number of agents in a specific area exceeds 20, I change the speed of the agents in the previous area using agent.setspeed().
When I run the simulation and the event is triggered, I get this exception:

Keypress simulation in React
I'll try to describe my problem. When the user refreshes the page, I need to automatically simulate a keypress event, as if they had pressed "enter". I tried this tutorial https://codeexamples.net/en/q/91a01 and a lot of others, but none of them seem to work for me.
I assume that this keypress simulation has to go in the componentDidMount() function, but maybe I am wrong?
Is it possible to do it without jQuery?

Calculating importance of independent variable in explaining variance of dependent variable in linear regression
I am working on a Media Mix Modeling (MMM) project where I have to build a linear model for predicting traffic, with various spends as input variables. The linear model equation I have obtained is:
Traffic = 1918 + 0.08*TV_Spend + 0.01*Print_Spend + 0.05*Display_spend
I want to calculate two things which I don't know how to do:
1. How much does each variable contribute to explaining the variance of traffic?
2. What percentage of total traffic is due to each independent variable?

How do I do a regression analysis with panel data in R?
So I'm a noob at R, it's been more than a year since I last used it, and I seem to have forgotten a lot... :(
I have panel data covering different countries, with observations from 2005, 2010 and 2015, that looks like this:
  Location Year Health_Spending Total NCD Deaths_male Total NCD Deaths_female
1 CAN      2005 3282.454        101.4                 102.7
2 CAN      2010 4225.189        105.5                 107.3
3 CAN      2015 4632.837        109.2                 110.2
4 ESP      2005 2126.553        179.9                 170.4
5 ESP      2010 2882.912        180.6                 170.6
6 ESP      2015 3175.457        183.1                 180.8
I'm trying to run a regression analysis with Health_Spending as Y, and Total NCD Deaths_male & Total NCD Deaths_female as X1 and X2.
I've been looking around and it seems like the plm package is used a lot to analyze panel data in R, but I'm having trouble figuring out how to use it.
Can a kind soul help me out and guide me on what I need to do?
(here's a dput version of my data just in case)
structure(list(Location = c("CAN", "CAN", "CAN", "ESP", "ESP", "ESP", "GBR", "GBR", "GBR", "ISR", "ISR", "ISR", "JPN", "JPN", "JPN", "KOR", "KOR", "KOR", "MEX", "MEX", "MEX", "NLD", "NLD", "NLD", "NOR", "NOR", "NOR", "POL", "POL", "POL", "TUR", "TUR", "TUR", "USA", "USA", "USA"), Year = c(2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L, 2005L, 2010L, 2015L), Health_Spending = c(3282.454, 4225.189, 4632.837, 2126.553, 2882.912, 3175.457, 2331.136, 3040.114, 4071.806, 1768.952, 2032.725, 2646.915, 2463.725, 3205.216, 4428.349, 1183.438, 1895.699, 2481.587, 730.816, 911.351, 1037.424, 3454.707, 4633.738, 5148.399, 3980.768, 5162.669, 6239.435, 806.974, 1352.424, 1687.009, 582.888, 871.677, 1028.911, 6443.02, 7939.798, 9491.4 ), `Total NCD Deaths_male` = c("101.4", "105.5", "109.2", "179.9", "180.6", "183.1", "245.8", "242.0", "249.0", "16.7", "16.8", "18.0", "460.3", "503.7", "543.2", "105.7", "110.2", "118.3", "194.7", "230.7", "257.5", "58.9", "58.6", "63.2", "17.4", "17.5", "17.1", "172.7", "175.1", "175.9", "185.3", "197.4", "211.8", "1024.9", "1061.6", "1159.5"), `Total NCD Deaths_female` = c("102.7", "107.3", "110.2", "170.4", "170.6", "180.8", "268.2", "259.0", "264.1", "17.5", "17.4", "18.7", "405.0", "458.9", "528.4", "92.9", "93.3", "102.2", "181.4", "214.2", "235.5", "62.1", "62.6", "67.7", "18.4", "18.8", "18.2", "163.1", "168.6", "174.6", "150.3", "162.6", "181.0", "1111.6", "1115.5", "1183.4")), .Names = c("Location", "Year", "Health_Spending", "Total NCD Deaths_male", "Total NCD Deaths_female" ), class = "data.frame", row.names = c(NA, 36L))

Average Case Analysis of Sequential Search with Geometric Probability Distribution
I know how to get the average running time under a uniform distribution. Say, for example, we have 6 array elements.
| 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
Above is the array with the uniform probability distribution of a search element being positioned in every subscript in the array.
So getting the average running time in a uniform distribution will be like the solution below:
T(n) = (1/6)*1 + (1/6)*2 + (1/6)*3 + (1/6)*4 + (1/6)*5 + (1/6)*6 = (1/6) * ( 1 + 2 + 3 + 4 + 5 + 6 ) = 3.5
or, expressed in terms of n:
T(n) = (1/n) * ((n(n+1))/2) = (n+1) / 2 = ϴ(n)
But what about the average-case number of key comparisons in sequential search under a geometric probability distribution?
Example:
Prob(target X is in the jth position) = 1/(2^(j+1)) where j = 0, 1, 2, 3, 4, 5, 6, ...
| 1/(2^(0+1)) | 1/(2^(1+1)) | 1/(2^(2+1)) | 1/(2^(3+1)) | 1/(2^(4+1)) | 1/(2^(5+1)) |
Then
T(j) = ((1/2)*1) + ((1/4)*2) + ((1/8)*3) + ((1/16)*4) + ((1/32)*5) + ((1/64)*6)
     = .5 + .25(2) + .125(3) + .0625(4) + .03125(5) + .015625(6)
     = .5 + .5 + .375 + .25 + .15625 + .09375
     = 1.875
I don't know how to express it in terms of j:
T(j) = ?
What is the upper bound O(j)? The lower bound Ω(j)? The tight bound ϴ(j)?
Any help or ideas will be very much appreciated.
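As a hedged sketch of where the sum above leads: writing n for the number of array positions (n is notation introduced here, not in the question) and counting only successful searches, as the worked sum does, the expected number of comparisons has a closed form.

```latex
T(n) \;=\; \sum_{j=0}^{n-1} \frac{j+1}{2^{\,j+1}}
     \;=\; \sum_{k=1}^{n} \frac{k}{2^{k}}
     \;=\; 2 - \frac{n+2}{2^{n}}
% Check against the worked example with n = 6:
% 2 - \frac{8}{64} = 1.875
```

Since the value stays below 2 for every n and approaches 2 as n grows, the average number of comparisons under these assumptions is bounded by a constant, that is, ϴ(1). Note that the listed probabilities sum to 1 - 1/2^n rather than 1, so the remaining mass (the target lying outside the n positions) is not accounted for in this sum.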