One-tailed and Two-tailed tests
I am working on a project where I need to apply an F-test and a t-test to check for a significant difference. I am confused about the one-tailed and two-tailed results in Excel. If the p-values for both the one-tailed and the two-tailed test are below the alpha level of 0.05, which one should I report: one-tailed or two-tailed? Please help me understand this.
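To make the relationship between the two p-values concrete, here is a minimal sketch. It assumes a symmetric test statistic and, to stay within the standard library, uses a normal (z) approximation rather than the t distribution that Excel's t-test uses; the function name `tail_p_values` and the statistic 2.1 are made up for illustration. For a symmetric distribution the two-tailed p-value is twice the one-tailed one, so the two only differ in which hypothesis they answer: a directional one (one-tailed) or "any difference" (two-tailed).

```python
from statistics import NormalDist

def tail_p_values(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic.

    Assumption: normal approximation; Excel's t-test uses the t distribution,
    but the factor-of-two relationship is the same for any symmetric one.
    """
    one_tailed = 1.0 - NormalDist().cdf(abs(z))  # area in one tail only
    two_tailed = 2.0 * one_tailed                # area in both tails
    return one_tailed, two_tailed

# Hypothetical test statistic, chosen only for illustration.
one_p, two_p = tail_p_values(2.1)
print(f"one-tailed p = {one_p:.4f}, two-tailed p = {two_p:.4f}")
```

Which value to report is usually decided by the hypothesis stated before looking at the data, not by which p-value is smaller.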
See also questions close to this topic

Pivot Table by Category
I am struggling to count each ID once for every category it belongs to.
For example, my Excel sheet currently contains:
**ID  category**
apple  a/b/c
banana  c
orange  a/c/d
How can I build a pivot table that shows the following?
**category  id (by count)**
a  2
b  1
c  3
d  1
Currently, my pivot table shows this instead:
**category  id (by count)**
a/b/c  1
a/c/d  1
c  1
I have attached my Excel file for review. A solution using a macro, VBA code, or a formula would be really helpful!
Thank you very much for your help.
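The reshaping step the question needs (split each slash-separated cell, then count IDs per individual category) can be sketched outside Excel as follows. This is an illustration in Python, not the VBA or formula the asker requested; the function name `count_ids_per_category` is hypothetical and the rows are copied from the example above.

```python
from collections import Counter

# Sample rows copied from the question: (ID, slash-separated categories).
rows = [
    ("apple", "a/b/c"),
    ("banana", "c"),
    ("orange", "a/c/d"),
]

def count_ids_per_category(rows, sep="/"):
    """Count how many IDs list each individual category."""
    counts = Counter()
    for _id, categories in rows:
        # Use a set so an ID is counted once per category even if it repeats.
        for category in {part.strip() for part in categories.split(sep)}:
            if category:
                counts[category] += 1
    return counts

category_counts = count_ids_per_category(rows)
for category in sorted(category_counts):
    print(category, category_counts[category])
```

The same idea applies inside Excel: the data has to be unpivoted to one row per ID and category before a pivot table can count by single category.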

Formula that looks for values in two columns in order and calculates the average in the third column
I have three columns: A (Country), B (Status), and C (Transit time). Below is an example:
Country  Status  Transit time
Slovakia  Transit  2
Hong Kong  Delivered  1
New Zealand  Transit  2
Barbados  Transit  2
Peru  Transit  2
Ecuador  Transit  2
Greece  Transit  2
New Zealand  Transit  4
I am looking for two formulas. The first would match a country in column A and return the average of the corresponding values in column C. In the example above, the average transit time for New Zealand is 3 days. The second would return the number of shipments whose transit time exceeded that average. For example, the average transit time to New Zealand is 3 days, and I also want to know how many shipments took longer than 3 days (in this example, 1 shipment).
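The two calculations can be sketched in Python as a conditional average followed by a conditional count. The function names are hypothetical and the data is the example table from the question. In Excel itself, the built-in `AVERAGEIF` and `COUNTIFS` functions are the likely counterparts, though the exact cell ranges depend on the sheet layout.

```python
from statistics import mean

# Example data from the question: (country, status, transit time in days).
shipments = [
    ("Slovakia", "Transit", 2),
    ("Hong Kong", "Delivered", 1),
    ("New Zealand", "Transit", 2),
    ("Barbados", "Transit", 2),
    ("Peru", "Transit", 2),
    ("Ecuador", "Transit", 2),
    ("Greece", "Transit", 2),
    ("New Zealand", "Transit", 4),
]

def average_transit_time(shipments, country):
    """Average transit time over all rows matching the country."""
    times = [days for c, _status, days in shipments if c == country]
    return mean(times)

def count_above_average(shipments, country):
    """Number of the country's shipments slower than its own average."""
    avg = average_transit_time(shipments, country)
    return sum(1 for c, _status, days in shipments if c == country and days > avg)

nz_average = average_transit_time(shipments, "New Zealand")
nz_above = count_above_average(shipments, "New Zealand")
print(nz_average, nz_above)
```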
How to transform sentence in interval using Excel?

A/B testing ideas: Conversion rate comparison?
I am looking for ways to run an A/B test comparing conversion rates. For example:
Consider an e-commerce website with two landing pages:
LP A: 5000 visitors, 10 conversions
LP B: 6000 visitors, 15 conversions
I want to test whether the conversion rate for LP A differs significantly from the conversion rate for LP B. I have tried a t-test of proportions, but I am looking for something else. A chi-square test is also an option, but are there better approaches?
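For reference, the chi-square option mentioned in the question can be sketched with the standard library alone. This is a Pearson chi-square test on the 2x2 table of converted versus not converted, without continuity correction; the function name `chi_square_2x2` is hypothetical, and the p-value uses the identity that a chi-square variable with 1 degree of freedom has survival function erfc(sqrt(x / 2)).

```python
import math

def chi_square_2x2(conv_a, visitors_a, conv_b, visitors_b):
    """Pearson chi-square test (no continuity correction) for two proportions."""
    a, b = conv_a, visitors_a - conv_a   # page A: converted, not converted
    c, d = conv_b, visitors_b - conv_b   # page B: converted, not converted
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Figures from the question.
chi2, p_value = chi_square_2x2(10, 5000, 15, 6000)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}")
```

With these counts the statistic is the square of the two-proportion z statistic, so the two tests give the same two-sided p-value; an exact test may be preferable when conversion counts are very small.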

Implementing pythonic statistical functions on spark dataframes
I have very large datasets in Spark DataFrames that are distributed across the nodes. I can compute simple statistics such as mean, stdev, skewness, and kurtosis using the Spark library pyspark.sql.functions.
If I want to use more advanced statistical tests such as Jarque-Bera (JB) or Shapiro-Wilk (SW), I use Python libraries like scipy, since the standard Apache PySpark libraries don't have them. But to do that, I have to convert the Spark DataFrame to pandas, which means forcing the data onto the master node, like so:
    import scipy.stats as stats
    pandas_df = spark_df.toPandas()
    JBtest = stats.jarque_bera(pandas_df)
    SWtest = stats.shapiro(pandas_df)
I have multiple features, and each feature ID corresponds to a dataset on which I want to compute the test statistic.
My question is:
Is there a way to apply these pythonic functions on a spark dataframe while the data is still distributed across the nodes, or do I need to create my own JB/SW test statistic functions in spark?
Thank you for any valuable insight
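One possible direction for the Jarque-Bera case: the JB statistic depends only on the sample size, skewness, and kurtosis, which the question says are already available as distributed aggregates. The sketch below is plain Python that takes those three numbers as inputs (for example, collected per feature ID after a Spark aggregation). The function name is hypothetical, and it assumes the kurtosis value is excess kurtosis (0 for a normal distribution), which should be verified for the Spark version in use. It does not cover Shapiro-Wilk, which is not a simple function of moments.

```python
import math

def jarque_bera_from_moments(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic and asymptotic p-value from summary moments.

    Assumption: `excess_kurtosis` is kurtosis minus 3.
    """
    jb = n / 6.0 * (skewness ** 2 + (excess_kurtosis ** 2) / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical aggregates for one feature, for illustration only.
jb, p_value = jarque_bera_from_moments(n=600, skewness=0.1, excess_kurtosis=0.2)
print(f"JB = {jb:.3f}, p = {p_value:.4f}")
```

Only three scalars per feature need to leave the cluster in this approach, instead of the full dataset.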

Calculating customers lifecycling
I am trying to calculate the level of guaranteed hourly earnings I can offer drivers during the 36 weekly hours with the highest demand without losing money, and how many extra hours we need to capture missed demand.
Do you have any suggestions for an approach to the problem?
I am sharing the data pattern below, along with the data definitions.
o Date – date + hour for which the row of data is presented
o Active drivers – number of active drivers (any level of activity) available during time period
o Online (h) – total supply hours that were available during time period
o Has booking (h) – total hours during which drivers had a client booking (any state)
o Waiting for booking (h) – total hours which drivers spent waiting for booking
o Busy (h) – total hours during which drivers were not available to take orders
o Hours per active driver – average number of hours each driver was online during time period
o Rides per online hour – aka RPH – avg. finished trips per online hour during period
o Finished Rides – number of finished trips during period
o People saw 0 cars (unique) – number of users who did not see a car.
o People saw +1 cars (unique) – number of users who saw a car.
o Coverage Ratio (unique) – % of users who saw the car.
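As a starting point for working with these fields, the derived columns can be recomputed from the raw ones. The sketch below assumes the relationships implied by the definitions above: coverage ratio as users who saw a car divided by all users, hours per active driver as online hours divided by active drivers, and RPH as finished rides divided by online hours. The function names and sample numbers are made up for illustration and do not come from the asker's data.

```python
def coverage_ratio(saw_zero_cars, saw_one_plus_cars):
    """Share of unique users who saw at least one car (assumed definition)."""
    total_users = saw_zero_cars + saw_one_plus_cars
    if total_users == 0:
        return None  # no demand observed in this hour
    return saw_one_plus_cars / total_users

def hours_per_active_driver(online_hours, active_drivers):
    """Average online hours per driver in the period (assumed definition)."""
    if active_drivers == 0:
        return None
    return online_hours / active_drivers

def rides_per_online_hour(finished_rides, online_hours):
    """Finished trips per online supply hour, i.e. RPH (assumed definition)."""
    if online_hours == 0:
        return None
    return finished_rides / online_hours

# Hypothetical single-hour row, for illustration only.
print(coverage_ratio(20, 80))             # share of users who saw a car
print(hours_per_active_driver(45.0, 60))  # online hours per active driver
print(rides_per_online_hour(90, 45.0))    # RPH
```

Hours with a low coverage ratio are the ones where missed demand is likely, which is where extra supply hours would matter most.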