Hypothesis testing for two unequal-length vectors of proportions
I need some statistical advice. I have two vectors of proportions (all values between 0 and 1) of an item in a sample. For example, vector1 has 20 observations and vector2 has 28 observations. I want to test for statistical significance between these two vectors, i.e. to be able to tell whether the proportion of said item is higher in vector1 than in vector2 or vice versa.
Example data:
vector1
0.876
0.789
0.776
0.678
0.780
0.654
0.765
vector2
0.345
0.432
0.235
0.431
0.023
0.134
0.223
0.211
Please let me know if you have any suggestions.
Thanks in advance!
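For reference, one common way to compare two small independent samples like these is a rank-based two-sample test; below is a minimal sketch using SciPy's Mann-Whitney U test on the example data. The choice of test is only a suggestion, not something stated in the question:

```python
from scipy.stats import mannwhitneyu

# Example data from the question
vector1 = [0.876, 0.789, 0.776, 0.678, 0.780, 0.654, 0.765]
vector2 = [0.345, 0.432, 0.235, 0.431, 0.023, 0.134, 0.223, 0.211]

# Two-sided test: do the two samples come from the same distribution?
result = mannwhitneyu(vector1, vector2, alternative="two-sided")
print(result.statistic, result.pvalue)

# A one-sided alternative ("greater") asks whether vector1 tends to be larger
one_sided = mannwhitneyu(vector1, vector2, alternative="greater")
print(one_sided.pvalue)
```

If the proportions are themselves computed from known denominators, a test on the underlying counts (e.g. a proportions z-test) may be more appropriate than treating the proportions as raw values.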
See also questions close to this topic

Why does my neural network give the same prediction for every input?
I am trying to create a neural network that takes 294 inputs and predicts which of the inputs has the highest probability of being the output. I also wanted to regress to find out how much difference there is between the actual value and the predicted value, so I added two regression output nodes at the output layer. Before I added the regression outputs the model was predicting decently enough, but after the addition the model started to output the same value no matter what I do. Then I decided to check the weights, and I found something like this:
```
[[ 0.19589818 0.45867598 0.1103735 0.11739671 0.3524462 0.3615998 0.11838996]
 [0.37149632 0.29049385 0.27328718 0.39140654 0.22933161 0.07160628 0.33962536]
 [ 0.21745765 0.19408011 0.28868628 0.0097748 0.06756687 0.40600073 0.0485481 ]
 [0.4144268 0.4770614 0.1586262 0.06003821 0.01309896 0.47136605 0.41377842]
 [0.25865722 0.3038118 0.2767954 0.33988214 0.48508477 0.33661437 0.20484531]
 [ 0.4246924 0.4958439 0.2031511 0.4845667 0.18330884 0.1708759 0.28903925]
 [0.4602847 0.02263796 0.27997506 0.33072484 0.44759667 0.14221525 0.2714281 ]
 [0.3839649 0.13256657 0.03424132 0.36362755 0.4561025 0.12396967 0.15885079]
 [0.273561 0.09750211 0.4644209 0.4556396 0.3021226 0.26363683 0.43606043]
 [ 0.2392633 0.1741817 0.48888505 0.43252754 0.101964 0.02732563 0.28655064]
 [ 0.41151023 0.16941857 0.48709846 0.23205352 0.22945309 0.2136854]
 ...
 [0.01252615 0.19594312 0.26858175 0.07100904 0.16546512 0.11748069 0.36638904]]
```
Above are the weights for layer 294 before any update. Then, after some updates, the weights are:
```
weights for layer294:
[[[ 0.19589818 0.19589818 0.19589818 ... 0.19589818 0.19589818 0.19589818]
  [ 0.45867598 0.45867598 0.45867598 ... 0.45867598 0.45867598 0.45867598]
  [0.1103735 0.1103735 0.1103735 ... 0.1103735 0.1103735 0.1103735 ]
  ...
  [ 0.3524462 0.3524462 0.3524462 ... 0.3524462 0.3524462 0.3524462 ]
  [ 0.3615998 0.3615998 0.3615998 ... 0.3615998 0.3615998 0.3615998 ]
  [0.11838996 0.11838996 0.11838996 ... 0.11838996 0.11838996 0.11838996]]
 [[0.37149632 0.37149632 0.37149632 ... 0.37149632 0.37149632 0.37149632]
  [ 0.29049385 0.29049385 0.29049385 ... 0.29049385 0.29049385 0.29049385]
  [ 0.27328718 0.27328718 0.27328718 ... 0.27328718 0.27328718 0.27328718]
  ...
  [0.22933161 0.22933161 0.22933161 ... 0.22933161 0.22933161 0.22933161]
  [ 0.07160628 0.07160628 0.07160628 ... 0.07160628 0.07160628 0.07160628]
  [ 0.33962536 0.33962536 0.33962536 ... 0.33962536 0.33962536 0.33962536]]
 [[ 0.21745765 0.21745765 0.21745765 ... 0.21745765 0.21745765 0.21745765]
  [ 0.19408011 0.19408011 0.19408011 ... 0.19408011 0.19408011 0.19408011]
  [0.28868628 0.28868628 0.28868628 ... 0.28868628 0.28868628 0.28868628]
  ...
  [ 0.06756687 0.06756687 0.06756687 ... 0.06756687 0.06756687 0.06756687]
  [0.40600073 0.40600073 0.40600073 ... 0.40600073 0.40600073 0.40600073]
  [ 0.0485481 0.0485481 0.0485481 ... 0.0485481 0.0485481 0.0485481 ]]
 ...
 [ 0.36638904 0.36638904 0.36638904 ... 0.36638904 0.36638904 0.36638904]]]
```
The weights do not seem to change; rather, they grow in dimension. Is this how it is supposed to be? This is how I constructed my model:
```python
import warnings
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
from tensorflow.keras import Input
import tensorflow.keras.callbacks
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD, Adam
from keras.models import Model
from keras.layers import concatenate, Activation
from keras.layers.advanced_activations import ELU
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
from getWeights import GetWeights

def build(layer_str):
    # take the input layer structure and convert it into a list
    layers = layer_str.split("-")
    # convert the strings in the list to integers
    layers = list(map(int, layers))
    # let's build our model
    # we add the first layer and the input layer to our network
    inputs = Input(shape=(layers[0],))
    H_inputs = inputs
    # we add the hidden layers
    Hidden_list = []
    for (x, i) in enumerate(layers):
        if (x > 0 and x != (len(layers) - 1)):
            layer = Dense(i)(H_inputs)
            Hidden_list.append(ELU(alpha=1.0)(layer))
            H_inputs = Hidden_list[-1]
    # then add the final layer
    classifier = Dense(layers[-1], activation="sigmoid")(Hidden_list[-1])
    model = Model(inputs=inputs, outputs=classifier)
    return model

def split(data, label, split_ratio):
    train_list = []
    test_list = []
    for a in data:
        split = round(len(a) * (1 - split_ratio))
        train_list.append(a[:split])
        test_list.append(a[split:])
    for l in label:
        split = round(len(l) * (1 - split_ratio))
        train_list.append(l[:split])
        test_list.append(l[split:])
    return train_list, test_list

def train_eval(data, label, model, lr=0.01, epochs_in=100, batch_size_in=16):
    warnings.filterwarnings("ignore", category=FutureWarning)
    # split your data and labels into test and train data,
    # we usually use 25% of the total data for testing
    initial_learning_rate = lr
    # for merged model
    split_ratio = 0.25
    train_list, test_list = split(data, label, split_ratio)
    # extract labels
    trainY = train_list[-3:]
    del train_list[-3:]
    testY = test_list[-3:]
    del test_list[-3:]
    # training the network
    print("[INFO] Training the network....")
    decay_steps = 1000
    lr_decayed_fn = tf.keras.experimental.CosineDecay(initial_learning_rate, decay_steps)
    sgd = SGD(lr_decayed_fn, momentum=0.8)
    model.compile(loss=["categorical_crossentropy", "mean_squared_error", "mean_squared_error"],
                  optimizer=sgd, metrics=["accuracy"])
    checkpoint_filepath = 'checkpoint1'
    model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_filepath,
        save_weights_only=True,
        monitor='val_pred_accuracy',
        mode='max',
        save_best_only=True)
    gw = GetWeights()
    H = model.fit(train_list, trainY,
                  validation_data=(test_list, testY),
                  epochs=epochs_in, batch_size=batch_size_in,
                  callbacks=[model_checkpoint_callback, gw])
    # evaluate the network
    print("[INFO] Evaluating the network....")
    predictions = model.predict(test_list, batch_size=batch_size_in)
    return predictions

def Merge_model(layer, nbx, regress=False):
    model_list = []
    for i in range(nbx):
        model = build(layer)
        model_list.append(model)
    merged_layers = concatenate([tf.convert_to_tensor(model_list[i].output) for i in range(nbx)])
    x = Dense(nbx, activation="relu")(merged_layers)
    out = Dense(nbx, activation="softmax", name="pred")(x)
    if regress:
        adj1 = Dense(1, activation='linear', name="x")(x)
        adj2 = Dense(1, activation='linear', name="y")(x)
        merged_model = Model([model_list[i].input for i in range(nbx)], [out, adj1, adj2])
    else:
        merged_model = Model([model_list[i].input for i in range(nbx)], [out])
    return merged_model
```
This is how I implemented it:
```python
import pickle
import glob
import numpy as np

with open("dataframe.pkl", "rb") as vector_file:
    vect_df = pickle.load(vector_file)
input_list = [np.stack(vect_df[str(i)]) for i in range(294)]

# hyperparameters
nbx = 294
lr = 1e-8
epochs = 100
batch_size = 16

# input data
data = input_list
label_path = glob.glob("test_image/*.pkl")
label = lb.read_label_file(label_path)

# if regressing, uncomment the following
label1 = np.array([a[0] for a in label])
label2 = np.array([a[1] for a in label])
label3 = np.array([a[2] for a in label])
input_label = [label1, label2, label3]

model = nn.Merge_model("1771", nbx, regress=True)
plot_model(model, to_file='model.png', rankdir='LR')
prediction = nn.train_eval(data, input_label, model, lr, epochs, batch_size)
```
The plot for my neural network: https://drive.google.com/file/d/1w_Obek1fzyrUBRfXilEBD4LH5urP0kal/view?usp=sharing
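As an aside not taken from the question itself: one quick sanity check when weights appear frozen is the learning-rate scale. With an update of the form w ← w − lr·grad, a rate as small as 1e-8 changes the weights by amounts invisible at print precision. A minimal NumPy sketch with hypothetical gradients (not the model above):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(0.0, 0.5, size=(7,))   # weights on the same scale as the printout
grad = rng.normal(size=(7,))           # a hypothetical gradient

w_small = w - 1e-8 * grad              # step with a tiny learning rate
w_normal = w - 1e-2 * grad             # step with a more typical rate

print(np.max(np.abs(w_small - w)))     # change is on the order of 1e-8
print(np.max(np.abs(w_normal - w)))    # change is clearly visible
```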

IndentationError: unexpected indent in the freeCodeCamp Sea Level Predictor project
Please help me. I am new to this field and asking for the first time on this platform.
My code
```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress

def draw_plot():
    # Read data from file
    df = pd.read_csv("epa-sea-level.csv")

    # Create scatter plot
    xpoints = df["Years"]
    ypoints = df["CSIRO Adjusted Sea Level"]
    plt.scatter(xpoints, ypoints)

    # Create first line of best fit
    res = linregress(xpoints, ypoints)
    print(res)
    x_pred = pd.Series([i for i in range(1880, 2051)])
    y_pred = res.slope * x_pred + res.intercept
    plt.plot(x_pred, y_pred, "r")

    # Create second line of best fit
    new_df = df.loc[df["Year"] >= 2000]
    new_x = new_df["Year"]
    new_y = new_df["CSIRO Adjusted Sea Level"]
    res_2 = linregress(new_x, new_y)
    x_pred2 = pd.Series([i for i in range(2000, 2050)])
    y_pred2 = res_2.slope * x_pred2 + res_2.intercept
    plt.plot(x_pred2, y_pred2, "green")

    # Add labels and title
    plt.xlabel("Year")
    plt.ylabel("Sea Level (inches)")
    plt.title("Rise in Sea Level")
    plt.legend(fontsize="medium")
    plt.show()

    # Save plot and return data for testing (DO NOT MODIFY)
    plt.savefig('sea_level_plot.png')
    return plt.gca()
```
Error:

```
    xpoints=df["Years"]
    ^
IndentationError: unexpected indent
```
Link of my replit : https://replit.com/@LavishSinghal/boilerplatesealevelpredictor1#sea_level_predictor.py
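For background (this is general Python behavior, not something taken from the question's file): "unexpected indent" means a line is indented where the parser expects no indentation, often caused by mixed tabs and spaces or a stray leading space. A tiny reproduction that compiles a string whose second line is indented for no reason:

```python
# The second line is indented although no block was opened.
bad_source = "x = 1\n    y = 2\n"

try:
    compile(bad_source, "<example>", "exec")
except IndentationError as e:
    print(e.msg)  # "unexpected indent"
```

Re-indenting every line of the offending block with a consistent number of spaces (and no tabs) usually resolves it.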

How to Identify nearby water resources from my current location using google map API
I want to identify nearby water resources (within a range of 100 m) from my current location using the Google Maps API.
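The question gives no code; as one possible starting point, the Places API "Nearby Search" web service accepts a location, radius, and keyword. The sketch below only builds the request URL without sending it; `YOUR_API_KEY` is a placeholder, the keyword `"water"` is a guess, and whether water features are returned depends on how places are tagged in Google's data:

```python
from urllib.parse import urlencode

def nearby_search_url(lat, lng, radius_m=100, keyword="water", api_key="YOUR_API_KEY"):
    """Build a Places API Nearby Search request URL (no request is sent here)."""
    base = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
    params = {
        "location": f"{lat},{lng}",  # your current coordinates
        "radius": radius_m,          # metres
        "keyword": keyword,          # free-text filter; "water" is an assumption
        "key": api_key,
    }
    return f"{base}?{urlencode(params)}"

print(nearby_search_url(6.9271, 79.8612))
```

For natural water bodies specifically, the Elevation API or a separate GIS dataset (e.g. OpenStreetMap water layers) may be a better fit than Places.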

I am trying to create an animation using gganimate:

```r
grafico <- nba1 %>%
  ggplot() +
  geom_point(aes(x = PTS, y = PF, col = Team, size = G), alpha = 0.8) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  guides(size = "none") +
  labs(x = "Points", y = "GWS", col = "")

grafico

library(gganimate)

grafico +
  transition_time(year)
```
This is the error I have received:
```
Error in seq.default(range[1], range[2], length.out = nframes) :
  'from' must be a finite number
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning Inf
```
I am using the following packages:
```
> packageVersion('ggplot2')
[1] ‘3.3.3’
> packageVersion('gifski')
[1] ‘1.4.3.1’
> packageVersion('gganimate')
[1] ‘1.0.7’
```
Any help will be great. Thanks

How to interpret in R a list in the second parenthesis of a class instance call in Python
I am struggling with how to convert to R a specific piece of Python code involving an instance call with two parentheses. In the Keras implementation of a specific neural network there is the following code fragment. The complete code can be found here, at line 168.
```python
x = layers.Lambda(
    lambda inputs, scale: inputs[0] + inputs[1] * scale,
    output_shape=backend.int_shape(x)[1:],
    arguments={'scale': scale},
    name=block_name)([x, up])
```
In the second set of parentheses is a list of layers, [x, up]. The Lambda class implements __call__, which makes it possible to call an instance of that class as a function. In that case, the object in the second parentheses can be seen as a piped first argument to a function in R, like:
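To illustrate the two-parentheses pattern in plain Python (a toy stand-in, not the Keras Lambda class): the first parentheses construct the instance, the second invoke __call__ on it, and the list is simply that call's argument:

```python
class Scaler:
    """Toy callable: configured at construction, applied via __call__."""
    def __init__(self, scale):
        self.scale = scale

    def __call__(self, inputs):
        # inputs is a list, like [x, up] in the Keras fragment
        return inputs[0] + inputs[1] * self.scale

result = Scaler(2)([10, 3])  # construct the instance, then call it with the list
print(result)                # 16
```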
```r
x <- x %>% keras::layer_lambda(
  f = function(inputs, scale) {
    inputs[[1]] + inputs[[2]] * scale
  },
  output_shape = keras::k_int_shape(object)[-1],
  arguments = list(scale = scale)
)
```
But how should I deal with the list [x, up] in R?
Thanks in advance!

Is it advisable to learn Python or R to get started with HR analytics?
What are the best sources to start learning Python/ R programming?

R hypergeometric test between a character vector and a list, calculating p values in a loop
I'm trying to write code myself to run the hypergeometric test in R using phyper. I have a character vector of upregulated genes (these are the "red" balls I pulled out from my urn):
```r
gene.up <- c("A", "B", "C", "D")
```
I also have a character vector of all genes found in my experiment (these are all balls, "white" and "red", that I pulled out from my urn):
```r
gene.background <- c("A", "B", "C", "D", "E", "F")
```
I also have a list of characters containing pathway information (each "pathway" is a subset of balls I pulled from my urn; in this case, my urn has 5 white and 4 red balls):
```r
gene.pathway.list <- list("pathwayA" = c("A", "F", "G"),
                          "pathwayB" = c("A", "B", "E", "H"),
                          "pathwayC" = c("D", "G", "I"))
```
Now I need to run a hypergeometric test for each pathway in my gene.pathway.list. So I created an empty data frame to store the pathway names and p values from the hypergeometric test, and created a loop of tests like below:

```r
df <- data.frame(pathway = character(length(gene.pathway.list)),
                 pvalue = numeric(length(gene.pathway.list)))
for (i in c(1:length(gene.pathway.list))) {
  df[i, 1] <- names(gene.pathway.list[i])
  df[i, 2] <- phyper(sum(gene.pathway.list[[i]] == gene.up),
                     length(gene.pathway.list[[i]]),
                     length(unique(unlist(gene.pathway.list))) - length(gene.pathway.list[[i]]),
                     length(gene.background))
}
```
However, the output values don't make any sense. For example, my p value for pathwayC is zero, but how can the probability of pulling out "C" and "D" be zero? I'm trying to figure out what went wrong; what did I set up incorrectly?
Where to find curve equations to generate random seed data points that fit a plot's curve?
My goal is to generate a plot of data that fits along some curve, so you see a chart of points that looks somewhat realistic, like it might be real data. This is used for seed data in an app, e.g. generating a "time spent on site" chart, which is roughly a reverse-exponential sort of curve, converted into points that randomly deviate slightly from some "perfect" curve equation.
The first, tangential part of my question is: where can you find a collection of equations for curves, or how can I draw my own curve (in some free online curve builder)? Then I would like to take such a curve (assuming it's an equation of some sort, or even a bunch of Bézier curves, I don't know yet) and convert it into random points that deviate slightly up and down from that curve.
The second, main part of my question is how to take a curve equation and generate discrete points along it, such that each point doesn't fall perfectly on the curve but deviates slightly up and down, giving it a more natural feel.
For example, how do I find lists of equations which generate curves like this (or how do I generate such a curve from a drawing)?
Next, using any simple curve as an example, such as y = x^2, how do you generate points along the curve? Is it just a simple matter of doing this?

```javascript
const curves = [
  (x) => x ** 2
]

function generatePoints(curve, values) {
  return values.map(curve).map(randomizeSlightly)
}
```
If that is all that is required, how do I find or generate such curve equations to provide my "random seed data plot generator" with hundreds of interesting realistic sorts of curves? Ideally I could just draw 2 or 3 curves which I think look like common data in various categories, and then I could just pick from that, run it through my function above, and get points along the curve. How do I find/generate such curves so I can start generating these points?
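The jitter step can be sketched in Python as follows; the Gaussian-noise model and the 5% noise scale are assumptions for illustration, not from the question:

```python
import random

def generate_points(curve, xs, noise_frac=0.05, rng=random.Random(42)):
    """Evaluate curve at each x and jitter y by roughly noise_frac of its magnitude."""
    points = []
    for x in xs:
        y = curve(x)
        # Gaussian jitter scaled to the curve value keeps the deviation "slight"
        jitter = rng.gauss(0.0, noise_frac * (abs(y) + 1e-9))
        points.append((x, y + jitter))
    return points

pts = generate_points(lambda x: x ** 2, [x / 10 for x in range(1, 101)])
print(pts[:3])
```

Any curve source works with this scheme, whether a closed-form equation or points sampled from a drawn Bézier curve, as long as it can be evaluated at arbitrary x.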

PDF of a Lognormal Distribution
I have tried to draw a distribution function with a given mean and standard deviation. However, the plot only shows the histogram and not the distribution function, and I do not know why it is not drawn:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm

mean = 15.14
stdev = 0.3738
phi = (stdev ** 2 + mean ** 2) ** 0.5
mu = np.log(mean ** 2 / phi)
sigma = (np.log(phi ** 2 / mean ** 2)) ** 0.5
data = np.random.lognormal(mu, sigma, 1000)
mu, sigma, n = lognorm.fit(data)

plt.hist(data, bins=30, density=True, alpha=0.5, color='b')

# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 1000)
p = lognorm.pdf(x, mu, sigma)
plt.plot(x, p, 'k', linewidth=2)
title = "Log-Normal Distribution: Mean: {:.2f}, Std. Dev.: {:.2f}".format(mean, stdev)
plt.title(title)
plt.show()
```
The result that I have obtained:
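For context (an observation about SciPy's API, not part of the original question): scipy.stats.lognorm is parameterized by a shape s (the underlying normal's sigma) with scale = exp(mu), and lognorm.fit returns (shape, loc, scale), not (mu, sigma, n). So lognorm.pdf(x, mu, sigma) passes mu where the shape belongs. A sketch of the matching usage, with example parameter values:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 2.7, 0.025            # parameters of the underlying normal (example values)
rng = np.random.default_rng(0)
data = rng.lognormal(mu, sigma, 1000)

# Either use the known parameters directly...
x = np.linspace(data.min(), data.max(), 5)
pdf_known = lognorm.pdf(x, s=sigma, scale=np.exp(mu))

# ...or fit, keeping loc fixed at 0 to match numpy's two-parameter lognormal
shape, loc, scale = lognorm.fit(data, floc=0)
pdf_fit = lognorm.pdf(x, shape, loc=loc, scale=scale)
print(shape, np.log(scale))       # should be close to sigma and mu
```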