Does the random seed affect my model after it is trained/exported?
I am in doubt about one of my models. I trained a Random Forest model and found the best parameters via grid search; however, I forgot to set a seed at the time.
Once I had my best model, I exported these parameters so I could reproduce my results.
I have a set of data that, once labelled by my best model, gets probability X. However, when I ran the same set of data through the same model again, I got a different probability.
Does the random seed impact my trained model in any way in this case? Must I set a random seed when exporting my parameters?
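For what it's worth, a fitted forest is deterministic at prediction time; randomness only enters during training. If the model is re-created from exported hyperparameters and retrained, a fixed `random_state` is needed to get the same forest back. A minimal sklearn sketch (dataset and parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Two forests trained with the same random_state are identical,
# so their predicted probabilities are reproducible run to run
a = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
b = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
assert (a.predict_proba(X) == b.predict_proba(X)).all()
```

Without `random_state`, each retraining draws different bootstrap samples and feature subsets, which is enough to shift the predicted probabilities.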
See also questions close to this topic

How can I compute the tensor in PyTorch efficiently?
I have a tensor `x` with `x.shape = (batch_size, 10)`, and now I want to take `x[i][0] = x[i][0]*x[i][1]*...*x[i][9] for i in range(batch_size)`.
Here is my code:

```python
for i in range(batch_size):
    for k in range(1, 10):
        x[i][0] = x[i][0] * x[i][k]
```
But when I implement this in `forward()` and call `loss.backward()`, the speed of backpropagation is very slow. Why is it slow, and is there any way to implement it efficiently?
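A vectorized alternative, sketched below under the assumption that the goal is the product over the last dimension: the Python-level double loop creates one autograd node per in-place multiply, while a single `torch.prod` call does the same reduction in one kernel:

```python
import torch

batch_size = 4
x = torch.randn(batch_size, 10)

# Loop version from the question: slow, one graph node per multiply
x_loop = x.clone()
for i in range(batch_size):
    for k in range(1, 10):
        x_loop[i][0] = x_loop[i][0] * x_loop[i][k]

# Vectorized version: one reduction over the last dimension
x_vec = x.clone()
x_vec[:, 0] = torch.prod(x_vec, dim=1)

assert torch.allclose(x_loop, x_vec)
```

The product over `dim=1` includes column 0 itself, which matches the loop (it multiplies `x[i][0]` by columns 1 through 9).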
How do I get all Gini indices in my decision tree?
I have made a decision tree using scikit-learn, viz. `sklearn.tree.DecisionTreeClassifier().fit(x, y)`. How do I get the Gini indices for all possible nodes at each step? `graphviz` only gives me the Gini index of the node with the lowest Gini index, i.e. the node used for the split. For example, the image below (from `graphviz`) tells me the Gini score of the Pclass_lowVMid right index, which is 0.408, but not the Gini index of Pclass_lower or Sex_male at that step. I just know the Gini index of Pclass_lower and Sex_male must be greater than (0.408*0.7 + 0), but that's it.
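As a starting point, the fitted tree does expose the impurity of every node it kept via `tree_.impurity` (one value per node, in node-id order); the Gini of rejected candidate splits is not stored and would have to be recomputed by hand. A sketch on a toy dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0).fit(X, y)

# tree_.impurity holds the Gini impurity of every node actually in the tree
for node_id, gini in enumerate(clf.tree_.impurity):
    print(node_id, round(float(gini), 3))
```

`tree_.children_left` / `tree_.children_right` give the tree structure, so each impurity value can be tied back to its parent split.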
Fighting against overfitting in an RNN model
We are currently trying to use an RNN model to build a classifier using text features. Our final accuracy on the training data is 87%, but our accuracy on the validation data plateaus at 57%, which is clearly overfitting. We think the reason for overfitting is the small data size, since we only have about 4000 entries. What can we do to fix the problem? We have also thought about doing data augmentation, but all we can find is replacing words with synonyms, which wouldn't work in our case. Here's our code for the model, and thank you in advance.
```python
model = Sequential()
model.add(Embedding(num_vocab+1, 32))
model.add(SimpleRNN(64))
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
history = model.fit(f_train, cause_train, epochs=10, batch_size=50, validation_split=0.2)
```
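One common mitigation for a gap like 87% train vs. 57% validation is dropout, both inside the recurrent layer and before the classifier. A hedged sketch of the same architecture with dropout added (`num_vocab` and `num_classes` are placeholders; the rates are starting points, not tuned values):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, Dropout

num_vocab, num_classes = 5000, 10  # hypothetical sizes

model = Sequential()
model.add(Embedding(num_vocab + 1, 32))
# dropout on inputs and on the recurrent connections of the RNN
model.add(SimpleRNN(64, dropout=0.2, recurrent_dropout=0.2))
model.add(Dropout(0.5))  # dropout before the classifier head
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
```

With ~4000 entries, a smaller embedding/RNN width and early stopping on validation loss are also worth trying before more exotic augmentation.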

Making Random Forest outputs like Logistic Regression
I am asking dimension-wise, etc. I am trying to implement this amazing work with a random forest: https://www.kaggle.com/allunia/howtoattackamachinelearningmodel/notebook
Both the logistic regression and the random forest are from sklearn, but when I get weights from the random forest model the shape is (784,), while the logistic regression returns (10, 784).
My problems are mainly dimension errors and "NaN, infinity or a value too large for dtype" errors with the attack methods. The weights from logistic regression are (10, 784), but with Random Forest they are (784,); maybe this caused the problem? Or can you suggest some modifications to the attack methods? I tried Imputer for the NaN-values error, but it wanted me to reshape, so I got this. I tried applying np.mat for the dimension errors I'm getting, but that didn't work.
```python
def non_targeted_gradient(target, output, w):
    target = target.reshape(1, 1)
    output = output.reshape(1, 1)
    w = w.reshape(1, 1)
    target = imp.fit_transform(target)
    output = imp.fit_transform(output)
    w = imp.fit_transform(w)
    ww = calc_output_weighted_weights(output, w)
    for k in range(len(target)):
        if k == 0:
            gradient = np.mat((1 - target[k])) * np.mat((w[k] - ww))
        else:
            gradient += np.mat((1 - target[k])) * np.mat((w[k] - ww))
    return gradient
```
I'm probably doing lots of things wrong, but the TL;DR is: I'm trying to apply Random Forest instead of Logistic Regression at the link above.
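The shape mismatch is expected: `LogisticRegression.coef_` is `(n_classes, n_features)`, while a random forest has no per-class weight matrix at all; `feature_importances_` is a single `(n_features,)` vector, so gradient-style attacks written against `coef_` won't transfer directly. A quick sketch on sklearn's digits data (64 features, 10 classes, standing in for the 784-feature MNIST case):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)  # 64 features, 10 classes

lr = LogisticRegression(max_iter=1000).fit(X, y)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

print(lr.coef_.shape)                 # (10, 64): one weight vector per class
print(rf.feature_importances_.shape)  # (64,): one importance per feature, no classes
```

Reshaping `(784,)` importances to look like `(10, 784)` weights won't give a meaningful gradient; the forest simply isn't differentiable in the way the attack assumes.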

How do I map each leaf's samples in each tree of a random forest classifier to its X and y after fit?
I am trying to understand how to map leaves to their original X and y. I tried to use "Print the decision path of a specific sample in a random forest classifier", but I can't understand how to map

```python
children_left_ = [t.tree_.children_left for t in estimator.estimators_]
children_right_ = [t.tree_.children_right for t in estimator.estimators_]
```

to the original X and y.
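One way to make that mapping, sketched here with illustrative variable names: `forest.apply(X)` returns the leaf id each sample reaches in each tree, so a boolean mask recovers the rows of X and y that fall in any given leaf:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

# apply() gives, for each sample, the leaf id it lands in for every tree
leaves = forest.apply(X)  # shape (n_samples, n_trees)

# Samples that end up in the same leaf as sample 0 in tree 0
tree0_leaf = leaves[0, 0]
mask = leaves[:, 0] == tree0_leaf
print(X[mask].shape, y[mask])
```

Iterating the distinct values of `leaves[:, t]` enumerates every leaf of tree `t` together with its samples, without walking `children_left_`/`children_right_` by hand.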

Predictive models to predict sales with R
I would like to find a good model to predict which clients will buy my product in 2018. I would like opinions on which method fits my data to predict which clients will buy product A in 2018.
I have the following data:
```
client  buyproductin17  buyproductin16  buyproductin15  all_day  day_cnt  product
22      1               0               1               34       5        2
23      0               1               1               56       11       2
24      1               1               1               122      45       3
```
client = client ID
buyproductin17 = whether the client bought product A in 2017
buyproductin16 = whether the client bought product A in 2016
buyproductin15 = whether the client bought product A in 2015
all_day = total number of days the client spent with all my products
day_cnt = total number of days the client spent with product A
product = total number of product A units the client has
My first thoughts are a logistic regression model or a random forest. But which dependent variable should I use? buyproductin17?
Thanks a lot
Dany
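One hedged way to frame it: use `buyproductin17` as the dependent variable, train on the earlier-year columns plus the usage counts, then score 2018 by feeding the most recent years into the same feature slots. The question mentions R, but the same setup sketched in Python/sklearn (the three rows are copied from the table above; everything else is illustrative):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "client": [22, 23, 24],
    "buyproductin17": [1, 0, 1],
    "buyproductin16": [0, 1, 1],
    "buyproductin15": [1, 1, 1],
    "all_day": [34, 56, 122],
    "day_cnt": [5, 11, 45],
    "product": [2, 2, 3],
})

# Target: last observed year; features: prior years plus usage counts
features = ["buyproductin16", "buyproductin15", "all_day", "day_cnt", "product"]
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(df[features], df["buyproductin17"])

# For 2018, the same feature slots would be filled with 2017/2016 data
probs = model.predict_proba(df[features])
print(probs.shape)
```

The equivalent in R would be `glm(..., family = binomial)` or `randomForest()` with the same target/feature split.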

malloc error when generating huge random number
I want to get a random element from 0 to a huge number (2^31).
I tried creating an `Array` from such a `Range` (so I can use Swift's `Array.randomElement`), as done here:

```swift
let myArray: [Int64] = [Int64](0...4294967292)
```
Which compiles, but crashes with:
```
MyPoject(1569,0x100cc2f40) malloc: can't allocate region
mach_vm_map(size=34359738368) failed (error code=3)
MyProject(1569,0x100cc2f40) malloc: set a breakpoint in malloc_error_break to debug
```
Of course, I can write a custom function to create the array, but that smells, especially because the array will be the exact same every time.
Does Swift provide a better solution?
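The usual fix is to sample directly from the range instead of materializing it; Swift's standard library provides `Int64.random(in: 0...4294967292)` for exactly this. A sketch of the same idea, written in Python here for illustration:

```python
import random

# Sample directly from the huge range; no multi-gigabyte array needed
value = random.randrange(0, 4294967293)  # inclusive upper bound 4294967292
assert 0 <= value <= 4294967292
```

Both approaches are O(1) in memory, versus the ~34 GB the materialized `Int64` array would need.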

When is mt_rand seeded?
At what moment is the seed for `mt_rand` selected? According to several sources, PHP should reseed this PRNG in every new process. This works perfectly on my Windows machine. On the Linux machine, however, a new seed is used with every request. I checked that the requests are running in the same process (using `getmypid`), and they do. Am I missing something?
Create random data during Promise.all
I'd like to create a mock data based on a JSON array to test frontend.
Object structure in the JSON array:
Mock:
{ category: string, items: string[] }
Category:
{ name: string, products: Product[] }
Product:
{ name: string, prices: Price[] }
Price:
{ date: Date, value: number }
I'd like to create random `Price`s for each `Product`, therefore I created the following helper function:

```javascript
// Creates Price array with random values within a given time window
// `times` is a `lodash` function
const getRandomPrices = (numberOfDays, minValue, maxValue) =>
  times(numberOfDays, n => ({
    date: new Date(new Date().setDate(new Date().getDate() - n)),
    value: Math.floor(Math.random() * maxValue) + minValue
  }))
```
And I’m seeding my MongoDB database with the following function:
```javascript
// data is Mock[]
// Price, Product and Category are MongoDB schemas
const createCategories = data.map(async c => {
  const createProducts = c.items.map(async p => {
    const prices = getRandomPrices(15, 100, 1000)
    const createPrices = prices.map(price => Price.create(price))
    return Product.create({ name: p, prices: await Promise.all(createPrices) })
  })
  return Category.create({ name: c.category, products: await Promise.all(createProducts) })
})
await Promise.all(createCategories)
console.log('done')
```
My problem is that I get the same `Price` data for every `Product` in every `Category`, and my questions are: how can I randomize the prices, and where am I going wrong?
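As an aside on the price helper above: `Math.floor(Math.random() * maxValue) + minValue` can exceed `maxValue`, since the span should be `maxValue - minValue + 1`. A sketch of the same helper with the bounds handled, written in Python for illustration (field names mirror the `Price` shape above):

```python
import random
from datetime import date, timedelta

def get_random_prices(number_of_days, min_value, max_value):
    today = date.today()
    return [
        {"date": today - timedelta(days=n),
         # randint is inclusive on both ends, so values stay in [min_value, max_value]
         "value": random.randint(min_value, max_value)}
        for n in range(number_of_days)
    ]

prices = get_random_prices(15, 100, 1000)
assert all(100 <= p["value"] <= 1000 for p in prices)
```

In the JavaScript version the equivalent fix is `Math.floor(Math.random() * (maxValue - minValue + 1)) + minValue`.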