MLP performing worse than SGD in supervised classification
The training dataset I'm using is a grayscale image that was flattened so that each pixel represents an individual sample. A second image will be classified pixel by pixel after training the stochastic gradient descent (SGD) and multilayer perceptron (MLP) classifiers on the first one.
The problem I have is that the SGD classifier performs far better than the MLP, even when I keep the default parameters provided by scikit-learn in both cases. Below is the code for both (note that because the training dataset is on the order of millions of samples, I had to use partial_fit() to train the MLP in chunks, while this was not necessary for the SGD):
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier

def batcherator(data, target, chunksize):
    # Yield successive chunks of the dataset
    for i in range(0, len(data), chunksize):
        yield data[i:i+chunksize], target[i:i+chunksize]

def classify():
    # `algorithm` and `training` are defined elsewhere in my script
    if algorithm == 'sgd':
        classifier = SGDClassifier(verbose=True)
        classifier.fit(training.data, training.target)
    elif algorithm == 'mlp':
        classifier = MLPClassifier(verbose=True)
        gen = batcherator(training.data, training.target, 1000)
        for chunk_data, chunk_target in gen:
            classifier.partial_fit(chunk_data, chunk_target,
                                   classes=np.array([0, 1]))
My question is: which parameters should I adjust in the MLP classifier to make its results comparable to those obtained with the SGD classifier?
I've tried increasing the number of neurons in the hidden layer via hidden_layer_sizes, but saw no improvement. There was no improvement either when I changed the hidden layer's activation function from the default relu to logistic via the activation parameter.
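One thing worth noting for context: MLPs tend to be far more sensitive to unscaled inputs than linear SGD models. A minimal sketch of standardizing the pixel features chunk by chunk before training (the data layout here is synthetic, and the helper name is my own; `StandardScaler.partial_fit` is the incremental counterpart of `fit`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scaled_batches(data, target, chunksize, scaler):
    # First pass: accumulate running mean/variance chunk by chunk
    for i in range(0, len(data), chunksize):
        scaler.partial_fit(data[i:i+chunksize])
    # Second pass: yield standardized chunks
    for i in range(0, len(data), chunksize):
        yield scaler.transform(data[i:i+chunksize]), target[i:i+chunksize]

# Toy demonstration with random "pixel" data in [0, 255]
rng = np.random.RandomState(0)
data = rng.randint(0, 256, size=(5000, 9)).astype(float)
target = rng.randint(0, 2, size=5000)

scaler = StandardScaler()
chunks = list(scaled_batches(data, target, 1000, scaler))
all_scaled = np.vstack([c for c, _ in chunks])
print(all_scaled.mean(axis=0).round(6))  # close to zero after scaling
```

The scaled chunks could then be fed to `partial_fit` exactly as in the snippet above.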
See also questions close to this topic

Using toarray() with OneHotEncoder during data preprocessing
I am new to machine learning and have one doubt: why is toarray() used with OneHotEncoder but not with LabelEncoder here? I have no idea; can someone please help?

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

label_encoder_x = LabelEncoder()
x[:, 0] = label_encoder_x.fit_transform(x[:, 0])
onehotencoder = OneHotEncoder(categorical_features=[0])
x = onehotencoder.fit_transform(x).toarray()
label_encoder_y = LabelEncoder()
y = label_encoder_y.fit_transform(y)
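For context on what toarray() is doing there: OneHotEncoder.fit_transform returns a SciPy sparse matrix (most entries of a one-hot matrix are zero), whereas LabelEncoder returns an ordinary dense array, so only the former needs densifying. A small sketch (toy data, modern OneHotEncoder defaults assumed):

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
out = enc.fit_transform([[0], [1], [2]])  # one categorical column, 3 categories
print(issparse(out))   # True: the result is a SciPy sparse matrix
dense = out.toarray()  # toarray() converts it to a regular ndarray
print(dense)
```

For three distinct categories the dense result is a 3x3 one-hot (identity-shaped) matrix.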

How to apply a TimeDistributed layer to a CNN block?
Here is my attempt:

inputs = Input(shape=(config.N_FRAMES_IN_SEQUENCE, config.IMAGE_H, config.IMAGE_W, config.N_CHANNELS))

def cnn_model(inputs):
    x = Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu')(inputs)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(filters=128, kernel_size=(3, 3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    return x

x = TimeDistributed(cnn_model)(inputs)
Which gives the following error:
AttributeError: 'function' object has no attribute 'built'

Keras weight file load exception: loading 2 layers into a model with 0 layers
The exception happens when I add dropout to the input layer.
The exception was mentioned in other threads related to other issues, and the most commonly suggested solution is to downgrade the Keras version. Is there a workaround for this exception?

def baseline_model():
    model = Sequential()
    model.add(Dropout(0.35))  # THIS LINE CAUSES THE EXCEPTION
    model.add(Dense(200, input_dim=1200, kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.8))
    rms = RMSprop(lr=0.00050)
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=rms, metrics=['accuracy'])
    return model
Model throws the following exception during weight file loading:
ValueError: You are trying to load a weight file containing 2 layers into a model with 0 layers.

Cross-validation gives unexpected results on Boston Housing without shuffling
I am getting surprising results for Boston Housing. The following code produces very different results when I apply cross-validation to the original Boston Housing dataset and to a randomly shuffled version of it:

from sklearn.datasets import load_boston
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score
from sklearn.utils import shuffle

boston = load_boston()
knn = KNeighborsRegressor(n_neighbors=1)
print(cross_val_score(knn, boston.data, boston.target))
X, y = shuffle(boston.data, boston.target, random_state=0)
print(cross_val_score(knn, X, y))
The output is:
[-1.07454938 -0.50761407  0.00351173]
[0.30715435 0.36369852 0.51817514]
Even if the order of the original dataset is not random, why are the 1-nearest-neighbor predictions so poor for it? Thank you.
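One piece of context on why ordering matters here: for a regressor, cross_val_score splits with KFold and no shuffling, so every test fold is a contiguous block of rows; if the dataset is ordered (e.g. grouped geographically), each fold can come from a different distribution than its training data. A tiny sketch showing the contiguous splits:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(-1, 1)
# Default KFold does not shuffle: each test fold is a contiguous slice
for train_idx, test_idx in KFold(n_splits=3).split(X):
    print(test_idx)
```

This prints the index blocks [0..3], [4..7], and [8..11], one contiguous run per fold.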

Isolation forest in Python: how to recover the instances that are grouped in a leaf?
In an isolation tree, I want to know how to determine, in Python, the branch that a new observation will follow until it reaches a leaf. In addition, how can I recover the instances that are grouped in that leaf?
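A sketch of one way to do this with scikit-learn's IsolationForest (assuming that implementation; the data here is synthetic): each fitted tree in `estimators_` exposes `apply()`, which maps samples to leaf ids, and `decision_path()`, which gives the sequence of nodes visited.

```python
import numpy as np
from collections import defaultdict
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = rng.randn(100, 2)

iso = IsolationForest(n_estimators=10, random_state=42).fit(X)
tree = iso.estimators_[0]  # one isolation tree of the forest

# Leaf id reached by every training sample in this tree
leaves = tree.apply(X)

# Group the training instances that end up in each leaf
groups = defaultdict(list)
for idx, leaf in enumerate(leaves):
    groups[leaf].append(idx)

# Branch (node ids) followed by a new observation, and its final leaf
new_obs = rng.randn(1, 2)
path = tree.decision_path(new_obs).indices
leaf_of_new = tree.apply(new_obs)[0]
print(groups[leaf_of_new])  # training instances sharing that leaf
```

The same grouping could be repeated over all trees in `iso.estimators_` if a per-forest view is needed.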

Neural network input being a list
I'm trying to train a neural network with a supervised training model in R, but I'm having problems with the data frame because the network does not accept a list as input. Code used to create the data frame:

df <- data.frame("dados" = numeric(69), "saida" = numeric(69))

I insert each line of the data frame manually with the following code:

df$dados[1] <- list(resultante1)
df$saida[1] <- 1
(The contents of resultante1 and the resulting data frame were shown as screenshots in the original post.)
Note: the lists are very large; some inputs reach 50 thousand numbers. These values represent a signal, and I need to train the neural network with several signals so that it identifies similar ones. So I use 1 to label the signal I want and 0 for a similar signal that is a false positive.
I used the RSNNS library to make use of the mlp function, but when I pass in the input data, the following error occurs:
Function mlp:
mlp(df$dados, df$saida, size=nNeuronios, maxit=maxEpocas, initFunc="Randomize_Weights", initFuncParams=c(0.3, 0.3), learnFunc="Std_Backpropagation", learnFuncParams=c(0.1), updateFunc="Topological_Order", updateFuncParams=c(0), hiddenActFunc="Act_Logistic", shufflePatterns=F, linOut=TRUE)
Error:
Error in checkInput(x, y) : 'x' has to be numeric, after a conversion to matrix
Is there any way to make the mlp function accept my data frame, or is there another way to create and train a network that will accept it?

Caffe: second Slice layer not working
In my prototxt there are two Slice layers. The first one works well and appears in the log. The second one does not seem to work, and it does not even appear in the log!

Losses are increasing in Binary classification using gradient descent optimization method
This is my program for binary classification using the gradient descent optimization method. I am not sure about my loss function; the error keeps increasing when plotted.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def sigmoid_activation(x):
    return 1.0 / (1 + np.exp(-x))

def predict(testX, W):
    preds = sigmoid_activation(np.dot(testX, W))
    # apply a step function to threshold (=0.5) the outputs to binary class labels
    p = []
    for i in range(len(preds)):
        if preds[i] < 0.5:
            p.append(0)
        if preds[i] >= 0.5:
            p.append(1)
    return p

epochs = 50
alpha = 0.01

(X, y) = make_moons(n_samples=1000, noise=0.15)
y = y.reshape(y.shape[0], 1)
X = np.c_[X, np.ones((X.shape[0]))]
(trainX, testX, trainY, testY) = train_test_split(X, y, test_size=0.5, random_state=42)

print("[INFO] training...")
W = np.random.randn(X.shape[1], 1)
losses = []

for epoch in np.arange(0, epochs):
    Z = np.dot(trainX, W)
    yhat = sigmoid_activation(Z)
    error = trainY - yhat
    loss = np.sum(error ** 2)
    losses.append(loss)
    gradient = trainX.T.dot(error) / trainX.shape[0]
    W = W - alpha * gradient  # moving in -ve direction
    # check to see if an update should be displayed
    if epoch == 0 or (epoch + 1) % 5 == 0:
        print("[INFO] epoch={}, loss={:.7f}".format(int(epoch + 1), loss))

# evaluate our model
print("[INFO] evaluating...")
preds = predict(testX, W)
print(classification_report(testY, preds))

# plot the (testing) classification data
plt.style.use("ggplot")
plt.figure()
plt.title("Data")
plt.scatter(testX[:, 0], testX[:, 1], marker="o", c=testY[:, 0], s=30)

# construct a figure that plots the loss over time
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, epochs), losses)
plt.title("Training Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.show()

The way to tune the threshold for the predict method in sklearn.ensemble.GradientBoostingClassifier
I am using the gradient boosting classifier implemented in scikit-learn (sklearn.ensemble.GradientBoostingClassifier) for a binary classification problem. Although a predict method is provided by default, the official documentation gives no clues about:
how the default threshold is defined,
whether there is a supported way to modify this threshold while tuning models.
Any clue or advice is welcome.
Many thanks.
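For a binary problem, predict effectively takes the argmax over predict_proba, which amounts to a 0.5 threshold on the positive-class probability; a custom threshold can be applied to the predict_proba output directly. A sketch (synthetic data; equivalence to 0.5 assumes no probability lands exactly on the boundary):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X, y)

proba = clf.predict_proba(X)[:, 1]          # P(class == 1)
default_preds = clf.predict(X)              # implicit 0.5 threshold
custom_preds = (proba >= 0.7).astype(int)   # e.g. demand higher confidence for class 1

print((default_preds == (proba >= 0.5)).all())
```

The custom threshold could then be tuned like any other hyperparameter, e.g. against precision/recall on a validation set.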

How to tune SVM for Unbalanced Dataset?
I am a newcomer to machine learning and classification. I am working on classifying two different classes of images. I have calculated blockwise (overlapping, sliding (1 px), size 3x3) features of every image in the dataset and stored them row-wise. There are 800 images in one class (A) and 450 images in the other class (B). So I basically have an array of 800 rows and roughly 90,000 columns for one class, and 450 rows and roughly 90,000 columns for the other.
Now I want to train and test an SVM classifier on these two classes. I have tried 10-fold classification using the following setups:
Every alternate image from class A and the first 400 from class B. Accuracy was around 70% (low TP; I am treating the minority class B as the positive class).
All images from class A and all from class B. Accuracy was around 80% on average.
In the third case, I upsampled class B by simple repetition, i.e. all images from class A, all images from class B, plus the first 350 images from class B again, which boosted accuracy above 90% with high TP and high TN.
I am using SVM as following:
SVMModel = fitcsvm(trained_data,gg,'Standardize',true,'KernelFunction','RBF','BoxConstraint', 32, 'KernelScale', 0.2008);
Although I am getting good accuracy, I am not sure about the acceptability of my upsampling methodology.
Question 1: Am I using the right method to upsample the feature set? If not, what other suitable method could I use? Any suggestions? (I have already tried SMOTE, but it doesn't work because my data has very small values; SMOTE's addition/multiplication operations cause major changes in the data, shifting samples toward the other class.)
Question 2: Can I use an SVM without manual upsampling in this case? If yes, what should I write in place of what I am already using?
I chose the kernel scale according to the misclassification rate on cross-validation (third case).
Please guide me.
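Regarding Question 2: many SVM implementations support per-class misclassification costs instead of upsampling; in MATLAB's fitcsvm this is the 'Cost' name-value pair, and the analogous knob in scikit-learn is class_weight. A sketch of the scikit-learn version on imbalanced toy data (the data here is synthetic, not the feature set from the question):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Imbalanced toy data: 80 samples of class 0, 45 of class 1
X = np.vstack([rng.randn(80, 5), rng.randn(45, 5) + 1.0])
y = np.array([0] * 80 + [1] * 45)

# 'balanced' weights each class inversely to its frequency,
# so no manual repetition of minority samples is needed
clf = SVC(kernel='rbf', class_weight='balanced').fit(X, y)
print(clf.score(X, y))
```

Class weighting changes the per-sample penalty in the SVM objective, which has a similar effect to repetition without duplicating rows.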

RELU Backpropagation
I am having trouble implementing backpropagation with the ReLU activation function. My model has two hidden layers with 10 nodes each and one node in the output layer (thus 3 weight matrices and 3 biases). The model works except for this broken backward_prop function; the function does work with the alternative derivative shown in the comments. So I believe I am getting the ReLU derivative wrong.
Can anyone push me in the right direction?
# The derivative of the relu function is 1 if z > 0, and 0 if z <= 0
def relu_deriv(z):
    z[z > 0] = 1
    z[z <= 0] = 0
    return z

# Handles a single backward pass through the neural network
def backward_prop(X, y, c, p):
    """
    cache (c): includes activations (A) and linear transformations (Z)
    params (p): includes weights (W) and biases (b)
    """
    m = X.shape[1]  # number of training examples
    dZ3 = c['A3'] - y
    dW3 = 1/m * np.dot(dZ3, c['A2'].T)
    db3 = 1/m * np.sum(dZ3, keepdims=True, axis=1)
    dZ2 = np.dot(p['W3'].T, dZ3) * relu_deriv(c['A2'])  # sigmoid: replace relu_deriv w/ (1 - np.power(c['A2'], 2))
    dW2 = 1/m * np.dot(dZ2, c['A1'].T)
    db2 = 1/m * np.sum(dZ2, keepdims=True, axis=1)
    dZ1 = np.dot(p['W2'].T, dZ2) * relu_deriv(c['A1'])  # sigmoid: replace relu_deriv w/ (1 - np.power(c['A1'], 2))
    dW1 = 1/m * np.dot(dZ1, X.T)
    db1 = 1/m * np.sum(dZ1, keepdims=True, axis=1)
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2, "dW3": dW3, "db3": db3}
    return grads
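One pitfall worth flagging in the code above (an observation, not necessarily the only issue): relu_deriv overwrites its argument in place, so calling it on cached activations destroys them for any later use in the backward pass. A side-effect-free version:

```python
import numpy as np

def relu_deriv(z):
    # 1 where z > 0, else 0 -- computed into a new array, leaving z untouched
    return (z > 0).astype(z.dtype)

a = np.array([-2.0, 0.0, 3.0])
d = relu_deriv(a)
print(d)  # [0. 0. 1.]
print(a)  # original array unchanged: [-2.  0.  3.]
```

Since relu(z) > 0 exactly where z > 0, this version can be applied to either the cached Z or the cached A with the same result.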

Why is Gradient Checking Slow For Back Propagation?
I recently learned about "gradient checking", an algorithm for verifying that the derivatives in my neural network's backpropagation are calculated properly.
The course I learned it from, and many other sources such as this one, claim it is much slower than calculating the derivatives analytically, but I can't find anything that explains WHY.
So, why is gradient checking slower than calculating the derivative directly?
How much slower is it?
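The usual argument, sketched: backpropagation produces all n partial derivatives in roughly one forward plus one backward pass, whereas a central-difference gradient check needs two extra forward passes per parameter, about 2n forward passes total. A small example that counts the function evaluations (the loss here is a toy stand-in for a network's forward pass):

```python
import numpy as np

calls = 0

def f(w):
    # Stand-in for a network's loss; count how often it is evaluated
    global calls
    calls += 1
    return np.sum(w ** 2)

def numerical_gradient(f, w, eps=1e-5):
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (f(w_plus) - f(w_minus)) / (2 * eps)  # central difference
    return grad

w = np.random.randn(1000)
num_grad = numerical_gradient(f, w)
print(calls)  # 2000 evaluations for 1000 parameters
analytic_grad = 2 * w  # the closed-form gradient, one cheap pass
print(np.allclose(num_grad, analytic_grad, atol=1e-4))
```

For a real network with millions of parameters, those 2n full forward passes dwarf the cost of one backprop pass, which is why gradient checking is used only to spot-check an implementation, not during training.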