How to apply a model learned with the boosting technique
I have no clue how to apply the model learned with the adaptive boosting method, since no model object is returned, unlike the one returned by, say, the SVM classifier's fitcsvm() function. I am using the implementation given at
https://www.mathworks.com/matlabcentral/fileexchange/63162-adaboost?focused=7722161&tab=function
This method uses 5 base classifiers, so the number of learning cycles is 5. The variable ada_test contains the labels from each independent run, itr, of the boosting module.
QUESTION: Once training is done (which includes fitting on the training set and predicting on the validation set), which variable should be used to predict on the unseen test data set? In the following code,
for itr = 1:maxItr
    [~, ada_test(:,itr)] = adaboost(X, Y, Xtest);
    fm_ = [fm_; confusion_mat(Ytest, ada_test(:,itr))];
end
the variable ada_test(:,1) = sign(Htest*alpha'); holds the predicted labels. In the function adaboost() there is the line ada_train(:,1) = sign(H*alpha');. Why is the ada_train variable never used, and how do I test the learned AdaBoost model on unseen data? I cannot understand the procedure for predicting on new, unseen test data because I cannot tell which variable constitutes the learned model. Please help.
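For what it's worth, the learned AdaBoost model is nothing more than the list of weak learners h_t together with their weights alpha_t; prediction on unseen data is H(x) = sign(sum_t alpha_t * h_t(x)), which is exactly what the sign(Htest*alpha') line computes. A minimal pure-Python sketch of that prediction rule (the stumps and weights below are hypothetical illustrations, not the File Exchange code):

```python
def adaboost_predict(x, weak_learners, alphas):
    # H(x) = sign(sum_t alpha_t * h_t(x)), with labels in {-1, +1};
    # the (weak_learners, alphas) pair IS the learned model
    score = sum(a * h(x) for h, a in zip(weak_learners, alphas))
    return 1 if score >= 0 else -1

# two hypothetical decision stumps on a scalar feature
def h1(x): return 1 if x > 0.5 else -1
def h2(x): return 1 if x > 0.2 else -1
alphas = [0.8, 0.3]

adaboost_predict(0.3, [h1, h2], alphas)  # 0.8*(-1) + 0.3*(+1) = -0.5 -> -1
adaboost_predict(0.7, [h1, h2], alphas)  # 0.8*(+1) + 0.3*(+1) = +1.1 -> +1
```

So to predict on new data later, the variables to keep from inside adaboost() are whatever hold the weak learners' outputs and alpha; passing Xtest at training time, as this implementation does, is just a shortcut that applies the rule immediately.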
See also questions close to this topic

How to get data from python datatype returned in MATLAB?
I have a python script like so:
import numpy as np

def my_function(x):
    return np.array([x])
And I have a MATLAB script to call it:
clear all; clc;
if count(py.sys.path, '') == 0
    insert(py.sys.path, int32(0), '');
end
myfunction_results = py.python_matlab_test.my_function(8);
display(myfunction_results);
And it displays:
myfunction_results =
  Python ndarray with properties:
    T: [1×1 py.numpy.ndarray]
    base: [1×1 py.NoneType]
    ctypes: [1×1 py.numpy.core._internal._ctypes]
    data: [1×8 py.buffer]
    dtype: [1×1 py.numpy.dtype]
    flags: [1×1 py.numpy.flagsobj]
    flat: [1×1 py.numpy.flatiter]
    imag: [1×1 py.numpy.ndarray]
    itemsize: 8
    nbytes: 8
    ndim: 1
    real: [1×1 py.numpy.ndarray]
    shape: [1×1 py.tuple]
    size: 1
    strides: [1×1 py.tuple]
  [8.]
But I do not know how to actually get the data out of this object. The type is py.numpy.ndarray, but I obviously want to use it in MATLAB as an array, matrix, integer, or similar. How do I convert it to one of those types? I've been looking at these:
https://www.mathworks.com/help/matlab/examples/call-python-from-matlab.html
https://www.mathworks.com/matlabcentral/answers/216498-passing-numpy-ndarray-from-python-to-matlab
https://www.mathworks.com/help/matlab/matlab_external/use-matlab-handle-objects-in-python.html
Some of the answers suggest writing to a .mat file. I DO NOT want to write to a file. This needs to run in real time, and writing to a file would make it very slow for obvious reasons. It seems there is an answer here: "Converting" Numpy arrays to Matlab and vice versa, which shows
shape = cellfun(@int64, cell(myfunction_results.shape));
ls = py.array.array('d', myfunction_results.flatten('F').tolist());
p = double(ls);
But I must say that is very cumbersome... is there an easier way?
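One Python-side workaround (my suggestion, not an official MathWorks recipe): have the Python function return a plain list instead of an ndarray, since MATLAB converts the contents of a py.list with cell()/double() directly:

```python
import numpy as np

def my_function(x):
    # .tolist() converts the ndarray to a plain Python list, which
    # arrives in MATLAB as a py.list instead of a py.numpy.ndarray
    return np.array([x]).tolist()

my_function(8)  # [8]
```

On the MATLAB side, something like p = cellfun(@double, cell(myfunction_results)); should then give a numeric array without touching the filesystem.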

Writing to Text Files as a Struct vs. Cell Array
So I am trying to write data to a .txt file and I have encountered some difficulties. My understanding is that if I want to write struct data to a text file I need to first convert it to a table using struct2table and then use writetable to write it to the file. When I do this, however, the data is comma-delimited in the text file. I really like the table format as it appears in MATLAB, but I can't find a way to make it appear that way in the text file (I assume the data is comma-delimited so that Excel can read it).
Now, if I format the data as a cell array and then write that to the .txt file, it looks great; however, the struct format is nice in that I can access data points more easily and plot the data. So I am a little lost on which route to take. One idea I had was to keep the data as a struct array and convert it to a cell array whenever I want to write it to a .txt file. The other idea was to somehow manipulate the data format when I use writetable. I can, for instance, change the delimiter to a tab, and that looks great, except that the data does not line up (e.g. if I have Frequency and Power as column headers, the numbers below them in the table are not aligned with those headers). This, of course, would be trivial with fprintf, but I can't use that on a struct array (from what I understand).
I hope this is comprehensive enough, but if there is anything else I can provide please let me know, and thank you in advance.

Calling back up to a certain line in a certain file, for each button in a MATLAB GUI (GUIDE)
I recently discovered MATLAB's built-in GUIDE interface. I want to make an app to run some example model code that I have. I will list the steps in order below to avoid possible confusion!
- A button that, once pressed, runs up to line 50 in file A, which creates data.
- This new data is displayed in the table.
- A button that runs up to line 100 in file A, which plots two different graphs.
- A slider that changes a parameter of the plot function, which would then change the shape of the plot.
I'm just really new to using GUIDE, so I'm kind of lost on how to start; any help/pointers would be appreciated!
Also, is MATLAB's GUIDE a popular way of creating apps/interfaces for code, or is there a better alternative? I find the resources/help guides for it rather limited.

Implementing an ID3 algorithm - errors in 2 functions
I would like to make an ID3 algorithm but I am running into problems.
First of all, I would like to make 2 functions.
The first one is the following:
def func1_learn(X, y, impurity_measure="entropy"):
    impurity_measure = "entropy"
This function should learn a decision tree with entropy as the impurity measure, so if I call the func1_learn function like this:
func1_learn(X, y, impurity_measure = "entropy")
or like this:
func1_learn(X, y)
then in both cases the function should learn a decision tree with entropy as the impurity measure. But my code doesn't do that; I just get the wrong output.
My second function should predict class label of some new data point x.
def func2_predict(x, tree):
    predictions = classifier.predict(x)  # predict class label of some new data point x
My question is: how do I change the code so that the functions work as desired above? What mistake did I make?
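Since the posted fragments are incomplete, here is a minimal self-contained ID3 sketch with the two requested functions (my own illustration, not the asker's code; only entropy is implemented, and categorical features are assumed):

```python
import math
from collections import Counter

def entropy(labels):
    # H(y) = -sum p * log2(p) over class frequencies
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def func1_learn(X, y, impurity_measure="entropy"):
    # impurity_measure kept for API parity; only entropy is implemented here
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    # pick the feature with the highest information gain
    base = entropy(y)
    best_f, best_gain = None, 0.0
    for f in range(len(X[0])):
        gain = base
        for v in set(row[f] for row in X):
            sub = [y[i] for i, row in enumerate(X) if row[f] == v]
            gain -= len(sub) / len(y) * entropy(sub)
        if gain > best_gain:
            best_f, best_gain = f, gain
    if best_f is None:
        return majority  # no informative split left
    node = {"feature": best_f, "branches": {}, "default": majority}
    for v in set(row[best_f] for row in X):
        sub_X = [row for row in X if row[best_f] == v]
        sub_y = [y[i] for i, row in enumerate(X) if row[best_f] == v]
        node["branches"][v] = func1_learn(sub_X, sub_y, impurity_measure)
    return node

def func2_predict(x, tree):
    # walk down the tree until a leaf (a bare label) is reached
    while isinstance(tree, dict):
        tree = tree["branches"].get(x[tree["feature"]], tree["default"])
    return tree

X = [["sunny", "hot"], ["sunny", "cool"], ["rainy", "hot"], ["rainy", "cool"]]
y = ["no", "no", "yes", "yes"]
tree = func1_learn(X, y)
func2_predict(["rainy", "cool"], tree)  # "yes"
```

Note that func2_predict takes only the example and the tree, so no external classifier object is needed; the tree returned by func1_learn is the whole model.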

How to return transformed data from an ML.Net pipeline before a predictor is applied
Here is the creation of the ML.Net pipeline object copied from the TaxiFarePrediction example.
LearningPipeline pipeline = new LearningPipeline {
    new TextLoader(TrainDataPath).CreateFrom<TaxiTrip>(separator: ','),
    new ColumnCopier(("FareAmount", "Label")),
    new CategoricalOneHotVectorizer("VendorId", "RateCode", "PaymentType"),
    new ColumnConcatenator("Features", "VendorId", "RateCode", "PassengerCount", "TripDistance", "PaymentType"),
    new FastTreeRegressor()
};
Essentially, I'd like to return the data after the ColumnCopier, the CategoricalOneHotVectorizer and the ColumnConcatenator have been applied.
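For intuition, here is roughly what the CategoricalOneHotVectorizer and ColumnConcatenator stages compute, as a plain-Python sketch (the category values are hypothetical; this is not the ML.NET API):

```python
def one_hot(value, categories):
    # expand one categorical value into a 0/1 indicator vector
    return [1.0 if value == c else 0.0 for c in categories]

def concat_features(*columns):
    # glue scalar columns and per-column vectors into one feature vector
    out = []
    for col in columns:
        out.extend(col if isinstance(col, list) else [col])
    return out

# e.g. a vendor id one-hot encoded, then concatenated with numeric columns
row = concat_features(one_hot("A", ["A", "B"]), 2.0, 3.5)
# row == [1.0, 0.0, 2.0, 3.5]
```

Returning `row` at this point in the sketch corresponds to the data the question asks for: the feature vector after the transforms but before the FastTreeRegressor sees it.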

Implement the ID3 algorithm from scratch
I would like to make an ID3 algorithm but I am running into problems.
First of all, I would like to make 2 functions.
The first one is the following:

def func1_learn(X, y, impurity_measure="entropy"):
    impurity_measure = "entropy"
This function should learn a decision tree with entropy as the impurity measure, so if I call the func1_learn function like this:
func1_learn(X, y, impurity_measure = "entropy")
or like this:
func1_learn(X, y)
then in both cases the function should learn a decision tree with entropy as the impurity measure. But my code doesn't do that; I just get the wrong output.
My second function should predict class label of some new data point x.
def func2_predict(x, tree):
    # predict class label of some new data point x
My question is: how do I write the functions to do as described above?

replace manual threads and queue by monitoredsession
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
Can anyone tell me how exactly to replace the above two lines of code with tf.train.MonitoredSession, and what effects the change has?

How to merge arrays of different dimensions in Python?
I am analyzing some image datasets using Keras. I am stuck because I have two different image dimensions. Please see the snapshot: features has 14637 images of dimension (10,10,3), and features2 has dimension (10,10,100).
Is there any way I can merge/concatenate these two datasets?
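Assuming the two arrays agree on every axis except the last (channel) axis, np.concatenate along that axis merges them; a sketch with small stand-in shapes (4 images instead of 14637):

```python
import numpy as np

features = np.zeros((4, 10, 10, 3))     # stand-in for the (14637, 10, 10, 3) array
features2 = np.zeros((4, 10, 10, 100))  # stand-in for the (14637, 10, 10, 100) array

# concatenate along the channel axis; all other axes must match
merged = np.concatenate([features, features2], axis=-1)
merged.shape  # (4, 10, 10, 103)
```

If instead the goal is to treat them as extra samples of the same kind, that would require reshaping them to a common per-image shape first, since concatenation along axis 0 needs identical trailing dimensions.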

Multi-label classification in Keras
I tried to build a model that would help me identify images in a multi-label classification problem, for example pictures of cats, dogs and cows. I ran a CNN model but it didn't catch on at all (it gave a precision of 33%). Can anyone please share a model that works (even if the accuracy is just reasonable)? Thanks in advance to everyone! [My code is attached below.]
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.callbacks import LearningRateScheduler
from keras.optimizers import adam, SGD
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import VGG16

# 2 - Create network layers
image_width = 200
image_height = 200
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(3,3), activation='relu',
                 input_shape=(image_width, image_height, 3)))
model.add(BatchNormalization())
model.add(Conv2D(filters=16, kernel_size=(3,3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool2D(strides=(2,2)))
model.add(Dropout(0.25))

# Stage II - make it more complex with 'filters = 32'
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPool2D(strides=(2,2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))

# We'll randomize the training set (shuffle) and augment it to avoid overfitting
datagen = ImageDataGenerator(zoom_range=0.1, height_shift_range=0.1,
                             width_shift_range=0.1, rotation_range=10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# automatically retrieve images and their classes for train and validation
train_generator = datagen.flow_from_directory(train_dataset,
    target_size=(image_width, image_height), batch_size=32, class_mode='categorical')
validation_generator = datagen.flow_from_directory(validation_dataset,
    target_size=(image_width, image_height), batch_size=32, class_mode='categorical')

# Now let's fit the model, validating on the validation set
model.fit_generator(train_generator, steps_per_epoch=50, epochs=500,
    validation_data=validation_generator, validation_steps=15)
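A side note on terminology that may matter here: if each image contains exactly one of cat/dog/cow, this is multi-class classification and the softmax/categorical_crossentropy setup above is appropriate; if an image can contain several labels at once (true multi-label), the usual change is Dense(3, activation='sigmoid') with loss='binary_crossentropy' and multi-hot target vectors, sketched here without Keras:

```python
classes = ["cat", "dog", "cow"]

def multi_hot(present):
    # multi-label target: a 1 for every class present in the image
    return [1 if c in present else 0 for c in classes]

multi_hot({"cat", "cow"})  # [1, 0, 1]
```

With softmax the three outputs are forced to sum to 1, so an image of both a cat and a cow cannot be labelled as both; independent sigmoid outputs do not have that constraint.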

Error when checking input: expected flatten_input to have 3 dimensions, but got array with shape (None, 100, 100, 1)
Using TensorFlow/Keras, I want to classify pictures into two classes, selfie and non-selfie.
I have gathered samples into two filesystem folders, one for each category.
I implemented the training below by following the official tutorial for MNIST Fashion (which is also a picture-classification problem), after loading pictures from the filesystem as seen at https://stackoverflow.com/a/52417770/226958.
Unfortunately, I get an error:
1.10.1
Tensor("IteratorGetNext:0", shape=(?, 100, 100, 1), dtype=float32)
Tensor("IteratorGetNext:1", shape=(?,), dtype=int32)
Traceback (most recent call last):
  File "run.py", line 50, in <module>
    model.fit(images, labels, epochs=1, steps_per_epoch=60000)
  File "/home/nico/.local/lib/python2.7/site-packages/tensorflow/python/keras/engine/training.py", line 1278, in fit
    validation_split=validation_split)
  File "/home/nico/.local/lib/python2.7/site-packages/tensorflow/python/keras/engine/training.py", line 878, in _standardize_user_data
    exception_prefix='input')
  File "/home/nico/.local/lib/python2.7/site-packages/tensorflow/python/keras/engine/training_utils.py", line 182, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected flatten_input to have 3 dimensions, but got array with shape (None, 100, 100, 1)
Here is the source code:
import tensorflow as tf
print(tf.__version__)

out_shape = tf.convert_to_tensor([100, 100])
batch_size = 2
image_paths, labels = ["selfiesdata/1", "selfiesdata/2"], [1, 2]
epoch_size = len(image_paths)
image_paths = tf.convert_to_tensor(image_paths, dtype=tf.string)
labels = tf.convert_to_tensor(labels)

# The images loading part is from https://stackoverflow.com/a/52417770/226958
dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))
dataset = dataset.repeat().shuffle(epoch_size)

def map_fn(path, label):
    # path/label represent values for a single example
    image = tf.image.decode_jpeg(tf.read_file(path))
    # some mapping to constant size - be careful with distorting aspect ratios
    image = tf.image.resize_images(image, out_shape)
    image = tf.image.rgb_to_grayscale(image)
    # color normalization - just an example
    image = tf.to_float(image) * (2. / 255) - 1
    return image, label

# num_parallel_calls > 1 induces intra-batch shuffling
dataset = dataset.map(map_fn, num_parallel_calls=8)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(1)
images, labels = dataset.make_one_shot_iterator().get_next()

# All of the following is from https://www.tensorflow.org/tutorials/keras/basic_classification
from tensorflow import keras
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(100, 100)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
print(images)
print(labels)
model.fit(images, labels, epochs=epoch_size, steps_per_epoch=60000)
While there are similar questions which I have read, I don't see any dealing with this None in the shape. How can I adapt Keras to my input, or transform my input so that Keras accepts it?

resnet50 - increasing training speed? (Keras)
How do I increase the speed of this? I mean, the loss is moving down by hairs. HAIRS.
Epoch 1/30
4998/4998 [==============================] - 307s 62ms/step - loss: 0.6861 - acc: 0.6347
Epoch 2/30
4998/4998 [==============================] - 316s 63ms/step - loss: 0.6751 - acc: 0.6387
Epoch 3/30
4998/4998 [==============================] - 357s 71ms/step - loss: 0.6676 - acc: 0.6387
Epoch 4/30
4998/4998 [==============================] - 376s 75ms/step - loss: 0.6625 - acc: 0.6387
Epoch 5/30
4998/4998 [==============================] - 354s 71ms/step - loss: 0.6592 - acc: 0.6387
Epoch 6/30
4998/4998 [==============================] - 345s 69ms/step - loss: 0.6571 - acc: 0.6387
Epoch 7/30
4998/4998 [==============================] - 349s 70ms/step - loss: 0.6559 - acc: 0.6387
Model architecture: resnet50 (CNN with skip connections), except instead of 1 FC layer I have two, and I changed the softmax output to sigmoid for binary classification.
Number of positive training examples: 1806; number of negative training examples: 3192.
My output is represented by a 1 or 0 for each example ([0, 0, 1, 1, ...]).
Batch size = 40, num epochs = 30, but that doesn't matter because the loss has stopped moving.
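One observation from the numbers above: the accuracy plateau of 0.6387 is exactly the negative-class fraction of the 4998 training examples, which usually means the network has collapsed to predicting the majority class for everything:

```python
pos, neg = 1806, 3192

# fraction of examples a constant "always negative" predictor gets right
majority_fraction = neg / (pos + neg)
round(majority_fraction, 4)  # 0.6387, the plateaued acc in the log
```

If that is what is happening, the problem is less training speed than the model not learning at all; class weighting or a lower learning rate would be the usual things to try first.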

Unstable loss and accuracy?
My setup in Keras, with all the same parameters and the resnet50 architecture, softmax activation exchanged with sigmoid for binary classification:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_x, train_y, epochs=40, batch_size=32)
My results:
Epoch 1/40
4998/4998 [==============================] - 328s 66ms/step - loss: 0.9148 - acc: 0.6246
Epoch 2/40
4998/4998 [==============================] - 360s 72ms/step - loss: 0.7279 - acc: 0.6369
Epoch 3/40
4998/4998 [==============================] - 368s 74ms/step - loss: 0.6633 - acc: 0.6705
Epoch 4/40
4998/4998 [==============================] - 349s 70ms/step - loss: 0.5865 - acc: 0.7495
Epoch 5/40
4998/4998 [==============================] - 387s 77ms/step - loss: 0.5267 - acc: 0.7801
Epoch 6/40
4998/4998 [==============================] - 3124s 625ms/step - loss: 0.5106 - acc: 0.8015
Epoch 7/40
4998/4998 [==============================] - 312s 62ms/step - loss: 0.4953 - acc: 0.8055
Epoch 8/40
4998/4998 [==============================] - 355s 71ms/step - loss: 0.7689 - acc: 0.6705
Epoch 9/40
4998/4998 [==============================] - 349s 70ms/step - loss: 0.6370 - acc: 0.7063
Epoch 10/40
4998/4998 [==============================] - 353s 71ms/step - loss: 0.5351 - acc: 0.7699
Epoch 11/40
4998/4998 [==============================] - 363s 73ms/step - loss: 0.5274 - acc: 0.7991
Epoch 12/40
4998/4998 [==============================] - 361s 72ms/step - loss: 0.5707 - acc: 0.7701
Epoch 13/40
4998/4998 [==============================] - 348s 70ms/step - loss: 0.6168 - acc: 0.7193
Epoch 14/40
4998/4998 [==============================] - 357s 71ms/step - loss: 0.7444 - acc: 0.6459
Epoch 15/40
4998/4998 [==============================] - 356s 71ms/step - loss: 0.6682 - acc: 0.6847
Epoch 16/40
4998/4998 [==============================] - 356s 71ms/step - loss: 0.6614 - acc: 0.6653
Epoch 17/40
4998/4998 [==============================] - 343s 69ms/step - loss: 0.6009 - acc: 0.7153
Epoch 18/40
4998/4998 [==============================] - 346s 69ms/step - loss: 0.6060 - acc: 0.7277
Epoch 19/40
4998/4998 [==============================] - 340s 68ms/step - loss: 0.7800 - acc: 0.6325
Epoch 20/40
4998/4998 [==============================] - 343s 69ms/step - loss: 0.7405 - acc: 0.6659
Epoch 21/40
4998/4998 [==============================] - 341s 68ms/step - loss: 0.8655 - acc: 0.6557
Epoch 22/40
4998/4998 [==============================] - 334s 67ms/step - loss: 0.7600 - acc: 0.6641
Epoch 23/40
Why the heck are the loss and accuracy wobbling back and forth like that?
Number of positive training examples: 1806; number of negative training examples: 3192.
My output is represented by a 1 or 0 for each example ([0, 0, 1, 1, ...]).
Update: I got stability by adding another dense layer, but now it trains too slowly. I read somewhere that instability can mean not enough data.

Select attributes in Weka ends up 0 for all attributes
I have a data set with 46 attributes and 74170 instances in Weka. I suppose I need to select the best attributes for better accuracy. First I tried to select attributes using a filter with CfsSubsetEval and BestFirst for the search, so Weka decreased my attributes from 47 to 17. Then I used the Select Attributes tab to test my data's accuracy and error rate: I used WrapperSubsetEval as the evaluator, AdaBoost as the classifier with naive Bayes as the base learner, BestFirst for the search, and 10-fold cross-validation for training. But at the end all the attributes get '0'. I am not sure what is wrong.

How to use ElasticNet with AdaBoostClassifier?
Is it possible to use ElasticNet with AdaBoostClassifier in sklearn?
from sklearn import ensemble
from sklearn.linear_model import ElasticNet

enet = ElasticNet(alpha=0.0001, l1_ratio=0.5)
clf5 = ensemble.AdaBoostClassifier(enet, n_estimators=10)
clf5 = clf5.fit(X_train, Y_train)
This results in
TypeError: AdaBoostClassifier with algorithm='SAMME.R' requires that the weak learner supports the calculation of class probabilities with a predict_proba method. Please change the base estimator or set algorithm='SAMME' instead.
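The message itself points at the cause: the default SAMME.R algorithm needs a base estimator with predict_proba, and ElasticNet is a regressor with no class probabilities at all. One possible workaround (my suggestion, not documented sklearn guidance for this exact case) is an elastic-net-penalized classifier such as LogisticRegression(penalty='elasticnet'), which does expose predict_proba:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier

# an elastic-net-regularized *classifier* (has predict_proba), unlike ElasticNet
enet_clf = LogisticRegression(penalty='elasticnet', solver='saga',
                              l1_ratio=0.5, C=1e4, max_iter=5000)
clf = AdaBoostClassifier(enet_clf, n_estimators=10)

# tiny separable toy data, stand-ins for X_train / Y_train
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]
clf.fit(X, y)
preds = clf.predict([[0.5], [11.5]])
```

The alternative named in the error, algorithm='SAMME', only relaxes the requirement to a plain predict method; it still requires a classifier as the base estimator, so ElasticNet itself will not work either way.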

opencv_traincascade stopped at stage 1
I am using my own data to train a classifier with opencv_traincascade. The numbers of positive and negative samples are big enough, and the ratio is about 1:3. However, the training only completes stage 0. Could anyone help me?
The training process: