How do I set the right max_iter value in sklearn LinearSVC to avoid a ConvergenceWarning?
I referred to this discussion on Stack Overflow. As mentioned in that discussion, I normalised my data and set dual=False in LinearSVC. Initially I left max_iter at its default (1000) and received the ConvergenceWarning; I then raised max_iter to 70000 and still received the warning.
How do I come up with the right max_iter value? My dataset has 6000 samples, and the number of features varies from 1000 to 6000 (I used an RBF kernel approximation).
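One hedged way to pick max_iter, rather than guessing a single value, is to scale the data and then grow the cap geometrically while checking the fitted model's n_iter_ attribute: once n_iter_ comes in below the cap, the solver actually converged. A minimal sketch on synthetic stand-in data (the sizes here are not the asker's dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Stand-in data; the question's 6000 x 1000-6000 dataset is assumed,
# a smaller synthetic set is used here.
X, y = make_classification(n_samples=600, n_features=100, random_state=0)
X = StandardScaler().fit_transform(X)  # normalise, as the linked answer suggests

# Instead of guessing one max_iter, grow it geometrically and inspect the
# fitted model's n_iter_: once n_iter_ is below the cap, the solver converged.
for max_iter in (1000, 10_000, 100_000):
    clf = LinearSVC(dual=False, max_iter=max_iter).fit(X, y)
    if clf.n_iter_ < max_iter:  # converged before hitting the cap
        break

print(max_iter, clf.n_iter_)
```

If even a very large cap still warns, loosening tol slightly is usually a better lever than raising max_iter further.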
See also questions close to this topic

How to set the entries for KMeans?
In my work, I'm trying to identify different kinds of user behaviour based on how a person uses their mobile applications. My dataset is a .csv log file containing one month of application usage from many Android users. For each user, I collected daily interactions describing how they use mobile apps, such as: name of the app used, app category, time of use, frequency of use, and duration of use. Since our aim is to discover the different types of behaviour ("clusters of behaviour"), I would like to know how to set the entries (input) for ML algorithms like KMeans. Should I run KMeans once over all users, or apply KMeans separately for each user?
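One common setup for this kind of question (a sketch, not the only valid one) is one row per user, with each row aggregating that user's log entries into numeric features; KMeans is then run once across all users, so each cluster becomes a group of users with similar behaviour. The feature columns below are made-up stand-ins for the CSV columns described:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-user features aggregated from the log; the columns stand
# in for Name of App / Category / Time / Frequency / Duration of use.
rng = np.random.default_rng(42)
n_users = 50
features = np.column_stack([
    rng.poisson(5, n_users).astype(float),  # distinct apps used per day
    rng.exponential(30, n_users),           # mean session duration (minutes)
    rng.uniform(0, 24, n_users),            # typical hour of use
])

# One row per user, KMeans run once across all users: each cluster is then
# a group of users with similar behaviour (not one model per user).
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

Running KMeans per user would instead cluster that one user's days or sessions, which answers a different question (modes within one user's behaviour).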

Signal processing to predict if the equipment requires maintenance or not
We consider the scenario of predictive maintenance. It consists of continuous data collection and analysis for, say, a piece of equipment installed in a manufacturing plant. Data collection may involve capturing signals (e.g. audio, vibration patterns, etc.) from the equipment under study. Analysis refers to making informed predictions about possible equipment failure, and is typically based on signal processing and data analytics. Such a proactive approach to maintenance, as opposed to reactive maintenance, is expected to play a key role in the Fourth Industrial Revolution (Industry 4.0). Suppose it is known from prior experience that all frequency components in the signal captured from a properly functioning equipment should necessarily be ≤ 25 Hz. Using this information, your task is to analyze the provided signal x[n] and predict whether the equipment requires maintenance or not. Note that x[n] was obtained by sampling a CT signal x(t) at 100 Hz.
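The check described above can be sketched with an FFT: look at the magnitude spectrum of x[n] and flag the equipment if any significant component lies above 25 Hz. The signal below is a made-up stand-in for the provided x[n], and the 10% relative significance threshold is an assumption, not part of the problem statement:

```python
import numpy as np

fs = 100.0                       # x[n] was sampled at 100 Hz (given)
t = np.arange(0, 2, 1 / fs)
# Made-up stand-in for the provided x[n]: a healthy 10 Hz component
# plus a 35 Hz tone that should not be there.
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 35 * t)

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)

# Flag maintenance if any significant energy sits above 25 Hz; the 10%
# relative threshold is an assumption.
threshold = 0.1 * spectrum.max()
needs_maintenance = bool(np.any(spectrum[freqs > 25.0] > threshold))
print(needs_maintenance)  # True for this stand-in signal
```

Since fs = 100 Hz, the usable spectrum runs up to the 50 Hz Nyquist limit, so a genuine 25-50 Hz component is directly visible rather than aliased.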

AttributeError: 'list' object has no attribute 'values' using MLPClassifier
I am having some problems with an MLPClassifier giving me the error "AttributeError: 'list' object has no attribute 'values'". The input to the .fit method is a DataFrame and a Series derived from a DataFrame. The types can be found below.

self.inputSpace = self.inputSpace.sample(frac=1).reset_index()
self.inputSpace = self.inputSpace.drop('index', axis=1)
X = self.inputSpace.drop('label', axis=1)
y = self.inputSpace['label']
print(X)
print(type(X))
print(y)
print(type(y))
self.model = MLPClassifier(random_state=42, max_iter=1000,
                           hidden_layer_sizes=(self.numOfInputs, 100, 100)).fit(X, y)
Output:
   0  1  2
0  0  1  1
1  1  0  0
2  0  1  0
3  1  1  0
4  1  0  1
5  1  1  1
6  0  0  1
7  0  0  0
<class 'pandas.core.frame.DataFrame'>
0    0
1    0
2    1
3    0
4    0
5    1
6    0
7    0
Name: label, dtype: int64
<class 'pandas.core.series.Series'>
The same error happens if I do y = pd.DataFrame(y), so that type(y) == <class 'pandas.core.frame.DataFrame'>.
I have used this module many times in what seems to be the same exact way and I have never seen this problem. Any help would be great. Thank you!

XGB model (or any other ML model) objective function vs scoring metrics
I was trying to set the random state for XGB using a numpy RandomState generator for hyperparameter tuning, such that each instance would give a different column subsampling and so on.
However, unlike normal sklearn regressors such as random forest, it seems that I cannot set the random_state parameter like this:
regr = XGBRegressor(random_state=np.random.RandomState(42))
regr.fit(x_train, y_train)
pred_y_test = regr.predict(x_test)
The following error occurs:
xgboost.core.XGBoostError: Invalid Parameter format for seed expect int but value='RandomState(MT19937)'
Do I have to set it as an integer only? What if I want the seed number to change after every hyperparameter trial? Is there an alternative random seed generator that I can use or should I just leave that parameter as None?
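One workaround (a sketch, assuming the goal is reproducible but distinct randomness per trial): keep a single RandomState and draw a fresh plain-integer seed from it for each hyperparameter trial, since XGBoost's seed/random_state parameter accepts ints:

```python
import numpy as np

# One master RandomState; each hyperparameter trial draws its own integer
# seed from it, since XGBoost's seed/random_state only accepts plain ints.
master_rng = np.random.RandomState(42)

trial_seeds = [int(master_rng.randint(0, np.iinfo(np.int32).max))
               for _ in range(5)]
print(trial_seeds)

# Each trial would then be configured with, e.g.,
# XGBRegressor(random_state=trial_seeds[i]): different subsampling per
# trial, yet the whole search stays reproducible from the single seed 42.
```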

MultiLabelBinarizer with duplicated values
I have an expected array

[1, 1, 3]

and a predicted array

[1, 2, 2, 4]

for which I want to calculate precision_recall_fscore_support, so I need a matrix in the following format:

>>> mlb = MultiLabelBinarizerWithDuplicates()
>>> transformed = mlb.fit_transform([(1, 1, 3), (1, 2, 2, 4)])
array([[1, 1, 0, 0, 1, 0],
       [1, 0, 1, 1, 0, 1]])
>>> mlb.classes_
[1, 1, 2, 2, 3, 4]
For the duplicated values I don't care which one of them is turned on, meaning that this is also a valid result:
array([[1, 1, 0, 0, 1, 0],
       [0, 1, 1, 1, 0, 1]])
MultiLabelBinarizer's documentation clearly says "All entries should be unique (cannot contain duplicate classes)", so it doesn't support this use case.
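One possible workaround (a sketch, not part of sklearn; the MultiLabelBinarizerWithDuplicates class named above is hypothetical) is to expand each repeated label into numbered copies so duplicates become distinct classes, then binarize by hand:

```python
from collections import Counter

def binarize_with_duplicates(rows):
    """Sketch of a duplicate-aware binarizer (not sklearn API): each repeated
    label becomes numbered copies, e.g. two 1s -> classes (1, 0) and (1, 1)."""
    counts = Counter()
    for row in rows:
        for label, n in Counter(row).items():
            counts[label] = max(counts[label], n)
    classes = [(label, i) for label in sorted(counts) for i in range(counts[label])]
    matrix = []
    for row in rows:
        c = Counter(row)
        # turn on the first c[label] copies of each label; the question allows
        # any of the duplicate slots to be the one switched on
        matrix.append([1 if i < c.get(label, 0) else 0 for label, i in classes])
    return matrix, classes

matrix, classes = binarize_with_duplicates([(1, 1, 3), (1, 2, 2, 4)])
print(matrix)   # [[1, 1, 0, 0, 1, 0], [1, 0, 1, 1, 0, 1]]
print(classes)  # [(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (4, 0)]
```

This reproduces the first of the two valid outputs shown above; since any duplicate slot may be used, always filling the earliest slots is an arbitrary but consistent choice.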

Model evaluation: inverse scaling changes ratios of results
I have a problem with my model evaluation:
My procedure is as follows:
1) Read in data and scale

x_train, y_train, _ = load_data(...)
x_test, y_test, _ = load_data(...)

x_train_scaler = StandardScaler()
y_train_scaler = StandardScaler()

x_train_scaler.fit(x_train)
x_train_scaled = x_train_scaler.transform(x_train)
x_test_scaled = x_train_scaler.transform(x_test)

y_train_scaler.fit(y_train)
y_train_scaled = y_train_scaler.transform(y_train)
y_test_scaled = y_train_scaler.transform(y_test)
2) Train a Keras autoencoder and let it predict (X=x_train, y=x_train!).

easy_model = Sequential()
easy_model.add(Dense(x_train.shape[1], activation='tanh'))
easy_model.add(Dense(10, activation='tanh'))
easy_model.add(Dense(x_train.shape[1], activation='linear'))
easy_model.compile(optimizer='Adam',
                   loss='mean_squared_error',
                   metrics=['mean_squared_error'])
easy_model.fit(x_train_scaled, x_train_scaled, epochs=250, batch_size=128)
autoencoder_preds_train = easy_model.predict(x_train_scaled)
3) Measure the MSE column-wise (for each feature) between the scaled data and the model's predictions. I sort the error values (as many as there are features) by size, which tells me which features were reconstructed particularly well or particularly poorly.

MSE = np.square(x_train_scaled - autoencoder_preds_train).mean(axis=0)
error_df = pd.DataFrame({'error': MSE})
error_df.index.name = 'values'
sorted_error_df = error_df.sort_values(by=['error'])
sorted_error_df
Problem: if I perform step 3) but first inverse-scale both the scaled data and the predictions (the inverse-scaled data exactly match their starting point), then the ratios change. I.e. I do NOT get the same ranking of the features sorted by their error values as before.

x_train_inverse = x_train_scaler.inverse_transform(x_train_scaled)
autoencoder_preds_train_inverse = x_train_scaler.inverse_transform(autoencoder_preds_train)
MSE = np.square(x_train_inverse - autoencoder_preds_train_inverse).mean(axis=0)
error_df = pd.DataFrame({'error': MSE})
error_df.index.name = 'values'
sorted_error_df = error_df.sort_values(by=['error'])
sorted_error_df
I thought it was a rounding problem, since I am calculating returns. After all, the Keras model's forecasts only have 7 decimal places, while my data has significantly more. I therefore rounded everything to 7 decimal places and still have the same problem.
What is going wrong here?
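The behaviour described above follows from the scaling algebra rather than from rounding: inverse-transforming multiplies each feature's reconstruction error by that feature's standard deviation, so the per-feature MSE on the original scale equals the scaled MSE times sigma squared, which can reorder the ranking. A small numpy sketch with StandardScaler reproduced by hand and made-up data:

```python
import numpy as np

# Made-up data with very different per-feature standard deviations.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 0.1])
mu, sigma = x.mean(axis=0), x.std(axis=0)
x_scaled = (x - mu) / sigma            # what StandardScaler.transform does

# Fake "autoencoder predictions": scaled data plus per-feature noise.
preds_scaled = x_scaled + rng.normal(scale=[0.3, 0.1, 0.2], size=x.shape)

mse_scaled = np.square(x_scaled - preds_scaled).mean(axis=0)

# Inverse-transform both sides, exactly as in the question.
x_back = x_scaled * sigma + mu
preds_back = preds_scaled * sigma + mu
mse_orig = np.square(x_back - preds_back).mean(axis=0)

# x_back - preds_back = sigma * (x_scaled - preds_scaled), so
# mse_orig = sigma**2 * mse_scaled: each feature's error is re-weighted
# by its own variance, which is what reorders the ranking.
print(np.argsort(mse_scaled), np.argsort(mse_orig))
```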

When calling .predict() I am getting ValueError: could not convert string to float
I am still relatively new to the world of data science and I cannot figure out why I am getting the following error:
Traceback (most recent call last):
  File "main.py", line 73, in <module>
    print(classifyAnswer(getAnswer(f_content)))
  File "main.py", line 67, in classifyAnswer
    SVM_predict = loaded_model.predict([array_answer])
  File "C:\Users\A144995\lib\site-packages\sklearn\svm\base.py", line 574, in predict
    y = super().predict(X)
  File "C:\Users\A144995\lib\site-packages\sklearn\svm\base.py", line 322, in predict
    X = self._validate_for_predict(X)
  File "C:\Users\A144995\lib\site-packages\sklearn\svm\base.py", line 454, in _validate_for_predict
    accept_large_sparse=False)
  File "C:\Users\A144995\lib\site-packages\sklearn\utils\validation.py", line 496, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "C:\Users\A144995\lib\site-packages\numpy\core\numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'No'
The model being loaded was fitted and saved without any issues, but when I go to classify a sample text (read from a text file into a string) I get the error. The code I am executing is below:

f = open("samplechat.txt", 'r')
f_content = f.read()
f.close()

def questionFound(txt):
    x = re.search("(d|D)oes anyone in the household have (l|L)ife (s|S)upport or (m|M)edical (d|D)evices", str(txt))
    return x

def findAgentName(txt):
    return 0

def getAnswer(txt):
    x = re.search("((d|D)oes anyone in the household have (l|L)ife (s|S)upport or (m|M)edical (d|D)evices (that|which) depend (upon|on) supply\?)( *\[Read])( [\w+/g(,\s)]+)([\d]+\s[\w]+\W\s[\d]+ [,]) ([\d]+:[\d]+[AMPMampm]+)\s([\w]+)", str(txt))
    return x.group(13)

def classifyAnswer(answer):
    array_answer = [answer]
    classifier = 'webchatls.sav'
    loaded_model = pickle.load(open(classifier, 'rb'))
    SVM_predict = loaded_model.predict([array_answer])
    result = SVM_predict[0]
    return result

print(classifyAnswer(getAnswer(f_content)))
I'm still new to this so any guidance in debugging this would be great!

My BIOS does not have virtualization mode?
I'm trying to install Docker on my computer (Win 10 Pro, ASUS). After installing, I get the error "cannot enable Hyper-V service". It is recommended to turn on SVM mode in the BIOS. However, I cannot find an SVM mode entry under Advanced/CPU Configuration (image of my BIOS). I wonder whether "Intel Virtualization Technology" serves the same purpose, or whether my BIOS is outdated. Please advise.

Sensitivity and specificity reported by coords() and confusionMatrix() in R at optimal cutpoint don't match
I trained an SVM with linear kernel in R to classify patients with a disease, used the predict() function to generate predicted probabilities on a testing set using the SVM model, then generated an ROC curve using the roc() function from the pROC library. I also used coords() to calculate the optimal cutpoint using Youden's index. coords() returned a cutpoint of 0.8489392, specificity of 0.6250000, and sensitivity of 0.7954545.
When I attempt to generate a confusion matrix using predictions made at this cutpoint, I get a sensitivity of 0.20455 and specificity of 0.37500 and cannot figure out why they don't match the sensitivity and specificity reported by coords().
This is the only model of several models where the sensitivity and specificity reported by both functions do not match.
Code below:
svm_linear <- train(ercp_chole ~ stone_any_modality + age + peak_pre_bili + max_cbd_dia_any,
                    data = chole_training,
                    method = "svmLinear",
                    trControl = trainControl(method = "repeatedcv", number = 10, repeats = 3,
                                             classProbs = TRUE, summaryFunction = twoClassSummary),
                    na.action = na.exclude,
                    preProcess = c("center", "scale"),
                    metric = "ROC",
                    tuneLength = 10)

pprob_svm_linear <- predict(svm_linear, chole_testing, type = "prob")
svm_linear_roc <- roc(chole_testing$ercp_chole, pprob_svm_linear[, 2], auc = TRUE)
coords(svm_linear_roc, "best", "threshold", transpose = TRUE, best.method = "youden")
confusionMatrix(factor(ifelse(pprob_svm_linear[, "chole_pos"] > 0.8489392, "chole_pos", "chole_neg")),
                chole_testing$ercp_chole,
                positive = "chole_pos")
Results of the call to roc():
Setting levels: control = chole_neg, case = chole_pos
Setting direction: controls > cases
Results of the call to coords():
threshold specificity sensitivity
0.8489392   0.6250000   0.7954545
Results of the call to confusionMatrix():
Confusion Matrix and Statistics

          Reference
Prediction chole_neg chole_pos
 chole_neg         3        35
 chole_pos         5         9

               Accuracy : 0.2308
                 95% CI : (0.1253, 0.3684)
    No Information Rate : 0.8462
    P-Value [Acc > NIR] : 1

                  Kappa : -0.1659
 Mcnemar's Test P-Value : 4.533e-06

            Sensitivity : 0.20455
            Specificity : 0.37500
         Pos Pred Value : 0.64286
         Neg Pred Value : 0.07895
             Prevalence : 0.84615
         Detection Rate : 0.17308
   Detection Prevalence : 0.26923
      Balanced Accuracy : 0.28977

       'Positive' Class : chole_pos
Any help would be appreciated!
Thanks in advance.

Can I get an example of a code using LIBLINEAR and LIBSVM for a simple classification problem?
I am trying to use the LIBLINEAR package for Python 3. Could I get example code that includes preprocessing the data by subtracting the mean and scaling it to the range [-1, 1]?
Thank you.
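A hedged sketch of what is being asked for, using scikit-learn's liblinear-backed LinearSVC rather than the raw liblinear/LIBSVM bindings (the data below is synthetic, and centering before min-max scaling is one reading of "subtract the mean and scale"):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data for a simple binary classification problem.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

X = X - X.mean(axis=0)                                    # subtract the mean
X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)  # scale to [-1, 1]

clf = LinearSVC(dual=False)  # liblinear-backed linear SVM
clf.fit(X, y)
print(round(clf.score(X, y), 2))
```

The same two preprocessing steps apply unchanged if the model is swapped for the raw liblinear/LIBSVM Python bindings, which accept the scaled arrays (or their sparse equivalents) directly.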

How to find the support vectors for SVM?
I'm using the liblinear library to train a linear SVM on my data. I have access to the weights for each class of the trained model. But I need to figure out which training instances are acting as support vectors.
The liblinear library doesn't seem to provide these vectors as a model attribute. And I can't seem to figure out how I can find them manually. If I have the training data and I have the weights that define the hyperplane, how would I go about finding these support vectors?
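For a binary soft-margin linear SVM, the support vectors are the training points with y_i(w·x_i + b) ≤ 1, i.e. on the margin or inside/violating it, so they can be recovered from the learned hyperplane even though liblinear does not store them. A sketch (the toy data and weights below are made up, standing in for liblinear's output; the multiclass case would repeat this per one-vs-rest hyperplane):

```python
import numpy as np

def find_support_vectors(X, y, w, b, tol=1e-3):
    # For a soft-margin linear SVM, support vectors are the training points
    # with y_i * (w . x_i + b) <= 1: on the margin or inside/violating it.
    # liblinear doesn't store them, but the margins recover them.
    margins = y * (X @ w + b)
    return np.where(margins <= 1 + tol)[0]

# Toy data and made-up hyperplane weights (stand-ins for liblinear's output):
# labels in {-1, +1}, w and b taken from the trained model.
X = np.array([[2.0, 0], [3, 0], [-2, 0], [-3, 0], [1, 0], [-1, 0]])
y = np.array([1, 1, -1, -1, 1, -1])
w, b = np.array([1.0, 0.0]), 0.0

print(find_support_vectors(X, y, w, b))  # -> [4 5], the two margin points
```

A small tolerance is needed because liblinear's solution is approximate, so margin points land near 1 rather than exactly on it.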