Implementation of Isolation Forest in Python
I am new to machine learning and am trying to learn and implement the Isolation Forest algorithm in Python.
My input contains 40 features; the training set contains 4000 records and the test set contains 1000 records. Can someone help with sample code and show how to plot the output?
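As a starting point, here is a minimal sketch using scikit-learn's IsolationForest. The random arrays are only placeholders shaped like the data described in the question (4000 train / 1000 test records, 40 features); swap in the real arrays.

```python
# Minimal Isolation Forest sketch with scikit-learn.
# The random arrays are stand-ins for the real 40-feature dataset.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
X_train = rng.randn(4000, 40)  # replace with the real training set
X_test = rng.randn(1000, 40)   # replace with the real test set

clf = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
clf.fit(X_train)

pred = clf.predict(X_test)              # 1 = inlier, -1 = anomaly
scores = clf.decision_function(X_test)  # lower score = more anomalous
print("anomalies flagged:", int((pred == -1).sum()))

# 40-dimensional data cannot be plotted directly; project to 2-D first.
X2 = PCA(n_components=2).fit_transform(X_test)
```

For the plot, the 2-D projection `X2` can then be scattered with matplotlib, colouring points by the predicted label, e.g. `plt.scatter(X2[:, 0], X2[:, 1], c=pred)`, so flagged anomalies stand out from inliers.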
See also questions close to this topic
How to train a machine learning algorithm to find this pattern: x1 < x2 without generating a new feature (e.g. x1-x2) first?
If I had 2 features x1 and x2 where I know that the pattern is:
if x1 < x2 then class1 else class2
How can I train a machine learning algorithm to find such a pattern?
I know that I could create a third feature x3 = x1 - x2, which some machine learning algorithms can then use easily. For example, a decision tree can solve the problem 100% using x3 and just 3 nodes (1 decision node and 2 leaf nodes).
But, is it possible to solve this without creating new features?
I tried an MLP and an SVM with different kernels, including the RBF kernel, and the results are not great. This seems like a problem that should be easy to solve 100% if only a machine learning algorithm could find such a pattern.
As an example of what I tried, here is the scikit-learn code, where the SVM could only reach a score of 0.992:
import numpy as np
from sklearn.svm import SVC

# Generate 1000 samples with 2 features with random values
X_train = np.random.rand(1000, 2)
# Label each sample. If feature "x1" is less than feature "x2" then label as 1, otherwise label is 0.
y_train = X_train[:, 0] < X_train[:, 1]
y_train = y_train.astype(int)  # convert boolean to 0 and 1

svc = SVC(kernel="rbf", C=0.9)  # tried all kernels and C values from 0.1 to 1.0
svc.fit(X_train, y_train)
print("SVC score: %f" % svc.score(X_train, y_train))
Output running the code:
SVC score: 0.992000
This is an oversimplification of my problem. The real problem may have hundreds of features and different patterns, not just x1 < x2. However, to start with it would help a lot to know how to solve for this simple pattern.
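One observation that may help with the simple case: the boundary x1 < x2 is already linear in the original two features (it is the line x1 - x2 = 0), so a linear model can represent it without any engineered feature. An RBF kernel has to approximate that straight line, while a linear kernel can hit it almost exactly. A minimal sketch of this check, under the same data-generation setup as the question's code:

```python
# The decision boundary x1 - x2 = 0 is a straight line through the
# origin, so a linear kernel can represent it directly without a
# hand-made difference feature.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X_train = rng.rand(1000, 2)
y_train = (X_train[:, 0] < X_train[:, 1]).astype(int)

svc = SVC(kernel="linear", C=1000.0)  # large C approximates a hard margin
svc.fit(X_train, y_train)
print("linear SVC score: %f" % svc.score(X_train, y_train))
print("learned weights:", svc.coef_[0])  # roughly proportional to (-1, 1)
```

The learned weight vector points along (-1, 1), i.e. the model recovers x2 - x1 internally. More generally, any linear classifier (logistic regression, a linear SVM) can pick up such pairwise-difference patterns, as long as the pattern stays linear in the raw features.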
Best Machine Learning Algorithm to Rank Top 10 Items Based on Different Attributes?
I am working on a project where I want to retrieve the top n selections from a dataset based on different attributes. Let's say I am looking for the best store to buy a product. The algorithm will take in location, prices, closing/opening times, return policy, and whether the store has the product, and return the top n (say 10) stores from the dataset.
I want to know what the best machine learning algorithm is for this scenario.
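Before reaching for a learned ranker, a hand-weighted score over the attributes is a common baseline for top-n selection. The field names, weights, and sample stores below are purely illustrative, not part of the question's data:

```python
# Hedged baseline sketch: rank stores by a hand-weighted score and take
# the top n. All weights and field names here are made up for the example.
import heapq

stores = [
    {"name": "A", "distance_km": 2.0, "price": 19.99, "open_now": True,  "return_days": 30, "in_stock": True},
    {"name": "B", "distance_km": 0.5, "price": 24.99, "open_now": True,  "return_days": 14, "in_stock": True},
    {"name": "C", "distance_km": 5.0, "price": 17.50, "open_now": False, "return_days": 60, "in_stock": False},
]

def score(s):
    if not s["in_stock"]:
        return float("-inf")            # hard filter: must have the product
    return (-1.0 * s["distance_km"]     # closer is better
            - 0.5 * s["price"]          # cheaper is better
            + 2.0 * s["open_now"]       # open right now is a bonus
            + 0.05 * s["return_days"])  # generous return policy helps

top = heapq.nlargest(2, stores, key=score)  # top n; n = 10 in the question
print([s["name"] for s in top])  # -> ['A', 'B']
```

If labelled preference data is available (e.g. which stores users actually chose), the standard upgrade is learning-to-rank, where a model such as gradient-boosted trees learns the weights instead of them being hand-tuned.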
How to consider word pairs/phrases for Word2Vec and other pre-processing
So it's my first time using Word2Vec, and I'm using a Wikipedia dump with WikiCorpus to pre-process the file before training my Word2Vec model. I want to use the following pre-processing techniques:
- Convert all letters to lowercase (I think WikiCorpus does this already).
- Remove all punctuation (Done by WikiCorpus).
- Consider word pairs/phrases as a single word, for example 'Big Apple' -> 'big_apple', not 'big', 'apple'.
- Convert all digits to their word forms, so '3' -> 'three'
At the moment I have no idea how to do the last two. I know about num2words, but I'm not sure how to incorporate it with WikiCorpus or Word2Vec. Can anyone help?
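A dependency-free sketch of where the last two steps slot into the token pipeline, before the tokens are fed to Word2Vec. In practice gensim's Phrases model learns the word pairs from corpus co-occurrence statistics and the num2words package handles arbitrary numbers; the hard-coded phrase set and single-digit mapping below are illustrative stand-ins:

```python
# Illustrative stand-ins: a real pipeline would learn PHRASES with
# gensim's Phrases model and use num2words for full number-to-word
# conversion; this sketch just shows the order of operations.
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

PHRASES = {("big", "apple"), ("new", "york")}  # would be learned, not hard-coded

def preprocess(tokens):
    # 1) convert digits to word form: '3' -> 'three'
    tokens = [DIGIT_WORDS.get(t, t) for t in tokens]
    # 2) merge known word pairs into a single token: 'big apple' -> 'big_apple'
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in PHRASES:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(preprocess(["i", "ate", "3", "bagels", "in", "the", "big", "apple"]))
# -> ['i', 'ate', 'three', 'bagels', 'in', 'the', 'big_apple']
```

With gensim, the equivalent of step 2 is to fit `Phrases` on the tokenized corpus and map each sentence through it before training, so Word2Vec sees `big_apple` as one vocabulary item.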
What is X_train and Y_train while fitting a model?
What do X_train and Y_train contain? What are their datatypes? I'm a newbie; I have tried searching for this question and found this, but it doesn't seem to answer my question completely.
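In the scikit-learn convention, X_train is a 2-D array of feature values (one row per sample, one column per feature) and y_train is the matching 1-D array of target labels, and both are typically NumPy arrays. A tiny illustration with made-up data:

```python
# Toy dataset: 6 samples, 2 features each, binary labels, to show what
# typically goes into model.fit(X_train, y_train).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])  # features
y = np.array([0, 1, 0, 1, 0, 1])                    # labels, one per row of X

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0)

print(X_train.shape)  # (4, 2): 4 samples, 2 features
print(y_train.shape)  # (4,): one label per training sample
print(X_train.dtype, y_train.dtype)
```

Datatypes depend on the source data: features are usually floats and labels integers (or strings for categorical classes); pandas DataFrames/Series are also accepted by most estimators.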
Making predictions on finetuned InceptionResNetV2
I am trying to make predictions on my test set using my finetuned InceptionResNetV2. I am struggling to produce the right input shape for the model. The code that I try to run is:
Y_pred = model.predict(test_data)
The error that shows up is this:
ValueError: Error when checking input: expected global_average_pooling2d_1_input to have shape (8, 8, 1536) but got array with shape (7, 7, 512)
The model that I have used for the finetuning is the following:
base_model = InceptionResNetV2(weights='imagenet', include_top=False, input_shape=(299,299,3))
top_model = Sequential()
top_model.add(GlobalAveragePooling2D(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(num_classes, activation='softmax'))
top_model.load_weights(top_model_weights_path)
new_model = Model(inputs=base_model.input, outputs=top_model(base_model.output))
Does anyone have any idea why this is showing up?
t-sne visualization does not match results for classifier
I'm training a neural network on a binary classification task, for which I'm getting around 85% accuracy. I have tried visualizing the output of the penultimate layer of my network in TensorBoard, and the visualization shows perfectly distinct clusters for my two classes (with both t-SNE and PCA), so I'm confused as to why I'm still getting a 15% error rate. Does anyone have any ideas?
Thanks for your help.