Implementation of Isolation Forest in Python
I am new to machine learning and am trying to learn and implement the isolation forest algorithm in Python.
My input has 40 features; the training set contains 4000 records and the test set contains 1000. Can someone help with sample code and show how to plot the output?
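A minimal sketch of one way to do this, assuming scikit-learn's `IsolationForest` and matplotlib are available. The data here is synthetic (random numbers with the same shape as described in the question), the `contamination` value is a guess, and `plot_scores` and the output file name are made up for illustration; swap in the real arrays and tune the parameters.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the real data (assumption): 40 features,
# 4000 training rows and 1000 test rows.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(4000, 40))
X_test = rng.normal(size=(1000, 40))
# Shift the first 20 test rows far from the bulk so there is something to detect.
X_test[:20] += 6.0

# contamination is a guess at the outlier fraction; tune it for real data.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
model.fit(X_train)

labels = model.predict(X_test)            # +1 = inlier, -1 = outlier
scores = model.decision_function(X_test)  # lower = more anomalous


def plot_scores(scores, labels, path="isolation_forest_scores.png"):
    """Save a histogram of anomaly scores, split by predicted label."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(scores[labels == 1], bins=40, alpha=0.7, label="predicted inlier")
    ax.hist(scores[labels == -1], bins=40, alpha=0.7, label="predicted outlier")
    ax.set_xlabel("decision_function score")
    ax.set_ylabel("number of test records")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    print("flagged as outliers:", int((labels == -1).sum()))
    plot_scores(scores, labels)
```

With 40 features there is no direct 2-D view of the data, which is why this sketch plots the score distribution instead; projecting the test set to two dimensions (for example with PCA) and coloring points by `labels` would be another option.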
See also questions close to this topic
How to install specific function from package in R?
I installed the 'caret' package using
install.packages('caret', dependencies = c("Depends", "Suggests"))
But I did not get the createDataPartition and dummyVars functions, which are part of the 'caret' package.
> dummyVars(formula, data, sep = ".", levelsOnly = FALSE, fullRank = FALSE, ...)
> createDataPartition(y, times = 1, p = 0.5, list = TRUE, groups = min(5, length(y)))
Even loading the package with
library('caret')
gives:
Error: package or namespace load failed for ‘caret’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
 there is no package called ‘ddalpha’
In addition: Warning message:
package ‘caret’ was built under R version 3.4.3
I then installed many of the packages named in the error messages.
So I want to know: is there any way to install a specific function from a package?
extracting patterns from a system using machine learning
The problem statement is as follows. I have a system in which I observe and extract various events. The events within a group are closely related in time; the groups themselves are not related to one another in any way.
For example, assume the system has two blocks, A and B. An event from A (say A.e1) occurs within a time window t1; in response, an event from B (say B.e2) occurs within a time window t2; in response to that, an event from A (say A.e3) occurs within a time window t3. This is one group of events, or one pattern. If a system has many such blocks (like A and B), many such event groups will occur.
The observer in the system generates a log of these events while the system is operational. A human can look at the events in the log and identify the patterns. What would be the best ML approach for a problem of this type?
PS: I am not expecting a solution, but rather a direction for this class of problem.
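To make the problem concrete, here is a rough sketch of a simple baseline rather than a recommended ML method: count how often one event is followed by another within a fixed time window, so that frequent ordered pairs hint at candidate patterns. The function name `count_follow_ups`, the single shared `window`, and the sample log are all assumptions made for illustration.

```python
from collections import Counter


def count_follow_ups(events, window):
    """Count ordered pairs (a, b) where b occurs within `window` time units after a.

    `events` is a list of (timestamp, name) tuples sorted by timestamp.
    """
    counts = Counter()
    for i, (t_a, a) in enumerate(events):
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break  # events are sorted, so later ones are out of range too
            if a != b:
                counts[(a, b)] += 1
    return counts


# Hypothetical log: two repetitions of A.e1 -> B.e2 -> A.e3 plus an unrelated event.
log = [
    (0.0, "A.e1"), (0.4, "B.e2"), (0.9, "A.e3"),
    (5.0, "C.e7"),
    (10.0, "A.e1"), (10.5, "B.e2"), (10.8, "A.e3"),
]
pairs = count_follow_ups(log, window=1.0)
for pair, n in pairs.most_common():
    print(pair, n)
```

Pairs that recur across the log are candidates for chaining into longer patterns; the unrelated event never falls inside another event's window, so it does not appear in any pair.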
How to launch Python projects?
I have been studying Python for a week and machine learning for about 2 days, so sorry for asking a stupid question. I wanted to learn machine learning through a project on GitHub, but I am totally lost as to how to launch Python projects and how they are structured overall.
For example, https://github.com/MaxTitkov/Keras_InceptionV3_Binary_classification has lots of Python files, and I cannot work out which one to launch first so that it pulls in the rest. None of them references the other files in its code. So I guess I need to launch them all separately with python plus the file name, or how does it work with Python projects?
Can "Sequential Feature Selector" be used for one-class feature selection?
Can the methods presented here be used for feature selection for one-class classification?
Feature selection for one-class classification in Python
I am looking for feature selection methods for one-class classification in Python. Would the feature importances from the random forest classifier in scikit-learn also work for a one-class problem?
I haven't managed to find an example of that...
How to find which rules in a decision tree are causing misclassifications
I built a binary decision tree classifier. From the confusion matrix I found that class 0 is misclassified 495 times and class 1 is misclassified 134 times. I want to find which rules in the decision tree are actually causing the records to be misclassified.
In short: which record failed at which tree node?
Is there a machine learning method that can be used to find the rules in the decision tree that are causing the misclassifications?
[[14226   495]
 [  134  3271]]
Fitting the decision tree and plotting it:
import pandas as pd
import pydotplus
from io import StringIO
from IPython.display import Image
from imblearn.under_sampling import RandomUnderSampler
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Turn the Clean_addr text column into up to 200 word n-gram count columns
cv = CountVectorizer(max_features=200, analyzer='word', ngram_range=(1, 3))
cv_addr = cv.fit_transform(data.pop('Clean_addr'))
for i, col in enumerate(cv.get_feature_names()):
    data[col] = pd.SparseSeries(cv_addr[:, i].toarray().ravel(), fill_value=0)

# Resi is the target; every other column is a feature
train = data.drop(['Resi'], axis=1)
Y = data['Resi']
X_train, X_test, y_train, y_test = train_test_split(train, Y, test_size=0.3, random_state=8)

# Balance the classes by undersampling the majority class
rus = RandomUnderSampler(random_state=42)
X_train_res, y_train_res = rus.fit_sample(X_train, y_train)

dt = DecisionTreeClassifier(class_weight="balanced", min_samples_leaf=30)
fit_decision = dt.fit(X_train_res, y_train_res)

# Export the fitted tree to Graphviz and render it as a PNG
dot_data = StringIO()
export_graphviz(fit_decision, out_file=dot_data, filled=True, rounded=True, special_characters=True, feature_names=train.columns)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
Any help is appreciated.
Resi is the target column. I am trying to predict it from the other data columns, and I have count-vectorized the Clean_addr column.
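One possible way to approach this, sketched on synthetic data since the original DataFrame is not available: scikit-learn's `apply()` reports the leaf each record lands in, so misclassifications can be counted per leaf, and `decision_path()` lists the nodes on the way to that leaf, which gives the rule. The data from `make_classification` and the variable names below are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's data (assumption).
X, y = make_classification(n_samples=2000, n_features=10, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=8)

dt = DecisionTreeClassifier(class_weight="balanced", min_samples_leaf=30, random_state=0)
dt.fit(X_train, y_train)

pred = dt.predict(X_test)
wrong = pred != y_test

# apply() returns, for each record, the index of the leaf it ends up in.
leaf_ids = dt.apply(X_test)
leaves, totals = np.unique(leaf_ids, return_counts=True)
errors = np.array([wrong[leaf_ids == leaf].sum() for leaf in leaves])

# Leaves sorted by number of misclassified test records, worst first.
order = np.argsort(-errors)
worst_leaf = leaves[order[0]]

# decision_path() gives the nodes a record passes through; take one record
# from the worst leaf and collect the sequence of tests leading to it.
sample = X_test[leaf_ids == worst_leaf][:1]
node_ids = dt.decision_path(sample).indices
tree = dt.tree_
rule = []
for node in node_ids:
    if tree.children_left[node] == tree.children_right[node]:
        continue  # leaf node, no test here
    feat, thr = tree.feature[node], tree.threshold[node]
    op = "<=" if sample[0, feat] <= thr else ">"
    rule.append(f"x[{feat}] {op} {thr:.3f}")

for leaf, err, tot in zip(leaves[order][:5], errors[order][:5], totals[order][:5]):
    print(f"leaf {leaf}: {err} of {tot} test records misclassified")
print("rule for worst leaf:", " and ".join(rule))
```

A leaf with many errors is not necessarily a bad rule on its own; comparing `errors` against `totals` shows whether the leaf is simply large or genuinely impure.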