How can I plot a ROC curve with AUC scores?
I have a table like the example below, with Precision, Recall, Accuracy, F1, and AUC columns. Can I draw a ROC curve based on it in Python?
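A ROC curve is traced by sweeping a decision threshold over per-sample scores, so a summary table with one Precision/Recall/Accuracy/F1/AUC row per model generally does not carry enough information to redraw it; each row corresponds to at most a single operating point. A minimal sketch in plain Python of how the curve is built when the scores are available. The data are made up, and `roc_points` and `trapezoid_auc` are illustrative names, not library functions:

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once every sample tied at this score is counted.
        is_last = rank == len(order) - 1
        if is_last or y_score[order[rank + 1]] != y_score[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Made-up example data: 1 = positive class, scores are model confidences.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

fpr, tpr = roc_points(y_true, y_score)
auc = trapezoid_auc(fpr, tpr)
print(list(zip(fpr, tpr)), auc)
```

The `fpr` and `tpr` lists are what would be handed to a plotting call; with only the table, those lists cannot be reconstructed.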
See also questions close to this topic
-
Print Variables in For Loop
import networkx as nx
from matplotlib.pyplot import figure

T10, T11, T12, T13 = nx.random_tree(10), nx.random_tree(11), nx.random_tree(12), nx.random_tree(13)
T = [T10, T11, T12, T13]
for t in T:
    print("The Pruefer Code for -", pruefer_code(t))
    fig = figure()
    axis = fig.add_subplot(111)
    nx.draw(t, **opts, ax = axis)
Which results in:
The Pruefer Code for - [2, 8, 1, 9, 5, 5, 6, 8]
The Pruefer Code for - [9, 6, 10, 7, 4, 8, 10, 4, 6]
The Pruefer Code for - [4, 1, 4, 8, 11, 11, 8, 3, 4, 8]
The Pruefer Code for - [8, 7, 11, 4, 2, 7, 9, 1, 5, 10, 7]
plus the graphs. How would I amend the code so that it prints:
The Pruefer Code for - T10 [2, 8, 1, 9, 5, 5, 6, 8]
The Pruefer Code for - T11 [9, 6, 10, 7, 4, 8, 10, 4, 6]
The Pruefer Code for - T12 [4, 1, 4, 8, 11, 11, 8, 3, 4, 8]
The Pruefer Code for - T13 [8, 7, 11, 4, 2, 7, 9, 1, 5, 10, 7]
Any help appreciated :)
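A loop only sees the objects in the list, not the variable names they were bound to, so one common approach is to store a label alongside each object. A minimal sketch, assuming plain lists as stand-ins for the random trees and a hypothetical `summarize` function in place of `pruefer_code`:

```python
def summarize(tree):
    """Hypothetical stand-in for pruefer_code(t); here it just sorts the values."""
    return sorted(tree)


# Keep each label next to its object instead of relying on variable names.
trees = {
    "T10": [3, 1, 2],  # made-up data, not real trees
    "T11": [9, 8],
}

lines = []
for name, tree in trees.items():
    line = f"The Pruefer Code for - {name} {summarize(tree)}"
    lines.append(line)
    print(line)
```

With the code from the question, the same pattern could be written as `for name, t in zip(["T10", "T11", "T12", "T13"], T):` and `name` added to the `print` call.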
-
Get number of packets received by ping command in python
I need a function that returns the number of packets received or the loss percentage. Until now I have used the code below, which returns True/False depending on whether any packets were received. The function should work on Windows, but I would be thankful if somebody could make it work on Linux too.
import platform
import subprocess

def ping_ip(ip):
    current_os = platform.system().lower()
    parameter = "-c"
    if current_os == "windows":
        parameter = "-n"
    command = ['ping', parameter, '4', ip]
    res = subprocess.call(command)
    return res == 0
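One way to get the counts is to capture the text that `ping` prints and parse its summary line. A minimal sketch, assuming English-locale output in the usual Windows and Linux/macOS formats; `parse_ping_summary` is an illustrative name, and localized Windows installations print different wording that these patterns would not match:

```python
import re


def parse_ping_summary(output):
    """Return (received, loss_percent) parsed from ping's summary text, or None."""
    # Windows (English): "Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),"
    win = re.search(r"Received = (\d+).*?\((\d+)% loss\)", output, re.S)
    if win:
        return int(win.group(1)), float(win.group(2))
    # Linux/macOS: "4 packets transmitted, 3 received, 25% packet loss, time 3004ms"
    nix = re.search(r"(\d+) (?:packets )?received.*?([\d.]+)% packet loss", output, re.S)
    if nix:
        return int(nix.group(1)), float(nix.group(2))
    return None


# Made-up sample outputs in the two assumed formats.
windows_sample = "Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),"
linux_sample = "4 packets transmitted, 4 received, 0% packet loss, time 3004ms"

print(parse_ping_summary(windows_sample))
print(parse_ping_summary(linux_sample))
```

The text to parse could come from `subprocess.run(command, capture_output=True, text=True).stdout` instead of `subprocess.call`, which only returns the exit code.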
-
Tkinter GUI unresponsive
I'm pretty new to Python and tkinter, so pardon me if I'm being naive. I'm trying to make a GUI for solving an engineering design problem using a widely accepted method (implying the method is seamless). The code for this method takes 0.537909984588623 seconds when run on its own (as plain code, not in tkinter), and it's not too complex or tangled. When I modified this code to fit into a tkinter GUI, the window becomes unresponsive after I enter all the inputs and click the button, even though the program keeps running in the background. Also, when I forcibly close the GUI window, the Jupyter kernel dies. Here's a brief outline of my code:
-----------------------------------------------code begins------------------------------------------
from tkinter import *
from scipy.optimize import fsolve
import matplotlib
import numpy as np
import threading
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
import math

class MyWindow():
    def __init__(self, win):
        self.lbl1=Label(win, text='Alpha')
        self.lbl2=Label(win, text='xd')
        self.lbl3=Label(win, text='xw')
        self.lbl4=Label(win, text='xf')
        self.lbl5=Label(win, text='q')
        self.lbl6=Label(win, text='Reflux Factor')
        self.lbl7=Label(win, text='Tray Efficiency')
        self.lbl8=Label(win, text='Total Number of Stages')
        self.lbl9=Label(win, text='Feed Stage')
        self.t1=Entry(bd=3)
        self.t2=Entry(bd=3)
        self.t3=Entry(bd=3)
        self.t4=Entry(bd=3)
        self.t5=Entry(bd=8)
        self.t6=Entry(bd=8)
        self.t7=Entry(bd=8)
        self.t8=Entry(bd=8)
        self.t9=Entry(bd=8)
        self.btn1=Button(win, text='Total Number of Stages ', command=self.stagesN)
        self.lbl1.place(x=100, y=80)
        self.t1.place(x=300, y=80)
        self.lbl2.place(x=100, y=130)
        self.t2.place(x=300, y=130)
        self.lbl3.place(x=100, y=180)
        self.t3.place(x=300, y=180)
        self.lbl4.place(x=100, y=230)
        self.t4.place(x=300, y=230)
        self.lbl5.place(x=100, y=280)
        self.t5.place(x=300, y=280)
        self.lbl6.place(x=100, y=330)
        self.t6.place(x=300, y=330)
        self.lbl7.place(x=100, y=380)
        self.t7.place(x=300, y=380)
        self.lbl8.place(x=800, y=130)
        self.t8.place(x=790, y=170)
        self.lbl9.place(x=800, y=210)
        self.t9.place(x=790, y=260)
        self.btn1.place(x= 500, y= 75)

    def originalEq(self,xa,relative_volatility):
        ya=(relative_volatility*xa)/(1+(relative_volatility-1)*xa)
        return ya

    def equilibriumReal(self,xa,relative_volatility,nm):
        ya=(relative_volatility*xa)/(1+(relative_volatility-1)*xa)
        ya=((ya-xa)*nm)+xa
        return ya

    def equilibriumReal2(self,ya,relative_volatility,nm):
        a=((relative_volatility*nm)-nm-relative_volatility+1)
        b=((ya*relative_volatility)-ya+nm-1-(relative_volatility*nm))
        c=ya
        xa=(-b-np.sqrt((b**2)-(4*a*c)))/(2*a)
        return xa

    def stepping_ESOL(self,x1,y1,relative_volatility,R,xd,nm):
        x2=self.equilibriumReal2(y1,relative_volatility,nm)
        y2=(((R*x2)/(R+1))+(xd/(R+1)))
        return x1,x2,y1,y2

    def stepping_SSOL(self,x1,y1,relative_volatility,
                      ESOL_q_x,ESOL_q_y,xb,nm):
        x2=self.equilibriumReal2(y1,relative_volatility,nm)
        m=((xb-ESOL_q_y)/(xb-ESOL_q_x))
        c=ESOL_q_y-(m*ESOL_q_x)
        y2=(m*x2)+c
        return x1,x2,y1,y2

    def stagesN(self):
        relative_volatility=float(self.t1.get())
        nm=float(self.t7.get())
        xd=float(self.t2.get())
        xb=float(self.t3.get())
        xf=float(self.t4.get())
        q=float(self.t5.get())
        R_factor=float(self.t6.get())
        xa=np.linspace(0,1,100)
        ya_og=self.originalEq(xa[:],relative_volatility)
        ya_eq=self.equilibriumReal(xa[:],relative_volatility,nm)
        x_line=xa[:]
        y_line=xa[:]
        al=relative_volatility
        a=((al*q)/(q-1))-al+(al*nm)-(q/(q-1))+1-nm
        b=(q/(q-1))-1+nm+((al*xf)/(1-q))-(xf/(1-q))-(al*nm)
        c=xf/(1-q)
        if q>1:
            q_eqX=(-b+np.sqrt((b**2)-(4*a*c)))/(2*a)
        else:
            q_eqX=(-b-np.sqrt((b**2)-(4*a*c)))/(2*a)
        q_eqy=self.equilibriumReal(q_eqX,relative_volatility,nm)
        theta_min=xd*(1-((xd-q_eqy)/(xd-q_eqX)))
        R_min=(xd/theta_min)-1
        R=R_factor*R_min
        theta=(xd/(R+1))
        ESOL_q_x=((theta-(xf/(1-q)))/((q/(q-1))-((xd-theta)/xd)))
        ESOL_q_y=(ESOL_q_x*((xd-theta)/xd))+theta
        x1,x2,y1,y2=self.stepping_ESOL(xd,xd,relative_volatility,R,xd,nm)
        step_count=1
        while x2>ESOL_q_x:
            x1,x2,y1,y2=self.stepping_ESOL(x2,y2,relative_volatility,R,xd,nm)
            step_count+=1
        feed_stage=step_count
        x1,x2,y1,y2=self.stepping_SSOL(x1,y1,relative_volatility,
                                       ESOL_q_x,ESOL_q_y,xb,nm)
        step_count+=1
        while x2>xb:
            x1,x2,y1,y2=self.stepping_SSOL(x2,y2,relative_volatility,
                                           ESOL_q_x,ESOL_q_y,xb,nm)
            step_count+=1
        xb_actual=x2
        stagesN=step_count-1
        self.t8.insert(END, str(stagesN))
        return

window=Tk()
mywin=MyWindow(window)
window.title('DColumn')
window.geometry("1500x1500")
window.mainloop()
-------------------------------------------Code end--------------------------------------------
I read in other articles that using multiple threads reduces the load on the mainloop and prevents freezing. But like I said, the code isn't very complex. Is it still because everything is running on the mainloop? Or is there more to it than meets the eye? Is multithreading the only way to get past this point?
Thanks in advance.
-
How to resize x axis
import pandas as pd
import matplotlib.pyplot as plt

df_train = pd.read_csv('../input/titanic/train.csv')
df_train.groupby('Age')['Survived'].mean().plot.bar(rot=0, title='Age', edgecolor="k")
plt.show()
I want to resize the x-axis range, but I don't know how to do that. The ranges I want are [under 20, under 40, under 60, under 80]. The x-axis represents age and the y-axis represents the survival rate.
-
Seaborn Barplot How to include multiple values in one bar
I have a number (lead time) on the y-axis that is the sum of three other values, and I want to plot a bar chart that shows the total lead time for five different measurements on the x-axis. In each bar I want to show the three values, basically stacked on top of each other. Is there any way to do this with seaborn?
If this is not possible, is it possible to plot the three values separately, grouped at each of the five x-values?
-
Validation Accuracy / Loss plateaus regardless of learning rate (VGG 16)
I am trying to classify my dataset into two categories using transfer learning with VGG, fine-tuning only the very last layer. When I plot the metric values against epochs, I get the following graph:
Each graph corresponds to a different learning rate. It seems that the validation accuracy / loss always plateaus regardless of which learning rate I set; is this a sign of overfitting? What could be causing it?
-
Which is the correct way to calculate AUC with scikit-learn?
I noticed that the following two snippets give different results.
#1
metrics.plot_roc_curve(classifier, X_test, y_test, ax=plt.gca())
#2
metrics.plot_roc_curve(classifier, X_test, y_test, ax=plt.gca(), label=clsname + ' (AUC = %.2f)' % roc_auc_score(y_test, y_predicted))
So, which method is correct?
I have added a simple reproducible example:
from sklearn.metrics import roc_auc_score
from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=12)
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)
y_predicted = svclassifier.predict(X_test)
print('AUC = %.2f' % roc_auc_score(y_test, y_predicted))  #1
metrics.plot_roc_curve(svclassifier, X_test, y_test, ax=plt.gca())  #2
plt.show()
Output (#1):
AUC = 0.86
While (#2):
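One likely source of such a discrepancy (an interpretation, since the second output is not shown here) is that `roc_auc_score(y_test, y_predicted)` is fed hard 0/1 labels from `predict`, whereas a ROC plot is normally built from continuous scores. A minimal sketch in plain Python with made-up data, where `pairwise_auc` is an illustrative helper and not a scikit-learn function, showing that thresholding the scores first changes the AUC:

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and continuous scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# Hard labels, as a predict-style call would return with a 0.5 cut-off.
hard_labels = [1 if s >= 0.5 else 0 for s in scores]

auc_from_scores = pairwise_auc(y_true, scores)
auc_from_labels = pairwise_auc(y_true, hard_labels)
print(auc_from_scores, auc_from_labels)
```

The value computed from hard labels describes a single threshold, while the value computed from scores describes the whole curve, so the second is usually the one meant by "AUC".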
-
Getting error when creating a Confusion Matrix using scikit learn
I am trying to classify cancers into benign and malignant using a pre-trained model, Resnet50. Here is the code for the model.
data_path = '/content/drive/My Drive/data'
data_dir_list = os.listdir(data_path)
img_data_list = []
for dataclass in data_dir_list:
    img_list = os.listdir(data_path+"/"+dataclass)
    for img in img_list:
        img_path = data_path+"/"+dataclass+"/"+img
        img=image.load_img(img_path, target_size=(224, 224))
        x=image.img_to_array(img)
        x=np.expand_dims(x,axis=0)
        x=preprocess_input(x)
        img_data_list.append(x)
img_data=np.array(img_data_list)
img_data=np.rollaxis(img_data,1,0)
img_data= img_data[0]

num_classes=2
num_of_sample=250
labels=np.ones(250,dtype='int64')
labels[0:100]=0
labels[100:250]=1
names=['benign', 'malignant']
Y=np_utils.to_categorical(labels, num_classes)
x,y = shuffle(img_data, Y, random_state=3)
X_train,X_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=3)

image_input = Input(shape=(224,224,3))
model = ResNet50(input_tensor=image_input, include_top='True', weights='imagenet')
last_layer = model.get_layer('avg_pool').output
x=Flatten(name='flattern')(last_layer)
out=Dense(2, activation='softmax', name='output_layer')(x)
custom_resnet_model = Model(inputs = image_input, outputs=out)
for layer in custom_resnet_model.layers[:-1]:
    layer.trainable = False
custom_resnet_model.layers[-1].trainable
custom_resnet_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
hist=custom_resnet_model.fit(X_train,y_train, batch_size=32, epochs=100, steps_per_epoch=7, validation_data=(X_test,y_test))
Then I tried to create a confusion matrix for this model using scikit learn. But I am getting an error.
Here is the code for the confusion matrix.
prediction = model.predict_generator(test_set, steps=2 )
rounded_prediction = np.argmax(prediction, axis=-1)
cm = confusion_matrix(y_true=y_test , y_pred=rounded_prediction)
Error:
ValueError                                Traceback (most recent call last)
<ipython-input-96-1965da59b466> in <module>()
----> 1 cm = confusion_matrix(y_true=y_test , y_pred=rounded_prediction)

1 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
     88     if len(y_type) > 1:
     89         raise ValueError("Classification metrics can't handle a mix of {0} "
---> 90                          "and {1} targets".format(type_true, type_pred))
     91
     92     # We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets
I think there is an issue with y_true or y_pred. What should the values of these parameters be for my model? Here I used rounded_prediction to select the most probable prediction for each sample.
Data types:
type(y_test) numpy.ndarray
type(rounded_prediction) numpy.ndarray
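The error message suggests that `y_test` is still one-hot encoded (it was produced by `to_categorical`) while `rounded_prediction` holds class indices. A minimal sketch in plain Python, with made-up arrays and illustrative helper names, of converting one-hot rows back to indices before tabulating a confusion matrix:

```python
def onehot_to_index(rows):
    """Return the position of the largest entry in each row (argmax per row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def confusion(y_true, y_pred, num_classes):
    """Build a count matrix: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Made-up data: four samples, two classes.
y_test_onehot = [[1, 0], [0, 1], [0, 1], [1, 0]]
predicted = [0, 1, 0, 0]

y_test_labels = onehot_to_index(y_test_onehot)
cm = confusion(y_test_labels, predicted, num_classes=2)
print(y_test_labels, cm)
```

With NumPy arrays the equivalent conversion would be `np.argmax(y_test, axis=1)`. The predictions also need to come from the same samples, in the same order, as `y_test`.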
-
How to change plot legends with roc_auc_score?
I'm plotting a ROC curve with plot_roc_curve from scikit-learn, and the plot legends are generated automatically. Is there a way to change them?
metrics.plot_roc_curve(classifier, X_test, y_test, ax=plt.gca())
-
What is the real AUC?
I prepared a piece of code to examine different classifiers on a dataset and plot a ROC curve. But as you can see, the printed AUC results are different from the ones shown in the plot. I am a beginner in Python, so please tell me where the AUC is calculated correctly: at # One or at # Two?
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, \
    accuracy_score

bankdata = pd.read_csv("dataset.csv")
X = bankdata.drop('label', axis=1)
y = bankdata['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=12)

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.ensemble import AdaBoostClassifier
from catboost import CatBoostClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from time import process_time
from sklearn import metrics
import matplotlib.pyplot as plt

classifiers = {
    "Logistic Regression": LogisticRegression(class_weight='balanced'),
    "Decision Tree": DecisionTreeClassifier(class_weight='balanced'),
    "Linear SVM": SVC(class_weight='balanced', probability=True),
    "Gradient Boosting Classifier": GradientBoostingClassifier(),
    "Random Forest": RandomForestClassifier(),
    'RidgeClassifier': RidgeClassifier(class_weight='balanced'),
    'AdaBoost': AdaBoostClassifier(n_estimators=100),
    'catboost': CatBoostClassifier(verbose=0),
    'MLP': MLPClassifier()
}
no_classifiers = len(classifiers.keys())

def batch_classify(X_train, y_train, X_test, y_test, verbose=True):
    df_results = pd.DataFrame(data=np.zeros(shape=(no_classifiers, 3)),
                              columns=['Classifier', 'Area Under Curve', 'Training time'])
    count = 0
    for key, classifier in classifiers.items():
        t_start = process_time()
        classifier.fit(X_train, y_train)
        t_stop = process_time()
        t_elapsed = t_stop - t_start
        y_predicted = classifier.predict(X_test)
        df_results.loc[count, 'Classifier'] = key
        df_results.loc[count, 'Precision'] = precision_score(y_test, y_predicted)
        df_results.loc[count, 'Recall'] = recall_score(y_test, y_predicted)
        df_results.loc[count, 'Accuracy'] = accuracy_score(y_test, y_predicted)
        df_results.loc[count, 'F1-Score'] = f1_score(y_test, y_predicted)
        # One
        df_results.loc[count, 'Area Under Curve'] = roc_auc_score(y_test, y_predicted)
        df_results.loc[count, 'Training time'] = t_elapsed
        if verbose:
            print("trained {c} in {f:.2f} s".format(c=key, f=t_elapsed))
        count += 1
        predY = classifier.predict_proba(X_test)
        fpr, tpr, thresh = metrics.roc_curve(y_test, predY[:, 1])
        # Two
        auc = metrics.auc(fpr, tpr)
        plt.plot(fpr, tpr, label=str(key) + ' (area = %.2f)' % auc)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.grid()
    plt.legend(loc="lower right")
    plt.title('ROC curve')
    plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r', label='Random guess')
    plt.show()
    return df_results

df_results = batch_classify(X_train, y_train, X_test, y_test)
print(df_results.sort_values(by='Classifier', ascending=False))
The result:
   Classifier                    Area Under Curve  ...  Accuracy  F1-Score
7  catboost                      0.627557          ...  0.631111  0.591133
5  RidgeClassifier               0.704138          ...  0.706667  0.679612
4  Random Forest                 0.646504          ...  0.653333  0.589474
8  MLP                           0.843150          ...  0.844444  0.832536
0  Logistic Regression           0.665887          ...  0.671111  0.622449
2  Linear SVM                    0.693594          ...  0.693333  0.682028
3  Gradient Boosting Classifier  0.627557          ...  0.631111  0.591133
1  Decision Tree                 0.646940          ...  0.648889  0.622010
6  AdaBoost                      0.611265          ...  0.613333  0.583732
-
Plotting ROC curve for RidgeClassifier in Python
I want to plot the ROC curve for the RidgeClassifier, but the code raises an error. I googled for solutions, and the suggestion was to change predict_proba to predict, but that does not work!
predY = classifier.predict_proba(X_test)
Error:
AttributeError: 'RidgeClassifier' object has no attribute 'predict_proba'
This is what I get with predict:
IndexError: too many indices for array
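`RidgeClassifier` does not expose `predict_proba`, but it does provide `decision_function`, and ROC analysis depends only on how the scores rank the samples, not on whether they are probabilities. A minimal sketch in plain Python with made-up scores (`rank_auc` is an illustrative helper, not a scikit-learn function) showing that squashing raw scores through a sigmoid leaves the AUC unchanged:

```python
import math


def rank_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and raw, unbounded scores such as a decision function returns.
y_true = [0, 0, 1, 1, 0, 1]
raw_scores = [-2.0, 0.3, -0.1, 1.7, -0.8, 1.2]

# The same scores mapped into (0, 1); the ordering of samples is preserved.
squashed = [1 / (1 + math.exp(-s)) for s in raw_scores]

auc_raw = rank_auc(y_true, raw_scores)
auc_squashed = rank_auc(y_true, squashed)
print(auc_raw, auc_squashed)
```

In the code from the question, this would mean passing `classifier.decision_function(X_test)` to `metrics.roc_curve` for classifiers that lack `predict_proba`. The `IndexError` is consistent with indexing the 1-D output of `predict` with `[:, 1]`.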