Calculate residual deviance from a scikit-learn logistic regression model
Is there any way to calculate the residual deviance of a scikit-learn logistic regression model? This is a standard output of R model summaries, but I couldn't find it in any of sklearn's documentation.
1 answer

You cannot do it in scikit-learn, but check out statsmodels' GLMResults (API), which reports the residual deviance of a fitted GLM directly.
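Since the residual deviance is just minus twice the fitted model's log-likelihood, it can also be recovered by hand from a scikit-learn model's predicted probabilities. A minimal sketch (the synthetic data and variable names are illustrative, not from the question):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Tiny synthetic dataset for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + rng.normal(size=100) > 0).astype(int)

# Use a very large C to effectively disable sklearn's default L2 penalty,
# so the fit is comparable to R's unpenalized glm().
model = LogisticRegression(C=1e9, max_iter=1000).fit(X, y)
proba = model.predict_proba(X)

# log_loss(..., normalize=False) returns the summed negative log-likelihood,
# so twice that is the residual deviance.
residual_deviance = 2 * log_loss(y, proba, normalize=False)
print(residual_deviance)
```

statsmodels exposes the same quantity as the `deviance` attribute of a fitted GLM result, e.g. `sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit().deviance`; keep in mind that scikit-learn regularizes by default, so the two only agree when the penalty is effectively disabled as above.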
See also questions close to this topic

Why is my if statement always triggering in Python
I'm new to Python and I decided to do a project by myself. In my project there is an if statement that always triggers. Also, I'm still learning PEP 8, so tell me if I violated it.
yn = input('Do you need me to explain the rules. Y/N: ').lower()
if yn == 'y' or 'yes':
    print('I will think of a number between 1 - 100.')
    print('You will guess a number and I will tell you if it is higher or lower than my number.')
    print('This repeats until you guess my number.')
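For what it's worth, the usual cause of this behaviour is that `yn == 'y' or 'yes'` is parsed as `(yn == 'y') or ('yes')`, and the non-empty string `'yes'` is always truthy. A minimal sketch of the problem and two common fixes:

```python
yn = 'n'

# Buggy: parsed as (yn == 'y') or ('yes'); 'yes' is a non-empty string,
# so the expression is truthy regardless of the input.
buggy = yn == 'y' or 'yes'

# Fixed: compare yn against each value explicitly...
fixed = yn == 'y' or yn == 'yes'

# ...or, more idiomatically, use a membership test.
also_fixed = yn in ('y', 'yes')

print(bool(buggy), fixed, also_fixed)  # → True False False
```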

Dialogflow python client versioning
I am using the Python client for accessing Dialogflow's functionality.
My question is: do

import dialogflow

and

import dialogflow_v2 as dialogflow

have any difference?
As per my experience, all the methods are the same. In the samples given by Google,

import dialogflow_v2 as dialogflow

has been used, and I could not see any difference between the two. Please note that here I am talking about version v2 of the Python client, not the Dialogflow API version.

Which device does .in_waiting and .out_waiting refer to in Pyserial?
I have a computer that is connected to a serial device at /dev/ttyUSB0 via a cable with USB 2 and micro-USB 2 connectors. My script runs:

ser = serial.Serial('/dev/ttyUSB0')
in_buffer = ser.in_waiting
in_data = ser.read(in_buffer)
out_buffer = ser.out_waiting
out_data = ser.read(out_buffer)
Output:
ser = {'is_open': True, 'portstr': '/dev/ttyUSB0', 'name': '/dev/ttyUSB0', '_port': '/dev/ttyUSB0', '_baudrate': 9600, '_bytesize': 8, '_parity': 'N', '_stopbits': 1, '_timeout': None, '_write_timeout': None, '_xonxoff': False, '_rtscts': False, '_dsrdtr': False, '_inter_byte_timeout': None, '_rs485_mode': None, '_rts_state': True, '_dtr_state': True, '_break_state': False, '_exclusive': None, 'fd': 6, 'pipe_abort_read_r': 7, 'pipe_abort_read_w': 8, 'pipe_abort_write_r': 9, 'pipe_abort_write_w': 10}
in_buffer = 0 <class 'int'>
in_data = b'' <class 'bytes'>
out_buffer = 0 <class 'int'>
out_data = b'' <class 'bytes'>
Do in_buffer and out_buffer refer to the number of bytes in the buffers of the computer's UART chip and of the device /dev/ttyUSB0, respectively? Why do they have zero size?
conda sklearn error when importing sklearn
I am using conda with Python 3.6 on Ubuntu 18, and I am trying to install scikit-learn version 0.20 using

conda install scikit-learn
I am getting some weird messages in the process, such as this one:

SafetyError: The package for scikit-learn located at /home/user/anaconda3/pkgs/scikit-learn-0.20.2-py36hd81dba3_0 appears to be corrupted. The path 'lib/python3.6/site-packages/sklearn/utils/weight_vector.cpython-36m-x86_64-linux-gnu.so' has an incorrect size.
reported size: 66016 bytes
actual size: 48608 bytes
Then I get a "done" message and approval, and when I try to import sklearn I get this error:
ImportError: Something is wrong with the numpy installation. While importing we detected an older version of numpy
What am I missing here? Thanks.

scikit-learn implementation of tfidf differs from manual implementation
I tried to manually calculate tfidf values using the formula, but the result I got is different from the result of the scikit-learn implementation.

from sklearn.feature_extraction.text import TfidfVectorizer

tv = TfidfVectorizer()
a = "cat hat bat splat cat bat hat mat cat"
b = "cat mat cat sat"
tv.fit_transform([a, b]).toarray()
# array([[0.53333448, 0.56920781, 0.53333448, 0.18973594, 0.        ,
#         0.26666724],
#        [0.        , 0.75726441, 0.        , 0.37863221, 0.53215436,
#         0.        ]])
tv.get_feature_names()
# ['bat', 'cat', 'hat', 'mat', 'sat', 'splat']
I tried to manually calculate tfidf for a document, but the result is different from TfidfVectorizer.fit_transform:

(np.log(2+1/1+1) + 1) * (2/9) = 0.5302876358044202
(np.log(2+1/2+1) + 1) * (3/9) = 0.750920989498456
(np.log(2+1/1+1) + 1) * (2/9) = 0.5302876358044202
(np.log(2+1/2+1) + 1) * (1/9) = 0.25030699649948535
(np.log(2+1/1+1) + 1) * (0/9) = 0.0
(np.log(2+1/1+1) + 1) * (1/9) = 0.2651438179022101
What I should have got is
[0.53333448, 0.56920781, 0.53333448, 0.18973594, 0, 0.26666724]
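The discrepancy comes from TfidfVectorizer's documented defaults: a smoothed idf, ln((1 + n) / (1 + df)) + 1, applied to raw term counts rather than term frequencies, followed by L2 normalization of each row. A sketch reproducing the first row by hand (counts and document frequencies transcribed from the example above):

```python
import numpy as np

# Term counts and document frequencies for doc a over the sorted vocabulary
# ['bat', 'cat', 'hat', 'mat', 'sat', 'splat'] (two documents total, n = 2).
counts = np.array([2, 3, 2, 1, 0, 1], dtype=float)
df = np.array([1, 2, 1, 2, 1, 1], dtype=float)
n_docs = 2

# Smoothed idf as used by TfidfVectorizer's defaults.
idf = np.log((1 + n_docs) / (1 + df)) + 1

# Raw counts times idf, then L2-normalize the row.
tfidf = counts * idf
tfidf /= np.linalg.norm(tfidf)
print(tfidf)
# → approximately [0.53333448, 0.56920781, 0.53333448, 0.18973594, 0., 0.26666724]
```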

What is the difference between TfidfVectorizer and TfidfTransformer
I know that the formula for tfidf is:

count of word / total count * log(number of documents / number of documents where the word is present)

I saw there is a TfidfTransformer in scikit-learn, and I just wanted to know the difference between them. I couldn't find anything helpful.
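As a sketch of the relationship (the example documents are illustrative): TfidfVectorizer is essentially CountVectorizer followed by TfidfTransformer in a single step, so the vectorizer takes raw text while the transformer takes an already-computed count matrix.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer)

docs = ["cat hat bat", "cat mat cat sat"]

# One step: raw text -> tf-idf matrix.
one_step = TfidfVectorizer().fit_transform(docs).toarray()

# Two steps: raw text -> term counts -> tf-idf matrix.
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts).toarray()

print(np.allclose(one_step, two_step))  # → True
```

With default parameters the two pipelines produce identical matrices; use the transformer when you already have a count matrix, and the vectorizer when you are starting from raw documents.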

Logistic regression in matlab using mnrfit
I'm trying to use the mnrfit function but I get the error:

If Y is a column vector, it must contain positive integer category numbers.

My data is in a double and my Y values are floats, e.g. 0.6667. Is there a way I can adjust my data to be able to use the mnrfit function?
Thanks in advance! An inexperienced beginner

How to set a range of predicted value in regression based on the actual value?
I am using regression to predict the price of a product, but I want every predicted value to be no more than 10 above or 10 below the actual value. For example, if the actual value is 20, the predicted value should be between 10 and 30.
I am not focusing on RMSE or other metrics, but the predictions should satisfy the condition above.
How can I do that with any regression technique? Looking for suggestions.

Really huge hazard ratio with PHREG competing risk model: error?
I am modeling a competing risk model of disease progression with death as a competing risk (0 = censored, 1 = progression, 2 = death without progression). I am using SAS's PHREG procedure as below, and it gives me one variable with a huge hazard ratio. All the independent variables are binary yes/no. Does anyone know why the hazard ratio is so huge? At the bottom I attached a screenshot of the model output, and also a separate screenshot with the number of events/competing events when the LIFETEST procedure is run with fluoro as the strata argument.

proc phreg data=dataset plots(overlay=stratum)=cif;
    class cephalo;
    class an_coverage;
    class fluoro;
    class vancomycin_taken;
    model fu_time_prog*prog_cr(0) = vancomycin_taken an_coverage fluoro cephalo / eventcode=1;
run;

Bayesian Logistic Regression
I am trying to estimate a Bayesian logistic regression. I'm trying to specify my generic model as follows, and I am receiving an error regarding my '}' that I do not understand. Can someone help me understand why this is?
This is the Error message I am receiving:
Error: unexpected '}' in "}"
logistic_model <- model {
    for (i in 1:n) {
        logit(q[i]) <- beta[1] + beta[2]*X[i,1] + beta[3]*X[i,2] + beta[4]*X[i,3] + beta[5]*X[i,4] + beta[6]*X[i,5]
        Y[i] ~ dbern(q[i])
    }
    for (j in 1:6) {
        beta[j] ~ dnorm(0, 0.1)
    }
}

Train a logistic regression model in parts for big data
My data set consists of 1.6 million rows and 17,000 columns after preprocessing. I want to use logistic regression on this data, but the process gets killed every time I load the dataset. Is there a way I can train a logistic regression model in chunks, with the coefficients being updated at each iteration? Does sklearn support any technique for my problem?

Why is the accuracy of the logistic regression classifier different from k-nearest neighbors?
I understand how to compute the accuracy for each, but I don't understand why they are different.
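They generally differ because the two models draw different decision boundaries: logistic regression fits a linear boundary, while k-nearest neighbors classifies by local neighborhoods. A small sketch comparing the two on a standard dataset (the dataset and split are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same split, two different models: any accuracy gap comes from the models.
lr_acc = accuracy_score(y_te, LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te))
knn_acc = accuracy_score(y_te, KNeighborsClassifier().fit(X_tr, y_tr).predict(X_te))
print(lr_acc, knn_acc)  # typically close but not identical
```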