Low accuracy with logistic regression, desired error not necessarily achieved due to precision loss
I've been trying to implement Coursera's machine learning course in Python. I am stuck on exercise 3 (recognizing handwritten digits): my accuracy is 84% instead of the expected 94%, and I am getting warnings as well ("desired error not necessarily achieved due to precision loss"). I've been checking the gradient and compute_cost for days but I just can't anymore; I would really, really appreciate any insights.
See also questions close to this topic

Why is the range syntax not working to execute a command?
What's wrong in this code?
def remove_middle(lst, start, end):
    for i in range(start, end + 1):
        del lst[i]
    return lst
It should remove all elements whose index lies between start and end (inclusive). In the example below, it should return
[4, 23, 42]
but when I run the code below I get [4, 15, 23]:
print(remove_middle([4, 8, 15, 16, 23, 42], 1, 3))
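A sketch of the usual explanation and fix: deleting by index inside the loop shifts the remaining elements to the left, so the loop's later indices no longer point at the intended elements. Deleting the whole span at once avoids the shifting:

```python
def remove_middle(lst, start, end):
    # del inside a loop shifts the remaining items left after each
    # deletion; deleting the whole slice in one step sidesteps that.
    del lst[start:end + 1]
    return lst

print(remove_middle([4, 8, 15, 16, 23, 42], 1, 3))  # [4, 23, 42]
```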

jsonpickle with simplejson backend serializes Decimal as null
I'm trying to use jsonpickle in Python 3.7 to serialize an object tree to JSON. However, all Decimals are serialized as null. I'm using simplejson as a backend, so it should be able to serialize Decimals. How do I serialize a (complex) object tree to JSON, including Decimals?
Example code (requires simplejson and jsonpickle to be installed):
import jsonpickle
from decimal import Decimal

jsonpickle.set_preferred_backend('simplejson')
jsonpickle.set_encoder_options('simplejson', use_decimal=True)

class MyClass():
    def __init__(self, amount):
        self.amount = amount

    def to_json(self):
        return jsonpickle.dumps(self, unpicklable=False)

if __name__ == '__main__':
    obj = MyClass(Decimal('1.0'))
    print(obj.to_json())  # prints '{"amount": null}'
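Until the jsonpickle/simplejson combination is sorted out, a stdlib-only workaround is possible. This is only a sketch: the `encode` hook and the choice to render Decimals as strings are assumptions, not jsonpickle behavior, and `MyClass` here mirrors the question's class.

```python
import json
from decimal import Decimal

class MyClass:
    def __init__(self, amount):
        self.amount = amount

def encode(obj):
    # Fallback called by json.dumps for non-serializable objects:
    # render Decimals as strings (use float(obj) if precision loss is
    # acceptable) and plain objects as their attribute dict, roughly
    # mirroring jsonpickle's unpicklable=False output.
    if isinstance(obj, Decimal):
        return str(obj)
    return obj.__dict__

print(json.dumps(MyClass(Decimal('1.0')), default=encode))  # {"amount": "1.0"}
```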

PYTHON ~ MatPlotLib ~ Gridlines are too dense
Example of Gridlines
Hey there, I'm using matplotlib to plot a sine curve; however, because of either too-precise values or the large number of points included, the grid lines and the scale on the axis are incredibly messed up.
Any idea how to fix that?
import csv
import matplotlib.pyplot as plt

time = []*800
distance = []*800
force = []*800

with open('Data.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        time.append(row[0])
        distance.append(row[1])
        force.append(row[2])

mpl_fig = plt.figure()
ax = mpl_fig.add_subplot(111)
ax.set_xlabel("Time(s)")
ax.set_ylabel("distance x")
plt.minorticks_on()
plt.grid(b=True, which='major', color='b', linestyle='-')
plt.grid(b=True, which='minor', color='0.5', linestyle='-')
plt.plot(time[100:200], distance[100:200], "r.")
plt.savefig("SHMfigure.pdf")
plt.show()
Bad code I know
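One likely cause, though this is a guess without seeing Data.csv: csv.reader yields strings, and passing strings to plt.plot makes matplotlib treat the axis as categorical, drawing one tick per distinct value. Converting to float while reading restores a normal numeric scale; the StringIO sample below is a hypothetical stand-in for the real file.

```python
import csv
import io

# Hypothetical stand-in for Data.csv (time, distance columns)
sample = io.StringIO("0.00,1.0\n0.01,1.5\n0.02,2.1\n")

time, distance = [], []
for row in csv.reader(sample):
    time.append(float(row[0]))      # float, not str, so the axis stays numeric
    distance.append(float(row[1]))

print(time)  # [0.0, 0.01, 0.02]
```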

TypeError: unhashable type: 'numpy.ndarray' with Pandas after importing data from MYSQL
I ran the same program on two different data sources (a CSV file and a MySQL database). The CSV import runs fine, but the MySQL import throws the numpy TypeError below. I'm guessing the issue may lie with one of these two points:
1. Data import issues: INT, TEXT, etc.? I'm using VARCHAR for the data.
2. An issue with how matplotlib works with pandas DataFrames?
I'm new, so please treat me as one :)
import pandas as pd
import numpy as np
import matplotlib.pyplot as pp
import seaborn
from sqlalchemy import create_engine
import pymysql

engine = create_engine("mysql+pymysql://root:root@127.0.0.1:3306/babynames", echo=False)
names = pd.read_sql_query('select * from BABYNAMES', engine)
names_indexed = names.set_index(['gender','name','year']).sort_index()

def plotname(gender, name):
    data = names_indexed.loc[gender, name]
    pp.plot(data.index, data.values)

plotname('F','Nancy')
Error code:
TypeError                                 Traceback (most recent call last)
<ipython-input-32-9d981bcf8365> in <module>()
----> 1 plotname('F','Nancy')

<ipython-input-31-85c728659ad0> in plotname(gender, name)
      1 def plotname(gender, name):
      2     data = allyears_indexed.loc[gender, name]
----> 3     pp.plot(data.index, data.values)

[... frames through matplotlib's pyplot.py, axes/_axes.py, axes/_base.py
and axis.py elided; the failure ends in the category converter ...]

~/anaconda3/lib/python3.7/site-packages/matplotlib/category.py in update(self, data)
    197         data = np.atleast_1d(np.array(data, dtype=object))
    198
--> 199         for val in OrderedDict.fromkeys(data):
    200             if not isinstance(val, VALID_TYPES):
    201                 raise TypeError("{val!r} is not a string".format(val=val))

TypeError: unhashable type: 'numpy.ndarray'
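If the asker's guess 1 is right and the VARCHAR columns arrive as strings, matplotlib's category converter is exactly what raises this error. A hedged sketch of the fix: convert the columns with pd.to_numeric before indexing and plotting. The tiny DataFrame below is a made-up stand-in for the real BABYNAMES table.

```python
import pandas as pd

# Stand-in for pd.read_sql_query(...): VARCHAR columns come back as strings
names = pd.DataFrame({'gender': ['F', 'F'],
                      'name': ['Nancy', 'Nancy'],
                      'year': ['1990', '1991'],
                      'births': ['120', '135']})
for col in ('year', 'births'):
    names[col] = pd.to_numeric(names[col])   # object dtype -> integers

names_indexed = names.set_index(['gender', 'name', 'year']).sort_index()
data = names_indexed.loc[('F', 'Nancy')]     # numeric values now plot cleanly
print(data['births'].sum())  # 255
```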

How to create a numpy array of zeros of a list's length?
I have a list, say
a = [3, 4, 5, 6, 7]
And I want to create a numpy array of zeros of that list's length.
If I do
b = np.zeros((len(a), 1))
I get
[[0, 0, 0, 0, 0]]
instead of
[0, 0, 0, 0, 0]
What is the best way to get the latter option?
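For reference, a quick sketch of the two shapes involved: np.zeros((len(a), 1)) is a 2-D column, while np.zeros(len(a)), or np.zeros_like(a), which also copies the list's integer dtype, gives the flat 1-D array asked for.

```python
import numpy as np

a = [3, 4, 5, 6, 7]
print(np.zeros((len(a), 1)).shape)  # (5, 1)  -- a 2-D column vector

b = np.zeros(len(a))                # 1-D array of float zeros
c = np.zeros_like(a)                # 1-D, and keeps a's int dtype
print(b.shape, c.shape)  # (5,) (5,)
```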

Dimensionality of result of indexing an ndarray with a slice vs range
I'm trying to understand the use of a range as an index, and to compare it with the use of a slice as an index, on an ndarray; specifically, the effect on the dimensionality of the result. What I understand is that:

For a given dimension (say, the 0th dimension) of an ndarray, if I use a scalar index such as 2, the dimensionality of the result is less than the dimensionality of the original ndarray by 1.
For the same (0th) dimension, if I use slice(2,3) instead of the scalar 2, the above-mentioned reduction in dimensionality does not happen.
For the most part, if I use a range instead of a slice, the effect on dimensionality seems to be the same, except in one special case.

Here's the code. The surprise, for me, is in the 4th print statement:
import numpy as np

nd15 = np.array([['e00','e01','e02','e03'],
                 ['e10','e11','e12','e13'],
                 ['e20','e21','e22','e23']])

# Consider the dimensionality of the indexing results from the below 4
# lines. From the first 3 print statements, we are led to believe that,
# if you replace a range with an "equivalent" slice expression, the
# dimensionality of the result will remain unchanged. But the fourth
# result below surprisingly negates that understanding.
print(nd15[slice(2,3), slice(2,3)].shape)
print(nd15[slice(2,3), range(2,3)].shape)
print(nd15[range(2,3), slice(2,3)].shape)
print(nd15[range(2,3), range(2,3)].shape)
I expected the fourth print to also give the same result as the other three.
Instead, this is the result I got:
(1, 1)
(1, 1)
(1, 1)
(1,)
What am I missing?
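The short answer, sketched below: a range is an array-like sequence, so nd15[range(...), range(...)] triggers NumPy's advanced (fancy) indexing, where the two index arrays are broadcast and paired element-wise rather than combined rectangularly; np.ix_ recovers the slice-like, outer-product behaviour.

```python
import numpy as np

nd15 = np.arange(12).reshape(3, 4)

# slice + slice (or one slice, one sequence): rectangular selection
print(nd15[slice(2, 3), slice(2, 3)].shape)          # (1, 1)

# sequence + sequence: advanced indexing pairs the indices element-wise,
# selecting just the single element nd15[2, 2] here
print(nd15[range(2, 3), range(2, 3)].shape)          # (1,)

# np.ix_ turns the sequences back into an outer-product selection
print(nd15[np.ix_(range(2, 3), range(2, 3))].shape)  # (1, 1)
```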

How to use scipy.optimize.bisect() when function has additional parameters?
According to the documentation, I should be able to bisect a function with multiple parameters as long as I pass said parameters to bisect() using args=(). However, I just can't make it work and I didn't manage to find an example of using this function in such a scenario.
My function has the form f(a, x): the user inputs a, and the program finds a root in the variable x using scipy.optimize.bisect().
I tried passing it as:
scipy.optimize.bisect(f,a,a,args=(a,))
But that didn't exactly work.
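For comparison, a minimal working call (f here is a hypothetical example): the 2nd and 3rd positional arguments of bisect are the bracketing interval endpoints, between which f must change sign, not the parameter itself; args carries the extras that come after x in f's signature.

```python
from scipy.optimize import bisect

def f(x, a):
    # hypothetical example: root in x of x**2 - a
    return x**2 - a

# the bracket [0, 10] contains the root of x**2 - 4; args=(4,) supplies a
root = bisect(f, 0, 10, args=(4,))
print(round(root, 6))  # 2.0
```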

Reading an .arff file and trying to ignore the header
I am new to Python and I need some help with my code. I am reading an .arff file in a Jupyter notebook using Python 2.7. I would like to know which argument I need to pass to arff.loadarff (or another way to do it) so that I can ignore the header of my data.
train, meta = arff.loadarff(open('train.arff', 'r'))
After I read the file I do some mathematical operations, and I get the error below. I hope someone can help me figure it out.
train, meta = arff.loadarff(open('train.arff', 'r'))
train = pd.DataFrame(train)
print(train)

ValueError                                Traceback (most recent call last)
<ipython-input-192-3b2868d1fd43> in <module>()
----> 1 ne = getNeighbors(X_train, y_train, X_test, k = 3)
      2 print(ne)

<ipython-input-1-75b4da86d04e> in getNeighbors(X_train, y_train, X_test, k)
      6     for (trainpoint, y_train_label) in zip(X_train, y_train):
      7         # calculate the distance and append it to a distances_label with the associated label.
----> 8         distances_label.append((distance(testpoint, trainpoint), y_train_label))
      9     k_neighbors_with_labels += [sorted(distances_label)[0:k]]  # sort the distances and take the first k neighbors
     10     return k_neighbors_with_labels

<ipython-input-186-22e861402349> in distance(testpoint, trainpoint)
      2 def distance(testpoint, trainpoint):
      3     # distance between testpoint and trainpoint.
----> 4     dist = np.sqrt(np.sum(np.power(float(testpoint) - float(trainpoint), 2)))
      5     return dis
      6

ValueError: could not convert string to float: sepal_length
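One hedged guess from the error text: the distance function is being fed the string class column (or a header-like value such as 'sepal_length') along with the numeric features. A sketch of separating the two with pandas; the tiny frame below is a made-up stand-in for the loaded arff data, and the column names are assumptions.

```python
import pandas as pd

# Stand-in for pd.DataFrame(arff.loadarff(...)[0]); the last column is
# the (string) class label, the rest are the numeric features
train = pd.DataFrame({'sepal_length': [5.1, 4.9],
                      'sepal_width': [3.5, 3.0],
                      'class': ['setosa', 'versicolor']})

X_train = train.drop(columns='class').astype(float).to_numpy()  # numeric only
y_train = train['class'].to_numpy()                             # labels only
print(X_train.dtype)  # float64
```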

How to represent bounds of variables in scipy.optimize where a bound is a function of another variable
I want to solve an LP optimization problem where the upper bounds of a few variables are not constants but functions of another variable. As an example, i, j and k are three variables, and the bounds are 0 <= i <= 100, 0 <= j <= i-1 and 0 <= k <= j-1. How can we represent such non-constant bounds in scipy's LP solver?
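A dependent bound like 0 <= j <= i-1 can be rewritten as the pair of linear constraints j >= 0 and j - i <= -1, which fits scipy.optimize.linprog's A_ub/b_ub form. A sketch; the objective here (maximizing i + j + k) is made up purely for illustration:

```python
from scipy.optimize import linprog

# Variables x = [i, j, k]; linprog minimizes, so negate to maximize i+j+k.
c = [-1, -1, -1]

# j <= i - 1  ->  -i + j + 0k <= -1
# k <= j - 1  ->   0i - j + k <= -1
A_ub = [[-1, 1, 0],
        [0, -1, 1]]
b_ub = [-1, -1]

# Only the constant parts of the bounds go in `bounds`.
bounds = [(0, 100), (0, None), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)  # optimum at i=100, j=99, k=98
```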
Get predictions from clogitLasso() model?
How would I get predictions from a clogitLasso model? It will give me a sequence of penalty weights, and the covariate coefficients that go with them, but what I'd like to do next would be to choose one of those weights and predict using the associated model. (Then I can evaluate the model using AUC or some such.)
Open to suggestions using a different library, as well.
(Open to getting bounced to CrossValidated, as well, but this isn't really a theoretical question. . . .)

Two different cost in Logistic Regression cost function
I am implementing the logistic regression algorithm with two features, x1 and x2, and I am writing the code for the cost function.
def computeCost(X,y,theta):
    J = ((np.sum(-y*np.log(sigmoid(np.dot(X,theta))) - (1-y)*(np.log(1-sigmoid(np.dot(X,theta))))))/m)
    return J
Here my X is the training set matrix and y is the output. The shape of X is (100, 3) and the shape of y is (100,), as determined by numpy's shape attribute. My theta initially contains all zero entries and has shape (3, 1). When I calculate the cost with these parameters I get 69.314, but that is incorrect; the correct cost is 0.69314. I actually get the correct cost when I reshape my y vector as
y = numpy.reshape(y, (-1, 1))
But I didn't actually get how this reshaping corrects my cost. Here m (the number of training examples) is 100.
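What the reshape changes, sketched below: with y of shape (100,) and sigmoid(X·theta) of shape (100, 1), y * h broadcasts to (100, 100), so np.sum adds 100 times too many terms, giving 69.314 = 100 × 0.69314. Reshaping y to (100, 1) keeps the product element-wise.

```python
import numpy as np

y = np.zeros(100)          # shape (100,), as the question's y
h = np.full((100, 1), 0.5) # shape (100, 1), like sigmoid(X @ theta) with theta = 0

# (100,) * (100, 1) broadcasts to (100, 100): np.sum then adds 100x too
# many terms, which is exactly the factor between 69.314 and 0.69314.
print((y * h).shape)       # (100, 100)

# reshape(y, (-1, 1)) makes the product element-wise again:
print((np.reshape(y, (-1, 1)) * h).shape)  # (100, 1)
```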
r software logistic regression
str(full_data)
'data.frame':   66 obs. of  51 variables:
 $ sex                                         : Factor w/ 2 levels "1","2": 1 2 2 1 2 1 2 2 2 2 ...
 $ race                                        : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 5 1 1 1 1 1 1 ...
 $ religion                                    : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 5 1 1 1 1 1 1 ...
 $ place.of.origin                             : Factor w/ 3 levels "1","2","3": 2 3 3 3 3 1 2 3 3 2 ...
 $ recidence.at.university.time                : Factor w/ 3 levels "1","2","4": 2 2 2 2 1 2 1 1 1 2 ...
 $ marital.status                              : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ having.mental.disorders                     : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
 $ family.history                              : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
 $ adequate.sleeping                           : Factor w/ 2 levels "1","2": 2 1 1 2 1 1 1 1 1 1 ...
 $ having.sleeping.diturbances                 : Factor w/ 2 levels "1","2": 2 1 2 2 2 2 2 2 2 2 ...
 $ frequency.of.exercising                     : Factor w/ 3 levels "1","2","3": 1 1 1 1 2 1 1 1 2 1 ...
 $ chronic.disorders                           : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
 $ medication                                  : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
 $ frquency.of.drinking.alcohol                : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ frquency.of.smoking                         : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ loosing.weight                              : Factor w/ 2 levels "1","2": 2 1 2 2 1 2 2 2 1 2 ...
 $ facing.assault                              : Factor w/ 3 levels "1","3","4": 1 2 1 2 1 1 3 1 1 1 ...
 $ unemployment                                : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 4 1 1 1 1 ...
 $ decrasing.family.income                     : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 4 1 1 2 1 ...
 $ cutoff.schoolarship                         : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ breakdown.a.steady.relationship             : Factor w/ 3 levels "1","3","4": 1 3 1 1 1 2 1 1 1 1 ...
 $ facing.parent.s.death                       : Factor w/ 2 levels "1","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ facing.family.members.s.death               : Factor w/ 3 levels "1","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ facing.close.friend.s.death                 : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ facing.relative.s.death                     : Factor w/ 4 levels "1","2","3","4": 1 3 1 1 1 1 1 1 1 1 ...
 $ stolen.valuable.thing                       : Factor w/ 4 levels "1","2","3","4": 1 2 4 2 1 1 1 1 2 3 ...
 $ problems.to.face.exams                      : Factor w/ 4 levels "1","2","3","4": 2 1 1 1 1 2 1 1 2 3 ...
 $ failure.to.achieve.parents..expectations    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 3 ...
 $ failure.to.achieve.lecturers..expectations  : Factor w/ 3 levels "1","2","3": 2 1 1 1 1 3 1 1 1 3 ...
 $ failure.to.achieve.expected.exam.results    : Factor w/ 4 levels "1","2","3","4": 2 3 1 1 1 3 1 1 2 4 ...
 $ clinical.problems.due.to.far.away.hospitals : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ clinical.problems.due.to.easily.get.diseases: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 3 ...
 $ clinical.problems.due.to.financial.problems : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 3 ...
 $ having.relationship.problems                : Factor w/ 4 levels "1","2","3","4": 1 4 1 1 1 2 1 1 1 1 ...
 $ being.single                                : Factor w/ 3 levels "1","2","3": 1 2 1 1 1 2 1 1 1 1 ...
 $ seperation.from.parents                     : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ parents.devorce                             : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ having.problems.with.a.close.friend         : Factor w/ 4 levels "1","2","3","4": 1 3 1 1 1 1 1 1 1 2 ...
 $ having.problems.with.relative               : Factor w/ 3 levels "1","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ court..police.appearance                    : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ lack.of.friends                             : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ lack.of.extracurricular.activities          : Factor w/ 4 levels "1","2","3","4": 1 3 2 3 1 1 1 1 2 1 ...
 $ hair.problems                               : Factor w/ 4 levels "1","2","3","4": 1 1 3 1 1 1 1 1 2 1 ...
 $ height.problems                             : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ problem.with.skin.colour                    : Factor w/ 3 levels "1","2","3": 1 1 2 1 1 1 1 1 1 3 ...
 $ physical.disabilities                       : Factor w/ 4 levels "1","2","3","4": 1 1 2 1 1 1 1 1 2 2 ...
 $ family.economical.status                    : Factor w/ 15 levels "1","11","12",..: 1 10 15 14 15 4 8 8 10 7 ...
 $ doing.a.part.time.job                       : Factor w/ 2 levels "1","2": 1 2 2 1 2 2 2 2 2 2 ...
 $ educational.loan                            : Factor w/ 2 levels "1","2": 1 2 2 2 2 1 1 2 2 2 ...
 $ having.schoolarships                        : Factor w/ 2 levels "1","2": 1 2 2 2 2 1 1 1 1 1 ...
 $ depression                                  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
I get an error like
model <- glm(depression ~ ., data=training, family="binomial")
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
What should I do?

Gradient descent algorithm in MATLAB
First, thank you for taking the time to help. I thought I had a really good handle on the calculus behind gradient descent, but for some reason the algorithm only gets very close to the optimal parameter values.
My algorithm results in parameter values of -3.5884 and 1.1237 for minimizing the cost function, when the correct values are -3.6303 and 1.1664.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
for iter = 1:num_iters
    for i = 1:m
        hypothesis = (theta(1) + theta(2)*X(i,2));
        d1 = (1/m) * (hypothesis - y(i));
        d2 = (X(i,2)/m) * (hypothesis - y(i));
        theta(1) = theta(1) - d1 * alpha;
        theta(2) = theta(2) - d2 * alpha;
    end
end
end
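One thing to check, shown here as a Python/NumPy sketch since the same fix applies in MATLAB: updating theta inside the per-example loop makes this stochastic gradient descent, which only circles near the optimum; accumulating the full batch gradient before each update converges to it. The toy data below (an exact line) is made up for illustration.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    # Batch gradient descent: the whole gradient is computed from the
    # current theta before theta changes, unlike the per-example update.
    m = len(y)
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m   # full-batch gradient
        theta = theta - alpha * grad
    return theta

X = np.c_[np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])]  # bias column + x
y = np.array([1.0, 3.0, 5.0, 7.0])                     # exactly y = 1 + 2x
theta = gradient_descent(X, y, np.zeros(2), 0.1, 5000)
print(np.round(theta, 4))  # [1. 2.]
```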

How to write Multiplicative Update Rules for Matrix Factorization when one doesn't have access to the whole matrix?
So we want to approximate the matrix A, which has m rows and n columns, with the product of two matrices P and Q of dimensions m×k and k×n respectively. Here is an implementation of the multiplicative update rule due to Lee, in C++ using the Eigen library.
void multiplicative_update() {
    Q = Q.cwiseProduct((P.transpose()*matrix).cwiseQuotient(P.transpose()*P*Q));
    P = P.cwiseProduct((matrix*Q.transpose()).cwiseQuotient(P*Q*Q.transpose()));
}
where P, Q, and the matrix (matrix = A) are global variables in the class mat_fac. Thus I train them using the following method:
void train_2() {
    double error_trial = 0;
    for (int count = 0; count < num_iterations; count++) {
        multiplicative_update();
        error_trial = (matrix - P*Q).squaredNorm();
        if (error_trial < 0.001) {
            break;
        }
    }
}
where num_iterations is also a global variable in the class mat_fac.
The problem is that I am working with very large matrices and, in particular, I do not have access to the entire matrix. Given a triple (i, j, matrix[i][j]), I have access to the row vector P[i][:] and the column vector Q[:][j]. So my goal is to rewrite the multiplicative update rule in such a way that I update these two vectors every time I see a nonzero matrix value.
In code, I want to have something like this:
void multiplicative_update(int i, int j, double mat_value) {
    Eigen::MatrixXd q_vect = get_vector(1, j); // get_vector returns Q[:][j] as a column vector
    Eigen::MatrixXd p_vect = get_vector(0, i); // get_vector returns P[i][:] as a column vector

    // Somehow compute coeff_AQ_t, coeff_PQQ_t, coeff_P_tA and coeff_P_tA.

    for (int i = 0; i < k; i++) {
        p_vect[i] = p_vect[i] * (coeff_AQ_t)/(coeff_PQQ_t);
        q_vect[i] = q_vect[i] * (coeff_P_tA)/(coeff_P_tA);
    }
}
Thus the problem boils down to computing the required coefficients given the two vectors. Is this a possible thing to do? If not, what more data do I need for the multiplicative update to work in this manner?
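Not an answer in terms of the exact multiplicative coefficients, but a common alternative worth naming: the Lee-Seung numerators and denominators involve full row-column products over A, so with only (i, j, A[i][j]) visible, many implementations switch to a per-entry gradient (SGD) update on the squared error instead. A Python/NumPy sketch; the learning rate and the toy matrix are made up, and only the visible row of P and column of Q are touched, as the question requires.

```python
import numpy as np

def sgd_update(P, Q, i, j, value, lr=0.01):
    # Gradient step on (value - P[i,:] @ Q[:,j])**2 for one observed entry;
    # both factors are updated from the pre-update values.
    err = value - P[i, :] @ Q[:, j]
    p_old = P[i, :].copy()
    P[i, :] += lr * err * Q[:, j]
    Q[:, j] += lr * err * p_old

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [3.0, 4.0]])   # toy stand-in for the big matrix
P, Q = rng.random((2, 2)), rng.random((2, 2))
for _ in range(2000):                     # many passes over the observed entries
    for i in range(2):
        for j in range(2):
            sgd_update(P, Q, i, j, A[i, j])
print(np.round(P @ Q, 3))
```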

Tensorflow Projected Gradient Descent with Box Constraints using native Optimizer's apply_gradients
Say that our model parameters w have box constraints (e.g. 0 < w_i < 1). How can I implement projected gradient descent in TensorFlow, respecting these constraints, when I optimize using a subclass of tf.Optimizer (e.g. AdamOptimizer)?
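Sketching the usual recipe in plain NumPy first (the TensorFlow version is analogous: after optimizer.apply_gradients, assign tf.clip_by_value(w, 0, 1) back into each variable, e.g. via w.assign); the quadratic objective below is a made-up example.

```python
import numpy as np

def pgd(grad, w, lr=0.1, steps=200, lo=0.0, hi=1.0):
    # Projected gradient descent: an ordinary gradient step followed by
    # projection back into the box [lo, hi] (for a box, the projection
    # is just element-wise clipping).
    for _ in range(steps):
        w = w - lr * grad(w)
        w = np.clip(w, lo, hi)
    return w

# minimize (w - 2)^2 subject to 0 <= w <= 1: the projected optimum is 1
w = pgd(lambda w: 2 * (w - 2.0), np.array([0.5]))
print(w)  # [1.]
```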