Choosing the number of samples to average the gradient over in scikit-learn's SGDClassifier
I'm aware that the SGDClassifier in scikit-learn picks one random sample from the training dataset at a time to calculate the gradient, and updates the model weights (w and b) accordingly.
My question is: among the parameters of the SGDClassifier, there doesn't seem to be an option to select the number of samples to average the gradient over (instead of a single instance). That would give us Mini-batch Gradient Descent.
I've already had a look at the partial_fit() method, which takes chunks of the training dataset to train on, but when it is used with the SGDClassifier, doesn't it just boil down to picking a random training instance from the chunk instead of from the whole dataset?
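For what it's worth, SGDClassifier exposes no batch-size parameter: its internal update is always per-sample. What you can control is which chunk of data each partial_fit call sweeps over. A minimal sketch (with made-up data) of streaming minibatches this way; note that inside each call the weights are still updated one sample at a time, not on an averaged gradient:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # made-up data
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # linearly separable labels

clf = SGDClassifier(random_state=0)
classes = np.unique(y)
batch_size = 32
for start in range(0, len(X), batch_size):
    batch = slice(start, start + batch_size)
    # one sweep over this chunk; updates are still per-sample internally
    clf.partial_fit(X[batch], y[batch], classes=classes)
```

True minibatch gradient averaging is not supported by SGDClassifier; it would have to be implemented by hand.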
See also questions close to this topic

Create new Pandas columns using the value from previous row
I need to create two new Pandas columns using logic applied to the value from the previous row.
I have the following data:
Day   Vol  Price  Income  Outgoing
  1   499     75
  2  3233     90
  3  1812     70
  4  2407     97
  5  3474     82
  6  1057     53
  7  2031     68
  8   304     78
  9  1339     62
 10  2847     57
 11  3767     93
 12  1096     83
 13  3899     88
 14  4090     63
 15  3249     52
 16  1478     52
 17  4926     75
 18  1209     52
 19  1982     90
 20  4499     93
My challenge is to come up with logic where both the Income and Outgoing columns (currently empty) get the value of (Vol * Price). The Income column should carry this value when the previous day's Price is lower than the present one; the Outgoing column should carry it when the previous day's Price is higher. The remaining Income and Outgoing cells should just hold NaN. If the Price is unchanged from the previous day, that day's row is to be dropped.
The logic should start at day (n + 1): the first row is skipped and the rule applies from row 2 onwards.
I have tried using shift in my code example such as:
if sample_data['Price'].shift(1) < sample_data['Price'].shift(2):
    sample_data['Income'] = sample_data['Vol'] * sample_data['Price']
else:
    sample_data['Outgoing'] = sample_data['Vol'] * sample_data['Price']
But it isn't working.
I feel there should be a simpler, more comprehensive way to go about this; could someone please help?
Update (The final output should look like this):
For day 16, the row is deleted because days 15 and 16 have the same price.
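A possible vectorized sketch, on a made-up subset of the data above, and assuming that "previous day" means the previous surviving row after unchanged-price rows are dropped:

```python
import numpy as np
import pandas as pd

# made-up subset of the table above (day 5 repeats day 4's price)
df = pd.DataFrame({'Day': [1, 2, 3, 4, 5],
                   'Vol': [499, 3233, 1812, 2407, 3474],
                   'Price': [75, 90, 70, 97, 97]})

df = df[df['Price'] != df['Price'].shift(1)].copy()  # drop unchanged-price days
value = df['Vol'] * df['Price']
prev = df['Price'].shift(1)                          # previous surviving day's price
df['Income'] = value.where(df['Price'] > prev)       # previous price lower
df['Outgoing'] = value.where(df['Price'] < prev)     # previous price higher
```

The first row gets NaN in both columns because its shifted comparison is undefined, which matches the "start at row 2" requirement.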

When I use basemap in python, longitude is sometimes reversed, flipping the map
In Python, I use basemap (https://matplotlib.org/basemap/) for plotting spatial data, and I've used it for several years without any large problems. I recently had to reinstall python3 (through conda, along with a number of modules), and basemap now has a strange issue: under certain conditions, the map is displayed with flipped longitudes, switching east and west. As an example, I use this code: https://matplotlib.org/basemap/users/robin.html. If I use that code as-is, the map displays fine, but when I set lon_0=180, the map gets flipped, as shown in the image below.
Setting lon_0 to any positive number results in a flipped map, while 0 or negative numbers result in a correct map. lon_0 should simply set the central longitude of the plotted map, and should not have this behavior, so I'm unsure what's going on. Has anyone seen this behavior before, or have suggestions for how to fix it? I could alter my code to work around it, but I'd rather have things work properly.
I am using Python 3.7.3. I've tried updating basemap with the command "conda install -c anaconda basemap", but it tells me that basemap is already up to date.
Here is the code. It is identical to the code linked above, but with lon_0 set to 180.
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
# lon_0 is central longitude of projection.
# resolution = 'c' means use crude resolution coastlines.
m = Basemap(projection='robin', lon_0=180, resolution='c')
m.drawcoastlines()
m.fillcontinents(color='coral', lake_color='aqua')
# draw parallels and meridians.
m.drawparallels(np.arange(-90., 120., 30.))
m.drawmeridians(np.arange(0., 360., 60.))
m.drawmapboundary(fill_color='aqua')
plt.title("Robinson Projection")
plt.show()
When I run the code, the only output is this, which seems unrelated:
map_test.py:36: MatplotlibDeprecationWarning: The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead. m = Basemap(projection='robin',lon_0=180,resolution='c')
Any ideas?

Use format() on variables
Is it possible to use the "{}".format() logic, but with a variable holding the template instead of a string literal?
x = 1
y = 2
a = 'some {} random {} text'
print(a.format(x, y))
I expect the following output:
some 1 random 2 text

Delete points in a plot until the 2D KDE is everywhere under a certain level
Imagine you have a 2D dataset and you want to draw the points in a plot, but there are too many.
With a 2D KDE (kernel density estimation) I can determine the density of the points across the dataset. I want to delete points in the densest areas until the kernel density estimate is below a certain value everywhere. How can I do that?
Yes, there are better ways to solve the problem of overplotting, but my problem is a bit different, so I have to do it exactly as described.
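A brute-force sketch, assuming scipy is available and that "everywhere" can be approximated by checking the KDE at the remaining sample positions: refit the KDE and delete the point where the estimate is highest, until the maximum falls below the chosen level.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
pts = rng.normal(size=(300, 2))   # made-up overplotted data

level = 0.15                      # target maximum density (arbitrary choice)
while True:
    dens = gaussian_kde(pts.T)(pts.T)            # KDE evaluated at each point
    if dens.max() <= level:
        break
    pts = np.delete(pts, dens.argmax(), axis=0)  # drop the densest point
```

Refitting after every deletion is O(n^2) per step; for large datasets, deleting a small batch of the densest points per refit would be much faster at the cost of possibly overshooting slightly.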

Setting the criteria for node splitting in a Decision Tree
I want to supply a custom metric that should be optimised when deciding which split to use in a decision tree, replacing the standard Gini index. How can I do this in any decision-tree package? It could also be boosted decision trees.

What algorithm would be optimal for a mitochondrial dynamics computer vision project?
I'm doing a project on mitochondrial fission and fusion and I want to build a computer vision program that can tell how divided mitochondria are from microscope images. Here are example images with classification: http://essays.biochemistry.org/content/ppebio/62/3/341/F1.large.jpg What algorithm would be suitable for my goal?

Why doesn't Tfidf calculate it correctly?
I'm doing dialect text classification. In the following image, if you sum the tfidf values of the egypt column, the total is smaller than that of the hijazi column. Even so, it still classifies the text as egypt and not hijazi. Is this how tfidf works? Do you just sum the values, and whichever is bigger is the chosen classification?
Because this is an arabic text, you can assume we have this english text:
"You can do your best"
and then replace every word in that English text in the tokens column.

from sklearn.pipeline import Pipeline
text_clf = Pipeline([
    ('vect', TfidfVectorizer(stop_words=ar_stop)),
    ('clf', MultinomialNB())])
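For reference, the MultinomialNB in a pipeline like this does not simply sum the tfidf columns: it scores each class by its log prior plus the tfidf-weighted log feature likelihoods, and predicts the argmax. A small sketch (with made-up English documents and hypothetical labels) that reproduces predict from those quantities:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["you can do your best", "he did his best work"]  # made-up stand-ins
labels = ["egypt", "hijazi"]                             # hypothetical labels

vect = TfidfVectorizer()
X = vect.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)

# joint log-likelihood per class: log prior + tfidf-weighted log likelihoods
scores = X.toarray() @ clf.feature_log_prob_.T + clf.class_log_prior_
predicted = clf.classes_[scores.argmax(axis=1)]          # same as clf.predict(X)
```

So a class can win even when the raw tfidf column sum is smaller, because the learned per-class word likelihoods and priors weight the features.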

How to use ColumnTransformer for categorical data?
I am trying to preprocess data.
data = {'Country': ['Germany', 'Turkey', 'England', 'Turkey', 'Germany', 'Turkey'],
        'Age': ['44', '32', '27', '29', '31', '25'],
        'Salary': ['5400', '8500', '7200', '4800', '6200', '10850'],
        'Purchased': ['yes', 'yes', 'no', 'yes', 'no', 'yes']}
df = pd.DataFrame(data)
X = df.iloc[:, 0].values
Expected result is like this:
1  0  0  44  5400   1
0  1  0  32  8500   1
0  0  1  27  7200   0
0  1  0  29  4800   1
1  0  0  31  6200   0
0  1  0  25  10850  1
Here is the code that failed.
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([("city_category", OneHotEncoder(dtype='int'), [0])],
                       remainder="passthrough")
X = ct.fit_transform(X)
Output:
IndexError: tuple index out of range
I would like to learn how to use ColumnTransformer in situations like this.
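The IndexError likely comes from passing a 1-D array: df.iloc[:, 0].values is a single column, while ColumnTransformer expects 2-D input. A minimal sketch of a possible fix, selecting all feature columns (turning Purchased into 0/1 as in the expected output would be a separate encoding step):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

data = {'Country': ['Germany', 'Turkey', 'England', 'Turkey', 'Germany', 'Turkey'],
        'Age': ['44', '32', '27', '29', '31', '25'],
        'Salary': ['5400', '8500', '7200', '4800', '6200', '10850'],
        'Purchased': ['yes', 'yes', 'no', 'yes', 'no', 'yes']}
df = pd.DataFrame(data)
X = df.iloc[:, :-1].values        # 2-D array: Country, Age, Salary

ct = ColumnTransformer([("city_category", OneHotEncoder(dtype='int'), [0])],
                       remainder="passthrough")
X = ct.fit_transform(X)           # one-hot country + passthrough Age, Salary
```

The result has three one-hot columns (one per country) followed by the passed-through Age and Salary columns.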

A puzzle about minibatch SGD code implementations
I get a code implementation of minibatch SGD as follows:
# compute_loss is used to calculate the loss function
# compute_grad is used to calculate the gradient of the loss function at w
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    dw = compute_grad(X_expanded[ind, :], y[ind], w)
    w = w - eta * dw
I am puzzled by the line
ind = np.random.choice(X_expanded.shape[0], batch_size)
In my opinion, this line provides the stochasticity, so that we get a stochastic direction for the gradient descent step.
But I am not really certain about that.
Could anyone give me some hints or an explanation?
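As a hint: np.random.choice(X_expanded.shape[0], batch_size) draws batch_size random row indices, so the gradient is computed on a random minibatch rather than the full dataset; that sampling is exactly what makes the descent direction stochastic. A self-contained sketch (with made-up data and a squared-error gradient standing in for the original compute_grad):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # made-up design matrix
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                           # noiseless targets

def compute_grad(Xb, yb, w):
    # gradient of mean squared error on the sampled minibatch
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

w, eta, batch_size = np.zeros(3), 0.1, 16
for _ in range(500):
    ind = rng.choice(X.shape[0], batch_size)  # random row indices = minibatch
    w = w - eta * compute_grad(X[ind], y[ind], w)
```

Note that in the snippet from the question, loss[i] is evaluated on the full dataset (for monitoring) while the gradient uses only the sampled rows.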
Overflow of the square function during the gradient descent calculation
I have written linear regression (in one variable) along with gradient descent. It works fine for a smaller dataset, but for a larger dataset it gives this error:
OverflowError: (34, 'Numerical result out of range')
The code raises the error in the following part:
def gradient_des(theta0, theta1, x, y):
    result = 0
    sumed = 0
    if len(x) == len(y):
        for i in range(len(x)):
            sumed = sumed + (line(theta0, theta1, x[i]) - y[i])**2  # error shown in this line
        result = sumed / (2 * len(x))
        return result
    else:
        print("x and y are of unequal length")

# x and y below were generated for testing purposes
x = []
for i in range(10):
    x = x + [i]
print(x)
# x = [1,2,3,4,5,6]
y = [0 for _ in range(len(x))]
for i in range(len(y)):
    y[i] = random.randint(-100, 100)
print(y)
# y = [13,10,8.75,4,5.5,2]
Why is this overflow occurring?
Further along in the code, changing the learning factor (i.e. alpha) also matters: the code sometimes runs for alpha = 0.1 but not for alpha = 1 (for a smaller, known dataset).
def linear_reg(x, y):
    if len(x) == len(y):
        theta0 = random.randint(-10, 10)
        theta1 = random.randint(-10, 10)
        alpha = 0.1  # problem: how to decide whether the factor should be small or large
        while gradient_des(theta0, theta1, x, y) != 0:  # probably error in this converging condition
            temp0 = theta0 - alpha * summed_lin(theta0, theta1, x, y)
            temp1 = theta1 - alpha * summed_lin_weighted(theta0, theta1, x, y)
            # print(temp0)
            # print(temp1)
            if theta0 != temp0 and theta1 != temp1:
                theta0 = temp0
                theta1 = temp1
            else:
                break
        return [theta0, theta1]
    else:
        print("x and y are of unequal length")
For alpha = 1 it gives the same error as above. Shouldn't the regression be independent of alpha (at least for smaller values)?
The full code is here: https://github.com/Transwert/General_purposes/blob/master/linreg.py
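Regarding the alpha question: the result is not independent of alpha. Gradient descent only converges when alpha is below a stability limit that depends on the scale of x; above it, theta grows geometrically at every step until squaring it exceeds the float range, which is consistent with the OverflowError. A toy sketch (made-up data y = 2x, not the code from the question) showing both regimes:

```python
def descend(alpha, steps=50):
    # one-parameter gradient descent on mean squared error for y = 2x
    theta = 0.0
    xs = list(range(10))
    ys = [2 * x for x in xs]
    for _ in range(steps):
        grad = sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        theta -= alpha * grad
    return theta

good = descend(0.01)   # small alpha: converges toward the true slope 2
bad = descend(0.1)     # alpha past the stability limit: iterates blow up
```

Rescaling the features (so the stability limit is less sensitive to the data) or shrinking alpha are the usual remedies.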

Why can't Forward Stagewise Additive Modeling work with absolute loss function?
In Forward Stagewise Additive Modeling, if the loss function is squared loss, the next weak learner fits the residual error.
Why don't we do the same when the loss function is absolute error or something else?
Why is Gradient Boosting better in this situation?
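One way to see it: with squared loss, the negative gradient of the loss with respect to the current prediction F is exactly the residual y - F, so "fit the residual" and "fit the negative gradient" coincide. With absolute loss, the negative gradient is only sign(y - F), so fitting raw residuals no longer follows the loss; gradient boosting instead always fits the negative gradient, which is why it generalizes to absolute and other losses. A tiny sketch with made-up numbers:

```python
import numpy as np

y = np.array([3.0, -1.0, 2.0])    # made-up targets
F = np.array([1.0, 1.0, 2.5])     # made-up current ensemble predictions

residual = y - F                  # what squared-loss stagewise fitting targets
neg_grad = np.sign(residual)      # what gradient boosting targets under L1 loss
```

The +/-1 targets ignore the magnitude of the errors, which is exactly what makes the absolute-loss fit robust to outliers.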