Not a duplicate: How can I search for the first occurrence of a number less than a threshold in a 1D numpy array?
This question was incorrectly marked as a duplicate.
I have an n x 1 numpy array. I want to find the first occurrence of an entry in the array that is less than a threshold.
My code is as follows:
import numpy as np
aa = np.array([4,3,5,7])
print(aa)
np.argmin(aa<3)
output:
[ 4 3 5 7]
0
I expect argmin to return 2, but I'm getting 0. How can I make this work?
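A likely explanation, sketched below: aa < 3 produces a boolean array, and argmin returns the position of its first minimum, i.e. the first False (and with aa = [4, 3, 5, 7] no entry is below 3, so every element is False and index 0 comes back). To find the first entry below a threshold, argmax on the boolean mask returns the index of the first True. The threshold of 5 here is illustrative, not from the question:

```python
import numpy as np

aa = np.array([4, 3, 5, 7])
threshold = 5  # illustrative threshold; with threshold 3 no entry qualifies

mask = aa < threshold        # boolean array: [ True,  True, False, False]
if mask.any():
    first = np.argmax(mask)  # argmax returns the index of the first True
else:
    first = None             # no entry is below the threshold
```

Note the any() guard: argmax on an all-False mask also returns 0, which is indistinguishable from a genuine match at index 0.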
See also questions close to this topic

Selenium Python: unable to scroll down while fetching Google reviews
I am trying to fetch google reviews with the help of selenium in python. I have imported webdriver from selenium python module. Then I have initialized self.driver as follows:
self.driver = webdriver.Chrome(executable_path="./chromedriver.exe",chrome_options=webdriver.ChromeOptions())
After this I am using the following code to type the company name on google homepage whose reviews I need, for now I am trying to fetch reviews for "STANLEY BRIDGE CYCLES AND SPORTS LIMITED ":
company_name = self.driver.find_element_by_name("q")
company_name.send_keys("STANLEY BRIDGE CYCLES AND SPORTS LIMITED ")
time.sleep(2)
After this, I click the Google search button using the following code:
self.driver.find_element_by_name("btnK").click()
time.sleep(2)
Then finally I am on the results page. Now I want to click the "View all Google reviews" button, using the following code:
self.driver.find_elements_by_link_text("View all Google reviews")[0].click()
time.sleep(2)
Now I am able to get reviews, but only 10. I need at least 20 reviews for a company. For that I am trying to scroll the page down using the following code:
self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
Even when using the above code to scroll down the page, I still get only 10 reviews, and no error is raised.
I need help scrolling the page down so that I can get at least 20 reviews; as of now I get only 10. Based on my online search for this issue, people have mostly used driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") to scroll the page down whenever required, but for me this is not working: I checked the height of the page before and after the call, and it is the same.
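One plausible cause, sketched below: Google's review dialog is an inner scrollable pane, so scrolling the window/body does nothing (which matches the unchanged page height). The fix is to scroll that inner container instead. The CSS selector here is a hypothetical placeholder; inspect the page to find the real one, as Google's class names change often:

```python
import time

# Hypothetical selector for the scrollable reviews pane -- inspect the
# page to find the actual one; Google's markup changes frequently.
REVIEWS_PANE_SELECTOR = "div.review-dialog-list"

def scroll_reviews(driver, times=3, pause=2):
    """Scroll the inner reviews container (not the window) several times,
    pausing so each new batch of reviews has time to load."""
    script = (
        "var pane = document.querySelector(arguments[0]);"
        "pane.scrollTop = pane.scrollHeight;"
    )
    for _ in range(times):
        driver.execute_script(script, REVIEWS_PANE_SELECTOR)
        time.sleep(pause)
```

With the real selector, calling scroll_reviews(self.driver) before collecting the review elements should let more than 10 load.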

Create a new column from iterated rows of datetime data
I am attempting to create a downward velocity model for offshore drilling which uses the variables Depth (which increases every 1 foot) and DateTime data which is more intermittent and is only updated every foot of depth:
Dept   DateTime
1141   5/24/2017 04:31
1142   5/24/2017 04:32
1143   5/24/2017 04:40
1144   5/24/2017 04:42
1145   5/25/2017 04:58
I am trying to get a new Velocity column, where each row's velocity is the change in Dept divided by the DateTime gap.
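A minimal sketch of one way to do this without iterating, using pandas diff() on both columns (the data is copied verbatim from the question, including the 5/25 date on the last row, so the final gap spans a full day):

```python
import pandas as pd

df = pd.DataFrame({
    "Dept": [1141, 1142, 1143, 1144, 1145],
    "DateTime": pd.to_datetime([
        "5/24/2017 04:31", "5/24/2017 04:32", "5/24/2017 04:40",
        "5/24/2017 04:42", "5/25/2017 04:58",
    ]),
})

# Velocity in feet per minute: depth change over the time gap between rows.
# The first row has no previous row, so its velocity is NaN.
df["Velocity"] = df["Dept"].diff() / (df["DateTime"].diff().dt.total_seconds() / 60)
```

diff() aligns each row with its predecessor, so the whole column is computed in one vectorized step instead of a row loop.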

one to one mapping in shell script
I am in the process of a migration, moving from an old set of servers to a new set, where there is no logical relationship between the server names in the two sets. I have a script that runs on the old server and takes all the necessary backups, and then runs another script to copy the backups to the new server and execute it.
I can combine both scripts (taking the backup and copying to the new server) if I can include logic to map the old server to the new server. Is there a way I can do this?
Old server   New server
King         Queen
Bat          Ball
water        fire
sand         rock
What I am expecting is, if the script is run on server 'King', I want the script to identify that the corresponding new server is 'Queen' and copy the backups to Queen.
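One simple way to sketch this, using a case statement keyed on the host name (the server names are taken from the question's table; the copy command and paths are hypothetical placeholders):

```shell
#!/bin/sh
# Map the old server name to its new counterpart.
map_server() {
    case "$1" in
        King)  echo "Queen" ;;
        Bat)   echo "Ball"  ;;
        water) echo "fire"  ;;
        sand)  echo "rock"  ;;
        *)     echo "unknown server: $1" >&2; return 1 ;;
    esac
}

# On the old server this resolves the destination automatically:
# new_server=$(map_server "$(hostname)")
# scp -r /path/to/backups "$new_server":/path/to/backups   # hypothetical copy step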

Sorting in R and Numpy
I am trying to convert some R code into numpy. I have a vector as follows:
r=[2.00000 1.64000 1.36000 1.16000 1.04000 1.00000 1.64000 1.28000 1.00000 0.80000 0.68000 0.64000 1.36000 1.00000 0.72000 0.52000 0.40000 0.36000 1.16000 0.80000 0.52000 0.32000 0.20000 0.16000 1.04000 0.68000 0.40000 0.20000 0.08000 0.04000 1.00000 0.64000 0.36000 0.16000 0.04000 0.00000]
I am trying to convert following R code
index <- order(r)
into numpy by following code
index = np.argsort(r)
Here are the results
Numpy
index=array([35, 29, 34, 28, 33, 23, 27, 22, 21, 32, 17, 16, 26, 15, 20, 11, 31,25, 10, 14, 9, 19, 30, 5, 8, 13, 4, 24, 18, 3, 7, 12, 2, 6, 1, 0])
R
index= [36 30 35 29 24 34 23 28 22 18 33 17 27 16 21 12 32 11 26 15 10 20 6 9 14 31 5 25 4 19 8 3 13 2 7 1]
As you can see, the results are different. How can I obtain R's results in numpy?
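Two differences seem to be in play here: R is 1-based while numpy is 0-based, and R's order() breaks ties by position (a stable sort), while numpy's default quicksort does not. A sketch of the likely fix, shown on a small vector with ties rather than the full 36-element vector:

```python
import numpy as np

r = np.array([2.0, 1.64, 1.36, 1.64, 1.36, 1.0])  # small example with ties

# kind='stable' reproduces R's positional tie-breaking; adding 1
# converts numpy's 0-based indices to R's 1-based ones.
index_r_style = np.argsort(r, kind="stable") + 1
```

Applied to the question's vector, np.argsort(r, kind="stable") + 1 should reproduce R's order(r) output exactly.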

Return value of highest index in numpy 3D array
I have a 3D array in numpy that includes nans. I need to return the value with the greatest index position along the 0 axis. The answer would reduce to a 2D array.
There are a lot of questions about finding the index position of a maximum value along an axis (How to get the index of a maximum element in a numpy array along one axis), but that is different than what I need.
Example 3D array:
>>> import numpy as np
>>> foo = np.asarray([[[7,4,6], [4,2,11], [7,8,9], [4,8,2]],
...                   [[1,2,3], [np.nan,5,8], [np.nan,np.nan,10], [np.nan,np.nan,7]]])
>>> foo
array([[[  7.,   4.,   6.],
        [  4.,   2.,  11.],
        [  7.,   8.,   9.],
        [  4.,   8.,   2.]],

       [[  1.,   2.,   3.],
        [ nan,   5.,   8.],
        [ nan,  nan,  10.],
        [ nan,  nan,   7.]]])
I thought I was getting close using np.where, but it returns all elements that are not nan. Not quite what I need, because I want a (4, 3) array.
>>> zoo = foo[np.where(~np.isnan(foo))]
>>> zoo
array([  7.,   4.,   6.,   4.,   2.,  11.,   7.,   8.,   9.,   4.,   8.,
         2.,   1.,   2.,   3.,   5.,   8.,  10.,   7.])
The answer I need is:
>>> ans = np.asarray([[1,2,3], [4,5,8], [7,8,10], [4,8,7]])
>>> ans
array([[ 1,  2,  3],
       [ 4,  5,  8],
       [ 7,  8, 10],
       [ 4,  8,  7]])
EDIT: I edited the foo example array to make the question more clear.
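One way to sketch this: build a mask of non-nan positions, find the greatest such index along axis 0 by running argmax over the mask reversed along that axis, then gather with take_along_axis:

```python
import numpy as np

foo = np.asarray([[[7, 4, 6], [4, 2, 11], [7, 8, 9], [4, 8, 2]],
                  [[1, 2, 3], [np.nan, 5, 8], [np.nan, np.nan, 10], [np.nan, np.nan, 7]]])

mask = ~np.isnan(foo)
# argmax on the reversed mask finds the first non-nan counting from the
# end, i.e. the greatest index along axis 0 that holds a real value.
idx = mask.shape[0] - 1 - np.argmax(mask[::-1], axis=0)
result = np.take_along_axis(foo, idx[None, ...], axis=0)[0]
```

take_along_axis needs numpy >= 1.15; on older versions, fancy indexing with np.ogrid over the trailing axes achieves the same gather. This assumes every (row, column) position has at least one non-nan value along axis 0, as in the example.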

How to know which axis to do an operation along in numpy?
I've been working with numpy and pandas for a long time, but I'm still usually confused by the concept of doing an operation along an axis.
For example, if I have data of shape [200, 5] and I want the mean with the resulting shape [1, 5], I would first call data.mean(axis=0), and if that doesn't work, I would try data.mean(axis=1).
It turns out axis=0 is correct in this case, but I don't have good terminology to remember which axis to use. Currently, my rule is: whichever axis I want to reduce to 1, that is the axis I apply the operation on. This works fine for reduction operations like mean, sum or std.
But I don't know how to think about operations that do not reduce the shape, like divide, add or sort, etc. (for divide and add on arrays of different shapes, broadcasting is involved).
So I am curious how the people who created pandas and numpy think intuitively about this, and what exactly they mean when they say "sorting along the row axis". I want to understand it so clearly that I know what result to expect when I pass a certain axis!

NaNs at the end of an array are creeping in when using interp1d
I have 2 2D arrays of x and y positions that express values of a contour plot. As I translate the contour plot multiple times, I need to track the original coordinates as they move too, so that I can express a final displacement at all points of the original image.
I am writing in python3 using scipy.interpolate.interp1d. After one iteration, I have no problems, and the coordinates have been successfully interpolated back onto the original grid. I print out the final 10 elements of one of the columns in the 2D array:
import numpy as np
from scipy.interpolate import interp1d

print(y_coordinate[10:, 20])
>>> [ 60.  61.  62.  63.  64.  nan  nan  nan  nan  nan]
I have nans because the displacement was 5 in the y direction of an image 64 pixels wide, so the 5 pixels at the top have shifted down. The problem is that in the next step I have zero displacement in the y direction, so the points do not move here; but they might move elsewhere in my large array, so I need to interpolate over the entire array.
I interpolate column by column using scipy.interpolate.interp1d like so (I remove the array indices for clarity, but they are the same as in the lines above):
f = interp1d(y, y_coordinate, kind='linear', bounds_error=False)
new_y_coordinate = f(y)
where y_coordinate is as printed out above, and y is
>>> [55, 56, 57, 58, 59, 60, 61, 62, 63, 64]
So essentially I want it to return the exact same values as they already lie on the points I want it to interpolate onto. Instead, I get this:
print(new_y_coordinate)
>>> [ 60.  61.  62.  63.  nan  nan  nan  nan  nan  nan]
NaNs are creeping into good data. What's strange is that I first tried this with complex numbers and thought it was an error to do with those. I then changed it to two arrays of real numbers and found the error was solved on the left-hand side of the array (for x-direction translations) but now occurs at the top for y-displacement translations. A simple test I did with a similar situation didn't have this error at all.
Any ideas on what is happening and how I can solve it?
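A likely explanation, sketched below: interp1d treats NaN as ordinary data, so any linear-interpolation interval with a NaN endpoint evaluates to NaN, which eats one good sample at the boundary. One workaround is to drop the NaN samples before building the interpolant; the idea is shown here with np.interp on the question's column, and the same masking applies to interp1d:

```python
import numpy as np

y = np.array([55, 56, 57, 58, 59, 60, 61, 62, 63, 64], dtype=float)
y_col = np.array([60, 61, 62, 63, 64, np.nan, np.nan, np.nan, np.nan, np.nan])

valid = ~np.isnan(y_col)

# Interpolate using only the valid samples; points outside their range
# stay NaN instead of contaminating their neighbours.
new_col = np.full_like(y_col, np.nan)
inside = (y >= y[valid].min()) & (y <= y[valid].max())
new_col[inside] = np.interp(y[inside], y[valid], y_col[valid])
```

The same pattern with interp1d would be f = interp1d(y[valid], y_col[valid], kind='linear', bounds_error=False), so the good samples at the boundary are preserved instead of being averaged with NaN.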

Scipy script not working as expected even after working with toy values
I'm trying to write a curve-fit and chi-squared script. I have some sample data, but for some reason the fit doesn't come out as expected, and I get chi-squared values that don't make sense (I calculated them with MATLAB already, so I know what to expect). I have no idea why the script doesn't work, since it already worked fine with toy values. Here is the script:
import matplotlib.pyplot as plt
import matplotlib as matp
import numpy as np
import scipy.optimize as opt
import math
import pandas as pd
from scipy import stats

# Read the csv file to a DataFrame
df = pd.read_csv('HarmonicData.csv')

# Define data
xdata = df.m
dX = df.dm
ydata = df.Tavg
dY = df.dTavg

def func(x, a, b):
    return a/x**2 + b

# Title and axis title variables
title = 'Fit'
ytitle = 'Tavg [sec]'
xtitle = 'r [cm]'

# Define plot and add errorbars
fig, ax = plt.subplots()
ax.errorbar(xdata, ydata, yerr=dY, xerr=dX, fmt='o', ms=2)

# Axis title setting
ax.set_title(title)
ax.set_ylabel(ytitle)
ax.set_xlabel(xtitle)

# Curve parameter initial guess
paraguess = ([0.230000, 0.300000])

# Curve fit
parafit, pcov = opt.curve_fit(func, xdata, ydata, p0=paraguess)

# Reduced chi squared
chi_sq = np.sum(((func(xdata, *parafit) - ydata)/dX)**2)
red_chi_sq = chi_sq/(len(xdata) - len(parafit))

# Print results
print('initial guess for parameters: a= %.8f b= %.8f' % tuple(paraguess))
print('The degrees of freedom for this test is', len(xdata) - len(parafit))
print('The chi squared value is: ', ("%.2f" % chi_sq))
print('The reduced chi squared value is: ', ("%.2f" % red_chi_sq))

# Show and plot
plt.plot(xdata, func(xdata, *parafit), 'r', label='fit: a=%.3f b=%.3f' % tuple(parafit))
plt.legend()
plt.show()
I have no idea why this happened, and I can't find a reason for it. I would love some help, since I'm not very experienced with scipy/Python in general.
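One thing worth checking (an assumption about intent, not a confirmed diagnosis): reduced chi-squared is conventionally computed with the uncertainties of the dependent variable, but the script above weights the residuals by dX, the x-uncertainties. A minimal sketch of the conventional form:

```python
import numpy as np

def reduced_chi_squared(model_y, y, dy, n_params):
    """Chi-squared of the residuals weighted by the y-uncertainties,
    divided by the degrees of freedom (N data points - N fit parameters)."""
    chi_sq = np.sum(((model_y - y) / dy) ** 2)
    return chi_sq / (len(y) - n_params)
```

With the script's names, that would be reduced_chi_squared(func(xdata, *parafit), ydata, dY, len(parafit)); if MATLAB weighted by dY, this difference alone could explain the mismatched values.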

seaborn distplot() shows negative values when displaying a gamma distribution
I'm trying to display a gamma distribution using sns.distplot(), to avoid plotting its pdf manually, like the following:
import seaborn as sns
import numpy as np
from scipy.stats import gamma

# creating a population
shape, scale, size = 2, 1, 200
pop = np.random.gamma(shape, scale, size)

# plotting
sns.distplot(pop, hist=False, kde=False, fit=gamma, color='r',
             fit_kws={'color': 'r', 'linewidth': 2.5})
For some reason, seaborn shows the following plot with a left tail, but I checked pop.min() and there is no negative value. I have to use plt.xlim(0) to avoid it. Why does this happen, and is there any way to avoid it?
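A likely cause, offered as a guess: distplot's fit= calls gamma.fit with the location parameter left free, so the fitted distribution's support can start below zero even when the data is all positive. Pinning the location with floc=0 keeps the support at x >= 0; since distplot doesn't pass extra keywords to fit, one workaround is to fit and plot the pdf yourself:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
pop = rng.gamma(2, 1, 200)

# floc=0 fixes the location parameter, so the fitted support starts at 0.
a, loc, scale = gamma.fit(pop, floc=0)

x = np.linspace(0, pop.max(), 200)
pdf = gamma.pdf(x, a, loc=loc, scale=scale)
# plt.plot(x, pdf, 'r', linewidth=2.5)  # overlay on the seaborn histogram
```

This trades the distplot convenience for explicit control over both the fit constraints and the plotting range.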
How to vectorize a rolling list comprehension over pandas.DataFrame
I have the following list comprehension:
[(foo(df.iloc[old_row[0]:old_row[1]].cov(),
      df.iloc[new_row[0]:new_row[1]].cov()), new_row)
 for new_row in new_rows_list if old_row != new_row]
where:
- df is a 20000-row, 50-column pandas.DataFrame
- old_row is a tuple giving row coordinates, where old_row[0] < old_row[1] (e.g. (50, 100))
- new_row is the same as old_row, but with different values
- new_rows_list is a list of the new_row tuples
Essentially, I am trying to apply foo to two pandas.DataFrames, where one stays the same and the other changes with each loop iteration. The loop simply takes consecutive chunks of the DataFrame (e.g. first rows 0 to 50, then 51 to 100, then 101 to 150, etc.).
I tried using np.vectorize; however, as both new_rows_list and df are iterables, I cannot achieve any results with that function.
The whole output is a list of tuples, where each tuple is (foo's result, new_row).
I am sure there is a way to escape this loop, but I am stuck here.
Let me know if more clarification is needed, in case my explanation is not sufficient.
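Whether the loop itself can be removed depends on foo, but one concrete speedup, sketched below, is to compute each chunk's covariance exactly once instead of recomputing the fixed old-chunk covariance on every iteration. The DataFrame, row tuples, and foo here are stand-ins for the question's objects (foo is stubbed as a Frobenius-norm difference purely for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((300, 4)))

old_row = (0, 50)
new_rows_list = [(50, 100), (100, 150), (150, 200)]

def foo(cov_a, cov_b):
    # Stand-in for the question's foo, e.g. a distance between covariances.
    return np.linalg.norm(cov_a.values - cov_b.values)

# Compute each chunk's covariance exactly once, then pair them up.
old_cov = df.iloc[old_row[0]:old_row[1]].cov()
chunk_covs = {nr: df.iloc[nr[0]:nr[1]].cov() for nr in new_rows_list}

results = [(foo(old_cov, chunk_covs[nr]), nr)
           for nr in new_rows_list if nr != old_row]
```

With 20000 rows and 50 columns, hoisting old_cov out of the comprehension and caching the chunk covariances removes the dominant repeated work; np.vectorize would not help here, since it is a convenience loop rather than true vectorization.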

Is there a performance advantage to defining numbers as parameters in FORTRAN
I am experimenting with writing vectorized FORTRAN subroutines to be incorporated in the Abaqus finite element solver. Some learning materials define constant numbers which are used in formulae as parameters in the beginning of the code, e.g.:
parameter ( zero = 0.d0, one = 1.d0, two = 2.d0, third = 1.d0 / 3.d0, half = 0.5d0, op5 = 1.5d0)
So instead of writing 0.5 * a one would write half * a. Is there a performance advantage to this?
EDIT: I dug deeper and found this on page 11 (slide A3.22) of this file:
The PARAMETER assignments yield accurate floating point constant definitions on any platform.

PYETO package + TypeError: cannot convert the series to <class 'float'>
I am using the PYETO package to calculate evapotranspiration. My initial code, which used iterrows, was very slow, so I am trying to vectorize the code to reduce computation time. This is the initial code:
This is the new code:
%%time
for i in range(1, 13):
    ETo = 'ETo_{}'.format(i)
    wind = 'wind_{}'.format(i)
    srad = 'srad_{}'.format(i)
    tmin = 'tmin_{}'.format(i)
    tmax = 'tmax_{}'.format(i)
    tavg = 'tavg_{}'.format(i)
    df2[ETo] = evap_i(df['lat'], df2['elevation'], df2[wind], df2[srad],
                      df2[tmin], df2[tmax].values, df2[tavg], i)
I have already converted all the inputs to float using df2.applymap(float), but I am still getting this error message:
TypeError: cannot convert the series to <class 'float'>
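A likely explanation, offered as a guess: this error appears when a function written for scalars (one that calls float(x) or math-module functions internally, as pyeto's functions do) receives a whole pandas Series. Converting the Series elements to float does not help, because the function still gets the Series object itself. If evap_i is scalar-only, one workaround is np.vectorize, sketched here with a hypothetical stand-in for the scalar function:

```python
import numpy as np
import pandas as pd

def scalar_only(x):
    # Stand-in for a scalar-only function such as those in pyeto:
    # calling float() on a whole Series raises the question's TypeError.
    return float(x) ** 2

s = pd.Series([1.0, 2.0, 3.0])

# scalar_only(s) would raise: TypeError: cannot convert the series to <class 'float'>
vectorized = np.vectorize(scalar_only)
result = vectorized(s)  # applies the function element by element
```

Note that np.vectorize is a convenience loop, not true vectorization, so it fixes the TypeError but may not recover much of the iterrows slowdown; for real speed the arithmetic inside the function would need to be rewritten in terms of array operations.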