How to mimic Excel's LOGEST function in Python
I'm interested in mimicking Excel's LOGEST function in Python but have no idea where to start.
1 answer

Here is a graphical fitter using LOGEST as described in https://support.office.com/enus/article/logestfunctionf27462d836574030866ba272c1d18b4b
    import numpy, scipy, matplotlib
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit

    xData = numpy.array([1.1, 2.2, 3.3, 4.4, 5.0, 6.6, 7.7])
    yData = numpy.array([1.1, 20.2, 30.3, 60.4, 50.0, 60.6, 70.7])

    # LOGEST from https://support.office.com/enus/article/logestfunctionf27462d836574030866ba272c1d18b4b
    def func(x, b, m):
        y = b * m**x
        return y

    # these are the same as the scipy defaults
    initialParameters = numpy.array([1.0, 1.0])

    # curve fit the test data
    fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

    modelPredictions = func(xData, *fittedParameters)

    absError = modelPredictions - yData

    SE = numpy.square(absError) # squared errors
    MSE = numpy.mean(SE) # mean squared errors
    RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
    Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

    print('Parameters:', fittedParameters)
    print('RMSE:', RMSE)
    print('Rsquared:', Rsquared)
    print()

    ##########################################################
    # graphics output section
    def ModelAndScatterPlot(graphWidth, graphHeight):
        f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
        axes = f.add_subplot(111)

        # first the raw data as a scatter plot
        axes.plot(xData, yData, 'D')

        # create data for the fitted equation plot
        xModel = numpy.linspace(min(xData), max(xData))
        yModel = func(xModel, *fittedParameters)

        # now the model as a line plot
        axes.plot(xModel, yModel)

        axes.set_xlabel('X Data') # X axis data label
        axes.set_ylabel('Y Data') # Y axis data label

        plt.show()
        plt.close('all') # clean up after using pyplot

    graphWidth = 800
    graphHeight = 600
    ModelAndScatterPlot(graphWidth, graphHeight)
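Note that curve_fit minimizes squared errors in y directly, whereas Excel's LOGEST fits a straight line to ln(y), so the two can return somewhat different parameters. The following is a minimal sketch of the log-linear approach for a single x variable; the function name logest and the demo data are illustrative assumptions, and it does not reproduce LOGEST's optional const and stats arguments.

```python
import numpy as np

def logest(y, x):
    """Fit y = b * m**x by linear least squares on ln(y).

    Hypothetical helper: returns (m, b), the same order Excel uses
    for a single x variable. Requires all y values to be positive.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("all y values must be positive to take logarithms")
    # ln(y) = ln(b) + x * ln(m), so a degree-1 fit gives ln(m) and ln(b)
    slope, intercept = np.polyfit(x, np.log(y), 1)
    return np.exp(slope), np.exp(intercept)

# Made-up demo data that follows y = 2 * 1.5**x exactly
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = 2.0 * 1.5 ** x_demo
m, b = logest(y_demo, x_demo)
print('m:', m, 'b:', b)
```

On noisy data the two approaches weight the points differently: the log-linear fit treats relative errors evenly, while curve_fit on the raw values is dominated by the largest y values.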
See also questions close to this topic

Create new Pandas columns using the value from previous row
I need to create two new Pandas columns using the logic and value from the previous row.
I have the following data:
Day  Vol   Price  Income  Outgoing
1    499   75
2    3233  90
3    1812  70
4    2407  97
5    3474  82
6    1057  53
7    2031  68
8    304   78
9    1339  62
10   2847  57
11   3767  93
12   1096  83
13   3899  88
14   4090  63
15   3249  52
16   1478  52
17   4926  75
18   1209  52
19   1982  90
20   4499  93
My challenge is to come up with logic where both the Income and Outgoing columns (which are currently empty) hold the value of (Vol * Price).
The Income column should carry this value when the previous day's "Price" is lower than the present day's. The Outgoing column should carry this value when the previous day's "Price" is higher than the present day's. Everywhere else, the Income and Outgoing columns should just contain NaN. If the Price is unchanged, that day's row is to be dropped.
The logic should start on day (n + 1): the first row should be skipped, and the logic should apply from row 2 onwards.
I have tried using shift in my code example such as:
    if sample_data['Price'].shift(1) < sample_data['Price'].shift(2)):
        sample_data['Income'] = sample_data['Vol'] * sample_data['Price']
    else:
        sample_data['Outgoing'] = sample_data['Vol'] * sample_data['Price']
But it isn't working.
I feel there should be a simpler and more comprehensive way to go about this. Could someone please help?
Update (The final output should look like this):
For day 16, the data is deleted because we have two similar prices for day 15 and 16.
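One possible way to express the rules above is to compare each Price with the previous row using diff() and fill the two columns with where(). This is only a sketch under the stated rules; the DataFrame name sample_data comes from the question, while the helper names value and change are assumptions.

```python
import pandas as pd

# Data copied from the question
sample_data = pd.DataFrame({
    'Day': range(1, 21),
    'Vol': [499, 3233, 1812, 2407, 3474, 1057, 2031, 304, 1339, 2847,
            3767, 1096, 3899, 4090, 3249, 1478, 4926, 1209, 1982, 4499],
    'Price': [75, 90, 70, 97, 82, 53, 68, 78, 62, 57,
              93, 83, 88, 63, 52, 52, 75, 52, 90, 93],
})

value = sample_data['Vol'] * sample_data['Price']
# Price minus the previous day's Price; NaN for the first row
change = sample_data['Price'].diff()

# where() keeps the value when the condition holds and puts NaN elsewhere
sample_data['Income'] = value.where(change > 0)
sample_data['Outgoing'] = value.where(change < 0)

# Drop rows whose Price equals the previous day's Price (day 16 here).
# The first row is kept because NaN != 0 evaluates to True.
result = sample_data[change != 0].reset_index(drop=True)
print(result)
```

The if statement in the attempt above cannot work on whole columns, because comparing two Series yields a Series of booleans rather than a single True or False; vectorized selection with where() avoids that.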

When I use basemap in python, longitude is sometimes reversed, flipping the map
In Python, I use basemap (https://matplotlib.org/basemap/) for plotting spatial data, and I've used it for several years without any large problems. I recently had to reinstall python3 (through conda, along with a number of modules) and basemap now has a strange issue: under certain conditions, the map will be displayed with flipped longitudes, switching east and west. As an example, I use this code: https://matplotlib.org/basemap/users/robin.html. If I use that code as-is, the map displays fine, but when I set lon_0=180, the map gets flipped, as shown in the image below.
Setting lon_0 to any positive number results in a flipped map, while 0 or negative numbers result in a correct map. lon_0 should simply set the central longitude of the plotted map, and should not have this behavior, so I'm unsure what's going on. Has anyone seen this behavior before, or have suggestions for how to fix it? I could alter my code to work around it, but I'd rather have things work properly.
I am using python3.7.3. I've tried updating basemap with the command "conda install -c anaconda basemap", but it tells me that basemap is up to date already.
Here is the code. It is identical to the code linked above, but with lon_0 set to 180.
    from mpl_toolkits.basemap import Basemap
    import numpy as np
    import matplotlib.pyplot as plt

    # lon_0 is central longitude of projection.
    # resolution = 'c' means use crude resolution coastlines.
    m = Basemap(projection='robin',lon_0=180,resolution='c')
    m.drawcoastlines()
    m.fillcontinents(color='coral',lake_color='aqua')
    # draw parallels and meridians.
    m.drawparallels(np.arange(-90.,120.,30.))
    m.drawmeridians(np.arange(0.,360.,60.))
    m.drawmapboundary(fill_color='aqua')
    plt.title("Robinson Projection")
    plt.show()
When I run the code, the only output is this, which seems unrelated:
    map_test.py:36: MatplotlibDeprecationWarning: The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead.
      m = Basemap(projection='robin',lon_0=180,resolution='c')
Any ideas?

Use format() on variables
Is it possible to use "{}".format() logic but instead of "", to use a variable?
    x = 1
    y = 2
    a = 'some {} random {} text'
    print(a.format(x, y))
I expect the following output:
some 1 random 2 text
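Since format() is an ordinary string method, it should work on any variable that holds a str, not only on a literal. The short sketch below illustrates this with positional and named fields; the variable names template, named_template, first and second are made up for illustration.

```python
x = 1
y = 2

# The template lives in a variable; format() is called on that variable
template = 'some {} random {} text'
positional = template.format(x, y)

# Named fields make the template independent of argument order
named_template = 'some {first} random {second} text'
named = named_template.format(second=y, first=x)

print(positional)
print(named)
```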

Excel VBA: Create Array from Filter Field Items?
A report I am creating in Excel involves several very similar pivot tables that need to be filtered in specific ways many times (i.e. a Year-to-Date table, a Quarter-to-Date table, etc., all needing to be filtered the exact same way before being exported, then filtered again, then exported, and so on).
So I looked into VBA as a way of accepting a few filter criteria, then filtering multiple tables that way, before looping.
However, I'm having a very tough time properly targeting PivotTables and specific fields, as it appears an integrated Value field is targeted and filtered via code differently than, say, a "filter" field I have attached to the top of the PivotTables, which accepts no "begins with", "contains", etc. strings. The items are just checkboxes, and one or multiple can be selected.
So it's one thing for me to tell it via VBA to select one item, and another to have it select all but one item. The latter requires the code to target every single possible value except the one that I want excluded.
My idea for this, then, is to create an array from every possible existing value in this filter field, then going through a loop where each value is added to my code as a value to check.
I have some code so far:
    ActiveSheet.PivotTables("QTD_Pivot_By_Category").PivotFields( _
        "[Range].[Address_1].[Address_1]").VisibleItemsList = Array( _
        "[Range].[Address_1].&", "[Range].[Address_1].&[0]", "[Range].[Address_1].&[101]" _
        , "[Range].[Address_1].&[INC]", "[Range].[Address_1].&[KRT]", _
        "[Range].[Address_1].&[LTD]", "[Range].[Address_1].&[RPO]", _
        "[Range].[Address_1].&[ INC]", "[Range].[Address_1].&[CORP]", _
        "[Range].[Address_1].&[INC.]", "[Range].[Address_1].&[LTD.]", _
        "[Range].[Address_1].&[LTEE]", "[Range].[Address_1].&[PAWS]", _
Now, if I just record this macro from actions in Excel, do "Select All", and then deselect the one item I don't want, it errors. It errors because it's selecting ~300 values, and while it's 'writing' this code, it hits the limit on "_" line continuations allowed in a single statement of VBA code.
If my field is called "Address_1" as above, part of the range "Range" (not sure where that's defined or why, but it works), can I get some help as to the most efficient way to define said ".VisibleItemList" as all POSSIBLE items in the list from a dynamic array rather than needing to select them manually? This list will be different day-to-day, so it can't just be a hardcoded flat list.
Ideally, also in a way that circumvents the max limit on "_" line breaks in a line of code in VBA for Excel.
If it's of any use for context, my table looks like this. See that checkbox dropdown? I want a snapshot of every updated value sitting in there to be put into an array and then iterated upon being added in a way similar to my example code:
Edit: Since that filter field's values are being pulled from a local datasource, I decided to just grab those and make an array that way! So I'm starting my code this way:
    Dim OGDataRange As Range, OGDataLastRow As Long
    Dim ValueArray As Variant

    OGDataLastRow = Worksheets("DATA QTD").Range("U2").End(xlDown).Row
    Set OGDataRange = Worksheets("DATA QTD").Range("U2:U" & OGDataLastRow)
    ValueArray = OGDataRange.Value

"ValueArray" is now my array. So I need help pulling the values of this array one by one and adding them to my VisibleItemList as seen above.
Thank you so much for any assistance.

VBA Excel - Sort a daily delivery manifest into groups based on complex rules
I am trying to come up with a way to sort my Excel table. I wrote a very basic VBA macro to do this when it was a random sort only. Now there are complex parameters/rules I have to meet, and it's way outside my skillset.
The problem: I receive a daily file with a list of items via shipments like the example below. The list can have as few as 1 and as many as 24 items. I have to sort these by restaurant.
Example Original List
ITEM     SHIPMENT
Oranges  1
Apples   1
Grapes   1
Pears    2
Pork     3
Chicken  4
Rice     5
Peas     5
Beans    5
Water    5
Corn     5
Milk     5
Eggs     5
Salmon   6
Tofu     7
Juice    8
Cheese   8
Salt     8
Pepper   9
Onions   10
Oats     11
Barley   11
Kale     11
Chips    12
The items need to be sorted out to 6 restaurants and there are complex rules:
Overall Rules
- No Restaurant can have more than 1 item from a shipment
- No Restaurant can have more than 4 items
Sorting Rules
- Restaurant 1 always gets the first two items (Items 1-2)
- Restaurants 2-5 evenly get the next 16 items (Items 3-18)
- Restaurant 6 gets the next 2 items (Items 19-20)
- Restaurants 1 and 6 then evenly get the last 4 items (Items 21-24)
- If there are more than 6 items in a shipment (more items than restaurants), the extra items stay in the warehouse.
- The Overall Rules override the Sorting Rules. For example, in our example list Restaurant 1 cannot have both Oranges and Apples since they are from the same shipment, so the sort changes.
Example Sort
Restaurant 1     Shipment
Oranges          1
Pears            2
Rice             5
Kale             11
Restaurant 2
Apples           1
Peas             5
Salmon           6
Salt             8
Restaurant 3
Grapes           1
Beans            5
Tofu             7
Pepper           9
Restaurant 4
Pork             3
Water            5
Juice            8
Onions           10
Restaurant 5
Chicken          4
Corn             5
Cheese           8
Oats             11
Restaurant 6
Milk             5
Barley           11
Chips            12
Warehouse Items
Eggs             5
Looking at it as a whole now I'm not even sure this is possible and I have no idea how to go about doing it. If anyone has any input I'd love to hear it. Thank you so much for your help.

R: rule-based data frame to automate changes on it
I would like to create a sort of data.frame to which I can add rules, just like Excel tables. For instance, if I work with symmetric 2x2 matrices, I would like a change to the value at [1,2] to automatically set the value at [2,1] to the same value. I have searched for this but cannot find anything related. Any help is appreciated.

How to apply parallel-tempering MCMC to fit a simple line to data?
I am trying to fit a simple straight line of the type
y=mx+c
to some synthetic data using parallel-tempered MCMC. My goal is just to understand how to use it, so that I can apply it to some more complex models later. The example I am trying is a replica of what has already been done in a simple emcee code: http://dfm.io/emcee/current/user/line/ but instead of using MCMC, I want to use parallel-tempered MCMC: http://dfm.io/emcee/current/user/pt/

    import numpy as np
    from emcee import PTSampler

    # Choose the "true" parameters.
    m_true = -0.9594
    b_true = 0.694
    f_true = 0.534

    # Generate some synthetic data from the model.
    N = 50
    x = np.sort(10*np.random.rand(N))
    yerr = 0.1+0.5*np.random.rand(N)
    y = m_true*x+b_true
    y += np.abs(f_true*y) * np.random.randn(N)
    y += yerr * np.random.randn(N)

    def lnlike(theta, x, y, yerr):
        m, b, lnf = theta
        model = m * x + b
        inv_sigma2 = 1.0/(yerr**2 + model**2*np.exp(2*lnf))
        return -0.5*(np.sum((y-model)**2*inv_sigma2 - np.log(inv_sigma2)))

    def lnprior(theta):
        m, b, lnf = theta
        if -5.0 < m < 0.5 and 0.0 < b < 10.0 and -10.0 < lnf < 1.0:
            return 0.0
        return -np.inf

    def lnprob(theta, x, y, yerr):
        lp = lnprior(theta)
        if not np.isfinite(lp):
            return -np.inf
        return lp + lnlike(theta, x, y, yerr)

    import scipy.optimize as op
    nll = lambda *args: -lnlike(*args)
    result = op.minimize(nll, [m_true, b_true, np.log(f_true)], args=(x, y, yerr))
    m_ml, b_ml, lnf_ml = result["x"]

    ntemps = 20
    nwalkers = 100
    ndim = 3
    pos = [result["x"] + 1e-4*np.random.randn(ndim) for i in range(nwalkers)]
    sampler=PTSampler(ntemps,nwalkers, ndim, lnlike, lnprior, loglargs=(x, y, yerr))# args=(x, y, yerr))#PTSampler(ntemps, nwalkers, ndim, lnlike, lnprior)

    for p, lnprob, lnlike in sampler.sample(pos, iterations=1000):
        pass
    sampler.reset()

    for p, lnprob, lnlike in sampler.sample(pos, lnprob0=lnprob, lnlike0=lnlike, iterations=10000, thin=10):
        pass

    assert sampler.chain.shape == (ntemps, nwalkers, 1000, ndim)
    samples = sampler.chain.reshape((-1, ndim))

    import corner
    fig = corner.corner(samples, labels=["$m$", "$b$", "$\ln\,f$"], truths=[m_true, b_true, np.log(f_true)])
    fig.savefig("triangle.png")
I get the following error when running this code:
    for p, lnprob, lnlike in sampler.sample(pos, iterations=1000):
    ValueError: cannot reshape array of size 100 into shape (20,100)
Any suggestions on how to fix this?
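The error message suggests that the sampler wants one starting point per temperature and per walker, while pos only holds nwalkers points. Assuming the emcee 2.x PTSampler described at the linked page, which takes starting positions of shape (ntemps, nwalkers, ndim), a sketch of building such an array with NumPy alone could look like this. The centre values stand in for result["x"] from the question and are placeholders.

```python
import numpy as np

ntemps = 20
nwalkers = 100
ndim = 3

# Placeholder for result["x"]: the maximum-likelihood estimate (m, b, ln f)
center = np.array([-0.9594, 0.694, np.log(0.534)])

# One small Gaussian ball around the centre for every temperature and walker.
# Broadcasting adds the (ndim,) centre to each (ntemps, nwalkers, ndim) entry.
pos = center + 1e-4 * np.random.randn(ntemps, nwalkers, ndim)

print(pos.shape)
```

Passing an array like this as the first argument of sampler.sample() is the assumption being made here; the rest of the question's code is left unchanged.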

Generate data and curve fit from an equation with Y on both sides
I want to fit the curves generated by this equation (1) :
(1) ID=K*(W/L)*[(VG-VT*(ID*RD/2))**(alpha-1) * (VD-ID*RD)-(1-1/alpha)*(VD-ID*RD)**alpha]
However, the variable 'ID' is on both sides. I am trying to find a way to modify the code given in this example from scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit

    #the function for ID, with ID in the arguments.
    def func(K, W, L, VG, VT, ID, RD, alpha, VD ):
        return K*(W/L)*[(VG-VT*(ID*RD/2))**(alpha-1) * (VD-ID*RD)-(1-1/alpha)*(VD-ID*RD)**alpha]

    #x_data
    VG_data = np.linspace(6, 16, 300)

    #y_data: ID is also an argument of the function here
    ID = func(VG=VG_data, K=5.2*1e7, W=320*1e6, L=20*1e6, VT=0.68, VD=5e3, alpha=2.2, RD=5.79)

    popt, pcov = curve_fit(func, VG_data, ID)
    popt
    plt.plot(VG_data, func(VG_data, *popt), 'r')
When I run it, I obviously get the error when trying to generate ID:
    ---> 27 ID = func(VG=VG_data, K=5.2*1e7, W=320*1e6, L=20*1e6, VT=0.68, VD=5e3, alpha=2.2, RD=5.79)
         28
         29

    TypeError: func() missing 1 required positional argument: 'ID'
How do I generate and fit data when I have the 'y' variable on both sides?
EDIT
I'm adding a link to example data (VG, ID) and the code to plot it:
    import pandas as pd
    import matplotlib.pyplot as plt

    url='https://raw.githubusercontent.com/leoUninova/Transistoraltairplots/master/df1.csv'
    df=pd.read_csv(url)

    #a single transistor data
    df=df.loc[df.Name=='30025010.010.0']
    VG=df.VG
    ID=df.absID
    plt.plot(VG,ID)
    plt.show()
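One general way to handle a model with y on both sides is to solve the implicit equation numerically for y inside the function handed to curve_fit. The sketch below illustrates the idea on a toy equation, y = a*x*exp(-b*y), chosen only because its root is easy to bracket; it is not the transistor equation (1), and the names implicit_model, x_demo and y_demo are made up.

```python
import numpy as np
from scipy.optimize import brentq, curve_fit

def implicit_model(x, a, b):
    """Return y solving the toy implicit equation y = a*x*exp(-b*y) for each x."""
    def solve_one(xi):
        # Residual of the implicit equation; its root is the model value
        residual = lambda y: y - a * xi * np.exp(-b * y)
        # For a, b, xi > 0 the residual is negative at 0 and positive at a*xi
        return brentq(residual, 0.0, a * xi)
    return np.array([solve_one(xi) for xi in np.atleast_1d(x)])

# Synthetic data generated from known parameters (a=2.0, b=0.5)
x_demo = np.linspace(0.5, 5.0, 30)
y_demo = implicit_model(x_demo, 2.0, 0.5)

# Positive lower bounds keep the bracketing argument above valid during the fit
popt, pcov = curve_fit(implicit_model, x_demo, y_demo,
                       p0=[1.0, 1.0], bounds=(1e-6, np.inf))
print(popt)
```

For equation (1) the same pattern would apply with ID as the unknown, but a bracket that is guaranteed to contain the root has to be worked out for that specific equation.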

fit a curve through points using python
Hi everyone, I'm trying to fit a curve through points using Python, but I have not succeeded. I'm a beginner with Python, and what I have found so far didn't help me.
I have a set of data and I want to compare which line describes it best (polynomials of different orders).
I use Python and Numpy and for polynomial fitting there is a function polyfit() and polyval(). But I always get this error, and I do not know what it means:
    File "plantilla.py", line 28, in <module>
      polinomio=np.polyfit(x,y,5)
    File "/usr/lib/python2.7/dist-packages/numpy/lib/polynomial.py", line 581, in polyfit
      c, resids, rank, s = lstsq(lhs, rhs, rcond)
    File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 1867, in lstsq
      0, work, lwork, iwork, 0)
    ValueError: On entry to DLASCL parameter number 4 had an illegal value

    import pandas as pd
    from matplotlib import pyplot as plt
    from scipy.optimize import curve_fit
    import numpy as np
    import sympy as sym
    #
    data=pd.read_csv('radiacion.dat',header=None,delim_whitespace=True)
    x=data.ix[:,0]
    y=data.ix[:,1]
    """
    x=np.array(x,dtype=float)
    y=np.array(y,dtype=float)
    """
    #
    plt.plot(x,y,'r',label="Original Data")
    plt.title('Radiacion')
    plt.xlabel('t(s)' ,fontsize=14,fontweight='bold')
    plt.ylabel('G(w/m)',fontsize=14,fontweight='bold')
    plt.xticks(fontsize=10,fontweight='bold')
    plt.yticks(fontsize=10,fontweight='bold')
    plt.show ()
    #plt.hold (True)
    #
    polinomio=np.polyfit(x,y,5)
    print (polinomio)
    yP=np.polyval(poli,x)
    plt.plot(x,yp,'b+',label="fitted cuerve")
I expected something like that: Evaluate a polynomial at specific values.
p[0]*x**(N-1) + p[1]*x**(N-2) + ... + p[N-2]*x + p[N-1]
input.data
25200 17
25800 38
26400 58
27000 93
27600 129
28200 163
28800 192
29400 234
30000 329
30600 387
31200 411
31800 460
32400 513
33000 569
33600 576
34200 635
34800 645
35400 683
36000 715
36600 747
37200 780
37800 810
38400 833
39000 862
39600 885
40200 910
40800 929
41400 945
42000 955
42600 974
43200 986
43800 985
44400 999
45000 1001
45600 993
46200 993
46800 999
47400 992
48000 985
48600 980
49200 978
49800 963
50400 959
51000 939
51600 917
52200 884
52800 881
53400 860
54000 845
54600 820
55200 812
55800 767
56400 720
57000 650
57600 619
58200 595
58800 541
59400 533
60000 504
60600 456
61200 389
61800 320
62400 285
63000 243
63600 279
64200 231
64800 192
65400 137
66000 91
66600 58
67200 38
67800 22
68400 9
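To compare polynomials of different orders on data like this, one option is numpy.polynomial.Polynomial.fit, which rescales x internally and so copes better with large x values such as 25200 to 68400. The sketch below uses every sixth row of the data above as plain float arrays. Filtering out non-finite values is a precaution based on the assumption that NaN or non-numeric input may be behind the DLASCL error; that cause is not confirmed by the question.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Every sixth row of the data listed above, as float arrays
t = np.array([25200, 28800, 32400, 36000, 39600, 43200, 46800,
              50400, 54000, 57600, 61200, 64800, 68400], dtype=float)
g = np.array([17, 192, 513, 715, 885, 986, 999,
              959, 845, 619, 389, 192, 9], dtype=float)

# Keep only rows where both values are finite (precaution, see lead-in)
mask = np.isfinite(t) & np.isfinite(g)
t, g = t[mask], g[mask]

rmse_by_degree = {}
for degree in range(1, 6):
    # Polynomial.fit maps t onto [-1, 1] before solving the least-squares problem
    poly = Polynomial.fit(t, g, degree)
    residuals = g - poly(t)
    rmse_by_degree[degree] = float(np.sqrt(np.mean(residuals ** 2)))

for degree, rmse in rmse_by_degree.items():
    print('degree', degree, 'RMSE', round(rmse, 2))
```

RMSE on the fitted points can only go down as the degree rises, so it is worth looking for the degree where the improvement levels off rather than simply taking the smallest value.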