python parallel processing for loop optimisation
I am new to parallel processing in Python. I have code with a "for" loop in it that iterates over the data in a JSON file. I want to optimise it by running all the "names" this "for" loop picks up in parallel. When I tried multiprocessing it failed again and again, as it takes iterable inputs, and it mostly throws a Py4J error. I'm stuck here; please help and suggest. e.g.:
for names in column[attribute]:
    view_1 = "create view do something"
    view_2 = "do something on view_1"
Suppose column[attribute] has multiple "names"; how can I run all of these in parallel?
See also questions close to this topic

reducer to find the most popular movie for each age group in python
I am trying to write a mapper and reducer for Hadoop to find the movies with a 5 rating ("the popular movies") for each age group.
I wrote this:
mapper.py
It joins the two data sets on the user ID, to get the age from the users data and the rating with the movie name from the ratings data set:

#!/usr/bin/env python
import sys

for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    line = line.split("::")
    rating = "1"
    movie = "1"
    user = "1"
    age = "1"
    if len(line) == 4:  # ratings data
        rating = line[2]
        movie = line[1]
        user = line[0]
        # print '%s %s %s' % (user, movie, rating)
    else:  # users data
        user = line[0]
        age = line[2]
    print '%s\t%s\t%s\t%s' % (user, age, rating, movie)
This is the data structure:
ratings data: userid, movieid, rating, timestamp
users data: userid, gender, age, occupation
The reducer I wrote is not working at all; it gives me 0 results.
I want the result to be the most popular movies for each age group:
1 2254 4567
18 8732 0987 0986
25 7654 8765 7658
35 6543 7645 7654
45 7654 8765 5433
50 7652 1876 7654
56 3986 3956
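Since the reducer itself is not shown in the question, here is a hedged sketch of what one could look like, assuming Hadoop streaming sorts the mapper output by user id so each user's age record and rating records arrive together; it counts 5-star ratings per (age, movie) and lists the top movies per age group:

```python
#!/usr/bin/env python
from collections import defaultdict

def reduce_lines(lines, top_n=3):
    counts = defaultdict(lambda: defaultdict(int))  # age -> movie -> count

    def flush(age, pending):
        if age is None:  # user had no users-record; skipped in this sketch
            return
        for movie, rating in pending:
            if rating == "5":
                counts[age][movie] += 1

    current_user, current_age, pending = None, None, []
    for line in lines:
        user, age, rating, movie = line.strip().split("\t")
        if user != current_user:  # user boundary: apply buffered ratings
            flush(current_age, pending)
            current_user, current_age, pending = user, None, []
        if rating == "1" and movie == "1":  # mapper's defaults => users record
            current_age = age
        else:                               # ratings record
            pending.append((movie, rating))
    flush(current_age, pending)
    return {age: [m for m, c in sorted(d.items(), key=lambda kv: -kv[1])][:top_n]
            for age, d in counts.items()}

# In the real streaming job you would print reduce_lines(sys.stdin);
# here a tiny stand-in for the sorted mapper output:
demo = reduce_lines([
    "u1\t25\t1\t1", "u1\t1\t5\tm9", "u1\t1\t5\tm9",
    "u2\t25\t1\t1", "u2\t1\t5\tm9", "u2\t1\t4\tm2",
])
print(demo)
```

The buffering is needed because Hadoop only guarantees grouping by key (the user id), not that the age record arrives before the rating records.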

How to compare two columns from two DFs keeping some column constants and print row?
I'm working on a project where I have to find the changes made in the second sheet (in a specific column) compared to the primary/master sheet. After that I want to print or save the complete row in which changes are found. Here are more details. Both Excel sheets have many columns; my master sheet has data something like as follows:
TID LOC HECI RR UNIT SUBD S EUSE INV ACT CAC FMT CKT DD SCID CUSTOMER F&E/SERVICE ID BVAP PORD AUTH RULE ST RGN
CHCGILDTO3P050101D CHCGILDTO3P M3MSA0S1RA 0501.01D 1A1 IE D STR3RA8 S CL/HFFS/688898 /LGT 20180721 BLOOMBERG LP DS316668545 WMS881282 E.485339 IL N
CHCGILDTO3P050101D CHCGILDTO3P M3MSA0S1RA 0501.01D 1A2 IE J DNA UNDER DECOM EID 2466 20190322 WMS881282 E.485339 IL N
CHCGILDTO3P050101D CHCGILDTO3P M3MSA0S1RA 0501.01D 1A3 IE J DNA UNDER DECOM EID 2466 20190322 WMS881282 E.485339 IL N
CHCGILDTO3P050101D CHCGILDTO3P M3MSA0S1RA 0501.01D 1A4 IE J DNA UNDER DECOM EID 2466 20190322 WMS881282 E.485339 IL N
CHCGILDTO3P050101D CHCGILDTO3P M3MSA0S1RA 0501.01D 1A5 IE J DNA UNDER DECOM EID 2466 20190322 WMS881282 E.485339 IL N
and my second sheet has data as follows :
HECI UNIT INV SUB ACT CKT PACT DD LOC RR
M3MSA0S1RA 1A1 IE $ CL/HFFS/688898 /LGT D 72118 CHCGILDTO3P 0501.01D
M3MSA0S1RA 1A2 IE J DNA UNDER DECOM EID 2466 32219 CHCGILDTO3P 0501.01D
M3MSA0S1RA 1A3 IE J DNA UNDER DECOM EID 2466 32219 CHCGILDTO3P 0501.01D
M3MSA0S1RA 1A4 IE J DNA UNDER DECOM EID 2466 32219 CHCGILDTO3P 0501.01D
M3MSA0S1RA 1A5 IE J DNA UNDER DECOM EID 2466 32219 CHCGILDTO3P 0501.01D
So first I want to check whether the values of LOC, HECI, RR and UNIT are the same in both sheets; if they are, I want to move forward, compare the ACT column, and print the difference as output.
For example, you can see that in row #1 of the master data ACT is 'D', whereas in the second sheet it has changed to '$'.
So I want output something like the related complete row, with a note that it changed from 'D' to '$'.
This seems very complicated to me as I'm at the beginning stage of Python and pandas.
I tried using loops but was unable to execute it; also, if I use too many loops, that's not the pandas way, I believe.
here is my code:
import pandas as pd

df1 = pd.read_excel("Master Database.xlsx")
df2 = pd.read_excel("CHCGILDTO3P_0501.01D.xlsx")
d1_act = df1['ACT']
d2_act = df2['ACT']
for index1, row1 in df1.iterrows():
    for index2, row2 in df2.iterrows():
        if (row1['LOC'], row1['HECI'], row1['RR']) == (row2['LOC'], row2['HECI'], row2['RR']):
            for x in d1_act and y in d2_act:
                # print(x, y)
                if x != y:
                    print(x, y)  # not getting how to print complete respective row
                else:
                    pass
        else:
            pass
I want output like:
M3MSA0S1RA 1A1 IE $ CL/HFFS/688898 /LGT D 72118 CHCGILDTO3P 0501.01D
changed from 'D' to '$'
Please assist! Thank you in advance!
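As a hedged sketch of the pandas way (no nested loops): merge the two sheets on the key columns and filter the rows where ACT differs. The tiny frames below are stand-ins for the two Excel files; the column names follow the question:

```python
import pandas as pd

master = pd.DataFrame({
    "LOC": ["CHCGILDTO3P"], "HECI": ["M3MSA0S1RA"],
    "RR": ["0501.01D"], "UNIT": ["1A1"], "ACT": ["D"],
})
second = pd.DataFrame({
    "LOC": ["CHCGILDTO3P"], "HECI": ["M3MSA0S1RA"],
    "RR": ["0501.01D"], "UNIT": ["1A1"], "ACT": ["$"],
})

keys = ["LOC", "HECI", "RR", "UNIT"]
# align rows of the second sheet with their master counterparts on the keys
merged = second.merge(master[keys + ["ACT"]], on=keys, suffixes=("", "_master"))
# keep only rows whose ACT value changed
changed = merged[merged["ACT"] != merged["ACT_master"]]
for _, row in changed.iterrows():
    print(row.to_dict(), "-> ACT changed from %r to %r"
          % (row["ACT_master"], row["ACT"]))
```

With real files you would build `master` and `second` via `pd.read_excel` instead of the inline stand-in data.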

merge duplicate cells of a column
My current Excel file looks like:
Type  Val
A     1
A     2
B     3
B     4
B     5
C     6

This is the required excel:
Type  Val  Sum
A     1    3
      2
B     3    12
      4
      5
C     6    6
Is it possible in python using pandas or any other module?
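A hedged sketch with pandas: groupby/transform computes the per-Type sum, and blanking the repeated values approximates the merged-cell look (true cell merging in the output file would need openpyxl):

```python
import pandas as pd

df = pd.DataFrame({"Type": ["A", "A", "B", "B", "B", "C"],
                   "Val": [1, 2, 3, 4, 5, 6]})
# per-group sum, broadcast back to every row of the group
df["Sum"] = df.groupby("Type")["Val"].transform("sum")
mask = df["Type"].duplicated()
df.loc[mask, "Sum"] = None   # show Sum only on the first row of each block
df.loc[mask, "Type"] = ""    # blank repeated Type labels, like merged cells
print(df)
# df.to_excel("out.xlsx", index=False)
```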

why is this c code causing a race condition?
I'm trying to count the number of prime numbers up to 10 million, and I have to do it using multiple POSIX threads (so that each thread computes a subset of the 10 million). However, my code is not checking the isPrime condition correctly. I'm thinking this is due to a race condition. If it is, what can I do to ameliorate this issue? I've tried using a global integer array with k elements, but since k is not known at compile time it won't let me declare that at file scope.
I'm compiling my code using gcc with -pthread:
/* Program that spawns off "k" threads; k is read in at the command line.
   Each thread will compute a subset of the problem domain (check if the number is prime).
   To compile: gcc -pthread lab5_part2.c -o lab5_part2 */
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <pthread.h>
#include <stdlib.h>

typedef int bool;
#define FALSE 0
#define TRUE 1
#define N 10000000 // 10 Million

int k;              // global variable k will hold the number of threads
int primeCount = 0; // it will hold the number of primes

// returns whether num is prime
bool isPrime(long num) {
    long limit = sqrt(num);
    for (long i = 2; i <= limit; i++) {
        if (num % i == 0) {
            return FALSE;
        }
    }
    return TRUE;
}

// function to use with threads
void* getPrime(void* input) {
    // get the thread id
    long id = (long) input;
    printf("The thread id is: %ld \n", id);

    // how many iterations each thread will have to do
    int numOfIterations = N / k;
    // check the last thread, to make sure it is a whole number
    if (id == k - 1) {
        numOfIterations = N - (numOfIterations * id);
    }

    long startingPoint = (id * numOfIterations);
    long endPoint = (id + 1) * numOfIterations;

    for (long i = startingPoint; i < endPoint; i += 2) {
        if (isPrime(i)) {
            primeCount++;
        }
    }
    // terminate calling thread
    pthread_exit(NULL);
}

int main(int argc, char** args) {
    // get the num of threads from command line
    k = atoi(args[1]);
    // make sure it is working
    printf("Number of threads is: %d\n", k);

    struct timespec start, end;
    // start clock
    clock_gettime(CLOCK_REALTIME, &start);

    // create an array of threads to run
    pthread_t* threads = malloc(k * sizeof(pthread_t));
    for (int i = 0; i < k; i++) {
        pthread_create(&threads[i], NULL, getPrime, (void*)(long)i);
    }

    // wait for each thread to finish
    int retval;
    for (int i = 0; i < k; i++) {
        int* result = NULL;
        retval = pthread_join(threads[i], (void**)(&result));
    }

    // get the time spent
    clock_gettime(CLOCK_REALTIME, &end);
    double time_spent = (end.tv_sec - start.tv_sec) +
                        (end.tv_nsec - start.tv_nsec) / 1000000000.0f;

    printf("Time tasken: %f seconds\n", time_spent);
    printf("%d primes found.\n", primeCount);
}
The current output I am getting (using 2 threads):
Number of threads is: 2
Time tasken: 0.038641 seconds
2 primes found.

How to abort thread after a timeout running DriveInfo.IsReady on bad drive
I am aware that aborting a thread should be avoided, but in this case DriveInfo.IsReady on the UI thread never returns and the program freezes.
I have found several SD card readers that cause this issue (each card slot that does not have a card in it will freeze the program when checking the IsReady property, or any other).
foreach (DriveInfo drive in DriveInfo.GetDrives())
{
    if (drive.IsReady)
        DriveCollection.Add(drive);
}
Is there a way to give each drive a second to respond, and cleanly kill the thread, while keeping UI responsive at all times?

How to fix 'TypeError: can't pickle _thread.lock objects' when passing a Queue to a thread in a child process
I've been stuck on this issue all day, and I have not been able to find any solutions relating to what I am trying to accomplish.
I am trying to pass Queues to threads spawned in subprocesses. The Queues were created in the entrance file and passed to each subprocess as a parameter.
I am making a modular program to a) run a neural network, b) automatically update the network models when needed, and c) log events/images from the neural network to the servers. My former program utilized only one CPU core running multiple threads and was getting quite slow, so I decided I needed to subprocess certain parts of the program so they can run in their own memory spaces to their fullest potential.
Subprocesses:
- Client-server communication
- Webcam control and image processing
- Inferencing for the neural networks (there are 2 neural networks, each with its own process)
4 total subprocesses.
As I develop, I need to communicate across each process so they are all on the same page with events from the servers and whatnot. So Queue would be the best option as far as I can tell.
(Clarify: 'Queue' from the 'multiprocessing' module, NOT the 'queue' module)
~~ However ~~
Each of these subprocesses spawn their own thread(s). For example, the 1st subprocess will spawn multiple threads: One thread per Queue to listen to the events from the different servers and hand them to different areas of the program; one thread to listen to the Queue receiving images from one of the neural networks; one thread to listen to the Queue receiving live images from the webcam; and one thread to listen to the Queue receiving the output from the other neural network.
I can pass the Queues to the subprocesses without issue and can use them effectively. However, when I try to pass them to the threads within each subprocess, I get the above error.
I am fairly new to multiprocessing; however, the methodology behind it looks to be relatively the same as threads except for the shared memory space and GIL.
This is from Main.py; the program entrance.
from lib.client import Client, Image
from multiprocessing import Queue, Process

class Main():

    def __init__(self, server):
        self.KILLQ = Queue()
        self.CAMERAQ = Queue()
        self.CLIENT = Client((server, 2005), self.KILLQ, self.CAMERAQ)
        self.CLIENT_PROCESS = Process(target=self.CLIENT.do, daemon=True)
        self.CLIENT_PROCESS.start()

if __name__ == '__main__':
    m = Main('127.0.0.1')
    while True:
        m.KILLQ.put("Hello world")
And this is from client.py (in a folder called lib)
class Client():

    def __init__(self, connection, killq, cameraq):
        self.TCP_IP = connection[0]
        self.TCP_PORT = connection[1]
        self.CAMERAQ = cameraq
        self.KILLQ = killq
        self.BUFFERSIZE = 1024
        self.HOSTNAME = socket.gethostname()
        self.ATTEMPTS = 0
        self.SHUTDOWN = False
        self.START_CONNECTION = MakeConnection((self.TCP_IP, self.TCP_PORT))
        # self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
        # self.KILLQ_THREAD.start()

    def do(self):
        # The function run as the subprocess from Main.py
        print(self.KILLQ.get())

    def _listen(self, q):
        # This is threaded multiple times, listening to each Queue
        # (passed as 'q' when the thread is created)
        while True:
            print(self.q.get())
# self.KILLQ_THREAD = Thread(target=self._listen, args=(self.KILLQ,), daemon=True)
This is where the error is thrown. If I leave this line commented out, the program runs fine. I can read from the queue in this subprocess without issue (i.e. in the function 'do'), but not in a thread under this subprocess (i.e. in the function '_listen').
I need to be able to communicate across each process so they can be in step with the main program (i.e. in the case of a neural network model update, the inference subprocess needs to shut down so the model can be updated without causing errors).
Any help with this would be great!
I am also very open to other methods of communication that would work as well. In the event that you believe a better communication process would work: it would need to be fast enough to support real-time streaming of 4K images sent to the server from the camera.
Thank you very much for your time! :)

Parameters for Least Squares program not changing
I am trying to create a program that can approximate the parameters for an exponential function of the form
y = ae^(bx)
that fits any given data set. I looked up the Levenberg-Marquardt algorithm and tried to understand and use it, but in my implementation the parameters don't seem to change at each iteration. Can anyone help point out what is wrong with my implementation, or do I simply not fully understand the algorithm?

public class regression {

    boolean exponential;
    matrix data, estimate, gradient, hessian, diagHessian;
    double scalar;

    public regression(boolean exponential, double[][] data) {
        this.exponential = exponential;
        this.data = new matrix(data.length, data[0].length);
        this.data.copy(data);
    }

    public void approximate() {
        this.findDerivative(this.exponential);
        for (int i = 0; i < 10000; i++) {
            double s = this.evaluateError();
            double x = this.currentError();
            // if new error is smaller than old error
            if (s < x) {
                this.estimate = this.evaluate();
                reload();
                this.scalar = this.scalar / 10;
            } else {
                this.scalar = this.scalar * 10.0;
            }
        }
    }

    public matrix evaluate() {
        matrix a = new matrix(2, 2);
        matrix b = new matrix(2, 2);
        b = this.diagHessian;
        b.times(this.scalar);
        a = this.hessian.subtract(b);
        matrix r = a.inverse();
        matrix x = this.gradient;
        matrix d = r.multiply(x);
        matrix y = this.estimate.subtract(d);
        return y;
    }

    public double currentError() {
        double error = 0.0;
        double w = this.estimate.arrayM[0][0];
        double z = this.estimate.arrayM[1][0];
        for (int i = 0; i < this.data.size; i++) {
            error += (this.data.arrayM[i][1] - (w * java.lang.Math.exp(z * this.data.arrayM[i][0])))
                   * (this.data.arrayM[i][1] - (w * java.lang.Math.exp(z * this.data.arrayM[i][0])));
        }
        return error * 0.5;
    }

    // evaluate the estimate matrix at the next iteration and
    // return a new matrix not affecting the current iteration
    public double evaluateError() {
        matrix a = new matrix(2, 2);
        matrix b = new matrix(2, 2);
        b = this.diagHessian;
        b.times(this.scalar);
        a = this.hessian.subtract(b);
        matrix r = a.inverse();
        matrix x = this.gradient;
        matrix d = r.multiply(x);
        matrix y = this.estimate.subtract(d);
        double error = 0.0;
        double w = y.arrayM[0][0];
        double z = y.arrayM[1][0];
        for (int i = 0; i < this.data.size; i++) {
            error += (this.data.arrayM[i][1] - (w * java.lang.Math.exp(z * this.data.arrayM[i][0])))
                   * (this.data.arrayM[i][1] - (w * java.lang.Math.exp(z * this.data.arrayM[i][0])));
        }
        return error * 0.5;
    }

    public static matrix transpose(matrix b) {
        matrix a = new matrix(b.sizeX, b.size);
        for (int i = 0; i < b.size; i++) {
            for (int y = 0; y < b.sizeX; y++) {
                a.arrayM[y][i] = b.arrayM[i][y];
            }
        }
        return a;
    }

    public void reload() {
        this.calcGradient();
        this.calcHessian();
        this.calcDiag();
    }

    public void findDerivative(boolean expo) {
        if (expo) {
            // create estimate
            this.estimate = new matrix(2, 1);
            double a = 40000;
            double b = 0.00012;
            this.estimate.arrayM[0][0] = a;
            this.estimate.arrayM[1][0] = b;
            this.gradient = new matrix(2, 1);
            this.calcGradient();
            this.hessian = new matrix(2, 2);
            this.calcHessian();
            this.diagHessian = new matrix(2, 2);
            this.calcDiag();
            this.scalar = 0.5;
        }
    }

    public void calcDiag() {
        for (int i = 0; i < this.hessian.size; i++) {
            this.diagHessian.arrayM[i][i] = 1.0;
        }
    }

    public void calcGradient() {
        if (this.exponential) {
            double a = this.estimate.arrayM[0][0];
            double b = this.estimate.arrayM[1][0];
            double s = 0.0;
            for (int i = 0; i < this.data.size; i++) {
                s += (this.data.arrayM[i][1] - a * java.lang.Math.exp(b * data.arrayM[i][0]))
                   * (java.lang.Math.exp(b * data.arrayM[i][0]));
            }
            this.gradient.arrayM[0][0] = s;
            s = 0.0;
            for (int i = 0; i < this.data.size; i++) {
                s += (this.data.arrayM[i][1] - a * java.lang.Math.exp(b * data.arrayM[i][0]))
                   * (this.data.arrayM[i][0] * a * java.lang.Math.exp(b * data.arrayM[i][0]));
            }
            this.gradient.arrayM[1][0] = s;
        }
    }

    public void calcHessian() {
        if (this.exponential) {
            double a = this.estimate.arrayM[0][0];
            double b = this.estimate.arrayM[1][0];
            double s = 0.0;
            matrix jacobian = new matrix(this.data.size, 2);
            for (int i = 0; i < jacobian.size; i++) {
                jacobian.arrayM[i][0] = Math.exp(b * this.data.arrayM[i][0]);
                jacobian.arrayM[i][1] = a * this.data.arrayM[i][0] * Math.exp(b * this.data.arrayM[i][0]);
            }
            matrix c = transpose(jacobian);
            matrix d = c.multiply(jacobian);
            this.hessian = d;
        }
    }
}

How to pass an array of input parameters in scipy.optimize.minimize?
I want to use scipy.optimize.minimize to solve for a set of parameters by minimizing an error function.
The function called "error" returns the squared error for the function whose parameters z1, z2, z3 I am trying to find.
I have an array of x (called "b" in the function) and y (called "real" in the function) values.
The code below works fine if I set x and y to some integer, but not if I try to pass in arrays of x and y values to act as the variables "b" and "real" in the equation being minimized.
Trying to pass in arrays of x and y values results in the error pasted below.
Is there a way to pass arrays in to act as a variable in an equation for the minimize function, instead of just a single integer?
Here is what my code looks like:
import numpy as np
import pandas as pd
from scipy.optimize import minimize

# dataset file, with columns named x, y and 1000 rows
f = pd.read_csv('Test_Data_.txt', sep='\t')

# initial guess
x0 = [1, 2, 3]

# f['x'] and f['y'] are columns with 1000 rows
#x = f['x'].values
#y = f['y'].values
x = 1  # these parameters work fine
y = 4

# a function called inside the function to be minimized
def est(z1, z2, z3, b):
    return z1 * b**2 + z2 * b + z3

# function to minimize
def error(x, real, b):
    return (real - est(x[0], x[1], x[2], b))**2

print(minimize(error, x0, args=(x, y), method='BFGS', tol=1e-6))
Feeding in the array of x and y values produces the error:
Traceback (most recent call last):
  File "problem1.py", line 24, in <module>
    minimize(error, x0, args = (np.array(list(f['y'].values)), np.array(list(f['x'].values))), method='BFGS', tol=1e-6)
  File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/_minimize.py", line 595, in minimize
    return _minimize_bfgs(fun, x0, args, jac, callback, **options)
  File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/optimize.py", line 970, in _minimize_bfgs
    gfk = myfprime(x0)
  File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/optimize.py", line 300, in function_wrapper
    return function(*(wrapper_args + args))
  File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/optimize.py", line 730, in approx_fprime
    return _approx_fprime_helper(xk, f, epsilon, args=args)
  File "/usr/local/lib/python3.5/dist-packages/scipy/optimize/optimize.py", line 670, in _approx_fprime_helper
    grad[k] = (f(*((xk + d,) + args)) - f0) / d[k]
ValueError: setting an array element with a sequence.
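For reference, a hedged sketch of the usual fix: minimize() expects a scalar objective, so reduce the array of squared residuals with np.sum instead of returning the array itself (synthetic data stands in for the CSV file here):

```python
import numpy as np
from scipy.optimize import minimize

b = np.linspace(-1, 1, 50)           # the x values
real = 2.0 * b**2 - 3.0 * b + 1.0    # y generated from known z = (2, -3, 1)

def est(z1, z2, z3, b):
    return z1 * b**2 + z2 * b + z3

def error(z, real, b):
    # sum over all data points -> a single scalar, as minimize() requires
    return np.sum((real - est(z[0], z[1], z[2], b))**2)

res = minimize(error, [1, 2, 3], args=(real, b), method='BFGS', tol=1e-6)
print(res.x)
```

With a scalar objective the numerical gradient in `approx_fprime` no longer tries to stuff an array into a single gradient entry, which is what raised the ValueError.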

How to speed up nsolve or the bisection method?
I am writing a program that requires a root finder of some sort, but every root finder I have used is unsatisfactorily slow. I'm looking for a way to speed this up.
I have used SymPy's nsolve, and although this produces very precise results, it is very slow (if I do 12 iterations of my program it takes 12+ hours to run). I wrote my own bisection method, and this works much better, but is still very slow (12 iterations take ~1 hour to run). I have been unable to find a symengine solver, otherwise that is what I would be using. I will post both versions of my program (with the bisection method and with nsolve). Any advice on how to speed this up is greatly appreciated.
Here is the code using nsolve:
from symengine import *
import sympy
from sympy import Matrix
from sympy import nsolve

trial = Matrix()

r, E1, E = symbols('r, E1, E')
H11, H22, H12, H21 = symbols("H11, H22, H12, H21")
S11, S22, S12, S21 = symbols("S11, S22, S12, S21")

low = 0
high = oo

integrate = lambda *args: sympy.N(sympy.integrate(*args))

quadratic_expression = (H11-E1*S11)*(H22-E1*S22)-(H12-E1*S12)*(H21-E1*S21)
general_solution = sympify(sympy.solve(quadratic_expression, E1)[0])

def solve_quadratic(**kwargs):
    return general_solution.subs(kwargs)

def H(fun):
    return -fun.diff(r, 2)/2 - fun.diff(r)/r - fun/r

psi0 = exp(-3*r/2)
trial = trial.row_insert(0, Matrix([psi0]))

I1 = integrate(4*pi*(r**2)*psi0*H(psi0), (r, low, high))
I2 = integrate(4*pi*(r**2)*psi0**2, (r, low, high))
E0 = I1/I2
print(E0)

for x in range(10):
    f1 = psi0
    f2 = r * (H(psi0) - E0*psi0)

    Hf1 = H(f1).simplify()
    Hf2 = H(f2).simplify()

    H11 = integrate(4*pi*(r**2)*f1*Hf1, (r, low, high))
    H12 = integrate(4*pi*(r**2)*f1*Hf2, (r, low, high))
    H21 = integrate(4*pi*(r**2)*f2*Hf1, (r, low, high))
    H22 = integrate(4*pi*(r**2)*f2*Hf2, (r, low, high))

    S11 = integrate(4*pi*(r**2)*f1**2, (r, low, high))
    S12 = integrate(4*pi*(r**2)*f1*f2, (r, low, high))
    S21 = S12
    S22 = integrate(4*pi*(r**2)*f2**2, (r, low, high))

    E0 = solve_quadratic(
        H11=H11, H22=H22, H12=H12, H21=H21,
        S11=S11, S22=S22, S12=S12, S21=S21,
    )
    print(E0)

    C = (H11 - E0*S11)/(H12 - E0*S12)
    psi0 = (f1 + C*f2).simplify()
    trial = trial.row_insert(x+1, Matrix([[psi0]]))

# Free ICI Part
h = zeros(x+2, x+2)
HS = zeros(x+2, 1)
S = zeros(x+2, x+2)

for s in range(x+2):
    HS[s] = H(trial[s]).simplify()

for i in range(x+2):
    for j in range(x+2):
        h[i, j] = integrate(4*pi*(r**2)*trial[i]*HS[j], (r, low, high))

for i in range(x+2):
    for j in range(x+2):
        S[i, j] = integrate(4*pi*(r**2)*trial[i]*trial[j], (r, low, high))

m = h - E*S
eqn = m.det()
roots = nsolve(eqn, float(E0))
print(roots)
Here is the code using my bisection method:
from symengine import *
import sympy
from sympy import Matrix
from sympy import nsolve

trial = Matrix()

r, E1, E = symbols('r, E1, E')
H11, H22, H12, H21 = symbols("H11, H22, H12, H21")
S11, S22, S12, S21 = symbols("S11, S22, S12, S21")

low = 0
high = oo

integrate = lambda *args: sympy.N(sympy.integrate(*args))

quadratic_expression = (H11-E1*S11)*(H22-E1*S22)-(H12-E1*S12)*(H21-E1*S21)
general_solution = sympify(sympy.solve(quadratic_expression, E1)[0])

def solve_quadratic(**kwargs):
    return general_solution.subs(kwargs)

def bisection(fun, a, b, tol):
    NMax = 100000
    f = Lambdify(E, fun)
    FA = f(a)
    for n in range(NMax):
        p = (b+a)/2
        FP = f(p)
        if FP == 0 or abs(b-a)/2 < tol:
            return p
        if FA*FP > 0:
            a = p
            FA = FP
        else:
            b = p
    print("Failed to converge to desired tolerance")

def H(fun):
    return -fun.diff(r, 2)/2 - fun.diff(r)/r - fun/r

psi0 = exp(-3*r/2)
trial = trial.row_insert(0, Matrix([psi0]))

I1 = integrate(4*pi*(r**2)*psi0*H(psi0), (r, low, high))
I2 = integrate(4*pi*(r**2)*psi0**2, (r, low, high))
E0 = I1/I2
print(E0)

for x in range(11):
    f1 = psi0
    f2 = r * (H(psi0) - E0*psi0)

    Hf1 = H(f1).simplify()
    Hf2 = H(f2).simplify()

    H11 = integrate(4*pi*(r**2)*f1*Hf1, (r, low, high))
    H12 = integrate(4*pi*(r**2)*f1*Hf2, (r, low, high))
    H21 = integrate(4*pi*(r**2)*f2*Hf1, (r, low, high))
    H22 = integrate(4*pi*(r**2)*f2*Hf2, (r, low, high))

    S11 = integrate(4*pi*(r**2)*f1**2, (r, low, high))
    S12 = integrate(4*pi*(r**2)*f1*f2, (r, low, high))
    S21 = S12
    S22 = integrate(4*pi*(r**2)*f2**2, (r, low, high))

    E0 = solve_quadratic(
        H11=H11, H22=H22, H12=H12, H21=H21,
        S11=S11, S22=S22, S12=S12, S21=S21,
    )
    print(E0)

    C = (H11 - E0*S11)/(H12 - E0*S12)
    psi0 = (f1 + C*f2).simplify()
    trial = trial.row_insert(x+1, Matrix([[psi0]]))

# Free ICI Part
h = zeros(x+2, x+2)
HS = zeros(x+2, 1)
S = zeros(x+2, x+2)

for s in range(x+2):
    HS[s] = H(trial[s]).simplify()

for i in range(x+2):
    for j in range(x+2):
        h[i, j] = integrate(4*pi*(r**2)*trial[i]*HS[j], (r, low, high))

for i in range(x+2):
    for j in range(x+2):
        S[i, j] = integrate(4*pi*(r**2)*trial[i]*trial[j], (r, low, high))

m = h - E*S
eqn = m.det()
roots = bisection(eqn, E0 - 1, E0, 10**(-15))
print(roots)
As I said, they both work as they are supposed to, but they do so very slowly.
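One hedged suggestion for the root-finding step itself: once m.det() yields a single expression in E, compiling it to a fast numeric function and handing it to SciPy's Brent solver typically converges far faster than plain bisection at the same tolerance. The polynomial below is only a stand-in for the real determinant, and sympy.lambdify stands in for symengine's Lambdify:

```python
import sympy
from scipy.optimize import brentq

E = sympy.symbols('E')
eqn = E**3 - 2*E - 5                 # stand-in for m.det(): any scalar expression in E
f = sympy.lambdify(E, eqn, "math")   # compile once, then each evaluation is cheap

root = brentq(f, 2, 3, xtol=1e-15)   # bracket endpoints must straddle a sign change
print(root)
```

Brent's method combines bisection with inverse quadratic interpolation, so it keeps bisection's bracketing guarantee while needing far fewer function evaluations.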

What is the easiest way to parallelize a recursive code in Cython
Consider a recursive code in Cython of the following generic form:
cpdef function(list L1, list L2):
    global R
    cdef int i,n #...
    cdef list LL1,LL2 #...
    # ...
    # core of the code
    # ...
    n= #...
    for i in range(n):
        LL1= #...
        LL2= #...
        function(LL1,LL2)
Question: What is the easiest way to parallelize it?
I tried to precede it by

from cython.parallel import prange

and then to replace range(n) by prange(n), but I got the error:

prange() can only be used without the GIL

Then I replaced prange(n) by prange(n, nogil=True), but I got many errors like:

Assignment of Python object not allowed without gil
Coercion from Python not allowed without the GIL
Indexing Python object not allowed without gil
Calling gil-requiring function not allowed without gil
Below is the relevant code I want to parallelize:
cpdef SmithFormIntegralPointsSuperFiltred(list L, list LL, list co, list A):
    global R,clp
    cdef int i,j,k,l,ll,p,a,c,cc,rc,m,f,b,z,zz,lp,s,la,kk,ccc,zo,jj,lM
    cdef list LB,S,P,CP,F,cco,PP,PPP,coo,V,LLP,LLPO,Mi,M
    m=10000
    l=len(L)
    ll=len(LL)
    la=len(A[0])
    z=0
    zz=0
    P=[]
    for i in range(l):
        if L[i]==1:
            P.append(i)
    lp=len(P)
    if lp<clp:
        print([lp,L])
        clp=lp
    if lp==0:
        F=list(matrix(LL)*vector(L))
        b=0
        for f in F:
            if f<0:
                b=1
                break
        if b==0:
            R.append(F); print(L)
    if lp>0:
        PP=[m for j in range(lp)]
        PPP=[[] for j in range(lp)]
        for i in range(ll):
            a=0
            for j in P:
                if LL[i][j]>0:
                    a+=1
                    if a==2:
                        break
            if a<=1:
                CP=list(set(range(l))-set(P))
                c=sum([LL[i][j]*L[j] for j in CP])
                if a==0 and c<0:
                    z=1
                    break
                if a==1 or (a==0 and c>=0):
                    LLPO=[LL[i][P[k]] for k in range(lp)]
                    for j in range(lp):
                        LLP=LLPO[:]
                        cc=LLP[j]
                        if cc<>0:
                            del LLP[j]
                            if LLP==[0 for k in range(lp-1)]:
                                PPP[j].append(i)
                                zz=1
                                if cc>0:
                                    rc=c/cc
                                    if rc<PP[j]:
                                        PP[j]=rc
        if z==0 and zz==1:
            zo=0
            for i in range(lp):
                Mi=[]
                if PPP[i]<>[]:
                    for j in range(PP[i]+1):
                        ccc=0
                        coo=copy.deepcopy(co)
                        for k in PPP[i]:
                            s=sum([LL[k][kk]*L[kk] for kk in range(l)])+(j+1)*LL[k][P[i]]
                            V=A[k]
                            for kk in range(la):
                                if V[kk]<>0:
                                    if s>=0 and coo[kk][V[kk]]>=s:
                                        coo[kk][V[kk]]=s
                                    else:
                                        ccc=1
                                        break
                            if ccc==1:
                                break
                        if ccc==0:
                            Mi.append(j)
                    if len(Mi)<m:
                        zo=1
                        m=len(Mi)
                        M=Mi
                        p=i
            if zo==1:
                M.reverse()
                lM=len(M)
                for jj in range(lM):
                    j=M[jj]
                    cco=copy.deepcopy(co)
                    for k in PPP[p]:
                        s=sum([LL[k][kk]*L[kk] for kk in range(l)])+(j+1)*LL[k][P[p]]
                        V=A[k]
                        for kk in range(la):
                            if V[kk]<>0:
                                cco[kk][V[kk]]=s
                    LB=L[:]
                    LB[P[p]]=j
                    SmithFormIntegralPointsSuperFiltred(LB,LL,cco,A)
The global variables R and clp are not essential; I can manage without global variables if necessary.

Speedup python code for graph compression with numba
I'm trying to speed up the Python code I'm currently working on. The code aims to compress a social graph. The code sample works as intended on my data; the main issue is the time it takes to do so. Without any parallelization the code takes around 4000 s on Slashdot0902 (https://snap.stanford.edu/data/socSlashdot0902.html) and 3800 s on soc-Epinions1 (https://snap.stanford.edu/data/socEpinions1.html) to produce a compressed representation, which is much higher than intended.
I profiled the code, and most of the time, as expected, is spent in the main compression body. I'm looking for ways to parallelize this code, i.e. essentially running the compression algorithm for each element simultaneously (compression of different elements is totally unrelated).
Trying to achieve this speedup using a multiprocessing setup didn't achieve the intended speed boost. I've tried to achieve parallelization using numba, unsuccessfully. I understand that passing the matrix (a 2D array) as input isn't how it's supposed to be done, but I'm stuck at this point.
Here's a sample of the code. (Full code here https://pastebin.com/8yihNs2Y)
# Import everything
# Load adjacency matrix into a variable called matrix

# Creates a list to store the final output values before writing into a file
list1 = []

# Finds the parent of current node
@cuda.jit(device=True)
def parent(index):
    return int((index-1) / 2)

# Finds the sibling of current node, left or right sibling based on index
@cuda.jit(device=True)
def sibling(index):
    if(index % 2 == 1):
        return index+1
    else:
        return index-1

len_row = matrix.shape[0]

# Find the height of the binary tree and use it to find n, i.e. the no. of elements in the array
height = int(math.log2(len_row)) + 1
n = (2 ** height) - 1
start_index = n - len_row
temp_array = np.full(n, -1, dtype=np.int8)

@vectorize(['int32(int32,int32,int32,int32,int32)'], target='cuda')
def compress(input_array, temp_array, n, start_index, len_row):
    for i in range(len_row):
        if(input_array[i] == 1):
            current_index = start_index + i
            dcn_reached = False
            while dcn_reached == False:
                temp_array[current_index] = 1
                if(temp_array[parent(current_index)] != 1):
                    temp_array[parent(current_index)] = 1
                    temp_array[sibling(current_index)] = 0
                    current_index = parent(current_index)
                else:
                    dcn_reached = True
    return temp_array

list1.append(compress(matrix, temp_array, n, start_index, len_row))
I would expect a list with the temp_array values if it works correctly. I would then have to manipulate temp_array to obtain the final array, as

np.where(temp_array != -1)

wouldn't work on the GPU.
This is the main part of the non-parallelized code (full code here https://pastebin.com/BNvVUzV3). (Note: to get the code working you may need to change the sep=' ' in the pd.read_csv call based on the file.)

# Load edgelist into a 2d list called matrix
list1 = []
# Two functions parent and sibling to return the index of parent and sibling node in a binary tree

start = time.time()
len_row = matrix.shape[0]

# Find the height of the binary tree and use it to find n, i.e. the no. of elements in the array
height = int(math.log2(len_row)) + 1
n = (2 ** height) - 1

# Loop over all the elements in row i of the matrix to generate the compressed format of that row
for i in range(len_row):
    print("Element " + str(i))
    # input_array = i'th row of the matrix
    input_array = matrix[i]
    # Initialize temp array with -1 values
    temp_array = np.full(n, -1, dtype=np.int8)
    start_index = n - len(input_array)
    for i in range(len_row):
        if(input_array[i] == 1):
            current_index = start_index + i
            dcn_reached = False
            while dcn_reached == False:
                temp_array[current_index] = 1
                if(temp_array[parent(current_index)] != 1):
                    temp_array[parent(current_index)] = 1
                    temp_array[sibling(current_index)] = 0
                    current_index = parent(current_index)
                else:
                    dcn_reached = True
    i = np.where(temp_array != -1)
    output = temp_array[i]
    list1.append(output)

# Pass this list through bz2 compression and pickle dump it to a bin file
I would like to achieve at least one of the following:
- Get the numba code working
- Suggestions for any other alternative approaches to parallelize the above code
- Suggestions for further optimizations
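Toward the first and third points, a hedged sketch: extracting the per-row compression into a standalone function over NumPy arrays isolates the hot loop (making it a candidate for numba's @njit, with prange over rows) and lets it be tested without a GPU. The helpers mirror the question's parent/sibling, and the one-row matrix is a stand-in for the real adjacency matrix:

```python
import math
import numpy as np

def parent(index):
    return int((index - 1) / 2)   # int() truncates toward zero, so parent(0) == 0

def sibling(index):
    return index + 1 if index % 2 == 1 else index - 1

def compress_row(input_array, n, start_index):
    # same walk as the question's inner loop, over a single row
    temp = np.full(n, -1, dtype=np.int8)
    for i in range(len(input_array)):
        if input_array[i] == 1:
            cur = start_index + i
            while True:
                temp[cur] = 1
                if temp[parent(cur)] == 1:   # dominating node already marked
                    break
                temp[parent(cur)] = 1
                temp[sibling(cur)] = 0
                cur = parent(cur)
    return temp[temp != -1]                  # keep only the touched entries

matrix = np.array([[1, 0, 1, 0]], dtype=np.int8)  # tiny stand-in adjacency matrix
len_row = matrix.shape[1]
height = int(math.log2(len_row)) + 1
n = (2 ** height) - 1
start_index = n - len_row
compressed = [compress_row(row, n, start_index) for row in matrix]
print(compressed)
```

Since rows are independent, the list comprehension could become a numba prange loop (or a multiprocessing map over row chunks) once compress_row is jitted.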

Future multiprocess parallel in Windows
I'm trying to implement a parallel process in order to transform a raster DEM into terrain products like aspect, slope, etc. I'm doing so using future with the following code:
dem = raster("./dem/dem.asc")
output = "./output/"
crs(dem) <- epsg
plan(multiprocess, workers = availableCores() - 1, gc = TRUE)

f1 %<-% terrain(dem, filename = paste0(output, "01_slope.asc"), opt = "slope",
                unit = 'degrees', neighbors = 8)
f2 %<-% terrain(dem, filename = paste0(output, "02_aspect.asc"), opt = "aspect",
                unit = 'degrees', neighbors = 8)
f1; f2
The processes start in parallel, but something weird happens: it produces both files at the same time, each with its own name, but both files are exactly the same (in this case, both rasters are slope rasters). What am I doing wrong?

Tensorflow: Explicitly running different tasks in different processes
I have an array A on which I have to do some processing (the array elements are independent of each other). I want to do something like:
session.run(A[0:10]); session.run(A[10:20]); session.run(A[20:30]) ...
However, when I try to do this using multiprocessing, the session hangs when initializing tf variables. I tried creating different sessions within the process, but that didn't help either.
Considering that the array elements are independent of each other, running this in parallel would really speed up the process.