Why is my function that uses multiprocessing/multi threading so slow when used in a for loop, but not outside of it?

I'm using multiprocessing and it's very quick on a single large number, but when i try to use it in a for loop, with small numbers, it takes 5 seconds to test just 1009 small numbers. But is magnificently fast for a single number outside a for loop. Is there any way to speed up the multiprocessing when a function that uses multiprocessing is in a for loop? I tried multi-threading and had the same issue, so i'm looking for some expert advice here. I included my code and the timings below for my multiprocessing version, but i can include my multi-threading version as well if needed:

import gmpy2 
import time
 
sinn = 2110229697309202254897383305762150945330987087513434511395506048950594976569434432057019507105035289374307720719984431280856161609820548842778454256113246763860786119268583367543952735347969627478873317341364209555365064365565504232770227619462128918701942169785585423104678142850200975026619010035331023744330713985615650556129731348659986462960062760308034462660525448390420668021248422741300646552941285862310410598374242189448623917196191138254637812716211329113836605859918549332304189053950819346551095911511755911832183789503704294770046935064469435830299623205136625543859303686699678929069468518950480476841246805908501510754550017255944080874819287974625925494008373883250410775902993163965873632474224574883242826458163446781002284368017611606202344050570737818087202137703099075773680753707346415849787963446390136517016131227807076254668461445862154978026041507116570585784569893773262639243954090283224759975513502582494002154146757110676408972377044584495342170277522887809749465855954126593100747444378301829661568735873345178089061677917127496915956539418931430313218084338374827152407795095072639044306222222695685778907958272820576498682506540189586657786292950574081739269257159839589987847266550007783514316481286222515710538845836151864127815058116482680058626451349913138908040817800742009650450811565324184631847563730941344941348929727603343965091116543702880556850922077216848669966268219928808236163268726995495688157209747596437162960244538054993785127947211290438554095851924381172697827312534174244295581184309147813790451951453564726742200569263225639113681905176376701339808868274637448606821696026703034737428319530072483125495383057919894902076566679023694181381398377144302767983385253577700652358431959604517728821603076762965129019244904679015099154368058005173028200266632883632953133017055122970338782493475762548347258351148037427739052271661340801912188203749647918379812483260399614599813650518046331670764766419886619324840045611486524123102046413946014624119568013100078163986683199814025915420877588778260860713148420321896163326473203441644820182490479899368048072263481024886708136521847014624735722333931331098969321911443978386868675912141648200500219168920887757573018380579532261231821382787339600631297820996466930957801607217549420247654458172818940238337170577825003408756362106088558651381993611741503374243481167926898332728164900189941804942580426055589622673679047058619682175301326905577843405270203660160407401675700528981573327582844330828861745574031416926871562443652858767649050943181353635950301154441954046214987718582670685455252774874198771086552440702483933126644594300464549471422237478151976561680719370424626162642534252062987911763456822609569209140676822858933588602318066530038691463577379331113471591913447226829868760176810195567325921301390329055242213842898142597360121925124635965685365925901913816717677946911762631634793638450106377437599347740569467683272089859392249351406815344105961234868327316964137925419770514177021722214309784062017826024217906664090209434553785436385765927274067126192143337589109608949427467825999057058702263715338956534536892852849984934736685814891286495169007648767081688963426768409476169071460997622740467533572971356017575900999100928776382541052696124463195981888715845688808970103527288822088031150716134784735332326775370417950625124642515148342694377095213470544739900830244879573205335578256682901821773047071352497997708791157012233232529777513203024818391621220967964874173106990772425289900446640237659116713251437567138729645677868024033209183367071421651937808005637679844370347367922676824239404492688418047080583797577102267329067247758368597488680401670673861120323439239792549053895366970423259196919428554146265587250617656401028722578111927104663315250291888502226235291264834968061065817079511872899991276288365723969841290984981957389126603952133124328219936785870274843554107325931034103072894378438818494802517594594270034007832922248742746517915210656205746338575621725899098414488628833412591266637224507533934158213117522503993423240638893845121918647788013

 
def ffs(x): 
    """Returns the index, counting from 0, of the 
    least significant set bit in `x`. 
    """ 
    return (x&-x).bit_length()-1 
    
    
 
def MillerRabin(arglist):  
  N = arglist[0] 
  primetest = arglist[1] 
  iterx = arglist[2] 
  powx = arglist[3] 
  withstats = arglist[4] 
  primetest = gmpy2.powmod(primetest, powx, N)  
  if withstats == True: 
     print("first: ", primetest)  
  if primetest == 1 or primetest == N - 1:  
    return True  
  else:  
    for x in range(0, iterx):  
       primetest = gmpy2.powmod(primetest, 2, N)  
       if withstats == True: 
          print("else: ", primetest)  
       if primetest == N - 1: return True  

       if primetest == 1: return False  
  return False  
    
   
def sfactorint_isprime(N, withstats=False): 
 
    N = gmpy2.mpz(N) 
    from multiprocessing import Pool 
 
    if N <= 1: return False 
    if N == 2: 
      return True 
    if N % 2 == 0: 
      return False 
    if N < 2: 
        return False 
         
    # Add Trial Factoring here to speed up smaller factored number testing 
 
     
    iterx = ffs(N-1) 
    """ This k test is an algorithmic test builder instead of using 
        random numbers. The offset of k, from -2 to +2 produces pow tests 
        that fail or pass instead of having to use random numbers and more 
        iterations. All you need are those 5 numbers from k to get a  
        primality answer.  
    """ 
    k = pow(N, -1, 1<<N.bit_length()) - 1 
    t = N >> iterx 
    tests = [k-2, k-1, k, k+1, k+2] 
     
    for primetest in range(len(tests)): 
      if tests[primetest] >= N: 
         tests[primetest] %= N 
   
    arglist = [] 
    for primetest in range(len(tests)): 
      if tests[primetest] >= 2: 
        arglist.append([N, tests[primetest], iterx, t, withstats]) 
      
    with Pool(5) as p: 
       s=p.map(MillerRabin, arglist)     
     
    if s.count(True) == len(arglist): return True 
    else: return False 
     
    return s 
    
start = time.time() 
xx = sfactorint_isprime(sinn) 
end = time.time() 
print (end-start)                                                                                                                                      

0.551032543182373


start = time.time() 
for x in range(1, 1009, 2): 
  xx = sfactorint_isprime(x) 
end = time.time() 
print (end-start)  
                                                                                                                                    
5.828385353088379

1 answer

  • answered 2021-04-14 16:47 Jérôme Richard

    This line is very expensive: with Pool(5) as p:.

    Indeed, it forks the Python process in several other ones (5 workers here). This system-side call (fork or CreateProcess) is expensive because the OS as to manage the memory of each newly-created process, the scheduling of those processes on the available cores, the (shared) hardware resources, etc. Not to mention the processes are also joined and destroyed at the end of the with scope (this is also expensive). Because this line is executed in a loop of about 500 iterations, it results in the creation of roughly 2500 processes. This should be much more costly than the actual computation. You should try to create processes and then share all the work only once. If this is not possible, you can use inter-process communication to do that. Here, it should be much better to parallelize the outer loop in the second case (for x in range(1, 1009, 2)). Note that Python is not the best tool for parallel computation (this is cumbersome to do and eventually not very efficient).

    On mainstream Windows platforms, an additional very-unexpected behaviour happens: the default included Antivirus software analyzes each newly created process to check if they are malicious. This deep analysis is very expensive and totally useless in this specific case. Not to mention that creating a new process is already expensive on Windows.