How does this largest prime factor finding algorithm work?
I saw this YouTube video online where this guy finds the largest prime factor of a number using a seemingly simple approach, but I don't quite understand the math of it. Here's the link https://m.youtube.com/watch?v=5kv9q7qgvlI The code 
n=1234 # Some number whose largest factor I want to find
i=2 # It seems that he tries to start from the smallest prime number
while i**2<n: # I don't understand this part where he checks if the square of variable is less than the target number
while n%i==0: # I know this checks if n is divisible by i
n/=i
i+=1 # increments i's value
print(n)
I know that this code works, but why? I get the last two lines, but why is it necessary to check if the square of variable i is less than n?
1 answer

If your number
N
is divisible byI
(in other terms,I
being a factor ofN
), andI
is greater thansqrt(N)
, then there must be another factor ofN
, call itJ = N/I
, being smaller thansqrt(N)
.If you exhaustively searched all potential factors up to
I
, you must have already foundJ
, so you covered that factorization earlier.We are looking for the largest prime factor, so the question remains whether the final
N
when terminating the loop is a prime number.Whenever we encountered a factor of
N
, we dividedN
by this factor, so when considering a potential factorI
, we can be sure that the currentN
won't have any factors belowI
any more.Can the final
N
when terminating the loop be composed from more than one factor (i.e. not being prime)?No.
If N isn't prime, it must be composed from
K
andL
(N = K * L
), andK
andL
must be both bigger thansqrt(N)
, otherwise we would have divided byK
orL
in an earlier step. But multiplying two numbers, each bigger thansqrt(N)
, will always be bigger thanN
, violatingN = K * L
.So, assuming that the final
N
wasn't prime, runs into a contradiction, and thus we can be sure that the finalN
is a factor of the originalN
, and that this factor is indeed a prime number.
See also questions close to this topic

Python gets content by requests_html but meets error 412
I'm trying to get html from urls by requests_html in Python. But I just only can get error 412 from any one of those urls, is anyone familiar with this pip plugin, could you help me to analyze this issue?
I paste code here:
from requests_html import HTMLSession urls = ['http://www.cdtf.gov.cn/cdtf/c130542/201912/27/content_df16d2c291984ad6923f624ef6de4fbe.shtml', 'http://www.cdtf.gov.cn/cdtf/c130542/201912/25/content_86cb4d2a82d74814af08e25444cdf2ca.shtml', 'http://www.cdtf.gov.cn/cdtf/c130542/201912/06/content_4dedabd7d30b41ae8d5fa7a484749bd5.shtml', 'http://www.cdtf.gov.cn/cdtf/c130542/201912/06/content_2c959180b33f4ab0aa32602908dd816a.shtml'] header = { 'Accept': 'image/webp,*/*', 'AcceptEncoding': 'gzip, deflate', 'AcceptLanguage': 'enUS,en;q=0.7,zhCN;q=0.3', 'Connection': 'keepalive', 'Cookie': '''azSsQE5NvspcS=5tsIGOySQFij6ZSSdFddJGIJ7ogz9h64pMGRtQMJ0zpKwtqYGN4f9ZLb2qTna9OQbHdxIH_6NyFFXWjZd0leedA; Path=/; expires=Thu, 23 May 2030 09:13:13 GMT; HttpOnly''', 'Host': 'www.cdtf.gov.cn', 'UserAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.96 Safari/537.36', } def main(): session = HTMLSession() for url in urls: print('Url: %s' % url) response = session.get(url, headers=header) print('Status code: %s' % response.status_code) if response.status_code == 200: response.html.render() elements = response.html.find() if elements: for element in elements: print(element.html) print('*** DONE ***') if __name__ == "__main__": main()
And I get headers from browser:
Did I miss something? Any help would be very appreciate!

since we are joining to ('. ') an string with dot and a space then we must get . first but why is the capitalized letter at first
Q)Write code that uses the string stored in sent and creates an acronym which is assigned to the variable acro. The first two letters of each word should be used, each letter in the acronym should be a capital letter, and each element of the acronym should be separated by a “. ” (dot and space). Words that should not be included in the acronym are stored in the list stopwords. For example, if sent was assigned the string
“height and ewok wonder”
then the resulting acronym should be“HE. EW. WO”
.stopwords = ['to', 'a', 'for', 'by', 'an', 'am', 'the', 'so', 'it', 'and', 'The'] sent = "The water earth and air are vital"
Answer:
acro = '' acro = '. '.join(word[:2].upper() for word in sent.split() if word not in stopwords) print(acro)

Vectorised solution to fill rows of a column based on value from previous row in Python
Dataframe:
df = pd.DataFrame({"X":[1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}) print(df) X 0 1.0 1 NaN 2 NaN 3 NaN 4 NaN 5 NaN
I want to populate np.nan by taking squares of values from previous row and adding one to it.
Desired output:
X 0 1.0 1 2.0 2 5.0 3 26.0 4 677.0 5 458330.0
This can be done by a for loop with:
for i in range(1,len(df)): df["X"].iloc[i] = ((df["X"].iloc[i1]) ** 2) + 1
But looking for a vectorised solution of the same problem

algorithms to check if a binary tree is balanced or not in Python
I did learn how to balance a binary tree. I referred to the following sites:
Check if a binary tree is balanced with iterative function?
But I wonder how to write python code to iteratively check if a tree is balanced.

Data structure / algorithm design for insertion
Suppose you have a line object that holds a value and a length. How to design a data structure such that whenever you insert a new line object, it will not only store it but also return the sum of values of every line object that has a shorter length than the new line object. The time complexity for each insertion should be O(logn). The overall time complexity can be O(nlogn).
I have tried to use segment tree but am a bit confused about how to adjust it to solve my issue. Since I think the segment tree itself doesn't really help me to filter only line objects that have shorter length than the new inserted line object.

Calculating dominance frontier
I'm having some troubles in this algorithm to calculate the dominance frontier of a node.
This is the algorithm:
for each node b if  preds(b) ≥ 2 for each p ∈ preds(b) r = p while r != IDom(b) add b to DomFront(r) r = IDom(r)
Suppose to have this graph where the entry node is 1.
The first question: 1 has some predecessors? I think no because it is the root of my graph and the root has not predecessors.
But, what about 2? The predecessors that I must consider are {1, 4, 7} or only 1? Because, if I consider {1, 4, 7} as predecessors I enter in the
if preds(2) >= 2
otherwise I do not enter in the if.So, the predecessors that I have to consider are also the nodes that have a "back edge" or only the "fathers" in a DFS?
Could you help me to resolve this doubt with some explanation? Thank you!

How to solve following system of equations
I've taken a few measurements of an LC circuit and I need to solve for both L and C based on that. How do I solve this?
2.675e6 = 1 / (2 * pi * sqrt(L * (C + 100e9)) 5.8e6 = 1 / (2 * pi * sqrt(L * C))

Machine learning algorithm when the order of two dataset matters
I can’t really explain in the title. But I need to find an algorithm that finds the relationship between two datasets. However, the order of datasets are important. It’s not a simple combination.
What I want to make is a language exchange matching app. The app needs to display matching percentage (= predicted satisfaction).
Person A and Person B matched and had a chat each other. Later, Person A provided feedback and his satisfaction with B was 90%, but Person B's satisfaction with A was only 70%.
Each person has several attributes such as age, language level, personality...
What I want to achieve is, given the person data, the system predicts the satisfaction. (E.g., If person A matched with Person C, A’s satisfaction will be 80%...)
What do you think is the best machine learning algorithm? I’m new to machine learning and don’t have much knowledge.

How to find sum of digit numbers in range 1 000 000 which is equal for instance 28
How to find sum of digit numbers in range 1 000 000 which is equal for instance 28

Why Python has no prime numbers utilities in standard library?
I think it's not only about Python. In many languages I have not found any default prime number generator or prime checker. Why e.g Python math module has no prime number methods? Or maybe there is somewhere, but I don't know it? If not, what is the reason to not implementing standard prime number generator or prime checking algorithms?

What is the fastest algorithm to find if a number is prime?
this is the extent to which i have optimised my code :
int isprime(int n) { if(n==2  n==3) return 1; if (n % 2 == 0  n % 3 == 0  n==1) return 0; for (int i = 1; (6 * i) + 1 <= n / 2; i++) if (n % ((6 * i)  1) == 0  n % ((6 * i) + 1) == 0) return 0; return 1; }
this algorithm skips all the checks with adjacent primes so makes it efficient than sqrt(n) solution.
how to solve it in O(1) time or lesser time than this algorithm.

Are there faster techiniques or shortcut for MOD'ing large numbers using pow(x,y,n)
I created the following to find primes in numbers by using powers of two to increase the size of the number until the gcd of the original number and and a larger number mod of the powers of two up to as high as we need to go have a gcd answer greater than 1. What i'm looking for is if a mathematician knows of any tricks to make this technique faster as it could be quite useful for finding prime factors. Any indications as to why or why not a shortcut doesn't exist would be helpful
def search_for_prime(hm): for x in range(2,500): answer = gcd_mod_find(hm**x, hm) if answer != 1: break return answer def gcd_mod_find(hm, find): prevcr = hm getcr = hm%(1<<(hm).bit_length()1) bitlength = hm.bit_length()//2 count=0 found = False while getcr != 0 and getcr != 1 and bitlength > count: count+=1 temp = hm%getcr prevcr = getcr getcr = temp answer = gcd(prevcr, find) if answer != 1: #print(f"{answer} found at {prevcr}") found = True break if found == True: return answer else: return 1
An example of how this works:
# Takes about less than a minute to find: # In [1677]: search_for_prime(1009732533765211) # Out[1677]: 11344301 # Finds rather quickly that the number is prime: # In [1679]: search_for_prime(10099) # Out[1679]: 10099 # Rather fast find of one of the primes # In [1680]: search_for_prime(7919 * 7883) # Out[1680]: 7919
My goal is for a mathematician to let me know if there are any optimizations to make this faster. I'm working on learning more about mod techniques (which lead me to a faster pollard_brent answers for large numbers rather than using random ints for y,c,m, which i'll share here for those who are interested ( This is to show that i could optimize pollard_brent, so hopefully a mathematician will know of tricks of how to optimize my method above:)
The following is a digression from the above question but useful for those interested in speed optimizations for finding factors using pollard_brent
Currently my method above is slower than a pollard brent optimization i made which is faster than it's (pollard_brent y,c,m as random) version ( So i made an optimization to make pollard_brent have consistently lower speed results that some might be interested in:
from sympy import isprime from math import gcd def pollard_brent_lars_opt(n): if n % 2 == 0: return 2 if n % 3 == 0: return 3 # This optimization is contributed by Lars Rocha. I use an equation instead of random numbers for significant # speed increases on larger numbers and consistent low speed results y,c,m = (1<<((n**2).bit_length()+1)), (1<<((n**2).bit_length())) , (1<<((n**2).bit_length()+1)) #print(y,c,m) g, r, q = 1, 1, 1 while g == 1: x = y for i in range(r): y = (pow(y, 2, n) + c) % n k = 0 while k < r and g==1: ys = y for i in range(min(m, rk)): y = (pow(y, 2, n) + c) % n q = q * abs(xy) % n g = gcd(q, n) k += m r *= 2 if g == n: while True: ys = (pow(ys, 2, n) + c) % n g = gcd(abs(x  ys), n) if g > 1: break return g def get_factors_lars_opt(hm, offset=2): num = hm vv = [] while isprime(num // 1) != True: print(vv, num) a = find_prime_evens_lars_opt(num, offset) #print(a) if a[5] == 0: vv.append(a[1]) elif a[5] == 2: if num % 2 == 0: vv.append(a[5]) else: vv.append(a[3]) elif a[5] == 3: if num % 2 == 0: vv.append(2) elif num % 3 == 0: vv.append(3) elif num % 5 == 0: vv.append(5) elif num % a[3] == 0: vv.append(a[3]) else: vv.append(a[5]) #print(a) num = num // vv[1] if isprime(num): vv.append(num) print(vv) return vv def find_prime_evens_lars_opt(hm, offset=2): y = 3 prevtemp = 3 if isprime(hm): print(f"{hm} is already prime") return hm while True: if y.bit_length() > hm.bit_length() 1: prevtemp = pollard_brent_lars_opt(hm) break j = powers(hm, y) temp = j if j != 1 and j != 0: temp = j temp = hm % temp if temp == 0: prevtemp = temp print("c: break") break while temp != 1 and temp != 0: prevtemp = temp temp = hm % temp if temp == 0: print("h: break") break y = Xploder(y) offset return hm, j, y.bit_length(), y, temp, prevtemp def build_prime_number(hm): si = 1 for x in range(len(hm)): si = si * hm[x] return si def Xploder(s, iter=1): return ((s+1) << (iter))1 def powers(hm, y): return hm%y
Here is some speed test results from pollard brent compared to my optimizations ( for larger numbers, smaller numbers the random technique can be faster sometimes, but my optimizations consistently are on the lower bound, and i found much better results with larger factorizations ):
# Here get_factors_lars_opt calls my optimized version of y,c,m of pollard brent, and the get_factors_brent uses # the original form with y,c,m as random sin = 777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 import time start = time.time() print(build_prime_number(get_factors_lars_opt(sin))) end = time.time() print("Time: ", endstart) [3, 7, 7, 13, 11, 37, 103, 613, 4013, 210631, 2071723, 52986961, 21993833369, 5363222357, 291078844423, 13168164561429877, 377526955309799110357] 777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 440.6862680912018 import time start = time.time() print(build_prime_number(get_factors_brent(sin))) end = time.time() print("Time: ", endstart) [3, 7, 7, 13, 11, 37, 103, 613, 4013, 210631, 2071723, 52986961, 21993833369, 5363222357, 291078844423, 13168164561429877, 377526955309799110357] 777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 Time: 814.3913190364838 sin = 272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727 import time start = time.time() print(build_prime_number(get_factors_lars_opt(sin))) end = time.time() print("Time: ", endstart) [3, 3, 3, 3, 7, 13, 37, 103, 613, 4013, 210631, 2071723, 52986961, 5363222357, 21993833369, 291078844423, 13168164561429877, 377526955309799110357] 272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727 Time: 286.54595589637756 import time start = time.time() print(build_prime_number(get_factors_brent(sin))) end = time.time() print("Time: ", endstart) [3, 3, 3, 3, 7, 13, 37, 103, 613, 4013, 210631, 2071723, 5363222357, 52986961, 21993833369, 291078844423, 13168164561429877, 377526955309799110357] 272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727 Time: 833.4450578689575 sin = 6324591032675721961071009838204690217216135 import time start = time.time() print(build_prime_number(get_factors_lars_opt(sin,2))) end = time.time() print(endstart) [5, 2267, 1031, 808309, 1034200481927, 647396143454755757] 6324591032675721961071009838204690217216135 3.2956199645996094 import time start = time.time() print(get_factors_brent(sin)) end = time.time() print(endstart) [5, 2267, 1031, 808309, 1034200481927, 647396143454755757] 6.273381233215332 sin = 632459103267572196107100983820469021721613 import time start = time.time() print(build_prime_number(get_factors_lars_opt(sin,2))) end = time.time() [3, 11, 11, 938861870237, 1855769878917004820751635923] 632459103267572196107100983820469021721613 0.9035940170288086 import time start = time.time() print(build_prime_number(get_factors_brent(sin,2))) end = time.time() [3, 11, 11, 938861870237, 1855769878917004820751635923] [3, 11, 11, 938861870237, 1855769878917004820751635923] 632459103267572196107100983820469021721613 6.698847055435181 import time start = time.time() print(get_factors_brent(sin)) end = time.time() print(endstart) 632459103267572196107100983820469021721613 [3, 11, 11, 938861870237, 1855769878917004820751635923] 1.687075138092041 # The point being that my optimizations always speed test to the lower end of the spectrum, when randomness can be as fast, faster or slower, so having my optimization leads to better consistently lower results. Anyway, this is to show that i'm interested in helping others optimize their techniques so hopefully a mathematician will help me optimize mine. Thx.