Sunday, January 24, 2010

Pollard rho method of factorization: iterated functions and factorization

The Pollard rho method of factorization is a heuristic factorization method based on the idea of iterated functions. Say you take some particular function f(x) which maps the integers in the interval [0, n-1] to themselves. Then for any particular starting number x0 the sequence of numbers x0, f(x0), f(f(x0)), ... must necessarily start repeating itself within n+1 iterations of the function, since there are only n possible values.
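
As a quick toy illustration (my own example; nothing is special about the choices of n, x0, or the map), here is what such an iteration looks like in Python for a small n:

    n = 11
    f = lambda x: (x * x + 1) % n   # an arbitrary map from {0, ..., n-1} to itself

    seen = []
    x = 2                           # arbitrary starting value x0
    while x not in seen:            # can loop at most n times before a value repeats
        seen.append(x)
        x = f(x)
    print(seen, "then repeats at", x)   # e.g. [2, 5, 4, 6] then repeats at 4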

Of course not all numbers in the interval need belong to a cycle, and if the function ever maps multiple numbers to a single number then there must exist numbers that do not belong to any cycle. Starting off our iterated function at one of these numbers produces a string of numbers, the first few of which are never repeated, followed by an infinite loop of numbers that form a cycle. This "shape" of the graph of such an iterated function is what gives this method of factorization its name, since the Greek letter rho is likewise made up of a short tail which leads into a loop.

It is certainly not obvious how one could hope to use such an iterated function to factorize a number. The motivation lies with the birthday paradox: if we pick a random collection of about sqrt(n) numbers from the interval [0, n-1], then the probability that 2 of them are equal is about 50%. So if we think of our iterated function as being a pseudo-random function, then we would expect that after about sqrt(n) iterations we have around a 50% chance of hitting a value that we have hit before.
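
A rough empirical check of that heuristic (a sketch of my own; the modulus and trial count are arbitrary):

    import random

    def draws_until_repeat(n):
        seen = set()
        while True:
            x = random.randrange(n)
            if x in seen:
                return len(seen) + 1   # number of draws made, counting the repeat
            seen.add(x)

    n = 10**6
    average = sum(draws_until_repeat(n) for _ in range(200)) / 200
    print(average, "draws on average; sqrt(n) =", n ** 0.5)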

Finally the idea of how we can use iterated functions to factor numbers begins to form. The key idea is that if we consider the sequence mod p, then after about sqrt(p) iterations we can expect to have found 2 terms which are congruent to each other mod p. We call this a collision mod p. Of course if n is the number that we are attempting to factor then n = p*q with p prime, and while we do not have access to p we do have access to n. So instead of computing the sequence mod p we compute it mod n. Since p divides n, reducing mod n does not change the values of the sequence mod p. So although we can expect to find a cycle mod n only after around sqrt(n) steps, more importantly we can expect a collision (and therefore a cycle) mod p in about sqrt(p) steps.

How to detect such a collision mod p takes a bit of thought, since we do not have the value of p and therefore cannot simply check for such collisions directly. Say there has been a collision mod p in our sequence, and call the two colliding terms a and b. Since a = b (mod p), we have a - b = 0 (mod p) and therefore a - b = kp for some integer k. Let g(i) be the i'th iterate of the function f acting on x0. If we take the greatest common divisor of a difference g(i) - g(j) with n, then in the case of a collision mod p we will recover a factor of n which is a multiple of p and which, if we are lucky, will not be equal to n itself.
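
A tiny sanity check of that fact (the numbers here are made up purely for illustration):

    from math import gcd

    p, q = 101, 103
    n = p * q
    a, b = 5, 5 + 7 * p            # congruent mod p, but not mod n
    print(gcd(abs(a - b), n))      # prints 101, a nontrivial factor of n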

There is one last piece to the puzzle which needs to be put into place before this is of any use, however. If we were simply to check the value of gcd(g(i)-g(j), n) for all pairs i < j, then after m iterations we would have performed on the order of m^2 gcd computations, which throws away the advantage of finding a collision quickly. The trick, known as Floyd's cycle-finding method, is to compare only terms a specific distance apart: at step i we check gcd(g(2i) - g(i), n). To see why this works, say the tail of the sequence mod p has length T and the cycle has length L, and consider a step i which is a multiple of the cycle length, i = kL, with kL > T. When this happens g(i) will be the kL - T element of the cycle and g(2i) will be the 2kL - T element of the cycle, meaning they differ by kL elements, a multiple of L, and therefore must correspond to the same element in the cycle.
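
Putting the pieces together, here is a minimal sketch of the whole procedure in Python. The function x^2 + 1 and the starting value 2 are the conventional choices discussed below, not anything forced on us:

    from math import gcd

    def pollard_rho(n, x0=2, c=1):
        f = lambda x: (x * x + c) % n
        x = y = x0
        d = 1
        while d == 1:
            x = f(x)            # x plays the role of g(i)
            y = f(f(y))         # y plays the role of g(2i)
            d = gcd(abs(x - y), n)
        return d                # a factor of n, or n itself if we were unlucky

    print(pollard_rho(8051))    # 8051 = 83 * 97; this run finds 97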

At this point we have sufficiently analyzed the algorithm so that if we readily had access to "random" functions from the interval [0, n-1] to itself then we could expect to produce a factor of n = p*q, with p being the least prime factor of n, in O(sqrt(p)) operations. Since p <= sqrt(n), this means we expect a factor in O(n^(1/4)) operations. Of course, since the method is probabilistic, any particular run may fail to find a factor, but we can simply restart with a new function or a new starting value x0 after say 100*n^(1/4) steps, and performing hundreds or thousands of such restarts can push the probability of failure as low as you like.
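
The restart idea might look something like this (the step bound 100*n^(1/4) is the one mentioned above; the rest are illustrative choices of my own):

    from math import gcd

    def rho_run(n, c, max_steps):
        f = lambda x: (x * x + c) % n
        x = y = 2
        for _ in range(max_steps):
            x = f(x)
            y = f(f(y))
            d = gcd(abs(x - y), n)
            if 1 < d < n:
                return d
        return None                       # this run failed; the caller restarts

    def factor_with_restarts(n, restarts=1000):
        bound = int(100 * n ** 0.25) + 1
        for c in range(1, restarts + 1):  # restart with a different constant each time
            d = rho_run(n, c, bound)
            if d is not None:
                return d
        return None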

That said, it should be obvious that we do not have access to such suitably random functions. Generating such a function in its entirety would take O(n) time, which is much greater than the O(sqrt(n)) time required for trial division. Because of this, in practice the truly random function is replaced with some fixed function which is assumed to behave sufficiently "randomly". An oft-made and at least somewhat justified choice is f(x) = x^2 + 1 (mod n). This function is relatively easy to calculate, gives good mixing, and in practice actually produces good results. Unfortunately there is no proof that x^2 + 1 actually does operate as advertised, so with a particular choice of function the claims about running time that I just made should be viewed with scrutiny. For instance if the function is f(x) = x + a (mod n) then the iteration of the function is very clearly not "sufficiently random", and in this case one would expect a collision after O(n) steps instead of O(sqrt(n)). With the function f(x) = x^2 + 1 one expects much better behavior, though still probably not quite as good as a truly random function, and the actual performance is not known rigorously. Thus this factorization method remains purely heuristic, but ultimately that is of little importance in practice, since a factorization is just as valuable if it is found by a heuristic method as if it is found by more rigorous means.
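
To see the difference the choice of function makes, here is a rough comparison (small prime, purely illustrative) of how many iterations it takes before the sequence mod p repeats for x + 1 versus x^2 + 1:

    def steps_until_collision(f, p, x0=2):
        seen, x = set(), x0 % p
        while x not in seen:
            seen.add(x)
            x = f(x) % p
        return len(seen)

    p = 10007
    print(steps_until_collision(lambda x: x + 1, p))        # p steps: no better than brute force
    print(steps_until_collision(lambda x: x * x + 1, p))    # typically on the order of sqrt(p)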

The real value of the Pollard rho method, however, is its simplicity, intrinsic interest, and elegance. There are other factorization algorithms which have been analyzed more thoroughly and are known to be faster (for instance the number field sieve, which I am working up to). In practice one wonders how important it actually is that the iterated function really be "random": one could for instance use a higher order polynomial, but there is no reason to believe that this would significantly improve the randomness, and since higher order polynomials take longer to compute, in practice the simple x^2 + a is very likely best.

One could conceivably generate an actually random function dynamically, by choosing the value for a particular argument the first time the function is called with that argument and then saving that value for later calls. Since this would require about O(sqrt(n)*ln(n)) bits of memory, the memory requirements could easily become unrealistic. For a number of say 20 digits we would expect to require about 20*10^10 bits of memory on average before a factor was found. For perspective, 1 gigabyte of memory is about 10^10 bits. A number with just 30 digits (a factorization task easy enough for my graphing calculator to handle) would require an average of 30*10^15 bits of memory, which is roughly 3,000,000 gigabytes. Clearly it is not realistic to randomly generate and store a function in this manner.
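
For concreteness, the lazily generated random function described above might look like this (my own sketch); the dictionary is exactly the memory cost in question:

    import random

    def make_random_function(n):
        table = {}                          # grows by one entry per new argument seen
        def f(x):
            if x not in table:
                table[x] = random.randrange(n)
            return table[x]
        return f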

More interestingly, one might consider picking a random function by picking random coefficients in some set of basis functions. Obviously any complete basis of an n dimensional vector space must have n vectors in it, so if we want to be able to specify every possible function taking values at n discrete points we are not going to be able to do better than a straight table of values. On the other hand, if we are willing to settle for sparsely populating the function space with functions sprinkled uniformly over it, then we are much more in luck.

For instance we could pick random Fourier coefficients in a Fourier basis and then interpret the resulting function mod n. Based on how large a difference between functions we are willing to tolerate, we could choose an appropriate number of coefficients to approximate the functions sufficiently well. With careful analysis a Fourier basis could actually be helpful, but since the basis functions are inherently periodic there is a very real danger of creating very structured behavior in the iteration of the function, which would interfere with the motivation for having picked a "random" function in the first place.
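
One possible (and entirely hypothetical) reading of that idea: pick a handful of random Fourier coefficients, evaluate the resulting trigonometric sum, and round it into [0, n-1]:

    import math, random

    def make_fourier_function(n, terms=8):
        a = [random.uniform(-n, n) for _ in range(terms)]   # random cosine coefficients
        b = [random.uniform(-n, n) for _ in range(terms)]   # random sine coefficients
        def f(x):
            s = sum(a[k] * math.cos(2 * math.pi * k * x / n) +
                    b[k] * math.sin(2 * math.pi * k * x / n)
                    for k in range(terms))
            return int(s) % n
        return f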

It would be intriguing to attempt something like this with one or another class of orthogonal polynomials. For instance one could use the Legendre polynomials, picking weights for the first few polynomials to generate a polynomial which could be used as the randomly chosen function. If the probability distribution of the size of these coefficients had an appropriately chosen decay rate then one could very effectively sprinkle functions uniformly over the function space. Of course uniformity of the distribution of the functions over the function space is no guarantee that the iteration of the function is in any sense "random". Also, ultimately this suffers from the same problem: the increased randomness is probably not worth the added time it takes to calculate the function.
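
Again purely as a speculative sketch, one could draw random weights on the first few Legendre polynomials (evaluated with the standard three-term recurrence) and reduce the result mod n:

    import random

    def make_legendre_function(n, terms=6):
        coeffs = [random.uniform(-n, n) for _ in range(terms)]
        def f(x):
            t = 2 * x / (n - 1) - 1                     # map {0, ..., n-1} into [-1, 1]
            p_prev, p_cur = 1.0, t                      # P_0 and P_1
            s = coeffs[0] * p_prev + coeffs[1] * p_cur
            for k in range(1, terms - 1):
                p_next = ((2 * k + 1) * t * p_cur - k * p_prev) / (k + 1)
                s += coeffs[k + 1] * p_next
                p_prev, p_cur = p_cur, p_next
            return int(s) % n
        return f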

So if we are going to spend a significant amount of time generating and/or calculating the function we use, then we had better expect some speed-up in the generation of collisions in return. For sufficiently large n the added randomness of higher order uniformly distributed polynomials would probably be worth the extra calculation, but the best we can hope for from truly randomly distributed functions is O(n^(1/4)), which, while not too shabby, perhaps we could improve upon.

For instance, in the case of Fermat numbers it is known that the prime factors of the k'th Fermat number are of the form m*2^(k+2) + 1. With that in mind it seems reasonable to make sure that you only look at numbers which are 1 mod 2^(k+2), which is easily accomplished by making your function f(x) = x^(2^(k+2)) + 1. Of course now your function is much more computationally intensive, but the speed-up is well worth it.
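
A sketch of that special-purpose variant (using Python's built-in modular pow; the Fermat index k = 5 is just for demonstration):

    from math import gcd

    def pollard_rho_fermat(k, x0=3):
        n = 2 ** (2 ** k) + 1                  # the k'th Fermat number
        e = 2 ** (k + 2)
        f = lambda x: (pow(x, e, n) + 1) % n   # every iterate is 1 mod 2^(k+2)
        x = y = x0
        d = 1
        while d == 1:
            x = f(x)
            y = f(f(y))
            d = gcd(abs(x - y), n)
        return d

    print(pollard_rho_fermat(5))   # F_5 = 641 * 6700417; this should recover 641 quickly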

In general, what we want is that when we view the iterated function sequence mod p we don't have to wait very long before there is a collision. This comes down to wanting the average cycle length mod p to be low for most primes. Of course such functions exist: for instance the function f(x) = x+1 if 2 | x and x-1 otherwise has a cycle length of 2 for every starting number, but it obviously fails to be "sufficiently random". But while it is fairly easy to show that every iterated function over a finite set produces paths that consist of tails leading into cycles, what is not at all easy is to figure out, say, how long the average cycle is going to be.
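
The 2-cycle example is easy to check directly:

    f = lambda x: x + 1 if x % 2 == 0 else x - 1
    print(all(f(f(x)) == x for x in range(100)))   # True: every point sits on a 2-cycle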

I started writing this blog post around 11:00 this morning and it is now 11:00 at night, so I don't think I shall be figuring out any interesting new twists to this 35-year-old algorithm before bed. Of course here you can cue the usual rampant speculation: perhaps certain properties of the Fourier spectrum of the function would reveal properties of the iterated function, or better yet perhaps there is some function basis in which functions with small iteration cycle lengths are sparse, etc. etc.
